CMDB Application Mapping: Techniques & Best Practices
CMDB application mapping is the process of associating software and hardware assets, also known as configuration items (CIs), stored in a configuration management database (CMDB) with the applications running in that environment. The diagram below is a simple example of a custom business application running on a virtual machine, supported by an NGINX web server farm running on Kubernetes and a MySQL database installed on a dedicated server.
A diagram of an IT environment that includes NGINX, Kubernetes, MySQL, and other configuration items (CI)
CMDB application mapping and CMDB discovery help enterprises automate labor-intensive CMDB implementation tasks and streamline dependency tracking across IT infrastructure.
The most common approach has two steps. The first step discovers configuration items, such as operating system instances, applications, and middleware components. The second step uncovers relationships between IP endpoints using techniques such as monitoring TCP and UDP communication flows.
An advanced CMDB application mapping tool includes an asset discovery function and a dependency mapping component. Once such a tool discovers the applications and establishes the relationships, it stores the data in a CMDB as attributes associated with configuration items. Users can then visualize application maps on dashboards or pull the attribute information via APIs into third-party tools supporting IT practices such as vulnerability assessment and configuration management.
An overview of the CMDB application mapping process
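To make the storage step concrete, here is a minimal Python sketch of how discovered CIs and a relationship stored as a CI attribute might be represented and exported as JSON for a third-party tool. The `ConfigurationItem` class, its field names, the `depends_on` attribute, and the hosts `web-01` and `db-01` are all hypothetical; real CMDB platforms define their own schemas and APIs.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ConfigurationItem:
    """Hypothetical, simplified CI record; real CMDB schemas differ."""
    name: str
    ci_type: str
    attributes: dict = field(default_factory=dict)
    # Relationship stored as an attribute: names of CIs this CI depends on.
    depends_on: list = field(default_factory=list)


def export_cis(cis):
    """Serialize CIs to JSON, roughly as a CMDB API might expose them."""
    return json.dumps([asdict(ci) for ci in cis], indent=2, sort_keys=True)


web = ConfigurationItem(
    "web-01", "linux_server", {"ip": "192.0.2.10", "app": "nginx"}, ["db-01"]
)
db = ConfigurationItem(
    "db-01", "linux_server", {"ip": "198.51.100.20", "app": "mysql"}
)
print(export_cis([web, db]))
```

Keeping relationships as plain attributes on each CI keeps the export flat, which is convenient for consumers that only need to read one record at a time.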
This article will explain how application mapping works, list the most common CMDB application mapping techniques, discuss their value in ITIL processes, and share several industry best practices for tackling a new CMDB implementation project.
CMDB application mapping explained
CMDB application mapping techniques fit into three general categories:
- Configuration-based. This category uses agentless or agent-based techniques that run commands or parse configuration files to gather insights from configuration parameters.
- Traffic-based. This category relies on standards like NetFlow, tools like NMAP, and techniques like packet capture. Packet capture is functionality implemented in hardware or software appliances designed to capture copies of in-flight network traffic, usually IP packets, and analyze them.
- Tag-based. This category uses tags to label assets. Even though tags often aren’t consistently applied to every CI, they are still essential to the CMDB application mapping process and are a popular method for mapping public cloud assets.
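Because tags are free-form and applied inconsistently, a tag-based mapper typically normalizes them before grouping CIs by application. The sketch below assumes a hypothetical `app` tag key and simple normalization rules (case-folding and trimming whitespace on keys and values); the resource IDs are made up. It also reports CIs that lack the tag so another technique can map them.

```python
from collections import defaultdict


def normalize_tags(tags):
    """Lower-case and trim tag keys and values so 'App ' and 'app' match."""
    return {k.strip().lower(): v.strip().lower() for k, v in tags.items()}


def group_by_app(resources, tag_key="app"):
    """Group resource IDs by a (hypothetical) application tag.

    Returns (groups, untagged): a dict of app name -> sorted resource IDs,
    and a sorted list of IDs with no usable tag value.
    """
    groups = defaultdict(list)
    untagged = []
    for resource_id, tags in resources.items():
        app = normalize_tags(tags).get(tag_key)
        if app:
            groups[app].append(resource_id)
        else:
            untagged.append(resource_id)
    return {app: sorted(ids) for app, ids in groups.items()}, sorted(untagged)


resources = {
    "i-001": {"App": "Billing "},
    "i-002": {"app": "billing"},
    "i-003": {"team": "payments"},  # missing the app tag
}
groups, untagged = group_by_app(resources)
print(groups)    # {'billing': ['i-001', 'i-002']}
print(untagged)  # ['i-003']
```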
Application mapping may seem complex at first. However, the process becomes more intuitive once broken down into each technique’s basic mechanisms.
Below, we use the packet capture approach to application mapping as an example and explain the steps involved in analyzing network traffic to define an application map.
Packet capture example
The diagram below shows two hosts communicating over a network. The application mapping functionality can identify this traffic flow by relying on a hardware appliance placed in the network to capture and analyze the content of IP packets transferred over that network.
The captured packet reveals the source and destination IP addresses used by the two traffic endpoints. In our example, the source IP address is 220.127.116.11, and the destination IP address is 18.104.22.168.
A diagram illustrating a network traffic flow between two hosts and associated packet capture.
The dependency mapping program searches the CMDB to identify the two corresponding CIs based on their IP address attributes. Once it finds the IP addresses in the CMDB, it identifies the two endpoint hostnames as “abc” and “xyz”, as illustrated in the diagram above.
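The lookup step amounts to an index from IP address attributes to CIs. A minimal sketch, using an in-memory list as a stand-in for the CMDB and hypothetical documentation-range addresses (RFC 5737) rather than the addresses in the example above:

```python
import ipaddress

# Stand-in for CMDB records; the addresses are hypothetical.
CMDB_RECORDS = [
    {"hostname": "abc", "ip": "192.0.2.10"},
    {"hostname": "xyz", "ip": "198.51.100.20"},
]


def build_ip_index(records):
    """Map each normalized IP address attribute to its CI hostname."""
    return {ipaddress.ip_address(r["ip"]): r["hostname"] for r in records}


def resolve_flow(src_ip, dst_ip, index):
    """Return the (source, destination) CI names for a captured flow.

    Unknown addresses map to None, which signals a gap in CMDB discovery.
    """
    return (index.get(ipaddress.ip_address(src_ip)),
            index.get(ipaddress.ip_address(dst_ip)))


index = build_ip_index(CMDB_RECORDS)
print(resolve_flow("192.0.2.10", "198.51.100.20", index))  # ('abc', 'xyz')
```

Parsing addresses with `ipaddress` before indexing means equivalent spellings of the same address (common with IPv6) resolve to the same CI.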
The packet capture also identifies the TCP port used for this transfer as port number 3306, the default port for MySQL. With that in mind, one end of this traffic flow is likely a MySQL database instance.
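The port-based inference can be expressed as a lookup in a table of well-known ports. The small table below is a hypothetical excerpt, and the rule that the endpoint using the well-known port is the server is a common heuristic, not a guarantee. The result is only a guess that another technique should confirm.

```python
# Hypothetical excerpt of well-known ports; real tools ship much larger lists.
WELL_KNOWN_PORTS = {
    25: "smtp",
    80: "http",
    443: "https",
    3306: "mysql",
    5432: "postgresql",
}


def guess_server_side(src_port, dst_port):
    """Guess which endpoint is the server and what it probably runs.

    Clients usually connect from an ephemeral port to a well-known port,
    so the endpoint using a known port is assumed to be the server.
    Returns ("src" or "dst", application) or None if neither port is known.
    """
    if dst_port in WELL_KNOWN_PORTS:
        return "dst", WELL_KNOWN_PORTS[dst_port]
    if src_port in WELL_KNOWN_PORTS:
        return "src", WELL_KNOWN_PORTS[src_port]
    return None


print(guess_server_side(51544, 3306))  # ('dst', 'mysql')
```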
The application mapping program also uses a configuration-based discovery technique, with agents installed on the hosts, to gather complementary insights. In our example, the agent on host xyz ran the command `ps aux | grep mysql` and determined that MySQL runs on that host. Here, packet capture and the host agents worked together: the agents discovered additional information and confirmed that MySQL is running on one of the endpoints.
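`ps aux | grep mysql` also lists the `grep` process itself, because its own command line contains "mysql". A more robust agent could parse the full `ps aux` output and check the executable name. The sketch below parses captured output; the sample lines are hypothetical, and it assumes the standard 11-column `ps aux` layout with the command in the last column and `mysqld` as the server binary name.

```python
import os

SAMPLE_PS_AUX = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
mysql     1234  0.5  8.1 1795000 330000 ?      Ssl  09:00   1:02 /usr/sbin/mysqld
alice     5678  0.0  0.0   6432   720 pts/0    S+   09:05   0:00 grep mysql
"""


def running_executables(ps_aux_output):
    """Return the executable basenames listed in `ps aux` output."""
    names = set()
    for line in ps_aux_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)  # COMMAND is the 11th column
        if len(fields) == 11:
            executable = fields[10].split()[0]
            names.add(os.path.basename(executable))
    return names


def mysql_running(ps_aux_output):
    """True if a MySQL server process (mysqld) appears in the output."""
    return "mysqld" in running_executables(ps_aux_output)


print(mysql_running(SAMPLE_PS_AUX))  # True
```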
The other endpoint of the traffic flow is the host named “abc”, where an agent also ran the command `systemctl status nginx` to verify that an NGINX web server is installed and running.
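An agent does not necessarily need to parse the human-readable output of `systemctl status`; the exit status already answers the question. This sketch relies on the LSB-style exit codes documented for `systemctl status` (0 when the unit is active, 3 when it is not running, 4 when no such unit exists) and treats anything else as unknown. The `check_service` helper is hypothetical and needs a systemd host to run.

```python
import subprocess

# Exit codes of `systemctl status` (LSB convention); others are treated as unknown.
STATUS_BY_EXIT_CODE = {0: "running", 3: "not running", 4: "no such unit"}


def classify_status(exit_code):
    """Translate a `systemctl status` exit code into a discovery result."""
    return STATUS_BY_EXIT_CODE.get(exit_code, "unknown")


def check_service(unit):
    """Run `systemctl status <unit>` on the local host and classify the result."""
    result = subprocess.run(
        ["systemctl", "status", unit],
        capture_output=True,  # the human-readable output is not needed
        text=True,
        check=False,          # non-zero exit codes are expected, not errors
    )
    return classify_status(result.returncode)


print(classify_status(3))  # not running
```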
By triangulating the attribute information gathered from the packet capture appliance and the installed agents, the CMDB application mapping function can now illustrate the relationship between the two endpoints, as shown below.
A basic CMDB application mapping of two hosts (ABC and XYZ) and the applications installed on them (NGINX and MySQL).
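Triangulation can be sketched as combining independent observations into a single relationship. In this minimal example the record shapes are hypothetical: a flow already resolved to hostnames, an application guessed from the destination port, and the applications each host's agent confirmed. A dependency edge is recorded only when an agent on the server side confirms the port-based guess.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """A captured flow already resolved to CI hostnames (hypothetical shape)."""
    src_host: str
    dst_host: str


def triangulate(flow, port_guess, agent_findings):
    """Build a dependency edge from packet-capture and agent evidence.

    port_guess: application inferred from the destination port, e.g. "mysql".
    agent_findings: host -> set of applications an agent confirmed running.
    Returns (client_app@host, server_app@host), or None if unconfirmed.
    """
    server_apps = agent_findings.get(flow.dst_host, set())
    if port_guess not in server_apps:
        return None  # the port-based guess was not confirmed on the server
    client_apps = agent_findings.get(flow.src_host, set()) or {"unknown"}
    client = "+".join(sorted(client_apps))
    return f"{client}@{flow.src_host}", f"{port_guess}@{flow.dst_host}"


flow = Flow(src_host="abc", dst_host="xyz")
findings = {"abc": {"nginx"}, "xyz": {"mysql"}}
print(triangulate(flow, "mysql", findings))  # ('nginx@abc', 'mysql@xyz')
```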
This diagram doesn’t reveal a great deal of information yet. However, it’s the beginning of a CMDB application map that can expand to automatically uncover hundreds of other similar relationships using CMDB application mapping techniques and CI attributes from a CMDB.
This example discovered two hosts communicating over a network, but application mapping can also discover more permanent logical connections between two CIs using configuration-based (vs. traffic-based) techniques.
For example, consider a Linux server that uses a network-mounted storage volume. While the storage volume physically resides on a network-attached storage (NAS) system, it is logically mounted on the Linux server. The CMDB application mapping function can uncover this logical relationship by remotely connecting to the host via SSH and running a command such as `findmnt` or `mount`, which lists the mounted file systems, including NFS or SMB shares served by the NAS.
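Network file system mounts can also be read from the host’s mount table. This sketch parses lines in the `/proc/mounts` format (device, mount point, file system type, options, dump, pass), for example as captured over SSH, and keeps NFS and SMB/CIFS entries. The sample content and the NAS hostname `nas01` are hypothetical.

```python
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smb3"}

SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime 0 0
nas01:/export/data /mnt/data nfs4 rw,relatime,vers=4.1 0 0
//nas01/finance /mnt/finance cifs rw,relatime 0 0
"""


def nas_server(device, fstype):
    """Extract the NAS hostname from an NFS or SMB/CIFS device string."""
    if fstype.startswith("nfs"):
        return device.split(":", 1)[0]          # "server:/export/path"
    return device.lstrip("/").split("/", 1)[0]  # "//server/share"


def network_mounts(mounts_text):
    """Return (nas_host, mount_point, fstype) for each network file system."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] in NETWORK_FS_TYPES:
            device, mount_point, fstype = fields[:3]
            found.append((nas_server(device, fstype), mount_point, fstype))
    return found


for relation in network_mounts(SAMPLE_MOUNTS):
    print(relation)
# ('nas01', '/mnt/data', 'nfs4')
# ('nas01', '/mnt/finance', 'cifs')
```

Each tuple corresponds to one logical relationship (host mounts a share from `nas01`) that could be stored in the CMDB.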
In the next section, we will review several techniques for uncovering relationships between CIs.
Summary of CMDB application mapping techniques
As explained in our earlier example, the CMDB discovery process creates an inventory of configuration items with as many attributes as possible for each CI.
The application mapping function uses various techniques to find evidence that configuration items are either physically or logically connected or communicating with each other over a network.
CMDB Application Mapping Techniques
| Technique | Category | Description |
|---|---|---|
| Agentless | Configuration-based | Remotely connects to hosts and runs commands to discover their configuration |
| Agent-based | Configuration-based | Installs a lightweight program on the host to collect information not available remotely |
| Application Programming Interfaces (API) | Configuration-based | Most applications and platforms have APIs that can be queried for relationship information |
| NetFlow | Traffic-based | A protocol designed by Cisco in the 1990s to collect and export network flow records |
| NMAP | Traffic-based | An open-source utility, widely used for security audits, that scans TCP and UDP ports on hosts |
| Packet Capture | Traffic-based | A software or hardware appliance that captures and analyzes the content of IP packets |
| Tagging | Tag-based | CIs hosted in public clouds such as AWS often carry tags that can be used for application mapping |
CMDB application mapping techniques explained
IT organizations don’t always use all of these CMDB application mapping techniques, typically because of time and resource limitations. However, the techniques are not mutually exclusive. In fact, they are often complementary and best deployed in conjunction with one another.
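Combining techniques can be as simple as merging the relationships each one reports while recording which techniques observed each relationship. A minimal sketch with hypothetical observations; one reasonable policy is to treat relationships confirmed by more than one technique as higher confidence.

```python
from collections import defaultdict


def merge_evidence(observations):
    """Merge (technique, source_ci, target_ci) tuples into one edge map.

    Returns {(source_ci, target_ci): sorted list of techniques}.
    """
    edges = defaultdict(set)
    for technique, source, target in observations:
        edges[(source, target)].add(technique)
    return {edge: sorted(techs) for edge, techs in edges.items()}


observations = [
    ("agentless", "abc", "xyz"),
    ("netflow", "abc", "xyz"),
    ("netflow", "batch01", "xyz"),  # intermittent job missed by the scan
]
merged = merge_evidence(observations)
for edge, techniques in sorted(merged.items()):
    print(edge, techniques)
# ('abc', 'xyz') ['agentless', 'netflow']
# ('batch01', 'xyz') ['netflow']
```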
Using multiple techniques has the following benefits:
- Completeness of information. Discovery techniques can complement each other in gathering dependency relationship information. For example, agentless operating system scans may miss intermittent connections that NetFlow would capture as packets flowing between IP endpoints. Agentless and agent-based discovery also complement each other: agentless discovery rapidly identifies the instances reachable on a network, while agents can gather information from laptops and tablets that may be turned off during agentless scans.
- Accuracy of information. Certain techniques infer information based on standard assumptions. For example, the NMAP utility scans the open TCP ports on a host and infers the likely applications installed on it by looking up a list of over 2,000 TCP and UDP ports and the applications that typically communicate via those ports. It will assume, for instance, that TCP port 25 belongs to an SMTP mail server. However, the systems administrators may have configured a different application to use port 25. It’s better to confirm this assumption using a second technique, such as an agent installed on the host that executes the command `lsof -i :25` to list the application(s) using port 25. Once again, one technique triggers a second; this time, the second technique validates the original assumption.
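The validation step can be sketched as comparing the COMMAND column of `lsof -i :25` output with the process names expected for the guessed service. The mapping from "smtp" to process names (`master` for Postfix, `sendmail`, `exim`, `exim4`) and the sample output are hypothetical assumptions for illustration.

```python
# Hypothetical mapping of a port-based guess to process names that would confirm it.
EXPECTED_PROCESSES = {"smtp": {"master", "sendmail", "exim4", "exim"}}

SAMPLE_LSOF = """\
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
master  1021 root   13u  IPv4  23456      0t0  TCP *:smtp (LISTEN)
"""


def listening_commands(lsof_output):
    """Return the COMMAND column values from `lsof -i :<port>` output."""
    return {line.split()[0]
            for line in lsof_output.splitlines()[1:]  # skip the header row
            if line.strip()}


def confirm_guess(guess, lsof_output):
    """True if a process expected for the guessed service uses the port."""
    expected = EXPECTED_PROCESSES.get(guess, set())
    return bool(expected & listening_commands(lsof_output))


print(confirm_guess("smtp", SAMPLE_LSOF))  # True
```

If the check fails, the port-based guess should be replaced with whatever the agent actually found rather than stored in the CMDB.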
The table below highlights the pros and cons of each technique. Remember these CMDB application mapping techniques are not mutually exclusive.
Pros and Cons of CMDB Application Mapping Techniques
| Technique | Description | Pros | Cons |
|---|---|---|---|
| Agentless | This technique remotely accesses a host over a secure connection (SSH for Linux; PowerShell Remoting or WMI for Windows) and executes commands that gather information about running processes (or services) and open TCP/UDP ports. | It saves time by avoiding the overhead of installing agents on every host in a large IT environment. It discovers IP endpoints where no agent was deployed, whether by oversight or by policy. | Not all configuration items are remotely accessible. Some administrators disable remote access for security reasons. |
| Agent-based | A lightweight program installed on each host provides access to a wide variety of local information, such as the content of configuration files or the results of executing a customized command. | Agents can gather information from laptops and tablets that are often turned off during scheduled scans. They can also gather information from servers whose remote access is disabled for security reasons. | In an environment with thousands of hosts, installing agents may take months of testing and deployment. Agents rarely reach 100% coverage. Agents must support every operating system in the environment, including legacy platforms. |
| Application Programming Interfaces (API) | Infrastructure platforms such as Kubernetes, VMware, or NetApp and applications such as SAP and Oracle provide APIs that can be queried remotely for inventory and configuration information. | APIs provide the most detailed information about an environment. | Using APIs requires software development unless the chosen CMDB platform already integrates with the relevant applications and infrastructure platforms. |
| NetFlow | NetFlow and IPFIX (Internet Protocol Flow Information Export, the IETF standard derived from NetFlow version 9) summarize packets traveling across the network into flow records containing source and destination IP addresses and TCP or UDP port numbers, among other fields. | The source and destination TCP or UDP port numbers are a rich source of information for identifying the applications at each end of a traffic flow. | NetFlow can impose significant overhead on network devices, so it is typically run at a low sampling rate (e.g., one packet out of every 1,000), which can miss short-lived communication flows. |
| NMAP | Since its launch in 1997, NMAP has been, at its core, a tool for scanning TCP and UDP ports on servers and firewalls. It maps open ports to application names using a list of over 2,000 applications and their corresponding ports. | It’s a free and stable tool that can quickly gather information highly relevant to CMDB application mapping. | It infers application names from TCP or UDP port numbers, so it may report incorrect names in environments where administrators have reconfigured port numbers; its output should be validated with other techniques. By default, it scans the 1,000 most common ports per protocol, so scanning many hosts can take a long time and be CPU-intensive. |
| Packet Capture | Hardware and software appliances capture IP packets as they travel across a network to extract useful information about traffic flows. | Packet capture appliances provide more information than NetFlow because they analyze the entire packet, including its payload, not only its headers. | Deploying packet capture appliances is a significant project in its own right if they weren’t already in place before the CMDB project. |
| Tagging | Most public cloud providers encourage administrators to tag their assets, a practice aligned with modern continuous integration and continuous delivery (CI/CD) and infrastructure as code (IaC) models. | Tags are free-form key-value pairs that can hold CI information such as a CI’s affiliation with departments, teams, and applications. | The flexibility of tags is also a weakness, because teams often apply them inconsistently. Tags are usually applied manually, so they may contain typos or be missing. |
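The sampling trade-off in the NetFlow row can be quantified. Assuming random sampling in which each packet is selected independently with probability 1/N (deterministic every-Nth-packet sampling behaves differently), a flow of k packets leaves no trace with probability (1 - 1/N)^k. A short sketch:

```python
def miss_probability(packets_in_flow, sampling_interval):
    """Probability that no packet of a flow is sampled.

    Assumes independent random sampling with probability 1/sampling_interval
    per packet, e.g. sampling_interval=1000 for 1-in-1000 sampling.
    """
    keep = 1.0 / sampling_interval
    return (1.0 - keep) ** packets_in_flow


for k in (10, 100, 1000, 5000):
    print(k, round(miss_probability(k, 1000), 3))
```

With 1-in-1,000 sampling, this model gives roughly a 99% chance of missing a 10-packet flow and about a 37% chance of missing even a 1,000-packet flow.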
Use cases for CMDB application mapping
Application mapping greatly enhances a CMDB’s value by showing how the hardware, software, and connections in an IT environment support each application. The table below lists some of the use cases that benefit from application mapping.
CMDB Application Mapping Use Cases
| Use case | How application mapping helps |
|---|---|
| Data center or cloud migrations | Move groups are clusters of interdependent CIs that should be migrated together to avoid an outage. |
| Application modernization | Refactoring an application starts by discovering all of its dependencies. |
| IT automation | Automated tasks such as a database upgrade benefit from identifying which applications depend on the database and therefore must be tested before the upgrade. |
| Security and regulatory audits | Audits are typically performed per application, so a dependency map is critical for verifying the configuration of an application’s supporting components. |
| Finance and budgeting | Financial managers allocate budgets to departments based on their application needs, so attributing hardware and software costs to applications is important for budgeting. |
| Mergers and acquisitions | M&A transactions lead to application migration and rationalization, both of which require application maps. |
| Performance and security incident response | Fault isolation, root cause analysis, and impact analysis all depend on up-to-date application maps for troubleshooting. |
| Capacity planning | IT departments allocate more hardware capacity to mission-critical applications, so they must be able to identify the bottlenecks affecting those applications. |
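For migrations, one simple way to derive move groups from a dependency map is to treat relationships as undirected links and find connected components: any two CIs that communicate, directly or through others, land in the same group. A sketch with hypothetical CI names:

```python
from collections import defaultdict


def move_groups(dependencies):
    """Split CIs into move groups (connected components of the dependency graph).

    dependencies: iterable of (ci_a, ci_b) pairs; direction is ignored because
    migrating either side alone could break the connection.
    """
    neighbors = defaultdict(set)
    for a, b in dependencies:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen, groups = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # iterative depth-first search
            ci = stack.pop()
            if ci in group:
                continue
            group.add(ci)
            stack.extend(neighbors[ci] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


deps = [("web1", "app1"), ("app1", "db1"), ("hr-app", "hr-db")]
print(move_groups(deps))  # [['app1', 'db1', 'web1'], ['hr-app', 'hr-db']]
```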
Best practices for CMDB application mapping
A CMDB project’s success depends on its implementation strategy and the features of the CMDB platform. Here are a few tips related to each of the categories.
Adopt a top-down approach.
Enterprise IT environments have thousands, if not millions, of configuration items. Mapping all of them would take months, so it’s best practice to identify the company’s mission-critical applications with the highest economic impact and map them first.
Use dynamic, not static, mapping.
Avoid manual mapping, because application environments constantly change. It’s worth investing time upfront in a dynamic mapping technique that will pay off for years.
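One way to keep a map dynamic is to rediscover on a schedule and reconcile each result with the stored map. A minimal sketch of the reconciliation step, comparing two hypothetical snapshots of dependency edges:

```python
def diff_maps(previous, current):
    """Compare two snapshots of dependency edges (sets of (source, target))."""
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
    }


yesterday = {("abc", "xyz"), ("abc", "cache01")}
today = {("abc", "xyz"), ("abc", "xyz-replica")}
print(diff_maps(yesterday, today))
# {'added': [('abc', 'xyz-replica')], 'removed': [('abc', 'cache01')]}
```

The added and removed edges are what a dynamic mapping job would write back to the CMDB, instead of someone editing the map by hand.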
Prioritize the use cases.
A CMDB supports many use cases ranging from data center migration and security management to regulatory compliance and financial audits. Starting with the use case most important to senior management helps prioritize the application mapping techniques. For example, security management would emphasize port scans, while data center migration would require detailed analysis of communication flows to avoid outages due to network misconfigurations.
Choose a feature-rich CMDB platform.
A CMDB project’s success mostly depends on the automated discovery functionality provided by the vendor chosen for the implementation. The chosen platform must support fully automated CMDB discovery and application mapping techniques for all flavors of operating systems, device types, and public cloud services.
CMDB application mapping functionality discovers the applications running on hosts, the logical and physical connections between hardware and software components, and the communication flow between configuration items (CIs).
It triangulates this information with the CI attributes already stored in a CMDB to create dependency maps that can be queried by other tools or used for visualization and troubleshooting purposes. The accuracy of application maps depends on the automation features and range of platforms supported by the CMDB tool chosen for the project.