Automating Nagios Device Creation With Device42 Webhooks & Logstash

Many of you are already familiar with Device42’s powerful autodiscovery, arguably one of its most important features. Autodiscovery eliminates a lot of error-prone manual labor, is essential for getting initial documentation squared away in large IT deployments, and helps keep that documentation current through large changes.

Today we’ll be demonstrating a use case that uses discovery events to drive other automation, namely, device creation in the Nagios monitoring system!

Driving Automation with Device42 Webhooks

Automating other functions based on discovery events in Device42 is the next logical step, and the following is a real-life example that one of our customers kindly shared with us! In this case, the Device42 “Device added / removed” event has been configured to fire a webhook. When the webhook is received, a Logstash filter parses the new device, which is then automatically added to (or optionally, removed from) the Nagios monitoring system!

To accomplish this, the process leverages Device42’s webhook functionality pointed towards a custom Logstash filter, and that filter parses the incoming discovery events from Device42, indexing them by hostname.

Scripts then determine whether the event was a device addition or deletion from Device42, and execute the appropriate action. After we discuss how the scripts accomplish this, we’ll discuss some alternate approaches to automating device deletion (which you may or may not want to do), too.

The entire process is broken down into two parts that run independently and in parallel: webhooks are sent and received as one part, while crontab jobs that run every minute drive the rest. We’ll start with prerequisites, then give a high-level overview of the process, and then dig deeper.

Prerequisites:

  • A functional Logstash instance needs to be accessible on your network. Logstash will utilize the HTTP input plugin on an open port, so ensure that the port you’ll be using is open.
  • Nagios is expected to be preconfigured on your network and reachable as well.  
  • In Device42:
    • A webhook endpoint pointing towards your Logstash input location ( http://ip_of_logstash:port/ )
    • A webhook action for this endpoint that triggers upon device addition and device deletion events.  

A high-level view of the automation – two parts running in parallel:

  1. Webhooks:  Each webhook event emitted by Device42 will be parsed by Logstash, creating a .log file in the “incoming” folder.
  2. Crontab: kicks off the script [called rotate-upload.sh] that:
    1. rotates logs through three folders [ incoming / process / archive]
    2. handles the actual log file processing
    3. …and makes the Nagios API call that actually creates a device

A bit more background information

Let’s examine the logic behind both parts of the process:

The Webhooks: 

  1. Device42 fires a webhook event to the Logstash endpoint when a device is added or deleted.
  2. Logstash accepts the incoming webhook from Device42 as configured in logstash.conf. Each time a webhook is received, Logstash saves the incoming event data to a file named device.log in the “incoming” folder.

Crontab (called every minute):

Crontab calls scripts that handle the processing, and eventual insertion into Nagios. The scripts’ functions are explored below, but basically both rotate and interact with the log files stored in one of three directories: incoming, process, and archive.

The “incoming” directory contains all the device addition/deletion events emitted by Device42 and processed by Logstash. The “process” directory is where the “upload.sh” script looks for the events it uses to update Nagios. Lastly, the “archive” directory holds all the events that have already been processed.
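As a rough sketch of how the every-minute schedule and the wrapper script might fit together, consider the following. The install path /opt/d42-nagios and the function name run_pipeline are assumptions made for illustration; they are not taken from the customer’s scripts.

```shell
#!/usr/bin/env bash
# Hypothetical crontab entry that runs the wrapper once a minute
# (the path is an assumption):
#   * * * * * /opt/d42-nagios/rotate-upload.sh

# Sketch of a wrapper in the spirit of rotate-upload.sh: rotate first,
# so that upload.sh finds the newest event in the "process" folder.
run_pipeline() {
  local script_dir="$1"   # directory holding rotate.sh and upload.sh (assumed layout)
  bash "${script_dir}/rotate.sh"
  bash "${script_dir}/upload.sh"
}
```

Calling run_pipeline /opt/d42-nagios from the cron-driven script would run the two steps in order, once per minute.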

Exploring the Two Processes in Detail

The Logstash Filter (webhook endpoint):

  1. Parses the webhook for the device hostname and action flag, which depends on whether Device42 just added the device or removed it [I (‘inserted’) or D (‘deleted’)], and passes that data to Elasticsearch for logging
  2. Outputs the hostname to stdout for Nagios

Rotate-upload.sh simply calls the following two scripts:

Rotate.sh: The “rotate” script [rotate.sh] moves log data through the three directories ( incoming / process / archive ):

  1. A file from the “process” folder is moved to the “archive” folder and renamed to include a timestamp.

  2. New incoming data is moved to “process”, still named device.log.
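The two rotation steps could look roughly like this. It is a minimal sketch, not the customer’s rotate.sh: the function name rotate_logs, the base directory argument, and the timestamp format are all assumptions.

```shell
#!/usr/bin/env bash
# Sketch of the rotation step. The base directory and the timestamp
# format are assumptions, not taken from the original rotate.sh.
rotate_logs() {
  local base_dir="$1"
  local stamp
  stamp="$(date +%Y%m%d-%H%M%S)"

  mkdir -p "${base_dir}/incoming" "${base_dir}/process" "${base_dir}/archive"

  # 1. Archive whatever the previous run processed, adding a timestamp.
  if [ -f "${base_dir}/process/device.log" ]; then
    mv "${base_dir}/process/device.log" "${base_dir}/archive/device-${stamp}.log"
  fi

  # 2. Promote new incoming data to "process", keeping the name device.log.
  if [ -f "${base_dir}/incoming/device.log" ]; then
    mv "${base_dir}/incoming/device.log" "${base_dir}/process/device.log"
  fi
}
```

Archiving before promoting matters: doing it the other way around would overwrite the file that is still waiting in “process”.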

Upload.sh: Processes the log data itself, eventually inserting each new device into Nagios:

  1. Upload.sh looks for a file called device.log in the “process” folder.
  2. If device.log exists, the script reads the flag included with the webhook to decide whether the device was newly added (“I”) or deleted (“D”). If the flag is “I”, the script then:
    1. Retrieves the OS details of the new machine via a D42 API call
    2. Sets a variable, “$HOSTGROUPS”, that classifies the OS for the Nagios insertion string
    3. Retrieves the device’s IP from D42 via a second API call (stored in $IPADDR)
    4. Assembles a URL from the Nagios endpoint URL + Device Hostname + Device IP, and inserts it all into Nagios using curl via an HTTP POST to the assembled URL:

msg=`curl -XPOST "${URLnag}" -d "${OPTIONS}"`; echo ${msg} >> ${MSGS}
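The flag check that comes before the Nagios call can be sketched as follows. The "action" pattern mirrors the grep pattern used in upload.sh; everything else here is an assumption for illustration, including the function names and the shape of the sample event (in particular the hostname field). The sketch only reports which branch an event would take and makes no Device42 or Nagios calls.

```shell
#!/usr/bin/env bash
# Sketch of the add/delete decision only; no API calls are made.

# Print "I" or "D" when the event line carries that action flag.
event_action() {
  local line="$1"
  if printf '%s\n' "$line" | grep -q '"action": "I"'; then
    echo "I"
  elif printf '%s\n' "$line" | grep -q '"action": "D"'; then
    echo "D"
  fi
}

# Report which branch a single event would take.
handle_event() {
  local line="$1"
  case "$(event_action "$line")" in
    I) echo "add to Nagios" ;;       # real script: look up OS and IP, then POST
    D) echo "remove from Nagios" ;;  # optional deletion branch
    *) echo "ignore" ;;              # no recognizable action flag
  esac
}
```

For example, handle_event '{"hostname": "web01", "action": "I"}' would take the “add” branch, assuming events look like that sample.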

Device42 Can Drive Automated Deletion from Nagios, too.

Automated deletion is an optional step, and can be accomplished in more than one way. Take a look at the upload.sh file excerpt below for an example.

The two if statements are already checking for the “I” or “D” status:


If the status of “D” is found, the script is already issuing a call to delete the device from Nagios, and there’s already a variable there with the hostname {$HOST_NAME} — and, if you look towards the top of the script, the API endpoint information was already defined in another variable called “{$URLd42}”.

There is another, optional way to accomplish the same task:

  1. Simply look for a webhook with status “D” and call delete-device42.sh with the hostname variable, either:
    1. when a webhook is received
    2. or, as is currently being done in upload.sh, following this line:

if (echo $line | grep -o '"action": "D"'); then

Final Thoughts and Possible Improvements

Pretty cool, right? It’s not the most graceful implementation ever, but we absolutely love the functionality! The idea of automatic documentation and automatic addition to monitoring is simply awesome, and of course we really enjoy seeing people use Device42 webhooks to drive their automation processes!

You may have also noticed that with this workflow, only one webhook appears to be processed per minute (per cron execution), and you’d be correct!

If you are adding and removing devices more frequently than once per minute, the processing script will have to be adapted to use loops, iterating through all the files in the folder. However, if you only make changes a few at a time, or in bursts, the crontab will chew through them one per minute, so 10 machines would be added in 10 executions, which works out to 10 minutes.
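One possible shape for that loop is sketched below. It assumes one event per line and .log file names, and the function name process_all_events and its handler argument are hypothetical; the original scripts do not work this way.

```shell
#!/usr/bin/env bash
# Sketch: run a handler command once for every event line in every
# .log file of a folder, instead of handling one event per cron run.
process_all_events() {
  local process_dir="$1"
  local handler="$2"   # command invoked with one event line as its argument
  local file line

  for file in "${process_dir}"/*.log; do
    [ -f "$file" ] || continue            # the glob matched nothing
    # The "|| [ -n ... ]" part keeps a last line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
      [ -n "$line" ] || continue          # skip blank lines
      # Keep the handler from consuming the rest of the file via stdin.
      "$handler" "$line" < /dev/null
    done < "$file"
  done
}
```

With a loop like this, a burst of 10 device events would be handled in a single cron run rather than over 10 minutes.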

It’s also possible and potentially valuable to bring an integration platform like StackStorm into the mix. That way, one could leverage scripts written in higher-level languages like Python or Ruby, or integrate with any other tool.

What Would You Do Differently?

We would love to hear your thoughts and observations, and especially suggestions for improving or expanding this workflow! What would you do differently? How would you handle file rotation, or processing without relying on timed events? Please leave comments and feedback below, or if you have a question and need to talk to an engineer, email support@device42.com.

And of course, if you aren’t already using Device42, download a free 30-day trial now, and get started with your own automation projects!