Logstash

As mentioned in my previous post, Crash Course Into Open Source Monitoring - Part 3 (Centralized Logging), Logstash is a great open source system for aggregating and indexing log and event data. Although the idea is pretty simple, an actual logstash deployment is fairly complex due to all the various inputs and outputs, possible configurations, and use cases the system can cover.

Components

A centralized logstash deployment includes a few components:

  • Collector/indexer
  • Shipper
  • Web interface (Kibana)
  • Elasticsearch
  • Redis (optional)

Collector/Indexer

The collector/indexer is the central process of logstash; it's written in JRuby and runs on the JVM. It's where all the data gets piped through, collected, processed, and then stored into Elasticsearch (for searching later). The configuration language is very powerful and comes with many input/filter/output plugins. In the Example Topologies section, I'll go over a few different deployments and tie together the plugins with their configurations.
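
Every config follows the same three-section skeleton: inputs, filters, outputs. Here's a minimal sketch you can run locally to get a feel for the language (assuming a logstash version with codec support, 1.2+):

input {
  stdin { }                # read events from the terminal, one per line
}
filter {
  # transform/enrich events here (grok, date, mutate, ...)
}
output {
  stdout {
    codec => rubydebug     # pretty-print each processed event
  }
}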

Shipper

A logstash shipper is a process that runs on each host, collecting logs and shipping them onward to the collector/indexer (or to a broker in front of it). There are many implementations (including the one built into the logstash software package) that each address different needs such as:

  • Configurability
  • Low resource footprint
  • Encryption
  • Ease of deployment
  • Language support, e.g. if your team primarily uses Python, that might be a factor in choosing a Python-based shipper.
  • Extensibility

Within the logstash software package, a shipper is the same executable as the collector/indexer, just configured with a different set of inputs/outputs. The one invariant is that the collector/indexer always has Elasticsearch configured as an output, whereas a shipper usually doesn't, handing events off to a broker or to the indexer instead. A minimal shipper config is sketched below.
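
Here's that sketch: logstash running as a shipper, tailing local log files and pushing events into a Redis broker (the path and the redis01 hostname are placeholders, and the key must match the indexer's redis input):

input {
  file {
    path => "/var/log/app/*.log"   # whatever logs this host produces
  }
}
output {
  redis {
    host => "redis01"              # the broker
    data_type => "list"
    key => "logstash"              # matches the indexer's redis input
  }
}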

Web interface

Logstash comes with a great web interface called Kibana, written entirely in JavaScript. It connects directly to the Elasticsearch server and visually surfaces its query capabilities, allowing you to search the logs stored by logstash.

Kibana web interface

Kibana web interface demo running at: http://demo.kibana.org/#/dashboard

Elasticsearch

Logstash stores its data in Elasticsearch, and as a result can leverage many of its tools and its powerful search functionality, as well as its horizontal scalability. Since you can search across multiple indices in Elasticsearch, logstash uses this feature and creates a new index every day. This way, you can optimize an index when it rolls over (since it becomes immutable at that point), remove it after a certain number of days, or archive it for long-term storage.
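
The daily rollover comes from the elasticsearch output's index setting, which embeds the event's date in the index name. Spelled out explicitly (this is the plugin's default, so you normally wouldn't need to set it):

output {
  elasticsearch {
    host => "elasticsearch01"
    # one index per day, e.g. logstash-2013.11.02
    index => "logstash-%{+YYYY.MM.dd}"
  }
}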

Redis (optional)

In the Getting Started (Centralized) guide, Redis is recommended as a broker between the shippers and the collector/indexer. A broker can be thought of as a FIFO buffer, much like the one between a CPU and RAM. Ideally, the collector/indexer should drain the Redis queue as quickly as the shippers fill it, i.e. the queue should usually be empty, but it absorbs bursts of activity when they happen.

Example Topologies

Below are just a few topologies for logstash deployments. Given its configurability, extensive plugin list, and the ease with which custom plugins can be created, there are many ways to “wire up” logstash.

Redis Broker Setup

Redis broker setup

In this setup, each of the application and database servers outputs its logs directly into redis01, using a shipper config like the one sketched earlier. The collector/indexer then connects to this Redis server, ingests and processes the data, and outputs it to elasticsearch01.

Users then view this data through the Kibana UI, which is simply a JavaScript application that connects directly to Elasticsearch.

input {
  redis {
    host => "redis01"
    data_type => "list"
    key => "logstash"
  }
}
output {
  elasticsearch {
    host => "elasticsearch01"
  }
}

logstash-indexer.conf example

Redis Broker and Lumberjack Setup

Redis broker and Lumberjack setup

This setup is similar to the previous one, except that logstash01 is configured with an additional input type, lumberjack, bound (arbitrarily) to port 12345. Alternative shippers like this are chosen when the logstash built-in shipper isn't a good fit.

lumberjack, for example, enforces SSL encryption, is a very lightweight daemon written in Go, runs with minimal resource usage, and deploys very easily. In this topology, lumberjack's encryption is particularly useful since it transmits sensitive security logs from the db servers to the indexer.
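
The Go daemon has its own config file format, but logstash also provides a lumberjack output plugin, so a logstash process could fill the same shipper role. A rough sketch (the hostname, port, and certificate path simply mirror the indexer config below):

output {
  lumberjack {
    hosts => ["logstash01"]
    port => 12345
    # must be the certificate the indexer's lumberjack input presents
    ssl_certificate => "/etc/ssl/lumberjack.cert"
  }
}

On the indexer side, the corresponding config looks like this: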

input {
  redis {
    host => "redis01"
    data_type => "list"
    key => "logstash"
  }
  lumberjack {
    port => 12345
    ssl_certificate => "/etc/ssl/lumberjack.cert"
    ssl_key => "/etc/ssl/lumberjack.key"
  }
}
output {
  elasticsearch {
    host => "elasticsearch01"
  }
}

logstash-indexer.conf example

Syslog-ng Setup

Syslog-ng Setup

For some companies, a syslog-based setup might already be in place, aggregating logs to a central server. It's usually desirable to maintain that existing setup, at least for some time.

In this example, the syslog receiver syslog01, and the methods for shipping logs to it, remain unchanged. However, to leverage logstash in this environment, we can run the collector/indexer as a process on the same server and have it output to elasticsearch01. The input used here can simply be the file input, which behaves much like tail -f in Linux. We'd then configure the filter section to parse the syslog message format, as explained in this logstash cookbook recipe: parsing syslog messages.

input {
  file {
    path => "/var/syslog/*.log"
  }
}
filter {
  # see "parsing syslog messages" cookbook
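  # a sketch of that recipe (assumes standard RFC3164-style lines;
  # adjust the grok pattern to your actual message format)
  grok {
    match => [ "message", "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" ]
  }
  date {
    match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
  }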
}
output {
  elasticsearch {
    host => "elasticsearch01"
  }
}

logstash-indexer.conf example

Logstash as the Syslog Server

In the last setup, we maintained the existing syslog receiver and had the logstash indexer process the collected log files. But doesn't logstash offer the same functionality as syslog-ng? In fact, it supports¹ acting as a syslog receiver itself when configured with the syslog input, thereby replacing the existing receiver completely. Simply bind it to the syslog port (defaults to 514), and no further changes should be needed. The diagram looks similar to the previous setup, but with a slight change:

Logstash as the Syslog Server

input {
  syslog {
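    # note: binding to ports below 1024 typically requires root privileges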
    port => 514
  }
}
output {
  elasticsearch {
    host => "elasticsearch01"
  }
}

logstash-indexer.conf example

Integration with Graphite and statsd

Logstash has a great tutorial titled Pull metrics from logs, which begins with the following:

Logs are more than just text. How many customers signed up today? How many HTTP errors happened this week? When was your last puppet run?

Apache logs give you the http response code and bytes sent - that’s useful in a graph. Metrics occur in logs so frequently there are piles of tools available to help process them.

Logstash can help (and even replace some tools you might already be using).

The tutorial explains how to configure logstash to process apache logs and emit counts of HTTP response codes to statsd using the statsd output plugin. Here's what the topology could look like:

Integration with Graphite and statsd

Notice how logstash01 outputs directly to both statsd01 and graphite01. The tutorial outlines the reasons for outputting to statsd, so when would you want to output straight to graphite? It turns out there are a few reasons (see the config sketch after this list):

  • Since statsd is UDP-based, there's no guarantee of message delivery. When it matters that a metric actually gets transmitted, outputting straight to Graphite via TCP is the better choice.
  • Graphite has an input protocol called pickle, which is much more efficient for sending batches of metrics. Depending on how data arrives into logstash and is processed, this might be the best method to use.
  • Your infrastructure is not running statsd.
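
Here's a rough sketch of what that dual output could look like, loosely following the tutorial (the response and bytes fields come from the COMBINEDAPACHELOG grok pattern; hostnames are placeholders):

filter {
  grok {
    # parse apache access logs into fields like response and bytes
    match => [ "message", "%{COMBINEDAPACHELOG}" ]
  }
}
output {
  statsd {
    host => "statsd01"
    # count each response code, e.g. apache.response.200
    increment => "apache.response.%{response}"
  }
  graphite {
    host => "graphite01"
    # ship bytes transferred straight to graphite over TCP
    metrics => [ "apache.bytes", "%{bytes}" ]
  }
}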

  1. As mentioned in the logstash docs for syslog input: “This input only supports RFC3164 syslog with some small modifications. The date format is allowed to be RFC3164 style or ISO8601. Otherwise the rest of the RFC3164 must be obeyed. If you do not use RFC3164, do not use this input.”