Collection, processing and forwarding of log data from various sources using Fluent Bit and Fluentd


Estafet consultants occasionally produce short, practical tech notes designed to help the broader software development community and colleagues.
 

If you would like to have a more detailed discussion about any of these areas and/or how Estafet can help your organisation and teams with best practices in improving your SDLC, you are very welcome to contact us at enquiries@estafet.com

Introduction

Fluent Bit and Fluentd are two software solutions that collect logs and metrics from multiple sources, enrich them with filters and route the data to any defined destination. They have a fully event-driven design, leveraging the operating system API for performance and reliability. All operations to collect and deliver data are asynchronous. Both are:

  • Licensed under the terms of Apache License v2.0
  • Graduated hosted projects of the Cloud Native Computing Foundation (CNCF)
  • Production grade solutions: deployed millions of times every single day
  • Vendor neutral and community driven projects
  • Widely adopted by the industry: trusted by major companies such as AWS, Microsoft, Google Cloud and hundreds of others

Both projects share many similarities; Fluent Bit is designed and built on top of the best ideas of Fluentd's architecture and general design. Choosing which one to use depends on the end user's needs.

Fluentd

Fluentd is an open source data collector which lets you unify data collection and consumption, for a better use and understanding of data.

This is a robust and widely used data centralization platform. Fluentd is built to collect, process and send large volumes of data to various centralised systems such as Elasticsearch, MongoDB, Amazon S3 and others. It is built to work with a variety of data sources and has a large number of plugins available to extend its functionality.

It has more than 500 plugins to connect various data sources and outputs, while its core functionality remains quite simple.

Fluentd relies on JSON to structure data and has a pluggable architecture, which allows the community to extend its functionality. It uses buffering (both in-memory and file-based) to prevent inter-node data loss.

Fluent Bit

This is a lighter and more compact version of Fluentd, with a focus on efficiency and scalability. Fluent Bit is targeted at use in resource-constrained environments such as IoT devices or microservice architectures. It provides the basic functionality for data collection, transformation and sending to centralised systems, but with lower resource consumption compared to Fluentd. It is compatible with most x86, x86_64, arm32v7 and arm64v8 based platforms.

In recent years, Cloud Providers switched from Fluentd to Fluent Bit for performance and compatibility reasons. Fluent Bit is now considered the next generation solution.

Installation 

Detailed installation instructions for various platforms, including Docker, Kubernetes, AWS containers and many more, are provided on the official Fluent Bit documentation site.

Configuration

One of the ways to configure Fluent Bit is using a main configuration file. Fluent Bit allows one configuration file which works at a global scope and uses the format and schema described below.

The schema is defined by three concepts:

  • Sections – A section is defined by a name or title inside square brackets. It cannot be empty.
  • Entries: Key/Value – A section contains one or more entries; each entry is a key/value pair (for example, the key Log_Level with the value debug). Multiple entries with the same key can exist.
  • Indented Configuration Mode – Fluent Bit configuration files are based on a strict indented mode, which means that each configuration file must follow the same pattern of alignment from left to right. By default, an indentation level of four spaces is suggested.
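
As a minimal illustration of these rules, a SERVICE section with three key/value entries looks like this (the values are only examples):

[SERVICE]
    Flush        5
    Daemon       off
    Log_Level    debug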

The main configuration file supports four types of sections:

  • Service – The Service section defines global properties of the service
  • Input – An INPUT section defines a source (related to an input plugin). Note that each input plugin may add its own configuration keys.

  • Filter – A FILTER section defines a filter (related to a filter plugin). Note that each filter plugin may add its own configuration keys.

  • Output – The OUTPUT section specifies a destination that certain records should follow after a Tag match. Currently, Fluent Bit can route up to 256 OUTPUT plugins.
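
A minimal sketch showing all four section types together; the dummy input, grep filter and stdout output are chosen purely for illustration:

[SERVICE]
    Flush     1
    Daemon    off
    Log_Level info

[INPUT]
    # the dummy plugin emits a test record {"message":"dummy"} at a regular interval
    Name  dummy
    Tag   demo.data

[FILTER]
    # keep only records whose message field matches "dummy"
    Name   grep
    Match  demo.*
    Regex  message dummy

[OUTPUT]
    # print every record tagged demo.* to the console
    Name   stdout
    Match  demo.*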

It’s possible to split the main configuration file (to avoid long configurations) into multiple files using the feature to include external files:

  • Include File – Starting from Fluent Bit 0.12, the configuration command @INCLUDE is available and can be used in the following way:

@INCLUDE input_*.conf

The @INCLUDE command only works at the top level of the configuration; it cannot be used inside sections. The wildcard character (*) is supported, as shown above, to include multiple files.

How it works

The pipeline can be illustrated as a sequence of stages that logs flow through: inputs, parsers, filters, routing and outputs.

Input plugins are how logs are read or accepted into Fluent Bit. Common examples are syslog and tail. Syslog listens on a port for syslog messages, while tail follows a log file and forwards logs as they are added. The tail input plugin allows you to read from a text log file as though you were running the tail -f command.

In this example we parse an Apache error log file with the predefined parser apache2. It is recommended to use the DB option to keep track of what has already been monitored, and to set Path_Key so that an attribute is populated in the output that helps you differentiate the file source of the logs you aggregate.
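
The snippet itself is not reproduced in this note; a minimal sketch of such an input might look like this (the path, DB file and tag are illustrative, and the apache2 parser comes from the stock parsers.conf loaded via Parsers_File):

[INPUT]
    Name      tail
    # path and tag are illustrative
    Path      /var/log/apache2/error.log
    Parser    apache2
    # remember file offsets between restarts
    DB        /var/log/flb_apache.db
    # add the source file name to each record under the key "filename"
    Path_Key  filename
    Tag       apache.error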

The http input plugin captures logs sent to Fluent Bit over a REST-style HTTP endpoint.
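
A minimal sketch of such an input, with an illustrative listen address, port and tag:

[INPUT]
    Name    http
    # listen address and port are illustrative; records can then be POSTed to this endpoint
    Listen  0.0.0.0
    Port    8888
    Tag     http.logs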

Parsers are how unstructured logs are organized and how JSON logs can be transformed. There are a number of existing parsers, most of which are defined using regular expressions. There is also an option to use Lua for parsing and filtering.
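
A sketch of a regex-based parser definition; the name, pattern and time format are illustrative and would match lines such as 2024-01-01T10:00:00 error something failed:

[PARSER]
    Name        simple_level
    Format      regex
    # named capture groups become fields in the structured record
    Regex       ^(?<time>[^ ]+) (?<level>[^ ]+) (?<message>.*)$
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S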

Filter plugins transform the data generated by the input plugins. 

  • grep: matches or excludes log records, similar to the grep command.
  • modify: changes log records based on specified conditions or rules.
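
For illustration, a grep filter that keeps only records whose level field contains "error", followed by a modify filter that renames that field (the tag and field names are assumptions):

[FILTER]
    Name   grep
    Match  app.*
    # keep only records where the "level" field matches "error"
    Regex  level error

[FILTER]
    Name    modify
    Match   app.*
    # rename the "level" key to "severity"
    Rename  level severity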

A list of available filter plugins can be found in the official Fluent Bit documentation.

Routing is a core feature that allows you to route your data through filters and finally to one or multiple destinations. The Tag value you set on an input is matched against the Match value of filters and outputs to decide where each record goes.
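
A small sketch of tag-based routing: two inputs with different tags are delivered to different outputs via their Match values (the plugins and tags are illustrative):

[INPUT]
    Name  cpu
    Tag   metrics.cpu

[INPUT]
    Name  mem
    Tag   metrics.mem

[OUTPUT]
    # CPU metrics go to the console
    Name   stdout
    Match  metrics.cpu

[OUTPUT]
    # memory metrics are written to files under /tmp
    Name   file
    Path   /tmp
    Match  metrics.mem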

Output – Once you have collected and filtered data, you will want to send it somewhere. That is what the output plugins are used for.

This example matches on everything, so it is a good idea to set the optional maxBufferSize and maxRecords parameters alongside the apiKey. It is also an example of how references to environment variables can be used in the configuration.
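
The snippet itself is not shown in this note; judging by those parameter names, it appears to use New Relic's Fluent Bit output plugin. A sketch under that assumption, with the API key supplied through an environment variable:

[OUTPUT]
    # assumption: the New Relic output plugin, which accepts apiKey, maxBufferSize and maxRecords
    Name           newrelic
    Match          *
    # the key is taken from an environment variable rather than hard-coded
    apiKey         ${NEW_RELIC_API_KEY}
    maxBufferSize  256000
    maxRecords     1024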

Example 2: 
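
The configuration file is not reproduced here; a sketch consistent with the description that follows:

[SERVICE]
    Flush     24
    Daemon    off
    Log_Level debug

[INPUT]
    Name  tail
    Path  /var/log/logify/app.log
    Tag   filelogs

[OUTPUT]
    Name   stdout
    Match  filelogs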

This configuration instructs Fluent Bit to flush every 24 seconds, to run in the foreground, and to capture logs at the debug level.

The [INPUT] uses the tail plugin to read logs from the specified file at /var/log/logify/app.log. 

The [OUTPUT] component uses the stdout plugin to forward logs to the console. The Match parameter ensures only logs with the filelogs tag (from the input) are delivered to the console.

If we run Fluent Bit with this configuration, the records read from the file are printed to the console by the stdout output.

With a local installation on Linux, Fluent Bit can be started from the command line; the -c option tells it which configuration file to use.
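
A minimal sketch of the command, assuming the default package installation paths:

/opt/fluent-bit/bin/fluent-bit -c /etc/fluent-bit/fluent-bit.conf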

Transforming the logs

When collecting logs, it is often necessary to enhance them. Fluent Bit provides powerful filter plugins designed to transform event streams effectively. In this section, I will show you several essential log transformation tasks:

  • Parsing JSON logs.
  • Removing unwanted fields (for example GDPR-sensitive data).
  • Adding new fields.

To do that, we first define a new file, /etc/fluent-bit/parser_json.conf.
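
A minimal sketch of what that file might contain, matching the json_parser name referenced below:

[PARSER]
    Name    json_parser
    Format  json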

We then update our fluent-bit.conf with the following content.
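
A sketch of the updated configuration, reusing the file path and tag from the earlier example (the exact values are assumptions based on the description below):

[SERVICE]
    Flush        1
    Daemon       off
    Log_Level    info
    # load the parser definitions from the file created above
    Parsers_File parser_json.conf

[INPUT]
    Name    tail
    Path    /var/log/logify/app.log
    Tag     filelogs
    # parse each line with the JSON parser defined in parser_json.conf
    Parser  json_parser

[FILTER]
    Name        record_modifier
    Match       filelogs
    # drop the sensitive field
    Remove_key  emailAddress
    # add a hostname field populated from the environment
    Record      hostname ${HOSTNAME}

[OUTPUT]
    Name    stdout
    Match   filelogs
    # emit the records as JSON
    Format  json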

The Parsers_File parameter references the parser_json.conf file, which defines the json_parser for parsing JSON logs.

In the [INPUT] component, I have the Parser parameter with the value json_parser. This specifies that the incoming logs should be parsed using the JSON parser defined in parser_json.conf.

In the [OUTPUT] section, I set the format parameter to json, ensuring that the logs forwarded to the output are in the JSON format.

In the [FILTER] section, the name parameter denotes that the record_modifier plugin is being used. To exclude the emailAddress field, I’ve used the Remove_key parameter. The Record parameter also introduces a new field called hostname, which is automatically populated with the system’s hostname information.

When Fluent Bit runs, you will observe the log events without the emailAddress field, and a hostname field will be added to each event. You can also see that the logs are now formatted as JSON.

Similar applications

There are several software solutions designed for collecting, processing, and directing logs from various sources to different destinations. Here are some notable ones:

  • Logstash: Part of the Elastic Stack, Logstash is a versatile log collection pipeline tool. It can collect logs from multiple sources, transform them, and then send the data to various destinations such as Elasticsearch, databases, or other analytics tools.
  • Apache Kafka: Kafka is more of a distributed streaming platform than just a log collector. It allows you to build real-time data pipelines and streaming applications. Kafka can handle high volumes of log data and is often used as a central bus to move data between systems.
  • Rsyslog: Rsyslog is a high-performance, feature-rich logging system that can receive log messages from various sources like network devices, Unix-based systems, applications, and more. It can process logs and direct them to multiple outputs based on configurable rules.
  • Splunk: Splunk is an analytics and monitoring tool that can collect and index log and machine data from various sources. It offers powerful search, visualization, and alerting capabilities to help users analyze and make sense of the collected data.

These solutions vary in terms of their capabilities, scalability, and ease of use. The choice of the appropriate solution often depends on the specific requirements, scale, and complexity of the logging infrastructure within an organization.

Conclusion

Both Fluentd and Fluent Bit can work as aggregators or forwarders; they can complement each other or be used as standalone solutions.

Fluent Bit and Fluentd can be classified according to several main characteristics:

  • Functionality and flexibility: Fluentd offers a wider range of features and plugins, while Fluent Bit is aimed at more basic functionality with fewer extensions.
  • Resources and performance: Fluent Bit is lighter and consumes fewer resources than Fluentd, making it more suitable for constrained environments or situations where system resources are limited.
  • Speed and Scalability: Fluent Bit is often chosen for its speed and ability to work efficiently with large volumes of data or in situations where scalability is important.

In general, Fluentd is used for broader scenarios with a lot of functionality, while Fluent Bit is aimed at lightness, speed and efficiency in limited environments.

By Delcho Delov, Consultant @Estafet
