Introduction
Traditionally, designing and implementing centralized logging is an afterthought. It is not until problems arise that priorities shift to a centralized logging solution that can query, view, and analyze the logs so the root cause of the problem can be found. However, in the container era, when designing a Containers-as-a-Service (CaaS) platform with Docker Enterprise Edition (Docker EE), it is critical to prioritize centralized logging. As the number of microservices deployed in containers grows, the amount of data they produce in the form of logs (or events) increases exponentially.
What You Will Learn
This reference architecture provides an overview of how Docker logging works, explains the two main categories of Docker logs, and then discusses Docker logging best practices.
Understanding Docker Logging
Before diving into design considerations, it's important to start with the basics of Docker logging.
Docker supports different logging drivers used to store and/or stream the `stdout` and `stderr` logs of a container's main process (`pid 1`). By default, Docker uses the `json-file` logging driver, but it can be configured to use many other drivers by setting the value of `log-driver` in `/etc/docker/daemon.json` and then restarting the Docker daemon to reload its configuration.
The logging driver settings apply to all containers launched after the daemon is reconfigured (restarting existing containers after changing the logging driver does not make them use the updated configuration). To override the default logging driver for an individual container, run it with the `--log-driver` and `--log-opt` options. Swarm-mode services, on the other hand, can be switched to a different logging driver on the fly with `docker service update --log-driver <DRIVER_NAME> --log-opt <LIST OF OPTIONS> <SERVICE_NAME>`.
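For example, here is a minimal sketch of both overrides (the syslog address and the `my-service` name are placeholders):

```
# Per-container override: send this container's logs to a syslog endpoint (placeholder address)
docker run -d \
  --log-driver syslog \
  --log-opt syslog-address=udp://logs.example.com:514 \
  nginx:alpine

# Per-service override: switch an existing Swarm service to json-file with basic rotation
docker service update \
  --log-driver json-file \
  --log-opt max-size=10m \
  my-service
```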
What about Docker Engine logs? These logs are typically handled by the default system manager logger. Most modern distributions (CentOS 7, RHEL 7, Ubuntu 16.04, etc.) use `systemd`, which uses `journald` for logging and `journalctl` for accessing the logs. To access the Engine logs, use `journalctl -u docker.service`.
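For day-to-day troubleshooting, `journalctl` can also follow and filter the Engine logs, for example:

```
# Follow the Docker Engine logs in real time
journalctl -u docker.service -f

# Show Engine log entries from the last hour only
journalctl -u docker.service --since "1 hour ago"
```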
Docker Logs Categories and Sources
Now that the basics of Docker logging have been covered, this section explains their categories and sources.
Docker logs typically fall into one of two categories: Infrastructure Management or Application logs. Most logs fall naturally into one of these categories based on the role of whoever needs access to them.
Operators are mostly concerned with the stability of the platform as well as the availability of the services.
Developers are more concerned with their application code and how their service is performing.
In order to have a self-service platform, both operators and developers should have access to the logs they need in order to perform their role. DevOps practices suggest that there is an overall, shared responsibility when it comes to service availability and performance. However, not everyone should need access to every log on the platform. For instance, developers should only need access to the logs for their services and their integration points. Operators are more concerned with Docker daemon logs, UCP and DTR availability, as well as service availability. There is a bit of overlap, since developers and operators both should be aware of service availability. Giving each role access to the logs it needs allows for simpler troubleshooting when an issue occurs and a decreased Mean Time To Resolve (MTTR).
Infrastructure Management Logs
The infrastructure management logs include the logs of the Docker Engine, containers running UCP or DTR, and any containerized infrastructure services that are deployed (think containerized monitoring agents).
Docker Engine Logs
As previously mentioned, Docker Engine logs are captured by the OS's system manager by default. These logs can be sent to a centralized logging server.
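One way to do this, sketched below under the assumption that the host runs `rsyslog` alongside `journald` (as CentOS 7 and RHEL 7 do by default), is to forward all journal messages, including the Docker unit's, to a remote syslog collector; the collector address is a placeholder:

```
# Forward all syslog messages (including the Docker Engine's, via the journal)
# to a central collector over TCP
echo '*.* @@logs.example.com:514' | sudo tee /etc/rsyslog.d/50-docker-remote.conf
sudo systemctl restart rsyslog
```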
UCP and DTR System Logs
UCP and DTR are deployed as Docker containers. All of their logs are captured in the containers' `STDOUT`/`STDERR`, and the Docker Engine's default logging driver captures these logs.
UCP can be configured to use remote syslog logging. This can be done post-installation from the UCP UI for all of its containers.
Note: It is recommended that the Docker Engine default logging driver be configured before installing UCP and DTR so that their logs are captured by the chosen logging driver. This is because a container's logging driver cannot be changed once the container has been created. The only exception is `ucp-agent`, a component of UCP that is deployed as a Swarm service.
Infrastructure Services
Infrastructure operations teams deploy containerized infrastructure services for various operational tasks such as monitoring, auditing, reporting, configuration deployment, etc. These services also produce important logs that need to be captured. Typically, their logs are limited to the `STDOUT`/`STDERR` of their containers, so they too are captured by the Docker Engine's default logging driver. If not, they need to be handled separately.
Application Logs
Application-produced logs can be a combination of custom application log files and the `STDOUT`/`STDERR` logs of the application's main process. As described earlier, the `STDOUT`/`STDERR` logs of all containers are captured by the Docker Engine's default logging driver, so no custom configuration is needed to capture them. If the application uses custom logging (e.g. it writes logs to `/var/log/myapp.log` within the container), it's important to take that into consideration.
Docker Logging Design Considerations
Understanding the types of Docker logs is important. It is also important to define which entities are best suited to consume and own them.
Categorizing the Docker Logs
Mainly, there are two categories: infrastructure logs and application logs.
Defining the Organizational Ownership
Based on the organization's structure and policies, decide if these categories have a direct mapping to existing teams. If they do not, then it is important to define the right organization or team responsible for these log categories:
Category | Team |
---|---|
System and Management Logs | Infrastructure Operations |
Application Logs | Application Operations |
If the organization is part of a larger organization, these categories may be too broad. Sub-divide them into more specific ownership teams:
Category | Team |
---|---|
Docker Engine Logs | Infrastructure Operations |
Infrastructure Services | Infrastructure Operations |
UCP and DTR Logs | UCP/DTR Operations |
Application A Logs | Application A Operations |
Application B Logs | Application B Operations |
Some organizations don't distinguish between infrastructure and application operations, so they might combine the two categories and have a single operations team own them.
Category | Team |
---|---|
System and Management Logs | Infrastructure Operations |
Application Logs | Infrastructure Operations |
Pick the right model to clearly define the appropriate ownership for each type of log, resulting in a decreased mean time to resolve (MTTR). Once organizational ownership has been determined for each type of log, it is time to start investigating the right logging solution for deployment.
Picking a Logging Infrastructure
Docker can easily integrate with existing logging tools and solutions. Most of the major utilities in the logging ecosystem have developed Docker logging drivers or plugins, or provide documentation on integrating with Docker.
Pick the logging solution that:
- Allows for the implementation of the organizational ownership model defined in the previous section. For example, some organizations may choose to send all logs to a single logging infrastructure and then provide the right level of access to the functional teams.
- Is the one the organization is most familiar with. This is a must!
- Has Docker integration: pre-configured dashboards, stable Docker plugin, proper documentation, etc.
Docker Enterprise Edition Logs
With Docker Enterprise Edition, it's a good idea to store all of the container logs for historical and system maintenance purposes. It is recommended that you collect the output of some of the containers in an indexable way, mostly for policy reasons and to quickly understand cluster events. In the following sections we'll break down some Docker EE components and certain logs that could be useful to an organization.
UCP
Container Name | Information in Logs |
---|---|
ucp-controller | UCP API logs, users, and logins |
ucp-auth-api | Centralized service for identity and authentication used by UCP and DTR |
ucp-auth-store | Stores authentication configurations as well as data for users, organizations, and teams |
ucp-auth-worker | Performs scheduled LDAP synchronizations and cleans authentication and authorization data |
ucp-client-root-ca | Certificate authority to sign client bundles |
ucp-cluster-root-ca | Certificate authority used for TLS communication between UCP components |
ucp-metrics | Used for metrics gathering |
ucp-kv | etcd service used to store the UCP configurations |
ucp-proxy | TLS proxy that allows secure access from the local Docker Engine to UCP components |
ucp-swarm-manager | Manager container for Docker Swarm (classic), provides backwards compatibility |
Notes:
- `ucp-controller` - This container logs all login attempts and general usage of the cluster. For auditing purposes, these are the most important logs for tracking usage of the cluster.
- `ucp-kv` - This container is worth monitoring to make sure quorum is not lost in the cluster. It is good practice to set up an alert for loss of quorum.
DTR
Name | Information in Logs |
---|---|
dtr-api- | Executes the DTR business logic and serves the DTR web application and API |
dtr-garant- | Manages DTR authentication |
dtr-jobrunner- | Runs cleanup jobs in the background |
dtr-nautilusstore- | Stores security scanning data |
dtr-nginx- | Receives HTTP and HTTPS requests and proxies them to other DTR components; by default listens on ports 80 and 443 of the host |
dtr-notary-server- | Receives, validates, and serves content trust metadata; consulted when pushing or pulling to DTR with content trust enabled |
dtr-notary-signer- | Performs server-side timestamp and snapshot signing for content trust metadata |
dtr-registry- | Implements the functionality for pulling and pushing Docker images; also handles how images are stored |
dtr-rethinkdb- | Stores persistent repository metadata in a database |
Some notable logs on which to consider setting up regular expressions:
- `dtr-registry-` - Parsing this container's logs shows the client IPs, the user, and general usage of the cluster.
- `dtr-nginx-` - This container logs all pushes, pulls, and API calls to the cluster.
- `dtr-rethinkdb-` - The logs in this container contain information about the quorum state of RethinkDB. This is good to monitor, with an alert on any loss of quorum.
HTTP Routing Mesh
By default, the HTTP Routing Mesh (HRM) doesn't log any requests to `STDOUT`. To enable logging from HRM, run the following command:
```
docker service update --env-add DEBUG=1 ucp-hrm
```
Logs are then emitted by the HTTP Routing Mesh. If a log driver is configured at the Engine level, the HRM output follows that configuration.
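Once debug logging is enabled, the HRM output can be streamed like any other Swarm service (assuming a Docker version with `docker service logs` support), for example:

```
# Stream the HTTP Routing Mesh logs across all of its tasks
docker service logs --follow ucp-hrm
```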
Application Log Drivers
Docker has several available logging drivers that can be used for the management of application logs. Check the Docker docs for the complete list as well as detailed information on how to use them.
As a general rule, if you already have logging infrastructure in place, then you should use the logging driver for that existing infrastructure. If you don't have an existing logging system in place, there are a few advantages and disadvantages worth highlighting for each driver.
Driver | Advantages | Disadvantages |
---|---|---|
none | Ultra-secure, since nothing gets logged | Much harder to troubleshoot issues with no logs |
json-file | The default; `docker logs` works with it; supports tags | Logs reside locally and are not aggregated; logs can fill up the local disk if no restrictions are in place |
syslog | Most machines come with syslog; the only driver that supports TLS for encrypted log shipping; supports tags | Needs to be set up as highly available (HA) or there can be issues on container start if it's not available |
journald | `docker logs` also works with this driver; since it logs locally, the log aggregator can be down without impact; also collects Docker daemon logs | Since journal logs are in binary format, extra steps need to be taken to ship them to the log collector; no tag support |
gelf | Provides indexable fields by default (container id, host, container name, etc.); tag support | Only supports the UDP protocol; no `docker logs` support |
fluentd | Provides container_name and container_id fields by default; fluentd supports multiple outputs | No `docker logs` support for local logs |
awslogs | Easy integration when using Amazon Web Services; less infrastructure to maintain; tag support | Not ideal for hybrid cloud configurations or on-premises installations; no `docker logs` support |
splunk | Easy integration with Splunk; TLS support; highly configurable; tag support; additional metrics | Splunk needs to be highly available or there can be issues on container start; no `docker logs` support |
etwlogs | Common framework for logging on Windows; default indexable values | Only works on Windows; logs have to be shipped from Windows machines to a log aggregator with a separate utility |
gcplogs | Simple integration with Google Cloud Platform; less infrastructure to maintain; tag support | Not ideal for hybrid cloud configurations or on-premises installations; no `docker logs` support |
Application Log Driver Considerations
Consider the following when selecting application log drivers:
- If log data is highly sensitive, then syslog and splunk are good options since they can be configured to use TLS for transporting logs.
- The journald log driver is great for retaining the use of `docker logs` as well as capturing Docker daemon logs. This driver allows for easier troubleshooting and log portability at the same time. Another advantage is that logs are written locally first, so there is less reliance on the logging infrastructure.
- If the Docker EE cluster exists solely on a single cloud provider, then awslogs or gcplogs can be used.
- If there's an existing Splunk installation, then use the splunk log driver.
- The gelf and fluentd log drivers are a good choice if there's a NoSQL database somewhere in the environment where the logs can be stored.
- For development or test environments, json-file or journald can be useful where it's more important to view a log stream than to index and search the logs. (If json-file is used, consider passing the `max-size` and `max-file` options so that logs won't fill up the filesystem; see the sketch after this list.)
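A minimal sketch of such a daemon-wide json-file configuration with rotation options (the sizes and file count shown are illustrative values, not recommendations):

```
# Write the daemon-wide default, then restart Docker to pick it up
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
EOF
sudo systemctl restart docker
```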
Logging Driver Setup Walkthrough
To implement system-wide logging, create an entry in `/etc/docker/daemon.json`. For example, use the following to enable the `gelf` output plugin:
```
{
  "log-driver": "gelf",
  "log-opts": {
    "gelf-address": "udp://1.2.3.4:12201",
    "tag": "{{.ImageName}}/{{.Name}}/{{.ID}}"
  }
}
```
And then restart the Docker daemon. All of the logging drivers can be configured in a similar way, using the `/etc/docker/daemon.json` file. In the previous example using the `gelf` log driver, the `tag` field sets additional data that can be searched and indexed when logs are collected. Please refer to the documentation for each of the logging drivers to see what additional fields can be set from the log driver.
Configuring logging through the `/etc/docker/daemon.json` file sets the default logging behavior on a per-node basis. This default can be overridden at the per-service or per-container level, which can be useful for troubleshooting since the logs can then be viewed in real time.
If a service is created on a host whose `daemon.json` file is configured to use the `gelf` log driver, then the logs of all containers running on that host go to the address set in `gelf-address`.
If a different logging driver is preferred, for instance to view a log stream from the `stdout` of the container, then it's possible to override the default logging behavior ad hoc:
```
docker service create \
  --log-driver json-file --log-opt max-size=10m \
  nginx:alpine
```
This can then be coupled with `docker service logs` to more readily identify issues with the service.
Docker Service Logs
`docker service logs` was introduced in Docker CE 17.05 and Docker EE 17.06. It provides a multiplexed stream of logs when a service has multiple replica tasks. By entering `docker service logs <service_id>`, the logs show the originating task name in the first column and the real-time logs of each replica in the right column. For example:
```
$ docker service create -d --name ping --replicas=3 alpine:latest ping 8.8.8.8
5x3enwyyr1re3hg1u2nogs40z

$ docker service logs ping
ping.2.n0bg40kksu8e@m00 | 64 bytes from 8.8.8.8: seq=43 ttl=43 time=24.791 ms
ping.3.pofxdol20p51@w01 | 64 bytes from 8.8.8.8: seq=44 ttl=43 time=34.161 ms
ping.1.o07dvxfx2ou2@w00 | 64 bytes from 8.8.8.8: seq=44 ttl=43 time=30.111 ms
ping.2.n0bg40kksu8e@m00 | 64 bytes from 8.8.8.8: seq=44 ttl=43 time=25.276 ms
ping.3.pofxdol20p51@w01 | 64 bytes from 8.8.8.8: seq=45 ttl=43 time=24.239 ms
ping.1.o07dvxfx2ou2@w00 | 64 bytes from 8.8.8.8: seq=45 ttl=43 time=26.403 ms
```
This command is useful when trying to view the log output of a service that contains multiple replicas. Viewing the logs in real time, streamed across multiple replicas allows for instant understanding and troubleshooting of service issues across the entire cluster.
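Depending on the Docker version, `docker service logs` also accepts flags that make this output easier to work with; a few illustrative examples using the service above:

```
# Follow the logs of all tasks in the service in real time
docker service logs --follow ping

# Show only the last 20 lines per task and shorten the task-name prefix
docker service logs --tail 20 --no-task-ids ping
```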
ELK Setup
Please refer to the swarm-elk repository for information on how to send logs to ELK on a Docker Swarm. This repo contains a Docker Compose file that sets up a complete ELK stack. The repository is a good starting point for experimenting with Docker logging with ELK, but consider high availability and a queuing system before using it in production.
Splunk Setup
Splunk is another popular logging utility. To set up Splunk follow the steps using the Docker Compose files available in this swarm-splunk repository.
Each Splunk forwarder connects to the local Docker socket, so it doesn't need additional log driver configuration at the daemon level. Connecting to the local socket also allows Splunk to pull out Docker container statistics in addition to logs.
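Conceptually, the forwarder runs as a global service with the Docker socket bind-mounted into it. The following is only a rough sketch under that assumption; the image name is hypothetical, and the actual Compose files in the repository should be used instead:

```
# One forwarder task per node, each reading from that node's Docker socket
docker service create -d \
  --name splunk-forwarder \
  --mode global \
  --mount type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock \
  example/splunk-docker-forwarder:latest
```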
Modernize Traditional Applications
There's no reason to think that logging using containers is only meant for modern applications. It's possible to modernize traditional applications and still have the added benefit of modernized logging without changing any application code.
Ideally, applications log to `stdout`/`stderr`, and Docker sends those logs to the configured logging destination. Sometimes certain applications are configured to log to multiple locations. How can those logs be captured without having to change any of the application source code?
It's possible to create a Docker volume (or several volumes) pointing at the locations where those log files reside within the container. By leveraging Docker service templating, each volume can be named after the service task that uses it, which prevents naming collisions should multiple service tasks run on the same host. A global service then needs to be created that runs a logging agent with directory-wildcard support. Finally, additional regular expressions can be set up in the logging utility to turn the source directory of each file into an indexed value.
The following example shows how this can be accomplished using the official Tomcat image. The official Tomcat image writes several log files to `/usr/local/tomcat/logs`, much like most Java applications do. In that path, files such as `catalina.2017-07-06.log`, `host-manager.2017-07-06.log`, `localhost.2017-07-06.log`, `localhost_access_log.2017-07-06.txt`, and `manager.2017-07-06.log` can be found.
1. Create a global service for the logging utility that mounts `/var/lib/docker/volumes:/logs/volumes`.

2. Create a logging rule for the logging agent similar to this generic example: `"/log/volumes/*/_data/*.log"`.

3. Launch the service using Go-based templating on the volumes. When launching the service, use these parameters:

   ```
   docker service create \
     -d \
     --name prod-tomcat \
     --label version=1.5 \
     --label environment=prod \
     --mount type=volume,src="{{.Task.Name}}",dst=/usr/local/tomcat/logs \
     --replicas 3 \
     tomcat:latest
   ```

If two of the replicas are scheduled on the same node, then two volumes containing the logs are created on that host: `prod-tomcat.1.oro7h0e5yow69p9yumaetor3l` and `prod-tomcat.2.ez0jpuqe2mkl6ppqnuffxqagl`. As long as the logging agent supports wildcards and handles log rotation by checking the inode (not the file), the logs should be collected.

If the application logs to multiple locations, try to symlink the logs to a single directory, or add a descriptive name to the volume. If a descriptive name is added to the volume name, then any extraction logic will need to be updated to accommodate that change (e.g. in a `grok` pattern).

Most loggers collect the file path as well as the log contents. By turning the volumes where the log files reside into indexable fields, it's possible to search and aggregate information from these types of applications. Here is an example that uses a `grok` pattern and creates two new indexable fields, `CONTAINER_NAME` and `FILE_NAME`:

```
match => { "path" => "/log/volumes/%{DATA:CONTAINER_NAME}/_data/%{GREEDYDATA:FILE_NAME}" }
```

The `CONTAINER_NAME` field will match the output of the `stdout` stream from the container, making it easy to filter based on the container's logs.
More information and a working example can be found in the swarm-elk repo.
Windows Logging
The ETW logging driver forwards container logs as ETW events. ETW stands for Event Tracing for Windows and is the common framework for tracing applications on Windows. Each ETW event contains a message with both the log and its context information. A client can then create an ETW listener to listen for these events and forward them to a location where the logs can be collected and analyzed.
To log using the ETW driver on Windows, create a service or run a container with the flag `--log-driver=etwlogs`.
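For example, a minimal sketch using the same base image as the event message below (the command is only an illustration):

```
docker run --log-driver=etwlogs microsoft/windowsservercore cmd /S /C "echo Hello world!"
```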
Here is an example event message:
```
container_name: backstabbing_spence,
image_name: windowsservercore,
container_id: f14bb55aa862d7596b03a33251c1be7dbbec8056bbdead1da8ec5ecebbe29731,
image_id: sha256:2f9e19bd998d3565b4f345ac9aaf6e3fc555406239a4fb1b1ba879673713824b,
source: stdout,
log: Hello world!
```
Conclusion
Docker provides many options when it comes to logging, and it's helpful to have a logging strategy when adopting the platform. For most systems, leaving the log data on the host isn't adequate. Being able to index and search logs through a self-service platform allows for a smoother experience for both operators and developers.