Reading logs from multiple Kubernetes Pods using kubectl can become cumbersome fast. What if there was a way to collect logs from across the cluster in a single place and make them easy to filter, query and analyze? Enter Promtail, Loki, and Grafana.
By default, logs in Kubernetes only last a Pod’s lifetime. In order to keep logs for longer than a single Pod’s lifespan, we use log aggregation. This means we store logs from multiple sources in a single location, making it easy for us to analyze them even after something has gone wrong. While the ELK stack (short for Elasticsearch, Logstash, Kibana) is a popular solution for log aggregation, we opted for something a little more lightweight: Loki.
Developed by Grafana Labs, ‘Loki is a horizontally scalable, highly available, multi-tenant log aggregation system inspired by Prometheus’. Loki allows for easy log collection from different sources with different formats, scalable persistence via object storage and some more cool features we’ll explain in detail later on. For now, let’s take a look at the setup we created.
In this article, we’ll focus on the Helm installation, though Grafana Labs offers a number of other installation methods. Once everything is installed, follow the instructions Helm prints at the end of each release to log in to Grafana and start exploring.
In Grafana’s Helm chart repository, you’ll find five charts related to Loki:

- loki-canary deploys Loki Canary, a small audit tool that continuously writes test logs and queries Loki for them, verifying that your installation is ingesting logs correctly.
- loki-distributed installs the relevant components as microservices, giving you the usual advantages of microservices, such as scalability and resilience, while allowing you to configure them independently of one another.
- loki-simple-scalable is similar, but groups the components into a small set of read and write targets rather than deploying each one separately, which takes away a number of the configuration possibilities.
- loki deploys a single StatefulSet to your cluster containing everything you need to run Loki.
- loki-stack deploys the same StatefulSet as the loki chart, plus Promtail, Grafana and a few other components.

For our use case, we chose the loki chart. In addition to Loki itself, our cluster also runs Promtail and Grafana. In the following section, we’ll show you how to install this log aggregation stack to your cluster!
To follow along, you’ll need a Kubernetes cluster that you have kubectl access to, as well as Helm set up on your machine.
First of all, we need to add Grafana’s chart repository to our local helm installation and fetch the latest charts like so:
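```bash
# Add Grafana's chart repository and refresh the local chart cache
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
```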
Once that’s done, we can start the actual installation process.
Let’s start by getting Loki running in our cluster. To configure your installation, take a look at the values the Loki Chart accepts via the ‘helm show values’ command, and save that to a file.
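For example (the file name is just a suggestion):

```bash
# Dump the chart's default values into a local file we can edit
helm show values grafana/loki > loki-values.yaml
```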
We won’t go over the settings in detail, as most values can be left at their defaults. However, you should take a look at the persistence key in order to configure Loki to actually store your logs in a PersistentVolume.
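As a rough sketch, the relevant section could look something like this (the exact keys and defaults depend on the chart version; size and storage class are placeholders):

```yaml
# loki-values.yaml (excerpt)
persistence:
  enabled: true                 # keep chunks and index on a PersistentVolume
  size: 10Gi                    # adjust to your expected log volume
  # storageClassName: standard  # uncomment to pick a specific StorageClass
```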
Once you’re done adapting the values to your preferences, go ahead and install Loki to your cluster via the following command:
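```bash
# Release and namespace names are our choice here - adjust them if you like,
# but keep them in mind for the following steps
helm install loki grafana/loki --namespace loki --create-namespace --values loki-values.yaml
```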
After that’s done, you can check whether everything worked using kubectl:
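```bash
kubectl get pods --namespace loki

# Example output (the exact Pod name depends on your release name):
# NAME     READY   STATUS    RESTARTS   AGE
# loki-0   1/1     Running   0          2m
```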
If the output looks similar to this, congratulations! That’s one out of three components up and running.
Next, let’s look at Promtail. Promtail has 3 main features that are important for our setup:

- it discovers targets, in our case the Pods running in the cluster,
- it attaches labels to the log streams it collects, and
- it pushes those streams to our Loki instance.
To install it, we first need to get a values file, just like we did for Loki:
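```bash
# Again, the file name is just a suggestion
helm show values grafana/promtail > promtail-values.yaml
```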
Like for Loki, most values can be left at their defaults to get Promtail working. However, we need to tell Promtail where it should push the logs it collects by doing the following:
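```bash
# Check how Loki is exposed inside the cluster
kubectl get services --namespace loki
```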
We ask kubectl about services in the Loki namespace, and we’re told that there is a service called Loki, exposing port 3100. To get Promtail to ship our logs to the correct destination, we point it to the Loki service via the ‘config’ key in our values file.
Under ‘lokiAddress’, we specify that we want Promtail to send its logs to the push endpoint of the Loki Service we just found.
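As a sketch, the relevant part of the values file could look like this (the URL assumes Loki is reachable as a Service named ‘loki’ in the ‘loki’ namespace; note that newer versions of the Promtail chart configure this under ‘config.clients’ instead):

```yaml
# promtail-values.yaml (excerpt)
config:
  # Loki's push endpoint, addressed via its in-cluster DNS name
  lokiAddress: http://loki.loki.svc.cluster.local:3100/loki/api/v1/push
```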
If you don’t want Promtail to run on your master/control plane nodes, you can change that here.
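The chart ships tolerations along these lines by default, which is what allows Promtail onto control plane nodes in the first place; set the list to an empty value to keep it off them (a sketch, the exact defaults depend on the chart version):

```yaml
# promtail-values.yaml (excerpt)
tolerations:
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: NoSchedule
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule
```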
Now that we’ve set the most important values, let’s get this thing installed!
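```bash
# As before, release and namespace names are our choice
helm install promtail grafana/promtail --namespace promtail --create-namespace --values promtail-values.yaml
```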
Verify that everything worked as expected:
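```bash
# Promtail runs as a DaemonSet, so expect one Pod per node
kubectl get pods --namespace promtail
```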
You can also take a look at the Pods with the ‘-o wide’ flag to see what node they’re running on:
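```bash
# The NODE column shows where each Promtail Pod ended up
kubectl get pods --namespace promtail -o wide
```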
Last but not least, let’s get an instance of Grafana running in our cluster.
The following values will enable persistence. If you want your Grafana instance to be able to send emails, you can configure SMTP as shown below: add your SMTP host and ‘from_address’, and reference a Secret containing your credentials.
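A sketch of what that could look like (host, addresses and the Secret name are placeholders; double-check the keys against your chart version):

```yaml
# grafana-values.yaml (excerpt)
persistence:
  enabled: true
  size: 5Gi

grafana.ini:
  smtp:
    enabled: true
    host: smtp.example.com:587          # your SMTP server
    from_address: grafana@example.com   # sender address for alert emails

smtp:
  # Kubernetes Secret holding the SMTP user and password
  existingSecret: grafana-smtp-credentials
  userKey: user
  passwordKey: password
```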
Once you’re done configuring your values, you can go ahead and install Grafana to your cluster like so:
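```bash
# We install Grafana into the same namespace as Loki so it can reach Loki by its short Service name
helm install grafana grafana/grafana --namespace loki --values grafana-values.yaml
```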
Verify everything went smoothly:
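```bash
# A grafana Pod should now show up next to Loki
kubectl get pods --namespace loki
```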
All three components are up and running, sweet! Now that we’re all set up, let’s look at how we can actually put this to use.
Connecting your newly created Loki instance to Grafana is simple. All you need to do is create a data source in Grafana. Under Configuration → Data Sources, click ‘Add data source’ and pick Loki from the list. You’ll be presented with a settings panel where the only thing you need to configure in order to analyze your logs with Grafana is the URL of your Loki instance. Since Grafana is running in the same namespace as Loki, specifying ‘http://loki:3100’ is all it takes.
When you’re done, hit ‘Save & test’ and voilà, you’re ready to run queries against Loki.
‘LogQL is Grafana Loki’s PromQL-inspired query language. Queries act as if they are a distributed grep to aggregate log sources. LogQL uses labels and operators for filtering.’
With LogQL, you can easily run queries against your logs. You can either run log queries to get the contents of actual log lines, or you can use metric queries to calculate values based on results.
If you just want logs from a single Pod, it’s as simple as running a query like this:
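```logql
{namespace="loki", pod="loki-0"}
```

Here we ask for the logs of the Loki Pod itself; swap in whatever ‘namespace’ and ‘pod’ label values Promtail attached to the logs you’re interested in.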
Grafana will automatically pick the correct panel for you and display whatever your Loki Pod logged.
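A metric query, in contrast, could look something like this (the namespace is a placeholder):

```logql
sum(count_over_time({namespace="my-app"} |= "error" [$__range]))
```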
This query will filter logs from a given namespace that contain the word ‘error’. It will count them over the range selected in the dashboard and return the sum, giving you a simple overview of what’s going on across your cluster.
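LogQL can go a lot further than that. Here’s a rough sketch of a more advanced query; it is not the exact query from our setup, and the namespace, the log format assumed by the pattern expression, the probe paths and the ‘app’ label are all placeholders you’d adapt to your own applications (the sketch also assumes the response time is logged as a plain number):

```logql
avg_over_time(
  {namespace="my-app"}
    | pattern `<_> "<method> <path> <_>" <status> <duration>`
    | path !~ "/(healthz|readyz|livez).*"
    | unwrap duration
  [$__interval]
) by (app, path)
```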
This query is as complex as it will get in this article. It collects logs from a namespace and then applies several of the neat features LogQL offers, such as the pattern parser, label filters with regular expressions, and unwrapped range aggregations. In the end, you’ll receive the average response time of the apps running in the given namespace within the selected interval, grouped by app label and path, with the log lines generated by Kubernetes liveness and readiness probes filtered out. Note that a query like this only works if your apps log requests in a consistent format that the pattern expression can parse.
If you don’t want to store your logs in your cluster, Loki allows you to send whatever it collects to S3-compatible storage solutions like Amazon S3 or MinIO. The log analyzing/viewing process stays the same.
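As a rough sketch, the storage section of Loki’s configuration could then look something like this (the exact keys differ between Loki versions; bucket, region, endpoint and credentials are placeholders):

```yaml
# Loki configuration (excerpt)
storage_config:
  boltdb_shipper:
    shared_store: s3            # keep the index in object storage as well
  aws:
    bucketnames: my-loki-logs
    region: eu-central-1
    access_key_id: <access-key>
    secret_access_key: <secret-key>
    # For MinIO or another S3-compatible service, set an endpoint instead:
    # endpoint: http://minio.minio.svc.cluster.local:9000
    # s3forcepathstyle: true
```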
File system storage does not work when using the distributed chart, as it would require multiple Pods to perform read/write operations on the same PV. This is noted in the chart’s documentation.