- What is Prometheus?
- What are Prometheus Exporters?
- What are OpenTelemetry Metrics and Exporters
One surprising thing about the software instrumentation world, a world that helps us see what our systems are doing, is how opaque the instrumentation tools themselves can be.
Take the Prometheus project. It has solid engineering behind it, and even has some pretty good docs — but those docs are written from the point of view of someone who’s already well versed in the how and why of performance monitoring, software engineering, and site reliability engineering.
The getting started docs have you both installing Prometheus, and then monitoring that same just-installed Prometheus software using the built in Prometheus metrics that Prometheus engineers have instrumented their own system with. The document doesn’t make any of that obvious. At the end you’re left with a demo of a functioning Prometheus node, but not a clear understanding of how you’d use the software itself in your day to day job(s).
The instructions on how to write a client library or an exporter are more philosophical documents; neither mentions what a Prometheus client library or exporter actually is.
This isn’t meant as a harsh critique of Prometheus — but more a gentle critique of the software instrumentation world at large. It’s my hope that as we start seeing instrumentation systems moving towards more open models that we’ll also see less gatekeeping around the software that makes these systems run.
In this three part series we’re going to take a look at Prometheus. We’ll explain what I see as its four pillars, what does and doesn’t work about those pillars, and how the new OpenTelemetry project might solve some of the challenges faced by the Prometheus project.
Prerequisites
This article will feature some actual typing into files and running of software. It presumes you’ve installed the Prometheus software, either via direct download, a package manager like Homebrew (brew install prometheus), or by compiling the source yourself.
We’ll also be writing some code with NodeJS, so ideally you’ll want to have Node installed as well.
We also assume you have access to a web browser and the command line program curl (or a similar tool for fetching URLs).
While you’ll certainly learn something if you just read the article, we recommend actually working through the code examples to get a real feel for how things work.
First Pillar: Data Model
So what is Prometheus? For me, first and foremost, Prometheus is a data model for describing and recording metrics over time.
So what’s a metric? A metric is a data object that allows folks to record things about their system’s performance. The Prometheus data model allows you to record
- That something happened
- How many times it happened
- When it happened
- How long that something took to finish
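To make those four kinds of observation concrete, here is a minimal sketch in plain JavaScript with no Prometheus library involved. The observe helper and the shape of the observations record are hypothetical names invented for this illustration; real client libraries split these jobs across several metric types.

```javascript
// A hand-rolled sketch, NOT the Prometheus client API.
// `observations` and `observe` are hypothetical names.
const observations = new Map()

function observe(name, work) {
  const started = process.hrtime.bigint()
  const result = work()
  const elapsedNs = process.hrtime.bigint() - started

  const record = observations.get(name) ||
    { count: 0, lastSeenMs: null, durationsMs: [] }
  record.count += 1                                // how many times it happened
  record.lastSeenMs = Date.now()                   // when it (last) happened
  record.durationsMs.push(Number(elapsedNs) / 1e6) // how long it took
  observations.set(name, record)                   // that it happened at all
  return result
}

observe('build_greeting', () => JSON.stringify({ hello: 'world' }))
observe('build_greeting', () => JSON.stringify({ hello: 'world' }))
console.log(observations.get('build_greeting').count) // 2
```

Everything here lives in memory inside the running program, which is also where a client library keeps its metric values until something asks for them.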
Second Pillar: Library for Creating Metrics
The second pillar is the Prometheus code libraries. These libraries allow end-user-programmers to use their favorite programming language to add code to their applications, services, and systems that will measure their applications, services, and systems.
These libraries also allow end-user-programmers to generate a plain text description of the current metric values in their system.
Prometheus provides official libraries for Go, Java/Scala, Python, and Ruby. There are a plethora of similar unofficial libraries for over a dozen other languages.
Third Pillar: Instrumentations
Third — Prometheus is a community of users who chose to use these libraries to add instrumentation to their applications and systems.
The Prometheus binary that you run doesn’t actually monitor anything itself. Instead, developers use client libraries to add instrumentation to their applications and systems, and it’s these developers who ultimately decide what sort of metrics to expose.
When we say expose, we mean that literally. In addition to recording metrics, these instrumentations are also responsible for exposing an HTTP(S) URL (usually /metrics). This URL is responsible for printing the collected metric data in a simple plain-text format.
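As a rough sketch of what "exposing" means, the snippet below answers a /metrics request with the text form of a single hand-rolled counter, using only Node’s built-in http module. The renderCounter and handleRequest helpers, the visits object, and the visits_total name are all hypothetical, made up for this illustration; a real client library does the rendering for you.

```javascript
const http = require('node:http')

// Hypothetical in-memory metric for this sketch.
const visits = { name: 'visits_total', help: 'example counter', value: 0 }

// Render one counter in the plain-text format a /metrics URL returns.
function renderCounter(metric) {
  return [
    `# HELP ${metric.name} ${metric.help}`,
    `# TYPE ${metric.name} counter`,
    `${metric.name} ${metric.value}`,
    '',
  ].join('\n')
}

// Answer /metrics with the rendered text; count every other request.
function handleRequest(req, res) {
  if (req.url === '/metrics') {
    res.writeHead(200, { 'Content-Type': 'text/plain; charset=utf-8' })
    res.end(renderCounter(visits))
    return
  }
  visits.value += 1
  res.writeHead(200, { 'Content-Type': 'application/json' })
  res.end(JSON.stringify({ hello: 'world' }))
}

const server = http.createServer(handleRequest)
// Uncomment to serve on a port of your choosing (3001 is an arbitrary pick):
// server.listen(3001)
```

The point of the sketch is that the instrumented program itself owns the endpoint; nothing is pushed anywhere until someone requests the URL.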
Fourth Pillar: Prometheus Itself
The final pillar is the Prometheus software itself. Prometheus is a piece of software that can fetch (or, in their language, “scrape”) the plain text Prometheus metrics exported by instrumentations at the /metrics URL endpoint. The Prometheus software stores these metrics, and provides a web application to display the metrics back to end-users.
Display is a broad term: There’s a query language for searching metrics, as well as visualization tools for creating graphs from those metrics. The Prometheus application also allows you to produce alerts based on these metric values.
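The scraping step described above can be pictured as "fetch a URL, parse the text, remember when you did it". The sketch below is only an approximation of that idea, not how the Prometheus server is implemented. The parseSimpleMetrics and scrapeOnce functions and the example_requests_total metric name are hypothetical, and the parser ignores labels, timestamps, and escaping.

```javascript
// Parse the simplest lines of the Prometheus text format: `name value`.
// Comment lines (# HELP, # TYPE) and blank lines are skipped. Labels,
// timestamps, and escaping are deliberately NOT handled in this sketch.
function parseSimpleMetrics(text) {
  const samples = {}
  for (const line of text.split('\n')) {
    const trimmed = line.trim()
    if (trimmed === '' || trimmed.startsWith('#')) continue
    const [name, value] = trimmed.split(/\s+/)
    samples[name] = Number(value)
  }
  return samples
}

// One "scrape": fetch the endpoint, parse it, and stamp it with a time.
// Assumes Node 18+ for the global fetch function.
async function scrapeOnce(url) {
  const response = await fetch(url)
  const samples = parseSimpleMetrics(await response.text())
  return { scrapedAtMs: Date.now(), samples }
}

const example = [
  '# HELP example_requests_total a made-up counter',
  '# TYPE example_requests_total counter',
  'example_requests_total 3',
].join('\n')
console.log(parseSimpleMetrics(example)) // { example_requests_total: 3 }
```

Repeating scrapeOnce on a timer and keeping every result is, very loosely, what turns a single reading into a series of values over time.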
Hello World for Programmers
One of the reasons instrumentation systems are hard to understand or write about is that they’re used by a wide variety of people who have very different roles. Non-technical people may use the application to get a feel for how a system is performing. A site reliability engineer may use the application to respond to a specific incident or help prevent future incidents. Developers and programmers may use the application to make decisions about performance trade-offs, and it’s developers and programmers who are the ones who write the instrumentation for systems.
It’s this last audience we’re interested in today. Let’s pretend we’re a developer responsible for creating a JSON based web service delivered via HTTP. We’ll say it’s a service written in NodeJS (because that’s where my head is these days), and we’ll say we want to know how often end users call a specific endpoint in the service.
What we’re Going to Do
So — we have our four pillars: Data Model, Library, Instrumentation, and the Prometheus Software.
Data Model
We need to pick a Prometheus data type that best models what we want to do. Prometheus has four broad metric types: Counters, Gauges, Histograms, and Summaries. It’s beyond the scope of this article to explain what each of these metric types does. For our specific use case (counting how many times a service endpoint is called) the Counter metric type is perfect.
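For readers who want a feel for why a Counter fits this job, the sketch below contrasts a counter with a gauge in plain JavaScript. SketchCounter and SketchGauge are hypothetical classes written for this illustration, not the classes a Prometheus library ships, and Histograms and Summaries are left out entirely.

```javascript
// Hypothetical classes for illustration only.
class SketchCounter {
  constructor() { this.value = 0 }
  // A counter only ever goes up (until the process restarts).
  inc(amount = 1) {
    if (amount < 0) throw new RangeError('a counter cannot decrease')
    this.value += amount
  }
}

class SketchGauge {
  constructor() { this.value = 0 }
  // A gauge is a current reading and may move in either direction.
  set(value) { this.value = value }
}

const endpointCalls = new SketchCounter()
endpointCalls.inc()
endpointCalls.inc()

const openConnections = new SketchGauge()
openConnections.set(5)
openConnections.set(2)

console.log(endpointCalls.value, openConnections.value) // 2 2
```

"How many times has this endpoint been called" is a number that can only grow, which is the counter’s behaviour.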
Library
We mentioned this is a NodeJS service. Unfortunately, Prometheus doesn’t have an official code library for NodeJS. Fortunately for us, there is an unofficial library, prom-client, with an open source license we can use.
This library will let us add code to our service that will create a counter metric as well as expose these metrics to the outside world.
Prometheus Itself
Once we’ve added instrumentation to our service, we’ll need to configure the Prometheus software to scrape these metrics from our service.
Creating a Service
So let’s start with a simple, hello world-ish service for NodeJS. We’ll create a new project and add the express web framework as a dependency.
$ mkdir our-example-project
$ cd our-example-project
$ npm init -y
Wrote to /.../our-example-project/package.json:
{
  "name": "our-example-project",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC"
}
$ npm install express
Once you’ve run the four commands above, you should have a package.json file along with a node_modules folder that contains express and all its sub-dependencies.
Next, create an index.js file with the following contents.
// File: index.js
const express = require('express')
const app = express()
const port = 3000
app.get('/stuff', function (req, res) {
  res.type('json')
  res.send(JSON.stringify({hello:"world"}))
})
app.listen(
  port,
  function() {
    console.log(`Example app listening at http://localhost:${port}`)
  }
)
This program creates a small service that responds to the /stuff URL. You can run the program by typing
$ node index.js
Finally, after running the program and starting your service, access the following URL in a browser or via a command line program like curl
$ curl -i http://localhost:3000/stuff
HTTP/1.1 200 OK
X-Powered-By: Express
Content-Type: application/json; charset=utf-8
Content-Length: 17
ETag: W/"11-IkjuL6CqqtmReFMfkkvwC0sKj04"
Date: Sat, 02 May 2020 19:19:46 GMT
Connection: keep-alive
{"hello":"world"}
So far there’s nothing Prometheus specific here; we’ve just set up a simple web service. If what we’ve done above is new to you and you’d like to learn more, the MDN tutorial on Express looks like a good place to start.
Instrumenting our Service
Now that we have our service, let’s use the prom-client library to
- Create a Metric
- Record that Metric
- Export that metric
First, we’ll need to install the prom-client library
$ npm install prom-client
Once installed, we’ll want to modify our service so it looks like this
// File: index.js
const express = require('express')
// NEW: get the prom-client object
const promClient = require('prom-client')
const app = express()
const port = 3000
// NEW: create our metric object. This object defines
// a named metric in our system, and other code
// will use this object.
const counter = new promClient.Counter({
  name: 'my_metric_identifier',
  help: 'an example counter metric for this tutorial',
});
app.get('/stuff', function (req, res) {
  // NEW: here we _use_ the metric object. Counter metrics
  // are _incremented_ -- i.e. we "inc"rement the same
  // way a bouncer or fire marshal at a bar would increment
  // a counter to keep track of how many people are inside.
  counter.inc()
  res.type('json')
  res.send(JSON.stringify({hello:"world"}))
})
// NEW: here we expose a `/metrics` endpoint in our service
// that will return _all_ the Prometheus metrics. The
// "register" object is the global registry that
// all metrics will be added to by default. The `metrics()`
// method returns all the current metric values in the
// system in a custom plain text format. In prom-client 13
// and later it returns a Promise, so we `await` it; with
// older versions the `await` is harmless.
app.get('/metrics', async function (req, res) {
  res.type('text')
  res.send(
    await require('prom-client').register.metrics()
  )
})
app.listen(
  port,
  function() {
    console.log(`Example app listening at http://localhost:${port}`)
  }
)
OK, that’s a lot of NEW code. We’ll get to explaining it in a moment. Before we do that though, let’s start up our service again
$ node index.js
and then call our endpoint a few times
$ curl 'http://localhost:3000/stuff'
{"hello":"world"}
$ curl 'http://localhost:3000/stuff'
{"hello":"world"}
$ curl 'http://localhost:3000/stuff'
{"hello":"world"}
and then take a look at our new /metrics endpoint.
$ curl http://localhost:3000/metrics
# HELP my_metric_identifier an example counter metric for this tutorial
# TYPE my_metric_identifier counter
my_metric_identifier 3
We can see that /metrics is reporting our metric has counted up three times (my_metric_identifier 3). Congratulations, you just instrumented your first service using Prometheus.
What Just Happened
Let’s take a look at our first bit of new code
const promClient = require('prom-client')
/* ... */
const counter = new promClient.Counter({
  name: 'my_metric_identifier',
  help: 'an example counter metric for this tutorial',
});
Here we’ve loaded the prom-client library per NodeJS’s standard require mechanism, and then used that library’s Counter class to create a new counter metric object. The metric’s unique identifier is my_metric_identifier, and its help text describes what it’s used for.
Creating this object does not actually record anything. This object is what we’ll use to provide our actual instrumentation, and this object will contain the data needed to store that metric’s value in memory.
Next, we have this
app.get('/stuff', function (req, res) {
  counter.inc()
  /* ... */
})
This is where we’re actually incrementing the counter, or recording a value to our instrumentation, or measuring the performance of our application, etc. Whenever our /stuff endpoint is hit, we’ll increment our counter.
Finally, we have this
app.get('/metrics', async function (req, res) {
  res.type('text')
  res.send(
    await require('prom-client').register.metrics()
  )
})
Here we’ve created a new route in our service. This route generates a plain text version of the current metric values in the system with this code here.
require('prom-client').register.metrics()
This metrics() method is provided by the client library. The register property is a global registry object that all metrics are added to. In prom-client 13 and later, metrics() returns a Promise, which is why the route handler is an async function that awaits the result; with older versions the await is harmless. The semantics here are specific to this client library. The semantics of your language’s library may differ, but most should provide a way to output the values of the metrics as plain text.
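To make the idea of a global registry concrete, here is a rough sketch of the pattern in plain JavaScript. SketchRegistry, SketchCounter, and defaultRegistry are hypothetical stand-ins written for this article; prom-client’s real registry does much more (labels, several metric types, content types), so treat this as an illustration of the shape rather than of the library’s code.

```javascript
// Hypothetical stand-ins for illustration only.
class SketchRegistry {
  constructor() { this.metricsByName = new Map() }
  registerMetric(metric) {
    if (this.metricsByName.has(metric.name)) {
      throw new Error(`metric ${metric.name} is already registered`)
    }
    this.metricsByName.set(metric.name, metric)
  }
  // Concatenate the text form of every registered metric.
  metrics() {
    return [...this.metricsByName.values()].map((m) => m.render()).join('\n')
  }
}

const defaultRegistry = new SketchRegistry()

class SketchCounter {
  constructor({ name, help, registry = defaultRegistry }) {
    this.name = name
    this.help = help
    this.value = 0
    registry.registerMetric(this) // creating a metric also registers it
  }
  inc() { this.value += 1 }
  render() {
    return `# HELP ${this.name} ${this.help}\n` +
      `# TYPE ${this.name} counter\n` +
      `${this.name} ${this.value}\n`
  }
}

const counter = new SketchCounter({
  name: 'my_metric_identifier',
  help: 'an example counter metric for this tutorial',
})
counter.inc()
console.log(defaultRegistry.metrics())
```

Because the metric registers itself when it is constructed, the route handler never needs a reference to the counter; asking the registry for its text is enough.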
When we called the /metrics URL
$ curl http://localhost:3000/metrics
# HELP my_metric_identifier an example counter metric for this tutorial
# TYPE my_metric_identifier counter
my_metric_identifier 3
we saw a text representation of the metric value. This format comes from Prometheus. The important part for us is my_metric_identifier 3, which is our metric’s identifier followed by the current count value.
Scraping Values
With an instrumented service, our next and final step is to tell the Prometheus software about it. Create the following configuration file
# File: prometheus-config.yml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: our-first-service
    static_configs:
      - targets: ['localhost:3000']
Then, start a Prometheus instance with this configuration file
$ prometheus --config.file=prometheus-config.yml
# ... blast of logging info ...
With Prometheus running, hit our service’s endpoint a few times
$ curl http://localhost:3000/stuff
{"hello":"world"}
$ curl http://localhost:3000/stuff
{"hello":"world"}
$ curl http://localhost:3000/stuff
{"hello":"world"}
Then, load the Prometheus UI in a browser (http://localhost:9090/graph) and search for my_metric_identifier in the Console.
While the UI isn’t going to win any Apple Design Awards™ we can see our metric displayed. From here we could graph this count over time, or set up alerts based on this count’s value. Both of these topics are beyond the scope of what we’re trying to accomplish today, so we’ll leave them as an exercise for the reader.
The Configuration
Let’s take another look at our configuration file.
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: our-first-service
    static_configs:
      - targets: ['localhost:3000']
This file is YAML formatted. You can use the top level global section to set default values. The scrape_configs section is where we point Prometheus at the instrumented system we want to monitor.
The job_name field (our-first-service above) is a unique name for this scrape job, which here means the service we want to monitor.
The static_configs field contains the list of targets this job will scrape. The localhost:3000 entry points at the service we set up in previous sections. Prometheus will look for a /metrics endpoint on this service, consume those metrics, and store them in a local folder. By default this is a data folder created in the directory where you started Prometheus
$ find data
data
data/wal
data/wal/00000000
data/lock
data/queries.active
The data itself is stored in a binary format specific to Prometheus
$ cat data/wal/00000000
V
?
__name__my_metric_identifieinstancelocalhost:3000jobour-fi...
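One detail in the configuration is easy to miss: a target is only a host and port. Prometheus fills in the rest from per-job settings, which default to http for the scheme and /metrics for the path unless the job sets scheme or metrics_path. The scrapeUrlFor function below is a hypothetical helper that sketches that combination; it is not code from Prometheus itself, and example.test is a placeholder host.

```javascript
// Hypothetical helper: how a target plus job-level defaults
// turn into the URL that gets scraped.
function scrapeUrlFor(target, job = {}) {
  const scheme = job.scheme || 'http'
  const metricsPath = job.metrics_path || '/metrics'
  return `${scheme}://${target}${metricsPath}`
}

console.log(scrapeUrlFor('localhost:3000'))
// http://localhost:3000/metrics

console.log(scrapeUrlFor('example.test:8443', {
  scheme: 'https',
  metrics_path: '/internal/metrics',
}))
// https://example.test:8443/internal/metrics
```

This is why our configuration never mentions /metrics even though that is the URL being fetched every 15 seconds.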
Extra NodeJS Metrics
The above was a simple example meant to ease you into some general Prometheus concepts. The value of a counter that only counts how often an endpoint was called is somewhat dubious. However, many libraries also offer a set of default metrics for their language’s runtime. The prom-client library we’ve been using is no exception. If you want to include these metrics in your service, just call the collectDefaultMetrics method after requiring the library.
const promClient = require('prom-client')
promClient.collectDefaultMetrics();
After adding the above to your service and restarting it, try loading the /metrics URL again.
$ curl http://localhost:3000/metrics
# HELP process_cpu_user_seconds_total Total user CPU time spent in seconds.
# TYPE process_cpu_user_seconds_total counter
process_cpu_user_seconds_total 0.49347700000000005
# HELP process_cpu_system_seconds_total Total system CPU time spent in seconds.
# TYPE process_cpu_system_seconds_total counter
process_cpu_system_seconds_total 0.546932
# ... holy cow so many metrics ...
# HELP my_metric_identifier an example counter metric for this tutorial
# TYPE my_metric_identifier counter
my_metric_identifier 0
The my_metric_identifier we set up is still there, but along with a multitude of other metrics created by the prom-client library author. If you’re curious how this is done, you can start following the code here.
In addition to creating these metric objects, the library author(s) are the ones who decide how often and when these metrics are recorded (i.e. the equivalent of the inc call in our examples above). The Prometheus project does provide some guidance for library authors, but there’s a lot of leeway in these guidelines. In the case of prom-client, it appears these metrics are collected whenever a registry’s /metrics endpoint is scraped. Exploring this code in full is another exercise we’ll leave for the reader.
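The "collect when scraped" idea can be sketched without reading the library’s source. The OnDemandGauge class and the sketch_heap_used_bytes metric below are hypothetical, written only to show the pattern of reading a value at the moment the metrics text is generated; this is not prom-client’s implementation.

```javascript
// Hypothetical sketch, not prom-client's implementation: a gauge
// whose value is read only when the metrics text is generated.
class OnDemandGauge {
  constructor({ name, help, read }) {
    this.name = name
    this.help = help
    this.read = read // function called at render (scrape) time
  }
  render() {
    const value = this.read()
    return `# HELP ${this.name} ${this.help}\n` +
      `# TYPE ${this.name} gauge\n` +
      `${this.name} ${value}\n`
  }
}

const heapUsed = new OnDemandGauge({
  name: 'sketch_heap_used_bytes',
  help: 'heap in use, sampled when scraped',
  read: () => process.memoryUsage().heapUsed,
})

// Nothing is measured until something asks for the text.
console.log(heapUsed.render())
```

The trade-off of this design is that the scrape interval doubles as the sampling interval: the application does no work for these metrics between scrapes, but it also sees nothing that happens between them.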
Wrap Up
So those are my four pillars of Prometheus, and how they might be applied to a service written using NodeJS. Experienced Prometheus users are probably anxiously typing out an email to me because I’ve skipped one important Prometheus concept. Don’t worry, dear reader: in our next article we’ll explore Prometheus exporters, what they are and why they exist.