Monitoring AIStore with Prometheus

AIStore tracks a growing list of performance counters, utilization percentages, latency and throughput metrics, transmitted and received stats (total bytes and numbers of objects), error counters, and more.

Viewership is equally supported via:

On the monitoring backend side, AIS equally supports StatsD and Prometheus.

This document mostly talks about the “Prometheus” option. Other related documentation includes the AIS metrics readme, which provides general background, naming conventions, and examples, and also has a separate section on aisloader metrics: the metrics generated by aisloader when running its benchmarks.

For aisloader, please refer to Load Generator and How To Benchmark AIStore.

Prometheus Exporter

AIStore is a fully compliant Prometheus exporter that natively supports Prometheus stats collection. There’s no special configuration - the only thing required to enable the corresponding integration is letting AIStore know whether to publish its stats via StatsD or Prometheus.

The binary choice between StatsD and Prometheus is a deployment-time switch controlled by a single environment variable: AIS_PROMETHEUS. When a starting-up AIS node (gateway or storage target) sees AIS_PROMETHEUS in the environment, it registers all its metric descriptions (names, labels, and help strings) with Prometheus and exposes the HTTP endpoint /metrics for subsequent collection (aka “scraping”) by Prometheus.

With no AIS_PROMETHEUS in the environment, AIS nodes default to StatsD.

Here’s a simplified example:

$ AIS_PROMETHEUS=true aisnode -config=/etc/ais/ais.json -local_config=/etc/ais/ais_local.json -role=target

# Assuming the target with hostname "hostname" listens on port 8081:
$ curl http://hostname:8081/metrics | grep ais

# A sample output follows below (the metric names should be self-explanatory):

  # TYPE ais_target_DFIltrTgz_disk_sda_avg_rsize gauge
  ais_target_DFIltrTgz_disk_sda_avg_rsize 23560
  # HELP ais_target_DFIltrTgz_disk_sda_avg_wsize average write size (bytes)
  # TYPE ais_target_DFIltrTgz_disk_sda_avg_wsize gauge
  ais_target_DFIltrTgz_disk_sda_avg_wsize 63120
  # HELP ais_target_DFIltrTgz_disk_sda_util gauge
  # TYPE ais_target_DFIltrTgz_disk_sda_util gauge
  ais_target_DFIltrTgz_disk_sda_util 42
  # HELP ais_target_DFIltrTgz_get_mbps throughput (MB/s)
  # TYPE ais_target_DFIltrTgz_get_mbps gauge
  ais_target_DFIltrTgz_get_mbps 72.65
  # HELP ais_target_DFIltrTgz_get_ms latency (milliseconds)
  # TYPE ais_target_DFIltrTgz_get_ms gauge
  ais_target_DFIltrTgz_get_ms 2
  # HELP ais_target_DFIltrTgz_get_n total number of operations
  # TYPE ais_target_DFIltrTgz_get_n counter
  ais_target_DFIltrTgz_get_n 155431
  # HELP ais_target_DFIltrTgz_get_redir_ms latency (milliseconds)
  # TYPE ais_target_DFIltrTgz_get_redir_ms gauge
  ais_target_DFIltrTgz_get_redir_ms 0
  # HELP ais_target_DFIltrTgz_kalive_ms latency (milliseconds)
  # TYPE ais_target_DFIltrTgz_kalive_ms gauge
  ais_target_DFIltrTgz_kalive_ms 1
  # HELP ais_target_DFIltrTgz_lst_ms latency (milliseconds)
  # TYPE ais_target_DFIltrTgz_lst_ms gauge
  ais_target_DFIltrTgz_lst_ms 2
  # HELP ais_target_DFIltrTgz_lst_n total number of operations
  # TYPE ais_target_DFIltrTgz_lst_n counter
  ais_target_DFIltrTgz_lst_n 120
  # HELP ais_target_DFIltrTgz_put_ms latency (milliseconds)
  # TYPE ais_target_DFIltrTgz_put_ms gauge
  ais_target_DFIltrTgz_put_ms 5
  # HELP ais_target_DFIltrTgz_put_n total number of operations
  ...
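
To pull just the current values out of output like the sample above, one option is a small shell filter. This is a minimal sketch, not part of AIS: the function name ais_metric_values is hypothetical, and it assumes the plain Prometheus text exposition format, where lines starting with "#" carry HELP and TYPE metadata and every other line is a "name value" sample.

```shell
# Hypothetical helper (not part of AIS): reads Prometheus text-format
# metrics on stdin and prints "name value" for every AIS sample line.
ais_metric_values() {
  # Skip "# HELP" / "# TYPE" comment lines and blank lines; keep only
  # samples whose metric name starts with "ais_".
  awk '$1 !~ /^#/ && $1 ~ /^ais_/ { print $1, $2 }'
}

# Usage against a live node (hostname and port as in the example above):
#   curl -s http://hostname:8081/metrics | ais_metric_values
```

Filtering on the first field rather than grepping for "ais" keeps the HELP and TYPE lines, which also contain the metric name, out of the result.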

References:

  • https://prometheus.io/docs/instrumenting/writing_exporters/
  • https://prometheus.io/docs/concepts/data_model/
  • https://prometheus.io/docs/concepts/metric_types/

StatsD Exporter for Prometheus

If, for whatever reason, you decide to use the “StatsD” option, you can still send AIS stats to Prometheus via Prometheus’s own generic statsd_exporter, which translates StatsD-formatted metrics on the fly.

Note: while native Prometheus integration (the previous section) is the preferred and recommended option, statsd_exporter can be considered a backup plan for deployments with very special requirements.

First, the picture:

[Figure: AIStore monitoring with Prometheus]

The diagram depicts an AIS cluster that runs an arbitrary number of nodes, with each node periodically sending its StatsD metrics to a configured UDP address of any compliant StatsD server. In fact, statsd_exporter is one such compliant StatsD server that happens to be available out of the box.

To deploy statsd_exporter:

  • you could either use a prebuilt container image;
  • or git clone or go install the exporter’s own repository at https://github.com/prometheus/statsd_exporter and then run it as shown above. Just take note of the default StatsD port: 8125.

To test a combination of AIStore and statsd_exporter without Prometheus, run the exporter with debug:

$ statsd_exporter --statsd.listen-udp localhost:8125 --log.level debug

The resulting (debug) output will look something like:

level=info ts=2021-05-13T15:30:22.251Z caller=main.go:321 msg="Starting StatsD -> Prometheus Exporter" version="(version=, branch=, revision=)"
level=info ts=2021-05-13T15:30:22.251Z caller=main.go:322 msg="Build context" context="(go=go1.16.3, user=, date=)"
level=info ts=2021-05-13T15:30:22.251Z caller=main.go:361 msg="Accepting StatsD Traffic" udp=localhost:8125 tcp=:9125 unixgram=
level=info ts=2021-05-13T15:30:22.251Z caller=main.go:362 msg="Accepting Prometheus Requests" addr=:9102
level=debug ts=2021-05-13T15:30:27.811Z caller=listener.go:73 msg="Incoming line" proto=udp line=aistarget.pakftUgh.kalive.latency:1|ms
level=debug ts=2021-05-13T15:30:29.891Z caller=listener.go:73 msg="Incoming line" proto=udp line=aisproxy.qYyhpllR.pst.count:77|c
level=debug ts=2021-05-13T15:30:37.811Z caller=listener.go:73 msg="Incoming line" proto=udp line=aistarget.pakftUgh.kalive.latency:1|ms
level=debug ts=2021-05-13T15:30:39.892Z caller=listener.go:73 msg="Incoming line" proto=udp line=aisproxy.qYyhpllR.pst.count:78|c
level=debug ts=2021-05-13T15:30:47.811Z caller=listener.go:73 msg="Incoming line" proto=udp line=aistarget.pakftUgh.kalive.latency:1|ms
level=debug ts=2021-05-13T15:30:49.892Z caller=listener.go:73 msg="Incoming line" proto=udp line=aisproxy.qYyhpllR.pst.count:79|c
...
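
The “Incoming line” entries above are plain StatsD lines of the form name:value|type (here, ms for timers and c for counters). To check the exporter without a running AIS node, one could send such a line by hand. The following is a minimal sketch under stated assumptions: it relies on bash (for its /dev/udp redirection) and on the exporter listening on localhost:8125 as started above; statsd_line and statsd_send are hypothetical helper names, and the metric name in the usage comment is made up.

```shell
# Hypothetical helpers (not part of AIS or statsd_exporter).

# Format one StatsD line: <name>:<value>|<type>, e.g. type "ms" or "c".
statsd_line() {
  printf '%s:%s|%s' "$1" "$2" "$3"
}

# Send one StatsD line over UDP. Relies on bash's /dev/udp redirection;
# host and port default to the exporter address used above.
statsd_send() {
  statsd_line "$1" "$2" "$3" > "/dev/udp/${4:-localhost}/${5:-8125}"
}

# Usage (bash): send a made-up timer sample, then look for it on the
# exporter's Prometheus side (port 9102 by default):
#   statsd_send aistarget.test.kalive.latency 1 ms
#   curl -s http://localhost:9102/metrics | grep aistarget
```

With --log.level debug, the line sent this way should show up in the exporter's log as another “Incoming line” entry.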

Finally, point any available Prometheus instance to poll the listening port - 9102 by default - of the exporter.
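
For instance, a scrape job for the exporter could look like the sketch below. It assumes the exporter runs on localhost with its default Prometheus port; the job name ais_statsd and the file name prometheus.yml are arbitrary choices, and an existing deployment would more likely merge the job into its current configuration.

```shell
# Write a minimal Prometheus configuration with a single scrape job that
# polls the statsd_exporter (assumed at localhost:9102, the default).
cat > prometheus.yml <<'EOF'
scrape_configs:
  - job_name: ais_statsd          # arbitrary job name
    static_configs:
      - targets: ['localhost:9102']
EOF

# Then start Prometheus with it, e.g.:
#   prometheus --config.file=prometheus.yml
```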

Note that the two listening ports mentioned - StatsD port 8125 and Prometheus port 9102 - are both configurable via the exporter’s command line. To see all supported options, run:

$ statsd_exporter --help