
Logging Hashi@Home


In this post, we look at deploying a logging layer on (and for) Hashi@Home. As one of the commonly-identified triad of observability signals, logging is an important part of any production environment. In Monitoring we went through the motions of selecting what we would use for monitoring, and Prometheus came out on top.

After doing a bit more research, I decided that the logging component for Hashi@Home would be based on Grafana Loki, which plays nicely with Prometheus.

Logging Game Plan

Loki works similarly to Prometheus in that an agent on an endpoint collects data (logs in this case, metrics in the case of Prometheus), which are then shipped to a central storage system.

The overall plan then is to deploy the log scrapers first as a system job, and then the actual logging system, Loki itself, once we are actually ready to collect data.

So, we would have a promtail.nomad system job and a loki.nomad service job. Let’s take a closer look.

Promtail

We want the promtail job to run on all of the nodes, with a restart if the job fails. While the job could be run as a container, that would mean a bunch of extra configuration that I didn’t want to deal with initially, so I chose the raw_exec driver so that the job would have direct access to system logs as well as container logs.

The job takes only one variable: the Promtail version. As mentioned before, we need to use the system job type so that Nomad schedules an execution on every available node.

Since promtail exposes a server of its own (promtail itself will be a client of Loki, pushing logs to it), we can also define a service which can be registered with Consul. This will be the HTTP endpoint, which exposes a readiness probe – we’ll use that to register a health check. Finally, the task needs an artifact to actually execute and a configuration file to launch it with. We will define the Nomad task with these parameters and use a Nomad template to deliver the configuration to the execution environment when the job is scheduled.

Putting it all together, the Promtail Nomad job looks as follows:

# promtail.nomad -- job definition for Promtail

variable "promtail_version" {}

job "promtail" {

  type = "system"
  group "promtail" {
    network {
      port "http" {}
      port "grpc" {}
    }

    service {
      name = "promtail"
      port = "http"

      check {
        # Consul checks need an interval and timeout
        name     = "promtail-alive"
        type     = "tcp"
        interval = "10s"
        timeout  = "2s"
      }

      check {
        name     = "Promtail HTTP"
        type     = "http"
        path     = "/ready"
        port     = "http"
        interval = "10s"
        timeout  = "2s"
      }
    }

    task "promtail" {
      driver = "raw_exec"
      config {
        command = "promtail"
        args = ["-config.file=local/promtail.yml"]
      }

      artifact {
        source = "https://github.com/grafana/loki/releases/download/v${var.promtail_version}/promtail-linux-${attr.cpu.arch}.zip"
        destination = "local/promtail"
        mode = "file"
      }
      template {
         data          = file("templates/promtail.yml.tpl")
         destination   = "local/promtail.yml"
         change_mode   = "signal"
         change_signal = "SIGHUP"
      }
    }
  }
}

Promtail configuration template

In the task stanza above, we have two pieces of input data which are required in order to run promtail:

  1. The promtail executable itself
  2. The promtail configuration file

The former is obtained by selecting the version we want to deploy, and constructing a URL to retrieve it from; the promtail binaries are conveniently packaged by the maintainer and published via a Github releases URL. All we need to decide on is the version and the build architecture. We use the Nomad node attributes to ensure that the job is portable across the whole set of computatoms, which may have ARM or ARM64 CPU architectures.
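
To make that concrete (the version here is just an illustration, not necessarily what I deployed), with promtail_version set to 2.8.2 the artifact source above interpolates as follows:

# on a 64-bit computatom, where attr.cpu.arch is "arm64"
https://github.com/grafana/loki/releases/download/v2.8.2/promtail-linux-arm64.zip
# on a 32-bit one, where attr.cpu.arch is "arm"
https://github.com/grafana/loki/releases/download/v2.8.2/promtail-linux-arm.zip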

The configuration file however needs to be created for our specific deployment. We need to tell promtail which port to serve on, the logging level we desire for the service, and which clients1 it should eventually send data to.

For this we use a Consul template, allowing the job to be dynamically configured.

The server definition takes into account the dynamic port assigned to the job by Nomad itself, by using the Nomad runtime variables:

server:
  log_level: info
  http_listen_port: {{ env "NOMAD_PORT_http" }}
  grpc_listen_port: {{ env "NOMAD_PORT_grpc" }}

For the client definition, we use a different trick, relying on the Fabio load balancer deployed across all nodes. Taking advantage of the integration between Consul and Fabio, this allows services to be called via the proxy running on localhost2, simply by service name:

clients:
  - url: http://localhost:9999/loki/loki/api/v1/push
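
The doubled /loki in that URL is not a typo. The first segment is the Fabio route prefix, registered later by the Loki service with a urlprefix-/loki strip=/loki tag, which Fabio removes before forwarding; the remainder is Loki’s normal push path. Sketching it out, assuming Fabio’s default listener on port 9999:

# what promtail sends to the local Fabio proxy
http://localhost:9999/loki/loki/api/v1/push
# Fabio matches the /loki route, strips that prefix, and forwards
# the request to a healthy loki-http-server instance as:
/loki/api/v1/push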

Most importantly though, we need to tell promtail which logs to scrape. Somewhat ironically, this is the least “templated” part of the configuration, with just a static definition of scrape configs to pass to promtail. I wanted to start with monitoring the infrastructure layers, which includes Consul and Nomad, so I defined those scrape configuration targets with a hardcoded __path__ value. Of course, OS-level logging is also interesting, so I wanted to scrape journal logs. Putting it all together, we get:

scrape_configs:
  - job_name: consul
    static_configs:
    - targets:
        - localhost
      labels:
        job: consul
        __path__: /home/consul/*.log
  - job_name: nomad
    static_configs:
    - targets:
        - localhost
      labels:
        job: nomad
        __path__: /var/log/nomad*.log
  - job_name: journal
    journal:
      json: false
      max_age: 12h
      path: /var/log/journal
      matches: _TRANSPORT=kernel
      labels:
        job: systemd-journal
    relabel_configs:
      - source_labels: ['__journal__systemd_unit']
        target_label: 'unit'
      - source_labels: ['__journal_syslog_identifier']
        target_label: 'syslog_identifier'
      - source_labels:
          - __journal__hostname
        target_label: nodename

Unfortunately, promtail doesn’t support scraping of Nomad job logs just yet, but I suppose I could scrape the allocations directory, according to this post on the forum3.
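
For reference, a sketch of what such a scrape config might look like. This is untested and assumes Nomad’s data directory is /opt/nomad/data; the job name and labels are purely illustrative:

  - job_name: nomad-allocs
    static_configs:
    - targets:
        - localhost
      labels:
        job: nomad-allocs
        # each allocation writes <task>.stdout.N and <task>.stderr.N here
        __path__: /opt/nomad/data/alloc/*/alloc/logs/*.std*.*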

Promtail deployment

Great, now we have a job definition. Submitting this to our Nomad cluster at home, we get the following plan:

+ Job: "promtail"
+ Task Group: "promtail" (11 create)
  + Task: "promtail" (forces create)

Scheduler dry-run:
- All tasks successfully allocated.

Job Modify Index: 0
To submit the job with version verification run:

nomad job run -check-index 0 promtail.nomad

When running the job with the check-index flag, the job will only be run if the
job modify index given matches the server-side version. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.

After the deployment, we can see that it’s running on all nodes:

nomad status promtail
ID            = promtail
Name          = promtail
Submit Date   = 2023-10-22T22:46:34+02:00
Type          = system
Priority      = 50
Datacenters   = dc1
Namespace     = default
Node Pool     = default
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost  Unknown
promtail    0       0         12       8       14        3     0

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created    Modified
ad425924  35249003  promtail    2        run      running   3m8s ago   2m37s ago
7e477038  d198c25e  promtail    2        run      running   3m39s ago  3m26s ago
412271ec  02195421  promtail    2        run      running   4m40s ago  4m22s ago
91afc845  6ce27c1c  promtail    2        run      running   4m45s ago  4m30s ago
e6e6e97c  786edced  promtail    2        run      running   1d3h ago   1d3h ago
4d9cf48d  f92bedb4  promtail    2        run      running   1d22h ago  1d22h ago
6c6d1711  ffbe483d  promtail    2        run      running   1d22h ago  1d22h ago
0885514a  cd25a978  promtail    2        run      running   1d22h ago  1d22h ago
a538b97d  0243e238  promtail    2        run      running   1d22h ago  1d22h ago
13e91861  9f73ab3a  promtail    2        run      running   1d22h ago  1d22h ago
e79f7f66  7e421864  promtail    2        run      running   1d22h ago  1d22h ago
ef84159a  a0ee4694  promtail    2        run      running   1d22h ago  1d22h ago

Nice. Let’s move on to Loki proper.

Loki

Loki is a distributed application which can be deployed as a set of microservices. In a client’s production environment, we would definitely be designing a highly-available and performant deployment, taking advantage of the Loki microservices by deploying them into a service mesh, but in Hashi@Home that is overkill. We will start off by deploying Loki in monolithic mode – the services will still be micro, but they’ll all run together on the same computatom.

We will therefore be running a single binary, to which all of the promtail agents will push their logs.
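
Concretely, “monolithic mode” just means not splitting the components apart: Loki’s -target flag defaults to all, so a single invocation along these lines (the config path matches the template we deliver further down) runs the distributor, ingester, querier and friends in one process:

# all Loki components in a single process; -target=all is the default
loki -config.file=local/loki.yml -target=all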

Backing storage

As the documentation states:

Grafana Loki is built around the idea of only indexing metadata about your logs: labels (just like Prometheus labels). Log data itself is then compressed and stored in chunks in object stores such as S3 or GCS, or even locally on the filesystem.

It took a while to come to a decision on which object store to use when shipping the logs, once they had been ingested. I had originally wanted to do everything entirely on prem, by deploying a Minio cluster – but I would need to monitor and collect logs for that, so I ended up in a Catch-224.

The driving principle here was to not spend any money, so I needed a reliable object store that was also free to use: free not just for storing data, but also free of egress fees and other lock-in shenanigans. AWS S3 was the obvious choice, but it would have inevitably ended up costing me money, and I just wasn’t down for that. A Digital Ocean space would have been ok, I guess. Although not exactly free, and definitely not at home, at least the pricing is predictable and simple. Even the Grafana Free Tier would have been a great choice if my priorities were different – really simple to create a set of servers that I could just shunt data to – but I’m here to learn by doing. One day a client might ask me to do something weird with Loki, so I might as well fiddle at home!

Eventually, I settled with a Cloudflare R2 bucket. The bucket was created separately, along with an Access Token and Key ID which were promptly stored in Vault. Loki would need these secrets to eventually authenticate to the R2 bucket and send data.
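
For the record, something along these lines is enough to stash the credentials where the Loki template shown below expects to find them. The mount and field names mirror the secret path and .Data.data lookups used in that template, but treat this as a sketch rather than a transcript of what I actually ran:

# store the R2 credentials in Vault's KV v2 store for the Loki job to read
vault kv put hashiatho.me-v2/loki_logs_bucket \
  account_id=<cloudflare-account-id> \
  access_key_id=<r2-access-key-id> \
  secret_access_key=<r2-secret-access-key>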

Job specification

Next, I set about writing the job template. The basis for the job is a service with a single instance of a task running Loki, with a properly provided configuration file. We also define a single variable which allows us to select a Loki version, and give it a high priority since it’s a piece of infrastructure:

variable "loki_version" {
  type = string
  default = "v2.7.5"
}

job "loki" {
  datacenters = ["dc1"]
  type = "service"
  name = "loki"
  priority = 80
  .
  .
  .
}

Since this is quite a crucial service, we want it to recover from failure and be highly available during changes, so we will immediately set update, migrate and reschedule stanzas:

update {
  max_parallel = 2
  health_check = "checks"
  min_healthy_time = "5s"
  healthy_deadline = "300s"
  progress_deadline = "10m"
  auto_revert = true
  auto_promote = true
  canary = 1
}
reschedule {
  interval       = "30m"
  delay          = "30s"
  delay_function = "exponential"
  max_delay      = "1200s"
  unlimited      = true
}

migrate {
  max_parallel     = 2
  health_check     = "checks"
  min_healthy_time = "10s"
  healthy_deadline = "5m"
}

We then define a group, with dynamic port assignment and a service for the HTTP and gRPC endpoints:

group "log-server" {
  count = 1

  network {
    port "http" {}
    port "grpc" {}
  }
  service {
    name = "loki-http-server"
    tags = ["urlprefix-/loki strip=/loki"]
    port = "http"
    on_update = "require_healthy"

    check {
      name = "loki_ready"
      type = "http"
      path = "/ready"
      port = "http"
      interval = "10s"
      timeout = "3s"
    }
  }

  service {
    name = "loki-grpc"
    port = "grpc"
  }
.
.
.
}

Finally, the task definition itself. Since we are using a Vault lookup in the job template, we need to issue the job a Vault token, using a vault stanza in the task definition. I also opted to use the isolated exec driver, assigning a single CPU and 200MB of RAM. The task is provided with a template stanza to deliver the Loki configuration and an artifact stanza using the loki_version variable declared above. As usual, I use the Nomad node attributes to select the correct build artifact from Github Releases. Putting it together, we get:

task "server" {
  driver = "exec"
  vault {
    policies = ["read-only"]
    change_mode = "signal"
    change_signal = "SIGHUP"
  }
  config {
    command = "loki"
    args = [
      "-config.file=local/loki.yml"
    ]
  }
  resources {
    cpu = 1000
    memory = 200
  }
  template {
    data = file("loki.yml.tpl")
    destination = "local/loki.yml"
    change_mode = "restart"
  }
  artifact {
    source = "https://github.com/grafana/loki/releases/download/${var.loki_version}/loki-linux-${attr.cpu.arch}.zip"
    options { # checksum depends on the cpu arch
    }
    destination = "local/loki"
    mode = "file"
  }
}

Configuration Template

The configuration file passed to Loki at startup needed to take a few secrets and dynamic variables which made it a perfect candidate for a dynamic template. I hadn’t had a chance yet to really write a good Nomad template with functions and secrets and all, so I was pretty chuffed when the chance came along with Loki.

The first thing I’d need was the aforementioned S3 credentials. At the very start of the template:

{{ with secret "hashiatho.me-v2/loki_logs_bucket" }}

Since Nomad will be assigning ports to the service, the server configuration needed to read runtime variables in order to properly expose the service:

server:
  http_listen_port: {{ env "NOMAD_PORT_http" }}
  grpc_listen_port: {{ env "NOMAD_PORT_grpc" }}

Next, we define the distributor and ingester configuration. For the distributor we’re going to use the Consul KV store5, and the ingester needs to define the address and port for its lifecycler. We’ll use the Nomad runtime variables for this too, looking them up from the environment:

distributor:
  ring:
    kvstore:
      store: consul
      prefix: loki/collectors

ingester:
  lifecycler:
    address: {{ env "NOMAD_IP_http" }}
    port: {{ env "NOMAD_PORT_http" }}
    ring:
      kvstore:
        store: consul
        prefix: loki/collectors
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 5m
  chunk_retain_period: 30s
  flush_op_timeout: 20m

Storage configuration

Here comes the fun part - the storage definition. I can’t say how long it took me to get how this damn thing works, but I finally came to understand that the storage_config key defines a few kinds of storage which can be referred to in other parts of the configuration. In my config, I define three kinds:

  • boltdb_shipper
  • filesystem
  • aws

So, we declare the schema_config key to use the aws store:

schema_config:
  configs:
    - from: 2022-01-01
      store: boltdb-shipper
      object_store: aws
      schema: v11
      index:
        prefix: loki_
        period: 24h

See what I did there? Saved the best for last! The actual AWS configuration was the hardest part, because I needed to consume secrets and understand that the dynamodb configuration should be set to inmemory, since R2 does not have DynamoDB. Since we declared that the whole template is within a secret, we can make references to that secret’s .Data:

storage_config:
  boltdb_shipper:
    active_index_directory: local/index
    cache_location: local/index_cache
  filesystem:
    directory: local/index
  aws:
    bucketnames: hah-logs
    endpoint: {{ .Data.data.account_id }}.r2.cloudflarestorage.com
    region: auto
    access_key_id: {{ .Data.data.access_key_id }}
    secret_access_key: {{ .Data.data.secret_access_key }}
    insecure: false
    sse_encryption: false
    http_config:
      idle_conn_timeout: 90s
      insecure_skip_verify: false
    s3forcepathstyle: true
    dynamodb:
      dynamodb_url: inmemory
# close the "with secret" block opened at the start of the template
{{ end }}

Loki deployment

Finally we can deploy Loki:

nomad plan loki.nomad
+ Job: "loki"
+ Task Group: "log-server" (1 create)
  + Task: "server" (forces create)

Scheduler dry-run:
- All tasks successfully allocated.

Job Modify Index: 0
To submit the job with version verification run:

nomad job run -check-index 0 loki.nomad

When running the job with the check-index flag, the job will only be run if the
job modify index given matches the server-side version. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.
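
Once the allocation is healthy, a quick smoke test from any node is to poke Loki through the Fabio route we registered, assuming Fabio’s default port of 9999. The first request hits the readiness probe, the second asks Loki which label names it has ingested so far:

curl http://localhost:9999/loki/ready
curl http://localhost:9999/loki/loki/api/v1/labels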

Wrapping up

So, we have a logging layer implemented: Promtail to collect and ship logs, and Loki to consume them, shipping them on to a Cloudflare R2 bucket. We’ve used the Nomad runtime variables to render configuration templates for promtail and loki, as well as the Vault lookup features of the Consul template language, in order to inject secrets into the job configuration instead of writing them in the job definition.

There are definitely a few things we can improve on here, especially in terms of log discovery and consuming the logs of Nomad jobs themselves. Sooner or later I’ll get around to splitting up the Loki deployment into groups of independent jobs using the microservices mode, and of course we still need to visualise the logs with Grafana. For now however, it’s a pretty good start. The logs I collect on Hashi@Home are safely stored in an object store, which I can later attach to whatever I want to use to view and analyse them. Happy Days!


Footnotes and References

  1. One may imagine several log aggregators, serving partitions. We only define one, but where there are multiple datacentres in a hybrid setup, we might want to consider having promtail expose data only to specific Loki instances. 

  2. This trick is decidedly less elegant when running the jobs in Docker containers, since they don’t see the Fabio service running in localhost, another reason to love Nomad and use simple execution models. 

  3. Nomad’s documentation mentions the log shipper sidecar pattern. I guess one could collect logs and ship them to loki like that. 

  4. To be perfectly honest, I also couldn’t really rely on Minio deployed on Hashi@Home, since the hardware isn’t reliable enough. 

  5. I mean - who would want to give up an opportunity to use Consul? It’s the bees’ knees. 
