Prometheus on K8s

08 Mar 2021 » system configuration, sysadmin, metrics

You can’t manage what you don’t track. ELK is great for my non-Kubernetes servers, my hardware, and the iLOs, and for visualizing events in my homelab environment, but I want to use Prometheus to monitor what happens inside Kubernetes, across pods and nodes.

Logs vs Metrics

A log is a record of an event that happened; a metric is a measurement of the health of a system.

What are Logs?

A log message is a system-generated record, produced when an event happens, that describes the event.

What are Metrics?

A metric is a numeric measurement of the system taken at a specific point in time.
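To make the difference concrete, here is a small sketch. Both sample lines are made up for illustration (the host name, address, and value are not from my environment), and count_events is a hypothetical helper. The metric line follows the Prometheus text format of name, labels, and value.

```shell
# Hypothetical log line: one record per event, with free-form context.
log_line='Mar  8 10:15:02 node01 sshd[1042]: Failed password for root from 192.0.2.7 port 51122 ssh2'

# Hypothetical metric sample in the Prometheus text format: name{labels} value.
metric_line='node_load1{instance="node01"} 0.42'

# Count the log lines on stdin that match a pattern.
# usage: count_events PATTERN < logfile
count_events() {
  grep -c -- "$1"
}
```

Counting matching log lines is a rough way to turn a stream of events into a number you can graph, which is about where logs and metrics meet.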

Setting up Prometheus and Grafana for K8s

I basically followed the GitHub repo to set this up.

The documentation dives into Jsonnet, but I was able to download the 0.7.0 release .tar.gz and run the quickstart just fine.

# Create the namespace and CRDs, then wait for them to be available before creating the remaining resources
kubectl create -f manifests/setup
until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done
kubectl create -f manifests/
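The until loop above retries forever if the CRDs never show up. Below is a minimal sketch of a bounded version. The helper name wait_for_servicemonitors and the default of 60 tries are my own assumptions, not part of the quickstart.

```shell
# Poll until "kubectl get servicemonitors" succeeds, giving up after a fixed
# number of tries. Hypothetical helper; the default of 60 tries is an assumption.
# usage: wait_for_servicemonitors [MAX_TRIES]
wait_for_servicemonitors() {
  max_tries="${1:-60}"
  tries=0
  until kubectl get servicemonitors --all-namespaces >/dev/null 2>&1; do
    tries=$((tries + 1))
    if [ "$tries" -ge "$max_tries" ]; then
      echo "servicemonitors still unavailable after $tries tries" >&2
      return 1
    fi
    sleep 1
  done
}

# Usage against a real cluster:
#   kubectl create -f manifests/setup
#   wait_for_servicemonitors 120 && kubectl create -f manifests/
```

Chaining with && means the remaining resources are only created once the check has passed.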

One note: in the manifests/setup/*-service.yaml files I added type: LoadBalancer to the services so that they picked up a LoadBalancer IP from MetalLB and got external IPs. When I say “external”, I mean IPs on the local network, but the services are now accessible from outside the Kubernetes cluster.
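If the stack is already running, the same change can be made in place with kubectl patch instead of editing the YAML. This is a sketch under assumptions: the namespace defaults to monitoring (adjust it if your manifests use another one), the service names are the four from my cluster, and expose_monitoring_services is a hypothetical helper name.

```shell
# Switch the monitoring services to type LoadBalancer on a running cluster.
# Assumption: the stack lives in the "monitoring" namespace unless told otherwise.
# usage: expose_monitoring_services [NAMESPACE]
expose_monitoring_services() {
  namespace="${1:-monitoring}"
  for svc in alertmanager-main grafana prometheus-adapter prometheus-k8s; do
    # A merge patch only touches spec.type and leaves the rest of the Service alone.
    kubectl -n "$namespace" patch service "$svc" \
      --type merge -p '{"spec":{"type":"LoadBalancer"}}'
  done
}

# Usage against a real cluster:
#   expose_monitoring_services
#   kubectl -n monitoring get svc
```

A patch only changes the live object; the YAML files on disk stay as they were.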

NAME                    TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
alertmanager-main       LoadBalancer                                 9093:30466/TCP               3h10m
grafana                 LoadBalancer                                 3000:32017/TCP               3h10m
prometheus-adapter      LoadBalancer                                 443:31764/TCP                3h10m
prometheus-k8s          LoadBalancer                                 9090:31696/TCP               3h9m
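A quick way to confirm the services really answer from outside the cluster is to hit their health endpoints with curl. This is a sketch: check_endpoint is a hypothetical helper, and the 192.0.2.x addresses in the usage comments are placeholders from a documentation range, not my actual IPs.

```shell
# Report whether an HTTP endpoint answers with a success status.
# Hypothetical helper. usage: check_endpoint NAME URL
check_endpoint() {
  if curl -fsS --max-time 5 -o /dev/null "$2"; then
    echo "$1 ok"
  else
    echo "$1 unreachable"
    return 1
  fi
}

# Usage with placeholder addresses (substitute the EXTERNAL-IP values):
#   check_endpoint prometheus   http://192.0.2.10:9090/-/healthy
#   check_endpoint alertmanager http://192.0.2.11:9093/-/healthy
#   check_endpoint grafana      http://192.0.2.12:3000/api/health
```

The -f flag makes curl return a non-zero exit code on HTTP errors, so a 404 or 500 counts as unreachable too.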

Next Steps

After I set this up for metrics, a few more things came to mind, and I have added them to the list of things I would like to implement. These items include:

  • Setting up rsyslog for servers, and diving deeper into the ELK stack and logging
  • Better ELK stack log visualization and tracking for servers. I know the log cleanup needs to be fixed.
  • Diving deeper into Grafana and alerting
    • I’d like to get alerting working for Matrix
