You can’t manage what you don’t track. ELK is great for my non-Kubernetes servers, my hardware, and the iLOs, and for visualizing events in my homelab environment, but I want to use Prometheus for monitoring events within Kubernetes across pods and nodes.
Logs vs Metrics
A log is a record of an event that happened, and a metric is a measurement of the health of a system.
What are Logs?
A log message is a system-generated record that describes an event at the moment it happens.
What are Metrics?
A metric is a numeric measurement of the system taken at a specific point in time.
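To make the distinction concrete, here is a rough sketch in shell. The helper names (`log_event`, `metric_sample`) and the example disk values are made up for illustration; the second helper prints a sample in the Prometheus text exposition style of `name{labels} value`.

```shell
# Hypothetical helpers for illustration only; names and fields are assumptions.

# A log entry records a single event: when it happened, how severe it was, and what happened.
log_event() {
  printf '%s level=%s msg="%s"\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2"
}

# A metric sample is a named numeric value with labels, measured at one point in time.
metric_sample() {
  printf '%s{%s} %s\n' "$1" "$2" "$3"
}

log_event error "disk /dev/sda1 is 95% full"
metric_sample disk_used_percent 'device="/dev/sda1"' 95
```

The first call emits a timestamped line describing one event, while the second prints `disk_used_percent{device="/dev/sda1"} 95`, a value that only becomes useful when it is sampled repeatedly and graphed over time.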
Setting up Prometheus and Grafana for K8s
I basically followed the GitHub repo to set this up.
The documentation dives into jsonnet, but I was able to download the 0.7.0 release .tar.gz and run the quickstart just fine.
# Create the namespace and CRDs, and then wait for them to be available before creating the remaining resources
kubectl create -f manifests/setup
until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done
kubectl create -f manifests/
One note: in the manifests/setup/*-service.yaml files I added type: LoadBalancer to the services so that they picked up a LoadBalancer IP from MetalLB. This is how they got their external IPs.
Note that by “external” I mean IPs on the local network; the point is that the services are now accessible from outside the Kubernetes cluster.
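An alternative to editing the manifest files is to patch the services after they are created. The following is only a sketch: the `expose_service` helper is hypothetical, and the `monitoring` namespace is an assumption that should be adjusted to wherever the stack was actually deployed.

```shell
# Hypothetical helper; KUBECTL and NAMESPACE can be overridden from the environment.
# "monitoring" is an assumed namespace, not something confirmed by the setup above.
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="${NAMESPACE:-monitoring}"

# Switch an existing Service to type LoadBalancer so MetalLB can assign it an IP.
expose_service() {
  "$KUBECTL" patch service "$1" --namespace "$NAMESPACE" --type merge \
    -p '{"spec":{"type":"LoadBalancer"}}'
}
```

It could then be run once per service, for example `expose_service grafana`. Keep in mind that a patch applied this way is lost if the manifests are re-applied, which is one reason to keep the change in the YAML files instead.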
NAME                 TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
alertmanager-main    LoadBalancer   10.111.75.89    10.0.1.205    9093:30466/TCP   3h10m
grafana              LoadBalancer   10.103.23.188   10.0.1.203    3000:32017/TCP   3h10m
prometheus-adapter   LoadBalancer   10.109.195.51   10.0.1.206    443:31764/TCP    3h10m
prometheus-k8s       LoadBalancer   10.103.109.85   10.0.1.204    9090:31696/TCP   3h9m
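A quick way to confirm the services really are reachable from outside the cluster is to hit their health endpoints. This is a sketch only: the `health_url` helper is hypothetical, the IPs and ports are copied from the listing above, and the paths are the health endpoints that Prometheus, Alertmanager, and Grafana normally expose.

```shell
# Hypothetical helper: map a service name to its health-check URL.
# IPs and ports come from the service listing; adjust them for your own network.
health_url() {
  case "$1" in
    prometheus)   echo "http://10.0.1.204:9090/-/healthy" ;;
    alertmanager) echo "http://10.0.1.205:9093/-/healthy" ;;
    grafana)      echo "http://10.0.1.203:3000/api/health" ;;
    *)            echo "unknown service: $1" >&2; return 1 ;;
  esac
}
```

From any machine on the local network, something like `curl -fsS "$(health_url grafana)"` should then return a successful response if the LoadBalancer IP is working.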
Next Steps
After setting this up for metrics, a few more things came to mind that I have added to my list of things to implement:
- Setting up rsyslog for servers and diving deeper into the ELK stack and logging
- Better ELK stack log visualization and tracking for servers; I know the log cleanup needs to be fixed.
- Diving deeper into Grafana and alerting
- Getting alerting working for Matrix