Use Datadog to Monitor Your Cluster Built by RKE
#RKE #Kubernetes #K8s #Datadog
There are many tools to choose from when building a Kubernetes cluster; we use Rancher Kubernetes Engine (RKE) to build ours.
We run the Datadog agent as a DaemonSet in our cluster, and Datadog's Autodiscovery feature finds the pods and containers that need to be checked. When we deployed a Redis database, Datadog noticed it and started running checks against the Redis pods; we didn't need to do any configuration.
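For example, a plain Redis pod like the sketch below (the name and image tag are illustrative) gets picked up automatically: the agent recognizes the redis image and applies the default configuration that ships with its Redis check, so we never add any Datadog annotations to it.

apiVersion: v1
kind: Pod
metadata:
  name: redis-demo                # illustrative name
spec:
  containers:
    - name: redis
      image: redis:6              # the image short name "redis" is what the agent matches on
      ports:
        - containerPort: 6379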
Autodiscovery also supports core Kubernetes components such as the API server, kube-scheduler, and kube-proxy. But when you set up your cluster with RKE, you will find that Autodiscovery does not work for these components.
Autodiscovery for these core components relies on Autodiscovery container identifiers (ad_identifiers): the image name or image short name has to match the default ad_identifiers settings of the corresponding checks. Unfortunately, Rancher builds most of the core components from the rancher/hyperkube image, so they all share the same image name. ad_identifiers can also be matched against a container label, but that would require rebuilding the container image to add the label, which is not practical either. After some testing, I found a way to run checks against these containers by using annotations.
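To see why the matching fails, it helps to look at what a check's default Autodiscovery template looks like. Below is a rough sketch modeled on the kube_scheduler integration's auto_conf.yaml; the exact contents vary between agent versions.

# auto_conf.yaml bundled with the kube_scheduler check (illustrative sketch)
ad_identifiers:
  - kube-scheduler                # compared against the image name / short name
init_config:
instances:
  - prometheus_url: http://%%host%%:10251/metrics

With RKE the scheduler container's image is rancher/hyperkube, so its short name never matches kube-scheduler and this template is never applied.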
Datadog lets us use pod annotations to tell the agent which checks to run and against which URLs.
apiVersion: v1
kind: Pod
# (...)
metadata:
  name: '<POD_NAME>'
  annotations:
    ad.datadoghq.com/<CONTAINER_IDENTIFIER>.check_names: '[<INTEGRATION_NAME>]'
    ad.datadoghq.com/<CONTAINER_IDENTIFIER>.init_configs: '[<INIT_CONFIG>]'
    ad.datadoghq.com/<CONTAINER_IDENTIFIER>.instances: '[<INSTANCE_CONFIG>]'
  # (...)
spec:
  containers:
    - name: '<CONTAINER_IDENTIFIER>'
# (...)
Here is an example for Apache. Notice the "url": "http://%%host%%/website_1" in the instances settings? Now imagine what happens if we change this URL to a service exposed by Kubernetes.
apiVersion: v1
kind: Pod
metadata:
  name: apache
  annotations:
    ad.datadoghq.com/apache.check_names: '["apache","http_check"]'
    ad.datadoghq.com/apache.init_configs: '[{},{}]'
    ad.datadoghq.com/apache.instances: |
      [
        [
          {
            "apache_status_url": "http://%%host%%/server-status?auto"
          }
        ],
        [
          {
            "name": "<WEBSITE_1>",
            "url": "http://%%host%%/website_1",
            "timeout": 1
          },
          {
            "name": "<WEBSITE_2>",
            "url": "http://%%host%%/website_2",
            "timeout": 1
          }
        ]
      ]
  labels:
    name: apache
spec:
  containers:
    - name: apache
      image: httpd
      ports:
        - containerPort: 80
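For instance, the http_check instance can point at an in-cluster service URL instead of the container itself. A minimal sketch, assuming a hypothetical Service named my-service in the my-namespace namespace that exposes a /healthz endpoint:

apiVersion: v1
kind: Pod
metadata:
  name: service-probe
  annotations:
    ad.datadoghq.com/probe.check_names: '["http_check"]'
    ad.datadoghq.com/probe.init_configs: '[{}]'
    ad.datadoghq.com/probe.instances: |
      [
        {
          "name": "my-service-health",
          "url": "http://my-service.my-namespace.svc.cluster.local/healthz",
          "timeout": 1
        }
      ]
spec:
  containers:
    # This container does no real work; it only carries the annotations.
    - name: probe
      image: busybox
      command: ["sleep", "infinity"]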
In fact, Datadog doesn't care about your container at all; it only cares about the settings you put in the annotations. I use this feature to add checks to my RKE-built cluster.
Here is an example for monitoring the components that run on the control plane. Don't forget to allow your Datadog DaemonSet to run on your master nodes first, and pay attention to the tolerations and nodeSelector I added in the YAML.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: controlplane-monitor
spec:
  selector:
    matchLabels:
      name: controlplane-monitor
  template:
    metadata:
      labels:
        name: controlplane-monitor
      annotations:
        ad.datadoghq.com/kube-scheduler.check_names: '["kube_scheduler"]'
        ad.datadoghq.com/kube-scheduler.init_configs: '[{}]'
        ad.datadoghq.com/kube-scheduler.instances: |-
          [{"prometheus_url": "http://%%host%%:10251/metrics", "leader_election": "true"}]

        # kube-controller-manager serves its metrics on port 10252 (10251 is the scheduler).
        ad.datadoghq.com/kube-controller-manager.check_names: '["kube_controller_manager"]'
        ad.datadoghq.com/kube-controller-manager.init_configs: '[{}]'
        ad.datadoghq.com/kube-controller-manager.instances: |-
          [{"prometheus_url": "http://%%host%%:10252/metrics", "leader_election": "true"}]

        ad.datadoghq.com/kube-apiserver.check_names: '["kube_apiserver_metrics"]'
        ad.datadoghq.com/kube-apiserver.init_configs: '[{}]'
        ad.datadoghq.com/kube-apiserver.instances: |-
          [{"prometheus_url": "https://%%host%%:6443/metrics", "tls_ca_cert":"/etc/kubernetes/ssl/kube-ca.pem"}]

    spec:
      hostNetwork: true
      nodeSelector:
        "node-role.kubernetes.io/controlplane": "true"
      tolerations:
        - key: "node-role.kubernetes.io/controlplane"
          value: "true"
          effect: "NoSchedule"
      restartPolicy: Always
      terminationGracePeriodSeconds: 0
      containers:
        # These containers do nothing; they exist only to carry the annotations above.
        - image: busybox
          command:
            - sleep
            - infinity
          name: kube-scheduler
        - image: busybox
          command:
            - sleep
            - infinity
          name: kube-controller-manager
        - image: busybox
          command:
            - sleep
            - infinity
          name: kube-apiserver
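The same trick should work for other components the agent can't identify by image. For example, the annotations below are a sketch of how a similar DaemonSet on the worker nodes could scrape kube-proxy, assuming its metrics endpoint is reachable on the node IP at port 10249; by default kube-proxy binds metrics to 127.0.0.1, so you may need to change metricsBindAddress in your kube-proxy/RKE configuration first.

# Sketch only: annotations for a worker-node DaemonSet, following the same pattern as above.
annotations:
  ad.datadoghq.com/kube-proxy.check_names: '["kube_proxy"]'
  ad.datadoghq.com/kube-proxy.init_configs: '[{}]'
  ad.datadoghq.com/kube-proxy.instances: |-
    [{"prometheus_url": "http://%%host%%:10249/metrics"}]

The pod template would again need hostNetwork: true and a dummy container named kube-proxy to carry the annotations.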