Use Datadog to Monitor Your Cluster Build by Rke
There are many tools you can choose when you want to build your kubernetes cluster, we use Rancher Kubernetes Engine (RKE) to build our kubernetes cluster.
We run datadog as daemonset in our cluster, and datadog has auto discovery feature to discovery pods/containers need to check. When we deployed a redis database, datadog will notice that and run checks against the redis pods, we didn't need to do any configurations.
Datadog auto discovery also supports core kubernetes components, like APIServer, KubeScheduler, KubeProxy, etc. But when you setup you cluster by using RKE, you will find the auto discovery didn't work for these components.
The auto discovery feature for these core components relies on autodiscovery container identifiers(ad_identifiers), the image name or image short name need to match the default ad_identifiers
settings for these components. Unfortunately, rancher uses rancher/hyperkube
to build most of the core components, they all have the same image name.
The ad_identifiers
also support to set to a container label, but that will need use to rebuild the container image to add the label, it's a mission impossible too. After some tests, I found the way to run checks against these containers by use annotations.
Datadog supports us to use annotations to notify datadog that we need to run check on some urls.
apiVersion: v1
kind: Pod
# (...)
metadata:
name: '<POD_NAME>'
annotations:
ad.datadoghq.com/<CONTAINER_IDENTIFIER>.check_names: '[<INTEGRATION_NAME>]'
ad.datadoghq.com/<CONTAINER_IDENTIFIER>.init_configs: '[<INIT_CONFIG>]'
ad.datadoghq.com/<CONTAINER_IDENTIFIER>.instances: '[<INSTANCE_CONFIG>]'
# (...)
spec:
containers:
- name: '<CONTAINER_IDENTIFIER>'
# (...)
Here is an example for apache. Did you see the "url": "http://%%host%%/website_1"
in the instances
settings? You can imagine that what will happen if we change this url to a service exposed by kubernetes.
apiVersion: v1
kind: Pod
metadata:
name: apache
annotations:
ad.datadoghq.com/apache.check_names: '["apache","http_check"]'
ad.datadoghq.com/apache.init_configs: '[{},{}]'
ad.datadoghq.com/apache.instances: |
[
[
{
"apache_status_url": "http://%%host%%/server-status?auto"
}
],
[
{
"name": "<WEBSITE_1>",
"url": "http://%%host%%/website_1",
"timeout": 1
},
{
"name": "<WEBSITE_2>",
"url": "http://%%host%%/website_2",
"timeout": 1
}
]
]
labels:
name: apache
spec:
containers:
- name: apache
image: httpd
ports:
- containerPort: 80
Actually, datadog didn't care about you container. It only cares about settings you put in the annotations. I use this feature
to add checks to my RKE built cluster.
Here is an example for monitoring components runs on controlplan. Don't forget to allow your datadog daemonset run on your master nodes first. And please take notice about the tolerations
and nodeSelector
I added in the yaml.
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: controlplane-monitor
spec:
selector:
matchLabels:
name: controlplane-monitor
template:
metadata:
labels:
name: controlplane-monitor
annotations:
ad.datadoghq.com/kube-scheduler.check_names: '["kube_scheduler"]'
ad.datadoghq.com/kube-scheduler.init_configs: '[{}]'
ad.datadoghq.com/kube-scheduler.instances: |-
[{"prometheus_url": "http://%%host%%:10251/metrics", "leader_election": "true"}]
ad.datadoghq.com/kube-controller-manager.check_names: '["kube_controller_manager"]'
ad.datadoghq.com/kube-controller-manager.init_configs: '[{}]'
ad.datadoghq.com/kube-controller-manager.instances: |-
[{"prometheus_url": "http://%%host%%:10251/metrics", "leader_election": "true"}]
ad.datadoghq.com/kube-apiserver.check_names: '["kube_apiserver_metrics"]'
ad.datadoghq.com/kube-apiserver.init_configs: '[{}]'
ad.datadoghq.com/kube-apiserver.instances: |-
[{"prometheus_url": "https://%%host%%:6443/metrics", "tls_ca_cert":"/etc/kubernetes/ssl/kube-ca.pem"}]
spec:
hostNetwork: true
nodeSelector:
"node-role.kubernetes.io/controlplane": "true"
tolerations:
- key: "node-role.kubernetes.io/controlplane"
value: "true"
effect: "NoSchedule"
restartPolicy: Always
terminationGracePeriodSeconds: 0
containers:
- image: busybox
command:
- sleep
- infinity
name: kube-scheduler
- image: busybox
command:
- sleep
- infinity
name: kube-controller-manager
- image: busybox
command:
- sleep
- infinity
name: kube-apiserver