Description of problem:
Currently, on the OpenShift worker nodes, kubelet executes an instance of the Kuryr CNI for each addNetwork and delNetwork operation that it needs. This means that if 300 pods get scheduled on a node, that node will start 300 processes of the Kuryr CNI executable, which implies:
a) 300 separate new SSL connections to the OpenShift API
b) 300 different selector watches to the OpenShift API, each waiting for the VIF to appear in its pod
c) Deletion needs to go to the OpenShift API as well, since each invocation is a new process with no in-memory information.
This feature consists of implementing a lightweight CNI executable that just passes the request along to a CNI daemon managed as a DaemonSet by OpenShift. (NOTE: the Kubernetes upstream community is considering adding support for CNI daemons, with the communication between the kubelet and the CNI daemons happening over gRPC. This would mean we could eventually drop the small executable.)
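A minimal sketch of such a pass-through executable (the daemon URL and the addNetwork/delNetwork endpoint names are assumptions for illustration; the real kuryr-daemon API may differ). It bundles the CNI_* environment variables set by the kubelet together with the network configuration read from stdin, and forwards the result to the locally running daemon:

```python
import json
import os
import sys
import urllib.request

# Hypothetical local daemon endpoint; the actual kuryr-daemon bind
# address and paths may differ.
DAEMON_URL = "http://127.0.0.1:5036"


def build_cni_request(env, stdin_config):
    """Bundle the CNI environment and stdin config into one payload."""
    return {
        # Standard CNI_* variables set by the kubelet per operation.
        "env": {k: v for k, v in env.items() if k.startswith("CNI_")},
        "config": stdin_config,
    }


def main():
    payload = build_cni_request(os.environ, json.load(sys.stdin))
    # CNI ADD -> addNetwork, anything else -> delNetwork (names assumed).
    op = "addNetwork" if payload["env"].get("CNI_COMMAND") == "ADD" else "delNetwork"
    req = urllib.request.Request(
        "%s/%s" % (DAEMON_URL, op),
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The daemon replies with the CNI result, printed back for the kubelet.
        sys.stdout.write(resp.read().decode())


if __name__ == "__main__":
    main()
```

The executable itself keeps no state and opens no API connections; all the heavy lifting stays in the long-lived daemon.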
Having the CNI daemon will allow each worker node to keep a single watch for Pods, filtered by the host they are scheduled on. This should help a lot with resource usage and with latency, not only on the addNetwork flow but even more on the delNetwork flow, since the daemon will already have all the information it needs without having to perform GETs against the Kubernetes API.
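The per-node filtering can be illustrated with a small generator over pod watch events (a sketch only; in practice the filter would be pushed to the API server via a field selector such as spec.nodeName=<host>, so the single watch connection only ever carries relevant events):

```python
def events_for_node(events, node_name):
    """Yield only pod events for pods scheduled on this worker node.

    A real daemon would express this server-side with a field selector
    (spec.nodeName=<node_name>) instead of filtering client-side.
    """
    for event in events:
        pod = event["object"]
        if pod.get("spec", {}).get("nodeName") == node_name:
            yield event


# Example pod events, shaped like a Kubernetes watch would deliver them.
events = [
    {"type": "MODIFIED",
     "object": {"metadata": {"name": "pod-a"}, "spec": {"nodeName": "worker-1"}}},
    {"type": "MODIFIED",
     "object": {"metadata": {"name": "pod-b"}, "spec": {"nodeName": "worker-2"}}},
]
local = list(events_for_node(events, "worker-1"))
```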
This RFE also opens the door to having the pools managed on the CNI daemon side, which means that the Kuryr controller would be responsible for creating and deleting VIFs in the pool, but assigning them to a Pod would be up to the CNI daemon. Effectively, this would lead to near-instantaneous addNetwork operations, since there would be no waiting for the controller to place the VIF annotation on the pod.
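A rough sketch of that split of responsibility (all names are illustrative, not Kuryr's actual classes): the controller populates and drains the pool as it manages ports, while the daemon hands VIFs out locally, with no API round-trip in the addNetwork hot path:

```python
import collections


class LocalVifPool:
    """Per-node VIF pool kept in the CNI daemon's memory (illustrative).

    The controller would call populate()/drain-style operations as it
    creates and deletes ports; the daemon calls assign() on addNetwork
    and release() on delNetwork.
    """

    def __init__(self):
        self._free = collections.deque()
        self._assigned = {}

    def populate(self, vif_id):
        # Controller side: a freshly created VIF becomes available.
        self._free.append(vif_id)

    def assign(self, pod_uid):
        # Daemon side: hand out a pre-created VIF instantly.
        if not self._free:
            return None  # pool empty: fall back to waiting on the controller
        vif = self._free.popleft()
        self._assigned[pod_uid] = vif
        return vif

    def release(self, pod_uid):
        # Daemon side: on delNetwork, return the VIF to the pool.
        vif = self._assigned.pop(pod_uid)
        self._free.append(vif)
        return vif


pool = LocalVifPool()
pool.populate("vif-1")
pool.populate("vif-2")
vif = pool.assign("pod-a")  # no controller round-trip needed
```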
Steps to Reproduce:
1. Set Kuryr options for OpenShift-Ansible
2. ansible-playbook playbooks/byo/config.yml -vvv
3. oc -n kube-system get ds
The Kuryr DaemonSet is present and running a daemon process, not just copying the executable to the host.
Verified in version openstack-kuryr-kubernetes-cni-0.4.3-1.el7ost.noarch from puddle 20180502.1.
daemon_enabled is set to true in the kuryr-config ConfigMap. From 'oc -n openshift-infra get cm kuryr-config -o yaml':
# From kuryr_kubernetes
# Enable CNI Daemon configuration. (boolean value)
daemon_enabled = true
Kuryr daemon processes are running:
$ oc -n openshift-infra get ds kuryr-cni-ds
NAME           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
kuryr-cni-ds   4         4         4       4            4           <none>          44m
Running processes inside kuryr-cni-ds pods:
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 13:43 ? 00:00:00 /bin/bash -x /usr/bin/cni_ds_init
root 17 1 0 13:43 ? 00:00:29 kuryr-daemon: master process [/usr/bin/kuryr-daemon --config-file /etc/kuryr/kuryr.conf]
root 29 17 0 13:43 ? 00:00:00 kuryr-daemon: master process [/usr/bin/kuryr-daemon --config-file /etc/kuryr/kuryr.conf]
root 33 17 0 13:43 ? 00:00:00 kuryr-daemon: watcher worker(0)
root 34 17 0 13:43 ? 00:00:00 kuryr-daemon: server worker(0)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.