Bug 1504013

Summary: [RFE] CNI Daemon
Product: Red Hat OpenStack
Component: openstack-kuryr-kubernetes
Version: 13.0 (Queens)
Target Release: 13.0 (Queens)
Target Milestone: Upstream M2
Status: CLOSED ERRATA
Severity: high
Priority: high
Keywords: FutureFeature, Triaged
Reporter: Antoni Segura Puimedon <asegurap>
Assignee: Michał Dulko <mdulko>
QA Contact: Jon Uriarte <juriarte>
CC: asegurap, jschluet, juriarte, lpeer, sgordon
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-kuryr-kubernetes-0.4.2-0.20180322192255.138c253.el7ost
Last Closed: 2018-06-27 13:37:56 UTC
Type: Bug

Description Antoni Segura Puimedon 2017-10-19 09:57:22 UTC
Description of problem:
Currently, on the OpenShift worker nodes, kubelet executes an instance of the Kuryr CNI plugin for each addNetwork and delNetwork operation it needs. This means that if 300 pods get scheduled on a node, that node will spawn 300 processes of the Kuryr CNI executable, which implies:

a) 300 separate new SSL connections to the OpenShift API
b) 300 different selector watches on the OpenShift API, each waiting for the VIF annotation to appear on its pod
c) Deletion also has to go through the OpenShift API, since each invocation is a new process with no in-memory information.

This feature consists of implementing a lightweight CNI executable that simply passes the request along to a CNI daemon managed as a DaemonSet by OpenShift. (NOTE: the Kubernetes upstream community is considering adding support for CNI daemons, with communication between kubelet and the CNI daemons happening over gRPC. This would mean we could eventually drop the small executable.)
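A minimal sketch of such a pass-through executable, assuming the daemon listens on a local HTTP endpoint (the port, the URL paths, and the payload shape are illustrative assumptions, not the actual kuryr-kubernetes wire format; kubelet's side of the contract, the CNI_* environment variables and the network config on stdin, is per the CNI spec):

```python
import json
import os
import sys
import urllib.request

DAEMON_URL = 'http://127.0.0.1:5036'  # illustrative; not the real kuryr endpoint


def build_request(env, netconf_bytes):
    """Collect one CNI invocation into a JSON-serializable dict.

    kubelet passes the operation and pod identity via CNI_* environment
    variables, and the network configuration as JSON on stdin.
    """
    params = {k: v for k, v in env.items() if k.startswith('CNI_')}
    return {
        'command': params.get('CNI_COMMAND'),   # ADD or DEL
        'cni_params': params,                   # CNI_CONTAINERID, CNI_NETNS, ...
        'netconf': json.loads(netconf_bytes or b'{}'),
    }


def main():
    payload = build_request(os.environ, sys.stdin.buffer.read())
    req = urllib.request.Request(
        '%s/%s' % (DAEMON_URL, payload['command'].lower()),
        data=json.dumps(payload).encode(),
        headers={'Content-Type': 'application/json'})
    # The daemon does the real work; the shim just relays its answer
    # back to kubelet on stdout, as the CNI contract requires.
    with urllib.request.urlopen(req) as resp:
        sys.stdout.write(resp.read().decode())
```

Because the shim holds no state and opens no API connections itself, the per-pod cost on the node collapses to one short-lived local request.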

With the CNI daemon in place, each worker node can keep a single watch on Pods, filtered by the host they are scheduled to. This should help a lot with resource usage and latency, not only on the addNetwork flow but even more on the delNetwork flow, since the daemon will already have all the information it needs without performing GETs against the Kubernetes API.
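The single per-node watch can be expressed with a standard Kubernetes field selector on `spec.nodeName` (supported server-side for pods); a sketch of just the request path, with auth and the streaming loop omitted:

```python
def pod_watch_path(node_name):
    """Build the API path for one watch covering only this node's pods.

    Filtering happens server-side via fieldSelector, so the daemon never
    receives events for pods bound to other hosts.
    """
    return '/api/v1/pods?watch=true&fieldSelector=spec.nodeName=%s' % node_name
```

One long-lived watch per node replaces the per-pod selector watches described above, which is where most of the API-load reduction comes from.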

This RFE also opens the door to managing the pools on the CNI daemon side: the Kuryr controller would remain responsible for creating and deleting VIFs in the pool, but assigning them to a Pod would be up to the CNI daemon. Effectively, this would lead to near-instantaneous addNetwork operations, since there would be no waiting for the controller to place the VIF annotation on the pod.
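The division of labor described above can be sketched as a toy in-memory pool (class and method names are illustrative, not the kuryr-kubernetes implementation):

```python
import collections


class VifPool:
    """Toy sketch of daemon-side VIF pooling."""

    def __init__(self):
        # Pre-created, unassigned VIFs, keyed by e.g. (project, security groups).
        self._available = collections.defaultdict(list)
        self._assigned = {}  # pod uid -> vif

    def fill(self, key, vifs):
        """Controller side: deposit pre-created VIFs into the pool."""
        self._available[key].extend(vifs)

    def acquire(self, key, pod_uid):
        """Daemon side, addNetwork: hand a ready VIF to a pod without
        waiting on the controller; the operation is a local pop."""
        if not self._available[key]:
            raise LookupError('pool exhausted for %r' % (key,))
        vif = self._available[key].pop()
        self._assigned[pod_uid] = vif
        return vif

    def release(self, pod_uid):
        """Daemon side, delNetwork: recover the VIF from in-memory state,
        with no GET against the Kubernetes API."""
        return self._assigned.pop(pod_uid)
```

As long as the controller keeps the pool topped up, addNetwork latency no longer includes an annotation round-trip through the API server.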

Steps to Reproduce:
1. Set Kuryr options for OpenShift-Ansible
2. ansible-playbook playbooks/byo/config.yml -vvv
3. oc -n kube-system get ds

Actual results:
Not implemented

Expected results:
The Kuryr DaemonSet is present and runs the daemon process, rather than just copying the CNI executable to the host.

Comment 10 Jon Uriarte 2018-05-04 14:47:59 UTC
Verified in version openstack-kuryr-kubernetes-cni-0.4.3-1.el7ost.noarch from puddle 20180502.1.

daemon_enabled is set to True in kuryr-config configmap. From 'oc -n openshift-infra get cm kuryr-config -o yaml':

    # From kuryr_kubernetes

    # Enable CNI Daemon configuration. (boolean value)
    daemon_enabled = true

Kuryr daemon processes are running:
$ oc -n openshift-infra get ds kuryr-cni-ds
NAME           DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
kuryr-cni-ds   4         4         4         4            4           <none>          44m

Running processes inside kuryr-cni-ds pods:
root         1     0  0 13:43 ?        00:00:00 /bin/bash -x /usr/bin/cni_ds_init
root        17     1  0 13:43 ?        00:00:29 kuryr-daemon: master process [/usr/bin/kuryr-daemon --config-file /etc/kuryr/kuryr.conf]
root        29    17  0 13:43 ?        00:00:00 kuryr-daemon: master process [/usr/bin/kuryr-daemon --config-file /etc/kuryr/kuryr.conf]
root        33    17  0 13:43 ?        00:00:00 kuryr-daemon: watcher worker(0)
root        34    17  0 13:43 ?        00:00:00 kuryr-daemon: server worker(0)

Comment 12 errata-xmlrpc 2018-06-27 13:37:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.