Bug 1504013 - [RFE] CNI Daemon

Summary: [RFE] CNI Daemon

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-kuryr-kubernetes
Sub Component:
Version:	13.0 (Queens)
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	Upstream M2
Target Release:	13.0 (Queens)
Assignee:	Michał Dulko
QA Contact:	Jon Uriarte
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-10-19 09:57 UTC by Antoni Segura Puimedon
Modified:	2018-06-27 13:38 UTC
CC List:	5 users
Fixed In Version:	openstack-kuryr-kubernetes-0.4.2-0.20180322192255.138c253.el7ost
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-06-27 13:37:56 UTC
Target Upstream Version:
Embargoed:


Links
System	ID	Private	Priority	Status	Summary	Last Updated
OpenStack gerrit	480028	0	None	master: ABANDONED	kuryr-kubernetes: CNI split - introducing CNI daemon (I129958aaf420f7b7294861537786cde6126e679c)	2018-04-04 12:40:40 UTC
Red Hat Product Errata	RHEA-2018:2086	0	None	None	None	2018-06-27 13:38:42 UTC

Description Antoni Segura Puimedon 2017-10-19 09:57:22 UTC

Description of problem:
Currently, on OpenShift worker nodes, kubelet executes a separate instance of the Kuryr CNI executable for every addNetwork and delNetwork operation it needs. This means that if 30 pods get scheduled on a node, that node will start 30 processes of the Kuryr CNI executable, which implies:

a) 30 separate new SSL connections to the OpenShift API
b) 30 different selector watches to the OpenShift API to wait for the VIF to appear in the pod
c) Deletion needs to go to the OpenShift API as well, since each invocation is a new process with no in-memory information.

This feature consists of implementing a lightweight CNI executable that just passes the request along to a CNI daemon that OpenShift manages as a DaemonSet. (NOTE: The upstream Kubernetes community is considering adding support for CNI daemons, with communication between kubelet and the CNI daemons happening over gRPC. This would mean we could eventually drop the small executable.)
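A minimal sketch of the proposed split, assuming the shim talks to the daemon over plain HTTP on localhost. The endpoint paths (`/addNetwork`, `/delNetwork`), the JSON payload layout and the `DaemonHandler` stand-in are hypothetical; only the `CNI_*` environment variables and the JSON network config on stdin come from the CNI specification. The point is that the per-pod process does no OpenShift API work at all:

```python
import json
import os
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical endpoint names used only in this sketch.
ADD_PATH = "/addNetwork"
DEL_PATH = "/delNetwork"

# Per the CNI spec, the runtime passes parameters as environment variables
# and the network configuration as JSON on stdin.
CNI_ENV_KEYS = ("CNI_COMMAND", "CNI_CONTAINERID", "CNI_NETNS",
                "CNI_IFNAME", "CNI_ARGS", "CNI_PATH")


def build_request(env, stdin_config):
    """Bundle everything the runtime handed the CNI binary into one payload."""
    params = {key: env.get(key, "") for key in CNI_ENV_KEYS}
    params["config"] = json.loads(stdin_config)
    return params


def forward(daemon_url, env, stdin_config):
    """The entire shim: one local HTTP POST, no OpenShift API connection."""
    params = build_request(env, stdin_config)
    path = ADD_PATH if params["CNI_COMMAND"] == "ADD" else DEL_PATH
    request = urllib.request.Request(
        daemon_url + path,
        data=json.dumps(params).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # The daemon is local, so bypass any HTTP proxy settings.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=10) as response:
        return json.loads(response.read())


class DaemonHandler(BaseHTTPRequestHandler):
    """Stand-in for the long-lived daemon; a real one would plug the VIF here."""

    def do_POST(self):
        length = int(self.headers["Content-Length"])
        params = json.loads(self.rfile.read(length))
        body = json.dumps({"handled": self.path,
                           "container": params["CNI_CONTAINERID"]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging in the demo
        pass


def start_daemon():
    """Run the stand-in daemon on a free localhost port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), DaemonHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, "http://127.0.0.1:%d" % server.server_address[1]


if __name__ == "__main__":
    server, url = start_daemon()
    env = dict(os.environ, CNI_COMMAND="ADD", CNI_CONTAINERID="abc123",
               CNI_IFNAME="eth0")
    print(forward(url, env, '{"cniVersion": "0.3.1", "name": "kuryr"}'))
    server.shutdown()
    server.server_close()
```

Because the executable only forwards a request, the SSL connections and watches listed in a) to c) move into one long-lived process per node. Replacing the HTTP transport with gRPC, as the NOTE anticipates, would only change `forward`.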

Having the CNI daemon will allow each worker node to have a single watch on Pods, filtered by the host they are scheduled on. This should substantially reduce resource usage and latency, not only in the addNetwork flow but even more in the delNetwork flow, since the daemon will already have all the information it needs without performing GETs against the Kubernetes API.
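A sketch of the single per-node watch, assuming the daemon keeps pods in memory as watch events arrive. The `fieldSelector=spec.nodeName=<node>` query and the ADDED/MODIFIED/DELETED event types are standard Kubernetes API features; the `PodCache` class and the annotation key `openstack.org/kuryr-vif` are assumptions for illustration. An addNetwork request then waits on local state instead of opening its own watch:

```python
import threading
from urllib.parse import urlencode

# Assumed annotation key under which the controller publishes the VIF.
VIF_ANNOTATION = "openstack.org/kuryr-vif"


def pod_watch_path(node_name):
    """Kubernetes API path for one watch covering only this node's pods."""
    query = urlencode({"watch": "true",
                       "fieldSelector": "spec.nodeName=" + node_name})
    return "/api/v1/pods?" + query


class PodCache:
    """In-memory view of this node's pods, fed by the single watch stream."""

    def __init__(self):
        self._pods = {}
        self._cond = threading.Condition()

    def handle_event(self, event):
        # Watch events carry a "type" (ADDED/MODIFIED/DELETED) and the "object".
        meta = event["object"]["metadata"]
        key = (meta["namespace"], meta["name"])
        with self._cond:
            if event["type"] == "DELETED":
                self._pods.pop(key, None)
            else:
                self._pods[key] = event["object"]
            self._cond.notify_all()  # wake any addNetwork waiting on this pod

    def _vif(self, key):
        pod = self._pods.get(key)
        if pod is None:
            return None
        return pod["metadata"].get("annotations", {}).get(VIF_ANNOTATION)

    def wait_for_vif(self, namespace, name, timeout):
        """Block until the pod's VIF annotation shows up, or return None."""
        key = (namespace, name)
        with self._cond:
            self._cond.wait_for(lambda: self._vif(key) is not None, timeout)
            return self._vif(key)
```

Streaming the watch response and decoding each event into `handle_event` is left out. A delNetwork request could call `wait_for_vif(..., timeout=0)` to read the cached VIF without issuing a GET.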

This RFE also opens the door to having the pools managed on the CNI daemon side: the Kuryr controller would be responsible for creating and deleting VIFs in the pool, but assigning them to a Pod would be up to the CNI daemon. Effectively, this would make addNetwork operations near-instantaneous, since there would be no waiting for the controller to place the VIF annotation on the Pod.
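A minimal sketch of a daemon-side pool, assuming the controller pushes pre-created VIFs to each node and the daemon hands them out on addNetwork. `DaemonVifPool` and its methods are hypothetical names, and returning a released VIF to the free list (rather than asking the controller to delete it) is an assumption of this sketch, not something the RFE specifies:

```python
import collections
import threading


class DaemonVifPool:
    """Hypothetical per-node pool: the controller fills it, the daemon assigns."""

    def __init__(self):
        self._free = collections.deque()
        self._assigned = {}
        self._lock = threading.Lock()

    def populate(self, vifs):
        # Controller side: pre-created VIFs are added to this node's pool.
        with self._lock:
            self._free.extend(vifs)

    def acquire(self, pod_key):
        # addNetwork: take a ready VIF locally, no wait for a pod annotation.
        with self._lock:
            if pod_key in self._assigned:
                return self._assigned[pod_key]  # retried ADD gets the same VIF
            if not self._free:
                return None  # empty pool: caller must fall back to waiting
            vif = self._free.popleft()
            self._assigned[pod_key] = vif
            return vif

    def release(self, pod_key):
        # delNetwork: assumption of this sketch, the VIF goes back for reuse.
        with self._lock:
            vif = self._assigned.pop(pod_key, None)
            if vif is not None:
                self._free.append(vif)
            return vif

    def free_count(self):
        with self._lock:
            return len(self._free)
```

An addNetwork that finds the pool empty still has to wait for the controller, so the near-instantaneous path depends on the controller keeping each node's pool topped up.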

Steps to Reproduce:
1. Set Kuryr options for OpenShift-Ansible
2. ansible-playbook playbooks/byo/config.yml -vvv
3. oc -n kube-system get ds

Actual results:
Not implemented

Expected results:
The Kuryr DaemonSet is present and runs a daemon process, rather than just copying the executable to the host.

Comment 10 Jon Uriarte 2018-05-04 14:47:59 UTC

Verified in version openstack-kuryr-kubernetes-cni-0.4.3-1.el7ost.noarch from puddle 20180502.1.

daemon_enabled is set to True in the kuryr-config ConfigMap. From 'oc -n openshift-infra get cm kuryr-config -o yaml':
    ...
    [cni_daemon]

    #
    # From kuryr_kubernetes
    #

    # Enable CNI Daemon configuration. (boolean value)
    daemon_enabled = true
    ...
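To check the same setting from a script, a sketch using Python's standard `configparser`, assuming the `kuryr.conf` text has first been extracted from the ConfigMap without its YAML indentation. The helper name is hypothetical, and treating a missing option as disabled is this sketch's choice, not necessarily Kuryr's default:

```python
import configparser


def cni_daemon_enabled(conf_text):
    """Return True if [cni_daemon] daemon_enabled parses as a true boolean."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # Missing section or option is treated as disabled (sketch assumption).
    return parser.getboolean("cni_daemon", "daemon_enabled", fallback=False)
```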

Kuryr daemon processes are running:
$ oc -n openshift-infra get ds kuryr-cni-ds
NAME           DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
kuryr-cni-ds   4         4         4         4            4           <none>          44m

Running processes inside kuryr-cni-ds pods:
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 13:43 ?        00:00:00 /bin/bash -x /usr/bin/cni_ds_init
root        17     1  0 13:43 ?        00:00:29 kuryr-daemon: master process [/usr/bin/kuryr-daemon --config-file /etc/kuryr/kuryr.conf]
root        29    17  0 13:43 ?        00:00:00 kuryr-daemon: master process [/usr/bin/kuryr-daemon --config-file /etc/kuryr/kuryr.conf]
root        33    17  0 13:43 ?        00:00:00 kuryr-daemon: watcher worker(0)
root        34    17  0 13:43 ?        00:00:00 kuryr-daemon: server worker(0)
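The process listing above can also be checked mechanically. A sketch, assuming the listing is available as text in the format shown; the expected titles (`master process`, `watcher worker`, `server worker`) are taken from the listing itself, while the helper names are hypothetical and not part of the QA procedure:

```python
MARKER = "kuryr-daemon: "


def kuryr_daemon_roles(ps_output):
    """Extract the process titles that kuryr-daemon processes report."""
    roles = []
    for line in ps_output.splitlines():
        idx = line.find(MARKER)
        if idx == -1:
            continue  # e.g. the /usr/bin/cni_ds_init shell or the header
        title = line[idx + len(MARKER):]
        # Drop the "[/usr/bin/kuryr-daemon ...]" suffix on master processes.
        roles.append(title.split(" [", 1)[0])
    return roles


def daemon_running(ps_output):
    """True if a master process plus watcher and server workers are present."""
    roles = kuryr_daemon_roles(ps_output)
    return ("master process" in roles
            and any(r.startswith("watcher worker") for r in roles)
            and any(r.startswith("server worker") for r in roles))
```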

Comment 12 errata-xmlrpc 2018-06-27 13:37:56 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086
