2017882 – multus: add handling of pod UIDs passed from runtime

Summary: multus: add handling of pod UIDs passed from runtime

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.9
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.10.0
Assignee:	Douglas Smith
QA Contact:	Weibin Liang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	2017881

Reported:	2021-10-27 15:56 UTC by Douglas Smith
Modified:	2022-03-10 16:23 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause: Container runtimes (CRI-O and containerd) pass the pod UID to CNI plugins via K8S_POD_UID, but Multus did not use it. Consequence: If a pod was deleted and recreated while Multus was setting up networking for the old pod instance's sandbox, Multus kept setting up a sandbox for a dead pod instance. Fix: Multus now compares the runtime-supplied pod UID with the UID of the pod returned by the Kubernetes API and fails if they differ. Result: Pointless processing is avoided in cases where pods are deleted and recreated while Multus was attempting to set up networking for the old pod instance's sandbox.
Clone Of:	2017881
Environment:
Last Closed:	2022-03-10 16:22:54 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift multus-cni pull 112	0	None	open	Bug 2017882: Upstream sync (includes handling for pod UIDs passed from runtime)	2021-11-12 20:46:30 UTC
Red Hat Product Errata	RHSA-2022:0056	0	None	None	None	2022-03-10 16:23:10 UTC

Description Douglas Smith 2021-10-27 15:56:40 UTC

+++ This bug was initially created as a clone of Bug #2017881 +++

From upstream PR: https://github.com/k8snetworkplumbingwg/multus-cni/pull/742

"If the runtime passes a pod UID via K8S_POD_UID (which both CRIO and
containerd do as of mid-2021) then fail if the pod we get from the
Kube API has a different UID. This would indicate that the pod was
deleted and recreated while Multus was attempting to set up
networking for the old pod instance's sandbox, and it's pointless
to continue setting up a sandbox for a dead pod instance."
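The check quoted above can be illustrated with a short shell sketch. This is not Multus's implementation (Multus is written in Go). It assumes the runtime passes the UID inside the standard CNI `CNI_ARGS` variable as semicolon-separated `KEY=VALUE` pairs, and the function names `cni_arg` and `check_pod_uid` are hypothetical.

```shell
# Hypothetical sketch of the UID check described above; not Multus's Go code.

# cni_arg KEY: print the value of KEY from CNI_ARGS ("K1=V1;K2=V2;...").
# Prints nothing if KEY is absent or CNI_ARGS is unset.
cni_arg() {
    printf '%s\n' "${CNI_ARGS:-}" | tr ';' '\n' | sed -n "s/^$1=//p" | head -n 1
}

# check_pod_uid API_UID: compare the runtime-supplied K8S_POD_UID with the
# UID of the pod returned by the Kubernetes API.
# Returns 0 if they match or if the runtime passed no UID at all, and 1 if
# they differ (the pod was deleted and recreated in the meantime).
check_pod_uid() {
    runtime_uid=$(cni_arg K8S_POD_UID)
    if [ -n "$runtime_uid" ] && [ "$runtime_uid" != "$1" ]; then
        echo "pod UID mismatch: runtime=$runtime_uid api=$1" >&2
        return 1
    fi
    return 0
}
```

Skipping the check when no UID is passed keeps older runtimes working, matching the "if the runtime passes a pod UID" condition above. Comment 2 below reports that raising this error on the CNI DEL path had unintended fallout.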

Comment 2 Douglas Smith 2021-11-19 14:27:56 UTC

Looks like this had some unintended fallout which appears to be primarily related to throwing an error on the CNI DEL path.

Comment 4 Douglas Smith 2021-12-01 20:30:27 UTC

To verify this BZ, Dan Williams suggested checking that the POD_UID is passed to Multus. This can be accomplished as follows:

1. Enable logging on a node: https://github.com/k8snetworkplumbingwg/multus-cni/blob/master/docs/configuration.md#logging

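```
oc debug node/foo                            # open a debug pod on the node (replace "foo" with the node name)
chroot /host                                 # switch to the host's root filesystem
vi /etc/kubernetes/cni/net.d/00-multus.conf  # edit the Multus CNI configuration
```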

In the configuration, set the log level to debug.

2. Start up a pod, but use a node selector.

First, add a label to a node:

```
oc label nodes node-name-here multus=debug
```

Use the Multus quickstart: https://github.com/k8snetworkplumbingwg/multus-cni/blob/master/docs/quickstart.md

Create only the net-attach-def, and use the pod definition from the quickstart. (NOTE: make sure the master interface matches an interface on the host, or use a different net-attach-def, such as one using bridge.)

Then add the node selector to the pod:

```
spec: 
  [...]
  nodeSelector:
    multus: "debug"
```


3. Start the pod, then `oc debug` the node and `chroot /host`.

Grep the Multus log file on disk for "POD_UID". If that string is present and has a value set, the bug is verified.
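The grep in step 3 can be wrapped in a small helper. This is a sketch under assumptions: the log path is whatever you configured in step 1, the value is logged in the `K8S_POD_UID=<value>` form, and the function name `verify_pod_uid_logged` is hypothetical.

```shell
# Hypothetical helper for step 3: print each distinct K8S_POD_UID entry found
# in the given Multus log file, or fail if none has a non-empty value.
verify_pod_uid_logged() {
    # -o prints only the matched text, -E enables extended regex syntax;
    # [^;[:space:]]+ requires at least one character after the '='.
    matches=$(grep -oE 'K8S_POD_UID=[^;[:space:]]+' "$1" | sort -u)
    if [ -z "$matches" ]; then
        echo "no K8S_POD_UID value found in $1" >&2
        return 1
    fi
    printf '%s\n' "$matches"
}
```

After `chroot /host`, run it as `verify_pod_uid_logged /var/log/multus.log`, substituting your configured log path. A non-zero exit status means no UID value was logged.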

Comment 5 Douglas Smith 2021-12-01 20:37:31 UTC

Correction: vi /etc/kubernetes/cni/net.d/00-multus.conf

Comment 7 Nikhil Simha 2021-12-03 00:42:41 UTC

I'm not seeing the string "POD_UID" in the logs. 

My steps are mostly the same as yours, except the following that may or may not be different:

1. I set the "LogFile": "/var/logs/multus.log" instead of using STDERR logs. 

2. I set the nodeSelector for the pod in the pod definition itself, rather than adding it to a DaemonSet or something similar.

These aren't huge deviations, but worth noting.

Also worth noting, the command:

`cat /var/logs/multus.log | grep POD_UID`

returns many results with `K8S_POD_UID=some_value_here`.

But running:

`cat /var/logs/multus.log | grep -v K8S_POD_UID | grep POD_UID` 
or 
`cat /var/logs/multus.log | grep POD_UID | grep -v K8S_POD_UID`

returns nothing.

Additional details: 
- trying the steps from scratch with bridge-cni instead had the same results.
- the pod's node selector is working as intended
- bridge-cni and macvlan-conf were both attached to the pod as intended in each case.

Let me know if I missed anything or did a step wrong, thank you.

Comment 8 Douglas Smith 2021-12-03 13:41:39 UTC

Actually, Nikhil, this is great news: all the steps you performed were right on.

The K8S_POD_UID is the proper full name of the value! We can consider this verified.

The CRI-O change that introduces the pod UID is listed in the CRI-O release notes at https://cri-o.github.io/cri-o/v1.21.2.html

Thank you!
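For a stronger check than string presence, the logged value can be compared with the UID that the API server reports for the test pod. The sketch below assumes the quickstart pod name `samplepod` and that the UID contains only hex digits and hyphens. The helper name `uid_in_log` is hypothetical.

```shell
# Hypothetical helper: succeed if the Multus log records the given pod UID.
#   $1 = path to the Multus log file
#   $2 = pod UID as reported by the Kubernetes API
# Assumes the UID contains only hex digits and hyphens (the usual Kubernetes
# UID format), so it can be embedded in the regex without escaping.
uid_in_log() {
    # The value must be followed by ';', whitespace, or end of line, so a UID
    # that is merely a prefix of a logged value does not count as a match.
    grep -qE "K8S_POD_UID=$2([;[:space:]]|\$)" "$1"
}
```

For example, get the UID from a workstation with `oc get pod samplepod -o jsonpath='{.metadata.uid}'`. Then run `uid_in_log /var/logs/multus.log <uid>` on the node, using the path set in your log file option.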

Comment 11 errata-xmlrpc 2022-03-10 16:22:54 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056
