Bug 2017882 - multus: add handling of pod UIDs passed from runtime
Summary: multus: add handling of pod UIDs passed from runtime
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.10.0
Assignee: Douglas Smith
QA Contact: Weibin Liang
Depends On:
Blocks: 2017881
Reported: 2021-10-27 15:56 UTC by Douglas Smith
Modified: 2022-03-10 16:23 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The pod UID is passed via K8S_POD_UID in CRIO.
Consequence: This adds metadata that can be used in cases where a pod is deleted and recreated while Multus is attempting to set up networking for the old pod instance's sandbox.
Fix: Multus now handles the pod UID.
Result: Pointless processing is avoided in cases where a pod is deleted and recreated while Multus is attempting to set up networking for the old pod instance's sandbox.
Clone Of: 2017881
Last Closed: 2022-03-10 16:22:54 UTC
Target Upstream Version:


System | ID | Private | Priority | Status | Summary | Last Updated
GitHub openshift multus-cni pull | 112 | 0 | None | open | Bug 2017882: Upstream sync (includes handling for pod UIDs passed from runtime) | 2021-11-12 20:46:30 UTC
Red Hat Product Errata | RHSA-2022:0056 | 0 | None | None | None | 2022-03-10 16:23:10 UTC

Description Douglas Smith 2021-10-27 15:56:40 UTC
+++ This bug was initially created as a clone of Bug #2017881 +++

From upstream PR: https://github.com/k8snetworkplumbingwg/multus-cni/pull/742

"If the runtime passes a pod UID via K8S_POD_UID (which both CRIO and
containerd do as of mid-2021) then fail if the pod we get from the
Kube API has a different UID. This would indicate that the pod was
deleted and recreated while Multus was attempting to set up
networking for the old pod instance's sandbox, and it's pointless
to continue setting up a sandbox for a dead pod instance."
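The check described in the upstream PR can be sketched in Go as follows. This is a minimal illustration, not the code from the PR: `parseCNIArgs`, `checkPodUID`, and the error text are hypothetical, and treating an empty runtime UID as "the runtime did not pass one, so skip the check" is an assumption based on the PR's "if the runtime passes a pod UID" wording. The runtime UID is read from the semicolon-separated `KEY=value` pairs in CNI_ARGS; the API UID would come from the pod fetched from the Kube API (simulated here).

```go
package main

import (
	"fmt"
	"strings"
)

// parseCNIArgs splits a CNI_ARGS string such as
// "IgnoreUnknown=1;K8S_POD_NAME=foo;K8S_POD_UID=abc" into a map.
func parseCNIArgs(cniArgs string) map[string]string {
	args := make(map[string]string)
	for _, pair := range strings.Split(cniArgs, ";") {
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) == 2 {
			args[kv[0]] = kv[1]
		}
	}
	return args
}

// checkPodUID (hypothetical) returns an error when the runtime passed a
// K8S_POD_UID that differs from the UID of the pod fetched from the Kube API.
// An empty runtime UID (older runtime) skips the check.
func checkPodUID(runtimeUID, apiUID string) error {
	if runtimeUID == "" {
		return nil
	}
	if runtimeUID != apiUID {
		return fmt.Errorf("expected pod UID %q but got %q from Kube API", runtimeUID, apiUID)
	}
	return nil
}

func main() {
	args := parseCNIArgs("IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=samplepod;K8S_POD_UID=1111")
	// Simulate the pod having been deleted and recreated with a new UID.
	if err := checkPodUID(args["K8S_POD_UID"], "2222"); err != nil {
		fmt.Println("aborting sandbox setup:", err)
	}
}
```

Failing early on a mismatch avoids building a sandbox for a pod instance that no longer exists, while skipping the check on an empty UID keeps the sketch compatible with runtimes that do not pass K8S_POD_UID.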

Comment 2 Douglas Smith 2021-11-19 14:27:56 UTC
It looks like this had some unintended fallout, which appears to be primarily related to throwing an error on the CNI DEL path.

Comment 4 Douglas Smith 2021-12-01 20:30:27 UTC
To verify this BZ, Dan Williams suggested checking that the POD_UID is passed to Multus. This can be done as follows:

1. Enable logging on a node: https://github.com/k8snetworkplumbingwg/multus-cni/blob/master/docs/configuration.md#logging

oc debug node/foo
chroot /host
vi /etc/kubernetes/cni/net.d

Enable logging in the configuration and set the log level to debug.

2. Start up a pod, but use a node selector.

First, add a label to a node:

oc label nodes node-name-here multus=debug

Use the Multus quickstart: https://github.com/k8snetworkplumbingwg/multus-cni/blob/master/docs/quickstart.md

Just create the net-attach-def and use the pod definition. (NOTE: make sure the master interface matches the interface on the host, or use a different net-attach-def, such as bridge.)

Then... add the node selector to the pod: 

    nodeSelector:
      multus: "debug"

3. Start the pod... then `oc debug` the node and `chroot /host`

Grep through the Multus log file on disk and look for "POD_UID". If that string is present and appears to be set, the bug is verified.

Comment 5 Douglas Smith 2021-12-01 20:37:31 UTC
Correction: vi /etc/kubernetes/cni/net.d/00-multus.conf

Comment 7 Nikhil Simha 2021-12-03 00:42:41 UTC
I'm not seeing the string "POD_UID" in the logs. 

My steps are mostly the same as yours, except for the following, which may or may not matter:

1. I set the "LogFile": "/var/logs/multus.log" instead of using STDERR logs. 

2. I set the nodeSelector for the pod in the pod definition itself, rather than adding it to a DaemonSet or something similar.

These aren't huge deviations, but worth noting.

Also worth noting, the command:

`cat /var/logs/multus.log | grep POD_UID`

returns many results with `K8S_POD_UID=some_value_here`.

But running:

`cat /var/logs/multus.log | grep -v K8S_POD_UID | grep POD_UID` 
`cat /var/log/multus.log | grep POD_UID | grep -v K8S_POD_UID`

returns nothing.
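Why the filtered commands return nothing can be shown with a short sketch: `POD_UID` is a substring of `K8S_POD_UID`, so `grep -v K8S_POD_UID` discards every line that carries the UID before `grep POD_UID` runs, and the order of the two filters does not change the result. `bareMatches` is a hypothetical helper that mimics the pipeline in Go; the sample log lines are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// bareMatches mimics `grep -v K8S_POD_UID | grep POD_UID`: it keeps lines that
// contain POD_UID but do not contain K8S_POD_UID anywhere.
func bareMatches(lines []string) []string {
	var out []string
	for _, l := range lines {
		if !strings.Contains(l, "K8S_POD_UID") && strings.Contains(l, "POD_UID") {
			out = append(out, l)
		}
	}
	return out
}

func main() {
	lines := []string{
		"Args:IgnoreUnknown=1;K8S_POD_NAME=samplepod;K8S_POD_UID=0c7d2a34-1f2b-4c3d-9e8f-123456789abc",
		"unrelated debug line",
	}
	// Every line carrying the UID is dropped by the first filter.
	fmt.Println(len(bareMatches(lines))) // prints 0
}
```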

Additional details: 
- trying the steps from scratch with bridge-cni instead had the same results.
- the pod's node selector is working as intended
- bridge-cni and macvlan-conf were both attached to the pod as intended in each case.

Let me know if I missed anything or did a step wrong, thank you.

Comment 8 Douglas Smith 2021-12-03 13:41:39 UTC
Actually, Nikhil, this is great news: all the steps you performed were right on.

The K8S_POD_UID is the proper full name of the value! We can consider this verified.

The CRIO change that introduces pod UID is found in the CRIO release notes @ https://cri-o.github.io/cri-o/v1.21.2.html

Thank you!

Comment 11 errata-xmlrpc 2022-03-10 16:22:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

