Bug 2017882

Summary: multus: add handling of pod UIDs passed from runtime
Product: OpenShift Container Platform
Reporter: Douglas Smith <dosmith>
Component: Networking
Assignee: Douglas Smith <dosmith>
Networking sub component: multus
QA Contact: Weibin Liang <weliang>
Status: CLOSED ERRATA
Severity: high
Priority: high
CC: nsimha, weliang
Version: 4.9
Target Release: 4.10.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Doc Text:
Cause: The container runtime (CRI-O) passes the pod UID to CNI plugins via K8S_POD_UID.
Consequence: This additional metadata can reveal cases where a pod is deleted and recreated while Multus is still setting up networking for the old pod instance's sandbox.
Fix: Multus now handles the pod UID passed by the runtime.
Result: Pointless processing is avoided in cases where pods are deleted and recreated while Multus is setting up networking for the old pod instance's sandbox.
Clone Of: 2017881
Last Closed: 2022-03-10 16:22:54 UTC
Bug Blocks: 2017881

Description Douglas Smith 2021-10-27 15:56:40 UTC
+++ This bug was initially created as a clone of Bug #2017881 +++

From upstream PR: https://github.com/k8snetworkplumbingwg/multus-cni/pull/742

"If the runtime passes a pod UID via K8S_POD_UID (which both CRIO and
containerd do as of mid-2021) then fail if the pod we get from the
Kube API has a different UID. This would indicate that the pod was
deleted and recreated while Multus was attempting to set up
networking for the old pod instance's sandbox, and it's pointless
to continue setting up a sandbox for a dead pod instance."
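A minimal shell sketch of the check described in the PR. It assumes the CNI convention that the runtime passes `CNI_ARGS` as semicolon-separated `KEY=VALUE` pairs. The function names `cni_arg` and `check_pod_uid` are hypothetical, and the real Multus code is Go. Here the UID "from the Kube API" is passed in as an argument rather than fetched.

```shell
# Hypothetical helper: print the value of KEY from a CNI_ARGS-style string
# ("K1=v1;K2=v2"). Prints nothing if KEY is absent.
cni_arg() {
    key=$1
    args=$2
    # Split on ';' and print the value of the first matching KEY=VALUE pair.
    printf '%s\n' "$args" | tr ';' '\n' | sed -n "s/^$key=//p" | head -n 1
}

# Hypothetical check: compare the runtime-supplied pod UID (if any) with the
# UID of the pod object returned by the Kubernetes API.
# Returns 0 to continue sandbox setup, 1 to abort.
check_pod_uid() {
    cni_args=$1
    api_uid=$2
    runtime_uid=$(cni_arg K8S_POD_UID "$cni_args")
    # Runtimes that do not pass K8S_POD_UID leave nothing to compare: continue.
    if [ -z "$runtime_uid" ]; then
        return 0
    fi
    if [ "$runtime_uid" != "$api_uid" ]; then
        echo "pod UID mismatch: runtime=$runtime_uid api=$api_uid (pod was recreated)" >&2
        return 1
    fi
    return 0
}
```

For example, `check_pod_uid "K8S_POD_NAME=samplepod;K8S_POD_UID=1111" "2222"` returns 1, because the pod the API returned is a different instance from the one whose sandbox is being set up.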

Comment 2 Douglas Smith 2021-11-19 14:27:56 UTC
Looks like this had some unintended fallout which appears to be primarily related to throwing an error on the CNI DEL path.
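The comment does not describe how the fallout was eventually fixed. As a hypothetical illustration only, one way to keep a UID mismatch from breaking teardown is to abort only on `ADD` and to log and continue on `DEL`, because the old sandbox still needs to be cleaned up. The first argument mirrors the standard CNI `CNI_COMMAND` value. The function name is made up for this sketch.

```shell
# Hypothetical sketch: should a pod UID mismatch abort this CNI call?
# $1 = CNI command (ADD or DEL), $2 = UID passed by the runtime (may be empty),
# $3 = UID of the pod returned by the Kubernetes API.
# Returns 0 to proceed, 1 to abort.
uid_check_for_command() {
    cmd=$1
    runtime_uid=$2
    api_uid=$3
    # No runtime UID, or the UIDs agree: nothing to object to.
    if [ -z "$runtime_uid" ] || [ "$runtime_uid" = "$api_uid" ]; then
        return 0
    fi
    if [ "$cmd" = "DEL" ]; then
        # Teardown of the old sandbox should still succeed, so only log it.
        echo "DEL: pod UID mismatch ($runtime_uid != $api_uid), continuing" >&2
        return 0
    fi
    # ADD (or anything else): the pod was recreated, stop setting up the old sandbox.
    echo "$cmd: pod UID mismatch ($runtime_uid != $api_uid), aborting" >&2
    return 1
}
```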

Comment 4 Douglas Smith 2021-12-01 20:30:27 UTC
To verify this BZ, Dan Williams suggested that we check that the POD_UID is passed to Multus. This can be done as follows:

1. Enable logging on a node: https://github.com/k8snetworkplumbingwg/multus-cni/blob/master/docs/configuration.md#logging

```
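oc debug node/foo                             # start a debug pod on node "foo"
chroot /host                                  # switch to the host's root filesystem
vi /etc/kubernetes/cni/net.d/00-multus.conf   # edit the Multus CNI config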
```

Set the logging level to debug in the configuration.
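The Multus configuration documentation linked in step 1 describes top-level `logLevel` (for example `"debug"`) and `logFile` keys. The helper below is a hypothetical sanity check that the edited file contains both settings. It greps the raw text case-insensitively, because Comment 7 below writes the key as `LogFile`. It does not parse or validate the JSON.

```shell
# Hypothetical helper: succeed if a Multus config file sets debug logging
# and names a log file. Plain-text, case-insensitive check, not a JSON parser.
has_debug_logging() {
    conf=$1
    grep -qi '"logLevel"[[:space:]]*:[[:space:]]*"debug"' "$conf" &&
        grep -qi '"logFile"[[:space:]]*:' "$conf"
}

# Example, using the path from Comment 5:
# has_debug_logging /etc/kubernetes/cni/net.d/00-multus.conf && echo "debug logging enabled"
```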

2. Start up a pod, but use a node selector.

First, add a label to a node:

```
oc label nodes node-name-here multus=debug
```

Use the Multus quickstart: https://github.com/k8snetworkplumbingwg/multus-cni/blob/master/docs/quickstart.md

Create only the net-attach-def and use the pod definition. (NOTE: make sure the master interface matches an interface on the host, or use a different net-attach-def, such as bridge.)

Then add the node selector to the pod:

```
spec: 
  [...]
  nodeSelector:
    multus: "debug"
```
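Putting these pieces together, the sketch below combines a pod manifest modeled on the quickstart's sample pod with the node selector. The pod name, `alpine` image, command, and `macvlan-conf` network name are assumptions based on the quickstart. Adjust them to the net-attach-def you actually created. The function name is hypothetical.

```shell
# Hypothetical helper: print a test pod manifest pinned to the node labeled
# multus=debug and attached to the given NetworkAttachmentDefinition.
# $1 = net-attach-def name (for example macvlan-conf).
debug_pod_manifest() {
    net=$1
    cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: samplepod
  annotations:
    k8s.v1.cni.cncf.io/networks: $net
spec:
  nodeSelector:
    multus: "debug"
  containers:
  - name: samplepod
    image: alpine
    command: ["/bin/ash", "-c", "trap : TERM INT; sleep infinity & wait"]
EOF
}

# Usage: debug_pod_manifest macvlan-conf | oc apply -f -
```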


3. Start the pod, then `oc debug` the node and run `chroot /host`.

Grep the Multus log file on disk for "POD_UID". If that string is present and appears to be set, the bug is verified.
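The grep can be narrowed to print only the UID values, which shows directly whether they are set. The sketch assumes the log lines contain `K8S_POD_UID=<value>` inside the CNI args string, as Comment 7 later reports. It also assumes each value ends at `;`, `,`, whitespace, or a double quote. The function name is hypothetical.

```shell
# Hypothetical helper: print each non-empty K8S_POD_UID value found in a
# Multus log file, one per line, without duplicates.
pod_uids_in_log() {
    log=$1
    grep -o 'K8S_POD_UID=[^;,[:space:]"]*' "$log" |
        sed 's/^K8S_POD_UID=//' |
        grep -v '^$' |
        sort -u
}

# Example: pod_uids_in_log /var/log/multus.log
```

Empty output means either no `K8S_POD_UID` entries were logged or they were all empty, so the bug would not be verified.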

Comment 5 Douglas Smith 2021-12-01 20:37:31 UTC
Correction: vi /etc/kubernetes/cni/net.d/00-multus.conf

Comment 7 Nikhil Simha 2021-12-03 00:42:41 UTC
I'm not seeing the string "POD_UID" in the logs. 

My steps are mostly the same as yours, except for the following, which may or may not differ:

1. I set the "LogFile": "/var/logs/multus.log" instead of using STDERR logs. 

2. I set the nodeSelector for the pod in the pod definition itself, rather than add it to a Daemonset or something similar.

These aren't huge deviations, but worth noting.

Also worth noting, the command:

`cat /var/logs/multus.log | grep POD_UID`

returns many results with `K8S_POD_UID=some_value_here`.

But running:

`cat /var/logs/multus.log | grep -v K8S_POD_UID | grep POD_UID` 
or 
`cat /var/log/multus.log | grep POD_UID | grep -v K8S_POD_UID`

returns nothing.

Additional details: 
- trying the steps from scratch with bridge-cni instead had the same results.
- the pod's node selector is working as intended
- bridge-cni and macvlan-conf were both attached to the pod as intended in each case.

Let me know if I missed anything or did a step wrong, thank you.

Comment 8 Douglas Smith 2021-12-03 13:41:39 UTC
Actually, Nikhil, this is great news: all the steps you performed were right on.

The K8S_POD_UID is the proper full name of the value! We can consider this verified.

The CRIO change that introduces pod UID is found in the CRIO release notes @ https://cri-o.github.io/cri-o/v1.21.2.html
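Based on the release notes linked above, a node's CRI-O would need to be at least 1.21.2 to pass the pod UID. Below is a minimal version comparison. It assumes GNU `sort -V` (version sort) is available. The function name is hypothetical, and the 1.21.2 threshold comes from this comment, not from an independent check.

```shell
# Hypothetical helper: succeed if version $1 is >= version $2.
# Uses GNU sort's -V (version sort) to order the two strings.
version_at_least() {
    have=$1
    want=$2
    lowest=$(printf '%s\n' "$have" "$want" | sort -V | head -n 1)
    [ "$lowest" = "$want" ]
}

# Example, with a version string read from `crio --version` output:
# version_at_least "1.22.1" "1.21.2" && echo "runtime should pass K8S_POD_UID"
```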

Thank you!

Comment 11 errata-xmlrpc 2022-03-10 16:22:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056