+++ This bug was initially created as a clone of Bug #2017881 +++
From upstream PR: https://github.com/k8snetworkplumbingwg/multus-cni/pull/742
"If the runtime passes a pod UID via K8S_POD_UID (which both CRIO and
containerd do as of mid-2021) then fail if the pod we get from the
Kube API has a different UID. This would indicate that the pod was
deleted and recreated while Multus was attempting to set up
networking for the old pod instance's sandbox, and it's pointless
to continue setting up a sandbox for a dead pod instance."
This change appears to have had some unintended fallout, primarily from throwing an error on the CNI DEL path.
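For readers unfamiliar with the check, the comparison described in the quoted PR text can be sketched roughly as below. This is not the Multus implementation (which is written in Go); it is a minimal shell sketch that assumes the runtime's UID arrives as a `K8S_POD_UID=...` entry in a semicolon-separated `KEY=VALUE` string, as CNI_ARGS is formatted. The helper names `runtime_pod_uid` and `check_pod_uid` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the value of K8S_POD_UID from a CNI_ARGS-style
# string of semicolon-separated KEY=VALUE pairs. Prints nothing if absent.
runtime_pod_uid() {
    printf '%s\n' "$1" | tr ';' '\n' | sed -n 's/^K8S_POD_UID=//p'
}

# Hypothetical helper: fail only when the runtime supplied a UID and it
# differs from the UID of the pod returned by the Kube API.
#   $1 = CNI_ARGS-style string
#   $2 = UID of the pod fetched from the Kube API
check_pod_uid() {
    want=$(runtime_pod_uid "$1")
    # A runtime that passes no UID gives us nothing to compare against.
    [ -z "$want" ] && return 0
    if [ "$want" != "$2" ]; then
        echo "pod UID mismatch: runtime passed $want, API returned $2" >&2
        return 1
    fi
    return 0
}
```

A mismatch here corresponds to the "pod was deleted and recreated" case from the PR description; a missing UID is treated as "nothing to check".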
To verify this BZ, Dan Williams suggested checking that the POD_UID is passed to Multus. This can be done as follows:
1. Enable logging on a node: https://github.com/k8snetworkplumbingwg/multus-cni/blob/master/docs/configuration.md#logging
`oc debug node/foo`
Set the log level to debug in the configuration.
2. Start up a pod, but use a node selector.
First add a label to a node
oc label nodes node-name-here multus=debug
Use the Multus quickstart: https://github.com/k8snetworkplumbingwg/multus-cni/blob/master/docs/quickstart.md
Create the net-attach-def and use the pod definition from the quickstart. (NOTE: make sure the master interface matches an interface on the host, or use a different net-attach-def, e.g. bridge.)
Then... add the node selector to the pod:
3. Start the pod, then `oc debug` the node and `chroot /host`.
Grep through the Multus log file on disk and look for "POD_UID". If that string is present and appears to be set, the bug is verified.
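A minimal sketch of the pod from step 2 with the node selector added. It assumes the sample pod and the `macvlan-conf` net-attach-def from the Multus quickstart, plus the `multus=debug` label applied above; the function name `pod_manifest` is hypothetical, and the annotation should be changed if a different net-attach-def is used.

```shell
#!/bin/sh
# Print a pod manifest modelled on the Multus quickstart sample pod, with a
# nodeSelector matching the multus=debug label set by `oc label nodes`.
# Pod name, image and the macvlan-conf attachment are assumptions taken
# from the quickstart, not from this bug report.
pod_manifest() {
    cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: samplepod
  annotations:
    k8s.v1.cni.cncf.io/networks: macvlan-conf
spec:
  nodeSelector:
    multus: debug
  containers:
  - name: samplepod
    command: ["/bin/ash", "-c", "trap : TERM INT; sleep infinity & wait"]
    image: alpine
EOF
}
```

The output can be piped straight to the cluster, for example `pod_manifest | oc create -f -`.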
Correction: vi /etc/kubernetes/cni/net.d/00-multus.conf
I'm not seeing the string "POD_UID" in the logs.
My steps are mostly the same as yours, except for the following, which may or may not be different:
1. I set "LogFile": "/var/logs/multus.log" instead of logging to STDERR.
2. I set the nodeSelector for the pod in the pod definition itself, rather than adding it to a DaemonSet or something similar.
These aren't huge deviations, but they are worth noting.
Also worth noting, the command:
`cat /var/logs/multus.log | grep POD_UID`
returns many results with `K8S_POD_UID=some_value_here`.
`cat /var/logs/multus.log | grep -v K8S_POD_UID | grep POD_UID`
`cat /var/log/multus.log | grep POD_UID | grep -v K8S_POD_UID`
- Trying the steps from scratch with bridge-cni instead gave the same results.
- The pod's node selector is working as intended.
- bridge-cni and macvlan-conf were both attached to the pod as intended in each case.
Let me know if I missed anything or did a step wrong, thank you.
Actually Nikhil, this is great news: all the steps you performed are right on.
The K8S_POD_UID is the proper full name of the value! We can consider this verified.
The CRIO change that introduces the pod UID is listed in the CRIO release notes at https://cri-o.github.io/cri-o/v1.21.2.html
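The check that settles verification can be sketched as a small helper. It assumes only what is reported above, namely that the log contains entries of the form `K8S_POD_UID=some_value_here`; the function name `pod_uid_logged` is hypothetical, and the log path is whatever was configured as the log file.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given log file contains at least one
# K8S_POD_UID entry followed by a non-empty value.
#   $1 = path to the Multus log file
pod_uid_logged() {
    # Require at least one character after '=' that is not ';' or whitespace,
    # so an empty "K8S_POD_UID=" does not count as set.
    grep -Eq 'K8S_POD_UID=[^;[:space:]]+' "$1"
}
```

For example, `pod_uid_logged /var/log/multus.log && echo "POD_UID is passed"`. Note that a plain `grep POD_UID` also matches these lines, because `POD_UID` is a substring of `K8S_POD_UID`.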
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.