Description of problem: On the CNI DEL path, Multus CNI should exit cleanly; otherwise, pods can wind up in a crash loop.
How reproducible: Difficult; requires the API server to be unreachable.
I still cannot verify this bug using the steps in https://gist.github.com/dougbtv/da3ab605c2fd9845cdc018f07b02ce51, and I am still waiting for an update from dev.
@dosmith @tohayash Do we have any new way to verify this bug? Thanks!
I was able to test Bug 2071799 on baremetal UPI. Here are the steps.
Step1) Deploy OCP on baremetal UPI (assuming haproxy is used for load balancing)
Step2) Create a pod
- Get the IP of the node where the pod is deployed with 'oc get node' (assume 10.2.1.21 in this case)
Step3) On the haproxy node (or pod, depending on your deployment), add an iptables rule with 'iptables -I INPUT 1 -s 10.2.1.21/32 -m conntrack --ctstate NEW -j DROP'
Step4) Delete the pod (you can see the message with the 'oc describe pod' command)
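Step3 above can be sketched as a pair of small shell helpers, which makes it easier to add the rule and take it out again after the test. This is only a sketch under assumptions: the function names are hypothetical, the node IP is the example value from Step2, and the cleanup with 'iptables -D' is not part of the original steps. The iptables calls need root on the haproxy host.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of Multus or OCP.

# Print the rule spec from Step3 for a given node IP.
drop_rule_spec() {
  printf '%s\n' "-s $1/32 -m conntrack --ctstate NEW -j DROP"
}

# Drop NEW connections from the node so it can no longer reach the
# API server through haproxy (run as root on the haproxy host).
block_node() {
  # Unquoted on purpose: the spec must split into separate arguments.
  iptables -I INPUT 1 $(drop_rule_spec "$1")
}

# Assumption: cleanup step, not in the original instructions.
# Deletes the rule that block_node inserted.
unblock_node() {
  iptables -D INPUT $(drop_rule_spec "$1")
}

# Example usage (commented out so that sourcing this file is harmless):
#   block_node 10.2.1.21      # Step3
#   ... delete the pod ...    # Step4
#   unblock_node 10.2.1.21    # restore connectivity afterwards
```

Because the rule only matches connections in state NEW, connections that are already established from the node are not affected by it.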
Tested and verified in 4.11.0-0.nightly-2022-06-21-040754
sh-4.4# journalctl -xe -u crio | grep 'but continue to delete'
Jun 21 13:59:41 weliang-621-jhsdj-compute-1 crio: 2022-06-21T13:59:41Z [error] Multus: failed to get delegates: TryLoadPodDelegates: error in getting k8s network for pod: GetNetworkDelegates: failed getting the delegate: getKubernetesDelegate: cannot find a network-attachment-definition (bridge-conf) in namespace (test): network-attachment-definitions.k8s.cni.cncf.io "bridge-conf" not found, but continue to delete clusterNetwork
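The log check above can also be wrapped in a small helper for use in a test script. This is a sketch, assuming the marker phrase 'but continue to delete' from the log line above is what indicates the fixed behavior; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helper, not part of Multus or OCP.

# Succeeds (exit 0) if the log text on stdin contains the message that
# Multus logs when it hits an error on DEL but continues the delete.
has_continue_to_delete() {
  grep -q 'but continue to delete'
}

# Example usage on the node (commented out so sourcing is harmless):
#   journalctl -u crio --no-pager | has_continue_to_delete && echo verified
```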
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.