Description of problem: On the CNI DEL path, Multus CNI should exit cleanly; otherwise, pods can wind up in a crash loop.
How reproducible: Difficult; requires the API server to be unreachable.
I still cannot verify this bug using the steps in https://gist.github.com/dougbtv/da3ab605c2fd9845cdc018f07b02ce51; still waiting for an update from dev.
@dosmith @tohayash Do we have any new way to verify this bug? Thanks!
Hi Weibin, I was able to test Bug 2071799 on bare-metal UPI. Here are the steps.
Step1) Deploy OCP on bare-metal UPI (assuming haproxy is used for load balancing).
Step2) Create a pod, then get the IP of the node the pod is deployed on with 'oc get node' (assume 10.2.1.21 in this case).
Step3) On the haproxy node (or pod, depending on your deployment), add this iptables rule: 'iptables -I INPUT 1 -s 10.2.1.21/32 -m conntrack --ctstate NEW -j DROP'
Step4) Delete the pod (you can see the message with the 'oc describe pod' command).
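The rule from Step3 can be wrapped in a small helper so that it is easy to add before the test and remove afterwards. This is only a sketch: the function names drop_rule and apply_rule and the DRY_RUN switch are assumptions made for illustration and are not part of the original steps. The iptables arguments are the ones given in Step3, and the delete form uses the standard -D option with the same rule specification.

```shell
#!/bin/sh
# Sketch only: drop_rule, apply_rule and DRY_RUN are hypothetical names,
# not taken from the bug report. Run on the haproxy node as root.

# Print the iptables command for the given action.
#   $1: -I to insert the rule, anything else to delete it
#   $2: IP of the node hosting the test pod (10.2.1.21 in the steps above)
drop_rule() {
    action=$1
    node_ip=$2
    if [ "$action" = "-I" ]; then
        # Insert at position 1 so the rule is evaluated before any others.
        echo "iptables -I INPUT 1 -s ${node_ip}/32 -m conntrack --ctstate NEW -j DROP"
    else
        # Delete by rule specification (no position number is needed).
        echo "iptables -D INPUT -s ${node_ip}/32 -m conntrack --ctstate NEW -j DROP"
    fi
}

# Run the command, or only print it when DRY_RUN=1.
apply_rule() {
    cmd=$(drop_rule "$1" "$2")
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$cmd"
    else
        $cmd
    fi
}
```

For example, 'apply_rule -I 10.2.1.21' before deleting the pod and 'apply_rule -D 10.2.1.21' afterwards would restore API access for that node. Setting DRY_RUN=1 prints the command instead of running it.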
Tested and verified in 4.11.0-0.nightly-2022-06-21-040754:
sh-4.4# journalctl -xe -u crio | grep 'but continue to delete'
Jun 21 13:59:41 weliang-621-jhsdj-compute-1 crio[1524]: 2022-06-21T13:59:41Z [error] Multus: failed to get delegates: TryLoadPodDelegates: error in getting k8s network for pod: GetNetworkDelegates: failed getting the delegate: getKubernetesDelegate: cannot find a network-attachment-definition (bridge-conf) in namespace (test): network-attachment-definitions.k8s.cni.cncf.io "bridge-conf" not found, but continue to delete clusterNetwork
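The check above could be scripted roughly as follows. This is a minimal sketch: check_tolerant_delete is a hypothetical helper name, and it assumes that the 'but continue to delete' string seen in the log above is the marker for the fixed behavior.

```shell
#!/bin/sh
# Sketch only: check_tolerant_delete is a hypothetical helper name.
# Reads log lines on stdin and succeeds (exit status 0) if Multus logged
# that it hit an error on the DEL path but carried on with the delete.
check_tolerant_delete() {
    grep -q 'but continue to delete'
}
```

On the node, it could be used as: journalctl -u crio | check_tolerant_delete && echo verified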
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069