Bug 2071799

Summary: Multus CNI should exit cleanly on CNI DEL when the API server is unavailable
Product: OpenShift Container Platform
Reporter: Douglas Smith <dosmith>
Component: Networking
Assignee: Tomofumi Hayashi <tohayash>
Networking sub component: multus
QA Contact: Weibin Liang <weliang>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
CC: tohayash
Version: 4.11
Target Release: 4.11.0
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Clones: 2071800
Last Closed: 2022-08-10 11:03:38 UTC
Type: Bug
Bug Blocks: 2071800

Description Douglas Smith 2022-04-04 20:13:35 UTC
Description of problem: On the CNI DEL path, Multus CNI should exit cleanly even when the API server is unreachable; otherwise, pods can wind up in a crash loop.

How reproducible: Difficult; requires the API server to be unreachable.

Comment 3 Weibin Liang 2022-05-05 13:51:55 UTC
Still cannot verify this bug using the steps in https://gist.github.com/dougbtv/da3ab605c2fd9845cdc018f07b02ce51; still waiting for an update from dev.

Comment 4 Weibin Liang 2022-06-01 14:11:33 UTC
@dosmith @tohayash Do we have any new way to verify this bug? Thanks!

Comment 5 Tomofumi Hayashi 2022-06-13 16:51:39 UTC
Hi Weibin,

I was able to test Bug 2071799 on bare-metal UPI. Here are the steps.

Step 1) Deploy OCP on bare-metal UPI (assuming haproxy is used for load balancing).
Step 2) Create a pod.
  - Get the IP of the node the pod is deployed on with 'oc get node -o wide' (assume 10.2.1.21 in this case).
Step 3) On the haproxy node (or pod, depending on your deployment), add the iptables rule 'iptables -I INPUT 1 -s 10.2.1.21/32 -m conntrack --ctstate NEW -j DROP', which drops new connections from that node so it can no longer open new connections to the API server through haproxy.
Step 4) Delete the pod. You can see the resulting message with the 'oc describe pod' command.

Comment 9 Weibin Liang 2022-06-21 16:31:55 UTC
Tested and verified in 4.11.0-0.nightly-2022-06-21-040754

sh-4.4# journalctl -xe -u crio | grep 'but continue to delete'
Jun 21 13:59:41 weliang-621-jhsdj-compute-1 crio[1524]: 2022-06-21T13:59:41Z [error] Multus: failed to get delegates: TryLoadPodDelegates: error in getting k8s network for pod: GetNetworkDelegates: failed getting the delegate: getKubernetesDelegate: cannot find a network-attachment-definition (bridge-conf) in namespace (test): network-attachment-definitions.k8s.cni.cncf.io "bridge-conf" not found, but continue to delete clusterNetwork

Comment 10 errata-xmlrpc 2022-08-10 11:03:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069