Bug 2071799 - Multus CNI should exit cleanly on CNI DEL when the API server is unavailable
Summary: Multus CNI should exit cleanly on CNI DEL when the API server is unavailable
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.11.0
Assignee: Tomofumi Hayashi
QA Contact: Weibin Liang
URL:
Whiteboard:
Depends On:
Blocks: 2071800
Reported: 2022-04-04 20:13 UTC by Douglas Smith
Modified: 2022-08-10 11:03 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2071800
Environment:
Last Closed: 2022-08-10 11:03:38 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift multus-cni pull 124 | 0 | None | open | Bug 2071799: Remove error handling for getPod to force to proceed cmdDel. | 2022-04-04 20:16:43 UTC
Github | openshift multus-cni pull 130 | 0 | None | open | Bug 2071799: Skip status update in CmdDel if getPod is failed | 2022-06-13 17:19:38 UTC

Description Douglas Smith 2022-04-04 20:13:35 UTC
Description of problem: On the CNI DEL path, Multus CNI should exit cleanly even when the API server is unavailable; otherwise, pods can wind up in a crash loop.

How reproducible: Difficult; it requires the API server to be unreachable.
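
The requested behavior can be illustrated with a minimal Go sketch. This is not the Multus implementation: `handleDel`, `DelOutcome`, `Pod`, and `errAPIUnavailable` are hypothetical names made up for illustration, and the warning text is only modeled on the wording used in this bug. The idea, following the titles of the linked pull requests, is that a failed pod lookup on DEL is recorded as a warning, local cleanup still runs, and only the status update is skipped.

```go
package main

import (
	"errors"
	"fmt"
)

// errAPIUnavailable is a hypothetical sentinel standing in for any
// error returned while the API server is unreachable.
var errAPIUnavailable = errors.New("API server unavailable")

// Pod is a hypothetical, minimal stand-in for the pod object.
type Pod struct{ Namespace, Name string }

// DelOutcome records what the DEL handler actually did.
type DelOutcome struct {
	CleanedUp     bool     // local network teardown ran successfully
	StatusUpdated bool     // pod status was updated (needs the API server)
	Warnings      []string // non-fatal problems seen along the way
}

// handleDel sketches a DEL path that tolerates a failed pod lookup.
// getPod and cleanup are injected so the sketch runs without a cluster.
func handleDel(
	getPod func(namespace, name string) (*Pod, error),
	cleanup func() error,
	namespace, name string,
) (DelOutcome, error) {
	var out DelOutcome

	pod, err := getPod(namespace, name)
	if err != nil {
		// Do not abort: record the failure and keep deleting.
		out.Warnings = append(out.Warnings,
			fmt.Sprintf("failed to get pod %s/%s: %v, but continue to delete",
				namespace, name, err))
		pod = nil
	}

	// Local teardown does not depend on the API server.
	if err := cleanup(); err != nil {
		return out, fmt.Errorf("local cleanup failed: %w", err)
	}
	out.CleanedUp = true

	// Skip the status update when the pod could not be fetched.
	if pod != nil {
		out.StatusUpdated = true
	}
	return out, nil
}
```

With a `getPod` that returns `errAPIUnavailable`, `handleDel` still calls `cleanup` once and returns a nil error, with one warning recorded and `StatusUpdated` left false.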

Comment 3 Weibin Liang 2022-05-05 13:51:55 UTC
I still cannot verify this bug using the steps in https://gist.github.com/dougbtv/da3ab605c2fd9845cdc018f07b02ce51; still waiting for an update from dev.

Comment 4 Weibin Liang 2022-06-01 14:11:33 UTC
@dosmith @tohayash Do we have any new way to verify this bug? Thanks!

Comment 5 Tomofumi Hayashi 2022-06-13 16:51:39 UTC
Hi Weibin,

I was able to test Bug 2071799 on bare-metal UPI. Here are the steps.

Step1) Deploy OCP on bare-metal UPI (assuming haproxy is used for load balancing)
Step2) Create a pod
  - Get the IP of the node the pod is deployed on with 'oc get node' (assume 10.2.1.21 in this case)
Step3) On the haproxy node (or pod, depending on your deployment), add an iptables rule: 'iptables -I INPUT 1 -s 10.2.1.21/32 -m conntrack --ctstate NEW -j DROP'
Step4) Delete the pod (you can see the message with the 'oc describe pod' command)

Comment 9 Weibin Liang 2022-06-21 16:31:55 UTC
Tested and verified in 4.11.0-0.nightly-2022-06-21-040754

sh-4.4# journalctl -xe -u crio | grep 'but continue to delete'
Jun 21 13:59:41 weliang-621-jhsdj-compute-1 crio[1524]: 2022-06-21T13:59:41Z [error] Multus: failed to get delegates: TryLoadPodDelegates: error in getting k8s network for pod: GetNetworkDelegates: failed getting the delegate: getKubernetesDelegate: cannot find a network-attachment-definition (bridge-conf) in namespace (test): network-attachment-definitions.k8s.cni.cncf.io "bridge-conf" not found, but continue to delete clusterNetwork
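
The log line above suggests that when the additional delegates cannot be loaded, the DEL path logs the error and proceeds with the cluster network only. A minimal Go sketch of that fallback follows. It is an illustration under stated assumptions, not the Multus source: `Delegate`, `delegatesForDel`, and `errNADNotFound` are hypothetical names, and the message format is only modeled on the journal entry.

```go
package main

import (
	"errors"
	"fmt"
)

// Delegate is a hypothetical, simplified stand-in for one CNI network
// that has to be torn down on DEL.
type Delegate struct{ Name string }

// errNADNotFound is a hypothetical sentinel that mimics the lookup
// failure quoted in the journal entry.
var errNADNotFound = errors.New(
	`network-attachment-definitions.k8s.cni.cncf.io "bridge-conf" not found`)

// delegatesForDel returns the delegates to delete. If loading the
// additional delegates fails, it falls back to the cluster network
// alone and returns a log line describing the failure.
func delegatesForDel(
	clusterNetwork Delegate,
	loadExtra func() ([]Delegate, error),
) ([]Delegate, string) {
	delegates := []Delegate{clusterNetwork}

	extra, err := loadExtra()
	if err != nil {
		// Report the failure, but do not fail the DEL.
		return delegates, fmt.Sprintf(
			"[error] Multus: failed to get delegates: %v, but continue to delete clusterNetwork",
			err)
	}
	return append(delegates, extra...), ""
}
```

When `loadExtra` returns `errNADNotFound`, the result contains only the cluster network delegate and a non-empty log line; when it succeeds, the extra delegates are appended and the log line is empty.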

Comment 10 errata-xmlrpc 2022-08-10 11:03:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

