2071799 – Multus CNI should exit cleanly on CNI DEL when the API server is unavailable

Summary: Multus CNI should exit cleanly on CNI DEL when the API server is unavailable

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.11
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	4.11.0
Assignee:	Tomofumi Hayashi
QA Contact:	Weibin Liang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	2071800

Reported:	2022-04-04 20:13 UTC by Douglas Smith
Modified:	2022-08-10 11:03 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	2071800
Environment:
Last Closed:	2022-08-10 11:03:38 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift multus-cni pull 124	0	None	open	Bug 2071799: Remove error handling for getPod to force to proceed cmdDel.	2022-04-04 20:16:43 UTC
Github	openshift multus-cni pull 130	0	None	open	Bug 2071799: Skip status update in CmdDel if getPod is failed	2022-06-13 17:19:38 UTC

Description Douglas Smith 2022-04-04 20:13:35 UTC

Description of problem: On the CNI DEL path, Multus CNI should exit cleanly; otherwise, pods can wind up in a crash loop.

How reproducible: Difficult; requires the API server to be unreachable.
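
The behavior requested here can be illustrated with a minimal Go sketch. It is not the Multus implementation; it only follows the idea stated in the linked pull request titles (proceed with cmdDel and skip the status update if getting the pod fails). All names below (podLookup, delOutcome, cmdDelSketch, unreachableAPI, reachableAPI) are hypothetical and chosen for illustration only.

```go
package main

import (
	"errors"
	"fmt"
)

// podLookup is a hypothetical stand-in for the API server call that fetches
// the pod. It is not the real Multus or client-go signature.
type podLookup func(namespace, name string) (podUID string, err error)

// delOutcome records what the sketch did so the behavior is easy to inspect.
type delOutcome struct {
	NetworksDeleted bool     // teardown of the pod's networks was attempted
	StatusUpdated   bool     // pod network status was written back
	Warnings        []string // errors that were logged instead of returned
}

// unreachableAPI simulates an API server that cannot be contacted.
func unreachableAPI(namespace, name string) (string, error) {
	return "", errors.New("connection refused")
}

// reachableAPI simulates a successful pod lookup.
func reachableAPI(namespace, name string) (string, error) {
	return "uid-1234", nil
}

// cmdDelSketch shows a DEL handler that tolerates a failed pod lookup:
// the error is recorded as a warning, teardown still runs, and only the
// status update (which needs the API server) is skipped.
func cmdDelSketch(getPod podLookup, namespace, name string) (delOutcome, error) {
	var out delOutcome

	_, err := getPod(namespace, name)
	if err != nil {
		// Log and continue instead of returning the error to the runtime.
		out.Warnings = append(out.Warnings, fmt.Sprintf(
			"failed to get pod %s/%s: %v, but continue to delete", namespace, name, err))
	}

	// Placeholder for the actual network teardown, assumed here to work
	// without the API server.
	out.NetworksDeleted = true

	// Only touch pod status when the lookup succeeded.
	if err == nil {
		out.StatusUpdated = true
	}
	return out, nil
}

func main() {
	out, err := cmdDelSketch(unreachableAPI, "test", "example-pod")
	fmt.Println(out.NetworksDeleted, out.StatusUpdated, len(out.Warnings), err)
}
```

The point of the design is that DEL returns success to the container runtime even when the lookup fails, so the runtime does not keep retrying the teardown while the API server is down.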

Comment 3 Weibin Liang 2022-05-05 13:51:55 UTC

I still cannot verify this bug using the steps in https://gist.github.com/dougbtv/da3ab605c2fd9845cdc018f07b02ce51; still waiting for an update from dev.

Comment 4 Weibin Liang 2022-06-01 14:11:33 UTC

@dosmith @tohayash Do we have any new way to verify this bug? Thanks!

Comment 5 Tomofumi Hayashi 2022-06-13 16:51:39 UTC

Hi Weibin,

I was able to test Bug 2071799 on bare-metal UPI. Here are the steps.

Step1) Deploy OCP on bare-metal UPI (assuming haproxy is used for load balancing)
Step2) Create a pod
  - Get the IP of the node where the pod is deployed with 'oc get node' (assume 10.2.1.21 in this case)
Step3) On the haproxy node (or pod, depending on your deployment), add an iptables rule with 'iptables -I INPUT 1 -s 10.2.1.21/32 -m conntrack --ctstate NEW -j DROP'
Step4) Delete the pod (you can see the message with the 'oc describe pod' command)

Comment 9 Weibin Liang 2022-06-21 16:31:55 UTC

Tested and verified in 4.11.0-0.nightly-2022-06-21-040754

sh-4.4# journalctl -xe -u crio | grep 'but continue to delete'
Jun 21 13:59:41 weliang-621-jhsdj-compute-1 crio[1524]: 2022-06-21T13:59:41Z [error] Multus: failed to get delegates: TryLoadPodDelegates: error in getting k8s network for pod: GetNetworkDelegates: failed getting the delegate: getKubernetesDelegate: cannot find a network-attachment-definition (bridge-conf) in namespace (test): network-attachment-definitions.k8s.cni.cncf.io "bridge-conf" not found, but continue to delete clusterNetwork

Comment 10 errata-xmlrpc 2022-08-10 11:03:38 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
