Description of problem:
When CNV is installed, the kubemacpool mutating webhook interferes with pods in the openshift-ovn-kubernetes namespace because that namespace is not labeled with kubemacpool/ignoreAdmission: "". If the kubemacpool controller is down or unavailable and a pod is deleted or a node is restarted, the pods cannot be recreated or restarted; they stay blocked waiting for kubemacpool.

Version-Release number of selected component (if applicable):
OCP 4.3.5
CNV 2.2

Steps to Reproduce:
1. Put kubemacpool into CrashLoopBackOff.
2. Delete one of the ovnkube-node pods in the openshift-ovn-kubernetes namespace.
3. The container will not be able to run.

Expected results:
Two options:
1) The CNV operator should label the openshift-ovn-kubernetes namespace with kubemacpool/ignoreAdmission: "" during deployment (a manual sketch of this is shown below).
2) kubemacpool should use a whitelist model instead of a blacklist model when determining in which namespaces to apply.
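For reference, a minimal sketch of option 1 applied by hand as a stop-gap. This assumes the webhook's namespaceSelector actually honors the kubemacpool/ignoreAdmission label named above:

# Label the namespace so the kubemacpool webhook skips it (assumed label key)
$ oc label namespace openshift-ovn-kubernetes kubemacpool/ignoreAdmission=""

# Confirm the label is present
$ oc get namespace openshift-ovn-kubernetes --show-labels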
It seems we need to backport https://github.com/k8snetworkplumbingwg/kubemacpool/commit/02a7388b7c98336674f7425aab30686e69536966 to 2.3. I'm on it.
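To check which model (opt-in or opt-out) the deployed webhook is using, the namespaceSelector of the kubemacpool MutatingWebhookConfiguration can be inspected. The object name below is an assumption; use whatever name the first command returns:

$ oc get mutatingwebhookconfigurations | grep -i kubemacpool

# "kubemacpool-mutator" is an assumed name; substitute the one listed above
$ oc get mutatingwebhookconfiguration kubemacpool-mutator -o jsonpath='{.webhooks[*].namespaceSelector}'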
Test Environment:
==================
$ oc version
Client Version: 4.4.0-0.nightly-2020-02-17-022408
Server Version: 4.4.0-rc.4
Kubernetes Version: v1.17.1

CNV Version:
$ oc get csv -n openshift-cnv | awk ' { print $4 } ' | tail -n1
2.3.0

Steps:
=====
Bug Summary: When kubemacpool is broken, it blocks openshift-ovn-kubernetes and the cluster may become unusable.
Fix: Even when kubemacpool is in CrashLoopBackOff, pods that get killed or deleted in the openshift-ovn-kubernetes namespace should be started again.

1. Put kubemacpool into CrashLoopBackOff:
--
oc edit -n openshift-cnv deployment kubemacpool-mac-controller-manager
Change the /manager command to "false" (or anything invalid) and save the file. Then check the pod status; the controller pod should move to CrashLoopBackOff:
--
$ oc get pods -n openshift-cnv | grep -i crash
kubemacpool-mac-controller-manager-6767f6c687-g98n5   0/1   CrashLoopBackOff   13   43m

2. Delete one of the ovnkube-node pods in the openshift-ovn-kubernetes namespace:
--
oc delete pods ovnkube-node-kjx8z -n openshift-ovn-kubernetes

3. Make sure the pod comes up:
oc get pods -n openshift-ovn-kubernetes

Test Case 2: kill all pods and make sure they come back up (a variant without hardcoded pod names is sketched after the output below).
============
for i in ovnkube-master-l8tc9 ovnkube-master-sgx6b ovnkube-master-zsxlf ovnkube-node-625gd ovnkube-node-6fd2d ovnkube-node-7n7x5 ovnkube-node-dn7x8 ovnkube-node-hxgx8 ovnkube-node-j4pkn; do oc delete pods $i -n openshift-ovn-kubernetes; done

Check the status of the pods afterwards. The number of pods before and after should be the same.

$ oc get pods -n openshift-ovn-kubernetes
NAME                   READY   STATUS    RESTARTS   AGE
ovnkube-master-5655k   4/4     Running   0          87s
ovnkube-master-9mv4v   4/4     Running   0          84s
ovnkube-master-c29gg   4/4     Running   0          75s
ovnkube-node-2cg64     2/2     Running   0          46s
ovnkube-node-2n86d     2/2     Running   0          55s
ovnkube-node-5fhqk     2/2     Running   0          70s
ovnkube-node-74cqs     2/2     Running   0          57s
ovnkube-node-mqbrr     2/2     Running   0          68s
ovnkube-node-tdjvg     2/2     Running   0          53s
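A variant of Test Case 2 that avoids hardcoding pod names; this assumes every pod in the namespace is owned by the ovnkube daemonsets/deployments (as on this cluster), so deleted pods are recreated automatically:

# Record the pod count, delete everything, wait for the owners to recreate the pods, then compare counts
$ oc get pods -n openshift-ovn-kubernetes --no-headers | wc -l
$ oc delete pods --all -n openshift-ovn-kubernetes
$ oc wait pods --all -n openshift-ovn-kubernetes --for=condition=Ready --timeout=300s
$ oc get pods -n openshift-ovn-kubernetes --no-headers | wc -l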
As this is becoming important, I raised the Customer Escalation Flag.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:2011