Bug 1812710

Summary: When CNV is installed, the kubemacpool mutating webhook interferes with the openshift-ovn-kubernetes namespace
Product: Container Native Virtualization (CNV)
Component: Networking
Version: 2.2.0
Target Milestone: ---
Target Release: 2.3.0
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Reporter: William Caban <william.caban>
Assignee: Petr Horáček <phoracek>
QA Contact: Meni Yakove <myakove>
CC: aos-bugs, atragler, augol, bschmaus, bshephar, cnv-qe-bugs, danken, dsafford, fsimonce, gkapoor, myakove, ncredi, vpagar
Flags: ncredi: needinfo+
Fixed In Version: kubemacpool-container-v2.3.0-24
Doc Type: If docs needed, set a value
Last Closed: 2020-05-04 19:10:58 UTC
Type: Bug
Bug Blocks: 1771572

Description William Caban 2020-03-11 22:35:48 UTC
Description of problem:

When CNV is installed, the kubemacpool mutating webhook interferes with pods in the openshift-ovn-kubernetes namespace because that namespace is not labeled with kubemacpool/ignoreAdmission: "".


If the kubemacpool controller is down or otherwise unavailable and a pod is deleted or a node is restarted, the affected pods cannot be recreated or restarted; they remain blocked waiting for kubemacpool. The commands below show one way to confirm this.
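
A minimal diagnostic sketch (here <webhook-config-name> is a placeholder for whatever name kubemacpool registers its mutating webhook under, which this report does not state):

$ oc get mutatingwebhookconfigurations
$ oc get mutatingwebhookconfiguration <webhook-config-name> -o jsonpath='{.webhooks[*].failurePolicy}'
$ oc get namespace openshift-ovn-kubernetes --show-labels

If the failurePolicy is Fail and the namespace carries no ignore label, pod admission in openshift-ovn-kubernetes is rejected whenever the controller is down.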


Version-Release number of selected component (if applicable):
OCP 4.3.5
CNV 2.2

Steps to Reproduce:
1. Put kubemacpool into CrashLoopBackOff
2. Delete one of the ovnkube-node pods in the openshift-ovn-kubernetes namespace
3. The container will not be able to run; the replacement pod is blocked by the unavailable kubemacpool webhook


Expected results:

Two options:
1) The CNV operator should label the openshift-ovn-kubernetes namespace with kubemacpool/ignoreAdmission: "" during deployment
2) kubemacpool should use an opt-in (whitelist) model instead of an opt-out (blacklist) model when deciding which namespaces it applies to; see the sketch after this list
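
For illustration, hedged sketches of both options (the webhook configuration name and the opt-in label key kubemacpool/admit are assumptions for this example, not values taken from the product; the ignore label itself is the one named in this report):

# Option 1: label the namespace so kubemacpool skips it
$ oc label namespace openshift-ovn-kubernetes kubemacpool/ignoreAdmission=""

# Option 2: make the webhook opt-in by requiring a label on admitted namespaces
$ oc patch mutatingwebhookconfiguration <webhook-config-name> --type=json \
    -p '[{"op": "add", "path": "/webhooks/0/namespaceSelector", "value": {"matchLabels": {"kubemacpool/admit": ""}}}]'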

Comment 1 Petr Horáček 2020-03-12 08:38:46 UTC
It seems we need to backport https://github.com/k8snetworkplumbingwg/kubemacpool/commit/02a7388b7c98336674f7425aab30686e69536966 to 2.3. I'm on it.

Comment 3 Geetika Kapoor 2020-03-27 10:47:45 UTC
Test Environment :
==================

$ oc version
Client Version: 4.4.0-0.nightly-2020-02-17-022408
Server Version: 4.4.0-rc.4
Kubernetes Version: v1.17.1

CNV Version
$ oc get csv -n openshift-cnv | awk ' { print $4 } ' | tail -n1
2.3.0

Steps:
=====

Bug Summary: When kubemacpool is broken, it blocks openshift-ovn-kubernetes and the cluster may become unusable.
Fix: Even when kubemacpool is in CrashLoopBackOff, pods that are killed or deleted in the openshift-ovn-kubernetes namespace should be started again.

1. Put kubemacpool in CrashLoopBackOff

-- oc edit -n openshift-cnv deployment kubemacpool-mac-controller-manager

Change the container command /manager to "false" (or any other invalid value) and save the file. A non-interactive equivalent is sketched below.

Now check the pod status; the controller pod should move to the CrashLoopBackOff state.

-- $ oc get pods -n openshift-cnv | grep -i crash
kubemacpool-mac-controller-manager-6767f6c687-g98n5   0/1     CrashLoopBackOff   13         43m
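
A non-interactive equivalent of the edit above (a sketch; it assumes the manager binary is set as the first entry of the first container's command, which this report does not state):

$ oc patch deployment kubemacpool-mac-controller-manager -n openshift-cnv \
    --type=json -p '[{"op": "replace", "path": "/spec/template/spec/containers/0/command/0", "value": "false"}]'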

2. delete one of the ovnkube-node pods in the openshift-ovn-kubernetes namespace


-- oc delete pods ovnkube-node-kjx8z -n openshift-ovn-kubernetes


3. Make sure the pod comes up.

oc get pods -n openshift-ovn-kubernetes

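To verify readiness without polling manually (a sketch; the label selector app=ovnkube-node is an assumption, not taken from this report):

$ oc wait pod -l app=ovnkube-node -n openshift-ovn-kubernetes --for=condition=Ready --timeout=120s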

Test Case 2: delete all pods and make sure they come back up.
============

for i in ovnkube-master-l8tc9 ovnkube-master-sgx6b ovnkube-master-zsxlf ovnkube-node-625gd ovnkube-node-6fd2d ovnkube-node-7n7x5 ovnkube-node-dn7x8 ovnkube-node-hxgx8 ovnkube-node-j4pkn; do oc delete pods $i -n openshift-ovn-kubernetes; done


Check the pod status afterwards. The number of pods before and after should be the same (a count-comparison sketch follows the output below).

$ oc get pods -n openshift-ovn-kubernetes
NAME                   READY   STATUS    RESTARTS   AGE
ovnkube-master-5655k   4/4     Running   0          87s
ovnkube-master-9mv4v   4/4     Running   0          84s
ovnkube-master-c29gg   4/4     Running   0          75s
ovnkube-node-2cg64     2/2     Running   0          46s
ovnkube-node-2n86d     2/2     Running   0          55s
ovnkube-node-5fhqk     2/2     Running   0          70s
ovnkube-node-74cqs     2/2     Running   0          57s
ovnkube-node-mqbrr     2/2     Running   0          68s
ovnkube-node-tdjvg     2/2     Running   0          53s
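
To make the before/after comparison explicit without naming each pod (a sketch; it deletes every pod in the namespace and assumes all of them are expected to become Ready again):

$ BEFORE=$(oc get pods -n openshift-ovn-kubernetes --no-headers | wc -l)
$ oc delete pods --all -n openshift-ovn-kubernetes
$ oc wait pods --all -n openshift-ovn-kubernetes --for=condition=Ready --timeout=300s
$ AFTER=$(oc get pods -n openshift-ovn-kubernetes --no-headers | wc -l)
$ [ "$BEFORE" = "$AFTER" ] && echo "pod count unchanged: $AFTER"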

Comment 4 Dana Safford 2020-04-02 19:01:29 UTC
As this is becoming important, I raised the Customer Escalation Flag.

Comment 7 errata-xmlrpc 2020-05-04 19:10:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:2011