Bug 1812710 - When CNV is installed, the kubemacpool mutating webhook interferes with the openshift-ovn-kubernetes namespace
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Networking
Version: 2.2.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 2.3.0
Assignee: Petr Horáček
QA Contact: Meni Yakove
URL:
Whiteboard:
Depends On:
Blocks: 1771572
 
Reported: 2020-03-11 22:35 UTC by William Caban
Modified: 2023-09-07 22:22 UTC
CC List: 13 users

Fixed In Version: kubemacpool-container-v2.3.0-24
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-04 19:10:58 UTC
Target Upstream Version:
Embargoed:
Flags: ncredi: needinfo+




Links
System                   ID / Reference                              Status   Summary                                           Last Updated
Github                   k8snetworkplumbingwg kubemacpool pull 107   closed   [release-v0.8] Skip pods in critical namespaces   2020-10-06 07:03:26 UTC
Red Hat Issue Tracker    CNV-32781                                   None     None                                              2023-09-07 22:22:51 UTC
Red Hat Product Errata   RHEA-2020:2011                              None     None                                              2020-05-04 19:11:10 UTC

Description William Caban 2020-03-11 22:35:48 UTC
Description of problem:

When CNV is installed, the kubemacpool mutating webhook interferes with pods in the openshift-ovn-kubernetes namespace because that namespace is not labeled with kubemacpool/ignoreAdmission: ""


If the kubemacpool controller is down or unavailable and a pod is deleted or a node is restarted, the affected pods cannot be recreated or restarted: admission blocks waiting on the unavailable kubemacpool webhook.
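
A possible manual workaround until a fixed build ships would be to label the namespace so the webhook's namespaceSelector skips it. This is only a sketch built on the kubemacpool/ignoreAdmission label mentioned above; verify the exact label key honored by the installed webhook configuration before relying on it:

$ oc label namespace openshift-ovn-kubernetes kubemacpool/ignoreAdmission=""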


Version-Release number of selected component (if applicable):
OCP 4.3.5
CNV 2.2

Steps to Reproduce:
1. kubemacpool in CrashLoopBackOff
2. delete one of the ovnkube-node pods in the openshift-ovn-kubernetes namespace
3. The container won't be able to run; the pod is never recreated.


Expected results:

Two options:
1) The CNV operator should label the openshift-ovn-kubernetes namespace with kubemacpool/ignoreAdmission: "" during deployment.
2) kubemacpool should use a whitelist (opt-in) model instead of a blacklist (opt-out) model when determining which namespaces to apply to (see the sketch below).
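
For illustration of option 2, an opt-in model would mean the webhook's namespaceSelector only matches namespaces that explicitly carry an allocation label. A rough sketch with hypothetical names (the webhook configuration name kubemacpool-mutator and the label key kubemacpool/allocate are made up for this example, not the shipped names):

$ oc patch mutatingwebhookconfiguration kubemacpool-mutator --type=json \
    -p='[{"op": "add", "path": "/webhooks/0/namespaceSelector", "value": {"matchLabels": {"kubemacpool/allocate": ""}}}]'

With a selector like that in place, kubemacpool would only intercept pod creation in namespaces labeled kubemacpool/allocate="", so system namespaces such as openshift-ovn-kubernetes would be left alone by default.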

Comment 1 Petr Horáček 2020-03-12 08:38:46 UTC
It looks like we need to backport https://github.com/k8snetworkplumbingwg/kubemacpool/commit/02a7388b7c98336674f7425aab30686e69536966 to 2.3. I'm on it.

Comment 3 Geetika Kapoor 2020-03-27 10:47:45 UTC
Test Environment :
==================

$ oc version
Client Version: 4.4.0-0.nightly-2020-02-17-022408
Server Version: 4.4.0-rc.4
Kubernetes Version: v1.17.1

CNV Version
$ oc get csv -n openshift-cnv | awk ' { print $4 } ' | tail -n1
2.3.0

Steps:
=====

Bug Summary: When kubemacpool is broken, it blocks openshift-ovn-kubernetes and the cluster may become unusable.
Fix: Even when kubemacpool is in CrashLoopBackOff, pods that get killed or deleted in the openshift-ovn-kubernetes namespace should be started again.

1. Put kubemacpool in CrashLoopBackOff

-- oc edit -n openshift-cnv deployment kubemacpool-mac-controller-manager

Change /manager to "false" (or any other invalid command) and save the file.
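
Equivalently, the same breakage can be applied non-interactively. This assumes the manager binary is the first command entry of the first container in the pod template; adjust the JSON path if the actual spec differs:

-- oc patch deployment kubemacpool-mac-controller-manager -n openshift-cnv --type=json -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/command/0", "value": "false"}]'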

Now check the status of the pods; they should move to the CrashLoopBackOff state.

-- $ oc get pods -n openshift-cnv | grep -i crash
kubemacpool-mac-controller-manager-6767f6c687-g98n5   0/1     CrashLoopBackOff   13         43m

2. delete one of the ovnkube-node pods in the openshift-ovn-kubernetes namespace


-- oc delete pods ovnkube-node-kjx8z -n openshift-ovn-kubernetes


3. Make sure the pod comes up.

oc get pods -n openshift-ovn-kubernetes
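
To avoid polling by hand, oc wait can do the readiness check; the app=ovnkube-node label selector here is an assumption and should be matched against the actual pod labels:

-- oc wait pod -l app=ovnkube-node -n openshift-ovn-kubernetes --for=condition=Ready --timeout=120s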


Test Case 2: delete all the pods and make sure they come back up.
============

for i in ovnkube-master-l8tc9 ovnkube-master-sgx6b ovnkube-master-zsxlf ovnkube-node-625gd ovnkube-node-6fd2d ovnkube-node-7n7x5 ovnkube-node-dn7x8 ovnkube-node-hxgx8 ovnkube-node-j4pkn; do oc delete pods $i -n openshift-ovn-kubernetes; done
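
Since all of these pods are controller-managed, the loop above can be collapsed into a single command with the same effect, assuming nothing else runs in the namespace:

-- oc delete pods --all -n openshift-ovn-kubernetes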


Check the status of the pods afterwards. The number of pods before and after should be the same.

$ oc get pods -n openshift-ovn-kubernetes
NAME                   READY   STATUS    RESTARTS   AGE
ovnkube-master-5655k   4/4     Running   0          87s
ovnkube-master-9mv4v   4/4     Running   0          84s
ovnkube-master-c29gg   4/4     Running   0          75s
ovnkube-node-2cg64     2/2     Running   0          46s
ovnkube-node-2n86d     2/2     Running   0          55s
ovnkube-node-5fhqk     2/2     Running   0          70s
ovnkube-node-74cqs     2/2     Running   0          57s
ovnkube-node-mqbrr     2/2     Running   0          68s
ovnkube-node-tdjvg     2/2     Running   0          53s
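
To compare the before/after counts mechanically rather than by eye, a simple sketch:

$ oc get pods -n openshift-ovn-kubernetes --no-headers | wc -l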

Comment 4 Dana Safford 2020-04-02 19:01:29 UTC
As this is becoming important, I raised the Customer Escalation Flag.

Comment 7 errata-xmlrpc 2020-05-04 19:10:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:2011

