Bug 1812710
Summary: | When CNV is installed, the kubemacpool mutating webhook interferes with the openshift-ovn-kubernetes namespace | |
---|---|---|---
Product: | Container Native Virtualization (CNV) | Reporter: | William Caban <william.caban>
Component: | Networking | Assignee: | Petr Horáček <phoracek>
Status: | CLOSED ERRATA | QA Contact: | Meni Yakove <myakove>
Severity: | urgent | Docs Contact: |
Priority: | urgent | |
Version: | 2.2.0 | CC: | aos-bugs, atragler, augol, bschmaus, bshephar, cnv-qe-bugs, danken, dsafford, fsimonce, gkapoor, myakove, ncredi, vpagar
Target Milestone: | --- | Flags: | ncredi: needinfo+
Target Release: | 2.3.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | kubemacpool-container-v2.3.0-24 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2020-05-04 19:10:58 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1771572 | |
Description
William Caban
2020-03-11 22:35:48 UTC
We need to backport https://github.com/k8snetworkplumbingwg/kubemacpool/commit/02a7388b7c98336674f7425aab30686e69536966 to 2.3, it seems. I'm on it. (A sketch for inspecting the webhook's scope and failure policy is appended at the end of this report.)

Test Environment:
=================

    $ oc version
    Client Version: 4.4.0-0.nightly-2020-02-17-022408
    Server Version: 4.4.0-rc.4
    Kubernetes Version: v1.17.1

CNV version:

    $ oc get csv -n openshift-cnv | awk ' { print $4 } ' | tail -n1
    2.3.0

Steps:
======

Bug summary: When kubemacpool is broken, its mutating webhook blocks pod creation in the openshift-ovn-kubernetes namespace and the cluster may become unusable.

Fix: Even when kubemacpool is in CrashLoopBackOff, pods that get killed or deleted in the openshift-ovn-kubernetes namespace should be started again.

1. Put kubemacpool in CrashLoopBackOff:

        oc edit -n openshift-cnv deployment kubemacpool-mac-controller-manager

   Change the /manager command to "false" (or anything invalid) and save the file; a non-interactive variant of this step is sketched at the end of this report. The pod should then move to the CrashLoopBackOff state:

        $ oc get pods -n openshift-cnv | grep -i crash
        kubemacpool-mac-controller-manager-6767f6c687-g98n5   0/1   CrashLoopBackOff   13   43m

2. Delete one of the ovnkube-node pods in the openshift-ovn-kubernetes namespace:

        oc delete pods ovnkube-node-kjx8z -n openshift-ovn-kubernetes

3. Make sure the pod comes back up:

        oc get pods -n openshift-ovn-kubernetes

Test Case 2: delete all the pods and make sure they come back up.
============

    for i in ovnkube-master-l8tc9 ovnkube-master-sgx6b ovnkube-master-zsxlf \
             ovnkube-node-625gd ovnkube-node-6fd2d ovnkube-node-7n7x5 \
             ovnkube-node-dn7x8 ovnkube-node-hxgx8 ovnkube-node-j4pkn; do
        oc delete pods $i -n openshift-ovn-kubernetes
    done

Check the status of the pods afterwards; the number of pods before and after should be the same. (A generic variant that avoids hardcoding pod names is sketched at the end of this report.)

    $ oc get pods -n openshift-ovn-kubernetes
    NAME                   READY   STATUS    RESTARTS   AGE
    ovnkube-master-5655k   4/4     Running   0          87s
    ovnkube-master-9mv4v   4/4     Running   0          84s
    ovnkube-master-c29gg   4/4     Running   0          75s
    ovnkube-node-2cg64     2/2     Running   0          46s
    ovnkube-node-2n86d     2/2     Running   0          55s
    ovnkube-node-5fhqk     2/2     Running   0          70s
    ovnkube-node-74cqs     2/2     Running   0          57s
    ovnkube-node-mqbrr     2/2     Running   0          68s
    ovnkube-node-tdjvg     2/2     Running   0          53s

As this is becoming important, I raised the Customer Escalation Flag.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:2011
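The interference mechanism referenced above can be inspected directly. The sketch below is a minimal, hedged example: the MutatingWebhookConfiguration name "kubemacpool-mutator" is an assumption (list the configurations first to confirm the real name). It prints each webhook's name, failurePolicy, and namespaceSelector, which together determine whether a crashed kubemacpool backend can block pod creation in other namespaces.

    # Hedged sketch: confirm the actual webhook configuration name first.
    oc get mutatingwebhookconfigurations

    # Print each webhook's name, failure policy, and namespace selector.
    # "kubemacpool-mutator" is an assumed name; substitute the one listed above.
    oc get mutatingwebhookconfiguration kubemacpool-mutator \
      -o jsonpath='{range .webhooks[*]}{.name}{"\t"}{.failurePolicy}{"\t"}{.namespaceSelector}{"\n"}{end}'

If a webhook matches pods in openshift-ovn-kubernetes with failurePolicy: Fail, any pod-create request there is rejected while the kubemacpool manager is down, which matches the behavior described in the steps above.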
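A non-interactive variant of Step 1, as a sketch: it assumes the manager is the first container in the deployment's pod template and that overriding its command with /bin/false is enough to make it crash, mirroring the manual edit of /manager to "false".

    # Hedged sketch: force CrashLoopBackOff without an interactive editor.
    # Assumes the manager is containers[0] in the pod template.
    oc patch deployment kubemacpool-mac-controller-manager -n openshift-cnv \
      --type=json \
      -p='[{"op":"add","path":"/spec/template/spec/containers/0/command","value":["/bin/false"]}]'

    # The rolled-out pod should now crash-loop:
    oc get pods -n openshift-cnv | grep -i crash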
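A generic variant of Test Case 2 that avoids hardcoding pod names and compares the before/after pod counts automatically; the 300s timeout is an arbitrary choice.

    # Hedged sketch: delete every pod in the namespace and verify recovery.
    ns=openshift-ovn-kubernetes
    before=$(oc get pods -n "$ns" --no-headers | wc -l)

    oc delete pods --all -n "$ns"

    # Wait for the replacement pods to become Ready.
    oc wait pods --all -n "$ns" --for=condition=Ready --timeout=300s

    after=$(oc get pods -n "$ns" --no-headers | wc -l)
    echo "before=$before after=$after"
    [ "$before" -eq "$after" ] && echo "PASS: pod count restored" || echo "FAIL: pod count differs"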