Bug 1812710
| Summary: | When CNV is installed, the kubemacpool mutating webhook interferes with the openshift-ovn-kubernetes ns | | |
|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | William Caban <william.caban> |
| Component: | Networking | Assignee: | Petr Horáček <phoracek> |
| Status: | CLOSED ERRATA | QA Contact: | Meni Yakove <myakove> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 2.2.0 | CC: | aos-bugs, atragler, augol, bschmaus, bshephar, cnv-qe-bugs, danken, dsafford, fsimonce, gkapoor, myakove, ncredi, vpagar |
| Target Milestone: | --- | Flags: | ncredi: needinfo+ |
| Target Release: | 2.3.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | kubemacpool-container-v2.3.0-24 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-05-04 19:10:58 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1771572 | | |
Description
William Caban
2020-03-11 22:35:48 UTC
We need to backport https://github.com/k8snetworkplumbingwg/kubemacpool/commit/02a7388b7c98336674f7425aab30686e69536966 to 2.3, it seems. I'm on it.

Test Environment:
=================
$ oc version
Client Version: 4.4.0-0.nightly-2020-02-17-022408
Server Version: 4.4.0-rc.4
Kubernetes Version: v1.17.1
CNV Version
$ oc get csv -n openshift-cnv | awk ' { print $4 } ' | tail -n1
2.3.0
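To also confirm that the running build matches the Fixed In Version (kubemacpool-container-v2.3.0-24), the image can be read from the deployment. A sketch, assuming a standard single-container pod spec:
$ oc get deployment kubemacpool-mac-controller-manager -n openshift-cnv \
    -o jsonpath='{.spec.template.spec.containers[*].image}{"\n"}'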
Steps:
=====
Bug Summary: When kubemacpool is broken, it blocks pod creation in openshift-ovn-kubernetes and the cluster may become unusable.
Fix: Even when kubemacpool is in CrashLoopBackOff, pods that are killed/deleted in the openshift-ovn-kubernetes namespace should be started again.
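One way to see why a broken kubemacpool can block pod creation is to inspect its mutating webhook configuration; with the fix, requests from openshift-ovn-kubernetes should no longer be blocked by the webhook (the exact mechanism is in the commit linked above). A sketch, assuming the webhook object name contains "kubemacpool":
$ oc get mutatingwebhookconfiguration | grep -i kubemacpool
$ oc get mutatingwebhookconfiguration <name-from-above> -o yaml | grep -B2 -A6 -e failurePolicy -e namespaceSelector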
1. Put kubemacpool into CrashLoopBackOff:
-- oc edit -n openshift-cnv deployment kubemacpool-mac-controller-manager
Change the container command /manager to "false" (or anything invalid) and save the file. A non-interactive alternative is sketched after the status output below.
Check the pod status; the pod should move to the CrashLoopBackOff state:
-- $ oc get pods -n openshift-cnv | grep -i crash
kubemacpool-mac-controller-manager-6767f6c687-g98n5 0/1 CrashLoopBackOff 13 43m
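A non-interactive way to reproduce the same breakage (a sketch; the JSON-patch path assumes the manager is the first container and that its command field is set in the pod spec):
$ oc patch deployment kubemacpool-mac-controller-manager -n openshift-cnv \
    --type=json \
    -p '[{"op": "replace", "path": "/spec/template/spec/containers/0/command", "value": ["false"]}]'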
2. Delete one of the ovnkube-node pods in the openshift-ovn-kubernetes namespace:
-- oc delete pods ovnkube-node-kjx8z -n openshift-ovn-kubernetes
3. Make sure the replacement pod comes up:
oc get pods -n openshift-ovn-kubernetes
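Instead of polling manually, `oc wait` can block until every pod in the namespace reports Ready (a sketch; the timeout value is arbitrary):
$ oc wait --for=condition=Ready pods --all -n openshift-ovn-kubernetes --timeout=120s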
Test Case 2: kill all of the pods and make sure they come back up.
============
for i in ovnkube-master-l8tc9 ovnkube-master-sgx6b ovnkube-master-zsxlf ovnkube-node-625gd ovnkube-node-6fd2d ovnkube-node-7n7x5 ovnkube-node-dn7x8 ovnkube-node-hxgx8 ovnkube-node-j4pkn; do oc delete pods $i -n openshift-ovn-kubernetes; done
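An equivalent bulk deletion that does not depend on the current pod names (a sketch; it deletes every pod in the namespace and relies on the owning controllers to recreate them):
$ oc delete pods --all -n openshift-ovn-kubernetes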
Check the status of the pods afterwards; the number of pods before and after should be the same:
$ oc get pods -n openshift-ovn-kubernetes
NAME READY STATUS RESTARTS AGE
ovnkube-master-5655k 4/4 Running 0 87s
ovnkube-master-9mv4v 4/4 Running 0 84s
ovnkube-master-c29gg 4/4 Running 0 75s
ovnkube-node-2cg64 2/2 Running 0 46s
ovnkube-node-2n86d 2/2 Running 0 55s
ovnkube-node-5fhqk 2/2 Running 0 70s
ovnkube-node-74cqs 2/2 Running 0 57s
ovnkube-node-mqbrr 2/2 Running 0 68s
ovnkube-node-tdjvg 2/2 Running 0 53s
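To make the before/after comparison explicit, count the pods instead of eyeballing the listing; run this before and after the deletion and the numbers should match:
$ oc get pods -n openshift-ovn-kubernetes --no-headers | wc -l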
As this is becoming important, I raised the Customer Escalation Flag.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:2011