Bug 1853889
Summary: | [ovirt] test case "Managed cluster should have no crashlooping pods in core namespaces over four minutes" 100% failure | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Gal Zaidman <gzaidman> |
Component: | Installer | Assignee: | Gal Zaidman <gzaidman> |
Installer sub component: | OpenShift on RHV | QA Contact: | Lucie Leistnerova <lleistne> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | urgent | ||
Priority: | unspecified | CC: | aos-bugs, hpopal, maszulik, mfojtik, ssonigra, wking, xtian |
Version: | 4.4 | Keywords: | Reopened |
Target Milestone: | --- | ||
Target Release: | 4.6.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | No Doc Update | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-10-27 16:12:20 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1858498 |
Description
Gal Zaidman
2020-07-05 09:38:01 UTC
I know that there is an open bug on that test case [1], but I believe this failure has a different cause.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1842002

This is just a wild guess because I don't know the code, but I see that the other pods have ports.containerPort, startupProbe, livenessProbe and readinessProbe, but kube-controller-manager-recovery-controller doesn't.

This is being handled in https://bugzilla.redhat.com/show_bug.cgi?id=1851389 and backports to older versions are on the way.

*** This bug has been marked as a duplicate of bug 1851389 ***

(In reply to Maciej Szulik from comment #4)
> This is being handled in https://bugzilla.redhat.com/show_bug.cgi?id=1851389
> and backports to older versions are on the way.
>
> *** This bug has been marked as a duplicate of bug 1851389 ***

I think that this failure is caused by the fix [2] for bug [1].

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1851389
[2] https://github.com/openshift/cluster-kube-controller-manager-operator/pull/421

Both are needed and both are in progress. The de-duplication still makes sense.

*** This bug has been marked as a duplicate of bug 1851389 ***

Sorry, I edited the fields and didn't see that you had closed it again.

*** This bug has been marked as a duplicate of bug 1851389 ***

Reopening this after a talk with Maciej Szulik. The new suspect is the combination of [1] and [2]. PR [1] added logic for checking port availability in recovery-controller. PR [2] changed the port of HAProxy to 9443, due to yet another port conflict. Together, the two PRs cause the kube-controller-manager-recovery-controller to crash loop.

[1] https://github.com/openshift/cluster-kube-controller-manager-operator/pull/421
[2] https://github.com/openshift/baremetal-runtimecfg/pull/59

Yeah, the links that Gal pointed to in the previous comment are the main reason this is failing consistently. I wonder why this popped up only now, when cluster-policy-controller has been using port 9443 since at least version 4.3.

I'm moving this to the oVirt team to fix it.

My bad, it's the recovery controller that is using 9443, not cpc.

Verified with CI run results.

Hello Team,
Will the solution to this issue be backported to 4.4?

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
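
For readers unfamiliar with this failure mode, here is a minimal sketch of how a port availability check reacts when another process already holds the port, which is the situation described in this bug for port 9443. It is not the logic from the linked PRs. The helper names `port_is_free` and `occupy_port` are hypothetical, and an OS-assigned port stands in for 9443 so the example runs anywhere.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can bind to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically EADDRINUSE: some other listener owns the port.
            return False
        return True


def occupy_port(host: str = "127.0.0.1") -> socket.socket:
    """Hypothetical stand-in for a service (e.g. a proxy) already holding a port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, 0))  # port 0 asks the OS for any free port
    listener.listen(1)
    return listener
```

If a container runs a check like this at startup and exits when the port is taken, and a second service is configured to listen on that same port on the same host, the container will fail on every restart. That would surface as a crash loop like the one this test case reports.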