Bug 1818106
| Summary: | [upgrade] API was unreachable during disruption for at least... | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Hongkai Liu <hongkliu> |
| Component: | Build | Assignee: | Gabe Montero <gmontero> |
| Status: | CLOSED DUPLICATE | QA Contact: | wewang <wewang> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 4.3.z | CC: | akashem, aos-bugs, bparees, eparis, jokerman, lmohanty, mfojtik, wking, wzheng |
| Target Milestone: | --- | Keywords: | Upgrades |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | buildcop | ||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-04-03 20:15:38 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Hongkai Liu
2020-03-27 17:27:44 UTC

Actual failure for [1] was:

fail [github.com/openshift/origin/test/extended/util/disruption/disruption.go:226]: Mar 27 02:50:23.048: API was unreachable during disruption for at least 4m21s of 48m9s (9%):

Not sure if this 4.2.26 -> 4.3.0-0.nightly-2020-03-27-012404 failure is ingress/routing or the API server itself. I guessed ingress/routing for bug 1818104, so going with the API server here.

[1]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/23264

Might also be an SDN issue like bug 1793635.

This shows up as "180 (14% of all failures) API was unreachable during disruption" in the last two days of CI runs.

I checked all the `clusteroperator` objects; all reported OK except for `kube-apiserver`:

curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/23264/artifacts/e2e-aws-upgrade/clusteroperators.json | jq '.items | .[] | select(.metadata.name == "kube-apiserver") | .status.conditions[] | select(.type == "Upgradeable")'

    {
      "lastTransitionTime": "2020-03-27T02:07:36Z",
      "message": "DefaultSecurityContextConstraintsUpgradeable: Default SecurityContextConstraints object(s) have mutated [anyuid hostmount-anyuid privileged]",
      "reason": "DefaultSecurityContextConstraints_Mutated",
      "status": "False",
      "type": "Upgradeable"
    }

This is a known issue: the e2e test suite is changing the default SCC. In 4.3, any mutation of the default SCC will prevent upgrade. The resolution is to delete the default SCC object(s) that have been mutated and then delete any of the `openshift-apiserver` Pods in the `openshift-apiserver` namespace.

This is a known issue - the api/auth team had a conversation with Ben Parees about this on Slack - https://coreos.slack.com/archives/CB48XQ4KZ/p1585580675154600

Basically, what's happening here is that the e2e test suite is changing the default SCC. It is adding `system:serviceaccount:e2e-test-s2i-build-root-4qr5v:builder` to `users` of the default SCC.

- users:
  - system:admin
  - system:serviceaccount:openshift-infra:build-controller
  - system:serviceaccount:e2e-test-s2i-build-root-4qr5v:builder

The default one that ships with the cluster does not have system:serviceaccount:e2e-test-s2i-build-root-4qr5v:builder

Assigning it to the infrastructure team for now so that they can validate this.

Hi eparis, we verified this, please see my comment above - https://bugzilla.redhat.com/show_bug.cgi?id=1818106#c4

Gabe, this was a symptom of the SCC mutation e2e you fixed recently. If you've already got a bug for it, just dupe this against that.

Gabe, not sure which branches you put the e2e change into, but it sounds like we probably need it at least back to 4.3 to unblock upgrade jobs.

Ben, https://github.com/openshift/origin/pull/24821 is awaiting cherrypick approval for 4.3, and https://github.com/openshift/origin/pull/24822 for 4.4 is in the same boat.

The 4.5 bug that merged is 1819276; the 4.3.z bug is 1820266 ... I'll use that for the dupe.

*** This bug has been marked as a duplicate of bug 1820266 ***
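
For anyone triaging a similar failure, the two checks described in this bug (which operators report `Upgradeable` as `False`, and which `users` entries were added to an SCC) can be sketched in Python. This is a minimal sketch, not the tooling used here. The function names `non_upgradeable` and `added_scc_users` are hypothetical, the input shape is assumed to match the `clusteroperators.json` artifact queried with jq above, and the "shipped" user list is assumed from the comment that the default SCC lacks the e2e `builder` service account.

```python
import json


def non_upgradeable(cluster_operators):
    """Return {operator name: message} for every operator whose
    Upgradeable condition has status "False"."""
    blocked = {}
    for item in cluster_operators.get("items", []):
        name = item.get("metadata", {}).get("name", "")
        for cond in item.get("status", {}).get("conditions", []):
            if cond.get("type") == "Upgradeable" and cond.get("status") == "False":
                blocked[name] = cond.get("message", "")
    return blocked


def added_scc_users(shipped_users, current_users):
    """Return users present on the live SCC but absent from the shipped
    default, in the order they appear on the live object."""
    shipped = set(shipped_users)
    return [user for user in current_users if user not in shipped]


if __name__ == "__main__":
    # Sample data copied from the bug text; a real run would load the
    # clusteroperators.json artifact with json.load() instead.
    sample = {
        "items": [
            {
                "metadata": {"name": "kube-apiserver"},
                "status": {
                    "conditions": [
                        {
                            "type": "Upgradeable",
                            "status": "False",
                            "reason": "DefaultSecurityContextConstraints_Mutated",
                            "message": "DefaultSecurityContextConstraintsUpgradeable: "
                            "Default SecurityContextConstraints object(s) have "
                            "mutated [anyuid hostmount-anyuid privileged]",
                        }
                    ]
                },
            }
        ]
    }
    print(json.dumps(non_upgradeable(sample), indent=2))

    # Assumed shipped default: the listed users minus the e2e builder account.
    shipped = [
        "system:admin",
        "system:serviceaccount:openshift-infra:build-controller",
    ]
    current = shipped + [
        "system:serviceaccount:e2e-test-s2i-build-root-4qr5v:builder",
    ]
    print(added_scc_users(shipped, current))
```

Checking every operator rather than only `kube-apiserver` avoids having to know in advance which one is blocking the upgrade.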