Bug 1982868
Summary: | 4.8 ManagementCPUsOverride admission plugin blocks 4.7 deployments on empty topology | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | W. Trevor King <wking> | |
Component: | Node | Assignee: | Artyom <alukiano> | |
Node sub component: | Autoscaler (HPA, VPA) | QA Contact: | Sunil Choudhary <schoudha> | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | medium | |||
Priority: | high | CC: | alukiano, aos-bugs, nagrawal, rphillips | |
Version: | 4.8 | Keywords: | Reopened | |
Target Milestone: | --- | |||
Target Release: | 4.9.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | | Doc Type: | No Doc Update |
Doc Text: | | Story Points: | --- |
Clone Of: | ||||
Clones: | 1982873 | Environment: | |
Last Closed: | 2021-10-18 17:39:54 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 1982873 | |||
Bug Blocks: | 1995714 |
Description
W. Trevor King
2021-07-15 21:12:33 UTC
> So to support rollbacks to 4.7, the 4.8 admission plugin probably needs to locally hard-code the CRD's 'HighlyAvailable' default.
I can probably figure out how to actually do this ;)
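A minimal Go sketch of the idea in the quoted comment. It assumes simplified stand-in types. `TopologyMode`, `InfrastructureStatus`, `strictTopology` and `defaultedTopology` are hypothetical names, not the plugin's real API. The sketch contrasts the observed rejection, which uses the same message as the failure reported later in this bug, with falling back to a locally hard-coded `HighlyAvailable` default when either status field is empty.

```go
package main

import (
	"errors"
	"fmt"
)

// TopologyMode is a hypothetical, simplified stand-in for the topology type
// carried in the Infrastructure resource's status.
type TopologyMode string

// HighlyAvailable is the CRD default the comment proposes hard-coding locally.
const HighlyAvailable TopologyMode = "HighlyAvailable"

// InfrastructureStatus is a hypothetical, trimmed-down view of the two
// status fields the admission plugin inspects.
type InfrastructureStatus struct {
	ControlPlaneTopology   TopologyMode
	InfrastructureTopology TopologyMode
}

// errEmptyTopology mirrors the rejection message seen in the CVO log.
var errEmptyTopology = errors.New("infrastructure resource has empty status.controlPlaneTopology or status.infrastructureTopology")

// strictTopology models the failing behavior: any empty field rejects the pod.
func strictTopology(s InfrastructureStatus) (InfrastructureStatus, error) {
	if s.ControlPlaneTopology == "" || s.InfrastructureTopology == "" {
		return s, errEmptyTopology
	}
	return s, nil
}

// defaultedTopology models the proposed fix: an empty field falls back to the
// hard-coded HighlyAvailable default instead of causing a rejection.
func defaultedTopology(s InfrastructureStatus) InfrastructureStatus {
	if s.ControlPlaneTopology == "" {
		s.ControlPlaneTopology = HighlyAvailable
	}
	if s.InfrastructureTopology == "" {
		s.InfrastructureTopology = HighlyAvailable
	}
	return s
}

func main() {
	// An Infrastructure status with both topology fields empty, as the
	// rollback failure suggests.
	rolledBack := InfrastructureStatus{}

	if _, err := strictTopology(rolledBack); err != nil {
		fmt.Println("strict:", err)
	}
	fmt.Printf("defaulted: %+v\n", defaultedTopology(rolledBack))
}
```

Defaulting inside the plugin, instead of relying on the CRD to supply the default, keeps pod admission working even when the Infrastructure resource carries no topology values. That is presumably the situation after rolling back to a release whose CRD does not provide them.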
Bug 1977351 is still ON_QA, and since the PR attached to that one added the lines I'm adjusting, maybe we should close this one as a dup and hang my 4.8 PR on bug 1977351 instead?

*** This bug has been marked as a duplicate of bug 1977351 ***

4.7 -> 4.8 -> 4.7 jobs are still failing [1]. Checking a recent run, it is still timing out [2]:

{"component":"entrypoint","file":"prow/entrypoint/run.go:165","func":"k8s.io/test-infra/prow/entrypoint.Options.ExecuteProcess","level":"error","msg":"Process did not finish before 3h0m0s timeout","severity":"error","time":"2021-09-04T02:50:49Z"}

From the ClusterVersion [3], the job got stuck on the return leg from 4.8.0-0.ci-2021-09-03-100015 to 4.7.29, with the not particularly informative:

Working towards 4.7.29: 68 of 669 done (10% complete)

It's sticking earlier now, on some etcd thing:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade-rollback/1433931475606048768/artifacts/e2e-aws-upgrade-rollback/gather-extra/artifacts/pods/openshift-cluster-version_cluster-version-operator-b7444fb9-7qzqx_cluster-version-operator.log | grep 'Running sync.*in state\|Result of work' | tail -n6
I0904 02:39:46.601247 1 task_graph.go:555] Result of work: [deployment openshift-etcd-operator/etcd-operator has a replica failure FailedCreate: pods "etcd-operator-69bb77696-" is forbidden: autoscaling.openshift.io/ManagementCPUsOverride infrastructure resource has empty status.controlPlaneTopology or status.infrastructureTopology]
I0904 02:42:43.862556 1 sync_worker.go:549] Running sync registry.build02.ci.openshift.org/ci-op-w45p3w7d/release@sha256:b10034bedb4bf08a393462caf4c3fac8f9e4646d3b49d05915850dce0145cf15 (force=true) on generation 3 in state Updating at attempt 11
I0904 02:48:25.775750 1 task_graph.go:555] Result of work: [deployment openshift-etcd-operator/etcd-operator has a replica failure FailedCreate: pods "etcd-operator-69bb77696-" is forbidden: autoscaling.openshift.io/ManagementCPUsOverride infrastructure resource has empty status.controlPlaneTopology or status.infrastructureTopology]
I0904 02:51:45.280411 1 sync_worker.go:549] Running sync registry.build02.ci.openshift.org/ci-op-w45p3w7d/release@sha256:b10034bedb4bf08a393462caf4c3fac8f9e4646d3b49d05915850dce0145cf15 (force=true) on generation 3 in state Updating at attempt 12
I0904 02:57:27.193759 1 task_graph.go:555] Result of work: [deployment openshift-etcd-operator/etcd-operator has a replica failure FailedCreate: pods "etcd-operator-69bb77696-" is forbidden: autoscaling.openshift.io/ManagementCPUsOverride infrastructure resource has empty status.controlPlaneTopology or status.infrastructureTopology]
I0904 03:00:41.110856 1 sync_worker.go:549] Running sync registry.build02.ci.openshift.org/ci-op-w45p3w7d/release@sha256:b10034bedb4bf08a393462caf4c3fac8f9e4646d3b49d05915850dce0145cf15 (force=true) on generation 3 in state Updating at attempt 13

So we are still suffering from this issue.

[1]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.8-informing#periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade-rollback
[2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade-rollback/1433931475606048768#1:build-log.txt%3A172
[3]: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade-rollback/1433931475606048768/artifacts/e2e-aws-upgrade-rollback/gather-extra/artifacts/clusterversion.json

Hi folks, the relevant PR was merged for 4.9, so we should check the upgrade and roll-back flow for 4.9->4.8->4.9. Once we verify it, the cherry-pick https://github.com/openshift/kubernetes/pull/895 can be merged and we can verify the flow 4.8->4.7->4.8.
I can see https://testgrid.k8s.io/redhat-openshift-ocp-release-4.9-informing#periodic-ci-openshift-release-master-ci-4.9-e2e-aws-upgrade-rollback passed (it has some test failures, but the deployment passed). I think we can move it to verified.

Thanks Artyom, moving to verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759