Bug 1977351 - CVO pod skipped by workload partitioning with incorrect error stating cluster is not SNO
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.8.0
Assignee: Artyom
QA Contact: Sunil Choudhary
Depends On: 1976379
Reported: 2021-06-29 13:57 UTC by OpenShift BugZilla Robot
Modified: 2021-07-27 23:14 UTC
CC List: 10 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Last Closed: 2021-07-27 23:13:47 UTC
Target Upstream Version:

Attachments

System ID Private Priority Status Summary Last Updated
Github openshift kubernetes pull 833 0 None open [release-4.8] Bug 1977351: UPSTREAM: <carry>: Reject the pod creation when we can not decide the cluster type 2021-06-29 13:58:26 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 23:14:11 UTC

Description OpenShift BugZilla Robot 2021-06-29 13:57:58 UTC
+++ This bug was initially created as a clone of Bug #1976379 +++

Created attachment 1794553 [details]
must-gather from cluster where this occurred

Description of problem:
Pod "cluster-version-operator-89bf5cdb5-4qhhh" in the openshift-cluster-version namespace was not handled by the workload partitioning pod mutation logic. A warning annotation was added to the pod:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    workload.openshift.io/warning: only single-node clusters support workload partitioning

Version-Release number of selected component (if applicable): 4.8.0-0.nightly-2021-06-24-222938

How reproducible: unknown

Steps to Reproduce:
1. Cluster installed
2. "oc describe node" shows 20m CPU requests for this pod

Actual results:
  openshift-cluster-version                         cluster-version-operator-89bf5cdb5-4qhhh                        20m (0%)      0 (0%)      50Mi (0%)        0 (0%)         4h   

Expected results:
  openshift-cluster-version                         cluster-version-operator-89bf5cdb5-4qhhh                        0 (0%)      0 (0%)      50Mi (0%)        0 (0%)         4h   

Additional info:
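For context, the intended behavior per the workload partitioning enhancement is that on a single-node cluster with partitioning enabled, an opted-in management pod's CPU requests are moved into annotations, so the scheduler-visible request drops to 0; on any other cluster the pod only gets the warning annotation. A minimal Python sketch of that decision (not the real admission plugin; the `cpu_m` integer-millicores field and the function names are simplifications for illustration):

```python
import json

MILLI_CPU_TO_SHARES = 1024  # kernel CPU shares corresponding to one full core


def milli_cpu_to_shares(milli_cpu: int) -> int:
    # Same conversion kubelet uses: shares = milliCPU * 1024 / 1000, minimum 2.
    return max(2, milli_cpu * MILLI_CPU_TO_SHARES // 1000)


def mutate_management_pod(pod: dict, single_node: bool, partitioning_enabled: bool) -> dict:
    """Sketch: move container CPU requests into workload annotations."""
    annotations = pod.setdefault("metadata", {}).setdefault("annotations", {})
    if not (single_node and partitioning_enabled):
        # The path the CVO pod incorrectly hit in this bug.
        annotations["workload.openshift.io/warning"] = (
            "only single-node clusters support workload partitioning")
        return pod
    for container in pod["spec"]["containers"]:
        requests = container.get("resources", {}).get("requests", {})
        milli = requests.pop("cpu_m", None)  # strip the CPU request...
        if milli is not None:
            # ...and record it as an annotation CRI-O can act on instead.
            annotations["resources.workload.openshift.io/" + container["name"]] = \
                json.dumps({"cpushares": milli_cpu_to_shares(milli)})
    return pod
```

Under this sketch, the 20m request in the Actual results above would be stripped and recorded as a cpushares annotation, matching the 0 (0%) in the Expected results.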

--- Additional comment from alukiano on 2021-06-27 11:31:08 UTC ---

Can you please provide the installer debug log?

Comment 4 Xingxing Xia 2021-07-05 10:40:28 UTC
Following the steps in the 4.9 clone bug 1976379#c3, I tested the latest 4.8 non-SNO env (4.8.0-0.nightly-2021-07-04-112043); the issue still exists.
Checked its latest o/k (openshift/kubernetes) commit:
oc adm release info --commits registry.ci.openshift.org/ocp/release:4.8.0-0.nightly-2021-07-04-112043 | grep hyperkube
  hyperkube    https://github.com/openshift/kubernetes    f36aa364667...

https://github.com/openshift/kubernetes/blob/f36aa364667/openshift-kube-apiserver/admission/autoscaling/managementcpusoverride/admission.go#L183-L186 already contains the PR code. Thus moving back to ASSIGNED.

Comment 5 Artyom 2021-07-05 11:10:50 UTC
The problem is that this annotation was added on an SNO cluster with workload partitioning enabled, when it should not have been.
It is expected to have this annotation on pods in a non-SNO cluster.

Can you please verify the bug for the SNO cluster with the workload partitioning enabled?

Comment 6 Xingxing Xia 2021-07-05 12:22:18 UTC
(In reply to Artyom from comment #5)
Thanks for the clarification. Then it is better to have a QE colleague from the workload partitioning feature team verify this. Let me update.

Comment 8 Artyom 2021-07-06 09:47:51 UTC
Did you enable workload partitioning during setup? You need to provide an additional machine config manifest to enable it.
Please see - https://github.com/openshift/enhancements/blob/master/enhancements/workload-partitioning/management-workload-partitioning.md#example-manifests
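For reference, the example in the linked enhancement boils down to a MachineConfig that lays down two files on the node: a CRI-O drop-in defining the "management" workload type and a kubelet pinning file. A simplified sketch follows; the base64-encoded file contents are shown decoded in comments, the truncated data URLs are placeholders, and the cpuset values are purely illustrative — see the enhancement for the authoritative manifests:

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 02-master-workload-partitioning
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
      # Decoded contents of the CRI-O drop-in:
      #   [crio.runtime.workloads.management]
      #   activation_annotation = "target.workload.openshift.io/management"
      #   annotation_prefix = "resources.workload.openshift.io"
      #   resources = { "cpushares" = 0, "cpuset" = "0-1" }
      - path: /etc/crio/crio.conf.d/01-workload-partitioning
        contents:
          source: data:text/plain;charset=utf-8;base64,...
      # Decoded contents of the kubelet pinning file:
      #   {"management": {"cpuset": "0-1"}}
      - path: /etc/kubernetes/openshift-workload-pinning
        contents:
          source: data:text/plain;charset=utf-8;base64,...
```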

Comment 12 Neelesh Agrawal 2021-07-16 18:32:32 UTC
*** Bug 1982868 has been marked as a duplicate of this bug. ***

Comment 14 W. Trevor King 2021-07-22 05:06:27 UTC
Neelesh closed bug 1982868 as a dup of this one [1], but while this bug is now VERIFIED, 4.7 -> 4.8 -> 4.7 rollback jobs are still failing [2].  And a recent failure, from 4.7.20-x86_64 to 4.8.0-0.ci-2021-07-19-070057 and back [3], still blocks with [4]:

  deployment openshift-etcd-operator/etcd-operator has a replica failure FailedCreate: pods "etcd-operator-7b677856dc-" is forbidden: autoscaling.openshift.io/ManagementCPUsOverride infrastructure resource has empty status.controlPlaneTopology or status.infrastructureTopology

Did we want to move this back to ASSIGNED until we get that sorted out?  Or should I reopen bug 1982868 so we can handle it separately?

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1982868#c3
[2]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.8-informing#periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade-rollback
[3]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade-rollback/1417258388370231296
[4]: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade-rollback/1417258388370231296/artifacts/e2e-aws-upgrade-rollback/gather-extra/artifacts/clusterversion.json
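For what it's worth, the carried patch in the linked PR ("Reject the pod creation when we can not decide the cluster type") makes the admission plugin fail closed, which is exactly the FailedCreate symptom above when the infrastructure status is still empty mid-rollback. An illustrative sketch of that fail-closed check (class and function names are hypothetical, not the plugin's actual code):

```python
class AdmissionError(Exception):
    """Raised when the pod must be rejected outright (hypothetical name)."""


def admit_management_pod(control_plane_topology: str,
                         infrastructure_topology: str) -> bool:
    """Return True when the pod should get the SNO workload-partitioning treatment."""
    if not control_plane_topology or not infrastructure_topology:
        # Cluster type cannot be decided yet (e.g. mid-rollback, before the
        # infrastructure status is repopulated) -> reject instead of guessing.
        raise AdmissionError(
            "infrastructure resource has empty status.controlPlaneTopology "
            "or status.infrastructureTopology")
    return control_plane_topology == "SingleReplica"
```

Rejecting is a deliberate trade-off: a wrong guess would either mis-partition the pod or silently skip partitioning (this bug), whereas a rejection is retried by the controller once the status is populated.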

Comment 17 errata-xmlrpc 2021-07-27 23:13:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

