+++ This bug was initially created as a clone of Bug #2028217 +++

Tomas and Vadim noticed that, when a Deployment manifest leaves 'replicas' unset, the CVO ignores the property. This means that cluster admins can scale those Deployments up or, worse, down to 0, and the CVO will happily continue on without stomping them. Auditing 4.9.10:

  $ oc adm release extract --to manifests quay.io/openshift-release-dev/ocp-release:4.9.10-x86_64
  Extracted release payload from digest sha256:e1853d68d8ff093ec353ca7078b6b6df1533729688bb016b8208263ee7423f66 created at 2021-12-01T09:19:24Z
  $ for F in $(grep -rl 'kind: Deployment' manifests); do yaml2json < "${F}" | jq -r '.[] | select(.kind == "Deployment" and .spec.replicas == null).metadata | .namespace + " " + .name'; done | sort | uniq
  openshift-cluster-machine-approver machine-approver
  openshift-insights insights-operator
  openshift-network-operator network-operator

Those are all important operators, and I'm fairly confident that none of their maintainers expect "cluster admin scales them down to 0" to be a supported UX. We should have the CVO default Deployment replicas to 1 (the type's default [1]), so admins who decide they don't want a network operator pod, etc., have to use some more explicit, alarming API to remove those pods (e.g. setting spec.overrides in the ClusterVersion object to assume control of the resource themselves).

[1]: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/deployment-v1/#DeploymentSpec

--- Additional comment from wking on 2021-12-02 18:07:42 UTC ---

*** Bug 2028599 has been marked as a duplicate of this bug. ***
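For anyone without yaml2json and jq handy, the audit filter in the description can be sketched in Python. This is only an illustration of the same selection logic: the function name and the sample documents are hypothetical, and it assumes the manifests have already been parsed into dictionaries (for example with a YAML library).

```python
from typing import Any, Dict, Iterable, List


def deployments_without_replicas(docs: Iterable[Dict[str, Any]]) -> List[str]:
    """Return sorted, unique "namespace name" strings for Deployments
    whose spec.replicas is unset (missing or null)."""
    found = set()
    for doc in docs:
        # Skip empty documents and anything that is not a Deployment.
        if not isinstance(doc, dict) or doc.get("kind") != "Deployment":
            continue
        spec = doc.get("spec") or {}
        if spec.get("replicas") is None:
            meta = doc.get("metadata") or {}
            found.add(f'{meta.get("namespace", "")} {meta.get("name", "")}')
    # Note: code-point ordering; sort(1) may order differently in some locales.
    return sorted(found)


# Hypothetical sample input, shaped like parsed release manifests.
sample_docs = [
    {
        "kind": "Deployment",
        "metadata": {"namespace": "openshift-network-operator", "name": "network-operator"},
        "spec": {"template": {}},  # replicas left unset
    },
    {
        "kind": "Deployment",
        "metadata": {"namespace": "example-namespace", "name": "example-operator"},
        "spec": {"replicas": 1},  # replicas set explicitly, so not reported
    },
    {"kind": "Service", "metadata": {"namespace": "example-namespace", "name": "example"}},
]

for line in deployments_without_replicas(sample_docs):
    print(line)
```

Running it over the sample input reports only the Deployment that leaves replicas unset.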
Performing PR pre-merge verification:

1. Spin up a cluster with the PR using cluster-bot:

  launch openshift/cluster-version-operator#702 gcp

  $ oc get clusterversion
  NAME      VERSION                                                  AVAILABLE   PROGRESSING   SINCE   STATUS
  version   4.9.0-0.ci.test-2021-12-07-032421-ci-ln-i4b7ipb-latest   True        False         102m    Cluster version is 4.9.0-0.ci.test-2021-12-07-032421-ci-ln-i4b7ipb-latest

2. Check the replicas:

  $ oc -n openshift-network-operator get -o jsonpath='{.spec.replicas}{"\n"}' deployment network-operator
  1

3. Scale the deployment down to 0:

  $ oc -n openshift-network-operator scale --replicas 0 deployment/network-operator
  deployment.apps/network-operator scaled

4. After a few minutes, check the replicas again:

  $ oc -n openshift-network-operator get -o jsonpath='{.spec.replicas}{"\n"}' deployment network-operator
  1
  $ oc -n openshift-network-operator get pod
  NAME                                READY   STATUS    RESTARTS   AGE
  network-operator-59694fc78d-v85hj   1/1     Running   0          35s

The CVO defaults the Deployment's replicas to 1 and reverts the manual scale-down. The verification passed.
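The behavior verified above can be modeled with a small Python sketch. This is not the CVO's actual implementation (the CVO is written in Go), and the function names are hypothetical. It only illustrates the idea from the description: default an unset replicas to 1 in the required state, so that a live value of 0 counts as drift and gets overwritten.

```python
import copy
from typing import Any, Dict, Tuple

# Default for DeploymentSpec.replicas, per the Kubernetes API reference.
DEFAULT_REPLICAS = 1


def with_default_replicas(manifest: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the manifest with spec.replicas defaulted when unset."""
    required = copy.deepcopy(manifest)
    spec = required.get("spec") or {}
    required["spec"] = spec
    if spec.get("replicas") is None:
        spec["replicas"] = DEFAULT_REPLICAS
    return required


def reconcile_replicas(
    manifest: Dict[str, Any], live: Dict[str, Any]
) -> Tuple[Dict[str, Any], bool]:
    """Return (updated live object, whether replicas had to be changed)."""
    want = with_default_replicas(manifest)["spec"]["replicas"]
    updated = copy.deepcopy(live)
    spec = updated.get("spec") or {}
    updated["spec"] = spec
    if spec.get("replicas") == want:
        return updated, False
    # The live value drifted from the (defaulted) manifest: stomp it.
    spec["replicas"] = want
    return updated, True


# Hypothetical objects: the manifest leaves replicas unset and an admin
# has scaled the live Deployment down to 0.
manifest = {"kind": "Deployment", "spec": {"template": {}}}
live = {"kind": "Deployment", "spec": {"replicas": 0, "template": {}}}

updated, changed = reconcile_replicas(manifest, live)
print(updated["spec"]["replicas"], changed)
```

Without the defaulting step there would be no required value to compare against, which matches the pre-fix behavior where the scale-down was left in place.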
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.9.12 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:5214