Searched with https://search.ci.openshift.org/?search=creating+Deployment+object+failed+after+update+failed&maxAge=48h&context=1&type=bug%2Bjunit&name=periodic-ci-openshift-release-master-nightly-4.8-upgrade&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job and can still see the error:

Aug 02 13:37:41.192 - 71s E clusteroperator/monitoring condition/Degraded status/True reason/Failed to rollout the stack. Error: running task Updating Prometheus Operator failed: reconciling Prometheus Operator Deployment failed: creating Deployment object failed after update failed: object is being deleted: deployments.apps "prometheus-operator" already exists

Example:
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/20753/rehearse-20753-periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade/1422170933774258176
upgraded from 4.7.21 to 4.8.0-0.nightly-2021-07-31-065602

Error:
*************************************************************
Aug 02 13:37:41.192 E clusteroperator/monitoring condition/Available status/False reason/UpdatingPrometheusOperatorFailed changed: Rollout of the monitoring stack failed and is degraded. Please investigate the degraded status error.
Aug 02 13:37:41.192 E clusteroperator/monitoring condition/Degraded status/True reason/UpdatingPrometheusOperatorFailed changed: Failed to rollout the stack. Error: running task Updating Prometheus Operator failed: reconciling Prometheus Operator Deployment failed: creating Deployment object failed after update failed: object is being deleted: deployments.apps "prometheus-operator" already exists
Aug 02 13:37:41.192 - 71s E clusteroperator/monitoring condition/Available status/False reason/Rollout of the monitoring stack failed and is degraded. Please investigate the degraded status error.
Aug 02 13:37:41.192 - 71s E clusteroperator/monitoring condition/Degraded status/True reason/Failed to rollout the stack. Error: running task Updating Prometheus Operator failed: reconciling Prometheus Operator Deployment failed: creating Deployment object failed after update failed: object is being deleted: deployments.apps "prometheus-operator" already exists
Aug 02 13:37:43.169 E ns/openshift-service-ca-operator pod/service-ca-operator-699fdbb947-4cv54 node/ip-10-0-222-211.ec2.internal container/service-ca-operator reason/ContainerExit code/1 cause/Error
*************************************************************
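For context, the error is a race between deleting the old Deployment and creating its replacement: if the update fails and the operator falls back to delete-and-recreate, the Create can run while the old object still has a deletion timestamp and the API server rejects it with AlreadyExists. The Go sketch below is hypothetical (not the actual cluster-monitoring-operator code) and assumes a client-go typed clientset; it shows one way to wait for the deletion to finish before re-creating:

```go
package example

import (
	"context"
	"fmt"
	"time"

	appsv1 "k8s.io/api/apps/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// recreateDeployment is an illustrative helper (not part of CMO): it deletes
// an existing Deployment, waits until the object is actually gone, and only
// then creates the replacement. Creating too early races with the pending
// deletion and fails with "object is being deleted ... already exists".
func recreateDeployment(ctx context.Context, c kubernetes.Interface, d *appsv1.Deployment) error {
	err := c.AppsV1().Deployments(d.Namespace).Delete(ctx, d.Name, metav1.DeleteOptions{})
	if err != nil && !apierrors.IsNotFound(err) {
		return fmt.Errorf("deleting Deployment failed: %w", err)
	}

	// Poll until the old object disappears instead of creating immediately.
	err = wait.PollImmediate(time.Second, 2*time.Minute, func() (bool, error) {
		_, getErr := c.AppsV1().Deployments(d.Namespace).Get(ctx, d.Name, metav1.GetOptions{})
		if apierrors.IsNotFound(getErr) {
			return true, nil
		}
		return false, nil
	})
	if err != nil {
		return fmt.Errorf("waiting for Deployment deletion failed: %w", err)
	}

	if _, err := c.AppsV1().Deployments(d.Namespace).Create(ctx, d, metav1.CreateOptions{}); err != nil {
		return fmt.Errorf("creating Deployment object failed: %w", err)
	}
	return nil
}
```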
Could it be that the log message happens before the cluster is actually upgraded to 4.8 (e.g. the cluster monitoring operator's image version is still 4.7, which doesn't include the fix)?

Looking at a 4.7 > 4.8 job [0]:
* The message is logged at Aug 15 23:58:01.983 [1].
* The current CMO's logs start at Aug 16 00:26:31 and don't show the "failed to create Deployment ..." message [2].

I can't see any occurrence of the log message for 4.8 > 4.9 upgrades [3].

[0] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade/1427038807974219776
[1] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade/1427038807974219776/build-log.txt
[2] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade/1427038807974219776/artifacts/e2e-aws-upgrade/gather-extra/artifacts/pods/openshift-monitoring_cluster-monitoring-operator-9c6747665-tdfnk_cluster-monitoring-operator.log
[3] https://search.ci.openshift.org/?search=creating+Deployment+object+failed+after+update+failed&maxAge=336h&context=1&type=junit&name=periodic-ci-openshift-release-master-nightly-4.9-upgrade&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
I've searched for "creating Deployment object failed after update failed" in all jobs whose names contain "4.8" but not "4.7" (i.e. excluding 4.7 > 4.8 upgrade jobs) [1] and found nothing except for release-openshift-origin-installer-old-rhcos-e2e-aws-4.8. But this one is special: despite what the job name claims, it spins up a 4.7 cluster [2].

[1] https://search.ci.openshift.org/?search=creating+Deployment+object+failed+after+update+failed&maxAge=336h&context=1&type=junit&name=.*4.8.*&excludeName=.*4.7.*&maxMatches=5&maxBytes=20971520&groupBy=job
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1977095#c2
I think the explanation in comment 7 makes sense, so I'm setting this back to ON_QA. Is it reasonable to backport this to 4.7?
Actually, based on the CI confirmation outlined in comment 7, let's go all the way to VERIFIED.
https://github.com/openshift/cluster-monitoring-operator/pull/1333#issuecomment-902802506 explains why this probably shouldn't be VERIFIED. I'll move it back to ASSIGNED now and stop meddling in your bugs.
*** This bug has been marked as a duplicate of bug 2005205 ***