Bug 1982369

Summary: CMO fails to delete/recreate the deployment resource after '422 Unprocessable Entity' update response
Product: OpenShift Container Platform
Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: Monitoring
Assignee: Jayapriya Pai <janantha>
Status: CLOSED DUPLICATE
QA Contact: Junqi Zhao <juzhao>
Severity: medium
Priority: high
Version: 4.8
Target Release: 4.8.z
Keywords: Upgrades
Hardware: Unspecified
OS: Unspecified
CC: alegrand, amuller, anpicker, aos-bugs, dgrisonn, eparis, erooth, lcosic, spasquie, wking
Last Closed: 2021-09-17 06:21:57 UTC
Bug Depends On: 1949840, 1956308, 2005205, 2005206
Bug Blocks: 1996132
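
For context on the failure mode named in the Summary: when an Update to a Deployment is rejected with 422 Unprocessable Entity (the API server's response when an update touches an immutable field, surfaced by client-go as a StatusReasonInvalid error), an operator can fall back to deleting and recreating the object. A minimal client-go sketch of that fallback pattern (illustrative names, not CMO's actual reconcile code) shows where the race described in this bug comes from:

package reconcile

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// applyDeployment is a hypothetical helper, not CMO's real code.
func applyDeployment(ctx context.Context, c kubernetes.Interface, dep *appsv1.Deployment) error {
	_, err := c.AppsV1().Deployments(dep.Namespace).Update(ctx, dep, metav1.UpdateOptions{})
	if err == nil || !apierrors.IsInvalid(err) {
		return err // success, or an error other than 422 Invalid
	}
	// 422 Unprocessable Entity: the update hit an immutable field,
	// so fall back to delete + recreate.
	if err := c.AppsV1().Deployments(dep.Namespace).Delete(ctx, dep.Name, metav1.DeleteOptions{}); err != nil {
		return err
	}
	// RACE: Delete only marks the object with a deletionTimestamp.
	// Until garbage collection finishes, this Create is rejected with
	// "object is being deleted: deployments.apps ... already exists".
	_, err = c.AppsV1().Deployments(dep.Namespace).Create(ctx, dep, metav1.CreateOptions{})
	return err
}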

Comment 5 Junqi Zhao 2021-08-03 03:44:24 UTC
Searched with
https://search.ci.openshift.org/?search=creating+Deployment+object+failed+after+update+failed&maxAge=48h&context=1&type=bug%2Bjunit&name=periodic-ci-openshift-release-master-nightly-4.8-upgrade&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

and can still see the error:
Aug 02 13:37:41.192 - 71s   E clusteroperator/monitoring condition/Degraded status/True reason/Failed to rollout the stack. Error: running task Updating Prometheus Operator failed: reconciling Prometheus Operator Deployment failed: creating Deployment object failed after update failed: object is being deleted: deployments.apps "prometheus-operator" already exists

Example:
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/20753/rehearse-20753-periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade/1422170933774258176

Upgraded from 4.7.21 to 4.8.0-0.nightly-2021-07-31-065602. Error:
*************************************************************
Aug 02 13:37:41.192 E clusteroperator/monitoring condition/Available status/False reason/UpdatingPrometheusOperatorFailed changed: Rollout of the monitoring stack failed and is degraded. Please investigate the degraded status error.
Aug 02 13:37:41.192 E clusteroperator/monitoring condition/Degraded status/True reason/UpdatingPrometheusOperatorFailed changed: Failed to rollout the stack. Error: running task Updating Prometheus Operator failed: reconciling Prometheus Operator Deployment failed: creating Deployment object failed after update failed: object is being deleted: deployments.apps "prometheus-operator" already exists
Aug 02 13:37:41.192 - 71s   E clusteroperator/monitoring condition/Available status/False reason/Rollout of the monitoring stack failed and is degraded. Please investigate the degraded status error.
Aug 02 13:37:41.192 - 71s   E clusteroperator/monitoring condition/Degraded status/True reason/Failed to rollout the stack. Error: running task Updating Prometheus Operator failed: reconciling Prometheus Operator Deployment failed: creating Deployment object failed after update failed: object is being deleted: deployments.apps "prometheus-operator" already exists
Aug 02 13:37:43.169 E ns/openshift-service-ca-operator pod/service-ca-operator-699fdbb947-4cv54 node/ip-10-0-222-211.ec2.internal container/service-ca-operator reason/ContainerExit code/1 cause/Error
*************************************************************
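
The "object is being deleted ... already exists" message above is that race playing out: deletion in Kubernetes is asynchronous, so a Create for the same name issued before finalizers and garbage collection complete fails with AlreadyExists. One way to close the window, sketched here with client-go primitives and hypothetical names (the actual fix landed via bug 2005205, of which this report was later closed as a duplicate), is to poll until the API server reports NotFound before recreating:

package reconcile

import (
	"context"
	"time"

	appsv1 "k8s.io/api/apps/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// deleteAndRecreate waits out the asynchronous deletion before creating.
func deleteAndRecreate(ctx context.Context, c kubernetes.Interface, dep *appsv1.Deployment) error {
	err := c.AppsV1().Deployments(dep.Namespace).Delete(ctx, dep.Name, metav1.DeleteOptions{})
	if err != nil && !apierrors.IsNotFound(err) {
		return err
	}
	// Poll until the object is fully gone (finalizers cleared, GC done).
	if err := wait.PollImmediate(time.Second, 2*time.Minute, func() (bool, error) {
		_, getErr := c.AppsV1().Deployments(dep.Namespace).Get(ctx, dep.Name, metav1.GetOptions{})
		if apierrors.IsNotFound(getErr) {
			return true, nil // deletion complete
		}
		return false, getErr // keep polling while it exists; abort on real errors
	}); err != nil {
		return err
	}
	_, err = c.AppsV1().Deployments(dep.Namespace).Create(ctx, dep, metav1.CreateOptions{})
	return err
}

A watch or foreground deletion propagation would close the same window with less polling; the point of the sketch is only that the Create must not race the asynchronous delete.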

Comment 8 Simon Pasquier 2021-08-19 16:02:29 UTC
I've searched for "creating Deployment object failed after update failed" in all jobs whose names contain "4.8" but not "4.7" (e.g. excluding 4.7 -> 4.8 upgrade jobs) [1] and found nothing except release-openshift-origin-installer-old-rhcos-e2e-aws-4.8. But that job is special: despite what its name claims, it spins up a 4.7 cluster [2].

[1] https://search.ci.openshift.org/?search=creating+Deployment+object+failed+after+update+failed&maxAge=336h&context=1&type=junit&name=.*4.8.*&excludeName=.*4.7.*&maxMatches=5&maxBytes=20971520&groupBy=job
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1977095#c2

Comment 9 Scott Dodson 2021-08-20 16:00:45 UTC
I think the explanation in comment 7 makes sense, so I'm setting this back to ON_QA. Is this reasonable to backport to 4.7?

Comment 10 Scott Dodson 2021-08-20 16:02:47 UTC
Actually, based on the CI confirmation outlined in comment 7, let's go all the way to VERIFIED.

Comment 11 Scott Dodson 2021-08-20 17:12:12 UTC
https://github.com/openshift/cluster-monitoring-operator/pull/1333#issuecomment-902802506 explains why this probably shouldn't be VERIFIED. I'll move it back to ASSIGNED now and stop meddling in your bugs.

Comment 16 Jayapriya Pai 2021-09-17 06:21:57 UTC

*** This bug has been marked as a duplicate of bug 2005205 ***