Bug 1982369 - CMO fails to delete/recreate the deployment resource after '422 Unprocessable Entity' update response
Summary: CMO fails to delete/recreate the deployment resource after '422 Unprocessable Entity' update response
Keywords:
Status: CLOSED DUPLICATE of bug 2005205
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.8.z
Assignee: Jayapriya Pai
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On: 1949840 1956308 2005205 2005206
Blocks: 1996132
 
Reported: 2021-07-14 18:03 UTC by OpenShift BugZilla Robot
Modified: 2021-09-17 06:29 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-09-17 06:21:57 UTC
Target Upstream Version:
Embargoed:




Links:
Github openshift/cluster-monitoring-operator pull 1285 (open): [release-4.8] Bug 1982369: Fix deployment update with retry option (last updated 2021-07-14 18:03:49 UTC)
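[Editorial note] The linked PR title refers to retrying the deployment update. As an illustration only (not the actual cluster-monitoring-operator code), the sketch below shows one way such a flow can look with client-go: try Update, and if the API server answers 422 Unprocessable Entity, delete the object and retry the Create while the old Deployment is still terminating, which is exactly the window where "object is being deleted: ... already exists" is returned. The package and function names here are hypothetical.

package example

import (
	"context"
	"time"

	appsv1 "k8s.io/api/apps/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// updateOrRecreateDeployment is a hypothetical helper: try Update first and,
// if the API server rejects it with 422 (Invalid), fall back to delete plus
// create, retrying the Create while the old object is still being deleted.
func updateOrRecreateDeployment(ctx context.Context, c kubernetes.Interface, dep *appsv1.Deployment) error {
	_, err := c.AppsV1().Deployments(dep.Namespace).Update(ctx, dep, metav1.UpdateOptions{})
	if err == nil {
		return nil
	}
	// IsInvalid corresponds to a 422 Unprocessable Entity response.
	if !apierrors.IsInvalid(err) {
		return err
	}
	if err := c.AppsV1().Deployments(dep.Namespace).Delete(ctx, dep.Name, metav1.DeleteOptions{}); err != nil && !apierrors.IsNotFound(err) {
		return err
	}
	// Without a retry, an immediate Create can fail with
	// "object is being deleted: deployments.apps ... already exists"
	// because the old Deployment has not finished terminating yet.
	return wait.PollImmediate(time.Second, time.Minute, func() (bool, error) {
		_, err := c.AppsV1().Deployments(dep.Namespace).Create(ctx, dep, metav1.CreateOptions{})
		if apierrors.IsAlreadyExists(err) {
			return false, nil // old object still terminating; keep retrying
		}
		return err == nil, err
	})
}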

Comment 5 Junqi Zhao 2021-08-03 03:44:24 UTC
searched with
https://search.ci.openshift.org/?search=creating+Deployment+object+failed+after+update+failed&maxAge=48h&context=1&type=bug%2Bjunit&name=periodic-ci-openshift-release-master-nightly-4.8-upgrade&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

can still see the error:
Aug 02 13:37:41.192 - 71s   E clusteroperator/monitoring condition/Degraded status/True reason/Failed to rollout the stack. Error: running task Updating Prometheus Operator failed: reconciling Prometheus Operator Deployment failed: creating Deployment object failed after update failed: object is being deleted: deployments.apps "prometheus-operator" already exists

Example:
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/20753/rehearse-20753-periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade/1422170933774258176

Upgraded from 4.7.21 to 4.8.0-0.nightly-2021-07-31-065602. Error:
*************************************************************
Aug 02 13:37:41.192 E clusteroperator/monitoring condition/Available status/False reason/UpdatingPrometheusOperatorFailed changed: Rollout of the monitoring stack failed and is degraded. Please investigate the degraded status error.
Aug 02 13:37:41.192 E clusteroperator/monitoring condition/Degraded status/True reason/UpdatingPrometheusOperatorFailed changed: Failed to rollout the stack. Error: running task Updating Prometheus Operator failed: reconciling Prometheus Operator Deployment failed: creating Deployment object failed after update failed: object is being deleted: deployments.apps "prometheus-operator" already exists
Aug 02 13:37:41.192 - 71s   E clusteroperator/monitoring condition/Available status/False reason/Rollout of the monitoring stack failed and is degraded. Please investigate the degraded status error.
Aug 02 13:37:41.192 - 71s   E clusteroperator/monitoring condition/Degraded status/True reason/Failed to rollout the stack. Error: running task Updating Prometheus Operator failed: reconciling Prometheus Operator Deployment failed: creating Deployment object failed after update failed: object is being deleted: deployments.apps "prometheus-operator" already exists
Aug 02 13:37:43.169 E ns/openshift-service-ca-operator pod/service-ca-operator-699fdbb947-4cv54 node/ip-10-0-222-211.ec2.internal container/service-ca-operator reason/ContainerExit code/1 cause/Error
*************************************************************

Comment 8 Simon Pasquier 2021-08-19 16:02:29 UTC
I've searched for "creating Deployment object failed after update failed" in all jobs whose names contain "4.8" but not "4.7" (e.g. excluding 4.7 > 4.8 upgrade jobs) [1] and I've found nothing except for release-openshift-origin-installer-old-rhcos-e2e-aws-4.8. But this one is special because despite what the job name claims, it spins up a 4.7 cluster [2].

[1] https://search.ci.openshift.org/?search=creating+Deployment+object+failed+after+update+failed&maxAge=336h&context=1&type=junit&name=.*4.8.*&excludeName=.*4.7.*&maxMatches=5&maxBytes=20971520&groupBy=job
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1977095#c2

Comment 9 Scott Dodson 2021-08-20 16:00:45 UTC
I think the explanation in comment 7 makes sense, so I'm setting this back to ON_QA. Is this reasonable to backport to 4.7?

Comment 10 Scott Dodson 2021-08-20 16:02:47 UTC
Actually, based on the CI confirmation outlined in comment 7, let's go all the way to VERIFIED.

Comment 11 Scott Dodson 2021-08-20 17:12:12 UTC
https://github.com/openshift/cluster-monitoring-operator/pull/1333#issuecomment-902802506 explains why this probably shouldn't be VERIFIED. I'll move it back to ASSIGNED now and stop meddling in your bugs.

Comment 16 Jayapriya Pai 2021-09-17 06:21:57 UTC

*** This bug has been marked as a duplicate of bug 2005205 ***

