Bug 1956308 - CMO fails to delete/recreate the deployment resource after '422 Unprocessable Entity' update response
Keywords:
Status: ASSIGNED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Simon Pasquier
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-05-03 12:53 UTC by Simon Pasquier
Modified: 2021-05-04 07:02 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:



Description Simon Pasquier 2021-05-03 12:53:42 UTC
Description of problem:
When CMO tries to update a deployment, it might go Degraded/Unavailable for some time.

Version-Release number of selected component (if applicable):
4.8

How reproducible:
Not always, but it happens quite often in CI.
https://search.ci.openshift.org/?search=creating+Deployment+object+failed+after+update+failed&maxAge=168h&context=1&type=bug%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Steps to Reproduce:
1. Upgrade from 4.7 to 4.8.

Actual results:
The operator goes Degraded and Unavailable for a short period of time, the reason being '... Deployment failed: creating Deployment object failed after update failed: object is being deleted: deployments.apps "xxx" already exists'.

Expected results:
No error

Additional info:
https://github.com/openshift/cluster-monitoring-operator/blob/7f4925a7203622d70b3007fbddfb6bc5cce6c1d9/pkg/client/client.go#L716-L731

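The issue is probably that the delete operation is asynchronous: the deployment can still exist (marked for deletion) when CMO tries to recreate it, so the create call fails with "already exists". CMO should wait for the deployment to be effectively removed before trying to recreate it.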
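
A minimal sketch of the wait-before-recreate idea, not CMO's actual code. The deploymentAPI interface, the errNotFound/errAlreadyExists errors, the fakeAPI type and the runDemo helper are all hypothetical stand-ins for the Kubernetes client, used only to illustrate deleting, polling until the object is gone, and then creating:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Hypothetical stand-ins for the API server's NotFound / AlreadyExists errors.
var (
	errNotFound      = errors.New("not found")
	errAlreadyExists = errors.New("already exists")
)

// deploymentAPI is a hypothetical, minimal view of a deployment client.
type deploymentAPI interface {
	Get(ctx context.Context, name string) error
	Delete(ctx context.Context, name string) error
	Create(ctx context.Context, name string) error
}

// recreate deletes the named deployment, polls until Get reports
// errNotFound, and only then creates the deployment again.
func recreate(ctx context.Context, api deploymentAPI, name string, interval time.Duration) error {
	if err := api.Delete(ctx, name); err != nil && !errors.Is(err, errNotFound) {
		return fmt.Errorf("deleting %q: %w", name, err)
	}
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		err := api.Get(ctx, name)
		if errors.Is(err, errNotFound) {
			break // the object is effectively gone
		}
		if err != nil {
			return fmt.Errorf("polling %q: %w", name, err)
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("waiting for %q to be removed: %w", name, ctx.Err())
		case <-ticker.C:
		}
	}
	return api.Create(ctx, name)
}

// fakeAPI simulates asynchronous deletion: after Delete, the object stays
// visible for `lingering` more Get calls before it disappears.
type fakeAPI struct {
	exists, deleting bool
	lingering        int
}

func (f *fakeAPI) Get(ctx context.Context, name string) error {
	if f.deleting {
		if f.lingering > 0 {
			f.lingering--
			return nil
		}
		f.deleting, f.exists = false, false
	}
	if !f.exists {
		return errNotFound
	}
	return nil
}

func (f *fakeAPI) Delete(ctx context.Context, name string) error {
	if !f.exists {
		return errNotFound
	}
	f.deleting = true
	return nil
}

func (f *fakeAPI) Create(ctx context.Context, name string) error {
	if f.exists {
		return errAlreadyExists // same failure mode as in this report
	}
	f.exists = true
	return nil
}

// runDemo recreates a deployment whose deletion lingers for a few polls.
func runDemo(lingering int) (*fakeAPI, error) {
	api := &fakeAPI{exists: true, lingering: lingering}
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	return api, recreate(ctx, api, "example", time.Millisecond)
}

func main() {
	_, err := runDemo(3)
	fmt.Println("recreate error:", err)
}
```

In a real fix the polling would presumably go through the Kubernetes client with a bounded timeout; the context deadline above plays that role in the sketch.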

