Bug 2009530 - Deployment upgrade is failing availability check
Summary: Deployment upgrade is failing availability check
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Special Resource Operator
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.9.0
Assignee: Pablo Acevedo
QA Contact: Walid A.
URL:
Whiteboard:
Depends On: 2009424
Blocks:
 
Reported: 2021-09-30 21:21 UTC by dagray
Modified: 2021-10-18 17:52 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2009424
Environment:
Last Closed: 2021-10-18 17:51:57 UTC
Target Upstream Version:
Embargoed:




Links
Github openshift-psap special-resource-operator pull 104 (open): Bug 2009530: Fix OperatingSystem() return order (last updated 2021-10-05 13:35:11 UTC)
Github openshift special-resource-operator pull 39 (open): [release-4.9] Bug 2009530: poll.go Deployment availability bug fix and Veritas SpecialResource fixes (last updated 2021-09-30 21:22:12 UTC)
Red Hat Product Errata RHSA-2021:3759 (last updated 2021-10-18 17:52:15 UTC)

Description dagray 2021-09-30 21:21:06 UTC
+++ This bug was initially created as a clone of Bug #2009424 +++

Description of problem:

Github issue: https://github.com/openshift-psap/special-resource-operator/issues/94

When the image in a Deployment is updated, a new ReplicaSet is created for it. The availability check for the Deployment then gets stuck, because it keeps referring to the available replica count of the old ReplicaSet instead of the new one:

infoscale-vtas-licensing-controller-68649d5564 0 0 0 22h
infoscale-vtas-licensing-controller-78c4c47dfd 1 1 1 117m

2021-08-24T12:00:04.553Z INFO wait Checking ReplicaSet {"name": "infoscale-vtas-licensing-controller-68649d5564"}
2021-08-24T12:00:04.553Z INFO wait Waiting for availability of {"Kind": "Deployment: infoscale-vtas/infoscale-vtas-licensing-controller"}

2021-08-24T12:00:04.578Z INFO infoscale-vtas RECONCILE REQUEUE: Could not reconcile chart {"error": "Cannot reconcile hardware states: Failed to create state: templates/1000-license-container.yaml: After CRUD hooks failed: Could not wait for resource: Waiting too long for resource: timed out waiting for the condition"}
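For context only, and not the actual poll.go change from the linked pull requests: the fix amounts to checking availability against the ReplicaSet that belongs to the Deployment's current revision instead of whichever owned ReplicaSet happens to be inspected first. A minimal client-go sketch of that idea, assuming the standard deployment.kubernetes.io/revision annotation and hypothetical package and function names:

package wait // hypothetical package name, for illustration only

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deploymentAvailable reports whether the ReplicaSet belonging to the
// Deployment's current revision has the desired number of available replicas.
// ReplicaSets left over from earlier rollouts (scaled to 0, like
// infoscale-vtas-licensing-controller-68649d5564 above) are skipped, which is
// the part the original check got wrong.
func deploymentAvailable(ctx context.Context, c kubernetes.Interface, ns, name string) (bool, error) {
	dep, err := c.AppsV1().Deployments(ns).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return false, err
	}
	wantRev := dep.Annotations["deployment.kubernetes.io/revision"]

	sel, err := metav1.LabelSelectorAsSelector(dep.Spec.Selector)
	if err != nil {
		return false, err
	}
	rsList, err := c.AppsV1().ReplicaSets(ns).List(ctx, metav1.ListOptions{LabelSelector: sel.String()})
	if err != nil {
		return false, err
	}

	for i := range rsList.Items {
		rs := &rsList.Items[i]
		if !metav1.IsControlledBy(rs, dep) {
			continue // not owned by this Deployment
		}
		if rs.Annotations["deployment.kubernetes.io/revision"] != wantRev {
			continue // old ReplicaSet from a previous rollout; ignore it
		}
		desired := int32(1)
		if dep.Spec.Replicas != nil {
			desired = *dep.Spec.Replicas
		}
		return rs.Status.AvailableReplicas >= desired, nil
	}
	// The ReplicaSet for the current revision has not been created yet.
	return false, nil
}

Keying on the revision annotation is one way to tell the current ReplicaSet apart from older ones; comparing the Deployment's own status.UpdatedReplicas and status.AvailableReplicas against spec.Replicas would be another.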


Version-Release number of selected component (if applicable): 4.9


How reproducible:


Steps to Reproduce:
1. Create a CR that produces a Deployment.
2. Update the CR to use an upgraded Helm chart, for example one that changes the image (see the sketch below).
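
For illustration only, not from the original report: whether the new image comes from an upgraded chart in the CR or from a direct edit of the Deployment, the effect is the same: the pod template changes, the Deployment controller creates a new ReplicaSet, and the old one is scaled to 0. A minimal client-go sketch of triggering that rollout directly, with a hypothetical image name and kubeconfig handling:

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumption: running outside the cluster with the default kubeconfig.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	ctx := context.TODO()

	// Namespace and Deployment names taken from the report; the image is hypothetical.
	ns, name := "infoscale-vtas", "infoscale-vtas-licensing-controller"

	dep, err := client.AppsV1().Deployments(ns).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	// Changing the pod template is what makes the controller create a new
	// ReplicaSet; the previous one stays around with 0 replicas.
	dep.Spec.Template.Spec.Containers[0].Image = "registry.example.com/licensing:new-tag"
	if _, err := client.AppsV1().Deployments(ns).Update(ctx, dep, metav1.UpdateOptions{}); err != nil {
		panic(err)
	}
	fmt.Println("rollout triggered; a new ReplicaSet will be created for the new image")
}

The end state is exactly what the description shows: the old ReplicaSet at 0/0/0 and the new one at 1/1/1, which is the situation the availability check has to interpret correctly.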

Actual results:

The CR is deployed but is never fully reconciled. The operator keeps looping on:
2021-08-24T12:00:04.578Z INFO infoscale-vtas RECONCILE REQUEUE: Could not reconcile chart {"error": "Cannot reconcile hardware states: Failed to create state: templates/1000-license-container.yaml: After CRUD hooks failed: Could not wait for resource: Waiting too long for resource: timed out waiting for the condition"}


Expected results:
Reconcile loop finishes gracefully.


Additional info:

Comment 2 Walid A. 2021-10-07 17:49:46 UTC
Verified on OCP 4.9 rc5 with the latest SRO image from the release-4.9 GitHub branch (https://github.com/openshift/special-resource-operator.git).

Deployed SRO from the GitHub repo:

# TAG=release-4.9 make deploy
# oc get pods -n openshift-special-resource-operator
# VERSION=0.0.1 REPO=example SPECIALRESOURCE=ping-pong make
# oc get pods -n ping-pong
NAME                                READY   STATUS    RESTARTS   AGE
ping-pong-client-7fd9cc6848-vbbq9   1/1     Running   0          11m
ping-pong-server-7b8b5c98c4-2wxpb   1/1     Running   0          11m

# oc get deployment -n ping-pong
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
ping-pong-client   1/1     1            1           12m
ping-pong-server   1/1     1            1           12m

## Change the image for the client and the server in each deployment
# oc edit deployment -n ping-pong ping-pong-client
deployment.apps/ping-pong-client edited

# oc edit deployment -n ping-pong ping-pong-server
deployment.apps/ping-pong-server edited

## pods recreated with the new image
# oc get pods -n ping-pong
NAME                                READY   STATUS    RESTARTS      AGE
ping-pong-client-c5bff68d7-tmnt2    1/1     Running   2 (25s ago)   65s
ping-pong-server-7cbd48d69c-z9rph   1/1     Running   0             34s



## check SRO manager logs for reconcile success:
# oc logs -n openshift-special-resource-operator special-resource-controller-manager-7b4898899d-kcbxr -c manager | grep RECONCILE
2021-10-07T00:36:35.975Z	INFO	status  	RECONCILE SUCCESS: Reconcile
2021-10-07T00:37:23.187Z	INFO	cert-manager  	RECONCILE REQUEUE: Dependency creation failed 	{"error": "Created new SpecialResource we need to Reconcile"}
2021-10-07T00:38:20.372Z	INFO	cert-manager  	RECONCILE REQUEUE: Could not reconcile chart	{"error": "Cannot reconcile hardware states: failed post-install: hook execution failed cert-manager-startupapicheck cert-manager/templates/startupapicheck-job.yaml: After CRUD hooks failed: Could not wait for resource: Waiting too long for resource: timed out waiting for the condition"}
2021-10-07T00:40:14.893Z	INFO	ping-pong  	RECONCILE SUCCESS: All resources done
2021-10-07T00:40:14.905Z	INFO	status  	RECONCILE SUCCESS: Reconcile
2021-10-07T00:40:28.757Z	INFO	cert-manager  	RECONCILE SUCCESS: All resources done
2021-10-07T00:40:28.771Z	INFO	status  	RECONCILE SUCCESS: Reconcile
2021-10-07T00:41:51.989Z	INFO	ping-pong  	RECONCILE SUCCESS: All resources done
2021-10-07T00:41:52.002Z	INFO	status  	RECONCILE SUCCESS: Reconcile
2021-10-07T00:57:48.497Z	INFO	ping-pong  	RECONCILE SUCCESS: All resources done
2021-10-07T00:57:48.509Z	INFO	status  	RECONCILE SUCCESS: Reconcile
2021-10-07T00:59:12.105Z	INFO	ping-pong  	RECONCILE SUCCESS: All resources done
2021-10-07T00:59:12.116Z	INFO	status  	RECONCILE SUCCESS: Reconcile

Comment 5 errata-xmlrpc 2021-10-18 17:51:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

