Bug 1967578

Summary: [4.8.0] clusterDeployments controller should take 1m to requeue when failing with AddOpenshiftVersion
Product: OpenShift Container Platform Reporter: Ronnie Lazar <alazar>
Component: assisted-installer Assignee: Nir Magnezi <nmagnezi>
assisted-installer sub component: assisted-service QA Contact: Yuri Obshansky <yobshans>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: aos-bugs, mfilanov, nmagnezi, trwest
Version: 4.8 Keywords: Triaged
Target Milestone: ---   
Target Release: 4.8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: AI-Team-Hive KNI-EDGE-4.8 KNI-EDGE-BH-4.8
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1967423 Environment:
Last Closed: 2021-07-27 23:11:25 UTC Type: ---
Bug Depends On: 1967423    
Bug Blocks:    

Description Ronnie Lazar 2021-06-03 12:19:28 UTC
+++ This bug was initially created as a clone of Bug #1967423 +++

Description of problem:
=======================
The clusterDeployments controller invokes AddOpenshiftVersion() with parameters taken from the clusterImageSet[1]. If the call fails (e.g. the URL is bad or is just unreachable at the moment), the controller currently requeues every 10s.

The controller should keep retrying, since we cannot currently determine the reason for the failure. In this case, however, it should wait 1 minute between attempts rather than 10 seconds.


[1] https://github.com/openshift/assisted-service/blob/master/docs/crds/clusterImageSet.yaml

Comment 3 Trey West 2021-06-18 15:38:05 UTC
Hey @nmagnezi,

I am verifying on ACM 2.3. I think it looks okay but I noticed that the first few requests were repeated within a second. Any idea what is causing that?

time="2021-06-18T15:23:06Z" level=error msg="Failed to add OCP version for release image: quay.io/openshift-release-dev/ocp-release:4.9.0" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).AddOpenshiftVersion" file="/remote-source/app/internal/bminventory/inventory.go:4466" error="command oc adm release info -o template --template '{{.metadata.version}}' --insecure=false quay.io/openshift-release-dev/ocp-release:4.9.0 exited with non-zero exit code 1: \nerror: image \"quay.io/openshift-release-dev/ocp-release:4.9.0\" not found: manifest unknown: manifest unknown\n" go-id=1075 pkg=Inventory request_id=b8acbcbe-370f-456b-9450-cd9fa636a46a
time="2021-06-18T15:23:07Z" level=error msg="Failed to add OCP version for release image: quay.io/openshift-release-dev/ocp-release:4.9.0" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).AddOpenshiftVersion" file="/remote-source/app/internal/bminventory/inventory.go:4466" error="command oc adm release info -o template --template '{{.metadata.version}}' --insecure=false quay.io/openshift-release-dev/ocp-release:4.9.0 exited with non-zero exit code 1: \nerror: image \"quay.io/openshift-release-dev/ocp-release:4.9.0\" not found: manifest unknown: manifest unknown\n" go-id=1075 pkg=Inventory request_id=588f2303-3ec3-4fe0-943c-477286aba11c
time="2021-06-18T15:23:07Z" level=error msg="Failed to add OCP version for release image: quay.io/openshift-release-dev/ocp-release:4.9.0" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).AddOpenshiftVersion" file="/remote-source/app/internal/bminventory/inventory.go:4466" error="command oc adm release info -o template --template '{{.metadata.version}}' --insecure=false quay.io/openshift-release-dev/ocp-release:4.9.0 exited with non-zero exit code 1: \nerror: image \"quay.io/openshift-release-dev/ocp-release:4.9.0\" not found: manifest unknown: manifest unknown\n" go-id=1075 pkg=Inventory request_id=3f02b3ee-72e4-49ca-91da-049eb9d179e0
time="2021-06-18T15:24:07Z" level=error msg="Failed to add OCP version for release image: quay.io/openshift-release-dev/ocp-release:4.9.0" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).AddOpenshiftVersion" file="/remote-source/app/internal/bminventory/inventory.go:4466" error="command oc adm release info -o template --template '{{.metadata.version}}' --insecure=false quay.io/openshift-release-dev/ocp-release:4.9.0 exited with non-zero exit code 1: \nerror: image \"quay.io/openshift-release-dev/ocp-release:4.9.0\" not found: manifest unknown: manifest unknown\n" go-id=1075 pkg=Inventory request_id=87b9c7f5-dd03-45df-8884-6ebf3a33eedd
time="2021-06-18T15:25:05Z" level=error msg="Failed to add OCP version for release image: quay.io/openshift-release-dev/ocp-release:4.9.0" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).AddOpenshiftVersion" file="/remote-source/app/internal/bminventory/inventory.go:4466" error="command oc adm release info -o template --template '{{.metadata.version}}' --insecure=false quay.io/openshift-release-dev/ocp-release:4.9.0 exited with non-zero exit code 1: \nerror: image \"quay.io/openshift-release-dev/ocp-release:4.9.0\" not found: manifest unknown: manifest unknown\n" go-id=1075 pkg=Inventory request_id=f2730a8f-399e-4f9b-92cd-8e33d6967870
time="2021-06-18T15:25:07Z" level=error msg="Failed to add OCP version for release image: quay.io/openshift-release-dev/ocp-release:4.9.0" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).AddOpenshiftVersion" file="/remote-source/app/internal/bminventory/inventory.go:4466" error="command oc adm release info -o template --template '{{.metadata.version}}' --insecure=false quay.io/openshift-release-dev/ocp-release:4.9.0 exited with non-zero exit code 1: \nerror: image \"quay.io/openshift-release-dev/ocp-release:4.9.0\" not found: manifest unknown: manifest unknown\n" go-id=1075 pkg=Inventory request_id=5b40bb02-d050-48b2-ba65-42be5c053841
time="2021-06-18T15:26:08Z" level=error msg="Failed to add OCP version for release image: quay.io/openshift-release-dev/ocp-release:4.9.0" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).AddOpenshiftVersion" file="/remote-source/app/internal/bminventory/inventory.go:4466" error="command oc adm release info -o template --template '{{.metadata.version}}' --insecure=false quay.io/openshift-release-dev/ocp-release:4.9.0 exited with non-zero exit code 1: \nerror: image \"quay.io/openshift-release-dev/ocp-release:4.9.0\" not found: manifest unknown: manifest unknown\n" go-id=1075 pkg=Inventory request_id=b8d90b25-141b-4ed1-87cf-26bcbdd06568
time="2021-06-18T15:27:08Z" level=error msg="Failed to add OCP version for release image: quay.io/openshift-release-dev/ocp-release:4.9.0" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).AddOpenshiftVersion" file="/remote-source/app/internal/bminventory/inventory.go:4466" error="command oc adm release info -o template --template '{{.metadata.version}}' --insecure=false quay.io/openshift-release-dev/ocp-release:4.9.0 exited with non-zero exit code 1: \nerror: image \"quay.io/openshift-release-dev/ocp-release:4.9.0\" not found: manifest unknown: manifest unknown\n" go-id=1075 pkg=Inventory request_id=07ae216c-1f5d-4007-9541-4aa3c73f8b29
time="2021-06-18T15:28:09Z" level=error msg="Failed to add OCP version for release image: quay.io/openshift-release-dev/ocp-release:4.9.0" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).AddOpenshiftVersion" file="/remote-source/app/internal/bminventory/inventory.go:4466" error="command oc adm release info -o template --template '{{.metadata.version}}' --insecure=false quay.io/openshift-release-dev/ocp-release:4.9.0 exited with non-zero exit code 1: \nerror: image \"quay.io/openshift-release-dev/ocp-release:4.9.0\" not found: manifest unknown: manifest unknown\n" go-id=1075 pkg=Inventory request_id=ed13e609-a18d-4bae-a80c-a8ab95371ba5

Comment 4 Nir Magnezi 2021-06-21 14:58:43 UTC
Hi Trey,

Hard to say without looking at the entire log (to better understand exactly what triggered each reconciliation).

The last iterations seem spaced by 1m, which is what we expect.
The first iterations might have been triggered in parallel rather than by each other, as happened when I tested it: https://gist.github.com/nmagnezi/cf777c0ceca1dd8e7bbee545111ff3c2

Let me know if that answers your question.
Nir

Comment 5 Trey West 2021-06-22 13:45:25 UTC
Verified

Comment 7 errata-xmlrpc 2021-07-27 23:11:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438