Bug 1967423

Summary: [master] clusterDeployments controller should take 1m to requeue when failing with AddOpenshiftVersion
Product: OpenShift Container Platform Reporter: Nir Magnezi <nmagnezi>
Component: assisted-installer Assignee: Nir Magnezi <nmagnezi>
assisted-installer sub component: assisted-service QA Contact: Yuri Obshansky <yobshans>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: alazar, aos-bugs, mfilanov, nshidlin
Version: 4.8 Keywords: Triaged
Target Milestone: ---   
Target Release: 4.8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: AI-Team-Hive KNI-EDGE-4.8
Fixed In Version: OCP-Metal-v1.0.21.3 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1967578 Environment:
Last Closed: 2021-07-27 23:11:25 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1967578    

Description Nir Magnezi 2021-06-03 06:56:22 UTC
Description of problem:
=======================
The clusterDeployments controller invokes AddOpenshiftVersion() with parameters taken from the clusterImageSet [1]. If the call fails (e.g. the URL is bad or is just unreachable at the moment), the controller currently requeues every 10s.

The controller should keep retrying, since we cannot currently determine the reason for the failure. However, in this case it should wait 1 minute between attempts rather than 10 seconds.


[1] https://github.com/openshift/assisted-service/blob/master/docs/crds/clusterImageSet.yaml

Comment 2 nshidlin 2021-06-08 05:51:55 UTC
Verified with:
assisted-service: quay.io/ocpmetal/assisted-service@sha256:2706a902016fdbda8ca61a69052f22275d51f9cbbc18e877fb34d83055949d82

1 minute between reconciles in the case of an invalid clusterImageSet:

time="2021-06-08T05:44:06Z" level=error msg="failed to add OCP version" func="github.com/openshift/assisted-service/internal/controller/controllers.(*ClusterDeploymentsReconciler).createNewCluster" file="/go/src/github.com/openshift/origin/internal/controller/controllers/clusterdeployments_controller.go:790" agent_cluster_install=sno-0-agent-cluster-install agent_cluster_install_namespace=assisted-installer cluster_deployment=sno-0-cluster-deployment cluster_deployment_namespace=assisted-installer error="command oc adm release info -o template --template '{{.metadata.version}}' --insecure=false quay.io/openshift-release-dev/ocp-release:4.8.0-fc.17-x86_64 exited with non-zero exit code 1: \nerror: image \"quay.io/openshift-release-dev/ocp-release:4.8.0-fc.17-x86_64\" not found: manifest unknown: manifest unknown\n" go-id=647 request_id=06e5c639-a375-4843-9e8e-00fc5aca2a1f
time="2021-06-08T05:44:06Z" level=info msg="ClusterDeployment Reconcile ended" func="github.com/openshift/assisted-service/internal/controller/controllers.(*ClusterDeploymentsReconciler).Reconcile.func1" file="/go/src/github.com/openshift/origin/internal/controller/controllers/clusterdeployments_controller.go:110" agent_cluster_install=sno-0-agent-cluster-install agent_cluster_install_namespace=assisted-installer cluster_deployment=sno-0-cluster-deployment cluster_deployment_namespace=assisted-installer go-id=647 request_id=06e5c639-a375-4843-9e8e-00fc5aca2a1f
time="2021-06-08T05:45:05Z" level=info msg="ClusterDeployment Reconcile started" func="github.com/openshift/assisted-service/internal/controller/controllers.(*ClusterDeploymentsReconciler).Reconcile" file="/go/src/github.com/openshift/origin/internal/controller/controllers/clusterdeployments_controller.go:113" cluster_deployment=sno-0-cluster-deployment cluster_deployment_namespace=assisted-installer go-id=647 request_id=8aa6ed8a-0e8f-48d7-a644-2096b2dbdd72

Comment 5 errata-xmlrpc 2021-07-27 23:11:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438