Bug 1967423 - [master] clusterDeployments controller should take 1m to requeue when failing with AddOpenshiftVersion
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: assisted-installer
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Nir Magnezi
QA Contact: Yuri Obshansky
URL:
Whiteboard: AI-Team-Hive KNI-EDGE-4.8
Depends On:
Blocks: 1967578
Reported: 2021-06-03 06:56 UTC by Nir Magnezi
Modified: 2021-07-27 23:11 UTC (History)

Fixed In Version: OCP-Metal-v1.0.21.3
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1967578
Environment:
Last Closed: 2021-07-27 23:11:25 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift assisted-service pull 1912 0 None open Bug 1967423: CD ctrl reqeueue after 1m when failing with AddOpenshiftVersion 2021-06-03 13:50:09 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 23:11:42 UTC

Internal Links: 1967578

Description Nir Magnezi 2021-06-03 06:56:22 UTC
Description of problem:
=======================
The clusterDeployments controller invokes AddOpenshiftVersion() with parameters taken from the clusterImageSet [1]. If the call fails (e.g. the URL is bad or is just unreachable at the moment), the controller currently requeues every 10s.

The controller should keep trying since we cannot currently determine the reason for the failure. However, in this case, it should wait for 1 minute and not 10 seconds.


[1] https://github.com/openshift/assisted-service/blob/master/docs/crds/clusterImageSet.yaml

Comment 2 nshidlin 2021-06-08 05:51:55 UTC
Verified with:
assisted-service: quay.io/ocpmetal/assisted-service@sha256:2706a902016fdbda8ca61a69052f22275d51f9cbbc18e877fb34d83055949d82

1 minute between reconciles in the case of an invalid clusterimageset:

time="2021-06-08T05:44:06Z" level=error msg="failed to add OCP version" func="github.com/openshift/assisted-service/internal/controller/controllers.(*ClusterDeploymentsReconciler).createNewCluster" file="/go/src/github.com/openshift/origin/internal/controller/controllers/clusterdeployments_controller.go:790" agent_cluster_install=sno-0-agent-cluster-install agent_cluster_install_namespace=assisted-installer cluster_deployment=sno-0-cluster-deployment cluster_deployment_namespace=assisted-installer error="command oc adm release info -o template --template '{{.metadata.version}}' --insecure=false quay.io/openshift-release-dev/ocp-release:4.8.0-fc.17-x86_64 exited with non-zero exit code 1: \nerror: image \"quay.io/openshift-release-dev/ocp-release:4.8.0-fc.17-x86_64\" not found: manifest unknown: manifest unknown\n" go-id=647 request_id=06e5c639-a375-4843-9e8e-00fc5aca2a1f
time="2021-06-08T05:44:06Z" level=info msg="ClusterDeployment Reconcile ended" func="github.com/openshift/assisted-service/internal/controller/controllers.(*ClusterDeploymentsReconciler).Reconcile.func1" file="/go/src/github.com/openshift/origin/internal/controller/controllers/clusterdeployments_controller.go:110" agent_cluster_install=sno-0-agent-cluster-install agent_cluster_install_namespace=assisted-installer cluster_deployment=sno-0-cluster-deployment cluster_deployment_namespace=assisted-installer go-id=647 request_id=06e5c639-a375-4843-9e8e-00fc5aca2a1f
time="2021-06-08T05:45:05Z" level=info msg="ClusterDeployment Reconcile started" func="github.com/openshift/assisted-service/internal/controller/controllers.(*ClusterDeploymentsReconciler).Reconcile" file="/go/src/github.com/openshift/origin/internal/controller/controllers/clusterdeployments_controller.go:113" cluster_deployment=sno-0-cluster-deployment cluster_deployment_namespace=assisted-installer go-id=647 request_id=8aa6ed8a-0e8f-48d7-a644-2096b2dbdd72

Comment 5 errata-xmlrpc 2021-07-27 23:11:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

