Bug 1971654 - [4.8.0] InfraEnv controller should always requeue for backend response HTTP StatusConflict (code 409)
Summary: [4.8.0] InfraEnv controller should always requeue for backend response HTTP S...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: assisted-installer
Version: 4.8
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: 4.8.0
Assignee: Nir Magnezi
QA Contact: Yuri Obshansky
URL:
Whiteboard: AI-Team-Hive KNI-EDGE-4.8
Depends On: 1971640
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-06-14 13:56 UTC by Michael Filanov
Modified: 2021-07-27 23:13 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1971640
Environment:
Last Closed: 2021-07-27 23:12:53 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github openshift assisted-service pull 2073 0 None open Bug 1971654: InfraEnv should always requeue on HTTP 409 backend response 2021-06-24 05:13:59 UTC
Red Hat Bugzilla 1971640 1 high CLOSED [master] InfraEnv controller should always requeue for backend response HTTP StatusConflict (code 409) 2021-10-18 17:34:23 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 23:13:09 UTC

Description Michael Filanov 2021-06-14 13:56:02 UTC
+++ This bug was initially created as a clone of Bug #1971640 +++

Description of problem:
=======================
InfraEnv controller should always requeue for backend response HTTP StatusConflict (code 409)

The InfraEnv controller is expected to get HTTP StatusConflict (code 409) from the backend for requests sent in a gap smaller than 10 seconds[1].

The backend generates that response without looking at the Generate ISO request content (which may either be valid or not). 
The image should always be on par with the spec. For that, each request must be served by the backend.

Currently, the controller will not requeue in a case of 409, if the conditions indicate an error[2].

The controller should always requeue, and set RequeueAfter to something larger than 10 seconds to avoid repeated 409s.

[1] https://github.com/openshift/assisted-service/blob/52dd09d9b935b35a5a92d146581bd142e3339940/internal/bminventory/inventory.go#L897-L908
[2] https://github.com/openshift/assisted-service/blob/01e814cbe3ddb273197c70b7767bceffe275a87a/internal/controller/controllers/infraenv_controller.go#L344-L346

Comment 1 Nir Magnezi 2021-06-27 08:44:49 UTC
Merged to ocm-2.3 https://github.com/openshift/assisted-service/pull/2073

Comment 5 nshidlin 2021-07-11 08:59:25 UTC
Verified with ACM: 2.3.0-DOWNSTREAM-2021-07-09-20-21-59

Steps to verify:
1. trigger infraenv reconcile by patching the ssh key: 
oc patch infraenvs.agent-install.openshift.io sno-0 --type merge -p '{"spec":{"sshAuthorizedKey":"ssh-rsa <ssh_key_1>"}}'

2. sleep 3 seconds

3. trigger a second infraenv reconcile by patching a different ssh keyoc patch infraenvs.agent-install.openshift.io sno-0 --type merge -p '{"spec":{"sshAuthorizedKey":"ssh-rsa <ssh_key_2>"}}'

The first attempt to patch the second ssh key fails as this is less than 10 seconds since the last request to generate the ISO:
time="2021-07-11T08:28:25Z" level=info msg="InfraEnv Reconcile started" func="github.com/openshift/assisted-service/internal/controller/controllers.(*InfraEnvReconciler).Reconcile" file="/remote-source/assisted-service/app/internal/controller/controllers/infraenv_controller.go:81" go-id=3237 infra_env=sno-0 infra_env_namespace=sno-0 request_id=6aa1667d-b300-4fbb-9cc9-d5eead1230af
time="2021-07-11T08:28:25Z" level=info msg="prepare image for cluster daeee389-0377-4bc4-a078-f65a81bd488c" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).GenerateClusterISOInternal" file="/remote-source/assisted-service/app/internal/bminventory/inventory.go:850" go-id=3237 pkg=Inventory request_id=6aa1667d-b300-4fbb-9cc9-d5eead1230af
time="2021-07-11T08:28:25Z" level=error msg="request came too soon after previous request" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).GenerateClusterISOInternal" file="/remote-source/assisted-service/app/internal/bminventory/inventory.go:904" go-id=3237 pkg=Inventory request_id=6aa1667d-b300-4fbb-9cc9-d5eead1230af
time="2021-07-11T08:28:25Z" level=error msg="generate cluster ISO failed" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).GenerateClusterISOInternal.func1" file="/remote-source/assisted-service/app/internal/bminventory/inventory.go:875" go-id=3237 pkg=Inventory request_id=6aa1667d-b300-4fbb-9cc9-d5eead1230af
time="2021-07-11T08:28:25Z" level=info msg="Image sno-0 being prepared for cluster sno-0" func="github.com/openshift/assisted-service/internal/controller/controllers.(*InfraEnvReconciler).handleEnsureISOErrors" file="/remote-source/assisted-service/app/internal/controller/controllers/infraenv_controller.go:360" go-id=3237 infra_env=sno-0 infra_env_namespace=sno-0 request_id=6aa1667d-b300-4fbb-9cc9-d5eead1230af
time="2021-07-11T08:28:25Z" level=info msg="InfraEnv Reconcile ended" func="github.com/openshift/assisted-service/internal/controller/controllers.(*InfraEnvReconciler).Reconcile.func1" file="/remote-source/assisted-service/app/internal/controller/controllers/infraenv_controller.go:78" go-id=3237 infra_env=sno-0 infra_env_namespace=sno-0 request_id=6aa1667d-b300-4fbb-9cc9-d5eead1230af

The infraenv reconcile is re-queued ~20 secs later:
time="2021-07-11T08:28:45Z" level=info msg="InfraEnv Reconcile started" func="github.com/openshift/assisted-service/internal/controller/controllers.(*InfraEnvReconciler).Reconcile" file="/remote-source/assisted-service/app/internal/controller/controllers/infraenv_controller.go:81" go-id=3237 infra_env=sno-0 infra_env_namespace=sno-0 request_id=ed5e60b3-edc5-4eea-83da-fe03c1213f57
time="2021-07-11T08:28:45Z" level=info msg="prepare image for cluster daeee389-0377-4bc4-a078-f65a81bd488c" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).GenerateClusterISOInternal" file="/remote-source/assisted-service/app/internal/bminventory/inventory.go:850" go-id=3237 pkg=Inventory request_id=ed5e60b3-edc5-4eea-83da-fe03c1213f57
time="2021-07-11T08:28:45Z" level=info msg="Successfully uploaded file daeee389-0377-4bc4-a078-f65a81bd488c/discovery.ign" func="github.com/openshift/assisted-service/pkg/s3wrapper.(*FSClient).Upload" file="/remote-source/assisted-service/app/pkg/s3wrapper/filesystem.go:76" go-id=3237 request_id=ed5e60b3-edc5-4eea-83da-fe03c1213f57
time="2021-07-11T08:28:45Z" level=info msg="Creating minimal ISO for cluster daeee389-0377-4bc4-a078-f65a81bd488c" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).generateClusterMinimalISO.func1" file="/remote-source/assisted-service/app/internal/bminventory/inventory.go:1087" go-id=3237 pkg=Inventory request_id=ed5e60b3-edc5-4eea-83da-fe03c1213f57
time="2021-07-11T08:28:45Z" level=info msg="Uploading minimal ISO for cluster daeee389-0377-4bc4-a078-f65a81bd488c" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).generateClusterMinimalISO" file="/remote-source/assisted-service/app/internal/bminventory/inventory.go:1103" go-id=3237 pkg=Inventory request_id=ed5e60b3-edc5-4eea-83da-fe03c1213f57
time="2021-07-11T08:28:45Z" level=info msg="Successfully uploaded file discovery-image-daeee389-0377-4bc4-a078-f65a81bd488c.iso" func="github.com/openshift/assisted-service/pkg/s3wrapper.(*FSClient).UploadStream" file="/remote-source/assisted-service/app/pkg/s3wrapper/filesystem.go:159" go-id=3237 request_id=ed5e60b3-edc5-4eea-83da-fe03c1213f57
...
time="2021-07-11T08:28:45Z" level=info msg="ISODownloadURL changed from https://assisted-service-rhacm.apps.seal12-cluster-assisted-0.qe.lab.redhat.com/api/assisted-install/v1/clusters/daeee389-0377-4bc4-a078-f65a81bd488c/downloads/image?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbHVzdGVyX2lkIjoiZGFlZWUzODktMDM3Ny00YmM0LWEwNzgtZjY1YTgxYmQ0ODhjIn0.GWj9Fa5clVTS7oh5NxSPVgKYg1sdtNR6eliqhz4cO4WMMYxQDcbRknbr5Kvc2QXiyCP3DFh8lHTM28MZiGBYVw to https://assisted-service-rhacm.apps.seal12-cluster-assisted-0.qe.lab.redhat.com/api/assisted-install/v1/clusters/daeee389-0377-4bc4-a078-f65a81bd488c/downloads/image?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbHVzdGVyX2lkIjoiZGFlZWUzODktMDM3Ny00YmM0LWEwNzgtZjY1YTgxYmQ0ODhjIn0.WeRl73aq14ocUbhsohBunrYdEP2mEqSQz7jcL-lKKrB0YzCZ5-6EKX0eJZr9UcuWRXU6KQuI1H-hVPZie7gpYg" func="github.com/openshift/assisted-service/internal/controller/controllers.(*InfraEnvReconciler).updateEnsureISOSuccess" file="/remote-source/assisted-service/app/internal/controller/controllers/infraenv_controller.go:328" go-id=3237 infra_env=sno-0 infra_env_namespace=sno-0 request_id=ed5e60b3-edc5-4eea-83da-fe03c1213f57
time="2021-07-11T08:28:45Z" level=info msg="InfraEnv Reconcile ended" func="github.com/openshift/assisted-service/internal/controller/controllers.(*InfraEnvReconciler).Reconcile.func1" file="/remote-source/assisted-service/app/internal/controller/controllers/infraenv_controller.go:78" go-id=3237 infra_env=sno-0 infra_env_namespace=sno-0 request_id=ed5e60b3-edc5-4eea-83da-fe03c1213f57

Comment 8 errata-xmlrpc 2021-07-27 23:12:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438


Note You need to log in before you can comment on or make changes to this bug.