Bug 1949061 - [assisted operator][nmstate] Continuous attempts to reconcile InstallEnv in the case of invalid NMStateConfig
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: assisted-installer
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.8.0
Assignee: Nir Magnezi
QA Contact: Yuri Obshansky
URL:
Whiteboard: AI-Team-Hive KNI-EDGE-4.8
Depends On:
Blocks:
Reported: 2021-04-13 11:12 UTC by nshidlin
Modified: 2021-07-27 23:00 UTC (History)

Fixed In Version: OCP-Metal-v1.0.21.1
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 23:00:08 UTC
Target Upstream Version:
Embargoed:
nmagnezi: needinfo-


Attachments
InstallEnv and NMStateConfig CRDS (1.15 KB, text/plain)
2021-04-13 11:12 UTC, nshidlin


Links
System ID Private Priority Status Summary Last Updated
Github openshift assisted-service pull 1696 0 None open OCPBUGSM-27700 Fix NMstate requeue 2021-05-26 10:42:55 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 23:00:22 UTC

Description nshidlin 2021-04-13 11:12:58 UTC
Created attachment 1771602
InstallEnv and NMStateConfig CRDS

Description of problem:
When an InstallEnv references an invalid NMStateConfig, reconciliation of the InstallEnv is attempted continuously, even though the NMStateConfig has not changed.
In this case, InstallEnv reconciliation and ISO generation should be re-attempted only once the NMStateConfig is changed.

Version-Release number of selected component (if applicable):
assisted-service image:
quay.io/ocpmetal/assisted-service@sha256:c65af18f741660660a04e4a3b155c10a6668527bb790de06a9708f6bec17479b

Steps to Reproduce:
1. Create ClusterDeployment
2. Create invalid NMStateConfig
3. Create InstallEnv referencing invalid NMStateConfig

Actual results:
InstallEnv reconcile is continually attempted with no change made to NMStateConfig  

Expected results:
InstallEnv reconcile should only be attempted if the NMStateConfig is changed

Comment 1 Nir Magnezi 2021-05-09 15:26:41 UTC
This bug is a duplicate of https://issues.redhat.com/browse/MGMT-4695

In short, for an invalid nmstate config we get the wrong status code, which makes it hard to determine whether or not we should requeue.
For an invalid config, we would expect HTTP StatusBadRequest (code 400), but we get HTTP StatusInternalServerError (code 500) here.

I have added some debug prints and reproduced this issue here (added prints marked with 'ZZZ'): https://gist.github.com/nmagnezi/cd4e21691e8c64647bd00d32b0a60b30
Note that we initially get a 500, followed by many 409s for requests that arrived in under 10 seconds.
For the latter (code 409), I will try to extend the requeue time to longer than 10 seconds, though that will fix only part of the issue.

Yevgeny, any plans for https://issues.redhat.com/browse/MGMT-MGMT-4696 ?

Comment 2 Nir Magnezi 2021-05-09 15:28:39 UTC
Yevgeny, see the question in comment 1.

Comment 3 Nir Magnezi 2021-05-27 06:10:29 UTC
Fix merged to master.

QE Verification:
================

You can verify the fix using the YAMLs referenced in: https://github.com/openshift/assisted-service/pull/1696#issuecomment-848670736

Comment 4 nshidlin 2021-06-02 05:52:11 UTC
Verified:

The infraenv is reconciled twice with the invalid nmstate config, and is then reconciled again only when an nmstateconfig matching the label is changed.

quay.io/ocpmetal/assisted-service@sha256:434617dd691c2f5f1a410ffd9866908fc0e9c72e0c3b26ced3d0d8578180fc3a

Comment 7 errata-xmlrpc 2021-07-27 23:00:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

