Description of problem:
=======================
Tested the scenario mentioned in bug 1972525
To work around the issue, I updated the InfraEnv annotations - to force a reconcile.
Installation was not retriggered because BMAC did not clear up error status from the BMH:
https://gist.github.com/nmagnezi/1249fab7dcd313bd85107cf7c9f904f7#file-bmh-yaml-L27-L32

Version-Release number of selected component (if applicable):
=============================================================
current git head 403985c8d95b5bef173a11326b3de7aec3fdef18

How reproducible:
=================
1/1

Steps to Reproduce:
===================
1. Follow bug 1972525
2. Clear up the detached annotation from BMH
3. Add some infraEnv annotation (workaround until bug 1972525 gets resolved).
4. Inspect BMH

Actual results:
===============
Image download URL not up to date with InfraEnv status still in an error state as mentioned above

Expected results:
=================
BMH should recover and retrigger the install
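For anyone reproducing this, steps 2 and 3 above can be sketched as kubectl invocations. This is only an illustration, not the exact commands used in this report: it assumes the detached annotation key is `baremetalhost.metal3.io/detached`, and the resource names, namespace, and the `example.com/force-reconcile` key are hypothetical. The function only builds the command lines; it does not run them.

```python
from typing import List

# Assumption: annotation key used by the baremetal-operator to detach a host.
DETACHED_ANNOTATION = "baremetalhost.metal3.io/detached"


def workaround_commands(bmh: str, infraenv: str, namespace: str, nonce: str) -> List[List[str]]:
    """Build (but do not run) the kubectl commands for steps 2 and 3."""
    return [
        # Step 2: in `kubectl annotate`, a trailing "-" removes an annotation.
        ["kubectl", "annotate", "baremetalhost", bmh, "-n", namespace,
         DETACHED_ANNOTATION + "-"],
        # Step 3: change an annotation on the InfraEnv to force a reconcile.
        # "example.com/force-reconcile" is a hypothetical key, not a real API.
        ["kubectl", "annotate", "infraenv", infraenv, "-n", namespace,
         "example.com/force-reconcile=" + nonce, "--overwrite"],
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` on a machine with cluster access; using a fresh `nonce` (for example a timestamp) guarantees the InfraEnv object actually changes.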
I looked into this, here's the summary:

1. The environment was using an older version of Assisted Service, which was missing a couple of PRs
2. After updating the assisted-service container (manually modified the Operator Subscription), I was able to retry a deployment (I will attach a screenshot of the BMH's events showing the deprovision/provision of the image when the deployment was retried).

@nmagnezi do you want to give this another go before closing the issue?
Created attachment 1791711
bmh events

BMH's events showing re-provision of an InfraEnv URL
(In reply to Flavio Percoco from comment #2)
> I looked into this, here's the summary:
>
> 1. The environment was using an older version of Assisted Service, which was
> missing a couple of PRs
> 2. After updating the assisted-service container (manually modified the
> Operator Subscription), I was able to retry a deployment (I will attach a
> screenshot of the BMH's events showing the deprovision/provision of the
> image when the deployment was retried).
>
> @nmagnezi do you want to give this another go before closing the
> issue?

I tried this again.
What I see now is that the BMH clears the error status but does not get the new image URL, so no re-install is triggered.
log: https://gist.github.com/nmagnezi/a49d34d6cf2a8cc0fc110621fde43642

Let's follow up to see whether I did something else or something wrong that caused this, before we close this bug.
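To make the "cleared error but stale image URL" check repeatable, the two conditions can be tested against the JSON of both objects (for example from `kubectl get ... -o json`). A minimal sketch, assuming the BMH exposes `spec.image.url` and `status.errorMessage` and the InfraEnv exposes `status.isoDownloadURL`; those field paths are my assumption and should be verified against the CRDs in use.

```python
from typing import Any, Dict, List


def bmh_recovery_issues(bmh: Dict[str, Any], infraenv: Dict[str, Any]) -> List[str]:
    """Return human-readable reasons why the BMH has not recovered (empty if none)."""
    issues: List[str] = []

    # Assumed field: status.errorMessage is non-empty while the BMH is in error.
    error = (bmh.get("status") or {}).get("errorMessage")
    if error:
        issues.append("BMH still reports an error: " + error)

    # Assumed fields: spec.image.url on the BMH, status.isoDownloadURL on the InfraEnv.
    bmh_url = ((bmh.get("spec") or {}).get("image") or {}).get("url")
    iso_url = (infraenv.get("status") or {}).get("isoDownloadURL")
    if bmh_url != iso_url:
        issues.append("BMH image URL does not match the InfraEnv ISO URL")

    return issues
```

An empty list means both conditions hold; the case described in this comment would yield only the URL mismatch entry.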
I attempted this again because I simply forgot to remove the 'detached' annotation. However, now it fails with a 404:
https://gist.github.com/nmagnezi/2395564774afa4f5a812ac5cf4e3c0db#file-bmh-yaml-L317
SVC log: https://gist.github.com/nmagnezi/d2f2040ce3d391a823b1e6b3f6bfc888#file-retry-go-L83
For QE: The solution here is to document how to retry an installation: the user needs to recreate both the BMH(s) and the ACI.
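As a starting point for that documentation, the retry could be sketched as "delete, then re-apply" for the BMH and ACI manifests. This is an illustration only: the manifest paths and namespace are hypothetical, and deleting everything before re-applying is my assumption, not an order confirmed in this bug. The function only builds the kubectl command lines; it does not run them.

```python
from typing import List


def recreate_commands(manifests: List[str], namespace: str) -> List[List[str]]:
    """Build (but do not run) delete-then-apply commands for each manifest file."""
    commands: List[List[str]] = []
    # First remove the existing objects; --ignore-not-found keeps reruns harmless.
    for path in manifests:
        commands.append(["kubectl", "delete", "-n", namespace, "-f", path,
                         "--ignore-not-found"])
    # Then create them again from the same manifests.
    for path in manifests:
        commands.append(["kubectl", "apply", "-n", namespace, "-f", path])
    return commands
```

For example, `recreate_commands(["bmh.yaml", "aci.yaml"], "assisted")` yields two delete commands followed by two apply commands, each suitable for `subprocess.run(cmd, check=True)`.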
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759