This bug was initially created as a copy of Bug #1947154

I am copying this bug because:

Description of problem:
If a SNO instance is in the process of installing OpenShift onto the node (using Assisted Service Operator + CRDs) and the instance is removed by deleting the relevant CRDs (ClusterDeploy / InstallEnv / Agent), it is not possible to re-register and install the same node until the installation has timed out or been manually aborted.

Version-Release number of selected component (if applicable):
Assisted Service Master (commit 5bc8d7ef053110bb3da7be9460284e930eb03b1e)

How reproducible:
100%

Steps to Reproduce:
1. Deploy an SNO cluster via CRDs + Assisted Operator
2. Delete the relevant CRDs while it is installing
3. Attempt to reapply the CRDs + start the SNO machine with a new discovery ISO

Actual results:
- Agents fail to start on the SNO machine and the Agent CR is not created
- ClusterDeployment CR says the state is "installing"

Assisted Service pod logs:

time="2021-04-07T19:07:41Z" level=error msg="failed to deregister cluster: sno-cluster-deployment: cluster 40a22dbd-d424-40dd-9100-13108fd5323b can not be removed while being installed" func="github.com/openshift/assisted-service/internal/controller/controllers.(*ClusterDeploymentsReconciler).deregisterClusterIfNeeded.func1" file="/go/src/github.com/openshift/origin/internal/controller/controllers/clusterdeployments_controller.go:652"
time="2021-04-07T19:07:46Z" level=info msg="Deregister cluster id 40a22dbd-d424-40dd-9100-13108fd5323b" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).DeregisterClusterInternal" file="/go/src/github.com/openshift/origin/internal/bminventory/inventory.go:655" go-id=841 pkg=Inventory request_id=
time="2021-04-07T19:07:47Z" level=error msg="failed to deregister cluster 40a22dbd-d424-40dd-9100-13108fd5323b" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).DeregisterClusterInternal" file="/go/src/github.com/openshift/origin/internal/bminventory/inventory.go:674" error="cluster 40a22dbd-d424-40dd-9100-13108fd5323b can not be removed while being installed" go-id=841 pkg=Inventory request_id=

Expected results:
Installation is halted immediately (aborted automatically?)

Additional info:
Although the issue is alluded to in the Assisted Service pod logs, I did not fully realize what was happening until I checked the Assisted UI, which I should NOT need to do. It's a confusing situation.
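
To make the difference between the actual and expected results concrete, here is a rough sketch using a toy in-memory model in Python. It is not the assisted-service code: the names Inventory, Status, DeregisterError and deregister_on_cr_delete are hypothetical, and only the error message is taken from the pod logs above. It assumes that "halted immediately" means cancelling the installation before deregistering.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    READY = "ready"
    INSTALLING = "installing"
    CANCELLED = "cancelled"


class DeregisterError(Exception):
    """Raised when a cluster cannot be deregistered in its current state."""


@dataclass
class Inventory:
    """Hypothetical stand-in for the service's cluster inventory."""
    clusters: dict = field(default_factory=dict)  # cluster id -> Status

    def deregister(self, cluster_id):
        # Reported behavior: deregistration is refused during installation.
        if self.clusters[cluster_id] is Status.INSTALLING:
            raise DeregisterError(
                f"cluster {cluster_id} can not be removed while being installed"
            )
        del self.clusters[cluster_id]

    def cancel_installation(self, cluster_id):
        # Abort a running installation; no effect in any other state.
        if self.clusters[cluster_id] is Status.INSTALLING:
            self.clusters[cluster_id] = Status.CANCELLED

    def deregister_on_cr_delete(self, cluster_id):
        # Expected behavior (assumption): abort first, then deregister,
        # so the same node can be registered again right away.
        self.cancel_installation(cluster_id)
        self.deregister(cluster_id)
```

With this model, calling deregister() on an installing cluster raises the error seen in the logs and leaves the cluster registered, while deregister_on_cr_delete() removes it in one step.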
I have validated the fix.

- 2.3.0-DOWNSTREAM-2021-06-17-01-26-58
- 4.8.0-fc.7

Steps:
- Deployed upstream Assisted/Hive operators on OCP 4.8
- Created all required ZTP SNO CRs and got to the point of the SNO instance performing the install
- Deleted all SNO CRs (which deleted the Agent CR as expected)
- Recreated all of the same ZTP SNO CRs and confirmed that the SNO instance started performing the install again
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438