Bug 1966632 - [4.8.0] [assisted operator] Unable to re-register an SNO instance if deleting CRDs during install
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: assisted-installer
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: medium
Target Milestone: ---
Target Release: 4.8.0
Assignee: Fred Rolland
QA Contact: Chad Crum
URL:
Whiteboard: AI-Team-Hive KNI-EDGE-4.8
Depends On: 1947154
Blocks:
Reported: 2021-06-01 14:50 UTC by Antoni Segura Puimedon
Modified: 2021-07-27 23:11 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 23:10:52 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift assisted-service pull 1881 0 None open [ocm-2.3] Bug 1966632: Kubeapi cancel install before delete 2021-06-04 16:06:00 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 23:11:08 UTC

Description Antoni Segura Puimedon 2021-06-01 14:50:29 UTC
This bug was initially created as a copy of Bug #1947154

I am copying this bug because: 



Description of problem:
If an SNO instance is removed by deleting the relevant CRDs (ClusterDeployment / InstallEnv / Agent) while OpenShift is being installed onto the node (using the Assisted Service Operator + CRDs), it is not possible to re-register and install the same node until the installation has timed out or been manually aborted.

Version-Release number of selected component (if applicable):
Assisted Service Master (commit 5bc8d7ef053110bb3da7be9460284e930eb03b1e)

How reproducible:
100%

Steps to Reproduce:
1. Deploy an SNO cluster via CRDs + Assisted Operator
2. Delete the relevant CRDs while it is installing
3. Attempt to reapply the CRDs + start the SNO machine with a new discovery ISO

Actual results:
- Agents fail to start on the SNO machine and the Agent CR is not created
- The ClusterDeployment CR says the state is "installing"

Assisted Service pod logs:
time="2021-04-07T19:07:41Z" level=error msg="failed to deregister cluster: sno-cluster-deployment: cluster 40a22dbd-d424-40dd-9100-13108fd5323b can not be removed while being installed" func="github.com/openshift/assisted-service/internal/controller/controllers.(*ClusterDeploymentsReconciler).deregisterClusterIfNeeded.func1" file="/go/src/github.com/openshift/origin/internal/controller/controllers/clusterdeployments_controller.go:652"
time="2021-04-07T19:07:46Z" level=info msg="Deregister cluster id 40a22dbd-d424-40dd-9100-13108fd5323b" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).DeregisterClusterInternal" file="/go/src/github.com/openshift/origin/internal/bminventory/inventory.go:655" go-id=841 pkg=Inventory request_id=
time="2021-04-07T19:07:47Z" level=error msg="failed to deregister cluster 40a22dbd-d424-40dd-9100-13108fd5323b" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).DeregisterClusterInternal" file="/go/src/github.com/openshift/origin/internal/bminventory/inventory.go:674" error="cluster 40a22dbd-d424-40dd-9100-13108fd5323b can not be removed while being installed" go-id=841 pkg=Inventory request_id=



Expected results:
Installation is halted immediately (Aborted automatically?)

Additional info:

Although the Assisted Service pod logs allude to the issue, I did not fully realize what was happening until I checked the Assisted UI, which I should NOT need to do. It's a confusing situation.

Comment 3 Chad Crum 2021-06-19 13:33:06 UTC
I have validated the fix.

- 2.3.0-DOWNSTREAM-2021-06-17-01-26-58
- 4.8.0-fc.7


Steps:
- Deployed upstream Assisted/Hive operators on OCP 4.8 
- Created all required ZTP SNO CRs and got to the point of the SNO instance performing the install
- Deleted all SNO CRs (which deleted the Agent CR as expected)
- Recreated all of the same ZTP SNO CRs and confirmed that the SNO instance started performing the install again

Comment 5 errata-xmlrpc 2021-07-27 23:10:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

