Bug 1956959

Summary:	IPv6 disconnected SNO CRD deployment: Hive reports success status but clusterdeployment reports installed as false
Product:	OpenShift Container Platform	Reporter:	bjacot
Component:	assisted-installer	Assignee:	Daniel Erez <derez>
assisted-installer sub component:	Deployment Operator	QA Contact:	bjacot
Status:	CLOSED ERRATA	Docs Contact:
Severity:	urgent
Priority:	urgent	CC:	alazar, aos-bugs, keyoung, mfilanov, sasha
Version:	4.8	Keywords:	TestBlocker, Triaged
Target Milestone:	---
Target Release:	4.8.0
Hardware:	All
OS:	Unspecified
Whiteboard:	AI-Team-Hive
Fixed In Version:	OCP-Metal-v1.0.20.1	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2021-07-27 23:06:09 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description bjacot 2021-05-04 18:26:13 UTC

I have completed an IPv6 disconnected SNO deployment via CRDs. I am not using ZTP; I am manually mounting and booting the ISO. Hive reports that the SNO deployment succeeded, but the cluster deployment still shows installed as false. Also, the kubeconfig of the SNO deployment does not show up in the secrets in the assisted-installer namespace.

[kni@provisionhost-0-0 ~]$ oc get clusterdeployments.hive.openshift.io -n assisted-installer -o=custom-columns='STATUS:status.conditions[-1].message'
STATUS
The installation has completed: Cluster is installed
[kni@provisionhost-0-0 ~]$ 
[kni@provisionhost-0-0 ~]$ oc get cd sno-cluster-deployment -o json | jq -r '.spec.installed'
false
[kni@provisionhost-0-0 ~]$ 


##### from the SNO vm###
[root@sno ~]# export KUBECONFIG=/sysroot/ostree/deploy/rhcos/var/lib/kubelet/kubeconfig
[root@sno ~]# oc get nodes
NAME   STATUS   ROLES           AGE   VERSION
sno    Ready    master,worker   59m   v1.21.0-rc.0+6825c59
[root@sno ~]# 
[root@sno ~]# oc get pods -n assisted-installer 
NAME                                  READY   STATUS      RESTARTS   AGE
assisted-installer-controller-n4spz   0/1     Completed   0          69m
[root@sno ~]# 

#### see these pods in error ###
[root@sno ~]# oc get pods -A|grep -v Run|grep -v Compl
NAMESPACE                                          NAME                                                      READY   STATUS             RESTARTS   AGE
openshift-kube-controller-manager                  installer-6-sno                                           0/1     Error              0          50m
openshift-kube-scheduler                           installer-7-sno                                           0/1     Error              0          50m

Comment 1 Michael Filanov 2021-05-05 13:54:54 UTC

The root cause we found is an extra space in the ssh key. The backend trims this space, so the controller sees a difference between its value and the backend's and tries to update the backend again, causing a reconcile loop.
There is a good reason for the trim: it fixes a bug where a newline in the ssh key causes the boot to fail. So the solution should be for the controller to match the backend behavior and trim the ssh key before comparing it to the backend value.
In addition, we could try to avoid calling updates in specific cases or specific states.
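
To illustrate the proposed fix, here is a minimal sketch of comparing the two values only after applying the same trim on both sides. It is written in Python for brevity and is not the actual controller code; the function names normalize_ssh_key and ssh_key_needs_update are hypothetical, and it assumes the backend trim amounts to stripping leading and trailing whitespace, including newlines.

```python
def normalize_ssh_key(key: str) -> str:
    """Strip leading/trailing whitespace, including newlines.

    Assumption: this mirrors the trim the backend applies before storing the key.
    """
    return key.strip()


def ssh_key_needs_update(spec_key: str, backend_key: str) -> bool:
    """Return True only if the keys still differ after normalization.

    Comparing normalized values means a trailing space or newline in the
    spec no longer looks like a change, so no repeated update is issued.
    """
    return normalize_ssh_key(spec_key) != normalize_ssh_key(backend_key)


if __name__ == "__main__":
    spec_key = "ssh-rsa AAAAexample user@host \n"   # made-up key with trailing whitespace
    backend_key = "ssh-rsa AAAAexample user@host"   # the same key as stored after the trim
    print(ssh_key_needs_update(spec_key, backend_key))
```

With a raw string comparison these two values would differ on every reconcile, which is the loop described above; with the normalized comparison they are treated as equal.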

Comment 2 bjacot 2021-06-11 18:32:47 UTC

I was able to perform an SNO deployment and updated my automation to trim blank space.

Comment 5 errata-xmlrpc 2021-07-27 23:06:09 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438