Bug 1956959

Summary: IPv6 disconnected SNO CRD deployment: Hive reports success status while the ClusterDeployment still reports installed as false
Product: OpenShift Container Platform
Reporter: bjacot
Component: assisted-installer
Assignee: Daniel Erez <derez>
Sub Component: Deployment Operator
QA Contact: bjacot
Status: VERIFIED
Severity: urgent
Priority: urgent
CC: alazar, aos-bugs, keyoung, mfilanov, sasha
Version: 4.8
Keywords: TestBlocker, Triaged
Target Release: 4.8.0
Hardware: All
OS: Unspecified
Whiteboard: AI-Team-Hive
Fixed In Version: OCP-Metal-v1.0.20.1
Type: Bug

Description bjacot 2021-05-04 18:26:13 UTC
I have completed an IPv6 disconnected SNO deployment via CRDs. I am not doing ZTP; I am manually mounting and booting the ISO. The SNO deployment succeeded according to Hive, but the ClusterDeployment status is still showing false. Also, the kubeconfig of the SNO deployment is not showing up in the secrets in the assisted-installer namespace.

[kni@provisionhost-0-0 ~]$ oc get clusterdeployments.hive.openshift.io -n assisted-installer -o=custom-columns='STATUS:status.conditions[-1].message'
The installation has completed: Cluster is installed
[kni@provisionhost-0-0 ~]$ 
[kni@provisionhost-0-0 ~]$ oc get cd sno-cluster-deployment -o json | jq -r '.spec.installed'
[kni@provisionhost-0-0 ~]$ 

##### from the SNO vm###
[root@sno ~]# export KUBECONFIG=/sysroot/ostree/deploy/rhcos/var/lib/kubelet/kubeconfig
[root@sno ~]# oc get nodes
sno    Ready    master,worker   59m   v1.21.0-rc.0+6825c59
[root@sno ~]# 
[root@sno ~]# oc get pods -n assisted-installer 
NAME                                  READY   STATUS      RESTARTS   AGE
assisted-installer-controller-n4spz   0/1     Completed   0          69m
[root@sno ~]# 

#### see these pods in error ###
[root@sno ~]# oc get pods -A|grep -v Run|grep -v Compl
NAMESPACE                                          NAME                                                      READY   STATUS             RESTARTS   AGE
openshift-kube-controller-manager                  installer-6-sno                                           0/1     Error              0          50m
openshift-kube-scheduler                           installer-7-sno                                           0/1     Error              0          50m

Comment 1 Michael Filanov 2021-05-05 13:54:54 UTC
The cause we found is an extra space in the ssh-key. The backend trims this space, so the controller sees the trimmed value as a change and tries to update the backend again, causing a reconcile loop.
There is a good reason for the trim: it fixes a bug where a newline in the ssh key makes the boot fail. So the solution should be for the controller to match the backend behavior and trim the ssh key before comparing it to the backend value.
In addition, we could try to avoid calling updates in specific cases or specific states.
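The proposed fix can be sketched in shell for illustration: normalize both sides (trim surrounding whitespace) before deciding whether an update is needed, so a whitespace-only difference never triggers a reconcile. Variable and function names here are hypothetical, not the actual controller code (which is Go).

```shell
# Hypothetical values: the CRD spec carries a trailing space,
# while the backend stores the trimmed key.
spec_key='ssh-rsa AAAA... user@host '
backend_key='ssh-rsa AAAA... user@host'

# Strip leading and trailing whitespace (spaces, tabs, newlines).
trim() { printf '%s' "$1" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'; }

# Compare the normalized values, mirroring the backend's trim,
# so a whitespace-only difference does not look like a change.
if [ "$(trim "$spec_key")" != "$(trim "$backend_key")" ]; then
  echo "update needed"
else
  echo "no update"    # trimmed values match: no spurious reconcile
fi
```

With these values the comparison prints "no update"; without the trim, the raw strings differ and the controller would keep pushing updates.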

Comment 2 bjacot 2021-06-11 18:32:47 UTC
I was able to perform an SNO deployment after updating my automation to trim the blank space from the ssh key.