Bug 1956959 - ipv6 disconnected sno crd deployment hive reports success status and clusterdeployment reporting false
Status: ON_QA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: assisted-installer
Version: 4.8
Hardware: All
OS: Unspecified
Target Milestone: ---
Assignee: Daniel Erez
QA Contact: bjacot
Whiteboard: AI-Team-Hive
Reported: 2021-05-04 18:26 UTC by bjacot
Modified: 2021-05-13 11:56 UTC

Fixed In Version: OCP-Metal-v1.0.20.1


Description bjacot 2021-05-04 18:26:13 UTC
I have completed an IPv6 disconnected SNO deployment via CRDs. I am not using ZTP; I am manually mounting and booting the ISO. Hive reports that the SNO deployment succeeded, but the cluster deployment status is still showing false. Also, the kubeconfig of the SNO deployment is not showing up in the secrets in -n assisted-installer.

[kni@provisionhost-0-0 ~]$ oc get clusterdeployments.hive.openshift.io -n assisted-installer -o=custom-columns='STATUS:status.conditions[-1].message'
The installation has completed: Cluster is installed
[kni@provisionhost-0-0 ~]$ 
[kni@provisionhost-0-0 ~]$ oc get cd sno-cluster-deployment -o json | jq -r '.spec.installed'
[kni@provisionhost-0-0 ~]$ 

##### from the SNO vm###
[root@sno ~]# export KUBECONFIG=/sysroot/ostree/deploy/rhcos/var/lib/kubelet/kubeconfig
[root@sno ~]# oc get nodes
sno    Ready    master,worker   59m   v1.21.0-rc.0+6825c59
[root@sno ~]# 
[root@sno ~]# oc get pods -n assisted-installer 
NAME                                  READY   STATUS      RESTARTS   AGE
assisted-installer-controller-n4spz   0/1     Completed   0          69m
[root@sno ~]# 

#### see these pods in error ###
[root@sno ~]# oc get pods -A|grep -v Run|grep -v Compl
NAMESPACE                                          NAME                                                      READY   STATUS             RESTARTS   AGE
openshift-kube-controller-manager                  installer-6-sno                                           0/1     Error              0          50m
openshift-kube-scheduler                           installer-7-sno                                           0/1     Error              0          50m

Comment 1 Michael Filanov 2021-05-05 13:54:54 UTC
The cause we found is an extra space in the SSH key. The backend trims this space, so the controller sees the stored value as different from the spec and tries to update the backend again, causing a reconcile loop.
There is a good reason for the trim: it fixes a bug where a newline in the SSH key causes the boot to fail. So the solution should be for the controller to match the backend behavior and trim the SSH key before comparing it to the backend.
In addition, we could try to avoid calling updates in specific cases or specific states.
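
To illustrate the proposed fix, here is a minimal sketch in Go of comparing the SSH key only after trimming both sides. The names clusterSpec, normalizeSSHKey, and needsUpdate are hypothetical and are not taken from the assisted-service code; strings.TrimSpace is assumed to be an acceptable stand-in for whatever trim the backend actually applies.

```go
package main

import (
	"fmt"
	"strings"
)

// clusterSpec is a hypothetical stand-in for the fields the controller
// compares against the backend; it is not the real assisted-service type.
type clusterSpec struct {
	SSHPublicKey string
}

// normalizeSSHKey strips leading and trailing whitespace (spaces, newlines).
// Assumption: this approximates the trim the backend performs on the key.
func normalizeSSHKey(key string) string {
	return strings.TrimSpace(key)
}

// needsUpdate reports whether the desired spec differs from the stored one
// once both keys are normalized, so trailing whitespace alone never
// triggers another update call.
func needsUpdate(desired, stored clusterSpec) bool {
	return normalizeSSHKey(desired.SSHPublicKey) != normalizeSSHKey(stored.SSHPublicKey)
}

func main() {
	// Placeholder key material, not a real key.
	desired := clusterSpec{SSHPublicKey: "ssh-rsa PLACEHOLDERKEY user@example \n"}
	stored := clusterSpec{SSHPublicKey: "ssh-rsa PLACEHOLDERKEY user@example"}

	// A raw comparison sees a difference on every reconcile.
	fmt.Println(desired.SSHPublicKey != stored.SSHPublicKey) // prints "true"

	// The normalized comparison does not.
	fmt.Println(needsUpdate(desired, stored)) // prints "false"
}
```

With a comparison like this, a spec whose key differs from the stored key only by surrounding whitespace no longer looks like a pending change, which is the condition that kept the reconcile loop going.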
