1956959 – ipv6 disconnected sno crd deployment hive reports success status and clusterdeployment reporting false


Summary: ipv6 disconnected sno crd deployment hive reports success status and clusterd...

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	assisted-installer
Sub Component:
Version:	4.8
Hardware:	All
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Target Release:	4.8.0
Assignee:	Daniel Erez
QA Contact:	bjacot
Docs Contact:
URL:
Whiteboard:	AI-Team-Hive
Depends On:
Blocks:

Reported:	2021-05-04 18:26 UTC by bjacot
Modified:	2021-07-27 23:06 UTC
CC List:	5 users
Fixed In Version:	OCP-Metal-v1.0.20.1
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-07-27 23:06:09 UTC
Target Upstream Version:
Embargoed:


Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2021:2438	0	None	None	None	2021-07-27 23:06:25 UTC

Description bjacot 2021-05-04 18:26:13 UTC

I have completed an IPv6 disconnected SNO deployment via CRDs. I am not doing ZTP; I am manually mounting and booting the ISO. Hive reports that the SNO deployment succeeded, but the ClusterDeployment still shows spec.installed as false. Also, the kubeconfig of the SNO deployment is not showing up in the secrets in the assisted-installer namespace (oc get secrets -n assisted-installer).

[kni@provisionhost-0-0 ~]$ oc get clusterdeployments.hive.openshift.io -n assisted-installer -o=custom-columns='STATUS:status.conditions[-1].message'
STATUS
The installation has completed: Cluster is installed
[kni@provisionhost-0-0 ~]$ 
[kni@provisionhost-0-0 ~]$ oc get cd sno-cluster-deployment -o json | jq -r '.spec.installed'
false
[kni@provisionhost-0-0 ~]$ 


##### From the SNO VM #####
[root@sno ~]# export KUBECONFIG=/sysroot/ostree/deploy/rhcos/var/lib/kubelet/kubeconfig
[root@sno ~]# oc get nodes
NAME   STATUS   ROLES           AGE   VERSION
sno    Ready    master,worker   59m   v1.21.0-rc.0+6825c59
[root@sno ~]# 
[root@sno ~]# oc get pods -n assisted-installer 
NAME                                  READY   STATUS      RESTARTS   AGE
assisted-installer-controller-n4spz   0/1     Completed   0          69m
[root@sno ~]# 

#### These pods are in Error state ####
[root@sno ~]# oc get pods -A|grep -v Run|grep -v Compl
NAMESPACE                                          NAME                                                      READY   STATUS             RESTARTS   AGE
openshift-kube-controller-manager                  installer-6-sno                                           0/1     Error              0          50m
openshift-kube-scheduler                           installer-7-sno                                           0/1     Error              0          50m

Comment 1 Michael Filanov 2021-05-05 13:54:54 UTC

The root cause we found was an extra space in the SSH key. The backend trims this space, so the controller sees the trimmed value as a change and tries to update the backend again, causing a reconcile loop.
There is a good reason for the trim: it fixes a bug where a newline in the SSH key makes the boot fail. So the solution should be for the controller to match the backend behavior and trim the SSH key before comparing it with the backend value.
In addition, we could try to avoid calling updates in specific cases or specific states.
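A minimal sketch of the trim-before-compare idea, written in shell purely for illustration. The helper names trim_ws and needs_update are hypothetical and are not functions from the assisted-service code; the sketch assumes the desired key from the CR and the key stored by the backend are both available as plain strings. Normalizing both sides the same way means a key that differs only in surrounding whitespace no longer counts as a change, so no update is sent.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: normalize an SSH key the way the backend is described
# as doing (strip leading/trailing whitespace, including newlines) before
# comparing it with the stored value.

# Print the argument with leading and trailing whitespace removed.
trim_ws() {
  local s="$1"
  # Drop the leading whitespace run.
  s="${s#"${s%%[![:space:]]*}"}"
  # Drop the trailing whitespace run.
  s="${s%"${s##*[![:space:]]}"}"
  printf '%s' "$s"
}

# Return 0 (true) only if the keys differ after normalization, i.e. an
# update should be sent. Comparing the raw desired key with the trimmed
# stored key would report a difference on every pass whenever the desired
# key carries extra whitespace, which is what drives the reconcile loop.
needs_update() {
  local desired="$1" stored="$2"
  [ "$(trim_ws "$desired")" != "$(trim_ws "$stored")" ]
}
```

For example, needs_update "ssh-rsa AAAA user@host " "ssh-rsa AAAA user@host" || echo "in sync" prints "in sync", because the two keys differ only by the trailing space.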

Comment 2 bjacot 2021-06-11 18:32:47 UTC

I was able to perform an SNO deployment after updating my automation to trim whitespace from the SSH key.
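On the automation side, the workaround amounts to normalizing the key before it is written into the manifests. A minimal sketch, assuming the key is read from a public key file such as ~/.ssh/id_rsa.pub; load_ssh_key is a hypothetical helper, and rejecting multi-line input is an extra safeguard chosen here, not something this bug report prescribes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read an SSH public key file and print the key with
# surrounding whitespace removed, so the value placed in the CR already
# matches what the backend stores after trimming.
load_ssh_key() {
  local file="$1" key
  # sed strips leading/trailing whitespace on every line; the command
  # substitution then drops any trailing newlines. Fails if sed fails
  # (for example, when the file does not exist).
  key="$(sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$file")" || return 1
  # Assumption: treat a key that still spans several lines as an error
  # instead of guessing how to join it.
  case "$key" in
    *$'\n'*)
      echo "error: $file contains more than one line" >&2
      return 1
      ;;
  esac
  printf '%s' "$key"
}
```

A typical call would be ssh_key="$(load_ssh_key ~/.ssh/id_rsa.pub)" || exit 1, with $ssh_key then substituted into whatever manifest template the automation uses.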

Comment 5 errata-xmlrpc 2021-07-27 23:06:09 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
