Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2043738

Summary: SNO Installation on 4.10-fc.2 Does Not Complete
Product: OpenShift Container Platform
Component: Networking
Sub component: runtime-cfg
Version: 4.10
Status: CLOSED NOTABUG
Severity: medium
Priority: unspecified
Reporter: Benjamin Schmaus <bschmaus>
Assignee: Ben Nemec <bnemec>
QA Contact: Victor Voronkov <vvoronko>
CC: aos-bugs, calfonso, ercohen, htariq, rfreiman
Last Closed: 2022-02-02 21:02:03 UTC
Type: Bug
Attachments: Log Bundle

Description Benjamin Schmaus 2022-01-21 21:23:49 UTC
Created attachment 1852619 [details]
Log Bundle

Version:

$ openshift-install version
4.10-fc.2

Platform:
Baremetal SNO (IPI)

What happened?

Installation fails to complete; many cluster operators appear unable to progress.
Log bundle attached.

What did you expect to happen?

Installation to complete
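
For reference, completion is typically monitored with something like the following (the assets directory here is illustrative):

$ openshift-install wait-for install-complete --dir ./sno --log-level debug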

How to reproduce it (as minimally and precisely as possible)?
Use the Assisted Installer (AI) or Platform None to create a bootable ISO that deploys an SNO node on bare metal. A sketch of the relevant install config follows.
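
For context, a minimal install-config.yaml for a platform "none" SNO deployment might look like this sketch (all values are illustrative, not taken from this cluster):

apiVersion: v1
baseDomain: example.com
metadata:
  name: sno
networking:
  networkType: OVNKubernetes
  machineNetwork:
  - cidr: 10.11.176.0/24
compute:
- name: worker
  replicas: 0
controlPlane:
  name: master
  replicas: 1
platform:
  none: {}
bootstrapInPlace:
  installationDisk: /dev/sda   # illustrative disk
pullSecret: '<pull secret>'
sshKey: '<ssh public key>'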

Anything else we need to know?

oc get co
NAME                                       VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.10.0-fc.2   False       False         True       43m     APIServerDeploymentAvailable: no apiserver.openshift-oauth-apiserver pods available on any node....
baremetal                                  4.10.0-fc.2   True        False         False      35m     
cloud-controller-manager                   4.10.0-fc.2   True        False         False      35m     
cloud-credential                           4.10.0-fc.2   True        False         False      43m     
cluster-autoscaler                         4.10.0-fc.2   True        False         False      35m     
config-operator                            4.10.0-fc.2   True        False         False      44m     
console                                                                                               
csi-snapshot-controller                    4.10.0-fc.2   True        False         False      43m     
dns                                        4.10.0-fc.2   True        False         False      35m     
etcd                                       4.10.0-fc.2   True        False         True       33m     StaticPodsDegraded: pod/etcd-r740 container "etcd-health-monitor" is waiting: CrashLoopBackOff: back-off 5m0s restarting failed container=etcd-health-monitor pod=etcd-r740_openshift-etcd(226fae129393e3068efeb4a516de88f9)
image-registry                                                                                        
ingress                                                  Unknown     True          Unknown    35m     Not all ingress controllers are available.
insights                                   4.10.0-fc.2   True        False         False      30m     
kube-apiserver                             4.10.0-fc.2   True        False         False      29m     
kube-controller-manager                    4.10.0-fc.2   True        False         False      30m     
kube-scheduler                             4.10.0-fc.2   True        False         False      33m     
kube-storage-version-migrator              4.10.0-fc.2   True        False         False      43m     
machine-api                                4.10.0-fc.2   True        False         False      35m     
machine-approver                                                                                      
machine-config                                           True        True          True       33m     Unable to apply 4.10.0-fc.2: timed out waiting for the condition during syncRequiredMachineConfigPools: pool master has not progressed to latest configuration: configuration status for pool master is empty: 0 (ready 0) out of 1 nodes are updating to latest configuration rendered-master-8096224d6931030961556cb3ce316af7, retrying
marketplace                                4.10.0-fc.2   True        False         False      43m     
monitoring                                               False       False         True       42m     Rollout of the monitoring stack failed and is degraded. Please investigate the degraded status error.
network                                    4.10.0-fc.2   True        False         False      44m     
node-tuning                                4.10.0-fc.2   True        False         False      43m     
openshift-apiserver                        4.10.0-fc.2   False       False         True       43m     APIServerDeploymentAvailable: no apiserver.openshift-apiserver pods available on any node....
openshift-controller-manager               4.10.0-fc.2   True        False         False      29m     
openshift-samples                                                                                     
operator-lifecycle-manager                 4.10.0-fc.2   True        False         False      35m     
operator-lifecycle-manager-catalog         4.10.0-fc.2   True        False         False      35m     
operator-lifecycle-manager-packageserver   4.10.0-fc.2   True        False         False      30m     
service-ca                                 4.10.0-fc.2   True        False         False      43m     
storage                                    4.10.0-fc.2   True        False         False      35m
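
To dig further into the degraded operators above, the usual next steps would be along these lines (pod and pool names taken from the output above):

$ oc -n openshift-etcd logs etcd-r740 -c etcd-health-monitor --previous
$ oc -n openshift-etcd describe pod etcd-r740
$ oc get mcp master -o yaml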

Comment 3 Eran Cohen 2022-01-23 16:09:50 UTC
kube-apiserver log:

W0121 20:46:42.177289      18 patch_genericapiserver.go:130] Request to "/apis/rbac.authorization.k8s.io/v1/namespaces/openshift-authentication-operator/rolebindings" (source IP 10.128.0.38:44152, user agent "openshift-controller-manager/v0.0.0 (linux/amd64) kubernetes/$Format/system:serviceaccount:openshift-infra:default-rolebindings-controller") before server is ready, possibly a sign for a broken load balancer setup.
W0121 20:46:42.186751      18 patch_genericapiserver.go:130] Request to "/apis/apps/v1/namespaces/openshift-monitoring/deployments/prometheus-adapter" (source IP 10.128.0.20:52770, user agent "Go-http-client/2.0") before server is ready, possibly a sign for a broken load balancer setup.
W0121 20:46:42.197858      18 patch_genericapiserver.go:130] Request to "/apis/monitoring.coreos.com/v1" (source IP 10.128.0.41:45058, user agent "ingress-operator/v0.0.0 (linux/amd64) kubernetes/$Format") before server is ready, possibly a sign for a broken load balancer setup.
W0121 20:46:42.199464      18 patch_genericapiserver.go:130] Request to "/api/v1/namespaces/openshift-kube-controller-manager/configmaps/config" (source IP 10.128.0.13:37526, user agent "Go-http-client/2.0") before server is ready, possibly a sign for a broken load balancer setup.
W0121 20:46:42.211012      18 patch_genericapiserver.go:130] Request to "/apis/admissionregistration.k8s.io/v1/mutatingwebhookconfigurations" (source IP 10.128.0.46:56426, user agent "olm/v0.0.0 (linux/amd64) kubernetes/$Format") before server is ready, possibly a sign for a broken load balancer setup.
W0121 20:46:42.212543      18 patch_genericapiserver.go:130] Request to "/apis/config.openshift.io/v1/clusteroperators/marketplace" (source IP 10.128.0.6:50544, user agent "marketplace-operator/v0.0.0 (linux/amd64) kubernetes/$Format") before server is ready, possibly a sign for a broken load balancer setup.
W0121 20:46:42.212837      18 patch_genericapiserver.go:130] Request to "/apis/operators.coreos.com/v1/namespaces/openshift-operator-lifecycle-manager/operatorgroups" (source IP 10.128.0.46:56426, user agent "olm/v0.0.0 (linux/amd64) kubernetes/$Format") before server is ready, possibly a sign for a broken load balancer setup.
W0121 20:46:42.212889      18 patch_genericapiserver.go:130] Request to "/apis/config.openshift.io/v1/clusteroperators/operator-lifecycle-manager-packageserver" (source IP 10.128.0.46:56426, user agent "olm/v0.0.0 (linux/amd64) kubernetes/$Format") before server is ready, possibly a sign for a broken load balancer setup.
I0121 20:46:42.213916      18 genericapiserver.go:812] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-kube-apiserver", Name:"kube-apiserver-r740", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'KubeAPIReadyz' readyz=true
W0121 20:46:42.549578      18 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {10.11.176.112:2379 10.11.176.112 <nil> 0 <nil>}. Err: connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for ::1, 10.11.176.230, 127.0.0.1, ::1, not 10.11.176.112". Reconnecting...
W0121 20:46:43.348519      18 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {10.11.176.112:2379 10.11.176.112 <nil> 0 <nil>}. Err: connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for ::1, 10.11.176.230, 127.0.0.1, ::1, not 10.11.176.112". Reconnecting...
W0121 20:46:43.408224      18 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {10.11.176.112:2379 10.11.176.112 <nil> 0 <nil>}. Err: connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for ::1, 10.11.176.230, 127.0.0.1, ::1, not 10.11.176.112". Reconnecting...
W0121 20:46:43.501701      18 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {10.11.176.112:2379 10.11.176.112 <nil> 0 <nil>}. Err: connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for ::1, 10.11.176.230, 127.0.0.1, ::1, not 10.11.176.112". Reconnecting...
W0121 20:46:44.382918      18 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {10.11.176.112:2379 10.11.176.112 <nil> 0 <nil>}. Err: connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for ::1, 10.11.176.230, 127.0.0.1, ::1, not 10.11.176.112". Reconnecting...
W0121 20:46:44.680172      18 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {10.11.176.112:2379 10.11.176.112 <nil> 0 <nil>}. Err: connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for ::1, 10.11.176.230, 127.0.0.1, ::1, not 10.11.176.112". Reconnecting...
W0121 20:46:45.436591      18 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {10.11.176.112:2379 10.11.176.112 <nil> 0 <nil>}. Err: connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for ::1, 10.11.176.230, 127.0.0.1, ::1, not 10.11.176.112". Reconnecting...

It seems the kube-apiserver is failing to communicate with etcd: the etcd serving certificate is valid for 10.11.176.230 (plus localhost), but the client is trying to reach etcd at 10.11.176.112, which suggests the node's IP address changed after the certificates were generated.
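
One way to confirm the mismatch, assuming network access to the node (the IPs come from the log above):

$ echo | openssl s_client -connect 10.11.176.112:2379 2>/dev/null \
    | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'
$ oc get nodes -o wide   # compare the node's current IP against the certificate SANs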