Bug 1949267 - Azure Installation fails when specifying two ssh keys in install-config.yaml - apiserver indicates transport: loopyWriter.run returning. connection error: desc = "transport is closing"
Summary: Azure Installation fails when specifying two ssh keys in install-config.yaml ...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: aos-install
QA Contact: Gaoyun Pei
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-04-13 20:00 UTC by To Hung Sze
Modified: 2021-04-14 13:53 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-04-14 13:02:01 UTC
Target Upstream Version:
Embargoed:


Attachments
must-gather log (19.14 MB, application/zip), 2021-04-13 20:01 UTC, To Hung Sze

Description To Hung Sze 2021-04-13 20:00:41 UTC
Description of problem:
When installing on Azure with two SSH keys specified in install-config.yaml, the installation fails.
The apiserver log indicates:
transport: loopyWriter.run returning. connection error: desc = "transport is closing"

Version-Release number of selected component (if applicable):
openshift-install-linux-4.8.0-0.nightly-2021-04-09-222447

How reproducible:
Reproducible by adding a second SSH key to install-config.yaml and then installing on Azure.


Steps to Reproduce:
1. openshift-install create install-config --dir <test_dir>
2. vi <test_dir>/install-config.yaml
Add a second public SSH key to the sshKey section at the bottom (see the sketch after these steps).
3. openshift-install create cluster --dir <test_dir>
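
For reference, a minimal sketch of how step 2 might look, assuming both public keys are placed in the single sshKey field of install-config.yaml, one per line (the key values and comments are placeholders, not the actual keys used):

sshKey: |
  ssh-rsa AAAAB3... user1@example.com
  ssh-ed25519 AAAAC3... user2@example.com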

Actual results:
Install fails.

time="2021-04-13T15:07:32-04:00" level=info msg="Cluster operator console Progressing is True with SyncLoopRefresh_InProgress: SyncLoopRefreshProgressing: Working toward version 4.8.0-0.nightly-2021-04-09-222447"
time="2021-04-13T15:07:32-04:00" level=info msg="Cluster operator console Available is False with Deployment_InsufficientReplicas: DeploymentAvailable: 0 pods available for console deployment"
time="2021-04-13T15:07:32-04:00" level=info msg="Cluster operator ingress Available is False with IngressUnavailable: Not all ingress controllers are available."
time="2021-04-13T15:07:32-04:00" level=info msg="Cluster operator ingress Progressing is True with Reconciling: Not all ingress controllers are available."
time="2021-04-13T15:07:32-04:00" level=error msg="Cluster operator ingress Degraded is True with IngressControllersDegraded: Some ingresscontrollers are degraded: ingresscontroller \"default\" is degraded: DegradedConditions: One or more other status conditions indicate a degraded state: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: EnsureBackendPoolDeleted: failed to parse the VMAS ID : getAvailabilitySetNameByID: failed to parse the VMAS ID \nThe kube-controller-manager logs may contain more details.), CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)"
time="2021-04-13T15:07:32-04:00" level=info msg="Cluster operator insights Disabled is False with AsExpected: "
time="2021-04-13T15:07:32-04:00" level=info msg="Cluster operator network ManagementStateDegraded is False with : "
time="2021-04-13T15:07:32-04:00" level=error msg="Cluster initialization failed because one or more operators are not functioning properly.\nThe cluster should be accessible for troubleshooting as detailed in the documentation linked below,\nhttps://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html\nThe 'wait-for install-complete' subcommand can then be used to continue the installation"
time="2021-04-13T15:07:32-04:00" level=fatal msg="failed to initialize the cluster: Some cluster operators are still updating: authentication, console, ingress"

Expected results:
Install finishes.
The same procedure works on GCP.

Additional info:
In attached must-gather, apiserver's log shows
2021-04-13T19:17:00.079688614Z I0413 19:17:00.079636       1 clientconn.go:948] ClientConn switching balancer to "pick_first"
2021-04-13T19:17:00.079836114Z I0413 19:17:00.079800       1 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc0016aebd0, {CONNECTING <nil>}
2021-04-13T19:17:00.089945329Z I0413 19:17:00.089879       1 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc0016aebd0, {READY <nil>}
2021-04-13T19:17:00.092089832Z I0413 19:17:00.092050       1 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"

Comment 1 To Hung Sze 2021-04-13 20:01:47 UTC
Created attachment 1771696 [details]
must-gather log

Comment 2 Stefan Schimanski 2021-04-14 07:57:57 UTC
This is not fatal and probably just noise:

  transport: loopyWriter.run returning. connection error: desc = "transport is closing"

Please do some root cause analysis before moving such a bug to a random component. I don't see proof that the apiserver is at fault here.

Moving to installer component.

Comment 3 Matthew Staebler 2021-04-14 13:02:01 UTC
You need to destroy your bootstrap resources before the installation will complete.

> 2021-04-13T19:12:30.579883591Z I0413 19:12:30.579774       1 azure_loadbalancer.go:1141] reconcileLoadBalancer for service (openshift-ingress/router-default)(true): lb backendpool - found unwanted node tszeaz041321b-zrqfk-bootstrap, decouple it from the LB
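
For reference, the bootstrap resources can normally be removed with the destroy bootstrap subcommand once bootstrapping completes, using the same assets directory as in the reproduction steps:

openshift-install destroy bootstrap --dir <test_dir>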

Comment 4 To Hung Sze 2021-04-14 13:53:14 UTC
Azure does not like having the bootstrap resources around. (I did keep the bootstrap around when installing; sorry, I forgot to include this piece of information.)

