Description of problem:

We're using ACM 2.1.4 to install a baremetal cluster. When creating a new provisioner connection, we pasted an SSH private key into the appropriate field. At install time, the connection to the provisioner failed with a "permission denied" error. Upon inspection, the Secret containing the private key looked like this:

    stringData:
      ssh-privatekey: |-
        -----BEGIN OPENSSH PRIVATE KEY-----
        b3BlbnNzaC1rZXktdjEAAAAABG5vbmUAAAAEbm9uZQAAAAAAAAABAAACFwAAAAdzc2gtcn
        NhAAAAAwEAAQAAAgEA6oP1JMWngWpB1rZKe7XDvfQZpjO10DjbkL/rNSgttB+I75+twG9Z
        [...]
        G6lJk6jX+b7h8gwPy1usHz2RGfyIL+1pajvPVodxSEtteI1Esq1o3A+YliQhPmH8GDvn6v
        Mz2zm42cckKjbNgKfHwY1F8x4p/t7Vh57u+wtSwbqr3isXGVJIJzB0kwtrUZFU6WnKmdWp
        2S9oDKnSYMICawAAAA9BQ00gcHJvdmlzaW9uZXIBAg==
        -----END OPENSSH PRIVATE KEY-----
    type: Opaque

The keyfile in the above YAML document lacks a terminal newline, as the `|-` block scalar indicator shows. Because of the missing newline, the key was rejected by ssh.

We were able to resolve the problem by adding the missing newline and submitting the cluster configuration using `oc apply`.

Version-Release number of selected component (if applicable): ACM 2.1.4
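A minimal sketch of the workaround described above, assuming the key is available as text before the Secret is submitted with `oc apply`. The function name `ensure_trailing_newline` is hypothetical and not part of ACM or Hive; it only normalizes the end of the key so that it finishes with exactly one newline.

```python
def ensure_trailing_newline(key_text: str) -> str:
    """Return key_text ending in exactly one newline.

    Hypothetical helper for illustration only: it strips any trailing
    spaces, tabs, and line breaks, then appends a single "\n".
    """
    return key_text.rstrip(" \t\r\n") + "\n"


if __name__ == "__main__":
    # Placeholder key body; a real key has many more base64 lines.
    broken = (
        "-----BEGIN OPENSSH PRIVATE KEY-----\n"
        "AAAA\n"
        "-----END OPENSSH PRIVATE KEY-----"  # no terminal newline
    )
    fixed = ensure_trailing_newline(broken)
    print(repr(fixed[-10:]))
```

The helper is idempotent, so applying it to a key that already ends in a newline leaves the key unchanged.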
Hey Lars! To be clear, you're installing a new cluster to manage, not ACM's Hub cluster. Is that correct? Thanks! Nathan
This is concerning the installation of a Bare Metal cluster, not ACM's Hub; reassigning to Cluster Lifecycle for triage.
I can confirm this bug still exists in ACM 2.2.3. Hive cannot ssh-add the private key to ssh-agent in order to connect to libvirtURI during a baremetal cluster deployment, because ssh-add complains about a malformed key. Correcting the secret in flight is not an option, as there are literally seconds between the moment it is created and the moment Hive attempts to add it to ssh-agent in the provisioning pod.
Logs from the hive container of the provisioning pod:

time="2021-05-12T11:57:04Z" level=debug msg="Couldn't find install logs provider environment variable. Skipping."
I0512 11:57:05.258136 1 request.go:645] Throttling request took 1.006969248s, request: GET:https://172.30.0.1:443/apis/autoscaling.openshift.io/v1?timeout=32s
time="2021-05-12T11:57:07Z" level=debug msg="checking for SSH private key" installID=sd8f9qwv
time="2021-05-12T11:57:07Z" level=debug msg="checking for SSH private key" installID=sd8f9qwv
time="2021-05-12T11:57:07Z" level=info msg="initializing ssh agent with 2 keys" installID=sd8f9qwv
time="2021-05-12T11:57:07Z" level=debug msg="no SSH_AUTH_SOCK defined. starting ssh-agent" installID=sd8f9qwv
time="2021-05-12T11:57:08Z" level=error msg="failed to add private key: /tmp/ssh-privatekey" error="exit status 1" installID=sd8f9qwv key=/tmp/ssh-privatekey
time="2021-05-12T11:57:08Z" level=info msg="ssh agent is not initialized" error="exit status 1" installID=sd8f9qwv
time="2021-05-12T11:57:08Z" level=info msg="waiting for files to be available: [/output/openshift-install /output/oc]" installID=sd8f9qwv
...
time="2021-05-12T12:01:20Z" level=debug msg=" Generating Platform Permissions Check..."
time="2021-05-12T12:01:20Z" level=debug msg=" Fetching Platform Provisioning Check..."
time="2021-05-12T12:01:20Z" level=debug msg=" Fetching Install Config..."
time="2021-05-12T12:01:20Z" level=debug msg=" Reusing previously-fetched Install Config"
time="2021-05-12T12:01:20Z" level=debug msg=" Generating Platform Provisioning Check..."
time="2021-05-12T12:01:22Z" level=fatal msg="failed to fetch Cluster: failed to fetch dependency of \"Cluster\": failed to generate asset \"Platform Provisioning Check\": platform.baremetal.libvirtURI: Internal error: could not connect to libvirt: virError(Code=38, Domain=7, Message='Cannot recv data: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).: Connection reset by peer')"
time="2021-05-12T12:01:23Z" level=error msg="error after waiting for command completion" error="exit status 1" installID=sd8f9qwv
time="2021-05-12T12:01:23Z" level=error msg="error provisioning cluster" error="exit status 1" installID=sd8f9qwv
time="2021-05-12T12:01:23Z" level=error msg="error running openshift-install, running deprovision to clean up" error="exit status 1" installID=sd8f9qwv
time="2021-05-12T12:01:23Z" level=debug msg="Unable to find log storage actuator. Disabling gathering logs." installID=sd8f9qwv
time="2021-05-12T12:01:23Z" level=info msg="saving installer output" installID=sd8f9qwv
time="2021-05-12T12:01:23Z" level=debug msg="installer console log: level=info msg=Consuming Install Config from target directory\nlevel=warning msg=Making control-plane schedulable by setting MastersSchedulable to true for Scheduler cluster settings\nlevel=warning msg=Discarding the Openshift Manifests that was provided in the target directory because its dependencies are dirty and it needs to be regenerated\nlevel=info msg=Manifests created in: manifests and openshift\nlevel=warning msg=Found override for release image. Please be warned, this is not advised\nlevel=info msg=Consuming Worker Machines from target directory\nlevel=info msg=Consuming Openshift Manifests from target directory\nlevel=info msg=Consuming OpenShift Install (Manifests) from target directory\nlevel=info msg=Consuming Common Manifests from target directory\nlevel=info msg=Consuming Master Machines from target directory\nlevel=info msg=Ignition-Configs created in: . and auth\nlevel=info msg=Consuming Worker Ignition Config from target directory\nlevel=info msg=Consuming Master Ignition Config from target directory\nlevel=info msg=Consuming Bootstrap Ignition Config from target directory\nlevel=info msg=Obtaining RHCOS image file from 'https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.7/47.83.202103251640-0/x86_64/rhcos-47.83.202103251640-0-qemu.x86_64.qcow2.gz?sha256=2cc7c8841e6b2b0f5d3573b82453fddad3c44972c080969458af85c7097e9bc5'\nlevel=fatal msg=failed to fetch Cluster: failed to fetch dependency of \"Cluster\": failed to generate asset \"Platform Provisioning Check\": platform.baremetal.libvirtURI: Internal error: could not connect to libvirt: virError(Code=38, Domain=7, Message='Cannot recv data: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).: Connection reset by peer')\n" installID=sd8f9qwv
time="2021-05-12T12:01:24Z" level=error msg="failed due to install error" error="exit status 1" installID=sd8f9qwv
time="2021-05-12T12:01:24Z" level=fatal msg="runtime error" error="exit status 1"

Logs from the SSH server:

May 12 12:17:04 centos-m-32vcpu-256gb-ams3-01 sshd[633501]: Connection from 10.128.0.160 port 49409 on 10.128.0.1 port 22
...
May 12 12:17:04 centos-m-32vcpu-256gb-ams3-01 sshd[633501]: debug1: KEX done [preauth]
May 12 12:17:04 centos-m-32vcpu-256gb-ams3-01 sshd[633501]: debug3: receive packet: type 5 [preauth]
May 12 12:17:04 centos-m-32vcpu-256gb-ams3-01 sshd[633501]: debug3: send packet: type 6 [preauth]
May 12 12:17:04 centos-m-32vcpu-256gb-ams3-01 sshd[633501]: debug3: receive packet: type 50 [preauth]
May 12 12:17:04 centos-m-32vcpu-256gb-ams3-01 sshd[633501]: debug1: userauth-request for user root service ssh-connection method none [preauth]
...
May 12 12:17:04 centos-m-32vcpu-256gb-ams3-01 sshd[633501]: debug3: userauth_finish: failure partial=0 next methods="publickey,gssapi-keyex,gssapi-with-mic" [preauth]

(Note "method none" in "userauth-request": no private keys were loaded by the client.)

I have cross-checked this with the Hive folks and verified that they do not modify the key. The culprit is the missing newline in the ${cluster}-ssh-private-key secret.

I tested by appending several newlines and whitespace at the end of the sshPrivateKey field in the metadata of the "Provider Connection" secret, and they all get trimmed down to (and including) the last newline:

    apiVersion: v1
    kind: Secret
    metadata:
      labels:
        cluster.open-cluster-management.io/cloudconnection: ""
        cluster.open-cluster-management.io/provider: bmc
      name: foobarbaz
      namespace: foobarbaz
    type: Opaque
    stringData:
      metadata: |
        libvirtURI: 'qemu+ssh://root.foobar.com/system'
        ...
        sshPrivatekey: "-----BEGIN OPENSSH PRIVATE KEY-----\nb3BlbnNzaC1rZXktdjEA...BtYXJ2aW4uYm8xNC5sb2NhbAEC\n-----END OPENSSH PRIVATE KEY-----\n \n \n"
        sshPublickey: "ssh-rsa AAAAB3NzaC...ry4zNt johndoe\n"

The sshPrivateKey field turns into a secret with no trailing newline.
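The trimming observed above can be modeled in a few lines. This is only an illustration, under the assumption that the stored value is equivalent to the pasted value with all trailing whitespace stripped; it is not the actual ACM code. The names `trim_like_observed` and `has_terminal_newline` are hypothetical, and the key body is a placeholder.

```python
def trim_like_observed(value: str) -> str:
    # Assumption: models the observed behavior as stripping every
    # trailing whitespace character, including the final newline.
    return value.rstrip()


def has_terminal_newline(value: str) -> bool:
    # True when the text ends with a line break, as a key file
    # written by ssh-keygen normally does.
    return value.endswith("\n")


# Placeholder key with the extra "\n \n \n" padding used in the test above.
pasted = (
    "-----BEGIN OPENSSH PRIVATE KEY-----\n"
    "AAAA\n"
    "-----END OPENSSH PRIVATE KEY-----\n \n \n"
)
stored = trim_like_observed(pasted)

if __name__ == "__main__":
    print("pasted ends with newline:", has_terminal_newline(pasted))
    print("stored ends with newline:", has_terminal_newline(stored))
```

Under this model, no amount of padding added to the field survives, which matches the observation that the resulting secret never ends with a newline.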