Bug 1935411 - Missing newline in ssh private key prevents ACM from connecting to provisioner node
Keywords:
Status: MODIFIED
Alias: None
Product: Red Hat Advanced Cluster Management for Kubernetes
Classification: Red Hat
Component: Cluster Lifecycle
Version: rhacm-2.1.z
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Le Yang
QA Contact: Derek Ho
Docs Contact: Christopher Dawson
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-03-04 20:05 UTC by Lars Kellogg-Stedman
Modified: 2025-11-01 08:28 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github open-cluster-management backlog issues 10129 0 None None None 2021-03-05 13:56:02 UTC

Description Lars Kellogg-Stedman 2021-03-04 20:05:03 UTC
Description of problem:

We're using ACM 2.1.4 to install a baremetal cluster. When creating a new provisioner connection, we pasted an ssh private key into the appropriate field. At install time, the connection to the provisioner was failing with a "permission denied" error.

Upon inspection, the Secret containing the private key looked like this:

  stringData:
    ssh-privatekey: |-
      -----BEGIN OPENSSH PRIVATE KEY-----
      b3BlbnNzaC1rZXktdjEAAAAABG5vbmUAAAAEbm9uZQAAAAAAAAABAAACFwAAAAdzc2gtcn
      NhAAAAAwEAAQAAAgEA6oP1JMWngWpB1rZKe7XDvfQZpjO10DjbkL/rNSgttB+I75+twG9Z
      [...]
      G6lJk6jX+b7h8gwPy1usHz2RGfyIL+1pajvPVodxSEtteI1Esq1o3A+YliQhPmH8GDvn6v
      Mz2zm42cckKjbNgKfHwY1F8x4p/t7Vh57u+wtSwbqr3isXGVJIJzB0kwtrUZFU6WnKmdWp
      2S9oDKnSYMICawAAAA9BQ00gcHJvdmlzaW9uZXIBAg==
      -----END OPENSSH PRIVATE KEY-----
  type: Opaque

The key in the above YAML document lacks a terminal newline, which is
exactly what the `|-` ("strip") chomping indicator on the block scalar
signifies. Because of the missing newline, ssh rejected the key. We
were able to resolve the problem by restoring the newline and
submitting the corrected cluster configuration with `oc apply`.
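The manual fix above boils down to a one-byte check. A minimal sketch (plain Python; the Secret name and the oc/kubectl round-trip are deliberately omitted) of the condition ssh effectively enforces on the key material:

```python
def ensure_trailing_newline(key: bytes) -> bytes:
    """Append a final newline if the key material lacks one; otherwise return it unchanged."""
    return key if key.endswith(b"\n") else key + b"\n"

# Key as it appeared in the Secret: no byte after the END marker
# (payload elided, as in the YAML excerpt above).
broken = b"-----BEGIN OPENSSH PRIVATE KEY-----\n...\n-----END OPENSSH PRIVATE KEY-----"
fixed = ensure_trailing_newline(broken)

assert not broken.endswith(b"\n")
assert fixed.endswith(b"-----END OPENSSH PRIVATE KEY-----\n")
```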

Version-Release number of selected component (if applicable):

ACM 2.1.4

Comment 1 Nathan Weatherly 2021-03-08 14:08:23 UTC
Hey Lars!

To be clear, you're installing a new cluster to manage, not ACM's Hub cluster. Is that correct?

Thanks!

Nathan

Comment 2 Nathan Weatherly 2021-03-09 13:00:44 UTC
This is concerning the installation of a Bare Metal cluster, not ACM's Hub; reassigning to Cluster Lifecycle for triage.

Comment 3 Grega Bremec 2021-05-13 05:52:30 UTC
I can confirm this bug still exists in ACM 2.2.3.

Hive cannot ssh-add the private key to the ssh-agent (which it needs in order to connect to the libvirtURI during a bare metal cluster deployment) because ssh-add complains about a malformed key.

Correcting the secret in flight is not an option, as only seconds pass between the moment it is created and the moment Hive attempts to add it to the ssh-agent in the provisioning pod.

Comment 4 Grega Bremec 2021-05-13 06:02:34 UTC
Logs from the hive container of the provisioning pod:

time="2021-05-12T11:57:04Z" level=debug msg="Couldn't find install logs provider environment variable. Skipping."
I0512 11:57:05.258136       1 request.go:645] Throttling request took 1.006969248s, request: GET:https://172.30.0.1:443/apis/autoscaling.openshift.io/v1?timeout=32s
time="2021-05-12T11:57:07Z" level=debug msg="checking for SSH private key" installID=sd8f9qwv
time="2021-05-12T11:57:07Z" level=debug msg="checking for SSH private key" installID=sd8f9qwv
time="2021-05-12T11:57:07Z" level=info msg="initializing ssh agent with 2 keys" installID=sd8f9qwv
time="2021-05-12T11:57:07Z" level=debug msg="no SSH_AUTH_SOCK defined. starting ssh-agent" installID=sd8f9qwv
time="2021-05-12T11:57:08Z" level=error msg="failed to add private key: /tmp/ssh-privatekey" error="exit status 1" installID=sd8f9qwv key=/tmp/ssh-privatekey
time="2021-05-12T11:57:08Z" level=info msg="ssh agent is not initialized" error="exit status 1" installID=sd8f9qwv
time="2021-05-12T11:57:08Z" level=info msg="waiting for files to be available: [/output/openshift-install /output/oc]" installID=sd8f9qwv
...
time="2021-05-12T12:01:20Z" level=debug msg="  Generating Platform Permissions Check..."
time="2021-05-12T12:01:20Z" level=debug msg="  Fetching Platform Provisioning Check..."
time="2021-05-12T12:01:20Z" level=debug msg="    Fetching Install Config..."
time="2021-05-12T12:01:20Z" level=debug msg="    Reusing previously-fetched Install Config"
time="2021-05-12T12:01:20Z" level=debug msg="  Generating Platform Provisioning Check..."
time="2021-05-12T12:01:22Z" level=fatal msg="failed to fetch Cluster: failed to fetch dependency of \"Cluster\": failed to generate asset \"Platform Provisioning Check\": platform.baremetal.libvirtURI: Internal error: could not connect to libvirt: virError(Code=38, Domain=7, Message='Cannot recv data: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).: Connection reset by peer')"
time="2021-05-12T12:01:23Z" level=error msg="error after waiting for command completion" error="exit status 1" installID=sd8f9qwv
time="2021-05-12T12:01:23Z" level=error msg="error provisioning cluster" error="exit status 1" installID=sd8f9qwv
time="2021-05-12T12:01:23Z" level=error msg="error running openshift-install, running deprovision to clean up" error="exit status 1" installID=sd8f9qwv
time="2021-05-12T12:01:23Z" level=debug msg="Unable to find log storage actuator. Disabling gathering logs." installID=sd8f9qwv
time="2021-05-12T12:01:23Z" level=info msg="saving installer output" installID=sd8f9qwv
time="2021-05-12T12:01:23Z" level=debug msg="installer console log: level=info msg=Consuming Install Config from target directory\nlevel=warning msg=Making control-plane schedulable by setting MastersSchedulable to true for Scheduler cluster settings\nlevel=warning msg=Discarding the Openshift Manifests that was provided in the target directory because its dependencies are dirty and it needs to be regenerated\nlevel=info msg=Manifests created in: manifests and openshift\nlevel=warning msg=Found override for release image. Please be warned, this is not advised\nlevel=info msg=Consuming Worker Machines from target directory\nlevel=info msg=Consuming Openshift Manifests from target directory\nlevel=info msg=Consuming OpenShift Install (Manifests) from target directory\nlevel=info msg=Consuming Common Manifests from target directory\nlevel=info msg=Consuming Master Machines from target directory\nlevel=info msg=Ignition-Configs created in: . and auth\nlevel=info msg=Consuming Worker Ignition Config from target directory\nlevel=info msg=Consuming Master Ignition Config from target directory\nlevel=info msg=Consuming Bootstrap Ignition Config from target directory\nlevel=info msg=Obtaining RHCOS image file from 'https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.7/47.83.202103251640-0/x86_64/rhcos-47.83.202103251640-0-qemu.x86_64.qcow2.gz?sha256=2cc7c8841e6b2b0f5d3573b82453fddad3c44972c080969458af85c7097e9bc5'\nlevel=fatal msg=failed to fetch Cluster: failed to fetch dependency of \"Cluster\": failed to generate asset \"Platform Provisioning Check\": platform.baremetal.libvirtURI: Internal error: could not connect to libvirt: virError(Code=38, Domain=7, Message='Cannot recv data: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).: Connection reset by peer')\n" installID=sd8f9qwv
time="2021-05-12T12:01:24Z" level=error msg="failed due to install error" error="exit status 1" installID=sd8f9qwv
time="2021-05-12T12:01:24Z" level=fatal msg="runtime error" error="exit status 1"

Logs from the SSH server:

May 12 12:17:04 centos-m-32vcpu-256gb-ams3-01 sshd[633501]: Connection from 10.128.0.160 port 49409 on 10.128.0.1 port 22
...
May 12 12:17:04 centos-m-32vcpu-256gb-ams3-01 sshd[633501]: debug1: KEX done [preauth]
May 12 12:17:04 centos-m-32vcpu-256gb-ams3-01 sshd[633501]: debug3: receive packet: type 5 [preauth]
May 12 12:17:04 centos-m-32vcpu-256gb-ams3-01 sshd[633501]: debug3: send packet: type 6 [preauth]
May 12 12:17:04 centos-m-32vcpu-256gb-ams3-01 sshd[633501]: debug3: receive packet: type 50 [preauth]
May 12 12:17:04 centos-m-32vcpu-256gb-ams3-01 sshd[633501]: debug1: userauth-request for user root service ssh-connection method none [preauth]
...
May 12 12:17:04 centos-m-32vcpu-256gb-ams3-01 sshd[633501]: debug3: userauth_finish: failure partial=0 next methods="publickey,gssapi-keyex,gssapi-with-mic" [preauth]

(note "method none" in the "userauth-request" line: the client loaded no private keys at all)

I have cross-checked this with the Hive folks and verified that Hive does not modify the key; the culprit is the missing newline in the ${cluster}-ssh-private-key secret.

I tested by appending several newlines and runs of trailing whitespace to the sshPrivateKey field in the "Provider Connection" secret's metadata; in every case the value is trimmed back past (and including) the last newline:

  apiVersion: v1
  kind: Secret
  metadata:
    labels:
      cluster.open-cluster-management.io/cloudconnection: ""
      cluster.open-cluster-management.io/provider: bmc
    name: foobarbaz
    namespace: foobarbaz
  type: Opaque
  stringData:
    metadata: |
      libvirtURI: 'qemu+ssh://root.foobar.com/system'
      ...
      sshPrivatekey: "-----BEGIN OPENSSH PRIVATE KEY-----\nb3BlbnNzaC1rZXktdjEA...BtYXJ2aW4uYm8xNC5sb2NhbAEC\n-----END OPENSSH PRIVATE KEY-----\n  \n  \n"
      sshPublickey: "ssh-rsa AAAAB3NzaC...ry4zNt johndoe\n"

The sshPrivateKey field turns into a secret with no trailing newline.
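The trimming observed above is consistent with an unconditional right-strip of the field; the actual console code is not shown in this bug, so treating it as equivalent to Python's str.rstrip() is an assumption. A small sketch of why padding the key with extra whitespace cannot survive such a strip:

```python
# Hypothetical reproduction of the observed trimming: the real console code
# is unknown, but the behavior reported above matches str.rstrip().
key = "-----BEGIN OPENSSH PRIVATE KEY-----\n...\n-----END OPENSSH PRIVATE KEY-----\n"
padded = key + "  \n  \n"   # extra whitespace appended, as in the test above

trimmed = padded.rstrip()   # removes ALL trailing whitespace, newlines included

assert not trimmed.endswith("\n")   # the final newline ssh requires is gone
assert trimmed.endswith("-----END OPENSSH PRIVATE KEY-----")
```

This is why no amount of padding in the pasted value helps: any fix has to happen after the trim, on the rendered Secret itself.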

