Bug 1848527 - gathering debug bootstrap information should not depend on SSH key in config
Summary: gathering debug bootstrap information should not depend on SSH key in config
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.6.0
Assignee: Abhinav Dahiya
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-06-18 13:42 UTC by Aleksandar Kostadinov
Modified: 2020-07-21 02:37 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-06-29 17:15:33 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1811453 0 medium CLOSED gathering debug bootstrap information should not require prior ssh setup 2021-02-22 00:41:40 UTC

Description Aleksandar Kostadinov 2020-06-18 13:42:07 UTC
Description of problem:

When the bootstrap process fails, openshift-install tries to obtain debug logs over an SSH connection to the bootstrap machine. As of Bug #1811453, the installer automatically generates an SSH key when the user did *NOT* specify an SSH public key in install-config.

I noticed that if the user *HAS* specified an SSH key in install-config, then during gather the installer fails to obtain logs unless the private key is in `~/.ssh`.

My request is for the installer to always generate an SSH key pair to be used for gathering logs, and to additionally add the user's public key if one is specified.

This is useful in CI/ops scenarios where the user:
* does not want to provide their private key to the build system
* wants failure logs gathered automatically
* wants SSH access to the nodes for debug purposes

I hope it is straightforward for the installer to put both its own public key and the user's public key in the authorized_keys file.
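
To illustrate the request, here is a minimal Python sketch of how the two public keys could be combined into one authorized_keys body. The function name `build_authorized_keys` and its parameters are hypothetical and are not taken from the installer source; the only assumption about the file format is one public key per line.

```python
from typing import Optional


def build_authorized_keys(installer_pub: str, user_pub: Optional[str] = None) -> str:
    """Return authorized_keys content: installer key first, then the user key if given.

    Hypothetical helper for illustration only; not the installer's actual code.
    """
    keys = [installer_pub.strip()]
    if user_pub and user_pub.strip():
        user = user_pub.strip()
        # Skip the user key if it is identical to the installer key.
        if user not in keys:
            keys.append(user)
    # One public key per line, terminated by a newline.
    return "\n".join(keys) + "\n"
```

With no user key, the result contains only the installer key, so log gathering would not depend on anything in `~/.ssh`.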

Version-Release number of the following components:

> ./openshift-install 4.4.7
>  built from commit 37d024ff54e3d2caf853ced640453222546be935
>  release image registry.svc.ci.openshift.org/ocp/release@sha256:d0f3b8dae00e0a5574af01fa927f7fb2a835495887566140274ae1ab227cbdf0


How reproducible:
always

Steps to Reproduce:
1. Run the IPI installer
2. Set the SSH public key in install-config to one whose private key is NOT present in ~/.ssh
3. Bootstrap must fail (not sure how to cause this)

Actual results:

> level=error msg="Attempted to gather debug logs after installation failure: failed to create SSH client: failed to initialize the SSH agent: no keys found for SSH agent"

Expected results:

> INFO Bootstrap gather logs captured here "/mnt/install-dir/log-bundle-20200617151547.tar.gz"

Comment 2 Abhinav Dahiya 2020-06-18 23:54:51 UTC
>  As of Bug #1811453 installer automatically generates an SSH key when user did *NOT* specify SSH public key in install-config.
> I noticed that if user *HAS* specified an SSH key in install config, then during gather, the installer fails to obtain logs unless the private key is in `~/.ssh`.
> My request is for the installer to always generate an SSH key pair that would be used for gathering logs and optionally also add a user public key if such is specified.

That's inaccurate: we always add the installer-created SSH key to the bootstrap host, but this is only available starting with 4.5. Your reproducer used the 4.4.7 installer, so please try again with 4.5.

Remember that we only add this key to the bootstrap host; it will NOT be added to the control-plane or compute machines, for security reasons.

Comment 3 Abhinav Dahiya 2020-06-24 14:55:23 UTC
Waiting for response from the user, will close by EOW otherwise.

Comment 4 Aleksandar Kostadinov 2020-06-25 20:13:26 UTC
Hi, sorry, I didn't forget about this but needed time to test the difference. With 4.5 the installer indeed always creates its own key and can gather logs from the bootstrap node. It seems to me that it cannot do so from the other nodes.

Log when the user DID specify an SSH key but it was NOT found in ~/.ssh:

> INFO Pulling debug logs from the bootstrap machine 
> DEBUG Added /tmp/bootstrap-ssh073982441 to installer's internal agent 
> DEBUG Gathering bootstrap systemd summary ...      
> DEBUG Gathering bootstrap failed systemd unit status ... 
> DEBUG Gathering bootstrap journals ...             
> DEBUG Gathering bootstrap containers ...           
> DEBUG Gathering rendered assets...                 
> DEBUG Gathering cluster resources ...              
> DEBUG Waiting for logs ...                         
> DEBUG Gather remote logs                           
> DEBUG Collecting info from 192.168.0.225           
> DEBUG lost connection                              
> DEBUG Warning: Permanently added '192.168.0.225' (ECDSA) to the list of known hosts.
> DEBUG core.0.225: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
> DEBUG Collecting info from 192.168.1.154
> DEBUG lost connection
> DEBUG Warning: Permanently added '192.168.1.154' (ECDSA) to the list of known hosts.
> DEBUG core.1.154: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
> DEBUG Collecting info from 192.168.2.52
> DEBUG lost connection
> DEBUG Warning: Permanently added '192.168.2.52' (ECDSA) to the list of known hosts.
> DEBUG core.2.52: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
> DEBUG Log bundle written to /var/home/core/log-bundle-20200624101745.tar.gz 
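
For CI use, a log like the one above can be checked automatically for hosts that refused the connection. The following Python sketch is only an illustration: the function name `gather_status` is hypothetical, and it assumes the log keeps the "Collecting info from <host>" and "Permission denied" wording seen in this report. A value of True means only that no denial was logged for that host, not that collection is known to have succeeded.

```python
import re
from typing import Dict, Optional

# Matches the per-host marker line seen in the gather output.
HOST_RE = re.compile(r"Collecting info from (\S+)")


def gather_status(log_text: str) -> Dict[str, bool]:
    """Map each host in the gather log to False if a permission denial followed it."""
    status: Dict[str, bool] = {}
    current: Optional[str] = None
    for line in log_text.splitlines():
        match = HOST_RE.search(line)
        if match:
            # Drop a trailing quote left over from the level=... msg="..." format.
            current = match.group(1).rstrip('"')
            status[current] = True
        elif current is not None and "Permission denied" in line:
            status[current] = False
    return status
```

Applied to the log above, every host listed after "Gather remote logs" would map to False.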

Log when the user DID specify an SSH key and it WAS present in ~/.ssh:

> level=debug msg="Added /tmp/bootstrap-ssh519240711 to installer's internal agent"
> level=debug msg="Added /home/jboss/.ssh/ocp-bootstrap-20200625-32-tvs84b.pem to installer's internal agent"
> level=debug msg="Gathering bootstrap systemd summary ..."
> level=debug msg="Gathering bootstrap failed systemd unit status ..."
> level=debug msg="Gathering bootstrap journals ..."
> level=debug msg="Gathering bootstrap containers ..."
> level=debug msg="Gathering rendered assets..."
> level=debug msg="Gathering cluster resources ..."
> level=debug msg="Waiting for logs ..."
> level=debug msg="Gather remote logs"
> level=debug msg="Collecting info from 192.168.1.236"
> level=debug msg="Warning: Permanently added '192.168.1.236' (ECDSA) to the list of known hosts.\r"
> level=debug msg="Gathering master systemd summary ..."
> level=debug msg="Gathering master failed systemd unit status ..."
> level=debug msg="Gathering master journals ..."
> level=debug msg="Gathering master containers ..."
> level=debug msg="Waiting for logs ..."
> level=debug msg="Collecting info from 192.168.2.119"
> level=debug msg="Warning: Permanently added '192.168.2.119' (ECDSA) to the list of known hosts.\r"
> level=debug msg="Gathering master systemd summary ..."
> level=debug msg="Gathering master failed systemd unit status ..."
> level=debug msg="Gathering master journals ..."
> level=debug msg="Gathering master containers ..."
> level=debug msg="Waiting for logs ..."
> level=debug msg="Collecting info from 192.168.1.220"
> level=debug msg="Warning: Permanently added '192.168.1.220' (ECDSA) to the list of known hosts.\r"
> level=debug msg="Gathering master systemd summary ..."
> level=debug msg="Gathering master failed systemd unit status ..."
> level=debug msg="Gathering master journals ..."
> level=debug msg="Gathering master containers ..."
> level=debug msg="Waiting for logs ..."
> level=debug msg="Log bundle written to /var/home/core/log-bundle-20200625152401.tar.gz"
> level=info msg="Bootstrap gather logs captured here \"/mnt/flexy/workdir/install-dir/log-bundle-20200625152401.tar.gz\""

Is the bootstrap SSH key uploaded to all machines when the user did NOT provide a key? Can't the bootstrap key also be set up on the other machines? If there is a security concern, can't that key be deleted once the cluster is up and running?

Overall, things are improved compared to 4.4 and earlier, though.

Comment 5 Abhinav Dahiya 2020-06-29 17:15:33 UTC
> 
> Remember we only add this key to the bootstrap host, it will NOT be added to
> the control-plane or compute because of security reasons.

We don't add it to the control plane for security reasons, and we don't intend to change that. Closing, as this already works in 4.5.

Comment 6 Aleksandar Kostadinov 2020-06-30 22:49:49 UTC
Abhinav Dahiya, could you point to the relevant discussion or explain what these "security reasons" are? Just stating "security reasons" precludes any further development that could implement the feature while avoiding these problems.

Comment 7 Abhinav Dahiya 2020-07-21 02:37:12 UTC
Putting installer-generated keys, which allow root access over SSH, on the cluster machines is something we do not want to take on. The bootstrap machine is transient, but cluster machines are not. It would put an extra burden on the installer to maintain this and respond to problems that can arise.
Therefore we will let users decide whether they want certain keys installed on the cluster machines, and not do it for them automatically.
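
The policy described in this thread can be summarized in a short Python sketch. This is an illustration and not installer code: the function `ssh_keys_for` and the role names are hypothetical, and it assumes the user-supplied key is also placed on the bootstrap host alongside the generated one.

```python
from typing import List, Optional


def ssh_keys_for(role: str, generated_pub: str, user_pub: Optional[str]) -> List[str]:
    """Return the public keys a machine of the given role would receive.

    Hypothetical roles: "bootstrap", "master", "worker".
    """
    keys: List[str] = []
    if role == "bootstrap":
        # The installer-generated key goes only to the transient bootstrap host.
        keys.append(generated_pub)
    if user_pub:
        # Keys chosen by the user are installed wherever the user asked for them.
        keys.append(user_pub)
    return keys
```

Under this sketch, a control-plane machine gets no key at all unless the user provides one, which matches the gather behaviour reported in Comment 4.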

