Bug 1811453

Summary: gathering debug bootstrap information should not require prior ssh setup
Product: OpenShift Container Platform
Reporter: Aleksandar Kostadinov <akostadi>
Component: Installer
Assignee: Abhinav Dahiya <adahiya>
Installer sub component: openshift-installer
QA Contact: Yunfei Jiang <yunjiang>
Status: CLOSED ERRATA
Docs Contact:
Severity: medium
Priority: medium
CC: sdodson, yunjiang
Version: 4.4
Keywords: UserExperience
Target Milestone: ---
Target Release: 4.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Release Note
Doc Text:
The installer now configures the bootstrap host with a generated SSH key, which the `gather bootstrap` command uses when no other authentication method is available, i.e. when an SSH agent is not configured and there are no usable keys in `~/.ssh`.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-07-13 17:18:56 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Aleksandar Kostadinov 2020-03-08 21:48:55 UTC
Description of problem:

When the bootstrap process fails, openshift-installer tries to obtain debug logs by opening an SSH connection to the bootstrap machine. For this to work, the user needs to specify an SSH public key during installation and have the matching private key under ~/.ssh.

It would be a much better user experience if the installer were always able to collect bootstrap debug information, regardless of user input.

Possible remedy when the user does not specify a key, or specifies a public key for which the installer has no private key:
1. The installer uses an existing key pair from `~/.ssh`.
2. If there is no existing key pair, it generates a new SSH key in the current installer directory (not inside `~/.ssh`).
3. It uses the key from point 1 or 2 when bootstrap debugging is required.
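
The fallback order proposed above could be sketched roughly as follows. This is only an illustration of the selection logic, not the installer's code (the installer is written in Go); the function name `choose_debug_key`, the list of key filenames, and the generated key filename are all assumptions made for the example, and the actual key generation step is left out.

```python
from pathlib import Path
from typing import Tuple

# Assumption: common OpenSSH private-key filenames, not a list taken from the installer.
DEFAULT_KEY_NAMES = ("id_ed25519", "id_ecdsa", "id_rsa")
# Assumption: hypothetical filename for a key generated inside the install directory.
GENERATED_KEY_NAME = "bootstrap-debug-key"


def choose_debug_key(ssh_dir: Path, install_dir: Path) -> Tuple[Path, bool]:
    """Return (private_key_path, must_generate).

    Point 1: prefer a complete key pair that already exists in ssh_dir.
    Point 2: otherwise fall back to a key stored in install_dir, and report
    whether it still has to be generated.
    """
    for name in DEFAULT_KEY_NAMES:
        private = ssh_dir / name
        public = ssh_dir / (name + ".pub")
        if private.is_file() and public.is_file():
            return private, False
    generated = install_dir / GENERATED_KEY_NAME
    return generated, not generated.is_file()
```

The caller would then use the returned path for point 3, generating the key first when the second value is true.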

Version-Release number of the following components:
bash-4.2$ workdir/openshift-install version
workdir/openshift-install 4.4.0-0.nightly-2020-03-07-113547
built from commit f371355517f9da267c295e11c01cd3dfc54b39d4
release image registry.svc.ci.openshift.org/ocp/release@sha256:f616ef3c31ea273818d511a61396bba3e49ef20fce86a51d0fb290b9cf5a0894

How reproducible:
always

Steps to Reproduce:
1. run IPI installer
2. bootstrap should fail (not sure how to cause this)

Actual results:

> DEBUG Using Install Config loaded from state file  
> DEBUG Reusing previously-fetched Install Config    
> INFO Pulling debug logs from the bootstrap machine 
> ERROR Attempted to gather debug logs after installation failure: failed to create SSH client: failed to initialize the SSH agent: no keys found for SSH agent 
> FATAL Bootstrap failed to complete: failed to wait for bootstrapping to complete: timed out waiting for the condition 
> ...

Expected results:

> INFO Pulling debug logs from the bootstrap machine 
> DEBUG Added /path/to/installer/generated/ssh/key.pem to installer's internal agent 
> DEBUG Gathering bootstrap systemd summary ...
> ...

Comment 1 Aleksandar Kostadinov 2020-03-08 21:56:15 UTC
btw another minor issue is that installer log does not point at correct log bundle location. In log I see

> DEBUG Log bundle written to /var/home/core/log-bundle-20200308190642.tar.gz

while bundle is actually generated as

> install-dir/log-bundle-20200308190642.tar.gz

Excuse me if you want a separate bug for this.
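
To illustrate the mismatch: the logged path is where the bundle is written on the bootstrap host, while the user needs the location of the local copy. A minimal sketch of the mapping, assuming the local copy keeps the remote file name and lands directly in the install directory (the helper name `local_bundle_path` is hypothetical):

```python
from pathlib import PurePosixPath


def local_bundle_path(remote_path: str, install_dir: str) -> str:
    """Map the bundle path on the bootstrap host to the path of the local copy."""
    return str(PurePosixPath(install_dir) / PurePosixPath(remote_path).name)


print(local_bundle_path("/var/home/core/log-bundle-20200308190642.tar.gz", "install-dir"))
# prints: install-dir/log-bundle-20200308190642.tar.gz
```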

Comment 2 Scott Dodson 2020-03-09 12:55:53 UTC
(In reply to Aleksandar Kostadinov from comment #1)
> btw another minor issue is that installer log does not point at correct log
> bundle location. In log I see
> 
> > DEBUG Log bundle written to /var/home/core/log-bundle-20200308190642.tar.gz
> 
> while bundle is actually generated as
> 
> > install-dir/log-bundle-20200308190642.tar.gz
> 
> Excuse me if you want a separate bug for this.

That's a bug; the original premise of this bug is a feature request. Let's re-scope and deal only with the confusing path emitted.

You can file an RFE over here https://issues.redhat.com/projects/RFE/issues

Comment 3 Abhinav Dahiya 2020-03-10 00:20:57 UTC
Please open a separate bug wrt https://bugzilla.redhat.com/show_bug.cgi?id=1811453#c1 

It's difficult to track a bug whose description is in some later comment.

Comment 4 Aleksandar Kostadinov 2020-04-07 21:05:55 UTC
I created bug 1821932 for the log output issue. Reverting the title of this issue.

Scott, could you explain why it is beneficial for the installer to fail to gather logs in the absence of an initial SSH setup? IMHO it is fine to use the user's SSH setup when one exists, but who would mind log gathering working even when prior SSH setup is missing? I mean, what kind of user would object? More likely, users without an SSH setup would be happy.

Comment 5 Abhinav Dahiya 2020-04-24 02:20:42 UTC
https://github.com/openshift/installer/pull/3437

^^ That should allow users to omit the SSH key and, in those cases, still get logs from the bootstrap host. It does not configure the cluster with that SSH key.

Comment 9 Mike Gahagan 2020-04-30 19:32:12 UTC
Verified using 4.5.0-0.nightly-2020-04-30-112808 with an IPI install on Azure.

Started creating a cluster, specifying "none" for the SSH key.
Waited for the installation to reach the "waiting for bootstrapping to complete" phase.
Went to the Azure portal, found the resource group containing the cluster, and turned off all master VMs.
Installation failed after the 40-minute timeout.
Installer successfully gathered bootstrap logs.
Also examined the terraform.tfvars.json file and confirmed that the public SSH key in that file does not match any keys on my system, confirming that the installer created the public/private key pair.
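
The last check can be scripted. Below is a rough sketch, assuming the public key has already been copied out of terraform.tfvars.json as a single OpenSSH-format line (how it is stored inside that file is not shown here). The helper names `key_body` and `matches_local_key` are hypothetical. Keys are compared by type and base64 blob only, so a differing comment field does not hide a match.

```python
from pathlib import Path
from typing import Tuple


def key_body(public_key_line: str) -> Tuple[str, ...]:
    """Return (key type, base64 blob), ignoring the trailing comment field."""
    return tuple(public_key_line.split()[:2])


def matches_local_key(public_key_line: str, ssh_dir: Path) -> bool:
    """True if any *.pub file in ssh_dir holds the same key as public_key_line."""
    target = key_body(public_key_line)
    for pub_file in sorted(ssh_dir.glob("*.pub")):
        for line in pub_file.read_text().splitlines():
            if line.strip() and key_body(line) == target:
                return True
    return False
```

A result of `False` for `matches_local_key(extracted_key, Path.home() / ".ssh")` would be consistent with the installer having generated its own key pair.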

Comment 11 errata-xmlrpc 2020-07-13 17:18:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409