Bug 1889273

Summary: Improve the generic error message 'connect: no route to host'
Product: OpenShift Container Platform
Reporter: Michael Burman <mburman>
Component: Installer
Assignee: aos-install
Installer sub component: openshift-installer
QA Contact: Gaoyun Pei <gpei>
Status: CLOSED NOTABUG
Docs Contact:
Severity: medium
Priority: unspecified
CC: gzaidman, hpopal
Version: 4.6
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-10-20 14:03:15 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Michael Burman 2020-10-19 08:50:43 UTC
Improve the generic error message 'connect: no route to host' 

OCP on RHV installation often fails with the generic error message:
no route to host

This error is too generic to be useful: the real cause of the failure cannot be determined from it.
We should improve it and give the user a meaningful error message.

For example, my installation is currently failing with:
DEBUG Still waiting for the Kubernetes API: Get "https://<hostname>:6443/version?timeout=32s": dial tcp 10.x.x.x:6443: connect: no route to host 
ERROR Attempted to gather ClusterOperator status after installation failure: listing ClusterOperator objects: Get "https://<hostname>:6443/apis/config.openshift.io/v1/clusteroperators": dial tcp 10.x.x.x:6443: connect: no route to host 
DEBUG Fetching Bootstrap SSH Key Pair...           
DEBUG Loading Bootstrap SSH Key Pair...            
DEBUG Using Bootstrap SSH Key Pair loaded from state file 
DEBUG Reusing previously-fetched Bootstrap SSH Key Pair 
DEBUG Fetching Install Config...                   
DEBUG Loading Install Config...                    
DEBUG   Loading SSH Key...                         
DEBUG   Using SSH Key loaded from state file       
DEBUG   Loading Base Domain...                     
DEBUG     Loading Platform...                      
DEBUG     Using Platform loaded from state file    
DEBUG   Using Base Domain loaded from state file   
DEBUG   Loading Cluster Name...                    
DEBUG     Loading Base Domain...                   
DEBUG     Loading Platform...                      
DEBUG   Using Cluster Name loaded from state file  
DEBUG   Loading Pull Secret...                     
DEBUG   Using Pull Secret loaded from state file   
DEBUG   Loading Platform...                        
DEBUG Using Install Config loaded from state file  
DEBUG Reusing previously-fetched Install Config    
INFO Pulling debug logs from the bootstrap machine 
DEBUG Added /tmp/bootstrap-ssh263426912 to installer's internal agent 
DEBUG Added /root/.ssh/id_rsa to installer's internal agent 
ERROR Attempted to gather debug logs after installation failure: failed to create SSH client: failed to use the provided keys for authentication: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain 
FATAL Bootstrap failed to complete: failed waiting for Kubernetes API: Get "https://<hostname>:6443/version?timeout=32s": dial tcp 10.x.x.x:6443: connect: no route to host

Bootstrap VM:

Error: error pulling image "registry.svc.ci.openshift.org/ocp/release@sha256:9663f178a9a5bf87fad0d4e2dabeaef32110d4c9c3d400eededd4f6bff5109fc": unable to pull registry.svc.ci.openshift.org/ocp/release@sha256:9663f178a9a5bf87fad0d4e2dabeaef32110d4c9c3d400eededd4f6bff5109fc: unable to pull image: Error initializing source docker://registry.svc.ci.openshift.org/ocp/release@sha256:9663f178a9a5bf87fad0d4e2dabeaef32110d4c9c3d400eededd4f6bff5109fc: Error reading manifest sha256:9663f178a9a5bf87fad0d4e2dabeaef32110d4c9c3d400eededd4f6bff5109fc in registry.svc.ci.openshift.org/ocp/release: unauthorized: authentication required

Version:
openshift-install-linux-4.6.0-0.nightly-2020-10-03-051134

Platform:
ovirt/RHV

Please specify:
* IPI 

What happened?
The installation failed with a 'no route to host' error message.

What did you expect to happen?
A clear and meaningful error message.
The current error is not helpful and may mislead the user about the real issue.
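
To illustrate the kind of message that would help, here is a minimal Python sketch that maps the low-level errors quoted in this report to user-facing hints. It is only an illustration under stated assumptions, not the installer's code: the `HINTS` table, the `diagnose` function, and the hint wording are all hypothetical. The only strings taken from this report are the two error fragments being matched.

```python
import re

# Hypothetical table: (pattern found in the logs, hint shown to the user).
# Entries are ordered by priority, so a likely root cause is reported
# before the symptom it produces.
HINTS = [
    (re.compile(r"unauthorized: authentication required"),
     "Image pull was rejected by the registry; check the pull secret."),
    (re.compile(r"connect: no route to host"),
     "The API endpoint is unreachable; check whether the bootstrap "
     "machine brought the API up at all."),
]


def diagnose(lines):
    """Return the hints whose pattern matches at least one log line.

    Hints are returned in the priority order of HINTS, each at most once.
    """
    found = []
    for pattern, hint in HINTS:
        if any(pattern.search(line) for line in lines):
            found.append(hint)
    return found
```

Fed the installer output together with the bootstrap VM error from this report, `diagnose` would list the pull secret hint ahead of the unreachable API hint, which is closer to the real cause than 'no route to host' alone.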

How to reproduce it (as minimally and precisely as possible)?
100% reproducible at the moment.
The issue was shown to the development team.
FYI, I have masked the hostname and IPs in the description.

Comment 2 Scott Dodson 2020-10-20 14:03:15 UTC
The API is the Installer's only view into the target cluster. If that API never becomes available, the fallback is to generate a bootstrap log bundle. We're working to ensure that the log bundle includes the information necessary to clearly identify pull secret errors like yours.

This is tracked in https://issues.redhat.com/browse/CORS-1533 as a feature.
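
The fallback described in this comment can be pictured roughly as follows. This is a minimal Python sketch of the control flow only, not the Installer's implementation: `wait_for_api`, `probe`, `gather_bundle`, and the retry parameters are hypothetical names chosen for illustration.

```python
import time


def wait_for_api(probe, gather_bundle, attempts=3, delay=1.0):
    """Poll the API; if it never answers, fall back to a log bundle.

    probe:         hypothetical callable that raises OSError while the
                   API is unreachable and returns normally once it is up.
    gather_bundle: hypothetical callable that collects the bootstrap log
                   bundle and returns a reference to it (e.g. a path).
    """
    last_error = None
    for _ in range(attempts):
        try:
            probe()
            # The API answered, so no fallback is needed.
            return {"ok": True, "bundle": None, "error": None}
        except OSError as exc:
            last_error = exc
            time.sleep(delay)
    # The API never became available: the bundle is the only remaining
    # source of information about what went wrong on the bootstrap node.
    return {"ok": False, "bundle": gather_bundle(), "error": str(last_error)}
```

In this picture, the connection error is all the caller knows until the bundle is gathered, which is why the bundle contents matter for identifying causes such as a rejected pull secret.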