Bug 1719808 - [IPI] [OSP] Pulling debug logs from the bootstrap machine never works because bootstrap node has no public address
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.2.0
Assignee: Dan Prince
QA Contact: David Sanz
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-06-12 15:16 UTC by David Sanz
Modified: 2019-10-16 06:32 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:31:56 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Github openshift installer pull 2212 0 'None' closed Bug 1719808: openstack: have gather bootstrap look for FIP 2020-02-07 14:56:27 UTC
Github openshift installer pull 2256 0 'None' closed Bug 1719808: openstack: change bootstrap_fip module for gather 2020-02-07 14:56:28 UTC
Red Hat Product Errata RHBA-2019:2922 0 None None None 2019-10-16 06:32:10 UTC

Description David Sanz 2019-06-12 15:16:06 UTC
Description of problem:

When the installation fails, the installer tries to download the bootstrap logs by connecting to the IP address of the bootstrap node. That address belongs to an internal network, which may not be reachable from the host where the installer is running.

Any connection to the cluster has to go through the floating IP (FIP) assigned to the API instance.

$ ./openshift-install create cluster
INFO Consuming "Install Config" from target directory 
INFO Creating infrastructure resources...         
INFO Waiting up to 30m0s for the Kubernetes API at https://api.morenod-ocp.qe.devcluster.openshift.com:6443... 
INFO API v1.14.0+d406851 up                       
INFO Waiting up to 30m0s for bootstrapping to complete... 
INFO Pulling debug logs from the bootstrap machine 
ERROR failed to create SSH client: dial tcp 192.168.26.7:22: connect: connection timed out 
FATAL failed to wait for bootstrapping to complete: timed out waiting for the condition 
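
When triaging a failure like the one above, it can help to extract the dialed address from the error line and check whether it lies inside the cluster's internal network. The following is a minimal Python sketch, not installer code; the helper names and the default `192.168.0.0/16` range are assumptions made for illustration.

```python
import ipaddress
import re

# Matches the "dial tcp <ipv4>:<port>:" fragment seen in the installer error.
_DIAL_RE = re.compile(r"dial tcp (\d{1,3}(?:\.\d{1,3}){3}):(\d+):")


def dialed_endpoint(log_line):
    """Return (ip, port) from a 'dial tcp' error line, or None if absent."""
    match = _DIAL_RE.search(log_line)
    if match is None:
        return None
    return ipaddress.ip_address(match.group(1)), int(match.group(2))


def is_internal(ip, internal_cidr="192.168.0.0/16"):
    """True if ip lies inside the (assumed) internal machine network."""
    return ip in ipaddress.ip_network(internal_cidr)
```

Applied to the ERROR line above, `dialed_endpoint` yields `192.168.26.7` and port `22`, and `is_internal` reports that the address is inside the assumed internal range.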


Version-Release number of the following components:
$ ./openshift-install version
./openshift-install unreleased-master-1110-g7cc42fa81b7a26c6ae180af3ff097f32d8c30c51-dirty
built from commit 7cc42fa81b7a26c6ae180af3ff097f32d8c30c51
release image registry.svc.ci.openshift.org/ocp/release@sha256:f221a7095d3b4e57ecc2a9280152e3d096d3c665209945b1651ba4292e9a17f9


How reproducible:

Steps to Reproduce:
1. Install OCP4 on OSP.
2. Wait until the installation fails (or force it to fail).
3. Check the installation log.

Actual results:
Installation fails and logs cannot be obtained

Expected results:
Logs are downloaded using the API instance as a jump host to reach the rest of the nodes.
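
The jump-host approach described in the expected results can be sketched with OpenSSH's `-J` (ProxyJump) option. This is only an illustration of the idea, not what the installer does internally; the function name, the `core` login user, and the example floating IP `203.0.113.10` are assumptions.

```python
def jump_ssh_command(jump_host, target_host, user="core", remote_command=None):
    """Build an ssh argv that reaches target_host through jump_host.

    Relies on OpenSSH's -J (ProxyJump) option, available since OpenSSH 7.3.
    The default user name is an assumption for illustration.
    """
    argv = ["ssh", "-J", f"{user}@{jump_host}", f"{user}@{target_host}"]
    if remote_command:
        # The remote command is passed as a single argument to ssh.
        argv.append(remote_command)
    return argv
```

For example, `jump_ssh_command("203.0.113.10", "192.168.26.7", remote_command="journalctl -b")` returns an argument list that could be passed to `subprocess.run`, where `203.0.113.10` stands in for the publicly reachable FIP.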

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 Eric Duen 2019-07-30 15:42:53 UTC
Reassigning to Dan Prince, who is currently working on this.

Comment 2 weiwei jiang 2019-08-09 06:31:35 UTC
I saw that bootstrap-fip was added for this issue, but I found that
this floating IP is not destroyed even when I destroy the cluster.

Comment 3 David Sanz 2019-08-13 08:53:16 UTC
(In reply to weiwei jiang from comment #2)
> I saw that bootstrap-fip was added for this issue, but I found that
> this floating IP is not destroyed even when I destroy the cluster.

Created bug https://bugzilla.redhat.com/show_bug.cgi?id=1740543 to track deleting this FIP.

Comment 4 weiwei jiang 2019-08-19 05:38:37 UTC
Tried with:
$ ./openshift-install version
./openshift-install v4.2.0-201908181300-dirty
built from commit 4e204c5e509de1bd31113b0c0e73af1a35e52c0a
release image registry.svc.ci.openshift.org/ocp/release@sha256:bbf1c5a9b0ca47cae481bd9327bcde6869b826118e14fce61a6e3f99728f4c4c

It still uses the internal IP to pull the bootstrap logs.

...
DEBUG   Loading "Platform"...                      
DEBUG Using "Install Config" loaded from state file 
DEBUG Reusing previously-fetched "Install Config"  
INFO Pulling debug logs from the bootstrap machine 
ERROR failed to create SSH client, ensure the proper ssh key is in your keyring or specify with --key: dial tcp 192.168.0.14:22: connect: connection timed out 
FATAL waiting for Kubernetes API: context deadline exceeded 

(openstack) server list --name wjosp0819
+--------------------------------------+-------------------------------------+--------+---------------------------------------------------------------+-------+----------+
| ID                                   | Name                                | Status | Networks                                                      | Image | Flavor   |
+--------------------------------------+-------------------------------------+--------+---------------------------------------------------------------+-------+----------+
| 280acfc1-94e8-458c-953d-6121c41cd2f6 | preserve-wjosp0819b-6rwjc-master-2  | ACTIVE | preserve-wjosp0819b-6rwjc-openshift=192.168.0.25              |       | m1.large |
| 3ecf217b-4efc-4478-b92e-b2076f7273d1 | preserve-wjosp0819b-6rwjc-master-1  | ACTIVE | preserve-wjosp0819b-6rwjc-openshift=192.168.0.13              |       | m1.large |
| 98442568-b5a9-4868-a399-3c6fea8000d0 | preserve-wjosp0819b-6rwjc-master-0  | ACTIVE | preserve-wjosp0819b-6rwjc-openshift=192.168.0.27              |       | m1.large |
| 42de807d-af1c-45af-b935-d4101d74decb | preserve-wjosp0819b-6rwjc-bootstrap | ACTIVE | preserve-wjosp0819b-6rwjc-openshift=192.168.0.14, 10.0.79.162 |       | m1.large |
+--------------------------------------+-------------------------------------+--------+---------------------------------------------------------------+-------+----------+
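
In the listing above, only the bootstrap server carries a second address (`10.0.79.162`) in addition to its address on the cluster network. A rough Python sketch of how such a Networks cell could be parsed to pick out the address that lies outside the machine network follows; the function names and the `192.168.0.0/24` machine CIDR are assumptions for illustration, inferred from the addresses shown.

```python
import ipaddress


def parse_networks(cell):
    """Parse a 'net=addr1, addr2; net2=addr3' Networks cell into a dict."""
    networks = {}
    for part in cell.split(";"):
        part = part.strip()
        if not part:
            continue
        name, _, addrs = part.partition("=")
        networks[name.strip()] = [
            ipaddress.ip_address(a.strip()) for a in addrs.split(",") if a.strip()
        ]
    return networks


def external_addresses(cell, machine_cidr):
    """Return the addresses that fall outside the cluster's machine network."""
    subnet = ipaddress.ip_network(machine_cidr)
    return [
        addr
        for addrs in parse_networks(cell).values()
        for addr in addrs
        if addr not in subnet
    ]
```

With the bootstrap row's cell and an assumed machine CIDR of `192.168.0.0/24`, `external_addresses` returns only `10.0.79.162`; for the master rows it returns an empty list.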

Comment 5 Dan Prince 2019-08-19 11:07:49 UTC
Should be resolved by: https://github.com/openshift/installer/pull/2212

Comment 8 Mike Fedosin 2019-08-23 08:30:01 UTC
There is a regression caused by https://github.com/openshift/installer/pull/2128
The bootstrap_fip resource was moved from the topology module to the bootstrap module, so the gather code can no longer find it.

The fix for the regression is under review: https://github.com/openshift/installer/pull/2256
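
To illustrate why moving a resource between modules breaks a lookup, here is a small Python sketch over a simplified stand-in for the state data. It is not the installer's implementation and not the full Terraform state schema; the dictionary layout, the module paths `module.topology` and `module.bootstrap`, and the resource type `openstack_networking_floatingip_v2` are assumptions based on the wording of this comment.

```python
def lookup_resource(state, module, rtype, name):
    """Return the first resource matching module, type and name, or None."""
    for res in state.get("resources", []):
        if (
            res.get("module", "") == module
            and res.get("type") == rtype
            and res.get("name") == name
        ):
            return res
    return None


# Simplified, hypothetical state after the resource moved to the bootstrap module.
EXAMPLE_STATE = {
    "resources": [
        {
            "module": "module.bootstrap",
            "type": "openstack_networking_floatingip_v2",
            "name": "bootstrap_fip",
            "attributes": {"address": "10.0.79.162"},
        }
    ]
}
```

A lookup that still names `module.topology` returns `None` against `EXAMPLE_STATE`, while the same lookup with `module.bootstrap` finds the floating IP. A caller that falls back to the internal address when the lookup fails would then show the behaviour reported in comment 4.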

Comment 10 David Sanz 2019-08-26 09:03:29 UTC
Verified on 4.2.0-0.nightly-2019-08-25-233755:

$ ./openshift-install gather bootstrap --log-level debug
DEBUG OpenShift Installer v4.2.0-201908251340-dirty 
DEBUG Built from commit c2e6b0afd7f33ae0125d1ac96f3948919748ffc5 
DEBUG Fetching "Install Config"...                 
DEBUG Loading "Install Config"...                  
DEBUG   Loading "SSH Key"...                       
DEBUG   Loading "Base Domain"...                   
DEBUG     Loading "Platform"...                    
DEBUG   Loading "Cluster Name"...                  
DEBUG     Loading "Base Domain"...                 
DEBUG   Loading "Pull Secret"...                   
DEBUG   Loading "Platform"...                      
DEBUG Using "Install Config" loaded from state file 
DEBUG Reusing previously-fetched "Install Config"  
INFO Pulling debug logs from the bootstrap machine 
DEBUG Gathering bootstrap journals ...             
DEBUG Gathering bootstrap containers ...           
DEBUG Gathering rendered assets...                 
DEBUG Gathering cluster resources ...              
DEBUG error: the server doesn't have a resource type "pods" 
DEBUG error: the server doesn't have a resource type "nodes" 
DEBUG error: the server doesn't have a resource type "nodes" 
DEBUG error: the server doesn't have a resource type "apiservices" 
DEBUG error: the server doesn't have a resource type "pods" 
DEBUG error: the server doesn't have a resource type "clusterversion" 
DEBUG error: the server doesn't have a resource type "clusteroperators" 
DEBUG Waiting for logs ...                         
DEBUG error: the server doesn't have a resource type "csr" 
DEBUG error: the server doesn't have a resource type "configmaps" 
DEBUG error: the server doesn't have a resource type "kubecontrollermanager" 
DEBUG error: the server doesn't have a resource type "kubeapiserver" 
DEBUG error: the server doesn't have a resource type "events" 
DEBUG error: the server doesn't have a resource type "endpoints" 
DEBUG error: the server doesn't have a resource type "machineconfigpools" 
DEBUG error: the server doesn't have a resource type "machineconfigs" 
DEBUG error: the server doesn't have a resource type "namespaces" 
DEBUG error: the server doesn't have a resource type "nodes" 
DEBUG error: the server doesn't have a resource type "openshiftapiserver" 
DEBUG error: the server doesn't have a resource type "pods" 
DEBUG error: the server doesn't have a resource type "secrets" 
DEBUG error: the server doesn't have a resource type "roles" 
DEBUG error: the server doesn't have a resource type "services" 
DEBUG error: the server doesn't have a resource type "rolebindings" 
DEBUG error: the server doesn't have a resource type "secrets" 
DEBUG Error from server (NotFound): the server could not find the requested resource 
DEBUG Gather remote logs                           
DEBUG Collecting info from 192.168.0.16            
DEBUG Warning: Permanently added '192.168.0.16' (ECDSA) to the list of known hosts.
DEBUG Gathering master journals ...                
DEBUG Gathering master containers ...              
DEBUG Waiting for logs ...                         
DEBUG Collecting info from 192.168.0.14            
DEBUG Warning: Permanently added '192.168.0.14' (ECDSA) to the list of known hosts.
DEBUG Gathering master journals ...                
DEBUG Gathering master containers ...              
DEBUG Waiting for logs ...                         
DEBUG Collecting info from 192.168.0.24            
DEBUG Warning: Permanently added '192.168.0.24' (ECDSA) to the list of known hosts.
DEBUG Gathering master journals ...                
DEBUG Gathering master containers ...              
DEBUG Waiting for logs ...                         
DEBUG Log bundle written to ~/log-bundle.tar.gz    
INFO Bootstrap gather logs captured here "log-bundle-20190826110214.tar.gz"
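
Once a bundle such as the one named above has been written, its contents can be checked without unpacking it. A minimal Python sketch using the standard `tarfile` module; the function name is an assumption, and nothing is assumed here about the bundle's internal layout.

```python
import tarfile


def list_bundle(path):
    """Return the sorted member names of a gzip-compressed log bundle."""
    with tarfile.open(path, "r:gz") as bundle:
        return sorted(bundle.getnames())
```

Calling `list_bundle("log-bundle-20190826110214.tar.gz")` would return the paths stored in the archive, which is a quick way to confirm that bootstrap and master logs were actually gathered.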

Comment 12 errata-xmlrpc 2019-10-16 06:31:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

