Bug 1802820 - [UPI on Azure] the UPI templates do not let us ssh to the bootstrap node by default
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.4.0
Assignee: Fabiano Franz
QA Contact: Etienne Simard
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-02-13 23:39 UTC by Etienne Simard
Modified: 2020-05-04 11:36 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-04 11:36:24 UTC
Target Upstream Version:


Links
System ID | Private | Priority | Status | Summary | Last Updated
Github openshift installer pull 3118 | 0 | None | closed | Bug 1802820: SSH to bootstrap node in Azure UPI | 2020-12-04 15:51:34 UTC
Red Hat Product Errata RHBA-2020:0581 | 0 | None | None | None | 2020-05-04 11:36:39 UTC

Description Etienne Simard 2020-02-13 23:39:43 UTC
Description of problem:

The current UPI templates for 4.4, available at https://github.com/openshift/installer/tree/master/upi/azure, and the docs available at https://github.com/openshift/installer/blob/master/docs/user/azure/install_upi.md, do not provide SSH access to the bootstrap node for troubleshooting purposes.

Version-Release number of the following components:

https://github.com/openshift/installer/commit/f058552712fbe65ec6f25f192a827dab82f70f9d

How reproducible: Easy

Steps to Reproduce:
1. Follow the UPI on Azure steps to create a new OCP cluster
2. Once the bootstrap node is up, try to SSH to it

Actual results:

You can't ssh to the bootstrap node

~~~
$ ssh core@1.2.3.4
^C
~~~

Expected results:

We should be able to ssh to the bootstrap node and gather bootstrap logs 

~~~
$ ssh core@1.2.3.4
The authenticity of host '1.2.3.4 (1.2.3.4)' can't be established.
ECDSA key fingerprint is SHA256:JRPa0+ufibRP8lwOtb6cXaEx28EQrpWOufbeJWuVsR4.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '1.2.3.4' (ECDSA) to the list of known hosts.
Red Hat Enterprise Linux CoreOS 44.81.202002071430-0
  Part of OpenShift 4.4, RHCOS is a Kubernetes native operating system
  managed by the Machine Config Operator (`clusteroperator/machine-config`).

WARNING: Direct SSH access to machines is not recommended; instead,
make configuration changes via `machineconfig` objects:
  https://docs.openshift.com/container-platform/4.4/architecture/architecture-rhcos.html

---
This is the bootstrap node; it will be destroyed when the master is fully up.

The primary service is "bootkube.service". To watch its status, run e.g.

  journalctl -b -f -u bootkube.service
~~~


Additional info:

I was able to SSH to the bootstrap node after adding NAT and firewall rules that allow SSH traffic to the bootstrap node.
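
The workaround above could be scripted roughly as follows with the Azure CLI. This is only a sketch under assumptions: the function name, the NSG rule name `bootstrap_ssh_in`, the priority value, and all positional arguments (resource group, NSG, load balancer, NIC, IP configuration name) are hypothetical placeholders, not values taken from the UPI templates.

```shell
# Sketch only: every resource name passed in is a hypothetical placeholder.
# AZ can be overridden (for example AZ=echo) to preview the commands
# without running them against a subscription.
open_bootstrap_ssh() {
  local rg=$1 nsg=$2 lb=$3 nic=$4 ipconfig=$5
  local az=${AZ:-az}

  # 1. Allow inbound TCP/22 in the network security group.
  #    Priority 100 is an assumption; pick one that is free in your NSG.
  "$az" network nsg rule create -g "$rg" --nsg-name "$nsg" -n bootstrap_ssh_in \
    --priority 100 --direction Inbound --access Allow --protocol Tcp \
    --destination-port-ranges 22 || return

  # 2. Add an inbound NAT rule for port 22 on the public load balancer.
  "$az" network lb inbound-nat-rule create -g "$rg" --lb-name "$lb" -n natssh \
    --protocol Tcp --frontend-port 22 --backend-port 22 || return

  # 3. Attach the NAT rule to the bootstrap NIC's IP configuration.
  "$az" network nic ip-config inbound-nat-rule add -g "$rg" --nic-name "$nic" \
    --ip-config-name "$ipconfig" --lb-name "$lb" --inbound-nat-rule natssh
}
```

Running it with `AZ=echo` prints the three `az` invocations instead of executing them, which makes it easy to review them before touching the cluster resources.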

Comment 3 Etienne Simard 2020-02-18 22:00:07 UTC
Code Version:

Installer master including https://github.com/openshift/installer/pull/3118

Installer Version:

DEBUG OpenShift Installer 4.4.0-0.nightly-2020-02-18-132334 
DEBUG Built from commit 43bed121efd7d9b3353e7ef5bd85dae07e0cc97e 



Failed verification: the SSH NSG rule is present in the updated bootstrap template, but there are no LB or inbound NAT rules, so the connection is not accepted unless you add one of those to the public-lb.

Example

~~~

[After the bootstrap is up and running]

[esimard@elaptop essboot2]$ ssh core@13.89.107.79
^C
[esimard@elaptop essboot2]$ telnet 13.89.107.79 22
Trying 13.89.107.79...
^C

[After Successfully saved load balancer inbound NAT rule 'natssh']

[esimard@elaptop essboot2]$ telnet 13.89.107.79 22
Trying 13.89.107.79...
Connected to 13.89.107.79.
Escape character is '^]'.
SSH-2.0-OpenSSH_8.0
^]
telnet> ^CConnection closed.
[esimard@elaptop essboot2]$ ssh core@13.89.107.79
The authenticity of host '13.89.107.79 (13.89.107.79)' can't be established.
ECDSA key fingerprint is SHA256:DUh0QqZc5OLbp/bclpuRnoeZtWurotXyqT+yX5iHBUg.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '13.89.107.79' (ECDSA) to the list of known hosts.
Red Hat Enterprise Linux CoreOS 44.81.202001241431.0
  Part of OpenShift 4.4, RHCOS is a Kubernetes native operating system
  managed by the Machine Config Operator (`clusteroperator/machine-config`).

WARNING: Direct SSH access to machines is not recommended; instead,
make configuration changes via `machineconfig` objects:
  https://docs.openshift.com/container-platform/4.4/architecture/architecture-rhcos.html

---
This is the bootstrap node; it will be destroyed when the master is fully up.

The primary service is "bootkube.service". To watch its status, run e.g.

  journalctl -b -f -u bootkube.service
[systemd]
Failed Units: 1
  sssd.service
~~~
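
The telnet check in the log above can also be done non-interactively. A minimal sketch, assuming bash and the coreutils `timeout` command are available; the function name `port_open` is hypothetical:

```shell
# Return 0 if a TCP connection to host:port succeeds within the timeout.
# Relies on bash's /dev/tcp pseudo-device, so the probe itself runs in bash.
port_open() {
  local host=$1 port=${2:-22} secs=${3:-5}
  # $0 and $1 inside the single-quoted script are the host and port
  # passed as extra arguments to `bash -c`.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example (not executed here):
#   port_open 13.89.107.79 22 && echo "port 22 reachable"
```

A non-zero exit status means the connection was refused or timed out, which matches the hanging `telnet` and `ssh` attempts shown before the NAT rule was added.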

Comment 4 Fabiano Franz 2020-02-19 17:33:17 UTC
Are you trying to connect using the same public IP attached to the public load balancer (the same that serves api.clustername)?

The strategy I proposed in https://github.com/openshift/installer/pull/3118 is to use _another_ public IP specifically for SSH access, pointing directly to the network interface that belongs to the bootstrap VM, and that public IP gets deleted along with the NSG rule once the bootstrap process is finished. WDYT about this approach? 

I understand, though, that it's not clear, and maybe we need to mention this in the docs, with a command that prints the connection string with the SSH-specific public IP.

Comment 5 Etienne Simard 2020-02-19 18:24:48 UTC
Hello Fabiano,

No, I copied the `Public IP address` directly from the bootstrap Virtual Machine. I don't mind the approach as long as it's documented but it might not be the standard way to do it.
My other tests are done with the `Public IP address` (Azure IPI and possibly other platforms as well).

I just re-tested with the `bootstrap-ssh-pip` address and confirmed that it works. If you decide to continue with this approach, please re-send it to ON_QA so I can add the verification logs.
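
For reference, one way to look up the SSH-specific public IP from the command line might look like the sketch below. It assumes the Azure CLI is installed and logged in; the function name and the `RESOURCE_GROUP` and `INFRA_ID` variables in the usage comment are hypothetical, and the exact name of the public IP resource (here assumed to end in `bootstrap-ssh-pip`) should be checked against the deployed template.

```shell
# Print the IP address of a named public IP resource.
# AZ can be overridden to substitute a stub for the real `az` binary.
bootstrap_ssh_ip() {
  local rg=$1 pip=$2
  "${AZ:-az}" network public-ip show -g "$rg" -n "$pip" --query ipAddress -o tsv
}

# Example (not executed here; variable names are assumptions):
#   ssh "core@$(bootstrap_ssh_ip "$RESOURCE_GROUP" "${INFRA_ID}-bootstrap-ssh-pip")"
```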

Comment 6 Etienne Simard 2020-02-20 23:16:04 UTC
Verified 

Installer master including https://github.com/openshift/installer/pull/3118

Installer Version:

DEBUG OpenShift Installer 4.4.0-0.nightly-2020-02-18-132334 
DEBUG Built from commit 43bed121efd7d9b3353e7ef5bd85dae07e0cc97e 


Notes: My documentation on that subject was not right. If you click Connect, it shows the IP dedicated to the bootstrap. In the bootstrap NIC, the NIC public IP is also the IP dedicated to the bootstrap (same as IPI on Azure). In the bootstrap VM overview, however, the public IP shown is the one for the public LB.


Confirmed SSH and gather bootstrap:

~~~
$ ./openshift-install gather bootstrap --bootstrap 52.154.233.131 --master "10.0.0.8"
INFO Pulling debug logs from the bootstrap machine 
INFO Bootstrap gather logs captured here "log-bundle-20200219131723.tar.gz" 
~~~

Comment 8 errata-xmlrpc 2020-05-04 11:36:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581

