Description of problem:

Using the current UPI templates for 4.4 available at https://github.com/openshift/installer/tree/master/upi/azure and the docs available at https://github.com/openshift/installer/blob/master/docs/user/azure/install_upi.md does not provide ssh access to the bootstrap node for troubleshooting purposes.

Version-Release number of the following components:
https://github.com/openshift/installer/commit/f058552712fbe65ec6f25f192a827dab82f70f9d

How reproducible:
Easy

Steps to Reproduce:
1. Follow the UPI on Azure steps to create a new OCP cluster
2. Once the bootstrap node is up, try to ssh to it

Actual results:
You can't ssh to the bootstrap node

~~~
$ ssh core.3.4
^C
~~~

Expected results:
We should be able to ssh to the bootstrap node and gather bootstrap logs

~~~
$ ssh core.3.4
The authenticity of host '1.2.3.4 (1.2.3.4)' can't be established.
ECDSA key fingerprint is SHA256:JRPa0+ufibRP8lwOtb6cXaEx28EQrpWOufbeJWuVsR4.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '1.2.3.4' (ECDSA) to the list of known hosts.
Red Hat Enterprise Linux CoreOS 44.81.202002071430-0
  Part of OpenShift 4.4, RHCOS is a Kubernetes native operating system managed by the
  Machine Config Operator (`clusteroperator/machine-config`).

WARNING: Direct SSH access to machines is not recommended; instead, make configuration
changes via `machineconfig` objects:
  https://docs.openshift.com/container-platform/4.4/architecture/architecture-rhcos.html

---
This is the bootstrap node; it will be destroyed when the master is fully up.

The primary service is "bootkube.service". To watch its status, run e.g.

  journalctl -b -f -u bootkube.service
~~~

Additional info:
I was able to ssh to the bootstrap node after adding a NAT rule and firewall rules for SSH towards the bootstrap node.
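The firewall part of the workaround above can be sketched with the Azure CLI as an NSG rule allowing inbound TCP/22. This is only an illustrative sketch, not the fix from the report: the resource group, NSG, and rule names (`mycluster-rg`, `mycluster-nsg`, `bootstrap-ssh`) are hypothetical placeholders and must be replaced with the names from your actual UPI deployment.

~~~
# Sketch only: open TCP/22 inbound on the cluster NSG so the bootstrap node
# becomes reachable for troubleshooting. All resource names are placeholders.
az network nsg rule create \
  --resource-group mycluster-rg \
  --nsg-name mycluster-nsg \
  --name bootstrap-ssh \
  --priority 100 \
  --direction Inbound \
  --access Allow \
  --protocol Tcp \
  --destination-port-ranges 22
~~~

Remember to delete the rule once bootstrap troubleshooting is done, since it widens the cluster's exposed surface.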
Code Version: Installer master including https://github.com/openshift/installer/pull/3118

Installer Version:
DEBUG OpenShift Installer 4.4.0-0.nightly-2020-02-18-132334
DEBUG Built from commit 43bed121efd7d9b3353e7ef5bd85dae07e0cc97e

Failed verification: while the ssh NSG rule is present in the updated bootstrap template, there are no LB or inbound NAT rules, so the connection is not accepted unless you add one of those to the public-lb yourself.

Example:

~~~
[After the bootstrap is up and running]
[esimard@elaptop essboot2]$ ssh core.107.79
^C
[esimard@elaptop essboot2]$ telnet 13.89.107.79 22
Trying 13.89.107.79...
^C

[After "Successfully saved load balancer inbound NAT rule 'natssh'"]
[esimard@elaptop essboot2]$ telnet 13.89.107.79 22
Trying 13.89.107.79...
Connected to 13.89.107.79.
Escape character is '^]'.
SSH-2.0-OpenSSH_8.0
^]
telnet> ^CConnection closed.
[esimard@elaptop essboot2]$ ssh core.107.79
The authenticity of host '13.89.107.79 (13.89.107.79)' can't be established.
ECDSA key fingerprint is SHA256:DUh0QqZc5OLbp/bclpuRnoeZtWurotXyqT+yX5iHBUg.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '13.89.107.79' (ECDSA) to the list of known hosts.
Red Hat Enterprise Linux CoreOS 44.81.202001241431.0
  Part of OpenShift 4.4, RHCOS is a Kubernetes native operating system managed by the
  Machine Config Operator (`clusteroperator/machine-config`).

WARNING: Direct SSH access to machines is not recommended; instead, make configuration
changes via `machineconfig` objects:
  https://docs.openshift.com/container-platform/4.4/architecture/architecture-rhcos.html

---
This is the bootstrap node; it will be destroyed when the master is fully up.

The primary service is "bootkube.service". To watch its status, run e.g.

  journalctl -b -f -u bootkube.service
[systemd]
Failed Units: 1
  sssd.service
~~~
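For reference, an inbound NAT rule like the `natssh` one mentioned above could be created and attached to the bootstrap NIC with the Azure CLI along these lines. This is a hedged sketch: the resource group, load balancer, NIC, and ip-config names (`mycluster-rg`, `mycluster-public-lb`, `bootstrap-nic`, `pipConfig`) are assumed placeholders, not names taken from this report, and must match your deployment.

~~~
# Sketch: forward TCP/22 on the public LB frontend to the bootstrap node.
# All resource names below are hypothetical placeholders.
az network lb inbound-nat-rule create \
  --resource-group mycluster-rg \
  --lb-name mycluster-public-lb \
  --name natssh \
  --protocol Tcp \
  --frontend-port 22 \
  --backend-port 22

# Associate the NAT rule with the bootstrap NIC's ip configuration.
az network nic ip-config inbound-nat-rule add \
  --resource-group mycluster-rg \
  --nic-name bootstrap-nic \
  --ip-config-name pipConfig \
  --lb-name mycluster-public-lb \
  --inbound-nat-rule natssh
~~~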
Are you trying to connect using the same public IP attached to the public load balancer (the same one that serves api.clustername)?

The strategy I proposed in https://github.com/openshift/installer/pull/3118 is to use _another_ public IP specifically for SSH access, pointing directly to the network interface that belongs to the bootstrap VM; that public IP gets deleted along with the NSG rule once the bootstrap process is finished.

WDYT about this approach? I understand, though, that it's not obvious, and maybe we need a mention of this in the docs, with a command to print the connection string with the ssh-specific public IP.
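A command to print that connection string might look like the following. This is a sketch under assumptions: `bootstrap-ssh-pip` is the ssh-specific public IP name used later in this thread, but the actual name depends on the template from the PR, and `mycluster-rg` is a placeholder resource group.

~~~
# Sketch: look up the ssh-specific bootstrap public IP and print the
# connection string. Resource names are assumed placeholders.
BOOTSTRAP_SSH_IP=$(az network public-ip show \
  --resource-group mycluster-rg \
  --name bootstrap-ssh-pip \
  --query ipAddress -o tsv)
echo "ssh core@${BOOTSTRAP_SSH_IP}"
~~~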
Hello Fabiano,

No, I copied the `Public IP address` directly from the bootstrap Virtual Machine.

I don't mind the approach as long as it's documented, but it might not be the standard way to do it. My other tests are done with the `Public IP address` (Azure IPI, and possibly other platforms as well).

I just re-tested with the `bootstrap-ssh-pip` address and confirmed that it works. If you decide to continue with this approach, please re-send it to ON_QA so I can add the verification logs.
Verified

Installer master including https://github.com/openshift/installer/pull/3118

Installer Version:
DEBUG OpenShift Installer 4.4.0-0.nightly-2020-02-18-132334
DEBUG Built from commit 43bed121efd7d9b3353e7ef5bd85dae07e0cc97e

Notes: My documentation on that subject was not right. If you click "Connect", the IP shown is the one dedicated to the bootstrap. In the bootstrap NIC, the NIC public IP shows the IP dedicated to the bootstrap (same as IPI on Azure). In the bootstrap VM overview, the public IP is the one for the public LB.

Confirmed SSH and gather bootstrap:

~~~
$ ./openshift-install gather bootstrap --bootstrap 52.154.233.131 --master "10.0.0.8"
INFO Pulling debug logs from the bootstrap machine
INFO Bootstrap gather logs captured here "log-bundle-20200219131723.tar.gz"
~~~
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581