Bug 1805429
Summary: | [osp16 hackfest] Enabling ssh admin (tripleo-admin) for hosts - Timed out waiting for port 22 (mix of VM and BM) | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Chris Janiszewski <cjanisze> |
Component: | python-tripleoclient | Assignee: | Alex Schultz <aschultz> |
Status: | CLOSED ERRATA | QA Contact: | |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 16.0 (Train) | CC: | aschultz, augol, emacchi, hbrock, jschluet, jslagle, mburns |
Target Milestone: | beta | Keywords: | Triaged |
Target Release: | 16.1 (Train on RHEL 8.2) | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | python-tripleoclient-12.3.2-0.20200229004913.6d57d68.el8ost | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-07-29 07:50:34 UTC | Type: | Bug |
Description
Chris Janiszewski
2020-02-20 18:50:15 UTC
The following option can be used to tune this timeout:

--overcloud-ssh-port-timeout OVERCLOUD_SSH_PORT_TIMEOUT
    Timeout for to wait for the ssh port to become active.

Thanks for the update. I am glad we have that option. It's probably safe to assume that if I hit this on the relatively quick-to-boot Supermicro boards, our customers will also hit it on more traditional and slower OEM servers. I would say it's not uncommon for traditional servers to take ~15 minutes or more to boot. I would highly recommend changing the default to something higher.

I want to say we already did raise it to 10 mins, but I'll have to check.

https://review.opendev.org/#/c/620754/ So there are two values, but we did raise one to 10 mins. Usually we don't get to the ssh enable process until several minutes after the systems should already be up/deployed, so I'm not sure what specifically happened in this scenario. In our testing it's usually 10+ minutes before the ssh enable process runs after the nodes should already be up.

The one thing that might be different is that my environment has a mix of VM and BM. The VMs restart in a matter of seconds, whereas the BM nodes typically take about 5 minutes to boot. Does the timer get reset after the first node comes up, or anything along those lines?

No, the timer starts once the deployment reaches the point where it needs to try and do the ssh key bits. The overall timeouts are global to the entire environment.

Adding --overcloud-ssh-port-timeout 600 \ to my deployment script has fixed this problem for me. Again, I have relatively fast-POSTing hardware, so please consider changing the defaults. In any case, I'd like to leave this BZ as the artifact for others who hit the issue.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:3148
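
The thread notes that the SSH port timeout is global to the whole environment rather than reset per node. The Python sketch below illustrates that behaviour so the effect of a mixed VM/BM environment is easier to reason about. It is not the python-tripleoclient implementation: the function name `wait_for_ssh_ports` and its parameters are hypothetical, and only the idea of a single shared deadline is taken from the discussion.

```python
import socket
import time


def wait_for_ssh_ports(hosts, port=22, timeout=600,
                       connect_timeout=5, poll_interval=1.0):
    """Wait until every host accepts TCP connections on `port`.

    Hypothetical helper: one deadline is shared by all hosts, so a slow
    bare-metal node draws on the same time budget as the fast VMs.
    Returns the hosts that were still unreachable when the deadline
    passed (an empty list means every host answered in time).
    """
    deadline = time.monotonic() + timeout
    pending = list(hosts)
    while pending:
        still_down = []
        for host in pending:
            try:
                # A completed TCP connect is treated as "port is active".
                with socket.create_connection((host, port),
                                              timeout=connect_timeout):
                    pass
            except OSError:
                still_down.append(host)
        pending = still_down
        # The deadline is never reset when an individual host comes up.
        if not pending or time.monotonic() >= deadline:
            break
        time.sleep(poll_interval)
    return pending
```

Under these assumptions, a call such as `wait_for_ssh_ports(["192.0.2.10", "192.0.2.11"], timeout=600)` mirrors the effect of passing `--overcloud-ssh-port-timeout 600`: the budget has to cover the slowest node in the environment, not the average one.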