Bug 1829707
Summary: | Missing connection_timeout in deploy_roles | | |
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | midavies |
Component: | openstack-tripleo-common | Assignee: | Steve Baker <sbaker> |
Status: | CLOSED ERRATA | QA Contact: | David Rosenfeld <drosenfe> |
Severity: | medium | Docs Contact: | |
Priority: | medium | | |
Version: | 16.1 (Train) | CC: | bfournie, mburns, midavies, sbaker, slinaber, tonyb |
Target Milestone: | beta | Keywords: | Triaged |
Target Release: | 16.1 (Train on RHEL 8.2) | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | openstack-tripleo-common-11.3.3-0.20200525163439.20973e4.el8ost | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2020-07-29 07:52:21 UTC | Type: | Bug |
Description
midavies
2020-04-30 07:21:27 UTC
I would have expected that this fix would be available in the compose being tested, as the tripleo-common package built for 16.1 on 4/13 has the fix: https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=1164925

I've just done a fresh deploy to verify, and I'm still seeing the issue. Here's the environment I used today:

    RHEL: Red Hat Enterprise Linux release 8.2 (Ootpa)
    RHOS: Red Hat OpenStack Platform release 16.1 (Train)
    RHOS Puddle: 16.1-trunk -p RHOS-16.1-RHEL-8-20200428.n.0
    Yum Repos: 16.1-trunk ceph-4 ceph-osd-4 rhel-8.2

    rpm -qa | grep tripleo
    python3-tripleoclient-12.3.2-0.20200424033448.b951192.el8ost.noarch
    openstack-tripleo-puppet-elements-11.2.2-0.20200311084936.a6fef08.el8ost.noarch
    python3-tripleo-common-11.3.3-0.20200423204446.86569f2.el8ost.noarch
    ansible-role-tripleo-modify-image-1.1.1-0.20200311081746.bb6f78d.el8ost.noarch
    ansible-tripleo-ipa-0.1.2-0.20200427103432.f23f480.el8ost.noarch
    openstack-tripleo-image-elements-10.6.2-0.20200313223428.8c91b46.el8ost.noarch
    openstack-tripleo-common-11.3.3-0.20200423204446.86569f2.el8ost.noarch
    ansible-tripleo-ipsec-9.2.1-0.20200311073016.0c8693c.el8ost.noarch
    openstack-tripleo-common-containers-11.3.3-0.20200423204446.86569f2.el8ost.noarch
    python3-tripleoclient-heat-installer-12.3.2-0.20200424033448.b951192.el8ost.noarch
    openstack-tripleo-heat-templates-11.3.2-0.20200428015016.d5442cd.el8ost.noarch
    puppet-tripleo-11.4.1-0.20200420213421.cae687c.el8ost.noarch
    openstack-tripleo-validations-11.3.2-0.20200415073428.7b94843.el8ost.noarch

And this is the patch I needed to see this work:

    diff -u baremetal_deploy.yaml.orig baremetal_deploy.yaml
    --- baremetal_deploy.yaml.orig 2020-05-04 04:08:07.682156095 +0000
    +++ baremetal_deploy.yaml 2020-05-04 04:28:49.289445878 +0000
    @@ -203,6 +203,7 @@
      - ctlplane_network: ctlplane
      - ssh_keys: []
      - ssh_user_name: heat-admin
    + - connection_timeout: 600
      - timeout: 3600
      - concurrency: 20
      - queue_name: tripleo

Followed by:

    openstack workbook delete tripleo.baremetal_deploy.v1
    openstack workbook create /usr/share/openstack-tripleo-common/workbooks/baremetal_deploy.yaml

@Bob: The build you point out isn't tagged as -pending, so we're getting: https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=1176328

@Michael: Can you install the RPMs from the build Bob points out before installing the undercloud, to verify that it fixes the problem? If it does, I assume we can mark it as modified and include the NEVRA, and it will get tagged correctly?

Hi Tony - I'm confused, as https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=1176328 should have the fix too. See the note from April 3: [Train only] Add ssh timeout for baremetal_deploy

Actually I think the referenced change[1] caused this problem, since it:
- Adds a connection_timeout input to the deploy_instances workflow, but that workflow doesn't do anything with that input value
- Calls deploy_instances with connection_timeout from the deploy_roles workflow, but deploy_roles is missing a connection_timeout input

Also, I don't see a need for deploy_roles or deploy_instances to deal with ansible connection timeouts, because they don't call ansible and don't attempt to connect to any remote nodes. I'm going to propose a revert of this change.

[1] https://opendev.org/openstack/tripleo-common/commit/3d3afa62dc392236dd3191956ed2bf2f05f3b0e1

Thanks Steve - I misunderstood the fix needed, as I misread the original diff. I was suggesting adding connection_timeout to the input params for deploy_roles, but your solution of removing connection_timeout if it isn't required is even better. Thanks for seeing through my mistake.

I've reviewed the upstream patch (https://review.opendev.org/#/c/725426), and I've verified it in my local environment. It works as expected. Looking forward to this landing in rhos16.1.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3148
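
The mismatch discussed in this thread, where a workflow passes `$.connection_timeout` to a sub-workflow without declaring it as an input, is the kind of problem a small lint could flag before a deploy. The following is only a rough sketch, not part of tripleo-common or Mistral: it assumes the workflow has already been parsed into a Python dict, the `deploy_roles` dict below is a hypothetical, heavily trimmed stand-in for the real workflow, and the regex treats every `$.name` token as a variable reference, which is a simplification of YAQL.

```python
import re

# Matches simple YAQL context references such as "$.connection_timeout".
# Simplification: method calls like "$.get(...)" would also match.
_REF = re.compile(r"\$\.([A-Za-z_][A-Za-z0-9_]*)")


def declared_names(workflow):
    """Names a workflow defines: its inputs plus anything its tasks publish."""
    names = set()
    for item in workflow.get("input", []):
        # Inputs are either bare names or single-key {name: default} mappings.
        names.update(item.keys() if isinstance(item, dict) else [item])
    for task in workflow.get("tasks", {}).values():
        for key in ("publish", "publish-on-error"):
            names.update(task.get(key, {}).keys())
    return names


def referenced_names(node):
    """Collect every $.name reference found in nested strings."""
    if isinstance(node, str):
        return set(_REF.findall(node))
    if isinstance(node, dict):
        return set().union(*(referenced_names(v) for v in node.values()))
    if isinstance(node, list):
        return set().union(*(referenced_names(v) for v in node))
    return set()


def undeclared_references(workflow):
    """Names that tasks reference but the workflow never declares."""
    return referenced_names(workflow.get("tasks", {})) - declared_names(workflow)


# Hypothetical, trimmed-down workflow for illustration only.
deploy_roles = {
    "input": ["roles", {"ssh_user_name": "heat-admin"}, {"timeout": 3600}],
    "tasks": {
        "deploy": {
            "workflow": "deploy_instances",
            "input": {
                "instances": "<% $.roles %>",
                "ssh_user_name": "<% $.ssh_user_name %>",
                "connection_timeout": "<% $.connection_timeout %>",
                "timeout": "<% $.timeout %>",
            },
        }
    },
}

print(undeclared_references(deploy_roles))  # {'connection_timeout'}
```

Either resolution mentioned above clears the finding in this sketch: declaring the input on the calling workflow, or dropping the reference altogether, which is the approach the revert takes.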