Bug 2101422

Summary: OSP17 second overcloud deploy fails due to pacemaker authentication failing
Product: Red Hat OpenStack
Reporter: David Rosenfeld <drosenfe>
Component: openstack-tripleo-heat-templates
Assignee: Rabi Mishra <ramishra>
Status: CLOSED ERRATA
QA Contact: Joe H. Rahme <jhakimra>
Severity: high
Docs Contact:
Priority: high
Version: 17.0 (Wallaby)
CC: cjeanner, jschluet, mburns, pweeks, ramishra, slinaber
Target Milestone: beta
Keywords: Triaged
Target Release: 17.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-14.3.1-0.20220701162329.dd13d73.el9ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-09-21 12:23:00 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description David Rosenfeld 2022-06-27 12:48:54 UTC
Description of problem: A multi-overcloud deployment is attempted. The first overcloud deploys successfully. The second overcloud fails to deploy, with the following seen in the logs:

Jun 27 08:21:33 overcloud2-controller-0 sudo[21883]: pam_unix(sudo:session): session opened for user root(uid=0) by heat-admin(uid=1000)
Jun 27 08:21:37 overcloud2-controller-0 puppet-user[20499]: Debug: /Stage[main]/Pacemaker::Corosync/Exec[reauthenticate-across-all-nodes]/returns: Exec try 42/360
Jun 27 08:21:37 overcloud2-controller-0 puppet-user[20499]: Debug: Exec[reauthenticate-across-all-nodes](provider=posix): Executing '/sbin/pcs host auth overcloud2-controller-0 -u hacluster -p Rywi8S0IDzBC97CM'
Jun 27 08:21:37 overcloud2-controller-0 puppet-user[20499]: Debug: Executing: '/sbin/pcs host auth overcloud2-controller-0 -u hacluster -p Rywi8S0IDzBC97CM'
Jun 27 08:21:37 overcloud2-controller-0 ansible-async_wrapper.py[20478]: 20479 still running (3135)
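
The log above shows puppet retrying the reauthenticate-across-all-nodes exec (attempt 42 of 360). To check how far a node has progressed through that retry loop, a minimal sketch along these lines may help. It assumes the log lines keep the exact format shown above; the names RETRY_RE and latest_retry are hypothetical and not part of any TripleO or puppet tooling.

```python
import re

# Matches the puppet retry line from the log above (format assumed stable).
RETRY_RE = re.compile(
    r"Exec\[reauthenticate-across-all-nodes\]/returns: Exec try (\d+)/(\d+)"
)


def latest_retry(log_lines):
    """Return (attempt, max_attempts) from the last matching line, or None."""
    result = None
    for line in log_lines:
        match = RETRY_RE.search(line)
        if match:
            result = (int(match.group(1)), int(match.group(2)))
    return result


# Sample line copied from the log excerpt in this report.
sample = [
    "Jun 27 08:21:37 overcloud2-controller-0 puppet-user[20499]: Debug: "
    "/Stage[main]/Pacemaker::Corosync/Exec[reauthenticate-across-all-nodes]"
    "/returns: Exec try 42/360",
]
print(latest_retry(sample))
```

A steadily climbing attempt count with no successful exit would be consistent with the authentication never succeeding on the second overcloud.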

We need this fix: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/847758

Version-Release number of selected component (if applicable): RHOS-17.0-RHEL-9-20220623.n.1


How reproducible: Every time


Steps to Reproduce:
1. Execute multi-overcloud deploy job: https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/df/view/deployment/job/DFG-df-deployment-17.0-virthost-2cont_2comp-lvm-ipv4-multiple-overcloud/

Actual results: Second overcloud deploy fails with error shown above.


Expected results: Second overcloud deploys successfully.


Additional info:

Comment 4 David Rosenfeld 2022-07-11 12:32:44 UTC
During Phase 3 execution of RHOS-17.0-RHEL-9-20220708.n.1, the second overcloud in the multi-overcloud job (https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/df/view/deployment/job/DFG-df-deployment-17.0-virthost-2cont_2comp-lvm-ipv4-multiple-overcloud/) deployed successfully.

Comment 9 errata-xmlrpc 2022-09-21 12:23:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543