Bug 1805604

Summary: Instance HA fails on OSP-16
Product: Red Hat OpenStack
Reporter: Sadique Puthen <sputhenp>
Component: puppet-pacemaker
Assignee: RHOS Maint <rhos-maint>
Status: CLOSED DUPLICATE
QA Contact: nlevinki <nlevinki>
Severity: high
Docs Contact:
Priority: unspecified
Version: 16.0 (Train)
CC: jjoyce, jschluet, michele, slinaber, tvignaud
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-02-21 07:05:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Sadique Puthen 2020-02-21 06:36:49 UTC
Description of problem:

A TripleO/Director deployment with Instance HA fails on OSP-16 with the error below.

Feb 20 16:19:47 puppet-user: Notice: /Stage[main]/Tripleo::Certmonger::Ca::Crl/Exec[tripleo-ca-crl]/returns: executed successfully
<13>Feb 20 16:19:47 puppet-user: Notice: /Stage[main]/Tripleo::Certmonger::Ca::Crl/File[tripleo-ca-crl-file]/seluser: seluser changed 'unconfined_u' to 'system_u'
<13>Feb 20 16:19:47 puppet-user: Notice: /Stage[main]/Tripleo::Certmonger::Ca::Crl/Exec[tripleo-ca-crl-process-command]: Triggered 'refresh' from 2 events
<13>Feb 20 16:19:47 puppet-user: Notice: /Stage[main]/Tripleo::Certmonger::Ca::Crl/Cron[tripleo-refresh-crl-file]/ensure: created
<13>Feb 20 16:19:50 puppet-user: Notice: /Stage[main]/Pacemaker::Stonith/Pacemaker::Property[Disable STONITH]/Pcmk_property[property--stonith-enabled]/ensure: created
<13>Feb 20 16:24:54 puppet-user: Error: pcs create failed: Error: Unable to communicate with computeiha-1
<13>Feb 20 16:24:54 puppet-user: Error: /Stage[main]/Tripleo::Profile::Base::Pacemaker/Pacemaker::Resource::Remote[computeiha-1]/Pcmk_remote[computeiha-1]/ensure: change from 'absent' to 'present' failed: pcs create failed: Error: Unable to communicate with computeiha-1
<13>Feb 20 16:24:54 puppet-user: Notice: /Stage[main]/Tripleo::Profile::Base::Pacemaker/Exec[exec-wait-for-computeiha-1]: Dependency Pcmk_remote[computeiha-1] has failures: true
<13>Feb 20 16:24:54 puppet-user: Warning: /Stage[main]/Tripleo::Profile::Base::Pacemaker/Exec[exec-wait-for-computeiha-1]: Skipping because of failed dependencies
<13>Feb 20 16:29:59 puppet-user: Error: pcs create failed: Error: Unable to communicate with computeiha-2
<13>Feb 20 16:29:59 puppet-user: Error: /Stage[main]/Tripleo::Profile::Base::Pacemaker/Pacemaker::Resource::Remote[computeiha-2]/Pcmk_remote[computeiha-2]/ensure: change from 'absent' to 'present' failed: pcs create failed: Error: Unable to communicate with computeiha-2
<13>Feb 20 16:29:59 puppet-user: Notice: /Stage[main]/Tripleo::Profile::Base::Pacemaker/Exec[exec-wait-for-computeiha-2]: Dependency Pcmk_remote[computeiha-2] has failures: true
<13>Feb 20 16:29:59 puppet-user: Warning: /Stage[main]/Tripleo::Profile::Base::Pacemaker/Exec[exec-wait-for-com

The controller node logs the error below.

Feb 20 16:24:54 controller-1 puppet-user[24152]: Error: pcs create failed: Error: Unable to communicate with computeiha-1
Feb 20 16:24:54 controller-1 puppet-user[24152]: Error: /Stage[main]/Tripleo::Profile::Base::Pacemaker/Pacemaker::Resource::Remote[computeiha-1]/Pcmk_remote[computeiha-1]/ensure: change from 'absent' to 'present' failed: pcs create failed: Error: Unable to communicate with computeiha-1
Feb 20 16:24:54 controller-1 puppet-user[24152]: Notice: /Stage[main]/Tripleo::Profile::Base::Pacemaker/Exec[exec-wait-for-computeiha-1]: Dependency Pcmk_remote[computeiha-1] has failures: true
Feb 20 16:24:54 controller-1 puppet-user[24152]: Warning: /Stage[main]/Tripleo::Profile::Base::Pacemaker/Exec[exec-wait-for-computeiha-1]: Skipping because of failed dependencies

On each compute node, pacemaker remoted failed to start, with the error below.

Feb 20 17:15:53 computeiha-1 puppet-user[24880]: Notice: Compiled catalog for computeiha-1.redhat.local in environment production in 0.68 seconds
Feb 20 17:15:53 computeiha-1 puppet-user[24880]: Error: Validation of File_line[pcsd_bind_addr] failed: path is a required attribute (file: /etc/puppet/modules/pacemaker/manifests/remote.pp, line: 89)
Feb 20 17:15:53 computeiha-1 ansible-__main__.py[24876]: Module complete (24876)
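
For triage, the puppet-user Error entries in output like the above can be separated from the surrounding Notice and Warning lines with a short script. This is only a sketch: the function name puppet_errors and the regular expression are illustrative assumptions based on the line shapes quoted in this report (an optional "<13>" priority prefix, an optional host name and PID, and entries joined by a literal "\n" in the deployment output). It is not part of TripleO or puppet-pacemaker.

```python
import re

# Assumed line shape, taken from the excerpts in this report, e.g.
#   "Feb 20 16:24:54 controller-1 puppet-user[24152]: Error: pcs create failed: ..."
# The "<13>" prefix, the host name and the PID are all optional.
LINE_RE = re.compile(
    r"^(?:<\d+>)?"
    r"(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?:(?P<host>\S+)\s+)?"
    r"puppet-user(?:\[\d+\])?:\s+"
    r"(?P<level>Notice|Warning|Error):\s+"
    r"(?P<msg>.*)$"
)


def puppet_errors(text):
    """Return (timestamp, host, message) for every puppet-user Error entry.

    host is None when the line carries no host name.
    """
    # The deployment output joins entries with a literal backslash-n,
    # so turn those into real newlines before splitting.
    entries = text.replace("\\n", "\n").splitlines()
    found = []
    for entry in entries:
        match = LINE_RE.match(entry.strip())
        if match and match.group("level") == "Error":
            found.append((match.group("ts"), match.group("host"), match.group("msg")))
    return found


if __name__ == "__main__":
    sample = (
        "Feb 20 17:15:53 computeiha-1 puppet-user[24880]: Error: Validation of "
        "File_line[pcsd_bind_addr] failed: path is a required attribute "
        "(file: /etc/puppet/modules/pacemaker/manifests/remote.pp, line: 89)"
    )
    for ts, host, msg in puppet_errors(sample):
        print(ts, host, msg)
```

Run against the compute node log, this surfaces the File_line[pcsd_bind_addr] validation failure, which occurs on the compute side and so may be the more useful starting point than the "Unable to communicate" errors reported by the controller.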

The deployment templates are here: https://gitlab.cee.redhat.com/sputhenp/openstack/blob/master/basic/templates/osp-16/instance-ha/overcloud-deploy-tls-everywhere.sh


Comment 1 Michele Baldessari 2020-02-21 07:05:59 UTC

*** This bug has been marked as a duplicate of bug 1784222 ***