Bug 2178614

Summary: "Create Cluster tripleo_cluster" fails if it's the second attempt
Product: Red Hat OpenStack
Reporter: David Hill <dhill>
Component: puppet-pacemaker
Assignee: Luca Miccini <lmiccini>
Status: NEW ---
QA Contact: Nobody <nobody>
Severity: high
Docs Contact:
Priority: medium
Version: 16.2 (Train)
CC: jjoyce, jmarcian, jschluet, lmiccini, slinaber, tvignaud
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: x86_64
OS: All
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description David Hill 2023-03-15 12:34:58 UTC
Description of problem:
"Create Cluster tripleo_cluster" fails on the second attempt. In this customer case (not the first time we have seen this), the authentication failed for some reason (MTU size, etc.), and the second deployment then fails with:
~~~
<13>Mar 13 14:39:12 puppet-user: Notice: /Stage[main]/Pacemaker::Corosync/Exec[Create Cluster tripleo_cluster]/returns: Error: Hosts 'overcloud-controller-1', 'overcloud-controller-2' are not known to pcs, try to authenticate the hosts using 'pcs host auth overcloud-controller-1 overcloud-controller-2' command
~~~

    Exec <|tag == 'pacemaker-auth'|>
    ->
    exec {"Create Cluster ${cluster_name}":
      creates   => '/etc/cluster/cluster.conf',
      command   => $cluster_setup_cmd,
      timeout   => $cluster_start_timeout,
      tries     => $cluster_start_tries,
      try_sleep => $cluster_start_try_sleep,
      unless    => '/usr/bin/test -f /etc/corosync/corosync.conf',
      require   => Class['pacemaker::install'],
    }
    ->


Version-Release number of selected component (if applicable):
All

How reproducible:
Reproducible whenever the first "Create Cluster tripleo_cluster" was not executed for some reason.

Steps to Reproduce:
1. The exact sequence is unknown: the hacluster password was set, authentication then (probably) happened, and "Create Cluster tripleo_cluster" either did not complete or was never executed
2. Retry deployment

Actual results:
Fails because pcsd is not authenticated to all hosts

Expected results:
The deployment should authenticate the hosts with pcs if they are not already authenticated.
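
To make the expected behavior concrete, here is a minimal Python sketch of "authenticate if needed, then retry". It is not how puppet-pacemaker is implemented and is not a proposed patch. The function names, the `run` callable, and the `auth_cmd_for` callable are all hypothetical. The only thing taken from this report is the error text quoted in the description; the singular "Host ... is" variant of the pattern is an assumption.

```python
import re

# Matches the pcs error quoted in the description of this bug.
# Accepting the singular "Host '...' is" form is an assumption.
UNKNOWN_HOSTS_RE = re.compile(
    r"Error: Hosts? (?P<hosts>'[^']+'(?:, '[^']+')*) (?:is|are) not known to pcs"
)


def unknown_hosts(output):
    """Return the host names pcs reported as not known, or [] if none."""
    match = UNKNOWN_HOSTS_RE.search(output)
    if match is None:
        return []
    return re.findall(r"'([^']+)'", match.group("hosts"))


def create_cluster_with_reauth(run, setup_cmd, auth_cmd_for):
    """Run setup_cmd; on an 'unknown hosts' error, authenticate and retry once.

    run(cmd) is a hypothetical callable returning (exit_code, output).
    auth_cmd_for(hosts) is a hypothetical callable that builds the auth command.
    """
    code, output = run(setup_cmd)
    if code == 0:
        return code, output

    hosts = unknown_hosts(output)
    if not hosts:
        # Some other failure: report it unchanged rather than masking it.
        return code, output

    auth_code, auth_output = run(auth_cmd_for(hosts))
    if auth_code != 0:
        return auth_code, auth_output

    # Authentication succeeded, so try the cluster setup one more time.
    return run(setup_cmd)
```

In a real deployment, `auth_cmd_for` would presumably build the `pcs host auth <hosts>` command that the error message itself suggests, and `run` would wrap whatever mechanism executes commands on the controller. Both are left abstract here because the report does not say how credentials are supplied.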

Additional info:
This is not the first time we have seen this behavior, but it is the first time we have opened a BZ for it.