Bug 2178614 - "Create Cluster tripleo_cluster" fails if it's the second attempt
Summary: "Create Cluster tripleo_cluster" fails if it's the second attempt
Keywords:
Status: NEW
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: puppet-pacemaker
Version: 16.2 (Train)
Hardware: x86_64
OS: All
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Luca Miccini
QA Contact: Nobody
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-03-15 12:34 UTC by David Hill
Modified: 2023-08-03 15:59 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-23108 0 None None None 2023-03-15 12:36:54 UTC
Red Hat Knowledge Base (Solution) 5413081 0 None None None 2023-03-15 12:48:26 UTC
Red Hat Knowledge Base (Solution) 6324651 0 None None None 2023-03-15 12:49:42 UTC

Description David Hill 2023-03-15 12:34:58 UTC
Description of problem:
"Create Cluster tripleo_cluster" fails on the second deployment attempt. In this customer case (not the first time we have seen this), authentication failed during the first deployment for some reason (MTU size, etc.), and the second deployment then failed with:
~~~
<13>Mar 13 14:39:12 puppet-user: Notice: /Stage[main]/Pacemaker::Corosync/Exec[Create Cluster tripleo_cluster]/returns: Error: Hosts 'overcloud-controller-1', 'overcloud-controller-2' are not known to pcs, try to authenticate the hosts using 'pcs host auth overcloud-controller-1 overcloud-controller-2' command
~~~

The failing resource, as defined in the Pacemaker::Corosync class:

    Exec <|tag == 'pacemaker-auth'|>
    ->
    exec {"Create Cluster ${cluster_name}":
      creates   => '/etc/cluster/cluster.conf',
      command   => $cluster_setup_cmd,
      timeout   => $cluster_start_timeout,
      tries     => $cluster_start_tries,
      try_sleep => $cluster_start_try_sleep,
      unless    => '/usr/bin/test -f /etc/corosync/corosync.conf',
      require   => Class['pacemaker::install'],
    }
    ->


Version-Release number of selected component (if applicable):
All

How reproducible:
Occurs when the first "Create Cluster tripleo_cluster" was not executed for some reason.

Steps to Reproduce:
1. The exact sequence is unknown: the hacluster password was set, authentication then (probably) happened, and "Create Cluster tripleo_cluster" either did not complete or was never executed
2. Retry the deployment

Actual results:
The deployment fails because pcsd is not authenticated to all hosts.

Expected results:
The deployment should authenticate the hosts if they are not already authenticated.
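
One possible shape for this, as a rough sketch only and not the actual puppet-pacemaker code: an extra exec that runs the `pcs host auth` command suggested by the error message for as long as the cluster has not been created. The resource title, the `/usr/sbin/pcs` path, and the `$cluster_members` (space-separated host names) and `$hacluster_pwd` variables are assumptions made for illustration. The `pacemaker-auth` tag is reused so that the collector shown in the description would order it before "Create Cluster".

~~~
# Hypothetical sketch: resource title, pcs path, $cluster_members and
# $hacluster_pwd are illustrative assumptions, not confirmed module parameters.
exec { 'Authenticate cluster nodes before cluster setup':
  # Command form taken from the hint in the error message (pcs host auth <hosts>)
  command   => "/usr/sbin/pcs host auth ${cluster_members} -u hacluster -p ${hacluster_pwd}",
  # Same guard as the create step: skip once corosync.conf exists
  unless    => '/usr/bin/test -f /etc/corosync/corosync.conf',
  tries     => $cluster_start_tries,
  try_sleep => $cluster_start_try_sleep,
  # Picked up by Exec <|tag == 'pacemaker-auth'|>, so it runs before "Create Cluster"
  tag       => 'pacemaker-auth',
  require   => Class['pacemaker::install'],
}
~~~

With a guard like this, a retried deployment would re-run authentication whenever the cluster is still missing, instead of relying on the first attempt having completed it. Passing the password on the command line exposes it in process listings and logs, so a real fix would likely need a safer mechanism.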

Additional info:
This is not the first time we have seen this behavior, but it is the first time we have opened a BZ for it.

