Bug 2178614
| Summary: | "Create Cluster tripleo_cluster" fails if it's the second attempt | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | David Hill <dhill> |
| Component: | puppet-pacemaker | Assignee: | Luca Miccini <lmiccini> |
| Status: | CLOSED WONTFIX | QA Contact: | Nobody <nobody> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 16.2 (Train) | CC: | jjoyce, jmarcian, jschluet, lmiccini, slinaber, tvignaud |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | All | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2024-04-26 12:09:14 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
We won't be able to fix this, as the risk of rewriting parts of the Puppet manifest is too high this far into the product lifecycle. Our advice is to troubleshoot the failure and re-trigger the deployment (in the case of an FFU), or to delete the overcloud and start from a clean state if it is a new deployment. The upcoming RHOSP 18 release will not use Pacemaker, which avoids this issue altogether.
Description of problem:
"Create Cluster tripleo_cluster" fails if it's the second attempt. In this customer case (not the first time we have seen this), authentication failed for some reason (MTU size, etc.), and the second deployment then fails with:

~~~
<13>Mar 13 14:39:12 puppet-user: Notice: /Stage[main]/Pacemaker::Corosync/Exec[Create Cluster tripleo_cluster]/returns: Error: Hosts 'overcloud-controller-1', 'overcloud-controller-2' are not known to pcs, try to authenticate the hosts using 'pcs host auth overcloud-controller-1 overcloud-controller-2' command
~~~

Relevant manifest code:

  Exec <|tag == 'pacemaker-auth'|>
  ->
  exec {"Create Cluster ${cluster_name}":
    creates   => '/etc/cluster/cluster.conf',
    command   => $cluster_setup_cmd,
    timeout   => $cluster_start_timeout,
    tries     => $cluster_start_tries,
    try_sleep => $cluster_start_try_sleep,
    unless    => '/usr/bin/test -f /etc/corosync/corosync.conf',
    require   => Class['pacemaker::install'],
  }
  ->

Version-Release number of selected component (if applicable):
All

How reproducible:
Whenever the first "Create Cluster tripleo_cluster" was not executed for some reason.

Steps to Reproduce:
1. I don't know exactly what happened, but the hacluster password was set, auth then (probably) happened, and "Create Cluster tripleo_cluster" either didn't complete or wasn't executed at all
2. Retry the deployment

Actual results:
Fails because pcsd is not authenticated to all hosts

Expected results:
It should authenticate if it's not authenticated

Additional info:
It's not the first time we have seen this behavior, but it's the first time we have opened a BZ for it.
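
When troubleshooting a failed retry, it can help to pull the unauthenticated hosts straight out of the pcs error in the deployment log. The following is a minimal Python sketch, assuming the error text has the same shape as the log line quoted in this report. `unknown_hosts` and `auth_command` are hypothetical helper names, not part of pcs or puppet-pacemaker, and the singular "Host ... is" variant in the pattern is a guess. The sketch only builds the `pcs host auth` command that the error message itself suggests; it does not run it.

```python
import re
import shlex

# Pattern for the pcs error quoted in this report. The plural form is taken
# from the log; the singular "Host '...' is" form is an assumption.
_UNKNOWN_HOSTS = re.compile(
    r"Error: Hosts? (?P<hosts>'[^']+'(?:, '[^']+')*) (?:is|are) not known to pcs"
)


def unknown_hosts(log_text):
    """Return the hosts pcs reported as not known, in first-seen order."""
    hosts = []
    for match in _UNKNOWN_HOSTS.finditer(log_text):
        for name in re.findall(r"'([^']+)'", match.group("hosts")):
            if name not in hosts:
                hosts.append(name)
    return hosts


def auth_command(hosts, user="hacluster"):
    """Build the `pcs host auth` command line for the given hosts.

    The password is deliberately not included on the command line.
    """
    if not hosts:
        raise ValueError("no hosts to authenticate")
    quoted = " ".join(shlex.quote(h) for h in hosts)
    return "pcs host auth " + quoted + " -u " + shlex.quote(user)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python find_unknown_hosts.py < deployment.log
    found = unknown_hosts(sys.stdin.read())
    if found:
        print(auth_command(found))
```

Running the printed command on a controller and then re-triggering the deployment corresponds to the "troubleshoot and re-trigger" advice; whether that is sufficient in a given environment depends on why the first authentication failed.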