Description of problem:
Overcloud deploy fails while trying to add a messaging node in a composable environment. On the controller node the failure seems related to the cluster setup command:

Error: /sbin/pcs cluster setup --wait --name tripleo_cluster controller-0 controller-1 controller-2 messaging-0 messaging-1 messaging-2 galera-0 galera-1 galera-2 --token 10000 returned 1 instead of one of [0]
Error: /Stage[main]/Pacemaker::Corosync/Exec[Create Cluster tripleo_cluster]/returns: change from notrun to 0 failed: /sbin/pcs cluster setup --wait --name tripleo_cluster controller-0 controller-1 controller-2 messaging-0 messaging-1 messaging-2 galera-0 galera-1 galera-2 --token 10000 returned 1 instead of one of [0]

Looking at the nodes (sosreport will be attached), the error is related to the messaging-1 node:

Error: /sbin/pcs status | grep -q 'partition with quorum' > /dev/null 2>&1 returned 1 instead of one of [0]
Error: /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/returns: change from notrun to 0 failed: /sbin/pcs status | grep -q 'partition with quorum' > /dev/null 2>&1 returned 1 instead of one of [0]

The pcsd call that fails is specifically this one:

::ffff:172.17.0.24 - - [17/May/2017:01:46:41 +0000] "GET /remote/cluster_destroy HTTP/1.1" 401 24 0.0181

This request returns 401 and causes the entire deployment to fail.

Version-Release number of selected component (if applicable):
The tested puddle is 2017-05-09.2

How reproducible:
It is a race, so no specific test is needed; it shows up when deploying repeatedly on the same environment.

Actual results:
Deploy fails.

Expected results:
Deploy succeeds.

Additional info:
This race occurred once in a series of 20 consecutive deployments, so it can be considered "rare".
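For anyone triaging similar failures from the sosreports, a rough sketch of how the pcsd access log could be scanned for rejected /remote/ calls like the one quoted above. The regular expression is modelled only on that single quoted line, so the assumption is that other entries in the log follow the same layout; the names LINE_RE and find_auth_failures are made up for this example and are not part of pcs or pcsd.

```python
import re

# Pattern modelled on the single access-log line quoted in this report.
# Assumption: the rest of the pcsd access log uses the same layout.
LINE_RE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3})\b'
)


def find_auth_failures(lines):
    """Return (timestamp, client, path) for each /remote/ request answered with 401."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not look like access-log entries
        if m.group("status") == "401" and m.group("path").startswith("/remote/"):
            hits.append((m.group("when"), m.group("client"), m.group("path")))
    return hits


# The log line from this report, used as sample input.
SAMPLE = [
    '::ffff:172.17.0.24 - - [17/May/2017:01:46:41 +0000] '
    '"GET /remote/cluster_destroy HTTP/1.1" 401 24 0.0181',
]

if __name__ == "__main__":
    for when, client, path in find_auth_failures(SAMPLE):
        print(when, client, path)
```

The function accepts any iterable of lines, so an open file handle for the access log taken from a node's sosreport can be passed in directly. Comparing the timestamps of the hits with the time of the failed "pcs cluster setup" run on the controller may help confirm whether the same race was hit.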
The sosreports are available here: http://file.rdu.redhat.com/~rscarazz/BZ1451842/
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:3462