Description of problem:
galera-master fails to start, or cannot be manually promoted to master, if the former master node goes offline
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Have a 3-node HA environment running
2. Kill (shut down abruptly) the node that is the Galera master
Actual results:
galera-master cannot be promoted to master or started on the other nodes.
Expected results:
The other Galera nodes should be able to start or be promoted to master.
In a Galera cluster there is no single "galera master" node; all nodes are masters. When one controller is powered off, the remaining nodes simply continue running normally, so it's not clear what was actually observed.
At a minimum, we will need SOS reports from all three controller nodes to begin diagnosing what was seen.
The resource agent could not restart the stopped galera cluster automatically because one of the controller nodes was offline.
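A rough illustration of why a single offline node blocks the automatic restart: the sketch below assumes the agent only picks a bootstrap node once every cluster member has reported its last committed seqno (as recorded in Galera's `grastate.dat`), and then chooses the most advanced one. The helpers `parse_seqno` and `pick_bootstrap_node` are hypothetical, not part of the resource agent, and the tie-break by node name is an arbitrary choice made here for determinism. A seqno of -1 (unknown, typical after a crash) would need recovery first and is not handled.

```python
import re
from typing import Dict, Optional, Sequence

# Matches the "seqno:" line of a grastate.dat file, e.g. "seqno:   1234".
SEQNO_RE = re.compile(r"^seqno:[ \t]*(-?\d+)[ \t]*$", re.MULTILINE)


def parse_seqno(grastate_text: str) -> int:
    """Return the seqno recorded in grastate.dat contents (-1 means unknown)."""
    match = SEQNO_RE.search(grastate_text)
    if match is None:
        raise ValueError("no seqno line found in grastate.dat contents")
    return int(match.group(1))


def pick_bootstrap_node(expected_nodes: Sequence[str],
                        reported: Dict[str, int]) -> Optional[str]:
    """Return the node to bootstrap from, or None if any node has not reported.

    Hypothetical helper: mimics the assumed behaviour of waiting for every
    cluster member before choosing the one with the highest seqno.
    """
    missing = [node for node in expected_nodes if node not in reported]
    if missing:
        # An offline node never reports, so no bootstrap node can be chosen.
        return None
    # Highest seqno wins; ties are broken by node name (arbitrary choice).
    return max(expected_nodes, key=lambda node: (reported[node], node))


if __name__ == "__main__":
    nodes = ["controller-0", "controller-1", "controller-2"]
    # controller-2 is powered off, so it never reports its seqno.
    print(pick_bootstrap_node(nodes, {"controller-0": 120, "controller-1": 118}))  # None
```

In the manual procedure, an operator effectively makes this choice by hand and then starts galera on the selected node first.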
According to the logs in the customer ticket, a mistake was made while following the procedure to bootstrap the cluster manually: the pcs command to restart the first galera node was not executed on the node selected for bootstrap. The resource agent prevented the restart accordingly.
After the manual restart procedure was replayed, the cluster came up as expected.
Removing needinfo?, as C#2 explained the issue clearly.