Description of problem:
If you try to install an unusual setup via OSPd, with just 2 controller nodes, the process starts but then fails.

Version-Release number of selected component (if applicable):
openstack-heat-templates-0-0.6.20150605git.el7ost.noarch

How reproducible:
1. Prepare the OSPd environment as usual;
2. Start the overcloud deploy, passing a control-scale of 2:
   openstack overcloud deploy --templates --control-scale 2 --control-flavor vm --compute-scale 2 --compute-flavor baremetal --ntp-server 10.16.255.1
3. Wait a little while; then it fails.

Actual results:
The error is not specific, just a plain HEAT FAILED.

Expected results:
Success.

Additional info:
Investigating the problem reveals that it is a matter of quorum. At some point the setup runs this command:
/usr/sbin/pcs status | grep -q 'partition with quorum' > /dev/null 2>&1
(see openstack-puppet/modules/pacemaker/manifests/corosync.pp)
and since the cluster is composed of just two nodes there is NO quorum. This leaves us with one of these choices:
1 - Prohibit a two-controller setup;
2 - Do not check for quorum on the cluster, and also set a property like no-quorum-policy to "ignore";
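As a rough sketch of option 2 (not something that was actually run in this environment), the no-quorum-policy property can be set with pcs, and the same check that the puppet module performs can be re-run by hand to see whether the cluster reports quorum:

  # Sketch only: relax Pacemaker's behaviour when quorum is lost (option 2 above)
  pcs property set no-quorum-policy=ignore

  # Re-run the check from pacemaker/manifests/corosync.pp by hand
  /usr/sbin/pcs status | grep -q 'partition with quorum' && echo 'quorum detected' || echo 'no quorum'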
pcs will (now?) set the relevant "two_node" options in corosync.conf, so I would expect the cluster to have quorum if both nodes can see each other. Come to think of it, even without those options the cluster should have quorum as long as both nodes can see each other.
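For reference (this is what a typical pcs-generated configuration looks like, not an excerpt from the failed environment), the quorum section that pcs writes into /etc/corosync/corosync.conf for a two-node cluster is roughly the following, and corosync-quorumtool shows what corosync itself thinks about quorum at runtime:

  # /etc/corosync/corosync.conf (excerpt, typical two-node setup)
  quorum {
      provider: corosync_votequorum
      two_node: 1
  }

  # Check the runtime quorum state
  corosync-quorumtool -s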
Raoul: did you ever save the logs from a failed installation? I'd be quite keen to look at them.
No, I did not take any logs at the time I hit the problem. It does not happen anymore, so unfortunately I cannot give you anything. If it happens again I'll be sure to capture them.
What if we try installing with older cluster and/or director versions?
It would be a huge job: even if we know the openstack-heat-templates version, we don't know which puppet modules were installed, nor the state of RHEL at that time (so the pcs version and so on).
Sounds like we should close this then
Agreed, let's close this one out. We've been deploying this two-node scenario many times over the last few days and it has always worked.