This Bugzilla is being used to track details regarding a feature request for full end-to-end high-availability for Red Hat OpenStack Platform guests.
Note, the current approach we're considering for Compute Node HA and guest/VM HA is roughly described here:
*** This bug has been marked as a duplicate of bug 1185030 ***
I believe this was closed as a duplicate, though the requests are not quite the same. The work in bug 1185030 is about failures of compute nodes themselves, while this request is more about responding to failures of applications in guests.
The hurdle for HA inside the guest is always people's willingness to have another daemon running there. *cough* matahari *cough*
pacemaker-remoted can certainly fill this role; however, it is currently incompatible with using pacemaker-remoted for the compute node (nested pacemaker-remoted is not currently on the roadmap). There is also the question of where the guest's HA configuration should live. Both concerns would be manageable if:
- guest HA was limited to services inside a single guest (no cross-talk or co-ordination required between guests)
- the guests' HA configuration lived inside the guest (in a yet-to-be-determined form)
If this is the case, we could use pacemaker-remoted inside the guest in stand-alone mode (no communication to the outside world).
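In that stand-alone scenario, the setup inside the guest might look something like the following sketch. The cluster and resource names (guest-ha, guest1, myapp) are illustrative, and pcs syntax varies between versions:

```shell
# Hypothetical sketch: a one-node cluster entirely inside the guest,
# with no communication to the outside world.
pcs cluster setup guest-ha guest1       # pcs 0.10+; older pcs needs --name
pcs cluster start --all
pcs property set stonith-enabled=false  # no fencing is possible from inside the guest
pcs resource create myapp systemd:httpd op monitor interval=30s
```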
1. We implement the method for scaling corosync as discussed pre-DevConf this year.
This would allow the compute nodes to be full members of the cluster and pacemaker-remoted to be used for guests.
- Drawback, the configuration will get quite large as it will contain all compute nodes and all HA service configuration for guests.
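Under option 1, each guest could be managed as a Pacemaker guest node: the compute node (now a full cluster member) runs the VM as a resource and connects to pacemaker-remoted inside it. A hedged sketch, with illustrative resource names and paths:

```shell
# The VM itself is a cluster resource; the remote-node meta attribute
# makes Pacemaker treat the guest as an additional node via pacemaker-remoted.
pcs resource create guest1-vm ocf:heartbeat:VirtualDomain \
    hypervisor="qemu:///system" config="/etc/libvirt/qemu/guest1.xml" \
    meta remote-node=guest1
# Guest services then live in the cluster configuration and can be
# pinned to the guest node:
pcs resource create guest1-app systemd:httpd
pcs constraint location guest1-app prefers guest1
```

This also illustrates the drawback above: every guest's service configuration ends up in the one cluster-wide CIB.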
2. We allow crosstalk between guests on a single compute node by leaving each as a single-node cluster and including the guest service configuration in the compute node's cluster.
- Dubious benefit here: failure of the compute node would take down the entire virtual cluster.
3. For pure monitoring of guest services, we could make use of nagios agents (which normally run from outside the target host). This is already supported.
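For option 3, a standard Nagios plugin run from the host can probe the guest over the network with no agent inside the guest. A minimal sketch; the plugin path, address, and port are illustrative:

```shell
# check_http probes the guest's web service from outside; the standard
# Nagios exit codes (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN) report health.
/usr/lib64/nagios/plugins/check_http -H 192.0.2.10 -p 80
```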
Bulk update to reflect that the scope of Red Hat OpenStack Platform 9 does not include this issue (no pm_ack+).
Bumping to 7.6 for now but this may become a priority for OSP13 depending on how the Ceph folks implement their per-pool NFS servers.
*** Bug 1908099 has been marked as a duplicate of this bug. ***
Closing as CURRENTRELEASE; this is now supported as of 8.7.