Description of problem: There are significant concerns with cinder in an A/A environment. The RHEL-OSP 6 HA Ref Arch has been updated to make it A/P now. This needs to be reflected in the puppet code as well.
Corrected the title to reflect my understanding that this is *only* about cinder-volume. Both cinder-api and cinder-scheduler should run A/A.
Testing patch: https://github.com/redhat-openstack/astapor/pull/480
(In reply to Mark McLoughlin from comment #5)
> Corrected the title to reflect my understanding that this is *only* about
> cinder-volume
>
> Both cinder-api and cinder-scheduler should run A/A

The ref arch doc now shows all cinder services as A/P.
Ok, based on further discussion, it appears the proposal is to change all services to A/P.

I think the issue with cinder-volume running as A/A is easy: it's not recommended upstream, and it's not intended that multiple cinder-volume services run with the same host= setting [citation needed].

The issue with cinder-api and cinder-scheduler is different: they are intended to be run A/A, but apparently we suspect some race conditions. We need to track those race conditions as individual bugs and set ourselves the goal of running these services A/A again ASAP.
To summarize my findings, there is at least one race condition in the volume API when volume status is queried and then updated; an example can be found in volume-extend. If two cinder-volume instances receive operations for the same volume, the status updates will race and leave the database (and cinder's general understanding of current volume state) inconsistent. I haven't yet gone through all of the state management code to determine the number and severity of all existing races, but I do expect others to exist, as the first one was quite easy to find.

In addition, it is the responsibility of the driver authors to implement the driver in a process-safe way. To my knowledge, the drivers that we support do this correctly, but I need to verify that myself to be most confident.

I expect A/A cinder-volume to behave incorrectly for certain volume operations until we and the community address the issues in the current code base. A more comprehensive analysis of the code with a focus on HA behavior is needed to better characterize the problems that exist, their severity, and the estimated effort to correct them.
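To make the race concrete, here is a minimal, self-contained Python sketch (not Cinder code; the names and the in-memory "database" are illustrative only). It shows the check-then-write pattern on a volume's status field, where both instances validate against a stale snapshot and both proceed, and a conditional-update variant that rechecks the current value at write time so only one transition succeeds.

```python
# Illustrative sketch only -- not Cinder's actual code or schema.
# A single shared dict stands in for the cinder database.
db = {"vol-1": {"status": "available", "size": 10}}

def begin_extend_racy(volume_id, snapshot):
    """Check-then-write against a snapshot read earlier (the race window)."""
    if snapshot["status"] != "available":
        return False                       # operation refused
    db[volume_id]["status"] = "extending"  # blind write, clobbers any racer
    return True

def begin_extend_cas(volume_id, expected="available"):
    """Conditional update: re-check the *current* value at write time."""
    row = db[volume_id]
    if row["status"] != expected:
        return False                       # someone else got there first
    row["status"] = "extending"
    return True

# Two cinder-volume instances each snapshot the row before either writes.
snap_a = dict(db["vol-1"])
snap_b = dict(db["vol-1"])
racy_a = begin_extend_racy("vol-1", snap_a)  # instance A proceeds
racy_b = begin_extend_racy("vol-1", snap_b)  # instance B *also* proceeds
print(racy_a, racy_b)                        # both True: lost update

# With the conditional update, only the first transition wins.
db["vol-1"]["status"] = "available"          # reset for the second run
cas_a = begin_extend_cas("vol-1")
cas_b = begin_extend_cas("vol-1")
print(cas_a, cas_b)                          # True False
```

In a real database the conditional variant corresponds to an atomic `UPDATE ... WHERE status = 'available'` (or a row lock), which is the general shape of the fix for this class of race.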
And to be clear, the status update race exists in volume/api.py, which is the internal API for cinder-volume. I haven't yet found anything to suggest that cinder-api or cinder-scheduler are faulty - although to be fair, I should look closer.
I have been asked to make this a cinder-volume only change: https://github.com/redhat-openstack/astapor/pull/481
Merged
Tested with openstack-foreman-installer-3.0.16-1.el7ost. cinder-volume runs in A/P mode:

    cinder-volume (systemd:openstack-cinder-volume): Started pcmk-mac848f69fbc49f

cinder-scheduler and cinder-api are A/A.
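For reference, the A/P vs A/A split above is typically expressed in Pacemaker by leaving cinder-volume as a plain (single-instance) resource while cloning the stateless services. A hedged sketch with pcs; resource names and options here are illustrative, not necessarily what the installer generates:

```shell
# A/P: one cinder-volume instance, started on exactly one controller
pcs resource create cinder-volume systemd:openstack-cinder-volume

# A/A: cinder-api and cinder-scheduler cloned across all controllers
pcs resource create cinder-api systemd:openstack-cinder-api --clone
pcs resource create cinder-scheduler systemd:openstack-cinder-scheduler --clone
```

On failover of the node running cinder-volume, Pacemaker restarts the single instance elsewhere, so the `host=` setting never has two live writers.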
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-0641.html