Bug 1193229 - Make cinder-volume A/P in all circumstances
Summary: Make cinder-volume A/P in all circumstances
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-foreman-installer
Version: 6.0 (Juno)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: z1
: Installer
Assignee: Jason Guiditta
QA Contact: Leonid Natapov
URL:
Whiteboard:
Depends On:
Blocks: 1195479
 
Reported: 2015-02-16 22:32 UTC by Mike Burns
Modified: 2015-03-05 18:20 UTC (History)

Fixed In Version: openstack-foreman-installer-3.0.16-1.el7ost
Doc Type: Bug Fix
Doc Text:
When the cinder-volume service ran in Active/Active mode, certain operations could corrupt data in Cinder volumes. This fix makes the service run in Active/Passive mode, which prevents the corruption.
Clone Of:
: 1195479 (view as bug list)
Environment:
Last Closed: 2015-03-05 18:20:26 UTC
Target Upstream Version:
Embargoed:



Links
System: Red Hat Product Errata
ID: RHBA-2015:0641
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Enterprise Linux OpenStack Platform Installer Bug Fix Advisory
Last Updated: 2015-03-05 23:15:51 UTC

Description Mike Burns 2015-02-16 22:32:44 UTC
Description of problem:
There are significant concerns with cinder in an A/A environment.  The RHEL-OSP 6 HA Ref Arch has been updated to make it A/P now.  This needs to be reflected in the puppet code as well.

Comment 5 Mark McLoughlin 2015-02-17 15:25:57 UTC
Corrected the title to reflect my understanding that this is *only* about cinder-volume

Both cinder-api and cinder-scheduler should run A/A

Comment 6 Jason Guiditta 2015-02-17 15:38:43 UTC
Testing patch:
https://github.com/redhat-openstack/astapor/pull/480

Comment 7 Jason Guiditta 2015-02-17 15:39:12 UTC
(In reply to Mark McLoughlin from comment #5)
> Corrected the title to reflect my understanding that this is *only* about
> cinder-volume
> 
> Both cinder-api and cinder-scheduler should run A/A

The ref arch doc shows all cinder services now as A/P

Comment 8 Mark McLoughlin 2015-02-17 15:42:04 UTC
Ok, based on further discussion, it appears the proposal is to change all services to A/P

I think the issue with cinder-volume running as A/A is easy: it's not recommended upstream, and multiple cinder-volume services are not intended to run with the same host= setting [citation needed]

The issue with cinder-api and cinder-scheduler is different - they are intended to be run A/A but apparently we suspect some race conditions. We need to track those race conditions as individual bugs and set ourselves the goal of running these services A/A again ASAP

Comment 9 Jon Bernard 2015-02-17 17:48:38 UTC
To summarize my findings, there is at least one race condition in the volume API when volume status is queried and updated; an example can be found in volume-extend.  If two cinder-volume instances receive operations for the same volume, the status updates will race and leave the database (and cinder's general understanding of current volume state) in an inconsistent state.

I haven't yet gone through all of the state management code to determine the number and severity of all existing races, but I do expect others to exist, as the first one was quite easy to find.

In addition, it is the responsibility of the driver authors to implement the driver in a process-safe way.  To my knowledge, the drivers that we support do this correctly, but I need to verify myself to be most confident.

I expect A/A cinder-volume to behave incorrectly for certain volume operations until we and the community address the issues in the current code base.  A more comprehensive analysis of the code with a focus on HA behavior is needed to better characterize the problems that exist, their severity, and estimated effort to correct them.
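
The status race described above can be illustrated with a minimal sketch. This is not Cinder code: `VolumeStore`, `racy_extend`, and `guarded_extend` are hypothetical names, the "database" is an in-memory dict, and the two workers are interleaved by hand in a single thread so the outcome is deterministic. It only shows the general check-then-set pattern and one common mitigation (a conditional update), under the assumption that a real fix would get its atomicity from the database.

```python
# Hypothetical in-memory stand-in for the volumes table; not Cinder code.
class VolumeStore:
    def __init__(self):
        self._status = {}

    def create(self, volume_id, status="available"):
        self._status[volume_id] = status

    def get_status(self, volume_id):
        return self._status[volume_id]

    def set_status(self, volume_id, new_status):
        # Unconditional write: the caller's earlier read may be stale.
        self._status[volume_id] = new_status

    def compare_and_set(self, volume_id, expected, new_status):
        # Conditional write, analogous in spirit to
        # UPDATE ... SET status = :new WHERE id = :id AND status = :expected
        if self._status[volume_id] != expected:
            return False
        self._status[volume_id] = new_status
        return True


def racy_extend(store, volume_id):
    """Check-then-set: both workers read first, then both write."""
    seen_a = store.get_status(volume_id)
    seen_b = store.get_status(volume_id)
    started = []
    for worker, seen in (("a", seen_a), ("b", seen_b)):
        if seen == "available":
            store.set_status(volume_id, "extending")
            started.append(worker)
    return started


def guarded_extend(store, volume_id):
    """Each worker claims the volume with a single conditional update."""
    started = []
    for worker in ("a", "b"):
        if store.compare_and_set(volume_id, "available", "extending"):
            started.append(worker)
    return started


store = VolumeStore()
store.create("vol-1")
print(racy_extend(store, "vol-1"))     # ['a', 'b']: both workers start

store.create("vol-2")
print(guarded_extend(store, "vol-2"))  # ['a']: the second claim is rejected
```

In the racy version both workers believe they own the extend operation, which is the kind of inconsistency an A/P deployment sidesteps by only ever running one cinder-volume instance.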

Comment 10 Jon Bernard 2015-02-17 17:58:56 UTC
And to be clear, the status update race exists in volume/api.py, which is the internal API for cinder-volume.  I haven't yet found anything to suggest that cinder-api or cinder-scheduler are faulty - although to be fair, I should look closer.

Comment 11 Jason Guiditta 2015-02-17 19:07:35 UTC
I have been asked to make this a cinder-volume only change:
https://github.com/redhat-openstack/astapor/pull/481

Comment 12 Jason Guiditta 2015-02-17 19:34:56 UTC
Merged

Comment 14 Leonid Natapov 2015-02-18 11:19:39 UTC
Tested with openstack-foreman-installer-3.0.16-1.el7ost.

Cinder volume runs in A/P mode.


cinder-volume (systemd:openstack-cinder-volume): Started pcmk-mac848f69fbc49f

cinder-scheduler and cinder-api are A/A

Comment 17 errata-xmlrpc 2015-03-05 18:20:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0641.html

