Description of problem: There are significant concerns with cinder in an A/A environment. The RHEL-OSP 6 HA Ref Arch has been updated to make it A/P now. This needs to be reflected in the puppet code as well.
Corrected the title to reflect my understanding that this is *only* about cinder-volume. Both cinder-api and cinder-scheduler should run A/A.
Testing patch: https://github.com/redhat-openstack/astapor/pull/480
(In reply to Mark McLoughlin from comment #5)
> Corrected the title to reflect my understanding that this is *only* about
> cinder-volume
>
> Both cinder-api and cinder-scheduler should run A/A

The ref arch doc now shows all cinder services as A/P.
Ok, based on further discussion, it appears the proposal is to change all services to A/P.

I think the issue with cinder-volume running as A/A is easy: it's not recommended upstream, and it's not intended that multiple cinder-volume services run with the same host= setting [citation needed].

The issue with cinder-api and cinder-scheduler is different: they are intended to be run A/A, but apparently we suspect some race conditions. We need to track those race conditions as individual bugs and set ourselves the goal of running these services A/A again ASAP.
To summarize my findings, there is at least one race condition in the volume API when volume status is queried and then updated; an example can be found in volume-extend. If two cinder-volume instances receive operations for the same volume, the status updates will race and leave the database (and cinder's general understanding of current volume state) inconsistent. I haven't yet gone through all of the state management code to determine the number and severity of all existing races, but I do expect others to exist, as the first one was quite easy to find.

In addition, it is the responsibility of the driver authors to implement the driver in a process-safe way. To my knowledge, the drivers that we support do this correctly, but I need to verify that myself to be most confident.

I expect A/A cinder-volume to behave incorrectly for certain volume operations until we and the community address the issues in the current code base. A more comprehensive analysis of the code with a focus on HA behavior is needed to better characterize the problems that exist, their severity, and the estimated effort to correct them.
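To make the race concrete, here is a minimal, self-contained Python sketch (not Cinder code; the names and the in-memory "database" are illustrative only). It shows the check-then-write pattern on a volume's status field, where both instances validate against a stale snapshot and both proceed, and a conditional-update variant that rechecks the current value at write time so only one transition succeeds.

```python
# Illustrative sketch only -- not Cinder's actual code or schema.
# A single shared dict stands in for the cinder database.
db = {"vol-1": {"status": "available", "size": 10}}

def begin_extend_racy(volume_id, snapshot):
    """Check-then-write against a snapshot read earlier (the race window)."""
    if snapshot["status"] != "available":
        return False                       # operation refused
    db[volume_id]["status"] = "extending"  # blind write, clobbers any racer
    return True

def begin_extend_cas(volume_id, expected="available"):
    """Conditional update: re-check the *current* value at write time."""
    row = db[volume_id]
    if row["status"] != expected:
        return False                       # someone else got there first
    row["status"] = "extending"
    return True

# Two cinder-volume instances each snapshot the row before either writes.
snap_a = dict(db["vol-1"])
snap_b = dict(db["vol-1"])
racy_a = begin_extend_racy("vol-1", snap_a)  # instance A proceeds
racy_b = begin_extend_racy("vol-1", snap_b)  # instance B *also* proceeds
print(racy_a, racy_b)                        # both True: lost update

# With the conditional update, only the first transition wins.
db["vol-1"]["status"] = "available"          # reset for the second run
cas_a = begin_extend_cas("vol-1")
cas_b = begin_extend_cas("vol-1")
print(cas_a, cas_b)                          # True False
```

In a real database the conditional variant corresponds to an atomic `UPDATE ... WHERE status = 'available'` (or a row lock), which is the general shape of the fix for this class of race.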
And to be clear, the status update race exists in volume/api.py, which is the internal API for cinder-volume. I haven't yet found anything to suggest that cinder-api or cinder-scheduler are faulty - although to be fair, I should look closer.
I have been asked to make this a cinder-volume only change: https://github.com/redhat-openstack/astapor/pull/481
Merged
Tested with openstack-foreman-installer-3.0.16-1.el7ost. cinder-volume runs in A/P mode:

    cinder-volume (systemd:openstack-cinder-volume): Started pcmk-mac848f69fbc49f

cinder-scheduler and cinder-api are A/A.
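For reference, the A/P vs A/A split above is typically expressed in Pacemaker by leaving cinder-volume as a plain (single-instance) resource while cloning the stateless services. A hedged sketch with pcs; resource names and options here are illustrative, not necessarily what the installer generates:

```shell
# A/P: one cinder-volume instance, started on exactly one controller
pcs resource create cinder-volume systemd:openstack-cinder-volume

# A/A: cinder-api and cinder-scheduler cloned across all controllers
pcs resource create cinder-api systemd:openstack-cinder-api --clone
pcs resource create cinder-scheduler systemd:openstack-cinder-scheduler --clone
```

On failover of the node running cinder-volume, Pacemaker restarts the single instance elsewhere, so the `host=` setting never has two live writers.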
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-0641.html