Bug 1016224 - cinder: create volume stuck in creating even though scheduler reports failure
cinder: create volume stuck in creating even though scheduler reports failure
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-cinder
4.0
x86_64 Linux
urgent Severity high
: z2
: 4.0
Assigned To: Flavio Percoco
Dafna Ron
: ZStream
: 1051605
Depends On:
Blocks: 977865 1016216 1035891 1051605 1051606 1066955
Reported: 2013-10-07 13:58 EDT by Dafna Ron
Modified: 2016-04-26 11:51 EDT (History)
11 users

See Also:
Fixed In Version: openstack-cinder-2013.2.1-6.el6ost
Doc Type: Bug Fix
Doc Text:
Prior to this update, the driver initialization check was performed before the method responsible for processing the RPC request was called. Consequently, volumes could enter an inconsistent state ('creating' instead of 'error'), which resulted in unusable, stuck volumes. With this update, the driver check has been moved into the method itself, resolving this issue.
Story Points: ---
Clone Of:
: 1051605 1066955 (view as bug list)
Environment:
Last Closed: 2014-03-04 15:12:46 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
logs (14.49 MB, application/x-gzip)
2013-10-07 13:58 EDT, Dafna Ron
no flags Details
logs (368.25 KB, application/x-gzip)
2014-02-18 13:00 EST, Dafna Ron
no flags Details


External Trackers
Tracker ID Priority Status Summary Last Updated
Launchpad 1053931 None None None Never
Launchpad 1211839 None None None Never
Launchpad 1242942 None None None Never
OpenStack gerrit 61088 None None None Never
OpenStack gerrit 67097 None None None Never

Description Dafna Ron 2013-10-07 13:58:04 EDT
Created attachment 808972 [details]
logs

Description of problem:

This was opened against Folsom and fixed.
It is a regression in Havana.
It happened when I failed to mount my Gluster share due to issues on the Gluster server and then tried to create a volume.

Version-Release number of selected component (if applicable):

[root@cougar06 ~(keystone_admin)]# rpm -qa |grep cinder
python-cinder-2013.2-0.9.b3.el6ost.noarch
python-cinderclient-1.0.5-1.el6ost.noarch
openstack-cinder-2013.2-0.9.b3.el6ost.noarch
[root@cougar06 ~(keystone_admin)]# rpm -qa |grep nova
openstack-nova-scheduler-2013.2-0.24.rc1.el6ost.noarch
openstack-nova-novncproxy-2013.2-0.24.rc1.el6ost.noarch
openstack-nova-common-2013.2-0.24.rc1.el6ost.noarch
python-novaclient-2.15.0-1.el6ost.noarch
python-nova-2013.2-0.24.rc1.el6ost.noarch
openstack-nova-console-2013.2-0.24.rc1.el6ost.noarch
openstack-nova-conductor-2013.2-0.24.rc1.el6ost.noarch
openstack-nova-cert-2013.2-0.24.rc1.el6ost.noarch
openstack-nova-compute-2013.2-0.24.rc1.el6ost.noarch
openstack-nova-api-2013.2-0.24.rc1.el6ost.noarch
openstack-nova-network-2013.2-0.24.rc1.el6ost.noarch

How reproducible:


Steps to Reproduce:
1. configure cinder to work with gluster
2. hard shut down gluster
3. send a command to create a volume 
4. start gluster again

Actual results:

The scheduler reports a problem, but the volume is stuck in the 'creating' state.

Expected results:

If the scheduler reports a problem, the volume status should move from 'creating' to 'error'.

Additional info: logs

[root@cougar06 ~(keystone_admin)]# cinder list 
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
|                  ID                  |   Status  | Display Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
| 1560fa00-752b-4d7b-a747-3ef9bf483692 | available |     new      |  1   |     None    |   True   |             |
| 22c3e84c-1d9b-4a45-9244-06b3ab6c401a |  creating |     bla      |  10  |     None    |  False   |             |
| aadc9c04-17ab-42c4-8bce-c2f63cd287fa | available |  image_new   |  1   |     None    |   True   |             |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+


https://bugs.launchpad.net/nova/+bug/1053931
Comment 1 Xavier Queralt 2013-10-11 08:23:38 EDT
This belongs to cinder as nova-volume doesn't exist any more.

I've been able to reproduce the issue with the latest cinder without needing GlusterFS:

1. kill the cinder-volume service: service openstack-cinder-volume stop
2. create a volume immediately afterwards: cinder create 1
3. check the list of volumes: cinder list

The volume will be stuck in the creating state forever and cannot be deleted.
Comment 2 Eric Harney 2013-10-11 11:27:45 EDT
(In reply to Dafna Ron from comment #0)
> This was opened in folsom and fixed. 
> its a regression to Havana. 

What was the behavior in Folsom?

It should be possible to delete the volume w/ cinder force-delete.
Comment 3 Dafna Ron 2013-10-11 12:18:10 EDT
This is the original bug from Launchpad:

https://bugs.launchpad.net/nova/+bug/1053931

According to that bug, if the creation fails in the scheduler, the status should change to 'error', allowing the user to delete the volume.
Comment 7 Flavio Percoco 2013-12-03 09:47:06 EST
The issue here seems to be that the volume manager requires an initialized driver. Since the driver was not configured correctly, it was disabled on boot. Because this check is performed by a Python decorator *before* entering the actual manager method, the failure is not processed correctly.

https://git.openstack.org/cgit/openstack/cinder/tree/cinder/volume/manager.py#n236
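Flavio's diagnosis can be illustrated with a minimal, self-contained Python sketch. The names here (`require_driver_initialized`, `create_volume`, the `driver_initialized` flag) are simplified stand-ins for the actual cinder manager code, not its real API; the point is only to contrast a decorator-level check, which aborts before the method can update the volume's status, with a check inside the method, which can still mark the volume as 'error':

```python
import functools

def require_driver_initialized(func):
    """Buggy pattern: reject the RPC call before the manager method
    runs, so the method never gets a chance to flag the volume."""
    @functools.wraps(func)
    def wrapper(self, volume):
        if not self.driver_initialized:
            raise RuntimeError("driver not initialized")
        return func(self, volume)
    return wrapper

class BuggyManager:
    driver_initialized = False

    @require_driver_initialized
    def create_volume(self, volume):
        volume["status"] = "available"

class FixedManager:
    driver_initialized = False

    def create_volume(self, volume):
        # Fixed pattern: perform the check inside the method, so a
        # failure can still update the volume's status to 'error'.
        try:
            if not self.driver_initialized:
                raise RuntimeError("driver not initialized")
            volume["status"] = "available"
        except RuntimeError:
            volume["status"] = "error"
            raise

vol = {"status": "creating"}
try:
    BuggyManager().create_volume(vol)
except RuntimeError:
    pass
print(vol["status"])   # stays 'creating' -- the stuck state in this bug

vol2 = {"status": "creating"}
try:
    FixedManager().create_volume(vol2)
except RuntimeError:
    pass
print(vol2["status"])  # 'error' -- deletable by the user
```

With the decorator-level check, the exception is raised before the method body runs, so the volume is left in 'creating' forever; moving the check inside the method is what lets the error path run.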
Comment 8 Flavio Percoco 2014-01-15 07:40:18 EST
*** Bug 1051605 has been marked as a duplicate of this bug. ***
Comment 9 Flavio Percoco 2014-01-16 05:31:39 EST
Patch backported upstream to stable/havana
Comment 10 Flavio Percoco 2014-01-23 04:31:12 EST
Raising the severity to high, since this is quite frustrating for users and leaves volumes in an inconsistent status.
Comment 13 Dafna Ron 2014-02-18 12:58:07 EST
I think we have a race in the fix.
I tested in several ways (iptables, unmount, and service stop).

I found a problem with a standalone cinder host: I stopped the service on the standalone host and then immediately created a volume on the controller.
It seems that because there is a delay in the scheduler's update, the command is sent and gets stuck in the scheduler.
If we wait a minute and then send the create-volume command from the controller, it fails right away.
So this seems to be a race, but I can still reproduce the bug.

I think the bug severity can be lowered because:
1. this is now a race
2. we can use reset-state in cinder to change the volume's status to 'available' and then delete it.

moving back to dev.

To reproduce:

Semi-distributed setup:
1. standalone cinder
2. standalone glance
3. nova network controller (all other components on this server)
4. one additional compute node

Gluster remote storage configured for cinder.

1. on the cinder stand alone host run: '/etc/init.d/openstack-cinder-volume stop'
2. quickly after step 1 (within seconds) run 'cinder create --display-name volume 10'

Results: the cinder volume is stuck in 'creating'.

[root@orange-vdsf ~(keystone_admin)]# /etc/init.d/openstack-cinder-volume stop
Stopping openstack-cinder-volume:                          [  OK  ]
 
[root@puma31 ~(keystone_admin)]# cinder create --display-name remote-volume-create 10
+---------------------+--------------------------------------+
|       Property      |                Value                 |
+---------------------+--------------------------------------+
|     attachments     |                  []                  |
|  availability_zone  |                 nova                 |
|       bootable      |                false                 |
|      created_at     |      2014-02-18T17:45:12.668313      |
| display_description |                 None                 |
|     display_name    |         remote-volume-create         |
|          id         | cec31979-20dd-4d81-b989-a875c26d22cd |
|       metadata      |                  {}                  |
|         size        |                  10                  |
|     snapshot_id     |                 None                 |
|     source_volid    |                 None                 |
|        status       |               creating               |
|     volume_type     |                 None                 |
+---------------------+--------------------------------------+
[root@puma31 ~(keystone_admin)]# cinder create --display-name remote-volume-create1 10
+---------------------+--------------------------------------+
|       Property      |                Value                 |
+---------------------+--------------------------------------+
|     attachments     |                  []                  |
|  availability_zone  |                 nova                 |
|       bootable      |                false                 |
|      created_at     |      2014-02-18T17:46:30.287351      |
| display_description |                 None                 |
|     display_name    |        remote-volume-create1         |
|          id         | 01a710f4-fe8f-462b-b2c6-74f539ccb1aa |
|       metadata      |                  {}                  |
|         size        |                  10                  |
|     snapshot_id     |                 None                 |
|     source_volid    |                 None                 |
|        status       |               creating               |
|     volume_type     |                 None                 |
+---------------------+--------------------------------------+


As you can see, the first volume is stuck in 'creating', while the second volume I created moved to 'error' right away. A race?

[root@puma31 ~(keystone_admin)]# cinder list 
+--------------------------------------+-----------+-----------------------+------+-------------+----------+-------------+
|                  ID                  |   Status  |      Display Name     | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+-----------------------+------+-------------+----------+-------------+
| 01a710f4-fe8f-462b-b2c6-74f539ccb1aa |   error   | remote-volume-create1 |  10  |     None    |  false   |             |
| 9681b5da-975b-4cfe-af5a-a418b19fec81 | available |   copy-of-dafna-vol   |  12  |     None    |  false   |             |
| cec31979-20dd-4d81-b989-a875c26d22cd |  creating |  remote-volume-create |  10  |     None    |  false   |             |
+--------------------------------------+-----------+-----------------------+------+-------------+----------+-------------+
[root@puma31 ~(keystone_admin)]# 

logs will be attached.
Comment 14 Dafna Ron 2014-02-18 13:00:56 EST
Created attachment 864672 [details]
logs
Comment 15 Flavio Percoco 2014-02-19 06:21:19 EST
This sounds like a separate issue. Although it is true that this race can happen, it's quite unlikely and not completely related to the original issue.

I have cloned this as bug #1066955 and will move this one back to ON_QA.

Thanks a lot Dafna!
Comment 16 Dafna Ron 2014-02-19 06:39:25 EST
Moving to VERIFIED as agreed with Flavio, since the issue is currently reproducible only as a race.
Comment 18 errata-xmlrpc 2014-03-04 15:12:46 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0213.html
