Bug 1016224 - cinder: create volume stuck in creating even though scheduler reports failure
cinder: create volume stuck in creating even though scheduler reports failure
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-cinder
4.0
x86_64 Linux
urgent Severity high
: z2
: 4.0
Assigned To: Flavio Percoco
Dafna Ron
: ZStream
: 1051605
Depends On:
Blocks: 977865 1016216 1035891 1051605 1051606 1066955
Reported: 2013-10-07 13:58 EDT by Dafna Ron
Modified: 2016-04-26 11:51 EDT (History)
11 users

See Also:
Fixed In Version: openstack-cinder-2013.2.1-6.el6ost
Doc Type: Bug Fix
Doc Text:
Prior to this update, the driver initialization check was performed before the method responsible for processing the RPC request was called. Consequently, volumes could enter an inconsistent state ('creating' instead of 'error'), which resulted in unusable, stuck volumes. With this update, the driver check has been moved into the method itself, resolving this issue.
Story Points: ---
Clone Of:
: 1051605 1066955 (view as bug list)
Environment:
Last Closed: 2014-03-04 15:12:46 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
logs (14.49 MB, application/x-gzip)
2013-10-07 13:58 EDT, Dafna Ron
no flags Details
logs (368.25 KB, application/x-gzip)
2014-02-18 13:00 EST, Dafna Ron
no flags Details


External Trackers
Tracker ID Priority Status Summary Last Updated
Launchpad 1053931 None None None Never
Launchpad 1211839 None None None Never
Launchpad 1242942 None None None Never
OpenStack gerrit 61088 None None None Never
OpenStack gerrit 67097 None None None Never

Description Dafna Ron 2013-10-07 13:58:04 EDT
Created attachment 808972 [details]
logs

Description of problem:

This was opened against Folsom and fixed.
It is a regression in Havana.
It happened when I failed to mount my Gluster share due to issues on the Gluster server and then tried to create a volume.

Version-Release number of selected component (if applicable):

[root@cougar06 ~(keystone_admin)]# rpm -qa |grep cinder
python-cinder-2013.2-0.9.b3.el6ost.noarch
python-cinderclient-1.0.5-1.el6ost.noarch
openstack-cinder-2013.2-0.9.b3.el6ost.noarch
[root@cougar06 ~(keystone_admin)]# rpm -qa |grep nova
openstack-nova-scheduler-2013.2-0.24.rc1.el6ost.noarch
openstack-nova-novncproxy-2013.2-0.24.rc1.el6ost.noarch
openstack-nova-common-2013.2-0.24.rc1.el6ost.noarch
python-novaclient-2.15.0-1.el6ost.noarch
python-nova-2013.2-0.24.rc1.el6ost.noarch
openstack-nova-console-2013.2-0.24.rc1.el6ost.noarch
openstack-nova-conductor-2013.2-0.24.rc1.el6ost.noarch
openstack-nova-cert-2013.2-0.24.rc1.el6ost.noarch
openstack-nova-compute-2013.2-0.24.rc1.el6ost.noarch
openstack-nova-api-2013.2-0.24.rc1.el6ost.noarch
openstack-nova-network-2013.2-0.24.rc1.el6ost.noarch

How reproducible:


Steps to Reproduce:
1. configure cinder to work with gluster
2. hard shut down gluster
3. send a command to create a volume 
4. start gluster again

Actual results:

The scheduler reports a problem, but the volume is stuck in the 'creating' state.

Expected results:

If the scheduler reports a problem, the volume status should move from 'creating' to 'error'.

Additional info: logs

[root@cougar06 ~(keystone_admin)]# cinder list 
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
|                  ID                  |   Status  | Display Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
| 1560fa00-752b-4d7b-a747-3ef9bf483692 | available |     new      |  1   |     None    |   True   |             |
| 22c3e84c-1d9b-4a45-9244-06b3ab6c401a |  creating |     bla      |  10  |     None    |  False   |             |
| aadc9c04-17ab-42c4-8bce-c2f63cd287fa | available |  image_new   |  1   |     None    |   True   |             |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+


https://bugs.launchpad.net/nova/+bug/1053931
Comment 1 Xavier Queralt 2013-10-11 08:23:38 EDT
This belongs to cinder as nova-volume doesn't exist any more.

I've been able to reproduce the issue with the latest cinder without needing GlusterFS:

1. kill the cinder-volume service: service openstack-cinder-volume stop
2. create a volume immediately afterwards: cinder create 1
3. check the list of volumes: cinder list

The volume will be stuck in the creating state forever and cannot be deleted.
Comment 2 Eric Harney 2013-10-11 11:27:45 EDT
(In reply to Dafna Ron from comment #0)
> This was opened in folsom and fixed. 
> its a regression to Havana. 

What was the behavior in Folsom?

It should be possible to delete the volume w/ cinder force-delete.
Comment 3 Dafna Ron 2013-10-11 12:18:10 EDT
This is the original bug from Launchpad:

https://bugs.launchpad.net/nova/+bug/1053931

According to that bug, if the creation fails in the scheduler, the status should change to 'error', allowing the user to delete the volume.
Comment 7 Flavio Percoco 2013-12-03 09:47:06 EST
The issue here seems to be that the volume manager requires an initialized driver. Since the driver was not configured correctly, it was disabled on boot. Because this check is performed by a Python decorator *before* entering the actual manager method, the failure is not processed correctly.

https://git.openstack.org/cgit/openstack/cinder/tree/cinder/volume/manager.py#n236
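Flavio's diagnosis can be illustrated with a minimal, self-contained Python sketch. The names here (`require_driver_initialized`, `create_volume`, the `driver_initialized` flag) are simplified stand-ins for the actual cinder manager code, not its real API; the point is only to contrast a decorator-level check, which aborts before the method can update the volume's status, with a check inside the method, which can still mark the volume as 'error':

```python
import functools

def require_driver_initialized(func):
    """Buggy pattern: reject the RPC call before the manager method
    runs, so the method never gets a chance to flag the volume."""
    @functools.wraps(func)
    def wrapper(self, volume):
        if not self.driver_initialized:
            raise RuntimeError("driver not initialized")
        return func(self, volume)
    return wrapper

class BuggyManager:
    driver_initialized = False

    @require_driver_initialized
    def create_volume(self, volume):
        volume["status"] = "available"

class FixedManager:
    driver_initialized = False

    def create_volume(self, volume):
        # Fixed pattern: perform the check inside the method, so a
        # failure can still update the volume's status to 'error'.
        try:
            if not self.driver_initialized:
                raise RuntimeError("driver not initialized")
            volume["status"] = "available"
        except RuntimeError:
            volume["status"] = "error"
            raise

vol = {"status": "creating"}
try:
    BuggyManager().create_volume(vol)
except RuntimeError:
    pass
print(vol["status"])   # stays 'creating' -- the stuck state in this bug

vol2 = {"status": "creating"}
try:
    FixedManager().create_volume(vol2)
except RuntimeError:
    pass
print(vol2["status"])  # 'error' -- deletable by the user
```

With the decorator-level check, the exception is raised before the method body runs, so the volume is left in 'creating' forever; moving the check inside the method is what lets the error path run.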
Comment 8 Flavio Percoco 2014-01-15 07:40:18 EST
*** Bug 1051605 has been marked as a duplicate of this bug. ***
Comment 9 Flavio Percoco 2014-01-16 05:31:39 EST
Patch backported upstream to stable/havana
Comment 10 Flavio Percoco 2014-01-23 04:31:12 EST
Raising the severity to high, since this is quite frustrating for users and leaves volumes in an inconsistent status.
Comment 13 Dafna Ron 2014-02-18 12:58:07 EST
I think we have a race in the fix.
I tested in several ways (iptables, unmount, and service stop).

I found a problem with a standalone cinder host: I stopped the service on the standalone host and then immediately created a volume on the controller.
It seems that because there is a delay in the scheduler's update, the command is sent and gets stuck in the scheduler.
If we wait a minute and then send the create-volume command from the controller, it fails right away.
So this seems to be a race, but I can still reproduce the bug.

I think the bug severity can be lowered because:
1. this is now a race
2. we can use reset-state in cinder to change the volume's status to 'available' and then delete it.

moving back to dev.

To reproduce:

Semi-distributed setup:
1. standalone cinder
2. standalone glance
3. nova network controller (all other components on this server)
4. one additional compute node

Gluster remote storage configured for cinder.

1. on the cinder stand alone host run: '/etc/init.d/openstack-cinder-volume stop'
2. quickly after step 1 (within seconds) run 'cinder create --display-name volume 10'

Results: the cinder volume is stuck in 'creating'.

[root@orange-vdsf ~(keystone_admin)]# /etc/init.d/openstack-cinder-volume stop
Stopping openstack-cinder-volume:                          [  OK  ]
 
[root@puma31 ~(keystone_admin)]# cinder create --display-name remote-volume-create 10
+---------------------+--------------------------------------+
|       Property      |                Value                 |
+---------------------+--------------------------------------+
|     attachments     |                  []                  |
|  availability_zone  |                 nova                 |
|       bootable      |                false                 |
|      created_at     |      2014-02-18T17:45:12.668313      |
| display_description |                 None                 |
|     display_name    |         remote-volume-create         |
|          id         | cec31979-20dd-4d81-b989-a875c26d22cd |
|       metadata      |                  {}                  |
|         size        |                  10                  |
|     snapshot_id     |                 None                 |
|     source_volid    |                 None                 |
|        status       |               creating               |
|     volume_type     |                 None                 |
+---------------------+--------------------------------------+
[root@puma31 ~(keystone_admin)]# cinder create --display-name remote-volume-create1 10
+---------------------+--------------------------------------+
|       Property      |                Value                 |
+---------------------+--------------------------------------+
|     attachments     |                  []                  |
|  availability_zone  |                 nova                 |
|       bootable      |                false                 |
|      created_at     |      2014-02-18T17:46:30.287351      |
| display_description |                 None                 |
|     display_name    |        remote-volume-create1         |
|          id         | 01a710f4-fe8f-462b-b2c6-74f539ccb1aa |
|       metadata      |                  {}                  |
|         size        |                  10                  |
|     snapshot_id     |                 None                 |
|     source_volid    |                 None                 |
|        status       |               creating               |
|     volume_type     |                 None                 |
+---------------------+--------------------------------------+


As you can see, the first volume is stuck in 'creating', while the second volume I created moved to 'error' right away. A race?

[root@puma31 ~(keystone_admin)]# cinder list 
+--------------------------------------+-----------+-----------------------+------+-------------+----------+-------------+
|                  ID                  |   Status  |      Display Name     | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+-----------------------+------+-------------+----------+-------------+
| 01a710f4-fe8f-462b-b2c6-74f539ccb1aa |   error   | remote-volume-create1 |  10  |     None    |  false   |             |
| 9681b5da-975b-4cfe-af5a-a418b19fec81 | available |   copy-of-dafna-vol   |  12  |     None    |  false   |             |
| cec31979-20dd-4d81-b989-a875c26d22cd |  creating |  remote-volume-create |  10  |     None    |  false   |             |
+--------------------------------------+-----------+-----------------------+------+-------------+----------+-------------+
[root@puma31 ~(keystone_admin)]# 

logs will be attached.
Comment 14 Dafna Ron 2014-02-18 13:00:56 EST
Created attachment 864672 [details]
logs
Comment 15 Flavio Percoco 2014-02-19 06:21:19 EST
This sounds like a separate issue. Although it is true that this race can happen, it's quite unlikely and not completely related to the original issue.

I have cloned this as bug #1066955 and will move this one back to ON_QA.

Thanks a lot Dafna!
Comment 16 Dafna Ron 2014-02-19 06:39:25 EST
Moving to VERIFIED as agreed with Flavio, since the issue is currently reproducible only as a race.
Comment 18 errata-xmlrpc 2014-03-04 15:12:46 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0213.html
