Bug 1035891

Summary: cinder: if cinder-volume is restarted during the execution of a command, the volume is left in a non-consistent status
Product: Red Hat OpenStack Reporter: Dafna Ron <dron>
Component: openstack-cinderAssignee: Flavio Percoco <fpercoco>
Status: CLOSED ERRATA QA Contact: Dafna Ron <dron>
Severity: high Docs Contact:
Priority: high    
Version: 4.0CC: abaron, eharney, fpercoco, mlopes, sclewis, scohen, yeylon
Target Milestone: z2Keywords: CodeChange, TestOnly, ZStream
Target Release: 4.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: openstack-cinder-2013.2.1-6.el6ost Doc Type: Bug Fix
Doc Text:
Prior to this update, the driver initialization check was done prior to calling the method responsible for processing the RPC request. Consequently, volumes would enter an inconsistent state ('creating' instead of 'error'), which resulted in unusable and stuck volumes. This issue has been resolved with this update, and the driver check has been moved into the method itself.
Story Points: ---
Clone Of:
: 1051606 1071184 (view as bug list) Environment:
Last Closed: 2014-03-04 20:13:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1016224, 1051605, 1066955    
Bug Blocks: 977865, 1016216, 1051606    
Attachments:
Description Flags
logs
none
logs none

Description Dafna Ron 2013-11-28 18:14:57 UTC
Created attachment 830340 [details]
logs

Description of problem:

if I extend a volume+ restart cinder-volumes service the volume state remains in "extending" while in actuality the action failed with the following error:

DriverNotInitialized: Volume driver 'GlusterfsDriver' not initialized

Version-Release number of selected component (if applicable):

openstack-cinder-2013.2-2.el6ost.noarch

How reproducible:

100%

Steps to Reproduce:
1. create a volume
2. run the following commands from two sessions simultaneously:
session1: cinder extend <volume_id> <new_size>
session2: /etc/init.d/openstack-cinder-volume restart

Actual results:

volume status will be "extending" 
there is no warning and user will have no inclination that the action failed 

Expected results:

if a service restarts there should be a state change for the volume to error_extending

Additional info:

>>>>>extend>>>>>
2013-11-28 20:01:08.210 12269 ERROR cinder.openstack.common.rpc.amqp [req-d060442d-61b0-48ce-b926-c177f7608790 f40889a38c4644fab9d91b73fcfde1bd ddbce226991a45bf97642120207897cb] Exception during message handling
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp Traceback (most recent call last):
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/amqp.py", line 441, in _process_data
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp     **args)
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/dispatcher.py", line 148, in dispatch
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp     return getattr(proxyobj, method)(ctxt, **kwargs)
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/cinder/utils.py", line 807, in wrapper
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp     raise exception.DriverNotInitialized(driver=driver_name)
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp DriverNotInitialized: Volume driver 'GlusterfsDriver' not initialized.
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp 
/var/log/cinder/volume.log (END)

Comment 1 Dafna Ron 2013-11-28 18:44:51 UTC
volume state remains in extending when we try to extend while the volume status is down: 

[root@cougar06 ~(keystone_admin)]# cinder list 
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
|                  ID                  |   Status  | Display Name | Size | Volume Type | Bootable |             Attached to              |
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
| 1121404d-372f-4c9e-9f00-8fa8e253c064 |   in-use  |    small     |  2   |     None    |  false   | 8401c157-032b-47d5-bcf9-e2fc413f333c |
| 2f2abeea-44b7-43f5-9536-655077875bf5 |   in-use  |    dafna     |  6   |     None    |  false   | 4ab92098-6587-4afa-9c41-c7adf93d76e0 |
| 990d2111-240d-4184-bb76-e6a82b2fda81 | available |     user     |  3   |     None    |  false   |                                      |
| a6b9603d-1047-4965-86a0-4f15d34876b5 | available |     test     |  4   |     None    |  false   |                                      |
| ea526067-a6bd-4e58-9db2-d0e58d166022 | available |    dafna1    |  9   |     None    |  false   |                                      |
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
[root@cougar06 ~(keystone_admin)]# cinder list ^C
[root@cougar06 ~(keystone_admin)]# echo '>>>>>extend>>>>>' >> /var/log/cinder/scheduler.log 
[root@cougar06 ~(keystone_admin)]# cinder extend ea526067-a6bd-4e58-9db2-d0e58d166022 11^C
[root@cougar06 ~(keystone_admin)]# /etc/init.d/openstack-cinder-volume status
openstack-cinder-volume is stopped
[root@cougar06 ~(keystone_admin)]# cinder extend ea526067-a6bd-4e58-9db2-d0e58d166022 11
[root@cougar06 ~(keystone_admin)]# cinder list 
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
|                  ID                  |   Status  | Display Name | Size | Volume Type | Bootable |             Attached to              |
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
| 1121404d-372f-4c9e-9f00-8fa8e253c064 |   in-use  |    small     |  2   |     None    |  false   | 8401c157-032b-47d5-bcf9-e2fc413f333c |
| 2f2abeea-44b7-43f5-9536-655077875bf5 |   in-use  |    dafna     |  6   |     None    |  false   | 4ab92098-6587-4afa-9c41-c7adf93d76e0 |
| 990d2111-240d-4184-bb76-e6a82b2fda81 | available |     user     |  3   |     None    |  false   |                                      |
| a6b9603d-1047-4965-86a0-4f15d34876b5 | available |     test     |  4   |     None    |  false   |                                      |
| ea526067-a6bd-4e58-9db2-d0e58d166022 | extending |    dafna1    |  9   |     None    |  false   |                                      |
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
[root@cougar06 ~(keystone_admin)]# less /var/log/cinder/volume.log 
[root@cougar06 ~(keystone_admin)]# 
[root@cougar06 ~(keystone_admin)]# 
[root@cougar06 ~(keystone_admin)]# cinder list 
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
|                  ID                  |   Status  | Display Name | Size | Volume Type | Bootable |             Attached to              |
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
| 1121404d-372f-4c9e-9f00-8fa8e253c064 |   in-use  |    small     |  2   |     None    |  false   | 8401c157-032b-47d5-bcf9-e2fc413f333c |
| 2f2abeea-44b7-43f5-9536-655077875bf5 |   in-use  |    dafna     |  6   |     None    |  false   | 4ab92098-6587-4afa-9c41-c7adf93d76e0 |
| 990d2111-240d-4184-bb76-e6a82b2fda81 | available |     user     |  3   |     None    |  false   |                                      |
| a6b9603d-1047-4965-86a0-4f15d34876b5 | available |     test     |  4   |     None    |  false   |                                      |
| ea526067-a6bd-4e58-9db2-d0e58d166022 | extending |    dafna1    |  9   |     None    |  false   |                                      |
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+


>>>>>extend>>>>>
2013-11-28 20:01:08.210 12269 ERROR cinder.openstack.common.rpc.amqp [req-d060442d-61b0-48ce-b926-c177f7608790 f40889a38c4644fab9d91b73fcfde1bd ddbce226991a45bf97642120207897cb] Exception during message handling
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp Traceback (most recent call last):
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/amqp.py", line 441, in _process_data
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp     **args)
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/dispatcher.py", line 148, in dispatch
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp     return getattr(proxyobj, method)(ctxt, **kwargs)
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/cinder/utils.py", line 807, in wrapper
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp     raise exception.DriverNotInitialized(driver=driver_name)
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp DriverNotInitialized: Volume driver 'GlusterfsDriver' not initialized.
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp 
2013-11-28 20:27:42.585 12269 WARNING cinder.quota [req-9a315f82-53db-4e86-b2c1-9df042ea1ed5 f40889a38c4644fab9d91b73fcfde1bd ddbce226991a45bf97642120207897cb] Deprecated: Default quota for resource: gigabytes is set by the default quota flag: quota_gigabytes, it is now deprecated. Please use the the default quota class for default quota.
2013-11-28 20:28:51.112 12269 WARNING cinder.quota [req-ca0c5611-ec0f-44ea-8137-560c5134309e f40889a38c4644fab9d91b73fcfde1bd ddbce226991a45bf97642120207897cb] Deprecated: Default quota for resource: gigabytes is set by the default quota flag: quota_gigabytes, it is now deprecated. Please use the the default quota class for default quota.
2013-11-28 20:29:54.860 12269 WARNING cinder.quota [req-1614bec8-762d-4979-957d-274b7ca6ec68 f40889a38c4644fab9d91b73fcfde1bd ddbce226991a45bf97642120207897cb] Deprecated: Default quota for resource: gigabytes is set by the default quota flag: quota_gigabytes, it is now deprecated. Please use the the default quota class for default quota.
2013-11-28 20:31:18.986 12269 WARNING cinder.quota [req-59fc9d53-0d33-4304-ab3d-257dd24e8c0c f40889a38c4644fab9d91b73fcfde1bd ddbce226991a45bf97642120207897cb] Deprecated: Default quota for resource: gigabytes is set by the default quota flag: quota_gigabytes, it is now deprecated. Please use the the default quota class for default quota.
2013-11-28 20:32:19.883 12269 WARNING cinder.quota [req-18396ff0-3550-4acc-8e22-bf25feb4fe16 f40889a38c4644fab9d91b73fcfde1bd ddbce226991a45bf97642120207897cb] Deprecated: Default quota for resource: gigabytes is set by the default quota flag: quota_gigabytes, it is now deprecated. Please use the the default quota class for default quota.
2013-11-28 20:32:20.139 12269 ERROR cinder.volume.manager [req-18396ff0-3550-4acc-8e22-bf25feb4fe16 f40889a38c4644fab9d91b73fcfde1bd ddbce226991a45bf97642120207897cb] volume ea526067-a6bd-4e58-9db2-d0e58d166022: Error trying to extend volume
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager Traceback (most recent call last):
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager   File "/usr/lib/python2.6/site-packages/cinder/volume/manager.py", line 875, in extend_volume
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager     self.driver.extend_volume(volume, new_size)
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager   File "/usr/lib/python2.6/site-packages/cinder/volume/drivers/glusterfs.py", line 919, in extend_volume
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager     info = self._qemu_img_info(volume_path)
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager   File "/usr/lib/python2.6/site-packages/cinder/volume/drivers/glusterfs.py", line 132, in _qemu_img_info
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager     info = image_utils.qemu_img_info(path)
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager   File "/usr/lib/python2.6/site-packages/cinder/image/image_utils.py", line 191, in qemu_img_info
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager     out, err = utils.execute(*cmd, run_as_root=True)
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager   File "/usr/lib/python2.6/site-packages/cinder/utils.py", line 142, in execute
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager     return processutils.execute(*cmd, **kwargs)
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager   File "/usr/lib/python2.6/site-packages/cinder/openstack/common/processutils.py", line 173, in execute
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager     cmd=' '.join(cmd))
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager ProcessExecutionError: Unexpected error while running command.
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf env LC_ALL=C LANG=C qemu-img info /var/lib/cinder/mnt/d1c4e0e3273b195c7d311f8303edd391/volume-ea526067-a6bd-4e58-9db2-d0e58d166022
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager Exit code: 1
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager Stdout: ''
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager Stderr: "Could not open '/var/lib/cinder/mnt/d1c4e0e3273b195c7d311f8303edd391/volume-ea526067-a6bd-4e58-9db2-d0e58d166022': No such file or directory\n"
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager 
(END)

Comment 2 Flavio Percoco 2013-12-04 08:25:34 UTC
This seems related to bug #1016224. The volume creation command fails before getting into the volume manager where the status management should happen.

Comment 4 Eric Harney 2014-01-13 15:58:43 UTC
*** Bug 1051606 has been marked as a duplicate of this bug. ***

Comment 5 Flavio Percoco 2014-01-16 10:31:10 UTC
Patch backported upstream for stable/havana

Comment 6 Flavio Percoco 2014-01-23 09:31:02 UTC
Moving to high since this is quite frustrating from users and leaves the volumes in inconsistent status.

Comment 8 Dafna Ron 2014-02-19 11:34:32 UTC
I am still able to reproduce this on openstack-cinder-2013.2.2-1.el6ost.noarch

moving back to dev with logs. 

it seems that it's a case of scheduler not knowing the status of a service (race).

Comment 9 Dafna Ron 2014-02-19 11:37:32 UTC
Created attachment 865043 [details]
logs

Comment 10 Dafna Ron 2014-02-19 11:49:15 UTC
[root@orange-vdsf ~(keystone_admin)]# cinder list 
+--------------------------------------+-----------+-------------------+------+-------------+----------+-------------+
|                  ID                  |   Status  |    Display Name   | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+-------------------+------+-------------+----------+-------------+
| 1be7427d-bec7-4afb-8891-34b350b1f613 | extending |       dafna3      |  2   |     None    |  false   |             |
| 2b441306-0064-4da9-bf8a-ac19bf4342ba | extending |       dafna2      |  20  |     None    |  false   |             |
| 885ab1a8-e9e3-4478-a39a-aa7af9af05dc | extending |    extend-dafna   |  12  |     None    |  false   |             |
| 9681b5da-975b-4cfe-af5a-a418b19fec81 | available | copy-of-dafna-vol |  12  |     None    |  false   |             |
+--------------------------------------+-----------+-------------------+------+-------------+----------+-------------+
 here is the output from cinder list (with volume UUID and state)

Comment 11 Flavio Percoco 2014-02-19 13:01:37 UTC
The title of this bug seems quite misleading to me now.

The issue here is that volume operations interrupted when cinder-volume goes down are not rolled back, which leaves the volume status in a not consistent state. However, there are a few other implications here:

1. There's no easy way to know when something has gone wrong with one of the cinder-volumes node after the rpc.cast is sent unless we add a sentinel that watches over volume's status changes.

2. If the cinder-volume goes down *while* extending a volume, we can't simply restore the volume status because we don't actually know what the status is. The volume at this point could be completely broken and there's no way - without manual inspection - to know that.

So, we could think about possible solutions for this issue but it would go way beyond this bug and Icehouse. It requires discussions upstream - if it is even considered a real issue.

Thing is, if cinder-scheduler sends a cast command to cinder-volume and then cinder-volume goes down *while* executing that command, I think the wrong status in the volume is the least important issue a cloud operator would have.

Comment 12 Flavio Percoco 2014-02-19 13:02:22 UTC
*** Bug 1066955 has been marked as a duplicate of this bug. ***

Comment 13 Flavio Percoco 2014-03-03 10:45:30 UTC
This bug was originally created to track a failure with driver initialization and evolved to ensure volume's status consistency between `cinder-volume` crashes. Based on what I've explained in c#11, I've cloned this bug to track the changes that will need to happen in cinder and I'll close this bug as verified since the original issues with regards to driver initialization where indeed fixed - as show in other bugs related to the same issue.

Comment 15 errata-xmlrpc 2014-03-04 20:13:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0213.html