Bug 1035891
Summary: | cinder: if cinder-volume is restarted during the execution of a command, the volume is left in a non-consistent status | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Dafna Ron <dron> | ||||||
Component: | openstack-cinder | Assignee: | Flavio Percoco <fpercoco> | ||||||
Status: | CLOSED ERRATA | QA Contact: | Dafna Ron <dron> | ||||||
Severity: | high | Docs Contact: | |||||||
Priority: | high | ||||||||
Version: | 4.0 | CC: | abaron, eharney, fpercoco, mlopes, sclewis, scohen, yeylon | ||||||
Target Milestone: | z2 | Keywords: | CodeChange, TestOnly, ZStream | ||||||
Target Release: | 4.0 | ||||||||
Hardware: | x86_64 | ||||||||
OS: | Linux | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | openstack-cinder-2013.2.1-6.el6ost | Doc Type: | Bug Fix | ||||||
Doc Text: |
Prior to this update, the driver initialization check was done prior to calling the method responsible for processing the RPC request. Consequently, volumes would enter an inconsistent state ('creating' instead of 'error'), which resulted in unusable and stuck volumes.
This issue has been resolved with this update, and the driver check has been moved into the method itself.
|
Story Points: | --- | ||||||
Clone Of: | |||||||||
: | 1051606 1071184 (view as bug list) | Environment: | |||||||
Last Closed: | 2014-03-04 20:13:20 UTC | Type: | Bug | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Bug Depends On: | 1016224, 1051605, 1066955 | ||||||||
Bug Blocks: | 977865, 1016216, 1051606 | ||||||||
Attachments: |
|
Description
Dafna Ron
2013-11-28 18:14:57 UTC
volume state remains in extending when we try to extend while the volume status is down:
[root@cougar06 ~(keystone_admin)]# cinder list
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
| ID | Status | Display Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
| 1121404d-372f-4c9e-9f00-8fa8e253c064 | in-use | small | 2 | None | false | 8401c157-032b-47d5-bcf9-e2fc413f333c |
| 2f2abeea-44b7-43f5-9536-655077875bf5 | in-use | dafna | 6 | None | false | 4ab92098-6587-4afa-9c41-c7adf93d76e0 |
| 990d2111-240d-4184-bb76-e6a82b2fda81 | available | user | 3 | None | false | |
| a6b9603d-1047-4965-86a0-4f15d34876b5 | available | test | 4 | None | false | |
| ea526067-a6bd-4e58-9db2-d0e58d166022 | available | dafna1 | 9 | None | false | |
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
[root@cougar06 ~(keystone_admin)]# cinder list ^C
[root@cougar06 ~(keystone_admin)]# echo '>>>>>extend>>>>>' >> /var/log/cinder/scheduler.log
[root@cougar06 ~(keystone_admin)]# cinder extend ea526067-a6bd-4e58-9db2-d0e58d166022 11^C
[root@cougar06 ~(keystone_admin)]# /etc/init.d/openstack-cinder-volume status
openstack-cinder-volume is stopped
[root@cougar06 ~(keystone_admin)]# cinder extend ea526067-a6bd-4e58-9db2-d0e58d166022 11
[root@cougar06 ~(keystone_admin)]# cinder list
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
| ID | Status | Display Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
| 1121404d-372f-4c9e-9f00-8fa8e253c064 | in-use | small | 2 | None | false | 8401c157-032b-47d5-bcf9-e2fc413f333c |
| 2f2abeea-44b7-43f5-9536-655077875bf5 | in-use | dafna | 6 | None | false | 4ab92098-6587-4afa-9c41-c7adf93d76e0 |
| 990d2111-240d-4184-bb76-e6a82b2fda81 | available | user | 3 | None | false | |
| a6b9603d-1047-4965-86a0-4f15d34876b5 | available | test | 4 | None | false | |
| ea526067-a6bd-4e58-9db2-d0e58d166022 | extending | dafna1 | 9 | None | false | |
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
[root@cougar06 ~(keystone_admin)]# less /var/log/cinder/volume.log
[root@cougar06 ~(keystone_admin)]#
[root@cougar06 ~(keystone_admin)]#
[root@cougar06 ~(keystone_admin)]# cinder list
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
| ID | Status | Display Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
| 1121404d-372f-4c9e-9f00-8fa8e253c064 | in-use | small | 2 | None | false | 8401c157-032b-47d5-bcf9-e2fc413f333c |
| 2f2abeea-44b7-43f5-9536-655077875bf5 | in-use | dafna | 6 | None | false | 4ab92098-6587-4afa-9c41-c7adf93d76e0 |
| 990d2111-240d-4184-bb76-e6a82b2fda81 | available | user | 3 | None | false | |
| a6b9603d-1047-4965-86a0-4f15d34876b5 | available | test | 4 | None | false | |
| ea526067-a6bd-4e58-9db2-d0e58d166022 | extending | dafna1 | 9 | None | false | |
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
>>>>>extend>>>>>
2013-11-28 20:01:08.210 12269 ERROR cinder.openstack.common.rpc.amqp [req-d060442d-61b0-48ce-b926-c177f7608790 f40889a38c4644fab9d91b73fcfde1bd ddbce226991a45bf97642120207897cb] Exception during message handling
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp Traceback (most recent call last):
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/amqp.py", line 441, in _process_data
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp **args)
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/dispatcher.py", line 148, in dispatch
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp return getattr(proxyobj, method)(ctxt, **kwargs)
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/cinder/utils.py", line 807, in wrapper
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp raise exception.DriverNotInitialized(driver=driver_name)
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp DriverNotInitialized: Volume driver 'GlusterfsDriver' not initialized.
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp
2013-11-28 20:27:42.585 12269 WARNING cinder.quota [req-9a315f82-53db-4e86-b2c1-9df042ea1ed5 f40889a38c4644fab9d91b73fcfde1bd ddbce226991a45bf97642120207897cb] Deprecated: Default quota for resource: gigabytes is set by the default quota flag: quota_gigabytes, it is now deprecated. Please use the the default quota class for default quota.
2013-11-28 20:28:51.112 12269 WARNING cinder.quota [req-ca0c5611-ec0f-44ea-8137-560c5134309e f40889a38c4644fab9d91b73fcfde1bd ddbce226991a45bf97642120207897cb] Deprecated: Default quota for resource: gigabytes is set by the default quota flag: quota_gigabytes, it is now deprecated. Please use the the default quota class for default quota.
2013-11-28 20:29:54.860 12269 WARNING cinder.quota [req-1614bec8-762d-4979-957d-274b7ca6ec68 f40889a38c4644fab9d91b73fcfde1bd ddbce226991a45bf97642120207897cb] Deprecated: Default quota for resource: gigabytes is set by the default quota flag: quota_gigabytes, it is now deprecated. Please use the the default quota class for default quota.
2013-11-28 20:31:18.986 12269 WARNING cinder.quota [req-59fc9d53-0d33-4304-ab3d-257dd24e8c0c f40889a38c4644fab9d91b73fcfde1bd ddbce226991a45bf97642120207897cb] Deprecated: Default quota for resource: gigabytes is set by the default quota flag: quota_gigabytes, it is now deprecated. Please use the the default quota class for default quota.
2013-11-28 20:32:19.883 12269 WARNING cinder.quota [req-18396ff0-3550-4acc-8e22-bf25feb4fe16 f40889a38c4644fab9d91b73fcfde1bd ddbce226991a45bf97642120207897cb] Deprecated: Default quota for resource: gigabytes is set by the default quota flag: quota_gigabytes, it is now deprecated. Please use the the default quota class for default quota.
2013-11-28 20:32:20.139 12269 ERROR cinder.volume.manager [req-18396ff0-3550-4acc-8e22-bf25feb4fe16 f40889a38c4644fab9d91b73fcfde1bd ddbce226991a45bf97642120207897cb] volume ea526067-a6bd-4e58-9db2-d0e58d166022: Error trying to extend volume
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager Traceback (most recent call last):
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager File "/usr/lib/python2.6/site-packages/cinder/volume/manager.py", line 875, in extend_volume
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager self.driver.extend_volume(volume, new_size)
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager File "/usr/lib/python2.6/site-packages/cinder/volume/drivers/glusterfs.py", line 919, in extend_volume
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager info = self._qemu_img_info(volume_path)
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager File "/usr/lib/python2.6/site-packages/cinder/volume/drivers/glusterfs.py", line 132, in _qemu_img_info
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager info = image_utils.qemu_img_info(path)
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager File "/usr/lib/python2.6/site-packages/cinder/image/image_utils.py", line 191, in qemu_img_info
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager out, err = utils.execute(*cmd, run_as_root=True)
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager File "/usr/lib/python2.6/site-packages/cinder/utils.py", line 142, in execute
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager return processutils.execute(*cmd, **kwargs)
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager File "/usr/lib/python2.6/site-packages/cinder/openstack/common/processutils.py", line 173, in execute
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager cmd=' '.join(cmd))
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager ProcessExecutionError: Unexpected error while running command.
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf env LC_ALL=C LANG=C qemu-img info /var/lib/cinder/mnt/d1c4e0e3273b195c7d311f8303edd391/volume-ea526067-a6bd-4e58-9db2-d0e58d166022
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager Exit code: 1
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager Stdout: ''
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager Stderr: "Could not open '/var/lib/cinder/mnt/d1c4e0e3273b195c7d311f8303edd391/volume-ea526067-a6bd-4e58-9db2-d0e58d166022': No such file or directory\n"
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager
(END)
This seems related to bug #1016224. The volume creation command fails before getting into the volume manager where the status management should happen. *** Bug 1051606 has been marked as a duplicate of this bug. *** Patch backported upstream for stable/havana Moving to high since this is quite frustrating from users and leaves the volumes in inconsistent status. I am still able to reproduce this on openstack-cinder-2013.2.2-1.el6ost.noarch moving back to dev with logs. it seems that it's a case of scheduler not knowing the status of a service (race). Created attachment 865043 [details]
logs
[root@orange-vdsf ~(keystone_admin)]# cinder list +--------------------------------------+-----------+-------------------+------+-------------+----------+-------------+ | ID | Status | Display Name | Size | Volume Type | Bootable | Attached to | +--------------------------------------+-----------+-------------------+------+-------------+----------+-------------+ | 1be7427d-bec7-4afb-8891-34b350b1f613 | extending | dafna3 | 2 | None | false | | | 2b441306-0064-4da9-bf8a-ac19bf4342ba | extending | dafna2 | 20 | None | false | | | 885ab1a8-e9e3-4478-a39a-aa7af9af05dc | extending | extend-dafna | 12 | None | false | | | 9681b5da-975b-4cfe-af5a-a418b19fec81 | available | copy-of-dafna-vol | 12 | None | false | | +--------------------------------------+-----------+-------------------+------+-------------+----------+-------------+ here is the output from cinder list (with volume UUID and state) The title of this bug seems quite misleading to me now. The issue here is that volume operations interrupted when cinder-volume goes down are not rolled back, which leaves the volume status in a not consistent state. However, there are a few other implications here: 1. There's no easy way to know when something has gone wrong with one of the cinder-volumes node after the rpc.cast is sent unless we add a sentinel that watches over volume's status changes. 2. If the cinder-volume goes down *while* extending a volume, we can't simply restore the volume status because we don't actually know what the status is. The volume at this point could be completely broken and there's no way - without manual inspection - to know that. So, we could think about possible solutions for this issue but it would go way beyond this bug and Icehouse. It requires discussions upstream - if it is even considered a real issue. Thing is, if cinder-scheduler sends a cast command to cinder-volume and then cinder-volume goes down *while* executing that command, I think the wrong status in the volume is the least important issue a cloud operator would have. *** Bug 1066955 has been marked as a duplicate of this bug. *** This bug was originally created to track a failure with driver initialization and evolved to ensure volume's status consistency between `cinder-volume` crashes. Based on what I've explained in c#11, I've cloned this bug to track the changes that will need to happen in cinder and I'll close this bug as verified since the original issues with regards to driver initialization where indeed fixed - as show in other bugs related to the same issue. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2014-0213.html |