Description of problem:
When attempting to create multiple VMs from an image via the Horizon UI, only one succeeds, because the same qemu image name is used for both the source and the destination copy. The Cinder volume is backed by NFS.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
The customer uses the Cinder backend in Glance and sees the error caused by this issue when creating multiple volumes from the same image.
Let me add some more information about reproducibility.

---
Version-Release number of selected component (if applicable):
RHOSP13z7
openstack-cinder-volume:13.0-79

How reproducible:
Always

Steps to Reproduce:
1. Create a source Cinder volume
2. Create multiple cloned volumes from the source volume concurrently (see the example below)

Actual results:
Some of the volume creations fail with ERROR status.

Expected results:
All volumes are created without any errors.
---
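For example, a possible CLI reproducer (the size and loop count are arbitrary, and <source-vol-id> stands for the ID returned by the first command):

$ cinder create 20 --name source-vol
$ for i in {1..5}; do cinder create 20 --source-volid <source-vol-id> & done; wait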
The problem is that we have two concurrent processes using the same snapshot name => tmp-snap-c13874e6-fad2-4d6a-a515-51b957f09786

2019-08-15 21:30:33.281 78 INFO cinder.volume.drivers.remotefs [req-a38eb5e0-a52b-458c-a299-d7b59a454995 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] Cloning volume c13874e6-fad2-4d6a-a515-51b957f09786 to volume 22be6bbd-9e5c-4a66-b769-e812137cf491
...
2019-08-15 21:30:33.283 78 DEBUG cinder.volume.drivers.remotefs [req-a38eb5e0-a52b-458c-a299-d7b59a454995 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] Creating offline snapshot tmp-snap-c13874e6-fad2-4d6a-a515-51b957f09786 of volume c13874e6-fad2-4d6a-a515-51b957f09786 _create_snapshot /usr/lib/python2.7/site-packages/cinder/volume/drivers/remotefs.py:1386
...
2019-08-15 21:30:33.285 78 DEBUG oslo_concurrency.processutils [req-a38eb5e0-a52b-458c-a299-d7b59a454995 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] Running cmd (subprocess): /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=8 -- env LC_ALL=C qemu-img info --force-share /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786 execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372
...
2019-08-15 21:30:35.667 78 DEBUG cinder.volume.drivers.nfs [req-a38eb5e0-a52b-458c-a299-d7b59a454995 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] Copying snapshot: tmp-snap-c13874e6-fad2-4d6a-a515-51b957f09786 -> volume: 22be6bbd-9e5c-4a66-b769-e812137cf491, volume_size: 20 GB _copy_volume_from_snapshot /usr/lib/python2.7/site-packages/cinder/volume/drivers/nfs.py:586
Perhaps this would be sufficient:

[dhill@knox cinder]$ git diff
diff --git a/cinder/volume/drivers/remotefs.py b/cinder/volume/drivers/remotefs.py
index 196117827..40bff414a 100644
--- a/cinder/volume/drivers/remotefs.py
+++ b/cinder/volume/drivers/remotefs.py
@@ -941,7 +941,7 @@ class RemoteFSSnapDriverBase(RemoteFSDriver, driver.SnapshotVD):
             volume_size=src_vref.size,
             name='clone-snap-%s' % src_vref.id,
             volume_id=src_vref.id,
-            id='tmp-snap-%s' % src_vref.id,
+            id='tmp-snap-%s' % volume.id,
             volume=src_vref)
         self._create_snapshot(temp_snapshot)

Where instead of using the src_vref.id, we'd be using the destination volume.id in the snapshot name.
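To make the collision concrete, here is a small illustrative sketch (not the driver code itself) using the volume IDs from the log above: deriving the temporary snapshot name from the source volume gives every concurrent clone the same name, while deriving it from the destination volume keeps the names distinct.

# Illustrative only: shows why a source-derived temp snapshot name collides
# for concurrent clones, while a destination-derived name stays unique.
def tmp_snap_name_old(src_id, dst_id):
    return 'tmp-snap-%s' % src_id    # same name for every clone of this source

def tmp_snap_name_new(src_id, dst_id):
    return 'tmp-snap-%s' % dst_id    # unique per destination volume

src = 'c13874e6-fad2-4d6a-a515-51b957f09786'
clones = ['22be6bbd-9e5c-4a66-b769-e812137cf491',
          '319dcfba-8cae-4411-88ec-a8af191c69ef']

assert len({tmp_snap_name_old(src, c) for c in clones}) == 1             # collision
assert len({tmp_snap_name_new(src, c) for c in clones}) == len(clones)   # no collision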
@David, thanks for working on this! As Gandalf might say, the cinder squad will help you bear this burden.
We got an issue with concurrency. The patch works but something else breaks:

2019-08-20 15:20:45.974 49 DEBUG oslo_concurrency.processutils [req-11b89aec-5a0d-4b5b-8d87-e90cb8adfd78 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] Running cmd (subprocess): qemu-img commit -d /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-811fe7bc-c8e7-49ba-84c6-3af4c595118a execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372
2019-08-20 15:20:47.055 49 DEBUG oslo_concurrency.processutils [req-11b89aec-5a0d-4b5b-8d87-e90cb8adfd78 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] CMD "qemu-img commit -d /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-811fe7bc-c8e7-49ba-84c6-3af4c595118a" returned: 1 in 1.081s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409
2019-08-20 15:20:47.056 49 DEBUG oslo_concurrency.processutils [req-11b89aec-5a0d-4b5b-8d87-e90cb8adfd78 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] u'qemu-img commit -d /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-811fe7bc-c8e7-49ba-84c6-3af4c595118a' failed. Not Retrying. execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:457
Command: qemu-img commit -d /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-811fe7bc-c8e7-49ba-84c6-3af4c595118a
2019-08-20 15:20:47.066 49 ERROR cinder.volume.manager Command: qemu-img commit -d /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-811fe7bc-c8e7-49ba-84c6-3af4c595118a
Command: qemu-img commit -d /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-811fe7bc-c8e7-49ba-84c6-3af4c595118a
Command: qemu-img commit -d /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-811fe7bc-c8e7-49ba-84c6-3af4c595118a
2019-08-20 15:20:48.732 49 ERROR oslo_messaging.rpc.server Command: qemu-img commit -d /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-811fe7bc-c8e7-49ba-84c6-3af4c595118a

And until we create that file, we're stuck in that loop and no other volumes can be created... Even restarting cinder-volume doesn't solve this issue. I've got the cinder-volume.log for this failure.
This is as if the Snapshot object weren't initialized for each call and some older data were being used. Why is it copying 811fe7bc-c8e7-49ba-84c6-3af4c595118a to 319dcfba-8cae-4411-88ec-a8af191c69ef?

2019-08-20 15:20:12.639 78 DEBUG oslo_concurrency.processutils [req-ae5ddb1e-6cd1-4732-a3eb-0e9d79ac6e9d 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] Running cmd (subprocess): qemu-img convert -O raw /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-811fe7bc-c8e7-49ba-84c6-3af4c595118a /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-319dcfba-8cae-4411-88ec-a8af191c69ef execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372

tmp-snap-811fe7bc-c8e7-49ba-84c6-3af4c595118a should be copied to 811fe7bc-c8e7-49ba-84c6-3af4c595118a only.
Maybe this is a Nova bug with max_count and block device mapping?
We created 5 VMs and we get this:

2019-08-20 15:19:55.185 49 DEBUG oslo_concurrency.processutils [req-67104a80-91d4-4cd1-b606-b6e86bc8cc84 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] Running cmd (subprocess): qemu-img convert -O raw /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-37839d77-df02-4119-ae33-ea1c55882dbc /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-7acf3974-eeb5-451d-a855-f3ad16264bd1 execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372
2019-08-20 15:19:57.672 78 DEBUG oslo_concurrency.processutils [req-79e3c5cf-b584-4dd4-a8dd-820ca7cc058f 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] Running cmd (subprocess): qemu-img convert -O raw /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-37839d77-df02-4119-ae33-ea1c55882dbc /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-a74cd576-847d-4462-b128-802d0d322a88 execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372
2019-08-20 15:20:00.490 78 DEBUG oslo_concurrency.processutils [req-c258d369-687f-4649-a754-d0c54dd3ee6c 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] Running cmd (subprocess): qemu-img convert -O raw /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-37839d77-df02-4119-ae33-ea1c55882dbc /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-cc3726ff-1a9a-4e02-9e87-1bc928a2f7c1 execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372
2019-08-20 15:20:02.618 49 DEBUG oslo_concurrency.processutils [req-962f0cb1-e306-49e3-a48d-74aea04798d0 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] Running cmd (subprocess): qemu-img convert -O raw /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-7acf3974-eeb5-451d-a855-f3ad16264bd1 /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-30341363-768d-4cf2-ba6e-88acd1352bd2 execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372
2019-08-20 15:20:03.747 49 DEBUG oslo_concurrency.processutils [req-79ea6db6-8192-4454-ba6d-6b436c82c2b2 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] Running cmd (subprocess): qemu-img convert -O raw /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-a74cd576-847d-4462-b128-802d0d322a88 /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-d253b8aa-cd8e-4446-a6e8-6598f33fbc22 execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372
2019-08-20 15:20:05.577 49 DEBUG oslo_concurrency.processutils [req-11b89aec-5a0d-4b5b-8d87-e90cb8adfd78 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] Running cmd (subprocess): qemu-img convert -O raw /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-a74cd576-847d-4462-b128-802d0d322a88 /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-811fe7bc-c8e7-49ba-84c6-3af4c595118a execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372
2019-08-20 15:20:10.881 78 DEBUG oslo_concurrency.processutils [req-96cdf880-045b-4509-9da9-ad36cd20a0a8 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] Running cmd (subprocess): qemu-img convert -O raw /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-d253b8aa-cd8e-4446-a6e8-6598f33fbc22 /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-57adf8e7-848e-48ac-b445-2af6f4ab15d4 execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372
2019-08-20 15:20:12.639 78 DEBUG oslo_concurrency.processutils [req-ae5ddb1e-6cd1-4732-a3eb-0e9d79ac6e9d 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] Running cmd (subprocess): qemu-img convert -O raw /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-811fe7bc-c8e7-49ba-84c6-3af4c595118a /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-319dcfba-8cae-4411-88ec-a8af191c69ef execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372
2019-08-20 15:20:35.014 78 DEBUG oslo_concurrency.processutils [req-79e3c5cf-b584-4dd4-a8dd-820ca7cc058f 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] CMD "qemu-img convert -O raw /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-37839d77-df02-4119-ae33-ea1c55882dbc /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-a74cd576-847d-4462-b128-802d0d322a88" returned: 0 in 37.343s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409
2019-08-20 15:20:35.141 49 DEBUG oslo_concurrency.processutils [req-11b89aec-5a0d-4b5b-8d87-e90cb8adfd78 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] CMD "qemu-img convert -O raw /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-a74cd576-847d-4462-b128-802d0d322a88 /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-811fe7bc-c8e7-49ba-84c6-3af4c595118a" returned: 0 in 29.564s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409
2019-08-20 15:20:35.142 49 DEBUG oslo_concurrency.processutils [req-67104a80-91d4-4cd1-b606-b6e86bc8cc84 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] CMD "qemu-img convert -O raw /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-37839d77-df02-4119-ae33-ea1c55882dbc /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-7acf3974-eeb5-451d-a855-f3ad16264bd1" returned: 0 in 39.957s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409
2019-08-20 15:20:35.143 49 DEBUG oslo_concurrency.processutils [req-79ea6db6-8192-4454-ba6d-6b436c82c2b2 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] CMD "qemu-img convert -O raw /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-a74cd576-847d-4462-b128-802d0d322a88 /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-d253b8aa-cd8e-4446-a6e8-6598f33fbc22" returned: 0 in 31.396s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409
2019-08-20 15:20:35.143 49 DEBUG oslo_concurrency.processutils [req-962f0cb1-e306-49e3-a48d-74aea04798d0 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] CMD "qemu-img convert -O raw /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-7acf3974-eeb5-451d-a855-f3ad16264bd1 /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-30341363-768d-4cf2-ba6e-88acd1352bd2" returned: 0 in 32.525s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409
2019-08-20 15:20:35.554 78 DEBUG oslo_concurrency.processutils [req-c258d369-687f-4649-a754-d0c54dd3ee6c 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] CMD "qemu-img convert -O raw /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-37839d77-df02-4119-ae33-ea1c55882dbc /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-cc3726ff-1a9a-4e02-9e87-1bc928a2f7c1" returned: 0 in 35.064s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409
2019-08-20 15:20:35.556 78 DEBUG oslo_concurrency.processutils [req-96cdf880-045b-4509-9da9-ad36cd20a0a8 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] CMD "qemu-img convert -O raw /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-d253b8aa-cd8e-4446-a6e8-6598f33fbc22 /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-57adf8e7-848e-48ac-b445-2af6f4ab15d4" returned: 0 in 24.675s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409
2019-08-20 15:20:35.557 78 DEBUG oslo_concurrency.processutils [req-ae5ddb1e-6cd1-4732-a3eb-0e9d79ac6e9d 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] CMD "qemu-img convert -O raw /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-811fe7bc-c8e7-49ba-84c6-3af4c595118a /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-319dcfba-8cae-4411-88ec-a8af191c69ef" returned: 0 in 22.918s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409
Reassigning to Eric, who is our NFS driver expert.
Customer hit this new issue after applying the hotfix:

<</usr/lib/python2.7/site-packages/cinder/volume/drivers/remotefs.py>>
1662     def create_cloned_volume(self, volume, src_vref):
1663         """Creates a clone of the specified volume."""
1664
1665         return self._create_cloned_volume(volume, src_vref,
1666                                           src_vref.obj_context)
1667
1668     @coordination.synchronized('{self.driver_prefix}-{volume.id}')

--- Cinder-Volume Error ---
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server [req-fce84ebf-1a56-4739-bc1b-314e8339c53c 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] Exception during message handling: AttributeError: 'Volume' object has no attribute 'obj_context'
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 163, in _process_incoming
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 220, in dispatch
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 190, in _do_dispatch
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "<string>", line 2, in create_volume
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/objects/cleanable.py", line 207, in wrapper
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server     result = f(*args, **kwargs)
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/volume/manager.py", line 690, in create_volume
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server     _run_flow()
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/volume/manager.py", line 682, in _run_flow
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server     flow_engine.run()
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/taskflow/engines/action_engine/engine.py", line 247, in run
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server     for _state in self.run_iter(timeout=timeout):
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/taskflow/engines/action_engine/engine.py", line 340, in run_iter
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server     failure.Failure.reraise_if_any(er_failures)
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/taskflow/types/failure.py", line 336, in reraise_if_any
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server     failures[0].reraise()
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/taskflow/types/failure.py", line 343, in reraise
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server     six.reraise(*self._exc_info)
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server     result = task.execute(**arguments)
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py", line 1028, in execute
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server     **volume_spec)
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py", line 912, in _create_from_image
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server     image_meta)
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py", line 642, in _clone_image_volume
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server     ret = self.driver.create_cloned_volume(volume, image_volume)
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "<string>", line 2, in create_cloned_volume
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/coordination.py", line 151, in _synchronized
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server     return f(*a, **k)
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "<string>", line 2, in create_cloned_volume
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/coordination.py", line 151, in _synchronized
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server     return f(*a, **k)
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/remotefs.py", line 1666, in create_cloned_volume
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server     src_vref.obj_context)
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server AttributeError: 'Volume' object has no attribute 'obj_context'
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server
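For illustration only (this is not the shipped hotfix): a defensive way to obtain the request context from the Volume object, assuming that, as in oslo.versionedobjects, the object keeps its context in _context and obj_context is only an alias that is missing on some versions. The helper name below is hypothetical.

# Hypothetical helper, not part of the cinder tree: tolerate Volume objects
# that do not expose the obj_context alias.
def _get_obj_context(vol_obj, fallback_ctxt=None):
    ctxt = getattr(vol_obj, 'obj_context', None)
    if ctxt is None:
        # Assumption: the versioned-object base class stores the context
        # in the _context attribute.
        ctxt = getattr(vol_obj, '_context', None)
    return ctxt or fallback_ctxt

# create_cloned_volume could then pass:
#   self._create_cloned_volume(volume, src_vref, _get_obj_context(src_vref))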
I suspect we're hitting a bug introduced by 8d7ecfb0c9f769923e20452b8027eb117bf2be52 ...
Created attachment 1687660 [details]
Cinder logs

Rajat,

First of all, just to be sure of the verification steps:
1. Glance using Cinder, Cinder backed by NFS.
2. Create 5 (or more) volumes simultaneously from the same image.
Or should we boot 5 VMs from an image, creating a volume for each?

One more thing bothers me while setting this up: Glance using Cinder is configured and working, but Cinder still uses the default LVM; I haven't switched to NFS yet.

Out of curiosity, I figured let's have a go at things. Issued this command:

(overcloud) [stack@undercloud-0 ~]$ for i in {1..9}; do cinder create 1 --image cirros ; done

And got:

(overcloud) [stack@undercloud-0 ~]$ cinder list
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
| ID                                   | Status    | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
| 17552814-94de-4d68-840a-372df955df9b | error     | -    | 1    | tripleo     | false    |             |
| 2e6097d9-e07a-4c9b-b5fc-56a10aed83d7 | error     | -    | 1    | tripleo     | false    |             |
| 330e215c-8892-428e-8ebb-6b2a1ce507f1 | error     | -    | 1    | tripleo     | false    |             |
| 4d4347cb-cba6-48f3-8cf4-d82d0df72d69 | available | -    | 1    | tripleo     | true     |             |
| a5b728e4-064c-45bc-85fe-e26f2a057adf | available | -    | 1    | tripleo     | true     |             |
| a63d81df-60d9-4449-ae45-be287b576ebb | available | -    | 1    | tripleo     | true     |             |
| c0cfdcf7-9df3-49f9-bd83-3b3f4b0a19a4 | error     | -    | 1    | tripleo     | false    |             |
| d8d4c0bc-f382-4c96-8364-29748410ef50 | error     | -    | 1    | tripleo     | false    |             |
| fedac94e-7811-4d86-9f63-d6c389166717 | error     | -    | 1    | tripleo     | false    |             |
+--------------------------------------+-----------+------+------+-------------+----------+-------------+

This doesn't look good even before we switch Cinder to NFS. Judging from the c-vol log, oslo is probably overloaded. Out of 9 volumes only 3 are available; this was tested twice with the same results. The attached logs only show the last attempt.

Should we add more resources to the controller and retry?
Rajat,

See above; I couldn't set needinfo when attaching the logs.
Hi Tzach,

I haven't made any changes to the LVM code in glance store, and AFAIR it was mentioned that this feature was tested with an LVM + iSCSI configuration before. Also, when I looked into the logs, the errors I see are from the Glance side. Can you attach the Glance logs as well?

Regarding the testing part:
1) The main operation should be to create multiple images simultaneously; that would ensure the volumes are created and the images are properly written onto them (a possible command loop is sketched after this comment).
2) Next I would spawn multiple VMs from the image to check that we're able to access and read the image inside the volume.

Creating a bootable volume from an image seems like a valid use case. But I don't think cloning the volume, or using the volume directly to spawn a VM, is a good idea, because the volume only stores the image; it doesn't mean the volume has become bootable.

Thanks
Rajat Dhasmana
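A possible way to exercise (1), assuming a local cirros image file and the image service configured with the Cinder store (the file name and loop count are placeholders):

$ for i in {1..5}; do
>   openstack image create --disk-format qcow2 --container-format bare \
>     --file cirros-0.4.0-x86_64-disk.img test-img-$i &
> done; wait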
Rajat (thanks) pointed out that we have a Glance dependency: https://bugzilla.redhat.com/show_bug.cgi?id=1807123 (MODIFIED). The "Depends on" field was hidden, so I didn't notice it before. Waiting for Glance to reach ON_QA to verify both.
We need a new hotfix for the customer.
@Rajat: if the second option ("enabling cinder image cache") is "perfect for the customer usecase and works as expected", is there any way for us to ship that to them in z12?
Hi Cyril,

I'm trying to get the backports merged as fast as possible, and as soon as the patch merges in Queens I will create a quick build of it, but there's no ETA for the upstream merge. It will also require QE testing (which should be pretty straightforward), but I'm unable to comment on that as I'm not aware of the current z12 schedule.
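For reference, a minimal cinder.conf sketch of what enabling the image-volume cache would look like (the option names are the standard upstream ones; the backend section name, UUIDs, and limits below are placeholders to adapt):

[DEFAULT]
# Internal tenant that owns the cached image-volumes (placeholders):
cinder_internal_tenant_project_id = <project-uuid-for-cached-image-volumes>
cinder_internal_tenant_user_id = <user-uuid-for-cached-image-volumes>

[tripleo_nfs]
image_volume_cache_enabled = True
# Optional caps on the cache (0 means unlimited):
image_volume_cache_max_size_gb = 200
image_volume_cache_max_count = 50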
Another upstream patch is required; it's merged on master and is currently being backported.
Let me ask the customer... it looks like the case was closed.
Verified on:
openstack-cinder-12.0.10-19.el7ost.noarch
python2-glance-store-0.23.1-0.20190916165252.cc7ecc1.el7ost.noarch

On a system with OSP13, Glance uses Cinder as its backend and Cinder uses NetApp NFS as its backend. image_volume_cache is enabled in cinder.conf.

First test: simultaneously create 10 volumes from the same image:
#for i in {1..10}; do cinder create 1 --image cirros ; done

The resulting 10 volumes are all in available state, as expected:

(overcloud) [stack@undercloud-0 ~]$ cinder list
+--------------------------------------+-----------+--------------------------------------------+------+-------------+----------+-------------+
| ID                                   | Status    | Name                                       | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+--------------------------------------------+------+-------------+----------+-------------+
| 11a62820-6745-4876-ab32-71363e42e353 | available | -                                          | 1    | tripleo     | true     |             |
| 2a46f3c7-fa9f-4b72-9dee-4bd4dea7a631 | available | -                                          | 1    | tripleo     | true     |             |
| 3c5b8cfb-75d5-4c09-98b6-5ecfed4c1972 | available | -                                          | 1    | tripleo     | true     |             |
| 4e68f832-bce2-43e9-a547-8d29242e1512 | available | -                                          | 1    | tripleo     | true     |             |
| 5eff40e0-5eda-407b-b084-63604d6c521d | available | -                                          | 1    | tripleo     | true     |             |
| 6c58d022-0ee6-4f64-9bcd-d744612c6d78 | available | -                                          | 1    | tripleo     | true     |             |
| 7eb389b6-ff05-4c83-a6f3-c3f7cebc5274 | available | -                                          | 1    | tripleo     | true     |             |
| aebf32e4-0855-494e-9ae9-fd9aa054d6ff | available | -                                          | 1    | tripleo     | true     |             |
| c7714e66-5a91-45da-a531-d4097f13081b | available | -                                          | 1    | tripleo     | true     |             |
| ccd6fd5c-4738-4363-9364-a680531abadf | available | -                                          | 1    | tripleo     | true     |             |
| edfb3cdb-c608-4b0f-a6ac-0a964772b16f | available | image-1e30b583-a24d-45ea-a9b8-a6f7699f605a | 1    | tripleo     | false    |             |
+--------------------------------------+-----------+--------------------------------------------+------+-------------+----------+-------------+

Worked as expected, all volumes are available.

Delete all volumes:
#for i in $(cinder list | grep true | awk '{print $2}'); do cinder delete $i; done

Now let's simulate the original issue by simultaneously booting up 3 instances. We use boot-from-image-and-create-volume, so multiple volumes are created from the same image.
#nova boot --flavor tiny --block-device source=image,id=1e30b583-a24d-45ea-a9b8-a6f7699f605a,dest=volume,size=1,shutdown=remove,bootindex=0 ints1 --nic net-id=af6fedd3-2926-4aaf-bcb2-8b1296bb02db --min-count 3

After a while:

(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+---------+--------+----------------------+-------------+-----------------------+
| ID                                   | Name    | Status | Task State           | Power State | Networks              |
+--------------------------------------+---------+--------+----------------------+-------------+-----------------------+
| 16547b6e-fecd-4e88-b9b3-506b6b79cb13 | ints1-1 | BUILD  | block_device_mapping | NOSTATE     | internal=192.168.0.14 |
| 8641cb2b-0d48-4eec-914c-c0f1a54b4e2f | ints1-2 | BUILD  | block_device_mapping | NOSTATE     | internal=192.168.0.19 |
| e5f0e8a4-3c94-450d-9059-56d493927e62 | ints1-3 | BUILD  | block_device_mapping | NOSTATE     | internal=192.168.0.25 |
+--------------------------------------+---------+--------+----------------------+-------------+-----------------------+

After a few more seconds:

(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+---------+--------+------------+-------------+-----------------------+
| ID                                   | Name    | Status | Task State | Power State | Networks              |
+--------------------------------------+---------+--------+------------+-------------+-----------------------+
| 16547b6e-fecd-4e88-b9b3-506b6b79cb13 | ints1-1 | ACTIVE | -          | Running     | internal=192.168.0.14 |
| 8641cb2b-0d48-4eec-914c-c0f1a54b4e2f | ints1-2 | ACTIVE | -          | Running     | internal=192.168.0.19 |
| e5f0e8a4-3c94-450d-9059-56d493927e62 | ints1-3 | ACTIVE | -          | Running     | internal=192.168.0.25 |
+--------------------------------------+---------+--------+------------+-------------+-----------------------+

(overcloud) [stack@undercloud-0 ~]$ cinder list
+--------------------------------------+-----------+--------------------------------------------+------+-------------+----------+--------------------------------------+
| ID                                   | Status    | Name                                       | Size | Volume Type | Bootable | Attached to                          |
+--------------------------------------+-----------+--------------------------------------------+------+-------------+----------+--------------------------------------+
| 7358f758-c242-44ac-8dd8-13a304c2b016 | in-use    |                                            | 1    | tripleo     | true     | 8641cb2b-0d48-4eec-914c-c0f1a54b4e2f |
| dc943647-c9c0-436d-b15a-ddceec0bd9f6 | in-use    |                                            | 1    | tripleo     | true     | 16547b6e-fecd-4e88-b9b3-506b6b79cb13 |
| edfb3cdb-c608-4b0f-a6ac-0a964772b16f | available | image-1e30b583-a24d-45ea-a9b8-a6f7699f605a | 1    | tripleo     | false    |                                      |
| fb96d987-69ae-4031-97de-6827f261c117 | in-use    |                                            | 1    | tripleo     | true     | e5f0e8a4-3c94-450d-9059-56d493927e62 |
+--------------------------------------+-----------+--------------------------------------------+------+-------------+----------+--------------------------------------+

We were able to boot up 3 instances from the same image and create 3 boot volumes. What failed before is now working, albeit at a small scale. Due to limited compute resources I couldn't bump up the instance count considerably.
By reducing the Nova flavor's RAM, I was able to boot up 6 instances/boot volumes simultaneously:

(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+---------+--------+------------+-------------+-----------------------+
| ID                                   | Name    | Status | Task State | Power State | Networks              |
+--------------------------------------+---------+--------+------------+-------------+-----------------------+
| 25fde923-c9f4-4e23-b96a-99f61d95dae3 | ints1-1 | ACTIVE | -          | Running     | internal=192.168.0.18 |
| 0069eb6d-98b2-4502-a0f5-7cefc9417db5 | ints1-2 | ACTIVE | -          | Running     | internal=192.168.0.21 |
| cfcb602c-70ac-49c6-9fbc-d112c9e3971d | ints1-3 | ACTIVE | -          | Running     | internal=192.168.0.26 |
| 9d199d28-bdf6-4b83-8be3-2cfaaacd722d | ints1-4 | ACTIVE | -          | Running     | internal=192.168.0.16 |
| adab78d8-58bd-4405-ae7c-a8ab2521b96f | ints1-5 | ACTIVE | -          | Running     | internal=192.168.0.14 |
| 9990979c-0fba-4297-aa44-98f138fdc682 | ints1-6 | ACTIVE | -          | Running     | internal=192.168.0.27 |
+--------------------------------------+---------+--------+------------+-------------+-----------------------+

(overcloud) [stack@undercloud-0 ~]$ cinder list
+--------------------------------------+-----------+--------------------------------------------+------+-------------+----------+--------------------------------------+
| ID                                   | Status    | Name                                       | Size | Volume Type | Bootable | Attached to                          |
+--------------------------------------+-----------+--------------------------------------------+------+-------------+----------+--------------------------------------+
| 24c4ba7f-0511-4c2b-b614-ee4a2f867abc | in-use    |                                            | 1    | tripleo     | true     | 25fde923-c9f4-4e23-b96a-99f61d95dae3 |
| 3eb59a2e-cb81-4f11-bdcb-c8a4ae994689 | in-use    |                                            | 1    | tripleo     | true     | 0069eb6d-98b2-4502-a0f5-7cefc9417db5 |
| 40ca43d9-ee31-4f98-b718-3f8dadec06e0 | in-use    |                                            | 1    | tripleo     | true     | cfcb602c-70ac-49c6-9fbc-d112c9e3971d |
| 8ac016bb-68d6-4812-b172-6e53fe457923 | in-use    |                                            | 1    | tripleo     | true     | adab78d8-58bd-4405-ae7c-a8ab2521b96f |
| c14603be-9985-42f5-b901-2e8074679c7d | in-use    |                                            | 1    | tripleo     | true     | 9d199d28-bdf6-4b83-8be3-2cfaaacd722d |
| edfb3cdb-c608-4b0f-a6ac-0a964772b16f | available | image-1e30b583-a24d-45ea-a9b8-a6f7699f605a | 1    | tripleo     | false    |                                      |
| f1c29bf5-30d1-42e6-808a-306fc5b168a0 | in-use    |                                            | 1    | tripleo     | true     | 9990979c-0fba-4297-aa44-98f138fdc682 |
+--------------------------------------+-----------+--------------------------------------------+------+-------------+----------+--------------------------------------+

I also tested another scenario: simultaneously creating a Nova snapshot of all the instances.
$ for i in $(nova list | grep Running | awk '{print $2}'); do nova image-create $i snap-$i;done

$ glance image-list
+--------------------------------------+-------------------------------------------+
| ID                                   | Name                                      |
+--------------------------------------+-------------------------------------------+
| 1e30b583-a24d-45ea-a9b8-a6f7699f605a | cirros                                    |
| 8810456d-4876-4bd2-aba7-0dd5eaf7e98e | snap-0069eb6d-98b2-4502-a0f5-7cefc9417db5 |
| b141d4d8-190b-4dac-be66-94447921aaf5 | snap-25fde923-c9f4-4e23-b96a-99f61d95dae3 |
| f08c0411-90fd-4274-9b69-861cb32644b0 | snap-9990979c-0fba-4297-aa44-98f138fdc682 |
| 0e0b6369-8c25-4fdd-83c4-b72df9d756ce | snap-9d199d28-bdf6-4b83-8be3-2cfaaacd722d |
| 5e3db635-5724-47f1-912c-ba5e394d73dc | snap-adab78d8-58bd-4405-ae7c-a8ab2521b96f |
| 898160bc-6bf8-49c6-b83d-0c2f6426a2e9 | snap-cfcb602c-70ac-49c6-9fbc-d112c9e3971d |
+--------------------------------------+-------------------------------------------+

$ cinder snapshot-list
+--------------------------------------+--------------------------------------+-----------+--------------------------------------------------------+------+
| ID                                   | Volume ID                            | Status    | Name                                                   | Size |
+--------------------------------------+--------------------------------------+-----------+--------------------------------------------------------+------+
| 0ba24d77-56d4-44ad-aba0-af2cc7525636 | 40ca43d9-ee31-4f98-b718-3f8dadec06e0 | available | snapshot for snap-cfcb602c-70ac-49c6-9fbc-d112c9e3971d | 1    |
| 37afbe74-188b-4dcd-8843-13b5b9179627 | 8ac016bb-68d6-4812-b172-6e53fe457923 | available | snapshot for snap-adab78d8-58bd-4405-ae7c-a8ab2521b96f | 1    |
| 8ba35672-fb7a-4dce-a1bb-e30c2b5d886d | c14603be-9985-42f5-b901-2e8074679c7d | available | snapshot for snap-9d199d28-bdf6-4b83-8be3-2cfaaacd722d | 1    |
| 8dacc788-ee48-4f98-9e6d-e147d65b03df | f1c29bf5-30d1-42e6-808a-306fc5b168a0 | available | snapshot for snap-9990979c-0fba-4297-aa44-98f138fdc682 | 1    |
| 939ba84a-5e7a-4687-a44a-e7b795fc563f | 24c4ba7f-0511-4c2b-b614-ee4a2f867abc | available | snapshot for snap-25fde923-c9f4-4e23-b96a-99f61d95dae3 | 1    |
| e48b6edb-31e0-4ef8-9746-b6a881c33747 | 3eb59a2e-cb81-4f11-bdcb-c8a4ae994689 | available | snapshot for snap-0069eb6d-98b2-4502-a0f5-7cefc9417db5 | 1    |
+--------------------------------------+--------------------------------------+-----------+--------------------------------------------------------+------+

This too works as expected.

Confirmed: simultaneously booting multiple instances from an image, creating persistent boot volumes, where Glance is backed by Cinder and Cinder is backed by a NetApp NFS backend, works as expected.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: openstack-cinder security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:4391