Bug 1741730

Summary: Concurrent cloning of the same volume fails with cinder NFS driver when using cinder backed glance images
Product: Red Hat OpenStack          Reporter: chrisbro <chrisbro>
Component: openstack-cinder         Assignee: Rajat Dhasmana <rdhasman>
Status: CLOSED ERRATA               QA Contact: Tzach Shefi <tshefi>
Severity: high                      Docs Contact: Chuck Copello <ccopello>
Priority: high
Version: 13.0 (Queens)              CC: abishop, apevec, cinder-bugs, cyril, dhill, dmaley, eharney, ekuvaja, gfidente, jobernar, jschluet, jvisser, lhh, ltoscano, mgarciac, pgrist, rdhasman, rlondhe, senrique, ssigwald, tkajinam
Target Milestone: z13               Keywords: Triaged, ZStream
Target Release: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-cinder-12.0.10-19.el7ost
Last Closed: 2020-10-28 18:22:28 UTC Type: Bug
Bug Depends On: 1807123, 1843088, 1844532    
Attachments: Cinder logs (attachment 1687660, no flags; see comment 47)

Description chrisbro@redhat.com 2019-08-16 01:22:35 UTC
Description of problem:

When attempting to create multiple VMs from the same image via the Horizon UI, only one will succeed, because the same qemu image name is used for both the source and the destination copy. The Cinder volume is backed by NFS.

Additional info:

The customer uses the Cinder backend in Glance, and sees the error caused by this issue when creating multiple volumes from the same image.
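
For reference, a minimal sketch of what a Cinder-backed Glance looks like in glance-api.conf (illustrative only; the store list and layout here are assumptions, and real deployments typically set this via deployment templates rather than by hand):

[glance_store]
# Enable the Cinder store and make it the default, so image data is written
# into volumes owned by the Glance service instead of a Swift or file store.
stores = cinder,http
default_store = cinder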

Comment 2 Takashi Kajinami 2019-08-16 01:40:25 UTC
Let me add some more information about reproducibility.

---
Version-Release number of selected component (if applicable):
RHOSP13z7
openstack-cinder-volume:13.0-79

How reproducible:
Always

Steps to Reproduce:
1. Create a source cinder volume
2. Create multiple cloned volumes from the source volume

Actual results:
Some of the volume creations fail with ERROR status

Expected results:
All volumes are created without any errors.
---

Comment 4 David Hill 2019-08-19 20:06:01 UTC
The problem is that we have two concurrent processes using the same snapshot name => tmp-snap-c13874e6-fad2-4d6a-a515-51b957f09786

2019-08-15 21:30:33.281 78 INFO cinder.volume.drivers.remotefs [req-a38eb5e0-a52b-458c-a299-d7b59a454995 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] Cloning volume c13874e6-fad2-4d6a-a515-51b957f09786 to volume 22be6bbd-9e5c-4a66-b769-e812137cf491
...
2019-08-15 21:30:33.283 78 DEBUG cinder.volume.drivers.remotefs [req-a38eb5e0-a52b-458c-a299-d7b59a454995 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] Creating offline snapshot tmp-snap-c13874e6-fad2-4d6a-a515-51b957f09786 of volume c13874e6-fad2-4d6a-a515-51b957f09786 _create_snapshot /usr/lib/python2.7/site-packages/cinder/volume/drivers/remotefs.py:1386
...
2019-08-15 21:30:33.285 78 DEBUG oslo_concurrency.processutils [req-a38eb5e0-a52b-458c-a299-d7b59a454995 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] Running cmd (subprocess): /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=8 -- env LC_ALL=C qemu-img info --force-share /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786 execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372

...
2019-08-15 21:30:35.667 78 DEBUG cinder.volume.drivers.nfs [req-a38eb5e0-a52b-458c-a299-d7b59a454995 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] Copying snapshot: tmp-snap-c13874e6-fad2-4d6a-a515-51b957f09786 -> volume: 22be6bbd-9e5c-4a66-b769-e812137cf491, volume_size: 20 GB _copy_volume_from_snapshot /usr/lib/python2.7/site-packages/cinder/volume/drivers/nfs.py:586

Comment 5 David Hill 2019-08-19 20:19:13 UTC
Perhaps this would be sufficient:

[dhill@knox cinder]$ git diff
diff --git a/cinder/volume/drivers/remotefs.py b/cinder/volume/drivers/remotefs.py
index 196117827..40bff414a 100644
--- a/cinder/volume/drivers/remotefs.py
+++ b/cinder/volume/drivers/remotefs.py
@@ -941,7 +941,7 @@ class RemoteFSSnapDriverBase(RemoteFSDriver, driver.SnapshotVD):
                                  volume_size=src_vref.size,
                                  name='clone-snap-%s' % src_vref.id,
                                  volume_id=src_vref.id,
-                                 id='tmp-snap-%s' % src_vref.id,
+                                 id='tmp-snap-%s' % volume.id,
                                  volume=src_vref)
 
         self._create_snapshot(temp_snapshot)


That is, instead of using src_vref.id, we'd use the destination volume.id in the snapshot name.
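
A minimal sketch (not driver code, just an illustration using UUIDs from the logs in this bug) of why the original naming collides across concurrent clones while the proposed naming does not:

# Illustration only: with the pre-patch scheme, every clone of the same source
# volume derives an identical temporary snapshot name, so concurrent clones
# end up operating on the same tmp-snap qcow2 file.
src_id = 'c13874e6-fad2-4d6a-a515-51b957f09786'   # source (image) volume
dest_a = '22be6bbd-9e5c-4a66-b769-e812137cf491'   # first clone destination
dest_b = '319dcfba-8cae-4411-88ec-a8af191c69ef'   # second clone destination

def old_name(dest_id):
    return 'tmp-snap-%s' % src_id                 # pre-patch: keyed on the source id

def new_name(dest_id):
    return 'tmp-snap-%s' % dest_id                # proposed: keyed on the destination id

assert old_name(dest_a) == old_name(dest_b)       # collision between concurrent clones
assert new_name(dest_a) != new_name(dest_b)       # unique name per destination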

Comment 6 Alan Bishop 2019-08-20 13:30:38 UTC
@David, thanks for working on this! As Gandalf might say, the cinder squad will help you bear this burden.

Comment 7 David Hill 2019-08-20 18:56:22 UTC
We've got an issue with concurrency. The patch works, but something else breaks:

2019-08-20 15:20:45.974 49 DEBUG oslo_concurrency.processutils [req-11b89aec-5a0d-4b5b-8d87-e90cb8adfd78 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] Running cmd (subprocess): qemu-img commit -d /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-811fe7bc-c8e7-49ba-84c6-3af4c595118a execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372
2019-08-20 15:20:47.055 49 DEBUG oslo_concurrency.processutils [req-11b89aec-5a0d-4b5b-8d87-e90cb8adfd78 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] CMD "qemu-img commit -d /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-811fe7bc-c8e7-49ba-84c6-3af4c595118a" returned: 1 in 1.081s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409
2019-08-20 15:20:47.056 49 DEBUG oslo_concurrency.processutils [req-11b89aec-5a0d-4b5b-8d87-e90cb8adfd78 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] u'qemu-img commit -d /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-811fe7bc-c8e7-49ba-84c6-3af4c595118a' failed. Not Retrying. execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:457
Command: qemu-img commit -d /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-811fe7bc-c8e7-49ba-84c6-3af4c595118a
2019-08-20 15:20:47.066 49 ERROR cinder.volume.manager Command: qemu-img commit -d /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-811fe7bc-c8e7-49ba-84c6-3af4c595118a
Command: qemu-img commit -d /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-811fe7bc-c8e7-49ba-84c6-3af4c595118a
Command: qemu-img commit -d /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-811fe7bc-c8e7-49ba-84c6-3af4c595118a
2019-08-20 15:20:48.732 49 ERROR oslo_messaging.rpc.server Command: qemu-img commit -d /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-811fe7bc-c8e7-49ba-84c6-3af4c595118a


And until we create that file, we're stuck in that loop and no other volumes can be created... Even restarting cinder-volume doesn't solve this issue. I've got the cinder-volume.log for this failure.

Comment 8 David Hill 2019-08-20 19:03:40 UTC
This is as if the Snapshot object weren't initialized for each call and some older data were being reused.

Why is it copying 811fe7bc-c8e7-49ba-84c6-3af4c595118a to 319dcfba-8cae-4411-88ec-a8af191c69ef?
2019-08-20 15:20:12.639 78 DEBUG oslo_concurrency.processutils [req-ae5ddb1e-6cd1-4732-a3eb-0e9d79ac6e9d 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] Running cmd (subprocess): qemu-img convert -O raw /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-811fe7bc-c8e7-49ba-84c6-3af4c595118a /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-319dcfba-8cae-4411-88ec-a8af191c69ef execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372

tmp-snap-811fe7bc-c8e7-49ba-84c6-3af4c595118a should be copied to 811fe7bc-c8e7-49ba-84c6-3af4c595118a only.

Comment 9 David Hill 2019-08-20 19:06:47 UTC
Maybe this is a nova bug with max_count and block device mapping?

Comment 10 David Hill 2019-08-20 19:19:24 UTC
We created 5 VMs and got this:

2019-08-20 15:19:55.185 49 DEBUG oslo_concurrency.processutils [req-67104a80-91d4-4cd1-b606-b6e86bc8cc84 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] Running cmd (subprocess): qemu-img convert -O raw /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-37839d77-df02-4119-ae33-ea1c55882dbc /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-7acf3974-eeb5-451d-a855-f3ad16264bd1 execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372
2019-08-20 15:19:57.672 78 DEBUG oslo_concurrency.processutils [req-79e3c5cf-b584-4dd4-a8dd-820ca7cc058f 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] Running cmd (subprocess): qemu-img convert -O raw /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-37839d77-df02-4119-ae33-ea1c55882dbc /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-a74cd576-847d-4462-b128-802d0d322a88 execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372
2019-08-20 15:20:00.490 78 DEBUG oslo_concurrency.processutils [req-c258d369-687f-4649-a754-d0c54dd3ee6c 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] Running cmd (subprocess): qemu-img convert -O raw /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-37839d77-df02-4119-ae33-ea1c55882dbc /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-cc3726ff-1a9a-4e02-9e87-1bc928a2f7c1 execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372
2019-08-20 15:20:02.618 49 DEBUG oslo_concurrency.processutils [req-962f0cb1-e306-49e3-a48d-74aea04798d0 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] Running cmd (subprocess): qemu-img convert -O raw /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-7acf3974-eeb5-451d-a855-f3ad16264bd1 /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-30341363-768d-4cf2-ba6e-88acd1352bd2 execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372
2019-08-20 15:20:03.747 49 DEBUG oslo_concurrency.processutils [req-79ea6db6-8192-4454-ba6d-6b436c82c2b2 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] Running cmd (subprocess): qemu-img convert -O raw /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-a74cd576-847d-4462-b128-802d0d322a88 /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-d253b8aa-cd8e-4446-a6e8-6598f33fbc22 execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372
2019-08-20 15:20:05.577 49 DEBUG oslo_concurrency.processutils [req-11b89aec-5a0d-4b5b-8d87-e90cb8adfd78 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] Running cmd (subprocess): qemu-img convert -O raw /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-a74cd576-847d-4462-b128-802d0d322a88 /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-811fe7bc-c8e7-49ba-84c6-3af4c595118a execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372
2019-08-20 15:20:10.881 78 DEBUG oslo_concurrency.processutils [req-96cdf880-045b-4509-9da9-ad36cd20a0a8 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] Running cmd (subprocess): qemu-img convert -O raw /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-d253b8aa-cd8e-4446-a6e8-6598f33fbc22 /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-57adf8e7-848e-48ac-b445-2af6f4ab15d4 execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372
2019-08-20 15:20:12.639 78 DEBUG oslo_concurrency.processutils [req-ae5ddb1e-6cd1-4732-a3eb-0e9d79ac6e9d 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] Running cmd (subprocess): qemu-img convert -O raw /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-811fe7bc-c8e7-49ba-84c6-3af4c595118a /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-319dcfba-8cae-4411-88ec-a8af191c69ef execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372
2019-08-20 15:20:35.014 78 DEBUG oslo_concurrency.processutils [req-79e3c5cf-b584-4dd4-a8dd-820ca7cc058f 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] CMD "qemu-img convert -O raw /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-37839d77-df02-4119-ae33-ea1c55882dbc /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-a74cd576-847d-4462-b128-802d0d322a88" returned: 0 in 37.343s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409
2019-08-20 15:20:35.141 49 DEBUG oslo_concurrency.processutils [req-11b89aec-5a0d-4b5b-8d87-e90cb8adfd78 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] CMD "qemu-img convert -O raw /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-a74cd576-847d-4462-b128-802d0d322a88 /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-811fe7bc-c8e7-49ba-84c6-3af4c595118a" returned: 0 in 29.564s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409
2019-08-20 15:20:35.142 49 DEBUG oslo_concurrency.processutils [req-67104a80-91d4-4cd1-b606-b6e86bc8cc84 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] CMD "qemu-img convert -O raw /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-37839d77-df02-4119-ae33-ea1c55882dbc /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-7acf3974-eeb5-451d-a855-f3ad16264bd1" returned: 0 in 39.957s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409
2019-08-20 15:20:35.143 49 DEBUG oslo_concurrency.processutils [req-79ea6db6-8192-4454-ba6d-6b436c82c2b2 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] CMD "qemu-img convert -O raw /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-a74cd576-847d-4462-b128-802d0d322a88 /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-d253b8aa-cd8e-4446-a6e8-6598f33fbc22" returned: 0 in 31.396s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409
2019-08-20 15:20:35.143 49 DEBUG oslo_concurrency.processutils [req-962f0cb1-e306-49e3-a48d-74aea04798d0 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] CMD "qemu-img convert -O raw /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-7acf3974-eeb5-451d-a855-f3ad16264bd1 /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-30341363-768d-4cf2-ba6e-88acd1352bd2" returned: 0 in 32.525s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409
2019-08-20 15:20:35.554 78 DEBUG oslo_concurrency.processutils [req-c258d369-687f-4649-a754-d0c54dd3ee6c 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] CMD "qemu-img convert -O raw /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-37839d77-df02-4119-ae33-ea1c55882dbc /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-cc3726ff-1a9a-4e02-9e87-1bc928a2f7c1" returned: 0 in 35.064s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409
2019-08-20 15:20:35.556 78 DEBUG oslo_concurrency.processutils [req-96cdf880-045b-4509-9da9-ad36cd20a0a8 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] CMD "qemu-img convert -O raw /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-d253b8aa-cd8e-4446-a6e8-6598f33fbc22 /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-57adf8e7-848e-48ac-b445-2af6f4ab15d4" returned: 0 in 24.675s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409
2019-08-20 15:20:35.557 78 DEBUG oslo_concurrency.processutils [req-ae5ddb1e-6cd1-4732-a3eb-0e9d79ac6e9d 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] CMD "qemu-img convert -O raw /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-c13874e6-fad2-4d6a-a515-51b957f09786.tmp-snap-811fe7bc-c8e7-49ba-84c6-3af4c595118a /var/lib/cinder/mnt/3e906e502cb6342e08eff1e805bf1d89/volume-319dcfba-8cae-4411-88ec-a8af191c69ef" returned: 0 in 22.918s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409

Comment 16 Alan Bishop 2019-10-01 20:24:34 UTC
Reassigning to Eric, who is our NFS driver expert.

Comment 25 David Hill 2019-11-27 18:53:09 UTC
The customer hit this new issue after applying the hotfix:

<</usr/lib/python2.7/site-packages/cinder/volume/drivers/remotefs.py>>

1662     def create_cloned_volume(self, volume, src_vref):
1663         """Creates a clone of the specified volume."""
1664
1665         return self._create_cloned_volume(volume, src_vref,
1666                                           src_vref.obj_context)
1667
1668     @coordination.synchronized('{self.driver_prefix}-{volume.id}')


--- Cinder-Volume Error ---

2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server [req-fce84ebf-1a56-4739-bc1b-314e8339c53c 2708e4b6a36d4c61b96363e07d7f87e0 1445bbc727c54d1bab30d308c340eded - default default] Exception during message handling: AttributeError: 'Volume' object has no attribute 'obj_context'
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 163, in _process_incoming
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 220, in dispatch
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 190, in _do_dispatch
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "<string>", line 2, in create_volume
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/objects/cleanable.py", line 207, in wrapper
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server     result = f(*args, **kwargs)
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/volume/manager.py", line 690, in create_volume
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server     _run_flow()
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/volume/manager.py", line 682, in _run_flow
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server     flow_engine.run()
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/taskflow/engines/action_engine/engine.py", line 247, in run
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server     for _state in self.run_iter(timeout=timeout):
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/taskflow/engines/action_engine/engine.py", line 340, in run_iter
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server     failure.Failure.reraise_if_any(er_failures)
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/taskflow/types/failure.py", line 336, in reraise_if_any
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server     failures[0].reraise()
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/taskflow/types/failure.py", line 343, in reraise
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server     six.reraise(*self._exc_info)
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server     result = task.execute(**arguments)
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py", line 1028, in execute
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server     **volume_spec)
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py", line 912, in _create_from_image
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server     image_meta)
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py", line 642, in _clone_image_volume
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server     ret = self.driver.create_cloned_volume(volume, image_volume)
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "<string>", line 2, in create_cloned_volume
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/coordination.py", line 151, in _synchronized
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server     return f(*a, **k)
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "<string>", line 2, in create_cloned_volume
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/coordination.py", line 151, in _synchronized
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server     return f(*a, **k)
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/remotefs.py", line 1666, in create_cloned_volume
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server     src_vref.obj_context)
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server AttributeError: 'Volume' object has no attribute 'obj_context'
2019-11-27 18:14:32.328 77 ERROR oslo_messaging.rpc.server
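
For illustration only, a hedged sketch (not the shipped hotfix and not necessarily the upstream fix tracked in the dependent bugs) of a defensive variant that would avoid the AttributeError by not assuming every Volume object carries obj_context:

# Hypothetical sketch; anything beyond the names shown in the snippet and
# traceback above is an assumption. The real fix may obtain the request
# context differently.
def create_cloned_volume(self, volume, src_vref):
    """Creates a clone of the specified volume."""
    # The Volume object reaching this path via _clone_image_volume evidently
    # does not expose obj_context, so fall back to the destination volume's
    # context when it is available.
    context = getattr(src_vref, 'obj_context', None) or getattr(
        volume, 'obj_context', None)
    return self._create_cloned_volume(volume, src_vref, context)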

Comment 26 David Hill 2019-11-28 14:55:04 UTC
I suspect we're hitting a bug introduced by 8d7ecfb0c9f769923e20452b8027eb117bf2be52 ...

Comment 47 Tzach Shefi 2020-05-12 11:42:28 UTC
Created attachment 1687660 [details]
Cinder logs

Rajat, 


First of all, just to be sure of the verification steps:

1. Glance using Cinder, Cinder backed by NFS. 

3. Create 5 (or more) volumes simultaneously from the same image. 

Or should we boot 5 VMs from an image using 'boot from image (create volume)'?



One more thing bothers me while setting this up:
Glance using Cinder is configured and working,
but Cinder still uses the default LVM backend; I haven't switched to NFS yet.

Out of curiosity, I figured I'd have a go at it anyway.
Issued this command:
(overcloud) [stack@undercloud-0 ~]$ for i in {1..9}; do cinder create 1 --image cirros ; done

And got:
(overcloud) [stack@undercloud-0 ~]$ cinder list
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
| ID                                   | Status    | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
| 17552814-94de-4d68-840a-372df955df9b | error     | -    | 1    | tripleo     | false    |             |
| 2e6097d9-e07a-4c9b-b5fc-56a10aed83d7 | error     | -    | 1    | tripleo     | false    |             |
| 330e215c-8892-428e-8ebb-6b2a1ce507f1 | error     | -    | 1    | tripleo     | false    |             |
| 4d4347cb-cba6-48f3-8cf4-d82d0df72d69 | available | -    | 1    | tripleo     | true     |             |
| a5b728e4-064c-45bc-85fe-e26f2a057adf | available | -    | 1    | tripleo     | true     |             |
| a63d81df-60d9-4449-ae45-be287b576ebb | available | -    | 1    | tripleo     | true     |             |
| c0cfdcf7-9df3-49f9-bd83-3b3f4b0a19a4 | error     | -    | 1    | tripleo     | false    |             |
| d8d4c0bc-f382-4c96-8364-29748410ef50 | error     | -    | 1    | tripleo     | false    |             |
| fedac94e-7811-4d86-9f63-d6c389166717 | error     | -    | 1    | tripleo     | false    |             |
+--------------------------------------+-----------+------+------+-------------+----------+-------------+


This doesn't look good even before we switch Cinder to NFS.
From the c-vol log I'm guessing oslo is probably overloaded.
Out of 9 volumes only 3 are available; this was tested twice with the same results.
The attached logs only show the last attempt.

Should I add more resources to the controller and retry?

Comment 48 Tzach Shefi 2020-05-12 11:44:57 UTC
Rajat, 
See above; I couldn't set needinfo when attaching the logs.

Comment 49 Rajat Dhasmana 2020-05-12 13:21:21 UTC
Hi Tzach,

I haven't made any changes to the LVM code in glance store, and AFAIR it was mentioned that this feature was tested with an LVM + iSCSI configuration before.

Also, when I looked into the logs, the errors I see are from the Glance side.
Can you attach glance logs as well?

Regarding the testing part:

1) The main operation should be to create multiple images simultaneously; that would ensure the volumes are created and the images are properly written to them.

2) Next, I would spawn multiple VMs from the image to check that we're able to access and read the image inside the volume.

Creating a bootable volume from an image seems like a valid use case.

But I don't think cloning the volume or using it to spawn a VM is a good idea, because only the image is stored inside the volume; that doesn't mean the volume has become bootable.

Thanks
Rajat Dhasmana

Comment 50 Tzach Shefi 2020-05-14 06:26:04 UTC
Rajat (thanks) pointed out that we have a Glance dependency:
https://bugzilla.redhat.com/show_bug.cgi?id=1807123 (MODIFIED)

The 'Depends On' field was hidden, so I didn't notice it before.
Waiting for the Glance bug to reach ON_QA to verify both.

Comment 51 David Hill 2020-05-14 15:20:20 UTC
We need a new hotfix for the customer.

Comment 55 Cyril Roelandt 2020-06-01 17:21:11 UTC
@Rajat: if the second option ("enabling cinder image cache") is "perfect for the customer usecase and works as expected", is there any way for us to ship that to them in z12?

Comment 56 Rajat Dhasmana 2020-06-01 17:34:35 UTC
Hi Cyril,

I'm trying to get the backports merged as fast as possible, and as soon as they merge in Queens I will create a quick build, but there's no ETA for the upstream merge. It will also require QE testing (which should be pretty straightforward), but I'm unable to comment on that since I'm not aware of the current z12 schedule.

Comment 58 Alan Bishop 2020-06-01 20:35:10 UTC
Another upstream patch is required; it's merged on master and is currently being backported.

Comment 63 David Hill 2020-06-03 19:51:49 UTC
Let me ask the customer... it looks like the case was closed.

Comment 79 Tzach Shefi 2020-09-22 12:52:22 UTC
Verified on:
openstack-cinder-12.0.10-19.el7ost.noarch
python2-glance-store-0.23.1-0.20190916165252.cc7ecc1.el7ost.noarch

On a system with OSP13, Glance uses Cinder as its backend.
Cinder uses NetApp NFS as its backend.
Image_volume_cache is enabled in cinder.conf.
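
For reference, a minimal cinder.conf sketch of the image-volume cache settings involved (the option names are the standard Cinder ones; the backend section name and values below are placeholders, not this environment's exact configuration):

[DEFAULT]
# The cache stores its cached image volumes under an internal tenant.
cinder_internal_tenant_project_id = <project-uuid>
cinder_internal_tenant_user_id = <user-uuid>

[netapp_nfs]
# Per-backend switch for the image-volume cache, plus optional size limits
# (0 means no limit, which is the default).
image_volume_cache_enabled = True
image_volume_cache_max_size_gb = 0
image_volume_cache_max_count = 0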


First test: simultaneously create 10 volumes from the same image:
#for i in {1..10}; do cinder create 1 --image cirros ; done

The resulting 10 volumes are all in the available state, as expected:
(overcloud) [stack@undercloud-0 ~]$ cinder list
+--------------------------------------+-----------+--------------------------------------------+------+-------------+----------+-------------+
| ID                                   | Status    | Name                                       | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+--------------------------------------------+------+-------------+----------+-------------+
| 11a62820-6745-4876-ab32-71363e42e353 | available | -                                          | 1    | tripleo     | true     |             |
| 2a46f3c7-fa9f-4b72-9dee-4bd4dea7a631 | available | -                                          | 1    | tripleo     | true     |             |
| 3c5b8cfb-75d5-4c09-98b6-5ecfed4c1972 | available | -                                          | 1    | tripleo     | true     |             |
| 4e68f832-bce2-43e9-a547-8d29242e1512 | available | -                                          | 1    | tripleo     | true     |             |
| 5eff40e0-5eda-407b-b084-63604d6c521d | available | -                                          | 1    | tripleo     | true     |             |
| 6c58d022-0ee6-4f64-9bcd-d744612c6d78 | available | -                                          | 1    | tripleo     | true     |             |
| 7eb389b6-ff05-4c83-a6f3-c3f7cebc5274 | available | -                                          | 1    | tripleo     | true     |             |
| aebf32e4-0855-494e-9ae9-fd9aa054d6ff | available | -                                          | 1    | tripleo     | true     |             |
| c7714e66-5a91-45da-a531-d4097f13081b | available | -                                          | 1    | tripleo     | true     |             |
| ccd6fd5c-4738-4363-9364-a680531abadf | available | -                                          | 1    | tripleo     | true     |             |
| edfb3cdb-c608-4b0f-a6ac-0a964772b16f | available | image-1e30b583-a24d-45ea-a9b8-a6f7699f605a | 1    | tripleo     | false    |             |
+--------------------------------------+-----------+--------------------------------------------+------+-------------+----------+-------------+

Worked as expected; all volumes are available.
Delete all volumes:
#for i in $(cinder list | grep true | awk '{print $2}'); do  cinder delete $i; done


Now let's simulate the original issue by simultaneously booting up 3 instances.
We use 'boot from image (create volume)', so multiple volumes are created from the same image.
#nova boot --flavor tiny --block-device source=image,id=1e30b583-a24d-45ea-a9b8-a6f7699f605a,dest=volume,size=1,shutdown=remove,bootindex=0 ints1  --nic net-id=af6fedd3-2926-4aaf-bcb2-8b1296bb02db --min-count 3

After a while:
(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+---------+--------+----------------------+-------------+-----------------------+
| ID                                   | Name    | Status | Task State           | Power State | Networks              |
+--------------------------------------+---------+--------+----------------------+-------------+-----------------------+
| 16547b6e-fecd-4e88-b9b3-506b6b79cb13 | ints1-1 | BUILD  | block_device_mapping | NOSTATE     | internal=192.168.0.14 |
| 8641cb2b-0d48-4eec-914c-c0f1a54b4e2f | ints1-2 | BUILD  | block_device_mapping | NOSTATE     | internal=192.168.0.19 |
| e5f0e8a4-3c94-450d-9059-56d493927e62 | ints1-3 | BUILD  | block_device_mapping | NOSTATE     | internal=192.168.0.25 |
+--------------------------------------+---------+--------+----------------------+-------------+-----------------------+

After a few more seconds:
(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+---------+--------+------------+-------------+-----------------------+
| ID                                   | Name    | Status | Task State | Power State | Networks              |
+--------------------------------------+---------+--------+------------+-------------+-----------------------+
| 16547b6e-fecd-4e88-b9b3-506b6b79cb13 | ints1-1 | ACTIVE | -          | Running     | internal=192.168.0.14 |
| 8641cb2b-0d48-4eec-914c-c0f1a54b4e2f | ints1-2 | ACTIVE | -          | Running     | internal=192.168.0.19 |
| e5f0e8a4-3c94-450d-9059-56d493927e62 | ints1-3 | ACTIVE | -          | Running     | internal=192.168.0.25 |
+--------------------------------------+---------+--------+------------+-------------+-----------------------+


(overcloud) [stack@undercloud-0 ~]$ cinder list
+--------------------------------------+-----------+--------------------------------------------+------+-------------+----------+--------------------------------------+
| ID                                   | Status    | Name                                       | Size | Volume Type | Bootable | Attached to                          |
+--------------------------------------+-----------+--------------------------------------------+------+-------------+----------+--------------------------------------+
| 7358f758-c242-44ac-8dd8-13a304c2b016 | in-use    |                                            | 1    | tripleo     | true     | 8641cb2b-0d48-4eec-914c-c0f1a54b4e2f |
| dc943647-c9c0-436d-b15a-ddceec0bd9f6 | in-use    |                                            | 1    | tripleo     | true     | 16547b6e-fecd-4e88-b9b3-506b6b79cb13 |
| edfb3cdb-c608-4b0f-a6ac-0a964772b16f | available | image-1e30b583-a24d-45ea-a9b8-a6f7699f605a | 1    | tripleo     | false    |                                      |
| fb96d987-69ae-4031-97de-6827f261c117 | in-use    |                                            | 1    | tripleo     | true     | e5f0e8a4-3c94-450d-9059-56d493927e62 |
+--------------------------------------+-----------+--------------------------------------------+------+-------------+----------+--------------------------------------+

We were able to boot up 3 instances from the same image and create 3 boot volumes.
What failed before is now working, albeit at a small scale.

Again, due to limited compute resources I couldn't bump up the instance count considerably.
By reducing the Nova flavor's RAM I was able to boot up 6 instances/boot volumes simultaneously:

(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+---------+--------+------------+-------------+-----------------------+
| ID                                   | Name    | Status | Task State | Power State | Networks              |
+--------------------------------------+---------+--------+------------+-------------+-----------------------+
| 25fde923-c9f4-4e23-b96a-99f61d95dae3 | ints1-1 | ACTIVE | -          | Running     | internal=192.168.0.18 |
| 0069eb6d-98b2-4502-a0f5-7cefc9417db5 | ints1-2 | ACTIVE | -          | Running     | internal=192.168.0.21 |
| cfcb602c-70ac-49c6-9fbc-d112c9e3971d | ints1-3 | ACTIVE | -          | Running     | internal=192.168.0.26 |
| 9d199d28-bdf6-4b83-8be3-2cfaaacd722d | ints1-4 | ACTIVE | -          | Running     | internal=192.168.0.16 |
| adab78d8-58bd-4405-ae7c-a8ab2521b96f | ints1-5 | ACTIVE | -          | Running     | internal=192.168.0.14 |
| 9990979c-0fba-4297-aa44-98f138fdc682 | ints1-6 | ACTIVE | -          | Running     | internal=192.168.0.27 |
+--------------------------------------+---------+--------+------------+-------------+-----------------------+
(overcloud) [stack@undercloud-0 ~]$ cinder list
+--------------------------------------+-----------+--------------------------------------------+------+-------------+----------+--------------------------------------+
| ID                                   | Status    | Name                                       | Size | Volume Type | Bootable | Attached to                          |
+--------------------------------------+-----------+--------------------------------------------+------+-------------+----------+--------------------------------------+
| 24c4ba7f-0511-4c2b-b614-ee4a2f867abc | in-use    |                                            | 1    | tripleo     | true     | 25fde923-c9f4-4e23-b96a-99f61d95dae3 |
| 3eb59a2e-cb81-4f11-bdcb-c8a4ae994689 | in-use    |                                            | 1    | tripleo     | true     | 0069eb6d-98b2-4502-a0f5-7cefc9417db5 |
| 40ca43d9-ee31-4f98-b718-3f8dadec06e0 | in-use    |                                            | 1    | tripleo     | true     | cfcb602c-70ac-49c6-9fbc-d112c9e3971d |
| 8ac016bb-68d6-4812-b172-6e53fe457923 | in-use    |                                            | 1    | tripleo     | true     | adab78d8-58bd-4405-ae7c-a8ab2521b96f |
| c14603be-9985-42f5-b901-2e8074679c7d | in-use    |                                            | 1    | tripleo     | true     | 9d199d28-bdf6-4b83-8be3-2cfaaacd722d |
| edfb3cdb-c608-4b0f-a6ac-0a964772b16f | available | image-1e30b583-a24d-45ea-a9b8-a6f7699f605a | 1    | tripleo     | false    |                                      |
| f1c29bf5-30d1-42e6-808a-306fc5b168a0 | in-use    |                                            | 1    | tripleo     | true     | 9990979c-0fba-4297-aa44-98f138fdc682 |
+--------------------------------------+-----------+--------------------------------------------+------+-------------+----------+--------------------------------------+


I also tested another scenario: simultaneously creating a Nova snapshot of all the instances.

$ for i in $(nova list | grep Running  | awk '{print $2}'); do nova image-create  $i snap-$i;done
$ glance image-list
+--------------------------------------+-------------------------------------------+
| ID                                   | Name                                      |
+--------------------------------------+-------------------------------------------+
| 1e30b583-a24d-45ea-a9b8-a6f7699f605a | cirros                                    |
| 8810456d-4876-4bd2-aba7-0dd5eaf7e98e | snap-0069eb6d-98b2-4502-a0f5-7cefc9417db5 |
| b141d4d8-190b-4dac-be66-94447921aaf5 | snap-25fde923-c9f4-4e23-b96a-99f61d95dae3 |
| f08c0411-90fd-4274-9b69-861cb32644b0 | snap-9990979c-0fba-4297-aa44-98f138fdc682 |
| 0e0b6369-8c25-4fdd-83c4-b72df9d756ce | snap-9d199d28-bdf6-4b83-8be3-2cfaaacd722d |
| 5e3db635-5724-47f1-912c-ba5e394d73dc | snap-adab78d8-58bd-4405-ae7c-a8ab2521b96f |
| 898160bc-6bf8-49c6-b83d-0c2f6426a2e9 | snap-cfcb602c-70ac-49c6-9fbc-d112c9e3971d |
+--------------------------------------+-------------------------------------------+

$ cinder snapshot-list
+--------------------------------------+--------------------------------------+-----------+--------------------------------------------------------+------+
| ID                                   | Volume ID                            | Status    | Name                                                   | Size |
+--------------------------------------+--------------------------------------+-----------+--------------------------------------------------------+------+
| 0ba24d77-56d4-44ad-aba0-af2cc7525636 | 40ca43d9-ee31-4f98-b718-3f8dadec06e0 | available | snapshot for snap-cfcb602c-70ac-49c6-9fbc-d112c9e3971d | 1    |
| 37afbe74-188b-4dcd-8843-13b5b9179627 | 8ac016bb-68d6-4812-b172-6e53fe457923 | available | snapshot for snap-adab78d8-58bd-4405-ae7c-a8ab2521b96f | 1    |
| 8ba35672-fb7a-4dce-a1bb-e30c2b5d886d | c14603be-9985-42f5-b901-2e8074679c7d | available | snapshot for snap-9d199d28-bdf6-4b83-8be3-2cfaaacd722d | 1    |
| 8dacc788-ee48-4f98-9e6d-e147d65b03df | f1c29bf5-30d1-42e6-808a-306fc5b168a0 | available | snapshot for snap-9990979c-0fba-4297-aa44-98f138fdc682 | 1    |
| 939ba84a-5e7a-4687-a44a-e7b795fc563f | 24c4ba7f-0511-4c2b-b614-ee4a2f867abc | available | snapshot for snap-25fde923-c9f4-4e23-b96a-99f61d95dae3 | 1    |
| e48b6edb-31e0-4ef8-9746-b6a881c33747 | 3eb59a2e-cb81-4f11-bdcb-c8a4ae994689 | available | snapshot for snap-0069eb6d-98b2-4502-a0f5-7cefc9417db5 | 1    |
+--------------------------------------+--------------------------------------+-----------+--------------------------------------------------------+------+
This too works as expected. 

Confirmed: simultaneously booting multiple instances from an image, creating persistent boot volumes, where Glance is backed by Cinder
and Cinder is backed by a NetApp NFS backend, works as expected.

Comment 86 errata-xmlrpc 2020-10-28 18:22:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: openstack-cinder security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:4391