Bug 1415252
Summary: [RFE] QEMU image file locking (RHV)
Product: Red Hat Enterprise Virtualization Manager
Component: RFEs
Reporter: Ademar Reis <areis>
Assignee: Rob Young <royoung>
QA Contact: Raz Tamir <ratamir>
Status: CLOSED WONTFIX
Severity: medium
Priority: high
Version: unspecified
Keywords: FutureFeature
Hardware: Unspecified
OS: Unspecified
Doc Type: Enhancement
Type: Bug
oVirt Team: Virt
Clone Of: 1378242
Bug Depends On: 1378241, 1378242, 1417306, 1432523
Bug Blocks: 1415250
Last Closed: 2018-06-18 10:05:14 UTC
CC: areis, chhu, dfediuck, dyuan, eskultet, famz, hhan, jiyan, jsuchane, lmen, lsurette, meili, michal.skrivanek, mtessun, pkrempa, rbalakri, srevivo, virt-bugs, virt-maint, xuzhang, ykaul
Description
Ademar Reis
2017-01-20 16:41:03 UTC
Hi Jarda,

could you please add some detail about what needs to be done here (if anything at all)? My understanding is that the local locking does not have any negative impact as long as the disk is not shared between different VMs.

So specifically: does anything need to be set in case the disk is shared, or is this already done? From my PoV nothing is needed, as the "<sharable/>" tag should take care of this in libvirt.

Non-shareable disk:

    <disk type='block' device='disk' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>
      <source dev='<DEVICE>'/>
      <backingStore/>
      [...]
    </disk>

Shareable disk:

    <disk type='block' device='disk' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>
      <source dev='<DEVICE>'/>
      <backingStore/>
      <shareable/>
      [...]
    </disk>

If that is already enough to avoid access issues, I don't think anything else is currently needed, especially as this locking feature only works when the vDisk is accessed from the same host.

Thanks,
Martin

(In reply to Martin Tessun from comment #1)
> Hi Jarda,
>
> could you please add some detail about what needs to be done here (if
> anything at all).
> My understanding is that the local locking does not have any negative impact
> as long as the disk is not shared between different VMs.

Well, it has been discussed in the qemu-kvm-rhev bz: https://bugzilla.redhat.com/show_bug.cgi?id=1378241#c8

In summary, it is an internal qemu mechanism to prevent concurrent writes to the image on a host. While there might be little benefit for RHV, as it handles locking of shared storage itself, there is potentially huge benefit for RHEL. GSS and consequently ENG teams have experienced very complicated virtual disk corruptions. Image locking would have prevented them.

> So specifically: Anything that needs to be set, in case the disk is shared
> or is this already done?

Libvirt needs to have an option to enable/disable the locking.
> From my PoV nothing is needed, as the "<sharable/>" tag should take care
> about this in libvirt.

The <shareable/> flag disables locking so the disk can be accessed by multiple VMs, for example during live migration.

> [...]
>
> If that is already enough to avoid access issues, I don't think anything
> else is currently needed, especially as this locking feature does only work
> when the vDisk is accessed from the same host.

RHV definitely needs to test it and be able to switch it on/off.

J.

We need the ability to turn it off, definitely.

(In reply to Yaniv Kaul from comment #4)
> We need the ability to turn it off, definitely.

Do you say that because you expect to disable it in some specific scenario (or even by default), or because you want to have the ability to disable it in case something goes wrong in the field?

Again: we're introducing it in RHEL-7.4, but disabled by default, so there's nothing to worry about for now. No changes in libvirt or in RHV are expected. We'll be testing it and fixing any issues until it gets flipped to ON by default in RHEL-7.5.

(In reply to Ademar Reis from comment #5)
> (In reply to Yaniv Kaul from comment #4)
> > We need the ability to turn it off, definitely.
>
> Do you say that because you expect to disable it in some specific scenario
> (or even by default), or because you want to have the ability to disable in
> case something goes wrong in the field?
I don't see a use for it in RHV, and everything we don't use should probably be disabled by default (especially as I don't see us even switching it on in the future).

> Again: we're introducing it in RHEL-7.4, but disabled by default, so there's
> nothing to worry about for now. No changes in libvirt or in RHV are expected.
>
> We'll be testing it and fixing any issues until it gets flipped to ON by
> default in RHEL-7.5.

So we should probably switch this RFE to 'turn off qemu image locking', as we probably want to explicitly turn it off already in 7.4.

(In reply to Yaniv Kaul from comment #6)
> I don't see a use for it in RHV, and everything we don't use we probably
> should have disabled by default (especially as I don't see us even switching
> it on in the future).

It's supposed to be transparent to RHEV. If you disable it, you'll miss the benefit of all the testing we've done, plus the benefit of the locking itself. You would be doing something strongly discouraged by us. And then naturally, the first thing we would ask if/when a corruption gets reported would be "why was image locking disabled", thus shifting the burden of the investigation to RHV. I don't see how you can accomplish anything useful by disabling this lock by default.

> > Again: we're introducing it in RHEL-7.4, but disabled by default, so there's
> > nothing to worry about for now. No changes in libvirt or in RHV are expected.
> >
> > We'll be testing it and fixing any issues until it gets flipped to ON by
> > default in RHEL-7.5.
> So we should probably switch this RFE to 'turn off qemu image locking' - as
> we probably want to explicitly turn it off already in 7.4.

I don't think libvirt will have the knobs to "explicitly turn it off", given it'll already be turned off. We don't plan any changes in libvirt for this feature in RHEL-7.4.

Created attachment 1364630 [details]
vdsm.log
When creating the 3rd snapshot in RHV (RHV-4.1.8.2-0.1.el7, libvirt-3.9.0-5.el7.x86_64, qemu-kvm-rhev-2.10.0-11.el7.x86_64, vdsm-4.19.41-1.el7ev.x86_64), snapshot creation failed with the following error:
2017-12-07 22:30:46,068-0500 ERROR (tasks/1) [storage.Volume] cannot clone image 70cc7ab9-aa5f-40c3-861b-580da60614a7 volume 6f5f65d1-114f-428e-8566-7b14818fe2ab to /rhev/data-center/e2d7693c-2474-4ebe-a77b-a9a6823d46f5/6fdb0d21-3910-4721-9b66-f554b7e06269/images/70cc7ab9-aa5f-40c3-861b-580da60614a7/f5620a36-d901-485f-bd75-6df5b95880b9 (volume:899)
Traceback (most recent call last):
File "/usr/share/vdsm/storage/volume.py", line 895, in clone
backingFormat=sc.fmt2str(self.getFormat()))
File "/usr/lib/python2.7/site-packages/vdsm/qemuimg.py", line 146, in create
_run_cmd(cmd, cwd=cwdPath)
File "/usr/lib/python2.7/site-packages/vdsm/qemuimg.py", line 426, in _run_cmd
raise QImgError(cmd, rc, out, err)
QImgError: cmd=['/usr/bin/qemu-img', 'create', '-f', 'qcow2', '-o', 'compat=1.1', '-b', u'6f5f65d1-114f-428e-8566-7b14818fe2ab', '-F', 'qcow2', u'/rhev/data-center/e2d7693c-2474-4ebe-a77b-a9a6823d46f5/6fdb0d21-3910-4721-9b66-f554b7e06269/images/70cc7ab9-aa5f-40c3-861b-580da60614a7/f5620a36-d901-485f-bd75-6df5b95880b9'], ecode=1, stdout=, stderr=qemu-img: /rhev/data-center/e2d7693c-2474-4ebe-a77b-a9a6823d46f5/6fdb0d21-3910-4721-9b66-f554b7e06269/images/70cc7ab9-aa5f-40c3-861b-580da60614a7/f5620a36-d901-485f-bd75-6df5b95880b9: Failed to get shared "write" lock
Is another process using the image?
Could not open backing image to determine size.
, message=None
2017-12-07 22:30:46,075-0500 ERROR (tasks/1) [storage.Volume] Unexpected error (volume:1110)
Traceback (most recent call last):
File "/usr/share/vdsm/storage/volume.py", line 1067, in create
initialSize=initialSize)
File "/usr/share/vdsm/storage/fileVolume.py", line 472, in _create
volParent.clone(volPath, volFormat)
File "/usr/share/vdsm/storage/volume.py", line 904, in clone
raise se.CannotCloneVolume(self.volumePath, dstPath, str(e))
CannotCloneVolume: Cannot clone volume: u'src=/rhev/data-center/e2d7693c-2474-4ebe-a77b-a9a6823d46f5/6fdb0d21-3910-4721-9b66-f554b7e06269/images/70cc7ab9-aa5f-40c3-861b-580da60614a7/6f5f65d1-114f-428e-8566-7b14818fe2ab, dst=/rhev/data-center/e2d7693c-2474-4ebe-a77b-a9a6823d46f5/6fdb0d21-3910-4721-9b66-f554b7e06269/images/70cc7ab9-aa5f-40c3-861b-580da60614a7/f5620a36-d901-485f-bd75-6df5b95880b9: cmd=[\'/usr/bin/qemu-img\', \'create\', \'-f\', \'qcow2\', \'-o\', \'compat=1.1\', \'-b\', u\'6f5f65d1-114f-428e-8566-7b14818fe2ab\', \'-F\', \'qcow2\', u\'/rhev/data-center/e2d7693c-2474-4ebe-a77b-a9a6823d46f5/6fdb0d21-3910-4721-9b66-f554b7e06269/images/70cc7ab9-aa5f-40c3-861b-580da60614a7/f5620a36-d901-485f-bd75-6df5b95880b9\'], ecode=1, stdout=, stderr=qemu-img: /rhev/data-center/e2d7693c-2474-4ebe-a77b-a9a6823d46f5/6fdb0d21-3910-4721-9b66-f554b7e06269/images/70cc7ab9-aa5f-40c3-861b-580da60614a7/f5620a36-d901-485f-bd75-6df5b95880b9: Failed to get shared "write" lock\nIs another process using the image?\nCould not open backing image to determine size.\n, message=None'
2017-12-07 22:30:46,080-0500 ERROR (tasks/1) [storage.TaskManager.Task] (Task='7c13b2b7-4970-41e4-8b8e-50b639b506ad') Unexpected error (task:872)
Traceback (most recent call last):
File "/usr/share/vdsm/storage/task.py", line 879, in _run
return fn(*args, **kargs)
File "/usr/share/vdsm/storage/task.py", line 333, in run
return self.cmd(*self.argslist, **self.argsdict)
File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
return method(self, *args, **kwargs)
File "/usr/share/vdsm/storage/sp.py", line 1958, in createVolume
initialSize=initialSize)
File "/usr/share/vdsm/storage/sd.py", line 758, in createVolume
initialSize=initialSize)
File "/usr/share/vdsm/storage/volume.py", line 1112, in create
(volUUID, e))
VolumeCreationError: Error creating a new volume: (u'Volume creation f5620a36-d901-485f-bd75-6df5b95880b9 failed: Cannot clone volume: u\'src=/rhev/data-center/e2d7693c-2474-4ebe-a77b-a9a6823d46f5/6fdb0d21-3910-4721-9b66-f554b7e06269/images/70cc7ab9-aa5f-40c3-861b-580da60614a7/6f5f65d1-114f-428e-8566-7b14818fe2ab, dst=/rhev/data-center/e2d7693c-2474-4ebe-a77b-a9a6823d46f5/6fdb0d21-3910-4721-9b66-f554b7e06269/images/70cc7ab9-aa5f-40c3-861b-580da60614a7/f5620a36-d901-485f-bd75-6df5b95880b9: cmd=\\\'/usr/bin/qemu-img\\\', \\\'create\\\', \\\'-f\\\', \\\'qcow2\\\', \\\'-o\\\', \\\'compat=1.1\\\', \\\'-b\\\', u\\\'6f5f65d1-114f-428e-8566-7b14818fe2ab\\\', \\\'-F\\\', \\\'qcow2\\\', u\\\'/rhev/data-center/e2d7693c-2474-4ebe-a77b-a9a6823d46f5/6fdb0d21-3910-4721-9b66-f554b7e06269/images/70cc7ab9-aa5f-40c3-861b-580da60614a7/f5620a36-d901-485f-bd75-6df5b95880b9\\\', ecode=1, stdout=, stderr=qemu-img: /rhev/data-center/e2d7693c-2474-4ebe-a77b-a9a6823d46f5/6fdb0d21-3910-4721-9b66-f554b7e06269/images/70cc7ab9-aa5f-40c3-861b-580da60614a7/f5620a36-d901-485f-bd75-6df5b95880b9: Failed to get shared "write" lock
Is another process using the image?
Could not open backing image to determine size.
, message=None\'',)
It seems the qemu locking feature blocks vdsm's current snapshot creation flow.
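The "Failed to get shared "write" lock / Is another process using the image?" error above is raised by QEMU's image file locking: the running VM holds a lock on the backing volume, so the `qemu-img create` process cannot acquire its own lock on the same file. The conflict can be sketched with Python's stdlib `flock` as a stand-in (an illustrative analogue only; QEMU actually uses OFD byte-range locks, which behave similarly across open file descriptions):

```python
import fcntl
import tempfile

# Scratch file standing in for a qcow2 backing image.
path = tempfile.NamedTemporaryFile(delete=False).name

# First opener (conceptually: the running VM) takes an exclusive lock.
vm_fd = open(path, "r+b")
fcntl.flock(vm_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)

# Second opener (conceptually: qemu-img) tries to lock the same image.
# flock locks attach to the open file description, so a second open()
# of the same file conflicts even within a single process.
img_fd = open(path, "r+b")
try:
    fcntl.flock(img_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    conflict = False
except BlockingIOError:
    conflict = True

print(conflict)  # → True: the second opener is refused, like qemu-img was
```

As with QEMU's locking, the denial happens at open/lock time rather than at write time, which is why the snapshot fails immediately instead of corrupting data later.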
Do we need to open a new bug for the issue in comment 8? It affects the snapshot creation function in RHV.

(In reply to Han Han from comment #9)
> Do we need to open a new bug for the issue in comment8? It affects the
> function of creating snapshots in RHV.

If we plan to enable locking, this will become a regression. I suggest we have a BZ to track it.

(In reply to Han Han from comment #9)
> Do we need to open a new bug for the issue in comment8? It affects the
> function of creating snapshots in RHV.

Of course. It's a regression.

We do not plan to use this locking.

Note that in recent qemu the image locking feature is enabled by default and requires additional parameters to be turned off. That said, qemu-img should not need a write lock on the backing image when creating the overlay with the following command line:

/usr/bin/qemu-img create -f qcow2 -o compat=1.1 -b REL_PATH_TO_EXISTING_IMAGE -F qcow2 ABS_PATH_TO_NEW_IMAGE

So the above should be reported as a qemu bug.

Additionally, note that libvirt does not support turning locking off fully, since the statement from the qemu team is that it "should work". (There is one exception to this, where libguestfs accepts the risk of reading corrupted data and requests a way to disable locking in libvirt. See: https://bugzilla.redhat.com/show_bug.cgi?id=1519242)

(In reply to Gil Klein from comment #10)
> If we plan to enable locking, this will become a regression. I suggest we
> have a BZ to track it.

According to comment 12, filed a new bug on qemu-kvm-rhev to track it: https://bugzilla.redhat.com/show_bug.cgi?id=1526212

Switched to use -U (unsafe) in all relevant places.

oVirt/RHV doesn't plan to use this feature.
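The "-U" switch mentioned in the closing comments is qemu-img's --force-share option, which makes read-only subcommands such as info, map and compare skip acquiring the image lock. A minimal sketch of how a management layer might build such a command line (a hypothetical helper for illustration, not vdsm's actual code):

```python
# Hypothetical helper (not vdsm's actual implementation): build a
# 'qemu-img info' command line that skips image locking via -U.
# Note: -U/--force-share applies to read-only subcommands like
# info/map/compare; 'qemu-img create' does not accept it.
def qemu_img_info_cmd(image_path, unsafe=True):
    cmd = ["/usr/bin/qemu-img", "info", "--output", "json"]
    if unsafe:
        # Skip the shared lock. Safe only when the caller serializes
        # image access itself, as RHV/vdsm does with its own storage
        # locking (SANLock / storage domain leases).
        cmd.append("-U")
    cmd.append(image_path)
    return cmd

print(qemu_img_info_cmd("/rhev/data-center/example/disk.qcow2"))
```

The trade-off discussed in this bug is visible here: passing -U forfeits QEMU's corruption protection, which is acceptable for RHV only because it already guarantees exclusive access at a higher layer.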