Bug 1415252 - [RFE] QEMU image file locking (RHV)
Summary: [RFE] QEMU image file locking (RHV)
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: RFEs
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Rob Young
QA Contact: Raz Tamir
URL:
Whiteboard:
Depends On: 1378241 1378242 1417306 1432523
Blocks: 1415250
Reported: 2017-01-20 16:41 UTC by Ademar Reis
Modified: 2019-05-16 13:04 UTC
CC List: 21 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of: 1378242
Environment:
Last Closed: 2018-06-18 10:05:14 UTC
oVirt Team: Virt
Target Upstream Version:


Attachments
vdsm.log (57.84 KB, text/plain)
2017-12-08 07:08 UTC, Han Han

Description Ademar Reis 2017-01-20 16:41:03 UTC
For more details, see the discussion on virt-devel

+++ This bug was initially created as a clone of Bug #1378242 +++

+++ This bug was initially created as a clone of Bug #1378241 +++

QEMU image locking, which should prevent multiple runs of QEMU or qemu-img when a VM is running.

Upstream series (v7): https://lists.gnu.org/archive/html/qemu-devel/2016-08/msg01306.html

Comment 1 Martin Tessun 2017-03-06 10:10:00 UTC
Hi Jarda,

could you please add some detail about what needs to be done here (if anything at all).
My understanding is that the local locking does not have any negative impact as long as the disk is not shared between different VMs.

So specifically: does anything need to be set in case the disk is shared, or is this already done?
From my PoV nothing is needed, as the "<shareable/>" tag should take care of this in libvirt.


Non-shareable disk:

    <disk type='block' device='disk' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>
      <source dev='<DEVICE>'/>
      <backingStore/>
[...]
    </disk>


Shareable disk:

    <disk type='block' device='disk' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>
      <source dev='<DEVICE>'/>
      <backingStore/>
      <shareable/>
[...]
    </disk>

If that is already enough to avoid access issues, I don't think anything else is currently needed, especially as this locking feature only works when the vDisk is accessed from the same host.

Thanks,
Martin

Comment 2 Jaroslav Suchanek 2017-03-17 16:10:54 UTC
(In reply to Martin Tessun from comment #1)
> Hi Jarda,
> 
> could you please add some detail about what needs to be done here (if
> anything at all).
> My understanding is that the local locking does not have any negative impact
> as long as the disk is not shared between different VMs.


Well, it has been discussed in the qemu-kvm-rhev bz:
https://bugzilla.redhat.com/show_bug.cgi?id=1378241#c8

In summary, it is an internal QEMU mechanism to prevent concurrent writes to an image on a host. While there might be little benefit for RHV, as it handles locking of shared storage itself, there is a potentially huge benefit for RHEL. GSS and, consequently, ENG teams have dealt with very complicated virtual disk corruptions that image locking would have prevented.
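To make the "prevent concurrent writes" idea concrete, here is a simplified model of the compatibility check, written as a Python 3 sketch under stated assumptions. It assumes each user of an image declares the permissions it needs and the permissions it lets others hold at the same time, and that an open is refused if either side refuses what the other needs. The bit values and the `Opener`/`Image` names are hypothetical; QEMU stores this state as byte-range locks on the image file, which the sketch does not model.

```python
from dataclasses import dataclass, field

# Hypothetical permission bits, loosely modelled on QEMU's block-layer
# permissions (consistent read, write, resize); not QEMU's real values.
READ = 0x1
WRITE = 0x2
RESIZE = 0x4
ALL = READ | WRITE | RESIZE


@dataclass
class Opener:
    name: str
    perm: int    # permissions this user needs
    shared: int  # permissions this user allows other users to hold


@dataclass
class Image:
    holders: list = field(default_factory=list)

    def try_open(self, new: Opener) -> bool:
        """Admit `new` only if it is compatible with every current holder."""
        for old in self.holders:
            # The newcomer needs something an existing holder refuses to share.
            if new.perm & ~old.shared:
                return False
            # An existing holder needs something the newcomer refuses to share.
            if old.perm & ~new.shared:
                return False
        self.holders.append(new)
        return True


img = Image()
print(img.try_open(Opener("vm1", READ | WRITE, READ)))      # True
print(img.try_open(Opener("vm2", READ | WRITE, READ)))      # False
print(img.try_open(Opener("reader", READ, READ | RESIZE)))  # False
print(img.try_open(Opener("reader -U", 0, ALL)))            # True
```

In this model a second writer is refused, and so is a plain reader that wants consistent data while a writer is active; a reader that asks for nothing and shares everything (the idea behind `qemu-img -U`) is admitted.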


> 
> So specifically: Anything that needs to be set, in case the disk is shared
> or is this already done?

Libvirt needs an option to enable/disable the locking.

> From my PoV nothing is needed, as the "<sharable/>" tag should take care
> about this in libvirt.
> 

The <shareable/> flag disables locking so the disk can be accessed by multiple VMs, for example during live migration.
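For reference, a small Python 3 sketch that produces the two <disk> variants from the examples above, differing only in the <shareable/> child. The `disk_xml` helper is hypothetical (it is not part of libvirt or vdsm), and how libvirt translates <shareable/> into QEMU locking options is an assumption that may vary between libvirt versions.

```python
import xml.etree.ElementTree as ET


def disk_xml(device_path: str, shareable: bool) -> str:
    """Build a <disk> element like the examples above (hypothetical helper)."""
    disk = ET.Element("disk", type="block", device="disk", snapshot="no")
    ET.SubElement(disk, "driver", name="qemu", type="raw", cache="none",
                  error_policy="stop", io="native")
    ET.SubElement(disk, "source", dev=device_path)
    ET.SubElement(disk, "backingStore")
    if shareable:
        # Marks the disk as shared between domains; libvirt is then expected
        # to relax QEMU's write locking for it (assumed, version-dependent).
        ET.SubElement(disk, "shareable")
    return ET.tostring(disk, encoding="unicode")


print(disk_xml("/dev/sdb", shareable=True))
```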

> 
> Non-Sharable disk:
> 
>     <disk type='block' device='disk' snapshot='no'>
>       <driver name='qemu' type='raw' cache='none' error_policy='stop'
> io='native'/>
>       <source dev='<DEVICE>'/>
>       <backingStore/>
> [...]
>     </disk>
> 
> 
> Shareable disk:
> 
>     <disk type='block' device='disk' snapshot='no'>
>       <driver name='qemu' type='raw' cache='none' error_policy='stop'
> io='native'/>
>       <source dev='<DEVICE>'/>
>       <backingStore/>
>       <shareable/>
> [...]
>     </disk>
> 
> If that is already enough to avoid access issues, I don't think anything
> else is currently needed, especially as this locking feature does only work
> when the vDisk is accessed from the same host.
> 

RHV definitely needs to test it and be able to switch it on/off.

J.

Comment 4 Yaniv Kaul 2017-03-17 18:53:42 UTC
We need the ability to turn it off, definitely.

Comment 5 Ademar Reis 2017-03-17 19:47:42 UTC
(In reply to Yaniv Kaul from comment #4)
> We need the ability to turn it off, definitely.

Do you say that because you expect to disable it in some specific scenario (or even by default), or because you want to have the ability to disable in case something goes wrong in the field?

Again: we're introducing it in RHEL-7.4, but disabled by default, so there's nothing to worry about for now. No changes in libvirt or in RHV are expected.

We'll be testing it and fixing any issues until it gets flipped to ON by default in RHEL-7.5.

Comment 6 Yaniv Kaul 2017-03-17 19:56:32 UTC
(In reply to Ademar Reis from comment #5)
> (In reply to Yaniv Kaul from comment #4)
> > We need the ability to turn it off, definitely.
> 
> Do you say that because you expect to disable it in some specific scenario
> (or even by default), or because you want to have the ability to disable in
> case something goes wrong in the field?

I don't see a use for it in RHV, and everything we don't use should probably be disabled by default (especially as I don't see us ever switching it on in the future).

> 
> Again: we're introducing it in RHEL-7.4, but disabled by default, so there's
> nothing to worry about for now. No changes in libvirt or in RHV are expected.
> 
> We'll be testing it and fixing any issues until it gets flipped to ON by
> default in RHEL-7.5.

So we should probably switch this RFE to 'turn off qemu image locking' - as we probably want to explicitly turn it off already in 7.4.

Comment 7 Ademar Reis 2017-03-17 20:15:17 UTC
(In reply to Yaniv Kaul from comment #6)
> (In reply to Ademar Reis from comment #5)
> > (In reply to Yaniv Kaul from comment #4)
> > > We need the ability to turn it off, definitely.
> > 
> > Do you say that because you expect to disable it in some specific scenario
> > (or even by default), or because you want to have the ability to disable in
> > case something goes wrong in the field?
> 
> I don't see a use for it in RHV, and everything we don't use we probably
> should have disabled by default (especially as I don't see us even switching
> it on in the future).
> 

It's supposed to be transparent to RHEV. If you disable it, you'll miss the benefit of all the testing we've done, plus the benefit of the locking itself. 

You would be doing something strongly discouraged by us. And then naturally, the first thing we would ask if/when a corruption gets reported would be "why was image locking disabled", thus shifting the burden of the investigation to RHV. I don't see how you can accomplish anything useful by disabling this lock by default.

> > 
> > Again: we're introducing it in RHEL-7.4, but disabled by default, so there's
> > nothing to worry about for now. No changes in libvirt or in RHV are expected.
> > 
> > We'll be testing it and fixing any issues until it gets flipped to ON by
> > default in RHEL-7.5.
> 
> So we should probably switch this RFE to 'turn off qemu image locking' - as
> we probably want to explicitly turn it off already in 7.4.

I don't think libvirt will have the knobs to "explicitly turn it off", given it'll already be turned off. We don't plan any changes in libvirt for this feature in RHEL-7.4.

Comment 8 Han Han 2017-12-08 07:08:26 UTC
Created attachment 1364630 [details]
vdsm.log

When creating the 3rd snapshot in RHV (RHV-4.1.8.2-0.1.el7, libvirt-3.9.0-5.el7.x86_64, qemu-kvm-rhev-2.10.0-11.el7.x86_64, vdsm-4.19.41-1.el7ev.x86_64), snapshot creation failed with the following error:

2017-12-07 22:30:46,068-0500 ERROR (tasks/1) [storage.Volume] cannot clone image 70cc7ab9-aa5f-40c3-861b-580da60614a7 volume 6f5f65d1-114f-428e-8566-7b14818fe2ab to /rhev/data-center/e2d7693c-2474-4ebe-a77b-a9a6823d46f5/6fdb0d21-3910-4721-9b66-f554b7e06269/images/70cc7ab9-aa5f-40c3-861b-580da60614a7/f5620a36-d901-485f-bd75-6df5b95880b9 (volume:899)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/volume.py", line 895, in clone
    backingFormat=sc.fmt2str(self.getFormat()))
  File "/usr/lib/python2.7/site-packages/vdsm/qemuimg.py", line 146, in create
    _run_cmd(cmd, cwd=cwdPath)
  File "/usr/lib/python2.7/site-packages/vdsm/qemuimg.py", line 426, in _run_cmd
    raise QImgError(cmd, rc, out, err)
QImgError: cmd=['/usr/bin/qemu-img', 'create', '-f', 'qcow2', '-o', 'compat=1.1', '-b', u'6f5f65d1-114f-428e-8566-7b14818fe2ab', '-F', 'qcow2', u'/rhev/data-center/e2d7693c-2474-4ebe-a77b-a9a6823d46f5/6fdb0d21-3910-4721-9b66-f554b7e06269/images/70cc7ab9-aa5f-40c3-861b-580da60614a7/f5620a36-d901-485f-bd75-6df5b95880b9'], ecode=1, stdout=, stderr=qemu-img: /rhev/data-center/e2d7693c-2474-4ebe-a77b-a9a6823d46f5/6fdb0d21-3910-4721-9b66-f554b7e06269/images/70cc7ab9-aa5f-40c3-861b-580da60614a7/f5620a36-d901-485f-bd75-6df5b95880b9: Failed to get shared "write" lock
Is another process using the image?
Could not open backing image to determine size.
, message=None
2017-12-07 22:30:46,075-0500 ERROR (tasks/1) [storage.Volume] Unexpected error (volume:1110)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/volume.py", line 1067, in create
    initialSize=initialSize)
  File "/usr/share/vdsm/storage/fileVolume.py", line 472, in _create
    volParent.clone(volPath, volFormat)
  File "/usr/share/vdsm/storage/volume.py", line 904, in clone
    raise se.CannotCloneVolume(self.volumePath, dstPath, str(e))
CannotCloneVolume: Cannot clone volume: u'src=/rhev/data-center/e2d7693c-2474-4ebe-a77b-a9a6823d46f5/6fdb0d21-3910-4721-9b66-f554b7e06269/images/70cc7ab9-aa5f-40c3-861b-580da60614a7/6f5f65d1-114f-428e-8566-7b14818fe2ab, dst=/rhev/data-center/e2d7693c-2474-4ebe-a77b-a9a6823d46f5/6fdb0d21-3910-4721-9b66-f554b7e06269/images/70cc7ab9-aa5f-40c3-861b-580da60614a7/f5620a36-d901-485f-bd75-6df5b95880b9: cmd=[\'/usr/bin/qemu-img\', \'create\', \'-f\', \'qcow2\', \'-o\', \'compat=1.1\', \'-b\', u\'6f5f65d1-114f-428e-8566-7b14818fe2ab\', \'-F\', \'qcow2\', u\'/rhev/data-center/e2d7693c-2474-4ebe-a77b-a9a6823d46f5/6fdb0d21-3910-4721-9b66-f554b7e06269/images/70cc7ab9-aa5f-40c3-861b-580da60614a7/f5620a36-d901-485f-bd75-6df5b95880b9\'], ecode=1, stdout=, stderr=qemu-img: /rhev/data-center/e2d7693c-2474-4ebe-a77b-a9a6823d46f5/6fdb0d21-3910-4721-9b66-f554b7e06269/images/70cc7ab9-aa5f-40c3-861b-580da60614a7/f5620a36-d901-485f-bd75-6df5b95880b9: Failed to get shared "write" lock\nIs another process using the image?\nCould not open backing image to determine size.\n, message=None'
2017-12-07 22:30:46,080-0500 ERROR (tasks/1) [storage.TaskManager.Task] (Task='7c13b2b7-4970-41e4-8b8e-50b639b506ad') Unexpected error (task:872)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 879, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 333, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1958, in createVolume
    initialSize=initialSize)
  File "/usr/share/vdsm/storage/sd.py", line 758, in createVolume
    initialSize=initialSize)
  File "/usr/share/vdsm/storage/volume.py", line 1112, in create
    (volUUID, e))
VolumeCreationError: Error creating a new volume: (u'Volume creation f5620a36-d901-485f-bd75-6df5b95880b9 failed: Cannot clone volume: u\'src=/rhev/data-center/e2d7693c-2474-4ebe-a77b-a9a6823d46f5/6fdb0d21-3910-4721-9b66-f554b7e06269/images/70cc7ab9-aa5f-40c3-861b-580da60614a7/6f5f65d1-114f-428e-8566-7b14818fe2ab, dst=/rhev/data-center/e2d7693c-2474-4ebe-a77b-a9a6823d46f5/6fdb0d21-3910-4721-9b66-f554b7e06269/images/70cc7ab9-aa5f-40c3-861b-580da60614a7/f5620a36-d901-485f-bd75-6df5b95880b9: cmd=\\\'/usr/bin/qemu-img\\\', \\\'create\\\', \\\'-f\\\', \\\'qcow2\\\', \\\'-o\\\', \\\'compat=1.1\\\', \\\'-b\\\', u\\\'6f5f65d1-114f-428e-8566-7b14818fe2ab\\\', \\\'-F\\\', \\\'qcow2\\\', u\\\'/rhev/data-center/e2d7693c-2474-4ebe-a77b-a9a6823d46f5/6fdb0d21-3910-4721-9b66-f554b7e06269/images/70cc7ab9-aa5f-40c3-861b-580da60614a7/f5620a36-d901-485f-bd75-6df5b95880b9\\\', ecode=1, stdout=, stderr=qemu-img: /rhev/data-center/e2d7693c-2474-4ebe-a77b-a9a6823d46f5/6fdb0d21-3910-4721-9b66-f554b7e06269/images/70cc7ab9-aa5f-40c3-861b-580da60614a7/f5620a36-d901-485f-bd75-6df5b95880b9: Failed to get shared "write" lock\\nIs another process using the image?\\nCould not open backing image to determine size.\\n, message=None\'',)

It seems the QEMU locking feature blocks snapshot creation in the current vdsm.
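When triaging failures like this one, it can help to tell a lock conflict apart from other qemu-img errors. A minimal Python 3 sketch, assuming the error wording seen in the log above (other QEMU versions may phrase it differently); `lock_conflict` is a hypothetical helper, not part of vdsm.

```python
import re
from typing import Optional

# Wording taken from the qemu-img stderr in the log above; other QEMU
# versions may phrase the lock errors differently (assumption).
_LOCK_ERROR = re.compile(r'Failed to get (?:shared )?"([^"]+)" lock')


def lock_conflict(stderr: str) -> Optional[str]:
    """Return the permission named in a qemu-img lock error, or None."""
    match = _LOCK_ERROR.search(stderr)
    return match.group(1) if match else None


print(lock_conflict('qemu-img: /rhev/img: Failed to get shared "write" lock'))
```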

Comment 9 Han Han 2017-12-14 05:41:45 UTC
Do we need to open a new bug for the issue in comment 8? It affects the function of creating snapshots in RHV.

Comment 10 Gil Klein 2017-12-14 05:59:51 UTC
(In reply to Han Han from comment #9)
> Do we need to open a new bug for the issue in comment8? It affects the
> function of creating snapshots in RHV.
If we plan to enable locking, this will become a regression. I suggest we have a BZ to track it.

Comment 11 Yaniv Kaul 2017-12-14 14:33:16 UTC
(In reply to Han Han from comment #9)
> Do we need to open a new bug for the issue in comment8? It affects the
> function of creating snapshots in RHV.

Of course. It's a regression.
We do not plan to use this locking.

Comment 12 Peter Krempa 2017-12-14 15:38:38 UTC
Note that in recent qemu the image locking feature is enabled by default and requires additional parameters to be turned off.
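As an illustration of what such "additional parameters" can look like, the Python 3 sketch below builds an `--image-opts` string for `qemu-img`. It assumes QEMU's file driver accepts a `locking` option with the values auto/on/off; check the documentation of the QEMU version in use. The `image_opts` helper is hypothetical, and the code only builds the argument list. This thread explicitly discourages disabling locking, so this is shown for understanding, not as a recommendation.

```python
LOCKING_MODES = ("auto", "on", "off")


def image_opts(path: str, fmt: str = "qcow2", locking: str = "auto") -> str:
    """Build an --image-opts string with an explicit file.locking mode."""
    if locking not in LOCKING_MODES:
        raise ValueError(f"unsupported locking mode: {locking!r}")
    # QEMU option strings use ',' as a separator; a literal comma in a value
    # is written as ',,'.
    escaped = path.replace(",", ",,")
    return (f"driver={fmt},file.driver=file,"
            f"file.filename={escaped},file.locking={locking}")


cmd = ["qemu-img", "info", "--image-opts",
       image_opts("/var/tmp/disk.qcow2", locking="off")]
print(cmd)
```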

That said, qemu-img should not need a write lock to create the overlay image when using the following command line:

/usr/bin/qemu-img create -f qcow2 -o compat=1.1 -b REL_PATH_TO_EXISTING_IMAGE -F qcow2 ABS_PATH_TO_NEW_IMAGE

So the above should be reported as a qemu bug.
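A Python 3 sketch of assembling that command line from two absolute paths, assuming (as QEMU does for relative backing file names) that the relative `-b` value is resolved against the directory of the new overlay. Per the comment above, this command should only need to read the backing image. `overlay_create_cmd` is a hypothetical helper; it returns the argv without running anything.

```python
import posixpath


def overlay_create_cmd(backing_abs: str, overlay_abs: str) -> list:
    """argv for a qcow2 overlay, following the command line in the comment above.

    QEMU resolves a relative backing file name against the directory of the
    image that references it, so the backing path is made relative to the
    overlay's directory.
    """
    rel = posixpath.relpath(backing_abs, posixpath.dirname(overlay_abs))
    return ["/usr/bin/qemu-img", "create", "-f", "qcow2", "-o", "compat=1.1",
            "-b", rel, "-F", "qcow2", overlay_abs]


print(overlay_create_cmd("/rhev/images/img1/base", "/rhev/images/img1/top"))
```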

Additionally, note that libvirt does not support turning locking off fully, since the statement from the QEMU team is that it "should work".

(There is one exception to this where libguestfs is accepting the risk of reading corrupted data and requests a way to disable locking in libvirt. See: https://bugzilla.redhat.com/show_bug.cgi?id=1519242 )

Comment 13 Han Han 2017-12-15 02:19:44 UTC
(In reply to Gil Klein from comment #10)
> (In reply to Han Han from comment #9)
> > Do we need to open a new bug for the issue in comment8? It affects the
> > function of creating snapshots in RHV.
> If we plan to enable locking, this will become a regression. I suggest we
> have a BZ to track it.

Per comment 12, I filed a new bug against qemu-kvm-rhev to track it:
https://bugzilla.redhat.com/show_bug.cgi?id=1526212

Comment 15 Michal Skrivanek 2018-06-18 10:05:14 UTC
Switched to using -U (unsafe) in all relevant places. oVirt/RHV doesn't plan to use this feature.
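A minimal Python 3 sketch of what passing -U to a read-only qemu-img call can look like, assuming a qemu-img new enough to accept `-U` (`--force-share`) for `info`. The helper names are hypothetical and this is not vdsm's actual code; only `info_cmd` is pure, and `image_info` needs qemu-img installed.

```python
import json
import subprocess


def info_cmd(path: str, force_share: bool = True) -> list:
    """argv for `qemu-img info`, optionally with -U (--force-share)."""
    cmd = ["qemu-img", "info", "--output=json"]
    if force_share:
        # -U opens the image without taking locks that would conflict with a
        # running VM's write lock; the data read may therefore be inconsistent.
        cmd.append("-U")
    cmd.append(path)
    return cmd


def image_info(path: str, force_share: bool = True) -> dict:
    """Run qemu-img info and parse its JSON output (requires qemu-img)."""
    result = subprocess.run(info_cmd(path, force_share), check=True,
                            capture_output=True, text=True)
    return json.loads(result.stdout)
```

The trade-off is the one discussed earlier in this bug: -U avoids the lock failures, but the caller accepts that it may read an image while another process is writing it.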

Comment 16 Franta Kust 2019-05-16 13:04:08 UTC
BZ<2>Jira Resync

