Bug 1589115

Summary: libvirt fails to chown memory snapshot on shared (NFS) storage
Product: Red Hat Enterprise Linux 7
Reporter: Martin Polednik <mpoledni>
Component: libvirt
Assignee: Michal Privoznik <mprivozn>
Status: CLOSED ERRATA
QA Contact: yisun
Severity: unspecified
Priority: unspecified
Version: 7.5
CC: dyuan, eshenitz, fjin, lmen, mprivozn, mzamazal, pkrempa, xuzhang, yafu
Target Milestone: rc
Keywords: Upstream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: libvirt-4.5.0-5.el7
Last Closed: 2018-10-30 09:55:54 UTC
Type: Bug
Attachments: domain.xml

Description Martin Polednik 2018-06-08 12:33:29 UTC
Description of problem:
When creating a snapshot that includes memory, libvirt (with dynamic_ownership enabled) attempts to chown the memory backing file to 0:0. This fails in a regular oVirt NFS setup (anonuid=36,anongid=36,all_squash). To work with dynamic_ownership enabled, oVirt needs libvirt either to respect NFS permissions or at least to allow developers to use a mechanism similar to seclabel (DAC) to selectively disable the dynamic_ownership behavior.
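To illustrate the failure mode, here is a toy model of the squashing rule; this is illustrative only, not libvirt or kernel code, with the anonuid/anongid values taken from the export options above:

```python
# Toy model of the all_squash chown failure -- illustrative only.
ANONUID, ANONGID = 36, 36  # from anonuid=36,anongid=36 in the export options

def squash(uid, gid):
    """Under all_squash, every client identity (including root) maps to
    the anonymous identity on the NFS server."""
    return ANONUID, ANONGID

def chown_allowed(client_uid, client_gid):
    """Only root may chown a file to an arbitrary owner; after squashing,
    the server-side identity is never root, so chown to 0:0 gets EPERM."""
    effective_uid, _ = squash(client_uid, client_gid)
    return effective_uid == 0

# libvirtd runs as root and asks for root:root ownership of the memory file,
# but the server sees uid 36, not uid 0:
print(chown_allowed(0, 0))  # False -> "cannot chown ... to (0, 0)"
```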

Version-Release number of selected component (if applicable):
libvirt-3.9.0-14.el7_5.5.x86_64
libvirt-daemon-3.9.0-14.el7_5.5.x86_64
libvirt-python-3.9.0-1.el7.x86_64
libvirt-daemon-driver-qemu-3.9.0-14.el7_5.5.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare an NFS share exported with (anonuid=36,anongid=36,all_squash), preferably for an oVirt host.
2. Use that NFS store as a target path for virDomainSnapshotCreateXML. An oVirt example of the <domainsnapshot> XML looks as follows:

<?xml version='1.0' encoding='utf-8'?>
<domainsnapshot>
  <disks>
    <disk name="sda" snapshot="external" type="file">
      <source file="/rhev/data-center/mnt/REDACTED/e96cb7b3-97d9-459a-98f0-4b38659dccca/images/cf999339-0f1b-4490-9279-b94064221b10/4c79cf0e-4a65-4af1-9164-273fa74dc1de" type="file"/>
    </disk>
  </disks>
  <memory file="/rhev/data-center/mnt/REDACTED/e96cb7b3-97d9-459a-98f0-4b38659dccca/images/ecdba3a1-d38c-4f12-ad4e-9ed07f734f9c/27c0049d-adc4-4022-bd3b-0c6a64188d20" snapshot="external" />
</domainsnapshot>

The permissions under that NFS domain typically look as follows:
# ls -l
total 1077400
-rw-rw----. 1 vdsm kvm     2949120 Jun  8 13:39 93769776-01d8-4a3c-9f7b-c55ced7e9d81
-rw-rw----. 1 vdsm kvm     1048576 Jun  8 13:36 93769776-01d8-4a3c-9f7b-c55ced7e9d81.lease
-rw-r--r--. 1 vdsm kvm         265 Jun  8 13:36 93769776-01d8-4a3c-9f7b-c55ced7e9d81.meta
-rw-rw----. 1 vdsm kvm 42949672960 Jun  8 13:36 e7bd7620-693e-4139-a688-542f133cc9cd
-rw-rw----. 1 vdsm kvm     1048576 May 31 11:33 e7bd7620-693e-4139-a688-542f133cc9cd.lease
-rw-r--r--. 1 vdsm kvm         327 Jun  8 13:36 e7bd7620-693e-4139-a688-542f133cc9cd.meta

Actual results:
Snapshot creation fails. From journal:
Jun 08 13:32:04 localhost.localdomain libvirtd[16323]: 2018-06-08 11:32:04.914+0000: 16327: error : virFileOpenForceOwnerMode:2153 : cannot chown '/rhev/data-center/mnt/REDACTED/e96cb7b3-97d9-459a-98f0-4b38659dccca/images/ecdba3a1-d38c-4f12-ad4e-9ed07f734f9c/27c0049d-adc4-4022-bd3b-0c6a64188d20' to (0, 0): Operation not permitted
Jun 08 13:32:04 localhost.localdomain libvirtd[16323]: 2018-06-08 11:32:04.918+0000: 16327: error : qemuOpenFileAs:3212 : Error from child process creating '/rhev/data-center/mnt/REDACTED/e96cb7b3-97d9-459a-98f0-4b38659dccca/images/ecdba3a1-d38c-4f12-ad4e-9ed07f734f9c/27c0049d-adc4-4022-bd3b-0c6a64188d20': Operation not permitted

Expected results:
The memory portion of the snapshot succeeds, and the resulting file has the correct permissions.

Additional info:
I am able to provide an oVirt environment to reproduce/investigate the issue. This behavior does not occur with dynamic_ownership=0.

Comment 2 Han Han 2018-06-11 08:00:45 UTC
The bug reproduces on libvirt-4.4 and libvirt-3.9.0-14.el7_5.6.x86_64 with these steps:
1. Set up an NFS server with export options (anonuid=36,anongid=36,all_squash,rw)
2. Mount the NFS share to /mnt
3. Create a domain snapshot with the following XML
<domainsnapshot>
  <disks>
    <disk name="vda" snapshot="external" type="file">
      <source file="/mnt/snap" type="file"/>
    </disk>
  </disks>
  <memory file="/mnt/mem" snapshot="external" />
</domainsnapshot>

Got following error:
# virsh -k0 -K0 snapshot-create pc snap.xml                                                                                                                              
error: Error from child process creating '/mnt/mem': Operation not permitted

Comment 3 Michal Privoznik 2018-06-21 09:49:17 UTC
(In reply to Martin Polednik from comment #0)
> Description of problem:
> When creating a snapshot that includes memory, libvirt (with enabled
> dynamic_ownership) attempts to chown the memory backing file to 0:0. This
> fails in regular oVirt NFS setup (anonuid=36,anongid=36,all_squash). To work
> with dynamic_ownership enabled, oVirt needs libvirt to respect NFS
> permissions or at least allow developers to use mechanism similar to
> seclabel (DAC) to selectively disable dynamic_ownership behavior.

Libvirt takes the DAC label of your domain and uses it to access the file, as a fallback if accessing it as root:root fails. What does your domain XML look like? I'm not fully convinced that you can access the NFS share (which squashes to 36:36) if your domain is running as, say, 40:40.
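For context, the DAC label referred to here appears in the domain XML as a <seclabel> element. A fragment of what libvirt reports for a running qemu:///system domain might look like the following; the 107:107 values are illustrative, not taken from the attached domain.xml:

```xml
<!-- Illustrative fragment: the DAC seclabel libvirt consults as a
     fallback identity. uid:gid values here are examples only. -->
<seclabel type='dynamic' model='dac' relabel='yes'>
  <label>+107:+107</label>
  <imagelabel>+107:+107</imagelabel>
</seclabel>
```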

Comment 4 Milan Zamazal 2018-06-22 20:36:02 UTC
Created attachment 1453847 [details]
domain.xml

Attaching an example domain.xml.

NFS should be normally accessible for all users as all_squash is used.

Comment 5 Michal Privoznik 2018-06-25 08:01:29 UTC
(In reply to Milan Zamazal from comment #4)
> Created attachment 1453847 [details]
> domain.xml
> 
> Attaching an example domain.xml.

So the domain is running under 107:107 but you have NFS squashed to 36:36.

> 
> NFS should be normally accessible for all users as all_squash is used.

Well, this is something libvirt can't know. Libvirt only knows that the domain is running as 107:107, that dynamicOwnership is set, and that the path where you're trying to save the memory is on NFS.
I see two options here:

1) Make dynamicOwnership best effort only on NFS mounts. But this has serious security implications on non-squashed NFS if chown() fails.

2) Invent a <seclabel/> for /domainsnapshot/memory so that a different uid:gid can be specified. But that seems like a hack to work around this one particular issue.
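Option 2) might look roughly like the following sketch. To be clear, this is purely hypothetical: no such <seclabel> child of /domainsnapshot/memory existed at the time, and the element name and attributes are invented for illustration:

```xml
<!-- Hypothetical sketch only; this element did not exist in libvirt. -->
<domainsnapshot>
  <memory file='/mnt/mem' snapshot='external'>
    <seclabel model='dac' relabel='no'/>
  </memory>
</domainsnapshot>
```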

Peter, I recall talking to you about this. Do you have any other ideas? I'm willing to go with 2) since it's the only viable solution.

Comment 6 Michal Privoznik 2018-06-26 13:59:31 UTC
I've posted a patch upstream to start some discussion:

https://www.redhat.com/archives/libvir-list/2018-June/msg01685.html

Basically, it's a variant of 1), because it does not enforce dynamicOwnership for the memory snapshot (which makes sense; see the commit message for the explanation).

Comment 10 yisun 2018-08-28 10:13:49 UTC
Test with:
qemu-kvm-rhev-2.12.0-10.el7.x86_64
libvirt-4.5.0-7.el7.x86_64
vdsm-4.20.31-1.el7ev.x86_64

Result:
1. create snapshot: PASSED
2. create snapshot with --reuse-external: FAILED (detailed info in test steps)

Steps are as follows:
1. On the vdsm host, prepare an NFS export
# cat /etc/exports
/home/nfs *(anonuid=36,anongid=36,all_squash,rw)

# ll /home/ | grep nfs
drwxr-xr-x. 2 vdsm    kvm     23 Aug 28 06:09 nfs

# service nfs restart
Redirecting to /bin/systemctl restart nfs.service

2. Mount the NFS share locally
# mount | grep /home/nfs
10.73.73.57:/home/nfs on /mnt type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.73.73.57,local_lock=none,addr=10.73.73.57)

3. Prepare a running VM and a snapshot XML
# cat /etc/libvirt/qemu.conf | grep dynamic_ownership
#dynamic_ownership = 1

# cat snap.xml 
<domainsnapshot>
<disks>
    <disk name="vda" snapshot="external" type="file">
        <source file="/mnt/disk.snap" type="file"/>
    </disk>
</disks>
<memory file="/mnt/mem.snap" snapshot="external" />
</domainsnapshot>

# virsh domstate vm2; virsh domblklist vm2
running

Target     Source
------------------------------------------------
vda        /var/lib/libvirt/images/vm2-1.qcow2

4. Create the snapshot
# virsh snapshot-create vm2 snap.xml 
Domain snapshot 1535449611 created from 'snap.xml'

# ll /mnt
total 495056
-rw-------. 1 vdsm kvm    196768 Aug 28 05:46 disk.snap
-rw-------. 1 vdsm kvm 506736552 Aug 28 05:46 mem.snap

5. Create the snapshot again with the --reuse-external flag
# virsh snapshot-create vm2 snap.xml --reuse-external
error: internal error: unable to execute QEMU command 'cont': Failed to get "write" lock
<===== failed with a lock issue

# virsh domstate vm2
paused
<===== VM paused

# ll /mnt
total 1668
-rw-------. 1 vdsm kvm 1769472 Aug 28 05:47 disk.snap
<===== memory snapshot gone, only the disk snapshot left

Please help check whether this is expected behaviour or whether I misconfigured something.

Comment 11 Michal Privoznik 2018-08-30 09:16:30 UTC
I've managed to reproduce the issue. But I think what you came across are two new bugs:

1) qemu fails to get the write lock because it collides with itself (after the first snapshot it has /mnt/disk.snap plugged in, and it tries to lock it a second time when --reuse-external is run).

2) libvirt deletes the memory even if the memory snapshot was done correctly.

I will post a patch for 2). Not sure how to handle 1). Maybe Peter has an idea?

Comment 12 Michal Privoznik 2018-08-30 09:25:10 UTC
Patch for 2) posted upstream:

https://www.redhat.com/archives/libvir-list/2018-August/msg01867.html

Comment 14 Peter Krempa 2018-08-30 11:02:53 UTC
Case 1 is simply incorrect usage. If you are using --reuse-external, you need to make sure that the image is unused. The data in the image _will be destroyed_!

It was correct for qemu to fail the snapshot for that reason. The only thing we could do is walk the backing chains to see whether the file is shared, but that would be a very complex fix for this. Additionally, it would not save you from using a file that is in use by some other VM.
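A sketch of the kind of backing-chain check described here; the helper functions and the parent map are hypothetical (real code would discover the chain by probing qcow2 metadata), and the paths come from the reproducer above:

```python
# Hypothetical sketch of a backing-chain check: given each image's backing
# file, refuse a --reuse-external target that already appears in a chain
# some running VM is using. Not libvirt code.
def backing_chain(image, backing_of):
    """Yield `image` and all its ancestors via backing-file links."""
    while image is not None:
        yield image
        image = backing_of.get(image)

def target_in_use(target, active_images, backing_of):
    """True if `target` appears in the chain of any image a VM is using."""
    return any(
        target in backing_chain(top, backing_of)
        for top in active_images
    )

# After the first snapshot, the overlay /mnt/disk.snap is plugged into the VM:
backing_of = {"/mnt/disk.snap": "/var/lib/libvirt/images/vm2-1.qcow2"}
active = ["/mnt/disk.snap"]
print(target_in_use("/mnt/disk.snap", active, backing_of))  # True: reuse collides
```

As the comment notes, even a check like this only covers this host's own chains; it cannot protect against a file shared with a VM elsewhere.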

Comment 15 Peter Krempa 2018-08-30 11:10:41 UTC
Also note that qemu should consider acquiring the write locks at the time when the transaction command is issued even when the vCPUs are paused, since that state can't be rolled back in any sane way.

Comment 16 Michal Privoznik 2018-08-30 11:23:06 UTC
Also, as Peter pointed out in the discussion of the patch for 2), removing the file actually is expected behaviour, so the patch will not be merged.

Comment 17 yisun 2018-09-04 03:39:32 UTC
(In reply to Michal Privoznik from comment #16)
> Also, as Peter pointed out in discussion to the patch for 2), removing the
> file actually is expected behaviour. The patch will not be merged then.

Thanks Michal and Peter; setting VERIFIED then.

Comment 18 Benny Zlotnik 2018-10-29 08:41:26 UTC
*** Bug 1636846 has been marked as a duplicate of this bug. ***

Comment 20 errata-xmlrpc 2018-10-30 09:55:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3113