Bug 1740024 - Refcount should not accumulate when the second vm using same disk fails to start
Summary: Refcount should not accumulate when the second vm using same disk fails to start
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.1
Hardware: x86_64
OS: Linux
high
high
Target Milestone: rc
: 8.0
Assignee: Michal Privoznik
QA Contact: yafu
URL:
Whiteboard:
Depends On:
Blocks: 1652078 1771500 1780154
TreeView+ depends on / blocked
 
Reported: 2019-08-12 06:23 UTC by Fangge Jin
Modified: 2020-11-14 07:26 UTC (History)
11 users (show)

Fixed In Version: libvirt-5.6.0-9.el8
Doc Type: Bug Fix
Doc Text:
Cause: When starting up a guest libvirt needs to prepare all the files that qemu is going to touch. This involves setting up security labels, i.e. DAC labels (aka user:group owners), and SELinux labels so that qemu can access desired files. At the same time, when 'remember_owner' is set in qemu.conf (or if commented out it's on by default), libvirt uses extended attributes (aka XATTRs) to record original labels, so that they can be restored back when qemu shuts down. This comes handy when the file is in user's $HOME (e.g. installation ISO) - because once qemu no longer uses it, the file has original owner/SELinux label and thus ordinary user doesn't lose access to it. However, the XATTRs are a bit tricky - each security driver (= piece of code that handles only DAC, or only SELinux) uses its own namespace (so that they are independent) and also has their own reference counter (the reference counter reflects how many times given file is in use). This means that each file has two reference counters - one for DAC and one for SELinux. Ideally, these are in sync - the file has both labels after all. Anyway, libvirt does perform a rollback, that is - a list of [path; desired DAC label] pairs is gathered and then set at once. The same applies for SELinux labels (this means we have two independent lists). If something goes wrong while processing the list, a rollback is performed => the original owner is restored. However, libvirt did not perform rollback on DAC list when applying SELinux list failed. Another way to look at this (leaving out 'remember_owner' feature completely) is: when setting up labels DAC labels are set first (i.e. chown) and only after that SELinux labels are set (i.e. chcon). But, if setting up DAC labels succeeded and setting up SELinux failed, the files were left with DAC labels changed which creates a vulnerability. Consequence: If setting up SELinux labels failed at some point, the XATTR refcounter for DAC would increase (if remember_owner is enabled, which is default), and/or all files configured for the domain (disks, sockets, ...) are left with the owner that domain was configured to run under (if remember_owner is disabled). Fix: The fix consists of performing a rollback on DAC driver when SELinux driver fails. Result: The XATTR refcounter for DAC no longer increases and files keep their original owner.
Clone Of:
: 1771500 (view as bug list)
Environment:
Last Closed: 2020-02-04 18:28:48 UTC
Type: Bug
Target Upstream Version:
Embargoed:
yafu: needinfo-


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:0404 0 None None None 2020-02-04 18:29:57 UTC

Description Fangge Jin 2019-08-12 06:23:55 UTC
Description of problem:
Refcount should not accumulate when the second vm using same disk fails to start

Version-Release number of selected component (if applicable):
libvirt-5.6.0-1.virtcov.el8.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare 2 vm using same disk image: 
/var/lib/libvirt/images/RHEL-8.1.0-20190806.2-x86_64.qcow2

2. Start the first vm:
# virsh start rhel7.6
Domain rhel7.6 started

3. Start the second vm(it fails to start, which is expected):
# virsh start q35
error: Failed to start domain q35
error: internal error: child reported (status=125): Requested operation is not valid: Setting different SELinux label on /var/lib/libvirt/images/RHEL-8.1.0-20190806.2-x86_64.qcow2 which is already in use

4. Destroy first vm:
# virsh destroy rhel7.6
Domain rhel7.6 destroyed

5. Check the selinux context of the disk image, it is not restored(which is not expected)
# ll /var/lib/libvirt/images/RHEL-8.1.0-20190806.2-x86_64.qcow2 -Z
-rw-r--r--. 1 qemu qemu system_u:object_r:svirt_image_t:s0:c140,c631 1314979840 Aug 12 02:06 /var/lib/libvirt/images/RHEL-8.1.0-20190806.2-x86_64.qcow2

6. Try to start the first vm, it fails to start(which is not expected)
# virsh start rhel7.6
error: Failed to start domain rhel7.6
error: internal error: child reported (status=125): Requested operation is not valid: Setting different SELinux label on /var/lib/libvirt/images/RHEL-8.1.0-20190806.2-x86_64.qcow2 which is already in use

Actual results:
As step 5&&6

Expected results:
Refcount of the disk image should not accumulate when the second vm fails to start.
In step5, disk image context should be restored and vm can start successfully in step6.

Additional info:
After line 1396, refcount should be minus 1.
1326 static int
1327 virSecuritySELinuxSetFileconHelper(virSecurityManagerPtr mgr,
1328                                    const char *path,
1329                                    const char *tcon,
1330                                    bool optional,
1331                                    bool remember)
1332 {
1333     bool privileged = virSecurityManagerGetPrivileged(mgr);
1334     security_context_t econ = NULL;
1335     int refcount;
1336     int rc;
1337     int ret = -1;
1338 
1339     if ((rc = virSecuritySELinuxTransactionAppend(path, tcon,
1340                                                   optional, remember, false)) < 0)
1341         return -1;
1342     else if (rc > 0)
1343         return 0;
1344 
1345     if (remember) {
1346         if (getfilecon_raw(path, &econ) < 0 &&
1347             errno != ENOTSUP && errno != ENODATA) {
1348             virReportSystemError(errno,
1349                                  _("unable to get SELinux context of %s"),
1350                                  path);
1351             goto cleanup;
1352         }
1353 
1354         if (econ) {
1355             refcount = virSecuritySELinuxRememberLabel(path, econ);
1356             if (refcount == -2) {
1357                 /* Not supported. Don't error though. */
1358             } else if (refcount < 0) {
1359                 goto cleanup;
1360             } else if (refcount > 1) {
1361                 /* Refcount is greater than 1 which means that there
1362                  * is @refcount domains using the @path. Do not
1363                  * change the label (as it would almost certainly
1364                  * cause the other domains to lose access to the
1365                  * @path). */
1366                 if (STRNEQ(econ, tcon)) {
1367                     virReportError(VIR_ERR_OPERATION_INVALID,
1368                                    _("Setting different SELinux label on %s "
1369                                      "which is already in use"), path);
1370                     goto cleanup;
1371                 }
1372             }
1373         }
1374     }

Comment 1 yafu 2019-08-13 09:21:12 UTC
Also meet the same issue when trying to attach the same image to two guests.

Reproduce steps:
1.Prepare two running guest;

2.Prepare a image:
#qemu-img create -f qcow2 test.qcow2 100M

3.Attach the disk to one guest:
#virsh attach-disk iommu2 /var/lib/libvirt/images/test.qcow2 sdb --subdriver qcow2
Disk attached successfully

4.Attach the disk to another guest:
# virsh attach-disk vm2 /var/lib/libvirt/images/test.qcow2 sdc --subdriver qcow2
error: Failed to attach disk
error: internal error: child reported (status=125): Requested operation is not valid: Setting different SELinux label on /var/lib/libvirt/images/test.qcow2 which is already in use

5.Detach the disk from guest one:
#virsh detach-disk iommu2 sdb
Disk detached successfully

6.Attach the disk to guest again:
# virsh attach-disk vm2 /var/lib/libvirt/images/test.qcow2 sdc --subdriver qcow2
error: Failed to attach disk
error: internal error: child reported (status=125): Requested operation is not valid: Setting different SELinux label on /var/lib/libvirt/images/test.qcow2 which is already in use

Comment 2 yafu 2019-08-15 03:49:31 UTC
Hi, Michal,

Would you help to check this bug please? I guess it is caused by the patches for Bug 1652078 - RFE: the security drivers must remember original permissions/labels and restore them after.
And this bug will cause the guest can not start any more easily.
Thanks a lot.

Comment 3 Michal Privoznik 2019-08-22 12:31:40 UTC
Patch proposed upstream:

https://www.redhat.com/archives/libvir-list/2019-August/msg01035.html

Comment 4 Michal Privoznik 2019-08-22 15:22:21 UTC
Fixed upstream as:

6a2806fd54 security: Don't increase XATTRs refcounter on failure

v5.6.0-298-g6a2806fd54

Comment 7 Luyao Huang 2019-09-05 07:53:25 UTC
Fail to verify this bug with libvirt-5.6.0-4.module+el8.1.0+4160+b50057dc.x86_64, the dac ref still increased when start a guest failed:

# virsh start vm1
Domain vm1 started

# getfattr -m trusted.libvirt.security -d /var/lib/libvirt/images/RHEL-8.1-x86_64-latest2.qcow2
getfattr: Removing leading '/' from absolute path names
# file: var/lib/libvirt/images/RHEL-8.1-x86_64-latest2.qcow2
trusted.libvirt.security.dac="+0:+0"
trusted.libvirt.security.ref_dac="1"
trusted.libvirt.security.ref_selinux="1"
trusted.libvirt.security.selinux="system_u:object_r:virt_image_t:s0"
trusted.libvirt.security.timestamp_dac="1567565944"
trusted.libvirt.security.timestamp_selinux="1567565944"

# virsh start vm2
error: Failed to start domain vm2
error: internal error: child reported (status=125): Requested operation is not valid: Setting different SELinux label on /var/lib/libvirt/images/RHEL-8.1-x86_64-latest2.qcow2 which is already in use

# getfattr -m trusted.libvirt.security -d /var/lib/libvirt/images/RHEL-8.1-x86_64-latest2.qcow2
getfattr: Removing leading '/' from absolute path names
# file: var/lib/libvirt/images/RHEL-8.1-x86_64-latest2.qcow2
trusted.libvirt.security.dac="+0:+0"
trusted.libvirt.security.ref_dac="2"
trusted.libvirt.security.ref_selinux="1"
trusted.libvirt.security.selinux="system_u:object_r:virt_image_t:s0"
trusted.libvirt.security.timestamp_dac="1567565944"
trusted.libvirt.security.timestamp_selinux="1567565944"

# virsh destroy vm1
Domain vm1 destroyed

# getfattr -m trusted.libvirt.security -d /var/lib/libvirt/images/RHEL-8.1-x86_64-latest2.qcow2
getfattr: Removing leading '/' from absolute path names
# file: var/lib/libvirt/images/RHEL-8.1-x86_64-latest2.qcow2
trusted.libvirt.security.dac="+0:+0"
trusted.libvirt.security.ref_dac="1"
trusted.libvirt.security.timestamp_dac="1567565944"

# ll -Z /var/lib/libvirt/images/RHEL-8.1-x86_64-latest2.qcow2
-rw-r--r--. 1 qemu qemu system_u:object_r:virt_image_t:s0 769589248 Sep  5 03:47 /var/lib/libvirt/images/RHEL-8.1-x86_64-latest2.qcow2

Comment 8 Michal Privoznik 2019-09-16 09:13:42 UTC
Patches posted upstream:

https://www.redhat.com/archives/libvir-list/2019-September/msg00610.html

Comment 9 Han Han 2019-09-23 07:16:03 UTC
Reproduced on libvirt-5.6.0-6.module+el8.1.0+4244+9aa4e6bb.x86_64 qemu-kvm-4.1.0-10.module+el8.1.0+4234+33aa4f57.x86_64.
My issue is the same scenario as this.

Comment 12 Michal Privoznik 2019-10-14 15:31:37 UTC
Patches pushed upstream:

9d03e9adf1 security_stack: Perform rollback if one of stacked drivers fails
cd355a526f security_stack: Turn list of nested drivers into a doubly linked list
3f968a8706 security: Introduce virSecurityManagerGetDriver()
81dbceea65 security: Rename virSecurityManagerGetDriver() to virSecurityManagerGetVirtDriver()
458d0a8c52 security: Pass @migrated to virSecurityManagerSetAllLabel

v5.8.0-134-g9d03e9adf1

Comment 21 yafu 2019-11-20 07:45:32 UTC
Verified with libvirt-5.6.0-7.module+el8.1.1+4483+2f45aaa2.x86_64.

Test steps:
# virsh start vm1
Domain vm1 started

# getfattr -m trusted.libvirt.security -d /var/lib/libvirt/images/78.qcow2 
getfattr: Removing leading '/' from absolute path names
# file: var/lib/libvirt/images/78.qcow2
trusted.libvirt.security.dac="+0:+0"
trusted.libvirt.security.ref_dac="1"
trusted.libvirt.security.ref_selinux="1"
trusted.libvirt.security.selinux="unconfined_u:object_r:virt_image_t:s0"
trusted.libvirt.security.timestamp_dac="1573809643"
trusted.libvirt.security.timestamp_selinux="1573809643"

# virsh start vm2
error: Failed to start domain vm2
error: internal error: child reported (status=125): Requested operation is not valid: Setting different SELinux label on /var/lib/libvirt/images/78.qcow2 which is already in use

#getfattr -m trusted.libvirt.security -d /var/lib/libvirt/images/78.qcow2 
getfattr: Removing leading '/' from absolute path names
# file: var/lib/libvirt/images/78.qcow2
trusted.libvirt.security.dac="+0:+0"
trusted.libvirt.security.ref_dac="1"
trusted.libvirt.security.ref_selinux="1"
trusted.libvirt.security.selinux="unconfined_u:object_r:virt_image_t:s0"
trusted.libvirt.security.timestamp_dac="1573809643"
trusted.libvirt.security.timestamp_selinux="1573809643"

# virsh destroy vm1
Domain vm1 destroyed

#getfattr -m trusted.libvirt.security -d /var/lib/libvirt/images/78.qcow2
no output

#virsh start vm1
Domain vm1 started

Also test disk hotplug, and other devices: such as: memory, console/serial device, input device, etc.

Comment 22 yafu 2019-12-04 03:42:48 UTC
Hi Michal,

The xattr was cleared when trying to start the second vm using same disk with libvirt-5.6.0-9.x86_64. And it works well with libvirt-5.6.0-7.x86_64.
Could you help to check the issue please?
Thanks.

Test steps:
1.# virsh start vm1
Domain vm1 started

2.# getfattr -m trusted.libvirt.security -d /var/lib/libvirt/images/test.qcow2
getfattr: Removing leading '/' from absolute path names
# file: var/lib/libvirt/images/test.qcow2
trusted.libvirt.security.dac="+0:+0"
trusted.libvirt.security.ref_dac="1"
trusted.libvirt.security.ref_selinux="1"
trusted.libvirt.security.selinux="system_u:object_r:virt_image_t:s0"
trusted.libvirt.security.timestamp_dac="1574640796"
trusted.libvirt.security.timestamp_selinux="1574640796"

3.Start another guest using /var/lib/libvirt/images/test.qcow2:
# virsh start vm2
error: Failed to start domain vm2
error: internal error: child reported (status=125): Requested operation is not valid: Setting different SELinux label on /var/lib/libvirt/images/test.qcow2 which is already in use

4.The xattr was cleared after step3:
# getfattr -m trusted.libvirt.security -d /var/lib/libvirt/images/test.qcow2
no output

Comment 23 Michal Privoznik 2019-12-04 16:42:16 UTC
Yes, this is a bug in the patch I've pushed. Proposing the fix here:

https://www.redhat.com/archives/libvir-list/2019-December/msg00246.html

Comment 24 Michal Privoznik 2019-12-31 06:04:16 UTC
Actually, I've backported the patches needed as a part of bug 1741456. Moving back to ON_QA.

Comment 26 yafu 2020-01-03 02:51:25 UTC
Track the issue in the comment 22 in bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1780154

And according to comment 21-24, move this bug to VERIFIED.

Comment 28 errata-xmlrpc 2020-02-04 18:28:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0404


Note You need to log in before you can comment on or make changes to this bug.