RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1823976 - Libdevmapper fails with "Device %s not found" when calling dm_task_set_name()
Summary: Libdevmapper fails with "Device %s not found" when calling dm_task_set_name()
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.7
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: rc
: ---
Assignee: Michal Privoznik
QA Contact: Han Han
URL:
Whiteboard:
Depends On:
Blocks: 1834353
TreeView+ depends on / blocked
 
Reported: 2020-04-14 23:44 UTC by Eric Wheeler
Modified: 2021-12-10 12:52 UTC (History)
13 users (show)

Fixed In Version: libvirt-4.5.0-36.el7
Doc Type: Bug Fix
Doc Text:
Cause: When starting a domain with namespaces enabled, libvirt creates a private /dev for the domain and creates all nodes that domain is configured to have there. However, this is not a simple copy of the top level nodes (say /dev/sda). If a device consists of some other devices (e.g. multipath targets), then those dependent devices have to be created too. This is done recursively for every path that is created in the private /dev. To determine dependent devices, libvirt uses libdevmapper. However, due to a bug in the way libvirt asks, it was possible to mistakenly ask for a different device. For instance, when libvirt issues the API and passes "/dev/loop0", the libdevmapper sanitizes the output and uses the last component only. So effectively, libvirt is asking for dependent devices of "/dev/mapper/loop0" which in general is completely different device and might not event appear in the domain configuration. Consequence: Domains may get a different disk than configured. Fix: The fix consists of firstly asking libdevmapper: Hey, is this a device you manage? And only if it answers yes, then libvirt asks: alright, gimme all dependent devices, please. Result: Not only domains now get the disk they wanted, but also the namespace construction is a tiny bit faster because libvirt doesn't have to call expensive "get all dependent devices" operation on each path.
Clone Of:
: 1834353 (view as bug list)
Environment:
Last Closed: 2020-09-29 20:29:01 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Test script for dm_set_task_name() (282 bytes, text/x-csrc)
2020-04-14 23:44 UTC, Eric Wheeler
no flags Details
XML for test VM (2.61 KB, text/plain)
2020-04-14 23:44 UTC, Eric Wheeler
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2020:4000 0 None None None 2020-09-29 20:29:40 UTC

Description Eric Wheeler 2020-04-14 23:44:00 UTC
Created attachment 1678836 [details]
Test script for dm_set_task_name()

Description of problem:

This problem is easiest to see by creating a KVM virtual machine that uses a disk referencing a block device that is not a device mapper device but where a device mapper device with the same name also exists. The easiest example is to do this with the device /dev/loop0 and another device /dev/mapper/loop0. Note that /dev/loop0 and /dev/mapper/loop0 have nothing to do with each other, but they share the same filename (but not path!).

When libvirt starts a virtual machine, it reaches virDevMapperGetTargetsImpl() in virdevmapper.c which attempts to call dm_task_set_name() on /dev/loop0.

dm_task_set_name() realizes that this is a path, and calls _dm_task_set_name_from_path() which attempts to strip the name from the path ("loop0") and assumes that this could be the same as /dev/mapper/loop0. However, it is smart enough to check that st_rdev is the same for both /dev/loop0 and /dev/mapper/loop0. Since they are different (of course), it then invokes _find_dm_name_of_device() to scan the /dev/mapper directory for an entry that has the same st_rdev value, which of course it will not find. Since it cannot be found, it returns the rather confusing error from device mapper:
    "Device loop0 not found"

And since errno is set to 0 ("Success", not ENOENT), virDevMapperGetTargetsImpl() then gives the confusing error:
    "error: Unable to get devmapper targets for /dev/loop0: Success"

This is only an issue because there exists a filename /dev/mapper/loop0. If we remove /dev/mapper/loop0, but keep the virtual machine booting from /dev/loop0, we still get the same error "Device loop0 not found" from device mapper, but this time errno is set to ENOENT so virDevMapperGetTargetsImpl() ignores the failure and boots the VM because of this branch in the function:
    if (!dm_task_set_name(dmt, path)) {
        if (errno == ENOENT) {
            /* It's okay, @path is not managed by devmapper =>
             * not a devmapper device. */
            ret = 0;
        }
        goto cleanup;
    }

Note that the code above only assumes that "@path is not managed by devmapper" because dm_task_set_name() is expected to set errno == ENOENT, but it does not.

It is unclear whether this bug lies in libvirt or libdevmapper. Perhaps libvirt should not invoke libdevmapper calls on non-device mapper paths (check major = 253?)---or perhaps libdevmapper should fail early if the referenced path is a block device and does not have a major of 253. In the latter case, ENOENT is probably nonsensical because /dev/loop0 does exist but is not a device mapper target, so perhaps EINVALID is a better errno value in this case. At the moment I believe errno is being set by libc calls and not by libdevmapper directly so it may or may not be appropriate for libdevmapper to set errno. If errno were set to EINVALID by libdevmapper, then libvirt would have to support that as an "okay" reason to set ret = 0 and goto cleanup in virDevMapperGetTargetsImpl().

In any event, it may be necessary for both libdevmapper and libvirt teams to decide what the best solution is here, both packages may need to be modified to solve the problem.

Version-Release number of selected component (if applicable):

device-mapper-1.02.158-2.el7_7.2.x86_64
lvm2-2.02.185-2.el7_7.2.x86_64
libvirt-4.5.0-23.el7_7.6.x86_64


How reproducible: very


Steps to Reproduce:
1. Create a device mapper target called "loop0":
~]# dmsetup create --table '0 10240 zero' loop0

2. Create a loopback device to boot a virtual machine:
~]# losetup /dev/loop0 /some/bootimg.raw

3. virsh start someVmName:
~]# virsh start someVmName
error: Failed to start domain someVmName
error: Unable to get devmapper targets for /dev/loop0: Success

4. Remove the DM loop0 device and try again:
~]# dmsetup remove loop0
~]# virsh start someVmName
Domain someVmName started


Actual results:

VM fails to start with error:
error: Failed to start domain someVmName
error: Unable to get devmapper targets for /dev/loop0: Success


Expected results:

VM should start


Additional info:

In order to test the device mapper segment of this problem we wrote a small .c program (attached) which you can invoke as follows:
# To build:
~]# gcc -ldevmapper test-dm_set_task_name.c -o test-dm_set_task_name

# To test the case where libvirt would not give an error because it detects ENOENT:
~]# ./test-dm_set_task_name /dev/loop0
dm version   [ opencount flush ]   [16384] (*1)
Device loop0 not found
errno: No such file or directory

# To test the case where libvirt would give an error because errno = "Success"
~]# dmsetup create --table '0 10240 zero' loop0
~]# ./test-dm_set_task_name /dev/loop0 
dm version   [ opencount flush ]   [16384] (*1)
Device loop0 not found
errno: Success

Comment 2 Eric Wheeler 2020-04-14 23:44:20 UTC
Created attachment 1678837 [details]
XML for test VM

Comment 3 Alasdair Kergon 2020-04-15 00:26:04 UTC
The library returns 0 on failure, 1 on success.  That is all.  We don't provide errno.

So the snippet of code you pasted is relying on undefined behaviour, as is the attachment.

You'd need to look at more context around the original code to work out what it wanted to do and then work out what needs to change.

Comment 4 Alasdair Kergon 2020-04-15 00:31:30 UTC
Also 253 is a dynamic assignment - don't attempt to fix this by hard-coding that number.

Comment 5 Eric Wheeler 2020-04-15 18:37:56 UTC
Here is a link to the libvirt code (function virDevMapperGetTargetsImpl() on line 63): https://github.com/libvirt/libvirt/blob/master/src/util/virdevmapper.c

It sounds like the libvirt team should not use dm_set_task_name() to simultaneously detect whether or not the device is a device mapper device and set the task name.

What function should they be using instead to detect if the device is a device mapper device?

Also, if you believe this is a libvirt issue, please add them to the ticket.

Comment 6 bugzilla 2020-04-21 19:08:43 UTC
Alasdair,

Can you re-assign this bug to libvirt?

It does not let me adjust the component.

-Eric

Comment 7 Eric Wheeler 2020-04-23 23:06:24 UTC
Alasdair, it looks like I was able to reassign this to libvirt. Which function should they be using prior to dm_set_task_name to check if a device is managed by DM?

Comment 8 Alasdair Kergon 2020-04-23 23:09:19 UTC
Depends what the starting point is.
If they have a major number, we offer:

/*
 * Determine whether a major number belongs to device-mapper or not.
 */
int dm_is_dm_major(uint32_t major);

Comment 9 Jaroslav Suchanek 2020-04-24 09:08:02 UTC
Miso, can you please have a look. I assume this is not a rhel7 only problem. Thanks.

Comment 10 Han Han 2020-04-24 09:17:15 UTC
Reproduced on qemu-kvm-5.0.0-0.scrmod+el8.3.0+6312+cee4f348.x86_64 libvirt v6.2.0-191-gde011c60a1

Comment 11 Michal Privoznik 2020-04-24 10:09:58 UTC
(In reply to Alasdair Kergon from comment #8)
> Depends what the starting point is.
> If they have a major number, we offer:
> 
> /*
>  * Determine whether a major number belongs to device-mapper or not.
>  */
> int dm_is_dm_major(uint32_t major);

We have a path, e.g. a path to the disk for a VM. I can stat() it to get the major number. But since I'm looking into this and apparently libvirt is not doing the right thing, let me ask you how it should do it. I mean, libvirt is given the path to a device. Libvirt inspects the path so it allows the device in CGroups, create the the device in namespace (libvirt creates a mount namespace and manages private /dev for each virtual machine), etc. The thing is, if the device is a multipath device, libvirt must allow all targets creating the multipath device in CGroups (see bug 1557769). So what is the best way to get them? So far libvirt's using dm_task_create(DM_DEVICE_DEPS) + dm_task_set_name() (as Eric correctly points out):

https://github.com/libvirt/libvirt/blob/master/src/util/virdevmapper.c#L85

Comment 12 Michal Privoznik 2020-05-11 11:30:22 UTC
I've proposed the patch on the upstream list:

https://www.redhat.com/archives/libvir-list/2020-May/msg00392.html

Comment 13 Michal Privoznik 2020-05-11 13:46:30 UTC
I've merged the referenced patch upstream:

01626c668e virDevMapperGetTargetsImpl: quit early if device is not a devmapper target

v6.3.0-35-g01626c668e

Comment 17 Han Han 2020-05-18 10:09:26 UTC
Verified on libvirt-4.5.0-36.el7.x86_64 qemu-kvm-rhev-2.12.0-47.el7.x86_64:
1. Create loop0 by device mapper, create loop0 loop device
➜  ~ dmsetup create --table '0 10240 zero' loop0

➜  ~ dd if=/dev/zero of=new bs=1M count=10 
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.00452615 s, 2.3 GB/s
➜  ~ losetup /dev/loop0 new 

2. Start an VM with /dev/loop0
    </disk><disk type="block" device="disk">
      <driver name="qemu" type="raw"/>
      <source dev="/dev/loop0"/>
      <backingStore/>
      <target dev="vdb" bus="virtio"/>
      <alias name="virtio-disk1"/>
      <address type="pci" domain="0x0000" bus="0x07" slot="0x00" function="0x0"/>
    </disk>

➜  ~ virsh start seabios --console
Domain seabios started


3. Detach the vdb
➜  ~ virsh detach-disk seabios vdb                 
Disk detached successfully

Comment 19 errata-xmlrpc 2020-09-29 20:29:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: libvirt security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:4000


Note You need to log in before you can comment on or make changes to this bug.