Bug 1526197 - Hotplug failed because libvirtError: internal error: unable to execute QEMU command '__com.redhat_drive_add': Device 'drive-virtio-disk7' could not be initialized
Summary: Hotplug failed because libvirtError: internal error: unable to execute QEMU command '__com.redhat_drive_add': Device 'drive-virtio-disk7' could not be initialized
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.1.6
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-4.2.2
Target Release: ---
Assignee: Daniel Erez
QA Contact: Raz Tamir
URL:
Whiteboard:
Depends On: 1469235 1532183
Blocks: 1547358 1554290
 
Reported: 2017-12-14 22:36 UTC by Germano Veit Michel
Modified: 2021-12-10 15:29 UTC
CC List: 26 users

Fixed In Version: vdsm v4.20.22
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1469235
Cloned To: 1554290
Environment:
Last Closed: 2018-05-15 17:54:02 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:
lsvaty: testing_plan_complete-




Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1506072 1 None None None 2021-09-09 12:44:51 UTC
Red Hat Bugzilla 1528502 1 None None None 2021-08-30 11:50:24 UTC
Red Hat Knowledge Base (Solution) 3260121 0 None None None 2017-12-14 22:51:59 UTC
Red Hat Product Errata RHEA-2018:1489 0 None None None 2018-05-15 17:55:30 UTC
oVirt gerrit 88712 0 master MERGED spec: Require libvirt to fix hotplug issues 2020-08-06 15:29:33 UTC
oVirt gerrit 88743 0 ovirt-4.2 MERGED spec: Require libvirt to fix hotplug issues 2020-08-06 15:29:33 UTC

Internal Links: 1506072 1528502

Comment 1 Germano Veit Michel 2017-12-14 22:41:29 UTC
These versions also reproduce the issue:

libvirt-daemon-kvm-3.2.0-14.el7_4.3.x86_64
systemd-219-42.el7_4.4.x86_64
vdsm-4.19.36-1.el7ev.x86_64

We also have a report that the operation succeeds when the backup VM (Commvault) and the VM being snapshotted run on the same host.

Raz, somehow I can't reproduce this. Could you please give me access to your reproduction env? 
I assume we can get a better idea of where the EPERM comes from using stap/ftrace.

When attaching the snapshot to the backup VM, it looks like qemu is able to open the first image of the chain, and the subsequent one fails:

Could not open backing file: Could not open '/rhev/data-center/58bdc405-028f-00dc-0026-0000000000aa/a3a4dfce-ca1b-4453-a40b-dc43c4f2191a/images/1c28bd60-8556-40fd-8229-f52bdd51ca7a/25165986-adaa-4454-a7de-2d757b34e8f5': Operation not permitted

Comment 2 Raz Tamir 2017-12-19 12:02:45 UTC
(In reply to Germano Veit Michel from comment #1)
> These versions also reproduce the issue:
> 
> libvirt-daemon-kvm-3.2.0-14.el7_4.3.x86_64
> systemd-219-42.el7_4.4.x86_64
> vdsm-4.19.36-1.el7ev.x86_64
> 
> We also have a report that the operation succeeds when the backup VM
> (Commvault) and the VM being snapshotted run on the same host.
> 
> Raz, somehow I can't reproduce this. Could you please give me access to your
> reproduction env? 
I'm not sure you are trying the same scenario mentioned in the original bug.
What are the steps you are doing?
> I assume we can get a better idea of where the EPERM comes from using
> stap/ftrace.
> 
> When attaching the snapshot to the backup VM, it looks like qemu is able to
> open the first image of the chain, and the subsequent one fails:
> 
> Could not open backing file: Could not open
> '/rhev/data-center/58bdc405-028f-00dc-0026-0000000000aa/a3a4dfce-ca1b-4453-
> a40b-dc43c4f2191a/images/1c28bd60-8556-40fd-8229-f52bdd51ca7a/25165986-adaa-
> 4454-a7de-2d757b34e8f5': Operation not permitted

Comment 4 Germano Veit Michel 2017-12-20 04:59:12 UTC
(In reply to Raz Tamir from comment #2)
> I'm not sure you are trying the same scenario mentioned in the original bug.
> What are the steps you are doing?

I'm using the backup api script in a loop (similar to what Commvault and our customers do). But all the snapshot attachments work. What are you using? This?

> On VM with 4 disks - A, B, C, D where A is the bootable disk with OS. After
> starting the VM, if we will hot-unplug the 3 disks in order B, C and D and
> after that will try to hotplug back disk C, it will fail with:

Comment 5 Raz Tamir 2017-12-20 08:35:30 UTC
(In reply to Germano Veit Michel from comment #4)
> (In reply to Raz Tamir from comment #2)
> > I'm not sure you are trying the same scenario mentioned in the original bug.
> > What are the steps you are doing?
> 
> I'm using the backup api script in a loop (similar to what Commvault and our
> customers do). But all the snapshot attachments work. What are you using?
> This?
> 
> > On VM with 4 disks - A, B, C, D where A is the bootable disk with OS. After
> > starting the VM, if we will hot-unplug the 3 disks in order B, C and D and
> > after that will try to hotplug back disk C, it will fail with:

Yes.

You can try steps 1-3 and only then use the backup API method and attach the snapshot disk. I think it will trigger the issue.

Comment 6 Germano Veit Michel 2017-12-21 06:42:16 UTC
(In reply to Raz Tamir from comment #5)
> Yes.
> 
> You can try steps 1-3 and only then use the backup API method and attach
> the snapshot disk. I think it will trigger the issue.

Thanks Raz, I was finally able to reproduce it.

systemd-219-42.el7_4.4.x86_64
libvirt-daemon-kvm-3.2.0-14.el7_4.5.x86_64
vdsm-4.19.42-1.el7ev.x86_64
kernel-3.10.0-693.11.1.el7.x86_64
qemu-kvm-rhev-2.9.0-16.el7_4.11.x86_64

It seems only reproducible on Block Storage (on NFS I could not reproduce it).

These are the conditions I found:
VM with 4 disks, A, B, C, D
Boot
Unplug B, C and D
All hotplugs will fail unless I hotplug B. If B is plugged, the other 2 seem to work.
May not be 100% accurate, but it's what I found after playing with it.

Moving forward, it looks like the problem shows up as an open syscall from qemu returning EPERM; it doesn't seem related to SELinux.

# getenforce 
Permissive
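
To further rule out SELinux, AVC denials can also be checked in the audit log (a generic auditd check, not output from this reproducer):

# ausearch -m avc -ts recent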

# strace -p 3521 -e trace=open
strace: Process 3521 attached
open("/rhev/data-center/00000002-0002-0002-0002-000000000010/18579785-4d47-4237-9f53-3a2344e0e475/images/21d315fe-4516-4219-b484-87f716f6c2e2/673f331c-a599-4d43-bd24-3355042b41b8", O_RDONLY|O_NONBLOCK|O_CLOEXEC) = -1 EPERM (Operation not permitted)
open("/rhev/data-center/00000002-0002-0002-0002-000000000010/18579785-4d47-4237-9f53-3a2344e0e475/images/21d315fe-4516-4219-b484-87f716f6c2e2/673f331c-a599-4d43-bd24-3355042b41b8", O_RDWR|O_DIRECT|O_CLOEXEC) = -1 EPERM (Operation not permitted)

So open returns -1. It's a bit weird that it's an EPERM (not EACCES).

Then I created an audit rule to track "open" and captured both a success and a failure opening the same disk:
# auditctl -a always,exit -F arch=b64 -F pid=3521 -S open -k qemu-open
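
Records tagged with that key can then be pulled back out of the audit log (standard auditd tooling), e.g.:

# ausearch -k qemu-open -i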

FAIL:
time->Thu Dec 21 14:30:27 2017
type=PROCTITLE msg=audit(1513830627.873:3248): proctitle=2F7573722F6C6962657865632F71656D752D6B766D002D6E616D650067756573743D54657374564D2C64656275672D746872656164733D6F6E002D53002D6F626A656374007365637265742C69643D6D61737465724B6579302C666F726D61743D7261772C66696C653D2F7661722F6C69622F6C6962766972742F71656D752F
type=PATH msg=audit(1513830627.873:3248): item=0 name="/rhev/data-center/00000002-0002-0002-0002-000000000010/18579785-4d47-4237-9f53-3a2344e0e475/images/21d315fe-4516-4219-b484-87f716f6c2e2/673f331c-a599-4d43-bd24-3355042b41b8" inode=53420 dev=00:27 mode=060660 ouid=36 ogid=107 rdev=fc:09 obj=system_u:object_r:svirt_image_t:s0:c83,c485 objtype=NORMAL
type=CWD msg=audit(1513830627.873:3248):  cwd="/"
type=SYSCALL msg=audit(1513830627.873:3248): arch=c000003e syscall=2 success=no exit=-1 a0=5638be345340 a1=84002 a2=0 a3=7fd3d2803b00 items=1 ppid=1 pid=3521 auid=4294967295 uid=107 gid=107 euid=107 suid=107 fsuid=107 egid=107 sgid=107 fsgid=107 tty=(none) ses=4294967295 comm="qemu-kvm" exe="/usr/libexec/qemu-kvm" subj=system_u:system_r:svirt_t:s0:c83,c485 key="qemu-open"

SUCCESS:
time->Thu Dec 21 14:31:41 2017
type=PROCTITLE msg=audit(1513830701.500:3321): proctitle=2F7573722F6C6962657865632F71656D752D6B766D002D6E616D650067756573743D54657374564D2C64656275672D746872656164733D6F6E002D53002D6F626A656374007365637265742C69643D6D61737465724B6579302C666F726D61743D7261772C66696C653D2F7661722F6C69622F6C6962766972742F71656D752F
type=PATH msg=audit(1513830701.500:3321): item=0 name="/rhev/data-center/00000002-0002-0002-0002-000000000010/18579785-4d47-4237-9f53-3a2344e0e475/images/21d315fe-4516-4219-b484-87f716f6c2e2/673f331c-a599-4d43-bd24-3355042b41b8" inode=53420 dev=00:27 mode=060660 ouid=36 ogid=107 rdev=fc:09 obj=system_u:object_r:svirt_image_t:s0:c83,c485 objtype=NORMAL
type=CWD msg=audit(1513830701.500:3321):  cwd="/"
type=SYSCALL msg=audit(1513830701.500:3321): arch=c000003e syscall=2 success=yes exit=77 a0=5638be346730 a1=80800 a2=0 a3=ffffff80 items=1 ppid=1 pid=3521 auid=4294967295 uid=107 gid=107 euid=107 suid=107 fsuid=107 egid=107 sgid=107 fsgid=107 tty=(none) ses=4294967295 comm="qemu-kvm" exe="/usr/libexec/qemu-kvm" subj=system_u:system_r:svirt_t:s0:c83,c485 key="qemu-open"

It seems everything is exactly the same? One succeeds and one fails?

I tried stap to get some more info, but couldn't get far enough with my current stap skills; it pretty much just confirmed the problem:

probe kernel.function("sys_open").return {
    if (pid() != 2090)
        next
    if ($return != -1)
        next
    printf("open returned -1, performed by %s, pid %d\n", execname(), pid())
    printf("  %s\n", @entry($$parms))    
    next
}

open returned -1, performed by qemu-kvm, pid 2038
  filename=0x55beedebdad0 flags=0x80800 mode=0x0
open returned -1, performed by qemu-kvm, pid 2038
  filename=0x55beedebdad0 flags=0x84002 mode=0x0

I would like to get a "forward trace" of all the functions sys_open called, to get an idea of where the -1 is coming from. But with a return probe, that is not going to happen.
And ftrace is showing too much data (mostly selinux stuff), pretty much unreadable.

And this is where I stopped today. Any ideas how to move forward? I think I'll try downgrading some packages tomorrow. Looking at the customer cases, this seems to have started with 7.4?

Comment 7 Germano Veit Michel 2017-12-22 00:37:29 UTC
Tracked it down to cgroups within the kernel:

# cat qemu_open.stp 
probe kernel.function("do_sys_open").return {
    if (pid() != 2632) 
        next
    if ($return != -1)
        next
    printf("do_sys_open returned -1, performed by %s, pid %d\n", execname(), pid())
    printf("  %s\n", @entry($$parms))
}

probe kernel.function("__devcgroup_check_permission").return {
    if (pid() != 2632) 
        next
    if ($return != -1)
        next
    printf("__devcgroup_check_permission returned -1, performed by %s, pid %d\n", execname(), pid())
    printf("  %s\n", @entry($$parms))
}

During failure, we have this:

# stap qemu_open.stp 
__devcgroup_check_permission returned -1, performed by qemu-kvm, pid 2632
  type=0x1 major=0xfc minor=0x9 access=0x2
do_sys_open returned -1, performed by qemu-kvm, pid 2632
  dfd=0xffffffffffffff9c filename=0x56354c7dfa20 flags=0x88800 mode=0x0
__devcgroup_check_permission returned -1, performed by qemu-kvm, pid 2632
  type=0x1 major=0xfc minor=0x9 access=0x6
do_sys_open returned -1, performed by qemu-kvm, pid 2632
  dfd=0xffffffffffffff9c filename=0x56354c7dfa20 flags=0x8c002 mode=0x0

So __devcgroup_check_permission generates the EPERM, which propagates all the way back to do_sys_open, and qemu-kvm gets this:

# strace -p 2632 -e trace=open
strace: Process 2632 attached
open("/rhev/data-center/00000002-0002-0002-0002-000000000010/18579785-4d47-4237-9f53-3a2344e0e475/images/21d315fe-4516-4219-b484-87f716f6c2e2/673f331c-a599-4d43-bd24-3355042b41b8", O_RDONLY|O_NONBLOCK|O_CLOEXEC) = -1 EPERM (Operation not permitted)
open("/rhev/data-center/00000002-0002-0002-0002-000000000010/18579785-4d47-4237-9f53-3a2344e0e475/images/21d315fe-4516-4219-b484-87f716f6c2e2/673f331c-a599-4d43-bd24-3355042b41b8", O_RDWR|O_DIRECT|O_CLOEXEC) = -1 EPERM (Operation not permitted)

Here is an idea of what happens within the kernel at that point:
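
A trace like the one below can be captured with ftrace's function_graph tracer; a minimal sketch, assuming the usual /sys/kernel/debug mount and the qemu pid above:

# cd /sys/kernel/debug/tracing
# echo 2632 > set_ftrace_pid
# echo may_open > set_graph_function
# echo function_graph > current_tracer
# cat trace_pipe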


qemu-kvm-2632  [000]   816.223260: funcgraph_entry:            |            may_open() {
qemu-kvm-2632  [000]   816.223260: funcgraph_entry:            |              inode_permission() {
qemu-kvm-2632  [000]   816.223261: funcgraph_entry:            |                __inode_permission() {
qemu-kvm-2632  [000]   816.223261: funcgraph_entry:            |                  generic_permission() {
qemu-kvm-2632  [000]   816.223261: funcgraph_entry: 0.038 us   |                    get_acl();
qemu-kvm-2632  [000]   816.223261: funcgraph_entry: 0.043 us   |                    in_group_p();
qemu-kvm-2632  [000]   816.223261: funcgraph_exit:  0.557 us   |                  }
qemu-kvm-2632  [000]   816.223262: funcgraph_entry:            |                  __devcgroup_inode_permission() {
qemu-kvm-2632  [000]   816.223262: funcgraph_entry:            |                    __devcgroup_check_permission() {
qemu-kvm-2632  [000]   816.223262: funcgraph_entry: 0.317 us   |                      match_exception();
qemu-kvm-2632  [000]   816.223262: funcgraph_exit:  0.597 us   |                    }
qemu-kvm-2632  [000]   816.223263: funcgraph_exit:  0.856 us   |                  }
qemu-kvm-2632  [000]   816.223263: funcgraph_exit:  2.018 us   |                }
qemu-kvm-2632  [000]   816.223263: funcgraph_exit:  2.339 us   |              }
qemu-kvm-2632  [000]   816.223263: funcgraph_exit:  2.636 us   |            }

qemu-kvm-2632  [000]   816.223272: sys_exit_open:        0xffffffffffffffff

So, looks like a libvirt bug with cgroups?

I'm going on PTO for 2 weeks and won't be able to continue this. But I assume the next step would be to track cgroup manipulation within libvirt or involve libvirt devel.

Comment 9 Eduardo Habkost 2017-12-26 18:16:53 UTC
Is there a sosreport from a host after the failure is reproduced, somewhere? I would like to take a look at the cgroup data.

Comment 11 Eduardo Habkost 2017-12-26 23:02:37 UTC
I just found out that the audit logs are broken due to a libvirt bug. See https://github.com/ehabkost/libvirt/commit/fa7b97da69595ec4b8992ccaacfe5a7347436d6a

Comment 12 Eduardo Habkost 2017-12-27 18:00:10 UTC
I need help from somebody familiar with VDSM to reproduce the bug. Is there anybody available this week who is able to reproduce the bug and can give me access to the host where it can be seen?

Comment 14 Eduardo Habkost 2017-12-28 11:25:25 UTC
Good news: I think I have just reproduced it using libvirt
directly, after activating/deactivating the LVM volumes on every
attach/detach operation.  I think the bug is caused by block
device major/minor number reuse when the devices are re-enabled
in a different order.
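
For illustration, the reuse is visible in the dm minor numbers when LVs are deactivated and reactivated in a different order (a sketch; the vg/lv names are hypothetical):

# lvchange -an vg/lv_b vg/lv_c vg/lv_d
# lvchange -ay vg/lv_c     # lv_c may now get the minor lv_b had before
# stat -c '%t:%T %n' $(readlink -f /dev/vg/lv_c)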

I will investigate further and report back soon.

Comment 15 Eduardo Habkost 2017-12-28 14:30:57 UTC
I confirm that this is a bug in libvirt's namespace handling. It's possible to work around the bug by adding "namespaces = [ ]" to /etc/libvirt/qemu.conf.
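
Applied concretely, the workaround amounts to the following (a sketch, assuming "namespaces" is not already set in qemu.conf; comment 17 below confirms these steps):

# echo 'namespaces = [ ]' >> /etc/libvirt/qemu.conf
# systemctl restart vdsmd
# systemctl restart libvirtd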

Comment 16 Eduardo Habkost 2017-12-28 18:30:53 UTC
Experimental fix at:
https://github.com/ehabkost/libvirt/commit/89f1a08b9518148f6a86600c0ded6f52886e44b4

Comment 17 Raz Tamir 2017-12-31 09:44:18 UTC
(In reply to Eduardo Habkost from comment #15)
> I confirm that this a bug on libvirt's namespace handling.  It's possible to
> work around the bug by adding "namespaces = [ ]" to /etc/libvirt/qemu.conf.

The suggested W/A worked for me:

1) added namespaces = [ ] to /etc/libvirt/qemu.conf
2) # systemctl restart vdsmd
3) # systemctl restart libvirtd
4) and performed the steps to reproduce from the original bug.

Comment 18 Allon Mureinik 2018-01-22 13:26:53 UTC
A proper fix depends on a libvirt fix that is not yet released. Pushing out to 4.1.10.

Comment 22 Raz Tamir 2018-02-23 13:12:39 UTC
This bug no longer depends on the platform side, as all the dependent bugs were addressed.

Daniel,

Can you take a look at this?

Comment 23 Allon Mureinik 2018-02-25 11:02:41 UTC
(In reply to Raz Tamir from comment #22)
> This bug no longer depends on the platform side, as all the dependent bugs
> were addressed.
> 
> Daniel,
> 
> Can you take a look at this?

The libvirt build containing the fix was not released yet.
Raz, did you test with an unreleased libvirt referenced in the platform bug? Could you clarify the above statement?

Comment 24 Raz Tamir 2018-02-25 16:33:17 UTC
(In reply to Allon Mureinik from comment #23)
> (In reply to Raz Tamir from comment #22)
> > This bug no longer depends on the platform side, as all the dependent bugs
> > were addressed.
> > 
> > Daniel,
> > 
> > Can you take a look at this?
> 
> The libvirt build containing the fix was not released yet.
> Raz, did you test with an unreleased libvirt referenced in the platform bug?
> Could you clarify the above statement?

Hi Allon,

Both 'depends on' bugs are 'VERIFIED'.
We are currently working with the latest snapshot from platform before GA.
When should we get it?

Comment 25 Allon Mureinik 2018-02-26 08:01:56 UTC
(In reply to Raz Tamir from comment #24)
> (In reply to Allon Mureinik from comment #23)
> > (In reply to Raz Tamir from comment #22)
> > > This bug no longer depends on the platform side, as all the dependent bugs
> > > were addressed.
> > > 
> > > Daniel,
> > > 
> > > Can you take a look at this?
> > 
> > The libvirt build containing the fix was not released yet.
> > Raz, did you test with an unreleased libvirt referenced in the platform bug?
> > Could you clarify the above statement?
> 
> Hi Allon,
> 
> Both 'depends on' bugs are 'VERIFIED'.
> We are currently working with the latest snapshot from platform before GA.
> When should we get it?

I think we may be muddling two different things here.

From a QE perspective - testing with pre-release libvirt bits is the right way to go, IMHO. If the newer libvirt solves the issue - great; if not - the problem should be escalated to libvirt's devs.

From a RHV dev perspective - there's no AI on RHV's code. The only AI we have is to require the newer libvirt version that contains the fix in vdsm's spec file (and even that is just a nicety - you can always upgrade libvirt yourself and gain the benefits of the newer version regardless of vdsm's requirements). We cannot update vdsm's spec to require a libvirt version that isn't available yet, as this will break the ability to install vdsm.
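
(Once such a vdsm build is available, the versioned requirement it carries can be verified from the rpm metadata, e.g.:)

# rpm -q --requires vdsm | grep -i libvirt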

According to libvirt's 7.4.z bug, bz#1532183, their fix will be delivered by erratum RHBA-2018:32240-04. It is currently scheduled for March 6th.

Comment 26 Raz Tamir 2018-02-26 09:21:24 UTC
Thanks for the clarification, Allon

Comment 31 Raz Tamir 2018-03-15 16:13:57 UTC
Verified on rhvm-4.2.2.2-0.1.el7

Followed steps to reproduce - Pass

Comment 36 errata-xmlrpc 2018-05-15 17:54:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:1489

Comment 37 Franta Kust 2019-05-16 13:04:43 UTC
BZ<2>Jira Resync

Comment 38 Daniel Gur 2019-08-28 13:12:20 UTC
sync2jira

Comment 39 Daniel Gur 2019-08-28 13:16:32 UTC
sync2jira

