These versions also reproduce the issue:

libvirt-daemon-kvm-3.2.0-14.el7_4.3.x86_64
systemd-219-42.el7_4.4.x86_64
vdsm-4.19.36-1.el7ev.x86_64

We also have a report that the operation succeeds when the backup VM (Commvault) and the VM being snapshotted run on the same host.

Raz, somehow I can't reproduce this. Could you please give me access to your reproduction env? I assume we can get a better idea of where the EPERM comes from using stap/ftrace.

When attaching the snapshot to the backup VM, it looks like qemu is able to open the first image of the chain, and the subsequent one fails:

Could not open backing file: Could not open '/rhev/data-center/58bdc405-028f-00dc-0026-0000000000aa/a3a4dfce-ca1b-4453-a40b-dc43c4f2191a/images/1c28bd60-8556-40fd-8229-f52bdd51ca7a/25165986-adaa-4454-a7de-2d757b34e8f5': Operation not permitted
(In reply to Germano Veit Michel from comment #1)
> These versions also reproduce the issue:
>
> libvirt-daemon-kvm-3.2.0-14.el7_4.3.x86_64
> systemd-219-42.el7_4.4.x86_64
> vdsm-4.19.36-1.el7ev.x86_64
>
> We also have a report that the operation succeeds when the backup VM
> (Commvault) and the VM being snapshotted run on the same host.
>
> Raz, somehow I can't reproduce this. Could you please give me access to your
> reproduction env?

I'm not sure you are trying the same scenario mentioned in the original bug. What are the steps you are doing?

> I assume we can get a better idea of where the EPERM comes from using
> stap/ftrace.
>
> When attaching the snapshot to the backup VM, it looks like qemu is able to
> open the first image of the chain, and the subsequent one fails:
>
> Could not open backing file: Could not open
> '/rhev/data-center/58bdc405-028f-00dc-0026-0000000000aa/a3a4dfce-ca1b-4453-a40b-dc43c4f2191a/images/1c28bd60-8556-40fd-8229-f52bdd51ca7a/25165986-adaa-4454-a7de-2d757b34e8f5': Operation not permitted
(In reply to Raz Tamir from comment #2)
> I'm not sure you are trying the same scenario mentioned in the original bug.
> What are the steps you are doing?

I'm using the backup API script in a loop (similar to what Commvault and our customers do), but all the snapshot attachments work. What are you using? This?

> On VM with 4 disks - A, B, C, D where A is the bootable disk with OS. After
> starting the VM, if we will hot-unplug the 3 disks in order B, C and D and
> after that will try to hotplug back disk C, it will fail with:
(In reply to Germano Veit Michel from comment #4)
> I'm using the backup API script in a loop (similar to what Commvault and our
> customers do), but all the snapshot attachments work. What are you using?
> This?
>
> > On VM with 4 disks - A, B, C, D where A is the bootable disk with OS. After
> > starting the VM, if we will hot-unplug the 3 disks in order B, C and D and
> > after that will try to hotplug back disk C, it will fail with:

Yes. You can try steps 1-3, and only then use the backup API method and attach the snapshot disk. I think that will trigger the issue.
(In reply to Raz Tamir from comment #5)
> Yes. You can try steps 1-3, and only then use the backup API method and
> attach the snapshot disk. I think that will trigger the issue.

Thanks Raz, I was finally able to reproduce it.

systemd-219-42.el7_4.4.x86_64
libvirt-daemon-kvm-3.2.0-14.el7_4.5.x86_64
vdsm-4.19.42-1.el7ev.x86_64
kernel-3.10.0-693.11.1.el7.x86_64
qemu-kvm-rhev-2.9.0-16.el7_4.11.x86_64

It seems only reproducible on block storage (on NFS I could not reproduce it). These are the conditions I found:

VM with 4 disks: A, B, C, D
Boot
Unplug B, C and D
All hotplugs will fail unless I hotplug B first. Once B is plugged, the other two seem to work.

This may not be 100% accurate, but it's what I found after playing with it.

Moving forward, the problem shows up for qemu as an open() syscall returning EPERM, and it doesn't seem related to SELinux:

# getenforce
Permissive

# strace -p 3521 -e trace=open
strace: Process 3521 attached
open("/rhev/data-center/00000002-0002-0002-0002-000000000010/18579785-4d47-4237-9f53-3a2344e0e475/images/21d315fe-4516-4219-b484-87f716f6c2e2/673f331c-a599-4d43-bd24-3355042b41b8", O_RDONLY|O_NONBLOCK|O_CLOEXEC) = -1 EPERM (Operation not permitted)
open("/rhev/data-center/00000002-0002-0002-0002-000000000010/18579785-4d47-4237-9f53-3a2344e0e475/images/21d315fe-4516-4219-b484-87f716f6c2e2/673f331c-a599-4d43-bd24-3355042b41b8", O_RDWR|O_DIRECT|O_CLOEXEC) = -1 EPERM (Operation not permitted)

So open() returns -1. It's a bit odd that the error is EPERM (not EACCES).

Then I created an audit rule to track open() and captured both a success and a failure opening the same disk:

# auditctl -a always,exit -F arch=b64 -F pid=3521 -S open -k qemu-open

FAIL:
time->Thu Dec 21 14:30:27 2017
type=PROCTITLE msg=audit(1513830627.873:3248): proctitle=2F7573722F6C6962657865632F71656D752D6B766D002D6E616D650067756573743D54657374564D2C64656275672D746872656164733D6F6E002D53002D6F626A656374007365637265742C69643D6D61737465724B6579302C666F726D61743D7261772C66696C653D2F7661722F6C69622F6C6962766972742F71656D752F
type=PATH msg=audit(1513830627.873:3248): item=0 name="/rhev/data-center/00000002-0002-0002-0002-000000000010/18579785-4d47-4237-9f53-3a2344e0e475/images/21d315fe-4516-4219-b484-87f716f6c2e2/673f331c-a599-4d43-bd24-3355042b41b8" inode=53420 dev=00:27 mode=060660 ouid=36 ogid=107 rdev=fc:09 obj=system_u:object_r:svirt_image_t:s0:c83,c485 objtype=NORMAL
type=CWD msg=audit(1513830627.873:3248): cwd="/"
type=SYSCALL msg=audit(1513830627.873:3248): arch=c000003e syscall=2 success=no exit=-1 a0=5638be345340 a1=84002 a2=0 a3=7fd3d2803b00 items=1 ppid=1 pid=3521 auid=4294967295 uid=107 gid=107 euid=107 suid=107 fsuid=107 egid=107 sgid=107 fsgid=107 tty=(none) ses=4294967295 comm="qemu-kvm" exe="/usr/libexec/qemu-kvm" subj=system_u:system_r:svirt_t:s0:c83,c485 key="qemu-open"

SUCCESS:
time->Thu Dec 21 14:31:41 2017
type=PROCTITLE msg=audit(1513830701.500:3321): proctitle=2F7573722F6C6962657865632F71656D752D6B766D002D6E616D650067756573743D54657374564D2C64656275672D746872656164733D6F6E002D53002D6F626A656374007365637265742C69643D6D61737465724B6579302C666F726D61743D7261772C66696C653D2F7661722F6C69622F6C6962766972742F71656D752F
type=PATH msg=audit(1513830701.500:3321): item=0 name="/rhev/data-center/00000002-0002-0002-0002-000000000010/18579785-4d47-4237-9f53-3a2344e0e475/images/21d315fe-4516-4219-b484-87f716f6c2e2/673f331c-a599-4d43-bd24-3355042b41b8" inode=53420 dev=00:27 mode=060660 ouid=36 ogid=107 rdev=fc:09 obj=system_u:object_r:svirt_image_t:s0:c83,c485 objtype=NORMAL
type=CWD msg=audit(1513830701.500:3321): cwd="/"
type=SYSCALL msg=audit(1513830701.500:3321): arch=c000003e syscall=2 success=yes exit=77 a0=5638be346730 a1=80800 a2=0 a3=ffffff80 items=1 ppid=1 pid=3521 auid=4294967295 uid=107 gid=107 euid=107 suid=107 fsuid=107 egid=107 sgid=107 fsgid=107 tty=(none) ses=4294967295 comm="qemu-kvm" exe="/usr/libexec/qemu-kvm" subj=system_u:system_r:svirt_t:s0:c83,c485 key="qemu-open"

Everything looks exactly the same, yet one succeeds and one fails?

I tried stap to get some more info, but couldn't get far enough with my current stap script; it pretty much just confirmed the problem:

probe kernel.function("sys_open").return
{
    if (pid() != 2090) next
    if ($return != -1) next
    printf("open returned -1, performed by %s, pid %d\n", execname(), pid())
    printf("  %s\n", @entry($$parms))
    next
}

open returned -1, performed by qemu-kvm, pid 2038
  filename=0x55beedebdad0 flags=0x80800 mode=0x0
open returned -1, performed by qemu-kvm, pid 2038
  filename=0x55beedebdad0 flags=0x84002 mode=0x0

I would like to get a "forward trace" of all the functions sys_open calls, to get an idea of where the -1 is coming from, but with a return probe that is not going to happen. And ftrace is showing too much data (mostly SELinux stuff), making it pretty much unreadable.

And this is where I stopped today. Any ideas on how to move forward? I think I'll try downgrading some packages tomorrow. Looking at the customer cases, this seems to have started with 7.4?
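A note for anyone retracing this: one way to cut the ftrace noise mentioned above is to restrict the function_graph tracer to the qemu-kvm PID and to the open path only. A minimal sketch, assuming the qemu-kvm PID 2038 from the stap output above and debugfs mounted at /sys/kernel/debug (PID and paths will differ per host):

cd /sys/kernel/debug/tracing
echo 0 > tracing_on                     # stop tracing while configuring
echo function_graph > current_tracer    # call-graph style output
echo 2038 > set_ftrace_pid              # only trace the qemu-kvm process
echo do_sys_open > set_graph_function   # only graph calls made under do_sys_open
echo > trace                            # clear the ring buffer
echo 1 > tracing_on                     # start tracing, then trigger the failing hotplug
echo 0 > tracing_on                     # stop after reproducing
less trace

This keeps only the call tree under do_sys_open for that process, which is usually enough to see which callee first produces the error.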
Tracked it down to cgroups within the kernel:

# cat qemu_open.stp
probe kernel.function("do_sys_open").return
{
    if (pid() != 2632) next
    if ($return != -1) next
    printf("do_sys_open returned -1, performed by %s, pid %d\n", execname(), pid())
    printf("  %s\n", @entry($$parms))
}

probe kernel.function("__devcgroup_check_permission").return
{
    if (pid() != 2632) next
    if ($return != -1) next
    printf("__devcgroup_check_permission returned -1, performed by %s, pid %d\n", execname(), pid())
    printf("  %s\n", @entry($$parms))
}

During the failure, we get this:

# stap qemu_open.stp
__devcgroup_check_permission returned -1, performed by qemu-kvm, pid 2632
  type=0x1 major=0xfc minor=0x9 access=0x2
do_sys_open returned -1, performed by qemu-kvm, pid 2632
  dfd=0xffffffffffffff9c filename=0x56354c7dfa20 flags=0x88800 mode=0x0
__devcgroup_check_permission returned -1, performed by qemu-kvm, pid 2632
  type=0x1 major=0xfc minor=0x9 access=0x6
do_sys_open returned -1, performed by qemu-kvm, pid 2632
  dfd=0xffffffffffffff9c filename=0x56354c7dfa20 flags=0x8c002 mode=0x0

So __devcgroup_check_permission generates the EPERM, which propagates all the way back to do_sys_open, and qemu-kvm sees this:

# strace -p 2632 -e trace=open
strace: Process 2632 attached
open("/rhev/data-center/00000002-0002-0002-0002-000000000010/18579785-4d47-4237-9f53-3a2344e0e475/images/21d315fe-4516-4219-b484-87f716f6c2e2/673f331c-a599-4d43-bd24-3355042b41b8", O_RDONLY|O_NONBLOCK|O_CLOEXEC) = -1 EPERM (Operation not permitted)
open("/rhev/data-center/00000002-0002-0002-0002-000000000010/18579785-4d47-4237-9f53-3a2344e0e475/images/21d315fe-4516-4219-b484-87f716f6c2e2/673f331c-a599-4d43-bd24-3355042b41b8", O_RDWR|O_DIRECT|O_CLOEXEC) = -1 EPERM (Operation not permitted)

Here is an idea of what happens within the kernel at that point:

 qemu-kvm-2632  [000]   816.223260: funcgraph_entry:                 |  may_open() {
 qemu-kvm-2632  [000]   816.223260: funcgraph_entry:                 |    inode_permission() {
 qemu-kvm-2632  [000]   816.223261: funcgraph_entry:                 |      __inode_permission() {
 qemu-kvm-2632  [000]   816.223261: funcgraph_entry:                 |        generic_permission() {
 qemu-kvm-2632  [000]   816.223261: funcgraph_entry:      0.038 us   |          get_acl();
 qemu-kvm-2632  [000]   816.223261: funcgraph_entry:      0.043 us   |          in_group_p();
 qemu-kvm-2632  [000]   816.223261: funcgraph_exit:       0.557 us   |        }
 qemu-kvm-2632  [000]   816.223262: funcgraph_entry:                 |        __devcgroup_inode_permission() {
 qemu-kvm-2632  [000]   816.223262: funcgraph_entry:                 |          __devcgroup_check_permission() {
 qemu-kvm-2632  [000]   816.223262: funcgraph_entry:      0.317 us   |            match_exception();
 qemu-kvm-2632  [000]   816.223262: funcgraph_exit:       0.597 us   |          }
 qemu-kvm-2632  [000]   816.223263: funcgraph_exit:       0.856 us   |        }
 qemu-kvm-2632  [000]   816.223263: funcgraph_exit:       2.018 us   |      }
 qemu-kvm-2632  [000]   816.223263: funcgraph_exit:       2.339 us   |    }
 qemu-kvm-2632  [000]   816.223263: funcgraph_exit:       2.636 us   |  }
 qemu-kvm-2632  [000]   816.223272: sys_exit_open:        0xffffffffffffffff

So this looks like a libvirt bug with cgroups? I'm going on PTO for 2 weeks and won't be able to continue this, but I assume the next step would be to track cgroup manipulation within libvirt or involve libvirt devel.
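A note on reading the stap output above: major=0xfc minor=0x9 is block device 252:9, which on LVM block storage should be a device-mapper device (it matches rdev=fc:09 in the earlier audit record). A minimal sketch of checking the devices-cgroup whitelist from userspace, assuming the qemu-kvm PID 2632 used in this comment; device numbers and the cgroup path are specific to that reproduction:

# 252:9 should correspond to /dev/dm-9 (dm nodes are named after the minor)
ls -l /dev/dm-9
dmsetup info -j 252 -m 9           # which dm device / LV sits behind 252:9

# find the devices cgroup the qemu-kvm process lives in
cgpath=$(awk -F: '/devices/ {print $3}' /proc/2632/cgroup)

# list that cgroup's whitelist; if there is no "b 252:9 rwm" entry (and no
# catch-all "a *:* rwm"), open() on the device fails with EPERM
cat /sys/fs/cgroup/devices${cgpath}/devices.list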
Is there a sosreport from a host after the failure is reproduced, somewhere? I would like to take a look at the cgroup data.
I just found out that the audit logs are broken due to a libvirt bug. See https://github.com/ehabkost/libvirt/commit/fa7b97da69595ec4b8992ccaacfe5a7347436d6a
I need help from somebody familiar with VDSM to reproduce the bug. Is there anybody available this week who can reproduce it and give me access to the host where it can be seen?
Good news: I think I have just reproduced it using libvirt directly, after activating/deactivating the LVM volumes on every attach/detach operation. I think the bug is caused by block device major/minor number reuse when the devices are re-enabled in a different order. I will investigate further and report back soon.
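(A rough sketch of the kind of direct-libvirt loop described above, for anyone who wants to replay it without VDSM. The domain "TestVM", the VG "vgdata", the LV names and the vdb/vdc/vdd targets are hypothetical placeholders, not the exact setup used:)

dom=TestVM          # hypothetical running domain
vg=vgdata           # hypothetical VG holding the data LVs

# hot-unplug the three data disks (targets vdb, vdc, vdd) and deactivate their LVs
for t in vdb:diskB vdc:diskC vdd:diskD; do
    virsh detach-disk "$dom" "${t%%:*}" --live
    lvchange -an "$vg/${t##*:}"
done

# re-activate one LV out of order; device-mapper can hand it a different
# minor number than it had before the detach
lvchange -ay "$vg/diskC"
dmsetup info -c -o name,major,minor   # watch the major:minor numbers move around

# hot-plug it back; in the broken setup this is where qemu hits EPERM
virsh attach-disk "$dom" "/dev/$vg/diskC" vdc --live --sourcetype block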
I confirm that this is a bug in libvirt's namespace handling. It's possible to work around the bug by adding "namespaces = [ ]" to /etc/libvirt/qemu.conf.
Experimental fix at: https://github.com/ehabkost/libvirt/commit/89f1a08b9518148f6a86600c0ded6f52886e44b4
(In reply to Eduardo Habkost from comment #15)
> I confirm that this is a bug in libvirt's namespace handling. It's possible
> to work around the bug by adding "namespaces = [ ]" to
> /etc/libvirt/qemu.conf.

The suggested workaround worked for me:
1) Added namespaces = [ ] to /etc/libvirt/qemu.conf
2) # systemctl restart vdsmd
3) # systemctl restart libvirtd
4) Performed the steps to reproduce from the original bug.
A proper fix depends on a libvirt fix that is not yet released. Pushing out to 4.1.10.
This bug no longer depends on the platform side, as all of the bugs it depended on have been addressed.

Daniel,

Can you take a look at this?
(In reply to Raz Tamir from comment #22)
> This bug no longer depends on the platform side, as all of the bugs it
> depended on have been addressed.
>
> Daniel,
>
> Can you take a look at this?

The libvirt build containing the fix has not been released yet.
Raz, did you test with an unreleased libvirt referenced in the platform bug? Could you clarify the above statement?
(In reply to Allon Mureinik from comment #23)
> The libvirt build containing the fix has not been released yet.
> Raz, did you test with an unreleased libvirt referenced in the platform bug?
> Could you clarify the above statement?

Hi Allon,

Both 'Depends On' bugs are VERIFIED.
We are currently working with the latest platform snapshot before GA.
When should we get it?
(In reply to Raz Tamir from comment #24)
> Hi Allon,
>
> Both 'Depends On' bugs are VERIFIED.
> We are currently working with the latest platform snapshot before GA.
> When should we get it?

I think we may be muddling two different things here.

From a QE perspective, testing with pre-release libvirt bits is the right way to go, IMHO. If the newer libvirt solves the issue, great; if not, the problem should be escalated to libvirt's developers.

From a RHV dev perspective, there is no action item on RHV's code. The only action item we have is to require the newer libvirt version that contains the fix in vdsm's spec file (and even that is just a nicety - you can always upgrade libvirt yourself and gain the benefits of the newer version regardless of vdsm's requirements). We cannot update vdsm's spec to require a libvirt version that isn't available yet, as that would break the ability to install vdsm.

According to libvirt's 7.4.z bug, bz#1532183, their fix will be delivered by erratum RHBA-2018:32240-04, which is currently scheduled for March 6th.
Thanks for the clarification, Allon.
Verified on rhvm-4.2.2.2-0.1.el7

Followed the steps to reproduce - pass.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:1489