Bug 1540872 - Hotplug disk failing due to cgroups
Summary: Hotplug disk failing due to cgroups
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: libvirt
Version: 27
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Michal Privoznik
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 1469235
TreeView+ depends on / blocked
 
Reported: 2018-02-01 08:53 UTC by Daniel Erez
Modified: 2018-06-26 14:42 UTC (History)
12 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-26 14:42:42 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)

Description Daniel Erez 2018-02-01 08:53:34 UTC
Description of problem:

Hotplugging a disk fails due to -EPERM.

Version-Release number of selected component (if applicable):
systemd-219-42.el7_4.4.x86_64
libvirt-daemon-kvm-3.2.0-14.el7_4.5.x86_64
vdsm-4.19.42-1.el7ev.x86_64
kernel-3.10.0-693.11.1.el7.x86_64
qemu-kvm-rhev-2.9.0-16.el7_4.11.x86_64

Steps to Reproduce:
1. Start guest with 4 Disks (A,B,C,D block storage)
2. Hotunplug B, C and D
3. Hotplug B.

Actual results:
Hotplug fails, qemu-kvm open(2) gets EPERM.

Expected results:
Hotplug works.

Additional info:

# cat qemu_open.stp 
probe kernel.function("do_sys_open").return {
    if (pid() != 2632) 
        next
    if ($return != -1)
        next
    printf("do_sys_open returned -1, performed by %s, pid %d\n", execname(), pid())
    printf("  %s\n", @entry($$parms))
}

probe kernel.function("__devcgroup_check_permission").return {
    if (pid() != 2632) 
        next
    if ($return != -1)
        next
    printf("__devcgroup_check_permission returned -1, performed by %s, pid %d\n", execname(), pid())
    printf("  %s\n", @entry($$parms))
}

During failure, we have this:

# stap qemu_open.stp 
__devcgroup_check_permission returned -1, performed by qemu-kvm, pid 2632
  type=0x1 major=0xfc minor=0x9 access=0x2
do_sys_open returned -1, performed by qemu-kvm, pid 2632
  dfd=0xffffffffffffff9c filename=0x56354c7dfa20 flags=0x88800 mode=0x0
__devcgroup_check_permission returned -1, performed by qemu-kvm, pid 2632
  type=0x1 major=0xfc minor=0x9 access=0x6
do_sys_open returned -1, performed by qemu-kvm, pid 2632
  dfd=0xffffffffffffff9c filename=0x56354c7dfa20 flags=0x8c002 mode=0x0

So __devcgroup_check_permission generates the EPERM, which propagates all the way back to sys_do_open, and qemu-kvm gets this:

# strace -p 2632 -e trace=open
strace: Process 2632 attached
open("/rhev/data-center/00000002-0002-0002-0002-000000000010/18579785-4d47-4237-9f53-3a2344e0e475/images/21d315fe-4516-4219-b484-87f716f6c2e2/673f331c-a599-4d43-bd24-3355042b41b8", O_RDONLY|O_NONBLOCK|O_CLOEXEC) = -1 EPERM (Operation not permitted)
open("/rhev/data-center/00000002-0002-0002-0002-000000000010/18579785-4d47-4237-9f53-3a2344e0e475/images/21d315fe-4516-4219-b484-87f716f6c2e2/673f331c-a599-4d43-bd24-3355042b41b8", O_RDWR|O_DIRECT|O_CLOEXEC) = -1 EPERM (Operation not permitted)

Here an idea on what happens within the kernel at that point:


qemu-kvm-2632  [000]   816.223260: funcgraph_entry:            |            may_open() {
qemu-kvm-2632  [000]   816.223260: funcgraph_entry:            |              inode_permission() {
qemu-kvm-2632  [000]   816.223261: funcgraph_entry:            |                __inode_permission() {
qemu-kvm-2632  [000]   816.223261: funcgraph_entry:            |                  generic_permission() {
qemu-kvm-2632  [000]   816.223261: funcgraph_entry: 0.038 us   |                    get_acl();
qemu-kvm-2632  [000]   816.223261: funcgraph_entry: 0.043 us   |                    in_group_p();
qemu-kvm-2632  [000]   816.223261: funcgraph_exit:  0.557 us   |                  }
qemu-kvm-2632  [000]   816.223262: funcgraph_entry:            |                  __devcgroup_inode_permission() {
qemu-kvm-2632  [000]   816.223262: funcgraph_entry:            |                    __devcgroup_check_permission() {
qemu-kvm-2632  [000]   816.223262: funcgraph_entry: 0.317 us   |                      match_exception();
qemu-kvm-2632  [000]   816.223262: funcgraph_exit:  0.597 us   |                    }
qemu-kvm-2632  [000]   816.223263: funcgraph_exit:  0.856 us   |                  }
qemu-kvm-2632  [000]   816.223263: funcgraph_exit:  2.018 us   |                }
qemu-kvm-2632  [000]   816.223263: funcgraph_exit:  2.339 us   |              }
qemu-kvm-2632  [000]   816.223263: funcgraph_exit:  2.636 us   |            }

qemu-kvm-2632  [000]   816.223272: sys_exit_open:        0xffffffffffffffff

Comment 1 Peter Krempa 2018-02-01 09:01:01 UTC
The following commit should fix the issue:

commit db98e7f67ea0d7699410f514f01947cef5128a6c
Author:     Michal Privoznik <mprivozn>
AuthorDate: Thu Jan 4 11:11:53 2018 +0100
Commit:     Michal Privoznik <mprivozn>
CommitDate: Mon Jan 8 09:53:48 2018 +0100

    qemuDomainAttachDeviceMknodHelper: Remove symlink before creating it

    https://bugzilla.redhat.com/show_bug.cgi?id=1528502

    So imagine you have /dev/blah symlink which points to /dev/sda.
    You attach /dev/blah as disk to your domain. Libvirt correctly
    creates the /dev/blah -> /dev/sda symlink in the qemu namespace.
    However, then you detach the disk, change the symlink so that it
    points to /dev/sdb and tries to attach the disk again. This time,
    however, the attach fails (well, qemu attaches wrong disk)
    because the code assumes that symlinks don't change. Well they
    do.

    This is inspired by test fix written by Eduardo Habkost.

    Signed-off-by: Michal Privoznik <mprivozn>
    Reviewed-by: Andrea Bolognani <abologna>

v3.10.0-132-gdb98e7f67

Comment 2 Nir Soffer 2018-02-11 22:10:14 UTC
Michal, should this be MERGED?

Comment 3 Daniel Erez 2018-02-12 09:19:00 UTC
(In reply to Nir Soffer from comment #2)
> Michal, should this be MERGED?

I guess Nir meant MODIFIED :)

Also, which version of libvirt-python should we require now?
For libvirt-daemon-kvm it's >= 3.10.0-132, right?

Comment 4 Michal Privoznik 2018-02-12 12:39:02 UTC
I think this bug is only waiting for new Fedora build. Cole?

Comment 5 Fedora Update System 2018-02-13 20:32:12 UTC
libvirt-3.7.0-4.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-b22d46eabb

Comment 6 Fedora Update System 2018-02-14 18:27:46 UTC
libvirt-3.7.0-4.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-b22d46eabb

Comment 7 Cole Robinson 2018-02-14 23:11:21 UTC
Build pushed now, sorry for the delay

Comment 8 Nir Soffer 2018-02-28 00:34:41 UTC
Michal, is this fix available in libvirt-3.7.0-4.fc27?

Comment 9 Michal Privoznik 2018-02-28 06:45:06 UTC
(In reply to Nir Soffer from comment #8)
> Michal, is this fix available in libvirt-3.7.0-4.fc27?

Yes, it is.

Comment 10 Fedora Update System 2018-03-01 16:23:03 UTC
libvirt-3.7.0-4.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.