Bug 1540872

Summary: Hotplug disk failing due to cgroups
Product: [Fedora] Fedora Reporter: Daniel Erez <derez>
Component: libvirtAssignee: Michal Privoznik <mprivozn>
Status: CLOSED CURRENTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 27CC: agedosier, berrange, clalancette, crobinso, itamar, jforbes, laine, libvirt-maint, mprivozn, nsoffer, pkrempa, veillard
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-06-26 14:42:42 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1469235    

Description Daniel Erez 2018-02-01 08:53:34 UTC
Description of problem:

Hotplugging a disk fails due to -EPERM.

Version-Release number of selected component (if applicable):
systemd-219-42.el7_4.4.x86_64
libvirt-daemon-kvm-3.2.0-14.el7_4.5.x86_64
vdsm-4.19.42-1.el7ev.x86_64
kernel-3.10.0-693.11.1.el7.x86_64
qemu-kvm-rhev-2.9.0-16.el7_4.11.x86_64

Steps to Reproduce:
1. Start guest with 4 Disks (A,B,C,D block storage)
2. Hotunplug B, C and D
3. Hotplug B.

Actual results:
Hotplug fails, qemu-kvm open(2) gets EPERM.

Expected results:
Hotplug works.

Additional info:

# cat qemu_open.stp 
probe kernel.function("do_sys_open").return {
    if (pid() != 2632) 
        next
    if ($return != -1)
        next
    printf("do_sys_open returned -1, performed by %s, pid %d\n", execname(), pid())
    printf("  %s\n", @entry($$parms))
}

probe kernel.function("__devcgroup_check_permission").return {
    if (pid() != 2632) 
        next
    if ($return != -1)
        next
    printf("__devcgroup_check_permission returned -1, performed by %s, pid %d\n", execname(), pid())
    printf("  %s\n", @entry($$parms))
}

During failure, we have this:

# stap qemu_open.stp 
__devcgroup_check_permission returned -1, performed by qemu-kvm, pid 2632
  type=0x1 major=0xfc minor=0x9 access=0x2
do_sys_open returned -1, performed by qemu-kvm, pid 2632
  dfd=0xffffffffffffff9c filename=0x56354c7dfa20 flags=0x88800 mode=0x0
__devcgroup_check_permission returned -1, performed by qemu-kvm, pid 2632
  type=0x1 major=0xfc minor=0x9 access=0x6
do_sys_open returned -1, performed by qemu-kvm, pid 2632
  dfd=0xffffffffffffff9c filename=0x56354c7dfa20 flags=0x8c002 mode=0x0

So __devcgroup_check_permission generates the EPERM, which propagates all the way back to sys_do_open, and qemu-kvm gets this:

# strace -p 2632 -e trace=open
strace: Process 2632 attached
open("/rhev/data-center/00000002-0002-0002-0002-000000000010/18579785-4d47-4237-9f53-3a2344e0e475/images/21d315fe-4516-4219-b484-87f716f6c2e2/673f331c-a599-4d43-bd24-3355042b41b8", O_RDONLY|O_NONBLOCK|O_CLOEXEC) = -1 EPERM (Operation not permitted)
open("/rhev/data-center/00000002-0002-0002-0002-000000000010/18579785-4d47-4237-9f53-3a2344e0e475/images/21d315fe-4516-4219-b484-87f716f6c2e2/673f331c-a599-4d43-bd24-3355042b41b8", O_RDWR|O_DIRECT|O_CLOEXEC) = -1 EPERM (Operation not permitted)

Here an idea on what happens within the kernel at that point:


qemu-kvm-2632  [000]   816.223260: funcgraph_entry:            |            may_open() {
qemu-kvm-2632  [000]   816.223260: funcgraph_entry:            |              inode_permission() {
qemu-kvm-2632  [000]   816.223261: funcgraph_entry:            |                __inode_permission() {
qemu-kvm-2632  [000]   816.223261: funcgraph_entry:            |                  generic_permission() {
qemu-kvm-2632  [000]   816.223261: funcgraph_entry: 0.038 us   |                    get_acl();
qemu-kvm-2632  [000]   816.223261: funcgraph_entry: 0.043 us   |                    in_group_p();
qemu-kvm-2632  [000]   816.223261: funcgraph_exit:  0.557 us   |                  }
qemu-kvm-2632  [000]   816.223262: funcgraph_entry:            |                  __devcgroup_inode_permission() {
qemu-kvm-2632  [000]   816.223262: funcgraph_entry:            |                    __devcgroup_check_permission() {
qemu-kvm-2632  [000]   816.223262: funcgraph_entry: 0.317 us   |                      match_exception();
qemu-kvm-2632  [000]   816.223262: funcgraph_exit:  0.597 us   |                    }
qemu-kvm-2632  [000]   816.223263: funcgraph_exit:  0.856 us   |                  }
qemu-kvm-2632  [000]   816.223263: funcgraph_exit:  2.018 us   |                }
qemu-kvm-2632  [000]   816.223263: funcgraph_exit:  2.339 us   |              }
qemu-kvm-2632  [000]   816.223263: funcgraph_exit:  2.636 us   |            }

qemu-kvm-2632  [000]   816.223272: sys_exit_open:        0xffffffffffffffff

Comment 1 Peter Krempa 2018-02-01 09:01:01 UTC
The following commit should fix the issue:

commit db98e7f67ea0d7699410f514f01947cef5128a6c
Author:     Michal Privoznik <mprivozn>
AuthorDate: Thu Jan 4 11:11:53 2018 +0100
Commit:     Michal Privoznik <mprivozn>
CommitDate: Mon Jan 8 09:53:48 2018 +0100

    qemuDomainAttachDeviceMknodHelper: Remove symlink before creating it

    https://bugzilla.redhat.com/show_bug.cgi?id=1528502

    So imagine you have /dev/blah symlink which points to /dev/sda.
    You attach /dev/blah as disk to your domain. Libvirt correctly
    creates the /dev/blah -> /dev/sda symlink in the qemu namespace.
    However, then you detach the disk, change the symlink so that it
    points to /dev/sdb and tries to attach the disk again. This time,
    however, the attach fails (well, qemu attaches wrong disk)
    because the code assumes that symlinks don't change. Well they
    do.

    This is inspired by test fix written by Eduardo Habkost.

    Signed-off-by: Michal Privoznik <mprivozn>
    Reviewed-by: Andrea Bolognani <abologna>

v3.10.0-132-gdb98e7f67

Comment 2 Nir Soffer 2018-02-11 22:10:14 UTC
Michal, should this be MERGED?

Comment 3 Daniel Erez 2018-02-12 09:19:00 UTC
(In reply to Nir Soffer from comment #2)
> Michal, should this be MERGED?

I guess Nir meant MODIFIED :)

Also, which version of libvirt-python should we require now?
For libvirt-daemon-kvm it's >= 3.10.0-132, right?

Comment 4 Michal Privoznik 2018-02-12 12:39:02 UTC
I think this bug is only waiting for new Fedora build. Cole?

Comment 5 Fedora Update System 2018-02-13 20:32:12 UTC
libvirt-3.7.0-4.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-b22d46eabb

Comment 6 Fedora Update System 2018-02-14 18:27:46 UTC
libvirt-3.7.0-4.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-b22d46eabb

Comment 7 Cole Robinson 2018-02-14 23:11:21 UTC
Build pushed now, sorry for the delay

Comment 8 Nir Soffer 2018-02-28 00:34:41 UTC
Michal, is this fix available in libvirt-3.7.0-4.fc27?

Comment 9 Michal Privoznik 2018-02-28 06:45:06 UTC
(In reply to Nir Soffer from comment #8)
> Michal, is this fix available in libvirt-3.7.0-4.fc27?

Yes, it is.

Comment 10 Fedora Update System 2018-03-01 16:23:03 UTC
libvirt-3.7.0-4.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.