+++ This bug was initially created as a clone of Bug #1404952 +++ +++ This bug was initially created as a clone of Bug #1354251 +++ Description of problem: A VM is unable to migrate from Host A to B due to EACCESS on a block device. 1. Destination (B) qemu process is created, opens block device DEV 2. In final stages of migration, qemu process closes device DEV 3. udev event is triggered rewriting the permissions of DEV 4. qemu process opens again device DEV before actually running the VM 5. qemu process fails to open DEV, receiving EACCESS as the permissions changed. Version-Release number of selected component (if applicable): udev-147-2.51.el6.x86_64 libvirt-0.10.2-29.el6_5.12.x86_64 qemu-kvm-0.12.1.2-2.415.el6_5.15.x86_64 (in function drives_reopen) How reproducible: 100% - thank you Roman Hodain/Gordon Watson Steps to Reproduce: # cat /etc/udev/rules.d/51-qemu.rules KERNEL=="sd*", GROUP="qemu" # chown root:root /dev/sd*; ls -l /dev/sd*; ./test.out /dev/sda; sleep 1; ls -l /dev/sd*; brw-rw----. 1 root root 8, 0 Jul 10 03:12 /dev/sda brw-rw----. 1 root root 8, 16 Jul 10 03:14 /dev/sdb Opening device /dev/sda Closing device /dev/sda brw-rw----. 1 root qemu 8, 0 Jul 10 03:29 /dev/sda brw-rw----. 1 root root 8, 16 Jul 10 03:14 /dev/sdb ------------------------------------------------------------------------------ #include <stdio.h> #include <fcntl.h> int main ( int argc, char **argv ) { printf("Opening device %s\n", argv[1]); int fd = open(argv[1], O_RDWR); if (fd<0) { printf("Unable to open %s", argv[1]); return(1); } printf("Closing device %s\n", argv[1]); close(fd); return 0; } Actual results: Permissions changed Expected results: Permissions not changed Additional info: Are RHEL6 and RHEL7 missing this rule in 50-udev-default.rules? > ACTION!="add", GOTO="default_end" https://cgit.freedesktop.org/systemd/systemd/commit/?id=22582bb2cbe85b40de5f561589e0468dac769515 The rule which is causing trouble in this case is: SUBSYSTEM=="block", GROUP="disk" From our tests until now, this can be easily worked around by adding the qemu process to the disk group. However, what is the correct behavior here? Should udev rewrite it or not? I don't see how libvirt can prevent it from happening. Perhaps there is, but I cannot see any libudev mechanism to prevent this. And since this happens within qemu (it's really a close followed by an open) I am not sure how libvirt could be modified to reapply these permissions mid-flight. Similar issues and related information: https://www.redhat.com/archives/libvir-list/2011-April/msg00113.html https://bugzilla.redhat.com/show_bug.cgi?id=588348 --- Additional comment from Michal Sekletar on 2016-07-11 14:11:49 CEST --- There is an earlier commit that actually introduced this behaviour, i.e. not changing permissions on change events [1]. Commit you linked in bug description is only a rework of that. Conceptually, it made possible to change permissions of a device node on other event types as well, but introduced check in the rules file that prevents resetting permissions on change events. This was previously handled directly in the C code. That being said, I think we can also introduce such change in rules. Basically we would only add 'ACTION!="add", GOTO="default_end"' at appropriate place and jump label LABEL="default_end" at the end of file. We don't need part of the patch that touches C code because we don't have that code in RHEL6 version of udev. [1] https://github.com/systemd/systemd/commit/48a849ee17fb25e0001bfcc0f28a4aa633d016a1 --- Additional comment from Michal Privoznik on 2016-10-26 14:38:12 CEST --- First round of the patches proposed on the upstream list: https://www.redhat.com/archives/libvir-list/2016-October/msg01151.html --- Additional comment from Michal Privoznik on 2016-11-24 15:50:17 CET --- Patches proposed upstream: https://www.redhat.com/archives/libvir-list/2016-November/msg01208.html However, the change is fairly invasive, therefore I don't think we can guarantee stable package if they are backported. --- Additional comment from Michal Privoznik on 2016-12-07 09:40:33 CET --- v2 proposed upstream: https://www.redhat.com/archives/libvir-list/2016-December/msg00231.html
The set of patches that resolve this issue is a major change at the core of libvirt, and has a chance of breaking upper layers. https://www.redhat.com/archives/libvir-list/2016-December/msg00231.html -- [libvirt] [PATCH v1 00/21] Run qemu under its own namespace The above set of patches are available from libvirt v2.5.0 and above on-wards (v2.5.0-108-gbb4e529). However, we should rely on a version higher than that, that also has related important bug fixes backported to it -- the version that is shipped in RHEL 7.4. RHEL 7.4 ships with: libvirt 3.2.0 So the action item for Nova is to test with the RHEL 7.4 libvirt and ensure nothing breaks in the Nova's Virt Driver & related code paths
The action item here is to test downstream (latest) with libvirt 3.2.0. The current MIN_LIBVIRT_VERSION on RHOS-12 (from nova/virt/libvirt/driver.py): MIN_LIBVIRT_VERSION = (1, 2, 9) The next minimum version (as of this writing) is NEXT_MIN_LIBVIRT_VERSION = (1, 3, 1) The impact of this will only get to be known once Nova should starts gating on libvirt 3.2.0, which seems to be in the *distant* future.