Bug 1775679

Summary: unable to execute QEMU command '__com.redhat_drive_add': could not open disk image "Operation not permitted" [rhel-7.6.z]
Product: Red Hat Enterprise Linux 7 Reporter: RAD team bot copy to z-stream <autobot-eus-copy>
Component: libvirt Assignee: Michal Privoznik <mprivozn>
Status: CLOSED ERRATA QA Contact: Han Han <hhan>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 7.6 CC: amashah, chhudson, dyuan, gveitmic, hhan, hreitz, jdenemar, jinzhao, jsuchane, juzhang, kcleveng, kwolf, libvirt-maint, lmen, mprivozn, mtessun, qinwang, virt-maint, xuzhang
Target Milestone: rc Keywords: Upstream, ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: libvirt-4.5.0-10.el7_6.15 Doc Type: Bug Fix
Doc Text:
Cause: To enhance security and work around some races with udev (or any other software that mangles SELinux labels on /dev/* nodes whilst a domain is running), libvirt spawns each domain in its own private namespace with a private /dev. This puts an additional burden on libvirt, which then has to create /dev/* nodes on some APIs like virDomainAttachDevice (aka virsh attach-device) or remove some on virDomainDetachDevice (aka virsh detach-device). For generic devices this works perfectly, as libvirt looks into the host's /dev and creates the corresponding /dev/* nodes in the domain's private namespace exactly as they are in the host. However, there is one exception: disks. On a disk hot unplug libvirt does not remove any /dev/* nodes from the private namespace because they might still be in use by a backing chain of some other disk. And this is what causes the problem. Imagine a /dev/nvme0n1 disk which has some MAJOR:MINOR number (these identify a device uniquely at the kernel level). Now hot plug the disk into a domain => libvirt creates an exact copy in the domain's namespace. Then hot unplug the disk from the domain => libvirt keeps /dev/nvme0n1 in the domain's namespace. Now hot unplug the NVMe disk from the host and hot plug it back again => the MINOR number is likely to change after this. At this point there is a discrepancy between the MINOR number in the host and the one in the domain's namespace.
Consequence: QEMU tries to open a different device than it thinks, and because of the devices cgroup it is denied access.
Fix: The fix consists of forcibly creating /dev/nvme0n1 (or in general any other device) even if it already exists in the domain's private namespace. This way we can be sure that it is an exact copy of the node in the host's namespace.
Result: Hotplugging disks multiple times works again.
Story Points: ---
Clone Of: 1752978 Environment:
Last Closed: 2019-12-10 12:38:23 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1752978    
Bug Blocks:    

Description RAD team bot copy to z-stream 2019-11-22 14:54:45 UTC
This bug has been copied from bug #1752978 and has been proposed to be backported to 7.6 z-stream (EUS).

Comment 9 Han Han 2019-11-27 02:37:12 UTC
Reproduced on libvirt-4.5.0-10.el7_6.14.x86_64, no nvme disk required:
1. Start a vm
2. Check current MAJ:MIN number of host disk:
# lsblk
sdb                              8:16   0    10G  0 disk 

3. Change MAJ:MIN number of disk in qemu namespace
# nsenter -m -t $(pidof qemu-kvm) mknod /dev/sdb b 8 17

4. Live attach the disk
# virsh attach-disk pc /dev/sdb sdb
error: Failed to attach disk
error: internal error: unable to execute QEMU command '__com.redhat_drive_add': Device 'drive-scsi0-0-0-1' could not be initialized
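
The mismatch provoked in step 3 can be confirmed before attaching the disk by comparing the device numbers on both sides. The following is a minimal Python sketch, not part of libvirt: the helper names device_id and is_stale are hypothetical, and it assumes the private /dev of the QEMU process is reachable through /proc/<pid>/root, which needs root privileges.

```python
import os
import stat


def device_id(path):
    """Return (kind, major, minor) for a block or character device node."""
    st = os.stat(path)
    if stat.S_ISBLK(st.st_mode):
        kind = "block"
    elif stat.S_ISCHR(st.st_mode):
        kind = "char"
    else:
        raise ValueError(f"{path} is not a device node")
    return kind, os.major(st.st_rdev), os.minor(st.st_rdev)


def is_stale(host_path, namespace_path):
    """True if the namespace copy no longer matches the host's node."""
    return device_id(host_path) != device_id(namespace_path)
```

With the reproducer above, a call such as is_stale("/dev/sdb", f"/proc/{qemu_pid}/root/dev/sdb") would be expected to return True, since the host node is 8:16 while the node created in the namespace is 8:17. Here qemu_pid stands for the PID of the qemu-kvm process.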

Test on libvirt-4.5.0-10.el7_6.15.x86_64:
1. Prepare as in steps 1-3 above
2. Live attach the disk
# virsh attach-disk pc /dev/sdb sdb
Disk attached successfully

3. Detach and reattach the disk
# virsh detach-disk pc /dev/sdb
Disk detached successfully

# virsh attach-disk pc /dev/sdb sdb
Disk attached successfully

It works as expected.
Next I will review the patch and run some regression tests to check for regressions.
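
The "create even if it exists" behaviour described in Doc Text can be illustrated with a short Python sketch. This is not libvirt's code: force_mknod and clone_node are hypothetical names, and the sketch only shows the unlink-then-mknod idea. Creating block or character nodes needs root (CAP_MKNOD), and a real implementation would also have to carry over ownership and security labels.

```python
import os


def force_mknod(path, mode, rdev=0):
    """Create a node at path, replacing whatever is already there."""
    try:
        os.unlink(path)          # drop a possibly stale node first
    except FileNotFoundError:
        pass                     # nothing to replace
    os.mknod(path, mode, rdev)


def clone_node(host_path, namespace_path):
    """Recreate namespace_path with the same type and numbers as host_path."""
    st = os.stat(host_path)
    force_mknod(namespace_path, st.st_mode, st.st_rdev)
```

Because the existing node is removed unconditionally, a copy left over from an earlier hot unplug can never keep an outdated MAJOR:MINOR pair.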

Comment 10 Han Han 2019-11-27 09:21:08 UTC
BTW, I found that the fix may affect other host char or block devices. I will post the results for them later.

Comment 11 Han Han 2019-11-29 07:48:12 UTC
Verified version: libvirt-4.5.0-10.el7_6.15.x86_64 qemu-kvm-rhev-2.12.0-18.el7_6.7.x86_64

For the following host passthrough devices, we tested the bug-reproducing scenario
(https://bugzilla.redhat.com/show_bug.cgi?id=1775679#c9), VM start with these devices,
and device hotplug/hotunplug. All PASS.
- block disk
- hostdev
  - usb
  - mdev
  - scsi
  - mdev
- nvdimm memory device
- char device
- input device
- rng device

The hostdev scsi_host and tpm devices are not supported in RHEL 7.6, so they were skipped.
The graphics gl device does not support hotplug/unplug, so only starting a VM with
this device was tested: PASS.

And I also tested snapshot and blockjob related cases, PASS:
- live external disk snapshot on block device (LVM LVs)
- blockcommit:
  - shallow commit from active layer, then pivot to the destination layer
  - commit from active layer for the whole backing chain, then pivot to the destination layer
- blockcopy:
  - shallow copy to a block device, then pivot to the new layer
  - copy the whole backing chain to a block device, then pivot to the new layer

Comment 15 errata-xmlrpc 2019-12-10 12:38:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:4169