1289391 – Libvirt incorrectly unplugs the backend when host device frontend hotplug fails



Summary: Libvirt incorrectly unplugs the backend when host device frontend hotplug fails

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	libvirt
Sub Component:
Version:	7.2
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	John Ferlan
QA Contact:	Virtualization Bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2015-12-08 01:41 UTC by Yang Yang
Modified:	2016-11-03 18:49 UTC
CC List:	7 users
Fixed In Version:	libvirt-2.0.0-5.el7
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-11-03 18:49:28 UTC
Target Upstream Version:
Embargoed:

Attachments
libvirtd.log (11.15 MB, text/plain) 2016-05-20 06:19 UTC, Yang Yang

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2016:2577	0	normal	SHIPPED_LIVE	Moderate: libvirt security, bug fix, and enhancement update	2016-11-03 12:07:06 UTC

Description Yang Yang 2015-12-08 01:41:51 UTC

Description of problem:
After a hostdev frontend attach fails, libvirt incorrectly unplugs the backend.
This causes every subsequent attach to fail. Possibly libvirt should error out
early instead of passing a bad address to qemu.


Version-Release number of selected component (if applicable):
libvirt-1.2.17-13.el7_2.2.x86_64
qemu-kvm-rhev-2.3.0-31.el7_2.3.x86_64

How reproducible:
100%

Steps to Reproduce:
1. prepare a running domain vm1, hotplug scsi device
# cat hostdev.xml
<hostdev mode='subsystem' type='scsi' managed='no'>
      <source>
        <adapter name='scsi_host6'/>
        <address bus='0' target='0' unit='0'/>
      </source>
<address type='drive' controller='0' bus='1' target='1' unit='0'/>
    </hostdev>

# virsh attach-device vm1 hostdev.xml
error: Failed to attach device from hostdev.xml
error: internal error: unable to execute QEMU command 'device_add': bad scsi channel id: 1

2. edit hostdev.xml and update the address correctly, hotplug once more
# cat hostdev.xml
<hostdev mode='subsystem' type='scsi' managed='no'>
      <source>
        <adapter name='scsi_host6'/>
        <address bus='0' target='0' unit='0'/>
      </source>
<address type='drive' controller='0' bus='0' target='1' unit='0'/>
    </hostdev>

# virsh attach-device vm1 hostdev.xml
error: Failed to attach device from hostdev.xml
error: internal error: unable to execute QEMU command '__com.redhat_drive_add': Duplicate ID 'drive-hostdev0' for drive

# virsh dumpxml vm1 | grep hostdev
No output

3. destroy / start domain, hotplug with correct hostdev.xml once more, successfully
# cat hostdev.xml
<hostdev mode='subsystem' type='scsi' managed='no'>
      <source>
        <adapter name='scsi_host6'/>
        <address bus='0' target='0' unit='0'/>
      </source>
<address type='drive' controller='0' bus='0' target='1' unit='0'/>
    </hostdev>

# virsh attach-device vm1 hostdev.xml
Device attached successfully

# virsh dumpxml vm1 | grep hostdev -a6
<hostdev mode='subsystem' type='scsi' managed='no'>
      <source>
        <adapter name='scsi_host6'/>
        <address bus='0' target='0' unit='0'/>
      </source>
      <alias name='hostdev0'/>
      <address type='drive' controller='0' bus='0' target='1' unit='0'/>
    </hostdev>

Actual results:
After a hostdev frontend attach fails, libvirt incorrectly unplugs the backend.
This causes every subsequent attach to fail.

Expected results:
Possibly libvirt should error out early instead of triggering a qemu failure
when a drive type address is invalid or duplicate, OR it should correctly unplug the backend when the hostdev frontend hotplug fails.

Additional info:

Comment 1 John Ferlan 2016-04-25 12:08:56 UTC

Would have helped to have seen/known if you had any scsi devices already defined in your vm1 domain. Using the latest top of tree (1.3.4) if I have either no or the following default 'scsi' controller defined:

Then I get the following error at attach-device:

# virsh attach-device f18 hostdev-bad.xml
error: Failed to attach device from hostdev-bad.xml
error: unsupported configuration: target must be 0 for scsi host device if its controller model is 'lsilogic'

If I change to:

Then I do get the error:

# virsh attach-device f18 hostdev-bad.xml
error: Failed to attach device from hostdev-bad.xml
error: internal error: unable to execute QEMU command 'device_add': bad scsi channel id: 1

However, I can successfully hotplug afterwards. There have been changes in this code since 1.2.17 (as noted in the description), but none of the changes seem to have been related to the failure to add the hostdev SCSI device. There could also be changes in qemu since then (I have 2.4 installed) that could affect whether one can successfully delete the drive.

FWIW:
The error you got:

# virsh attach-device vm1 hostdev.xml
error: Failed to attach device from hostdev.xml
error: internal error: unable to execute QEMU command '__com.redhat_drive_add': Duplicate ID 'drive-hostdev0' for drive

would indicate that, for some reason, in qemuDomainAttachHostSCSIDevice the qemuMonitorDriveDel call made on the error path after qemuMonitorAddDevice failed did not succeed either, but that failure is "hidden" because it is reported only via VIR_WARN. You'd have to be logging and looking for those errors. In the future, it would certainly be helpful to see that kind of output.

In any case, I'm going to move this to ON_QA for testing with the current release as I don't see a failure in my environment. If testing shows that something fails, then I will re-evaluate. Please provide the details of the SCSI controller being used and try to take the steps to log the attach failure with more details. Add the following lines to your /etc/libvirt/libvirtd.conf:

log_level = 1
log_filters="3:remote 4:event 3:json 3:rpc"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"

Restart libvirtd, then execute the sequence of commands and provide the libvirtd.log. That should be enough.

Comment 2 Yang Yang 2016-05-20 06:01:14 UTC

John,

The original scenario is blocked by qemu regression bz#1337100. VM is killed
after hotplug scsi hostdev(passthrough) fails. So I cannot reproduce this scenario before qemu bz1337100 is fixed. e.g. I have a default scsi controller like this

 <controller type='scsi' index='0'/>

Then hotplug scsi hostdev(passthrough) with incorrect address
# cat hostdev.xml
<hostdev mode='subsystem' type='scsi' managed='no'>
      <source>
        <adapter name='scsi_host6'/>
        <address bus='0' target='0' unit='0'/>
      </source>
<address type='drive' controller='0' bus='1' target='1' unit='0'/>
    </hostdev>

# virsh attach-device vm1 hostdev.xml
error: Failed to attach device from hostdev.xml
error: internal error: unable to execute QEMU command 'device_add': bad scsi channel id: 1

VM is unfortunately killed:

# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     vm1                            shut off

However, I can reproduce the issue by hot-plugging usb storage.
I have 1 default usb controller:

  <controller type='usb' index='0'>
      
Then hotplug 1 usb storage into usb controller with incorrect address:

# cat usb.xml 
 <disk type='block' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source dev='/dev/sdb'/>
      <target dev='sda' bus='usb'/>
      <address type='usb' bus='0' port='5'/>   <!-- port='5' is incorrect -->
    </disk>

# virsh attach-device vm1 usb.xml 
error: Failed to attach device from usb.xml
error: internal error: unable to execute QEMU command 'device_add': usb port 5 (bus usb.0) not found (in use?)

Then change port to 2, attempt hotplug again:

# cat usb.xml 
<disk type='block' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source dev='/dev/sdb'/>
      <target dev='sda' bus='usb'/>
      <address type='usb' bus='0' port='2'/>
    </disk>

I got the error like this:

# virsh attach-device vm1 usb.xml 
error: Failed to attach device from usb.xml
error: internal error: unable to execute QEMU command '__com.redhat_drive_add': Duplicate ID 'drive-usb-disk0' for drive

however, drive-usb-disk0 does not exist in my vm:

# virsh dumpxml vm1 | grep usb
    <controller type='usb' index='0'>
      <alias name='usb'/>
    <redirdev bus='usb' type='spicevmc'>

I destroy/start vm, and hotplug successfully:

# virsh destroy vm1; virsh start vm1
Domain vm1 destroyed

Domain vm1 started

# virsh attach-device vm1 usb.xml 
Device attached successfully

I guess it is similar to Bug 1262399 - disk backend is not removed properly when disk frontend hotplug fails. But Bug 1262399 only fixes the virtio hotplug issue. Attached libvirtd.log.

Regards
Yang

Comment 3 Yang Yang 2016-05-20 06:19:22 UTC

Created attachment 1159794
libvirtd.log

Comment 4 John Ferlan 2016-06-14 14:17:18 UTC

Not clear why there's a needinfo... Just clearing it. I do know about this bug and it is in my work queue.

Comment 5 John Ferlan 2016-06-29 20:54:09 UTC

Ohh.... re-reading everything again. You were asking about USB hotplug/unplug issues. I missed that on the first pass. Now that I've had a few more cycles to think and read more closely, I see.

One important thing to understand: when attaching, you are sending two commands to the qemu monitor. One to create the "-drive" and the other to create the "-device". One way to "view" this is to add the device to the guest xml, start it, and note the "-drive" and "-device" commands for your device. When you then stop the guest, remove the device, and try the hotplug, you can see the commands that libvirt uses in order to perform the adds in libvirtd.log:

2016-05-20 05:45:01.346+0000: 31899: debug : virEventPollInterruptLocked:727 : Interrupting
2016-05-20 05:45:01.346+0000: 31899: info : qemuMonitorSend:1007 : QEMU_MONITOR_SEND_MSG: mon=0x7f6c40004df0 msg={"execute":"__com.redhat_drive_add","arguments":{"file":"/dev/sdb","format":"qcow2","id":"drive-usb-disk0"},"id":"libvirt-12"}
fd=-1

which succeeds, and then:

2016-05-20 05:45:01.354+0000: 31899: info : qemuMonitorSend:1007 : QEMU_MONITOR_SEND_MSG: mon=0x7f6c40004df0 msg={"execute":"device_add","arguments":{"driver":"usb-storage","bus":"usb.0","port":"5","drive":"drive-usb-disk0","id":"usb-disk0","removable":"off"},"id":"libvirt-13"}
fd=-1

which fails. If you had checked the arguments for a guest with the disk already present, you would note pretty much the same list of arguments and parameters.
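To pull just the monitor commands out of a large libvirtd.log, something like the following Python sketch may help. It assumes each command appears on a single line in the QEMU_MONITOR_SEND_MSG form shown above; the function name `monitor_commands` and the regular expression are illustrative and not part of libvirt.

```python
import json
import re

# Matches the "send" lines in the form shown in the log excerpts above.
SEND_RE = re.compile(r"QEMU_MONITOR_SEND_MSG: mon=\S+ msg=(\{.*\})\s*$")


def monitor_commands(lines):
    """Yield (command, arguments) for each monitor message found in lines."""
    for line in lines:
        match = SEND_RE.search(line)
        if match is None:
            continue  # not a monitor send line (e.g. "fd=-1")
        msg = json.loads(match.group(1))
        yield msg["execute"], msg.get("arguments", {})


# The two log lines quoted above, used here as sample input.
sample = [
    '2016-05-20 05:45:01.346+0000: 31899: info : qemuMonitorSend:1007 : '
    'QEMU_MONITOR_SEND_MSG: mon=0x7f6c40004df0 msg={"execute":"__com.redhat_drive_add",'
    '"arguments":{"file":"/dev/sdb","format":"qcow2","id":"drive-usb-disk0"},"id":"libvirt-12"}',
    'fd=-1',
    '2016-05-20 05:45:01.354+0000: 31899: info : qemuMonitorSend:1007 : '
    'QEMU_MONITOR_SEND_MSG: mon=0x7f6c40004df0 msg={"execute":"device_add",'
    '"arguments":{"driver":"usb-storage","bus":"usb.0","port":"5",'
    '"drive":"drive-usb-disk0","id":"usb-disk0","removable":"off"},"id":"libvirt-13"}',
    'fd=-1',
]

commands = list(monitor_commands(sample))
for name, args in commands:
    print(name, args.get("id"))
```

For a real log, pass an open file object instead of `sample`; the function reads it line by line.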

Anyway, my assumption for the reason for the failure in the port='2' case is that when the port='5' case failed, the '-drive' for the device (e.g. the drive-usb-disk0) was not removed when the '-device' failed. So when port='2' is attempted, you get the failure because the '-drive' already exists. My more recent qemu version (2.4.1) seems to remove it when the -device add fails, and thus performing that second addition succeeds for me.

FWIW: That id is built using the 'target dev=sdX' value where "X" is 'a', 'b', 'c', etc. which relates to disk0, disk1, disk2, etc. (IOW: the 'a' means the 0th disk on the target bus). If on the port='2' case you had used target dev='sdb' (or something other than 'sda'), it should have worked.
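As a rough illustration of that mapping, here is a simplified Python sketch. The function name `drive_id_for_target` is hypothetical, and it only handles single-letter names such as 'sda' through 'sdz'; libvirt's own naming code covers more cases than this.

```python
def drive_id_for_target(target_dev, bus="usb"):
    """Return a drive id such as 'drive-usb-disk0' for target dev 'sda'.

    Simplified on purpose: only 'sd' followed by one lowercase letter.
    """
    if len(target_dev) != 3 or not target_dev.startswith("sd"):
        raise ValueError("unsupported target name: %r" % target_dev)
    letter = target_dev[2]
    if not "a" <= letter <= "z":
        raise ValueError("unsupported target name: %r" % target_dev)
    index = ord(letter) - ord("a")  # 'a' is the 0th disk on the bus
    return "drive-%s-disk%d" % (bus, index)


print(drive_id_for_target("sda"))
print(drive_id_for_target("sdb"))
```

With target dev='sdb' the id differs from the leftover 'drive-usb-disk0', which is why a different target name would have avoided the Duplicate ID error.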

When you stop/start the domain and attach the port='2' disk it works because the drive is no longer there from the first failed attempt.
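The sequence described in this comment can be modelled with a toy monitor. The Python sketch below is only a simulation under stated assumptions: `FakeMonitor`, `attach_usb_disk`, and `try_attach` are made-up names, the set of valid ports is invented, and none of this is a libvirt or QEMU API. It shows how a leftover drive produces the Duplicate ID error and how removing the drive on the error path avoids it.

```python
class MonitorError(Exception):
    """Error reported by the toy monitor."""


class FakeMonitor:
    """Toy stand-in for the qemu monitor; not a real libvirt or QEMU API."""

    def __init__(self, usb_ports):
        self.drives = set()
        self.devices = {}
        self.usb_ports = set(usb_ports)

    def drive_add(self, drive_id):
        if drive_id in self.drives:
            raise MonitorError("Duplicate ID '%s' for drive" % drive_id)
        self.drives.add(drive_id)

    def device_add(self, device_id, drive_id, port):
        if port not in self.usb_ports:
            raise MonitorError(
                "usb port %s (bus usb.0) not found (in use?)" % port)
        self.devices[device_id] = (drive_id, port)

    def drive_del(self, drive_id):
        self.drives.discard(drive_id)


def attach_usb_disk(mon, port, rollback):
    """Add the backend drive, then the frontend device."""
    drive_id = "drive-usb-disk0"
    mon.drive_add(drive_id)
    try:
        mon.device_add("usb-disk0", drive_id, port)
    except MonitorError:
        if rollback:
            mon.drive_del(drive_id)  # undo the backend on frontend failure
        raise


def try_attach(mon, port, rollback):
    """Return 'ok' or the error text, mimicking the virsh output."""
    try:
        attach_usb_disk(mon, port, rollback)
        return "ok"
    except MonitorError as err:
        return str(err)


# Without cleanup: bad port first, then a valid port.
leaky = FakeMonitor(usb_ports={"1", "2"})
results_without = [try_attach(leaky, "5", False), try_attach(leaky, "2", False)]

# With cleanup on the error path.
clean = FakeMonitor(usb_ports={"1", "2"})
results_with = [try_attach(clean, "5", True), try_attach(clean, "2", True)]

# A fresh monitor models destroy/start of the domain.
restarted = FakeMonitor(usb_ports={"1", "2"})
result_after_restart = try_attach(restarted, "2", False)

print(results_without)
print(results_with)
print(result_after_restart)
```

Creating a new `FakeMonitor` stands in for the destroy/start cycle: the stale drive disappears with the old qemu process, so the attach works again.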

BTW: Initially when I tried to reproduce your scenario I was getting failures, but then again my source device /dev/sdb isn't using 'qcow2' format.

Using the latest libvirt, once I changed that to 'raw', adding port='5' failed as expected, and I was then able to add port='2' successfully (without any real changes).

Hence, my assumption is that some failure path in more recent qemu versions may have "cleaned up" the '-drive' when the '-device' failed. I could add code to libvirt to perform that cleanup itself and make it more obvious, but I think the USB case should be retested using the 7.3 libvirt/qemu and, if it's not working, a new bug created.

Comment 6 Yang Yang 2016-06-30 07:46:22 UTC

Thanks John for your detailed analysis.

Re-tested under the latest rhel7.3 libvirt and qemu using both qcow2 and raw format usb devices. The test results are the same as in comment#2.

libvirt-1.3.5-1.el7.x86_64
qemu-kvm-rhev-2.6.0-9.el7.x86_64
3.10.0-445.el7.x86_64

1. start vm with usb controller#0
# virsh dumpxml vm1 | grep usb -a5
<controller type='usb' index='0'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>

2. hot-plug usb storage device with a bad address
# lsscsi
[0:0:0:0]    disk    ATA      SAMSUNG HD322GM  0101  /dev/sda 
[2:0:0:0]    cd/dvd  PLDS     DVD-ROM DH-16D5S VD15  /dev/sr0 
[6:0:0:0]    disk    Alcor    Flash Disk       8.07  /dev/sdb 

# qemu-img info /dev/sdb
image: /dev/sdb
file format: raw
virtual size: 999M (1047527424 bytes)
disk size: 0

# cat usb.xml 
<disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/dev/sdb'/>
      <target dev='sda' bus='usb'/>
<address type='usb' bus='0' port='5'/>
    </disk>

# virsh attach-device vm1 usb.xml 
error: Failed to attach device from usb.xml
error: internal error: unable to execute QEMU command 'device_add': usb port 5 (bus usb.0) not found (in use?)

3. hot-plug usb storage device with a correct address
# cat usb.xml 
<disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/dev/sdb'/>
      <target dev='sda' bus='usb'/>
<address type='usb' bus='0' port='2'/>
    </disk>

# virsh attach-device vm1 usb.xml 
error: Failed to attach device from usb.xml
error: internal error: unable to execute QEMU command '__com.redhat_drive_add': Duplicate ID 'drive-usb-disk0' for drive

So '-drive' is not cleaned up after 'device_add' failed

I got the same results when using qcow2 format usb device

Comment 7 John Ferlan 2016-07-12 11:59:47 UTC

Again, this is a SCSI bug... The USB infrastructure (code path) is different. Then of course there is RHEL specific code in play too (e.g. that '__com.redhat_drive_add' in the output).

FWIW: I did post a series:

http://www.redhat.com/archives/libvir-list/2016-June/msg02207.html

that will do that drive_del for USB. Cole quickly noted that there is bz 1336225 which indicates both USB and SCSI are missing the drive del.

Comment 8 John Ferlan 2016-07-19 21:20:04 UTC

I've updated the above series as:

http://www.redhat.com/archives/libvir-list/2016-July/msg00730.html


Patches 1-5 work to resolve the USB drive add failure

Patch 6 works to resolve the SCSI drive add failure

Patches 7-9 work to resolve the SCSI_HOST drive add failure.  At least from a libvirt viewpoint.  As it turns out the code made the qemuMonitorDriveDel call, but it used the "whole" drvstr and not just the drivealias, which I assume caused things to go belly up in qemu.

I reserved a rhel7 beaker system and was able to reproduce the issue and prove to myself that I fixed it.

Comment 9 John Ferlan 2016-08-02 14:42:26 UTC

Patches have been pushed upstream

git show 1149fe4c15feba1a2970bd69c3d3d2884cd72938

    qemu: Use the hostdev alias in qemuDomainAttachHostSCSIDevice error path
    
...
   
    Rather than pass the whole drive string (which contained the alias),
    pass only the alias for the qemuMonitorDriveDel call in the error
    path when adding a host device in the monitor fails.
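To see why passing the whole drive string matters, consider this small Python sketch. It is an assumption-laden model, not libvirt code: the `drive_del` helper is a stand-in for a lookup by id, and the contents of `drive_string` are hypothetical, since the actual option string is not shown in this report.

```python
def drive_del(drives, drive_id):
    """Remove drive_id from the set; return True if something was removed."""
    if drive_id in drives:
        drives.remove(drive_id)
        return True
    return False


drive_alias = "drive-hostdev0"
# Hypothetical drive string; the real option list is not shown in this report.
drive_string = "file=/dev/sg0,if=none,id=" + drive_alias

drives = {drive_alias}  # backend left behind after device_add failed

# Old error path: the lookup uses the full string, so nothing matches.
removed_with_string = drive_del(drives, drive_string)

# Fixed error path: the lookup uses only the alias.
removed_with_alias = drive_del(drives, drive_alias)

print(removed_with_string, removed_with_alias, drives)
```

In this model the first call leaves the drive in place, which is consistent with the Duplicate ID error seen on the next attach.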

Comment 12 Pei Zhang 2016-09-01 09:26:15 UTC

Verified version :

libvirt-2.0.0-6.el7.x86_64
qemu-kvm-rhev-2.6.0-22.el7.x86_64

Verified steps :

1. start a guest with one scsi controller like following :

# virsh dumpxml vm2 | grep scsi -A 3
    <controller type='scsi' index='0' model='virtio-scsi'>
      <alias name='scsi0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>

2. prepare a scsi_host with wrong address as following :
# cat scsi_hostdev.xml 
<hostdev mode='subsystem' type='scsi' managed='no'>
      <source>
        <adapter name='scsi_host4'/>
        <address bus='0' target='0' unit='0'/>
      </source>
<address type='drive' controller='0' bus='1' target='1' unit='0'/> <== no scsi1 in guest
    </hostdev>

it should fail to attach like following :

# virsh attach-device vm2 scsi_hostdev.xml 
error: Failed to attach device from scsi_hostdev.xml
error: internal error: unable to execute QEMU command 'device_add': bad scsi channel id: 1

3. edit scsi_host xml, delete the wrong address 
# cat scsi_hostdev1.xml 
<hostdev mode='subsystem' type='scsi' managed='no'>
      <source>
        <adapter name='scsi_host4'/>
        <address bus='0' target='0' unit='0'/>
      </source>
    </hostdev>

attach again, it should attach successfully.

# virsh attach-device vm2 scsi_hostdev1.xml 
Device attached successfully

dumpxml to check :

# virsh dumpxml vm2 | grep hostdev -A 9
    <hostdev mode='subsystem' type='scsi' managed='no'>
      <source>
        <adapter name='scsi_host4'/>
        <address bus='0' target='0' unit='0'/>
      </source>
      <alias name='hostdev0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </hostdev>

Now for scsi host, it can attach successfully after a failure.

Comment 16 Pei Zhang 2016-09-02 06:40:02 UTC

According to comment 12, move this bug to verified.

Comment 18 errata-xmlrpc 2016-11-03 18:49:28 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2577.html
