Bug 1289391

Summary: Libvirt incorrectly unplugs the backend when host device frontend hotplug fails

Product: Red Hat Enterprise Linux 7
Reporter: Yang Yang <yanyang>
Component: libvirt
Assignee: John Ferlan <jferlan>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 7.2
CC: dyuan, jferlan, mzhan, pzhang, rbalakri, xuzhang, yisun
Target Milestone:
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: libvirt-2.0.0-5.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-03 18:49:28 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments: libvirtd.log (flags: none)

Description Yang Yang 2015-12-08 01:41:51 UTC

Description of problem:
After a hostdev frontend attach fails, libvirt does not correctly unplug the
backend, so every subsequent attach fails. Possibly libvirt should error out
early instead of passing a bad address to QEMU.


Version-Release number of selected component (if applicable):
libvirt-1.2.17-13.el7_2.2.x86_64
qemu-kvm-rhev-2.3.0-31.el7_2.3.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare a running domain vm1 and hotplug a SCSI device
# cat hostdev.xml
<hostdev mode='subsystem' type='scsi' managed='no'>
      <source>
        <adapter name='scsi_host6'/>
        <address bus='0' target='0' unit='0'/>
      </source>
      <address type='drive' controller='0' bus='1' target='1' unit='0'/>
    </hostdev>

# virsh attach-device vm1 hostdev.xml
error: Failed to attach device from hostdev.xml
error: internal error: unable to execute QEMU command 'device_add': bad scsi channel id: 1

2. Edit hostdev.xml to correct the address, then hotplug once more
# cat hostdev.xml
<hostdev mode='subsystem' type='scsi' managed='no'>
      <source>
        <adapter name='scsi_host6'/>
        <address bus='0' target='0' unit='0'/>
      </source>
      <address type='drive' controller='0' bus='0' target='1' unit='0'/>
    </hostdev>

# virsh attach-device vm1 hostdev.xml
error: Failed to attach device from hostdev.xml
error: internal error: unable to execute QEMU command '_com.redhat_drive_add': Duplicate ID 'drive-hostdev0' for drive

# virsh dumpxml vm1 | grep hostdev
No output

3. Destroy and start the domain, then hotplug with the corrected hostdev.xml once more; this succeeds
# cat hostdev.xml
<hostdev mode='subsystem' type='scsi' managed='no'>
      <source>
        <adapter name='scsi_host6'/>
        <address bus='0' target='0' unit='0'/>
      </source>
      <address type='drive' controller='0' bus='0' target='1' unit='0'/>
    </hostdev>

# virsh attach-device vm1 hostdev.xml
Device attached successfully

# virsh dumpxml vm1 | grep hostdev -a6
<hostdev mode='subsystem' type='scsi' managed='no'>
      <source>
        <adapter name='scsi_host6'/>
        <address bus='0' target='0' unit='0'/>
      </source>
      <alias name='hostdev0'/>
      <address type='drive' controller='0' bus='0' target='1' unit='0'/>
    </hostdev>

Actual results:
After a hostdev frontend attach fails, libvirt does not correctly unplug the backend,
so every subsequent attach fails.

Expected results:
Possibly libvirt should error out early, instead of triggering a QEMU failure, when a
drive-type address is invalid or duplicated; or it should unplug the backend when the
hostdev frontend hotplug fails.

Additional info:

Comment 1 John Ferlan 2016-04-25 12:08:56 UTC

It would have helped to know whether you had any SCSI devices already defined in your vm1 domain. Using the latest top of tree (1.3.4), if I have either no SCSI controller or the following default 'scsi' controller defined:

Then I get the following error at attach-device:

# virsh attach-device f18 hostdev-bad.xml
error: Failed to attach device from hostdev-bad.xml
error: unsupported configuration: target must be 0 for scsi host device if its controller model is 'lsilogic'

If I change to:

Then I do get the error:

# virsh attach-device f18 hostdev-bad.xml
error: Failed to attach device from hostdev-bad.xml
error: internal error: unable to execute QEMU command 'device_add': bad scsi channel id: 1

However, I can successfully hotplug afterwards. There have been changes in this code since 1.2.17 (as noted in the description), but none of the changes seem to have been related to the failure to add the hostdev SCSI device. There could also be changes in qemu since then (I have 2.4 installed) that could affect whether one can successfully delete the drive.

FWIW:
The error you got:

# virsh attach-device vm1 hostdev.xml
error: Failed to attach device from hostdev.xml
error: internal error: unable to execute QEMU command '_com.redhat_drive_add': Duplicate ID 'drive-hostdev0' for drive

would indicate that, for some reason, in qemuDomainAttachHostSCSIDevice the qemuMonitorDriveDel call made on the error path after qemuMonitorAddDevice fails did not succeed, but that failure is "hidden" because it is reported only via VIR_WARN. You'd have to be logging and looking for those messages. In the future, it would certainly be helpful to see that kind of output.

In any case, I'm going to move this to ON_QA for testing with the current release, as I don't see a failure in my environment. If testing shows that something fails, I will re-evaluate. Please provide the details of the SCSI controller being used, and try to log the attach failure in more detail by adding the following lines to your /etc/libvirt/libvirtd.conf:

log_level = 1
log_filters="3:remote 4:event 3:json 3:rpc"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"

Restart libvirtd, then execute the sequence of commands and provide the libvirtd.log. That should be enough.

Comment 2 Yang Yang 2016-05-20 06:01:14 UTC

John,

The original scenario is blocked by QEMU regression bz#1337100: the VM is killed
after a SCSI hostdev (passthrough) hotplug fails, so I cannot reproduce this scenario until bz1337100 is fixed. For example, I have a default SCSI controller like this:

 <controller type='scsi' index='0'/>

Then I hotplug a SCSI hostdev (passthrough) with an incorrect address:
# cat hostdev.xml
<hostdev mode='subsystem' type='scsi' managed='no'>
      <source>
        <adapter name='scsi_host6'/>
        <address bus='0' target='0' unit='0'/>
      </source>
      <address type='drive' controller='0' bus='1' target='1' unit='0'/>
    </hostdev>

# virsh attach-device vm1 hostdev.xml
error: Failed to attach device from hostdev.xml
error: internal error: unable to execute QEMU command 'device_add': bad scsi channel id: 1

Unfortunately, the VM is killed:

# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     vm1                            shut off

However, I can reproduce the issue by hot-plugging USB storage.
I have one default USB controller:

  <controller type='usb' index='0'>

Then I hotplug one USB storage device into the USB controller with an incorrect address:

# cat usb.xml 
 <disk type='block' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source dev='/dev/sdb'/>
      <target dev='sda' bus='usb'/>
      <address type='usb' bus='0' port='5'/>   <!-- port='5' is incorrect -->
    </disk>

# virsh attach-device vm1 usb.xml 
error: Failed to attach device from usb.xml
error: internal error: unable to execute QEMU command 'device_add': usb port 5 (bus usb.0) not found (in use?)

Then I change the port to 2 and attempt the hotplug again:

# cat usb.xml 
<disk type='block' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source dev='/dev/sdb'/>
      <target dev='sda' bus='usb'/>
      <address type='usb' bus='0' port='2'/>
    </disk>

I get this error:

# virsh attach-device vm1 usb.xml 
error: Failed to attach device from usb.xml
error: internal error: unable to execute QEMU command '__com.redhat_drive_add': Duplicate ID 'drive-usb-disk0' for drive

However, drive-usb-disk0 does not exist in my VM:

# virsh dumpxml vm1 | grep usb
    <controller type='usb' index='0'>
      <alias name='usb'/>
    <redirdev bus='usb' type='spicevmc'>

After I destroy and start the VM, the hotplug succeeds:

# virsh destroy vm1; virsh start vm1
Domain vm1 destroyed

Domain vm1 started

# virsh attach-device vm1 usb.xml 
Device attached successfully

I guess it is similar to Bug 1262399 (disk backend is not removed properly when disk frontend hotplug fails), but Bug 1262399 only fixes the virtio hotplug issue. I have attached libvirtd.log.

Regards
Yang

Comment 3 Yang Yang 2016-05-20 06:19:22 UTC

Created attachment 1159794
libvirtd.log

Comment 4 John Ferlan 2016-06-14 14:17:18 UTC

Not clear why there's a needinfo... Just clearing it. I do know about this bug and it is in my work queue.

Comment 5 John Ferlan 2016-06-29 20:54:09 UTC

Ohh.... re-reading everything again. You were asking about USB hotplug/unplug issues. I missed that on the first pass. Now that I've had a few more cycles to think and read more closely, I see.

One important thing to understand: when attaching, two commands are sent to the QEMU monitor, one to create the "-drive" (backend) and the other to create the "-device" (frontend). One way to "view" this is to add the device to the guest XML, start the guest, and note the "-drive" and "-device" command-line arguments for your device. When you then stop the guest, remove the device, and try the hotplug, you can see in libvirtd.log the commands that libvirt uses to perform the adds:

2016-05-20 05:45:01.346+0000: 31899: debug : virEventPollInterruptLocked:727 : Interrupting
2016-05-20 05:45:01.346+0000: 31899: info : qemuMonitorSend:1007 : QEMU_MONITOR_SEND_MSG: mon=0x7f6c40004df0 msg={"execute":"__com.redhat_drive_add","arguments":{"file":"/dev/sdb","format":"qcow2","id":"drive-usb-disk0"},"id":"libvirt-12"}
fd=-1

which succeeds, and then:

2016-05-20 05:45:01.354+0000: 31899: info : qemuMonitorSend:1007 : QEMU_MONITOR_SEND_MSG: mon=0x7f6c40004df0 msg={"execute":"device_add","arguments":{"driver":"usb-storage","bus":"usb.0","port":"5","drive":"drive-usb-disk0","id":"usb-disk0","removable":"off"},"id":"libvirt-13"}
fd=-1

which fails. If you had checked the arguments for a guest with the disk already present, you would see pretty much the same list of arguments and parameters.

Anyway, my assumption about the failure in the port='2' case is that when the port='5' attempt failed, the '-drive' for the device (e.g. drive-usb-disk0) was not removed after the '-device' add failed. So when port='2' is attempted, you get the failure because the '-drive' already exists. My more recent QEMU version (2.4.1) seems to remove it when the -device add fails, and thus the second addition succeeds for me.
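To make this failure mode concrete, here is a toy model in C of the two-step hotplug described above. It assumes, for illustration only, that the monitor is just a table of drive IDs and that ports 1 through nports exist on the bus; the struct and function names are hypothetical and this is not libvirt or QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy model of the QEMU monitor; none of these names are
 * libvirt or QEMU APIs. It only tracks which "-drive" IDs exist. */
#define TOY_MAX_DRIVES 8
#define TOY_ID_LEN 64

struct toy_monitor {
    char drives[TOY_MAX_DRIVES][TOY_ID_LEN];
    int ndrives;
    int nports;                 /* ports 1..nports exist on usb.0 */
};

/* Backend step: fails on a duplicate ID, like "Duplicate ID ... for drive". */
static int toy_drive_add(struct toy_monitor *mon, const char *id)
{
    assert(mon != NULL && id != NULL);
    if (strlen(id) >= TOY_ID_LEN || mon->ndrives == TOY_MAX_DRIVES)
        return -1;
    for (int i = 0; i < mon->ndrives; i++)
        if (strcmp(mon->drives[i], id) == 0)
            return -1;
    strcpy(mon->drives[mon->ndrives++], id);
    return 0;
}

/* Backend removal by exact ID. */
static int toy_drive_del(struct toy_monitor *mon, const char *id)
{
    for (int i = 0; i < mon->ndrives; i++) {
        if (strcmp(mon->drives[i], id) == 0) {
            mon->ndrives--;
            if (i != mon->ndrives)   /* fill the hole with the last entry */
                strcpy(mon->drives[i], mon->drives[mon->ndrives]);
            return 0;
        }
    }
    return -1;
}

/* Frontend step: fails when the requested port does not exist. */
static int toy_device_add(const struct toy_monitor *mon, int port)
{
    return (port >= 1 && port <= mon->nports) ? 0 : -1;
}

/* Two-step hotplug: "-drive" first, then "-device". With rollback set,
 * a failed device_add also deletes the drive it just added. */
static int toy_attach_disk(struct toy_monitor *mon, const char *drive_id,
                           int port, bool rollback)
{
    if (toy_drive_add(mon, drive_id) < 0)
        return -1;
    if (toy_device_add(mon, port) < 0) {
        if (rollback)
            (void)toy_drive_del(mon, drive_id);
        return -1;
    }
    return 0;
}
```

With nports = 4 and rollback disabled, attaching "drive-usb-disk0" on port 5 fails and leaves the drive behind, so the retry on port 2 fails inside toy_drive_add, mirroring the Duplicate ID error. With rollback enabled, the same retry succeeds.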

FWIW: That ID is built from the 'target dev=sdX' value, where "X" is 'a', 'b', 'c', etc., which maps to disk0, disk1, disk2, etc. (IOW: 'a' means the 0th disk on the target bus). If in the port='2' case you had used target dev='sdb' (or anything other than 'sda'), it should have worked.
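The alias derivation described above can be sketched as follows. toy_disk_name_to_index and toy_usb_drive_alias are hypothetical helpers written for illustration (libvirt has its own implementation); they assume an 'sd' prefix followed only by lowercase letters, counted in bijective base 26 so that 'sdz' maps to 25 and 'sdaa' to 26.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: map the letters after `prefix` to a zero-based
 * index using bijective base 26 (a->0, z->25, aa->26). Returns -1 if
 * the prefix does not match or a non-lowercase letter is found. */
static int toy_disk_name_to_index(const char *name, const char *prefix)
{
    assert(name != NULL && prefix != NULL);
    size_t plen = strlen(prefix);
    if (strncmp(name, prefix, plen) != 0 || name[plen] == '\0')
        return -1;

    int idx = 0;
    for (const char *p = name + plen; *p != '\0'; p++) {
        if (*p < 'a' || *p > 'z')
            return -1;
        if (idx > (INT_MAX - 26) / 26)   /* avoid int overflow */
            return -1;
        idx = idx * 26 + (*p - 'a' + 1);
    }
    return idx - 1;
}

/* Hypothetical: build "drive-usb-disk<N>" from a target dev such as "sda".
 * Returns 0 on success, -1 on bad input or if `buf` is too small. */
static int toy_usb_drive_alias(const char *dev, char *buf, size_t len)
{
    int idx = toy_disk_name_to_index(dev, "sd");
    if (idx < 0)
        return -1;
    int n = snprintf(buf, len, "drive-usb-disk%d", idx);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Under this mapping, target dev='sdb' yields drive-usb-disk1, so it would not collide with a leftover drive-usb-disk0.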

When you stop/start the domain and attach the port='2' disk it works because the drive is no longer there from the first failed attempt.

BTW: Initially, when I tried to reproduce your scenario, I was getting failures, but then again my source device /dev/sdb isn't in 'qcow2' format.

Using the latest libvirt, once I changed that to 'raw', adding port='5' failed as expected, but I then successfully added port='2' (without any other changes).

Hence, my assumption is that some failure path in more recent QEMU versions "cleans up" the '-drive' when the '-device' fails. I could add code to libvirt to perform that cleanup explicitly and make it more obvious, but I think USB should be retested using the RHEL 7.3 libvirt/QEMU, and if it's not working, a bug should be created.

Comment 6 Yang Yang 2016-06-30 07:46:22 UTC

Thanks, John, for your detailed analysis.

I retested with the latest RHEL 7.3 libvirt and QEMU, using both qcow2- and raw-format USB devices. The results are the same as in comment #2.

libvirt-1.3.5-1.el7.x86_64
qemu-kvm-rhev-2.6.0-9.el7.x86_64
3.10.0-445.el7.x86_64

1. Start the VM with USB controller #0
# virsh dumpxml vm1 | grep usb -a5
<controller type='usb' index='0'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>

2. Hot-plug the USB disk with a bad address
# lsscsi
[0:0:0:0]    disk    ATA      SAMSUNG HD322GM  0101  /dev/sda 
[2:0:0:0]    cd/dvd  PLDS     DVD-ROM DH-16D5S VD15  /dev/sr0 
[6:0:0:0]    disk    Alcor    Flash Disk       8.07  /dev/sdb 

# qemu-img info /dev/sdb
image: /dev/sdb
file format: raw
virtual size: 999M (1047527424 bytes)
disk size: 0

# cat usb.xml 
<disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/dev/sdb'/>
      <target dev='sda' bus='usb'/>
      <address type='usb' bus='0' port='5'/>
    </disk>

# virsh attach-device vm1 usb.xml 
error: Failed to attach device from usb.xml
error: internal error: unable to execute QEMU command 'device_add': usb port 5 (bus usb.0) not found (in use?)

3. Hot-plug the USB disk with a correct address
# cat usb.xml 
<disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/dev/sdb'/>
      <target dev='sda' bus='usb'/>
      <address type='usb' bus='0' port='2'/>
    </disk>

# virsh attach-device vm1 usb.xml 
error: Failed to attach device from usb.xml
error: internal error: unable to execute QEMU command '__com.redhat_drive_add': Duplicate ID 'drive-usb-disk0' for drive

So the '-drive' is not cleaned up after 'device_add' fails.

I got the same results when using a qcow2-format USB device.

Comment 7 John Ferlan 2016-07-12 11:59:47 UTC

Again, this is a SCSI bug... The USB infrastructure (code path) is different. And of course there is RHEL-specific code in play too (e.g. the '__com.redhat_drive_add' in the output).

FWIW: I did post a series:

http://www.redhat.com/archives/libvir-list/2016-June/msg02207.html

that performs that drive_del for USB. Cole quickly noted bz 1336225, which indicates that both USB and SCSI are missing the drive_del.

Comment 8 John Ferlan 2016-07-19 21:20:04 UTC

I've updated the above series as:

http://www.redhat.com/archives/libvir-list/2016-July/msg00730.html


Patches 1-5 work to resolve the USB drive add failure

Patch 6 works to resolve the SCSI drive add failure

Patches 7-9 work to resolve the SCSI_HOST drive add failure, at least from a libvirt viewpoint. As it turns out, the code did make the qemuMonitorDriveDel call, but it passed the "whole" drvstr rather than just the drivealias, which I assume caused the drive deletion to fail in QEMU.

I reserved a RHEL 7 Beaker system and was able to reproduce the issue and prove to myself that I fixed it.

Comment 9 John Ferlan 2016-08-02 14:42:26 UTC

Patches have been pushed upstream

git show 1149fe4c15feba1a2970bd69c3d3d2884cd72938

    qemu: Use the hostdev alias in qemuDomainAttachHostSCSIDevice error path
    
...
   
    Rather than pass the whole drive string (which contained the alias),
    pass only the alias for the qemuMonitorDriveDel call in the error
    path when adding a host device in the monitor fails.
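Why passing the alias matters can be illustrated with a small sketch: if the monitor looks drives up by exact ID, deleting by the whole drive string never matches, so the backend stays behind and the next attach hits the duplicate ID. The drive string form "file=...,if=none,id=drive-hostdev0" and all names below are hypothetical; the real drvstr contents may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy monitor that holds at most one "-drive" backend. */
struct toy_mon {
    char drive_id[64];          /* empty string means no drive present */
};

/* Remove the drive only if `id` matches its ID exactly. */
static int toy_drive_del(struct toy_mon *mon, const char *id)
{
    assert(mon != NULL && id != NULL);
    if (mon->drive_id[0] == '\0' || strcmp(mon->drive_id, id) != 0)
        return -1;
    mon->drive_id[0] = '\0';
    return 0;
}

/* Error path after a failed device_add. use_alias picks the argument:
 * 0 = whole drive string (the behavior the commit replaced),
 * 1 = drive alias only (the behavior the commit introduced). */
static int toy_rollback_drive(struct toy_mon *mon, const char *drvstr,
                              const char *drivealias, int use_alias)
{
    const char *arg = use_alias ? drivealias : drvstr;
    if (toy_drive_del(mon, arg) < 0) {
        /* Only a warning is printed, so the leftover drive is easy to miss. */
        fprintf(stderr, "warning: unable to delete drive '%s'\n", arg);
        return -1;
    }
    return 0;
}
```

With a drive "drive-hostdev0" present, calling toy_rollback_drive with use_alias = 0 leaves the drive in place and prints the warning, while use_alias = 1 removes it.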

Comment 12 Pei Zhang 2016-09-01 09:26:15 UTC

Verified versions:

libvirt-2.0.0-6.el7.x86_64
qemu-kvm-rhev-2.6.0-22.el7.x86_64

Verified steps:

1. Start a guest with one SCSI controller like the following:

# virsh dumpxml vm2 | grep scsi -A 3
    <controller type='scsi' index='0' model='virtio-scsi'>
      <alias name='scsi0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>

2. Prepare a scsi_host hostdev with a wrong address as follows:
# cat scsi_hostdev.xml 
<hostdev mode='subsystem' type='scsi' managed='no'>
      <source>
        <adapter name='scsi_host4'/>
        <address bus='0' target='0' unit='0'/>
      </source>
      <address type='drive' controller='0' bus='1' target='1' unit='0'/> <!-- no scsi1 in guest -->
    </hostdev>

It should fail to attach, as follows:

# virsh attach-device vm2 scsi_hostdev.xml 
error: Failed to attach device from scsi_hostdev.xml
error: internal error: unable to execute QEMU command 'device_add': bad scsi channel id: 1

3. Edit the scsi_host XML and delete the wrong address:
# cat scsi_hostdev1.xml 
<hostdev mode='subsystem' type='scsi' managed='no'>
      <source>
        <adapter name='scsi_host4'/>
        <address bus='0' target='0' unit='0'/>
      </source>
    </hostdev>

Attach again; it should attach successfully.

# virsh attach-device vm2 scsi_hostdev1.xml 
Device attached successfully

Check with dumpxml:

# virsh dumpxml vm2 | grep hostdev -A 9
    <hostdev mode='subsystem' type='scsi' managed='no'>
      <source>
        <adapter name='scsi_host4'/>
        <address bus='0' target='0' unit='0'/>
      </source>
      <alias name='hostdev0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </hostdev>

Now the SCSI host device can be attached successfully after a failed attempt.

Comment 16 Pei Zhang 2016-09-02 06:40:02 UTC

Based on comment 12, moving this bug to VERIFIED.

Comment 18 errata-xmlrpc 2016-11-03 18:49:28 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2577.html