Bug 1289391
Summary: Libvirt incorrectly unplugs the backend when host device frontend hotplug fails
Product: Red Hat Enterprise Linux 7
Reporter: Yang Yang <yanyang>
Component: libvirt
Assignee: John Ferlan <jferlan>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified
Priority: unspecified
Version: 7.2
CC: dyuan, jferlan, mzhan, pzhang, rbalakri, xuzhang, yisun
Target Milestone: rc
Hardware: x86_64
OS: Linux
Fixed In Version: libvirt-2.0.0-5.el7
Doc Type: Bug Fix
Last Closed: 2016-11-03 18:49:28 UTC
Type: Bug
Description
Yang Yang
2015-12-08 01:41:51 UTC
It would have helped to see/know whether you had any SCSI devices already defined in your vm1 domain. Using the latest top of tree (1.3.4), if I have either no 'scsi' controller or the following default one defined:

  <controller type='scsi' index='0'/>

then I get the following error at attach-device:

  # virsh attach-device f18 hostdev-bad.xml
  error: Failed to attach device from hostdev-bad.xml
  error: unsupported configuration: target must be 0 for scsi host device if its controller model is 'lsilogic'

If I change to:

  <controller type='scsi' index='0' model='virtio-scsi'/>

then I do get the error:

  # virsh attach-device f18 hostdev-bad.xml
  error: Failed to attach device from hostdev-bad.xml
  error: internal error: unable to execute QEMU command 'device_add': bad scsi channel id: 1

However, I can successfully hotplug afterwards.

There have been changes in this code since 1.2.17 (as noted in the description), but none of them seem related to the failure to add the hostdev SCSI device. There could also be changes in qemu since then (I have 2.4 installed) that affect whether the drive can be deleted successfully.

FWIW: The error you got:

  # virsh attach-device vm1 hostdev.xml
  error: Failed to attach device from hostdev.xml
  error: internal error: unable to execute QEMU command '__com.redhat_drive_add': Duplicate ID 'drive-hostdev0' for drive

would indicate that, for some reason, in qemuDomainAttachHostSCSIDevice the qemuMonitorDriveDel call made in the qemuMonitorAddDevice error path failed. That failure is "hidden" because it is only reported with VIR_WARN; you'd have to be logging and looking for those warnings to see it. In the future, that kind of output would certainly be helpful.

In any case, I'm going to move this to ON_QA for testing with the current release, as I don't see a failure in my environment. If testing shows that something fails, I will re-evaluate.
Please provide the details of the SCSI controller being used and try to log the attach failure in more detail. Add the following lines to your /etc/libvirt/libvirtd.conf:

  log_level = 1
  log_filters="3:remote 4:event 3:json 3:rpc"
  log_outputs="1:file:/var/log/libvirt/libvirtd.log"

Restart libvirtd, then execute the sequence of commands and provide the libvirtd.log. That should be enough.

John,

The original scenario is blocked by qemu regression bz#1337100: the VM is killed after a SCSI hostdev (passthrough) hotplug fails. So I cannot reproduce this scenario until qemu bz1337100 is fixed.

For example, I have a default SCSI controller like this:

  <controller type='scsi' index='0'/>

Then I hotplug a SCSI hostdev (passthrough) with an incorrect address:

  # cat hostdev.xml
  <hostdev mode='subsystem' type='scsi' managed='no'>
    <source>
      <adapter name='scsi_host6'/>
      <address bus='0' target='0' unit='0'/>
    </source>
    <address type='drive' controller='0' bus='1' target='1' unit='0'/>
  </hostdev>

  # virsh attach-device vm1 hostdev.xml
  error: Failed to attach device from hostdev.xml
  error: internal error: unable to execute QEMU command 'device_add': bad scsi channel id: 1

The VM is unfortunately killed:

  # virsh list --all
   Id    Name                           State
  ----------------------------------------------------
   -     vm1                            shut off

However, I can reproduce the issue by hotplugging USB storage. I have one default USB controller:

  <controller type='usb' index='0'>

Then I hotplug one USB storage device into the USB controller with an incorrect address:

  # cat usb.xml
  <disk type='block' device='disk'>
    <driver name='qemu' type='qcow2'/>
    <source dev='/dev/sdb'/>
    <target dev='sda' bus='usb'/>
    <address type='usb' bus='0' port='5'/>    <== port='5' is incorrect
  </disk>

  # virsh attach-device vm1 usb.xml
  error: Failed to attach device from usb.xml
  error: internal error: unable to execute QEMU command 'device_add': usb port 5 (bus usb.0) not found (in use?)
Then I change the port to 2 and attempt the hotplug again:

  # cat usb.xml
  <disk type='block' device='disk'>
    <driver name='qemu' type='qcow2'/>
    <source dev='/dev/sdb'/>
    <target dev='sda' bus='usb'/>
    <address type='usb' bus='0' port='2'/>
  </disk>

I get an error like this:

  # virsh attach-device vm1 usb.xml
  error: Failed to attach device from usb.xml
  error: internal error: unable to execute QEMU command '__com.redhat_drive_add': Duplicate ID 'drive-usb-disk0' for drive

However, drive-usb-disk0 does not exist in my VM:

  # virsh dumpxml vm1 | grep usb
      <controller type='usb' index='0'>
        <alias name='usb'/>
      <redirdev bus='usb' type='spicevmc'>

After I destroy/start the VM, the hotplug succeeds:

  # virsh destroy vm1; virsh start vm1
  Domain vm1 destroyed

  Domain vm1 started

  # virsh attach-device vm1 usb.xml
  Device attached successfully

I guess it is similar to Bug 1262399 - disk backend is not removed properly when disk frontend hotplug fails. But Bug 1262399 only fixes the virtio hotplug issue.

Attached libvirtd.log.

Regards
Yang

Created attachment 1159794
libvirtd.log
Not clear why there's a needinfo... Just clearing it. I do know about this bug and it is in my work queue.

Ohh.... re-reading everything again. You were asking about USB hotplug/unplug issues. I missed that on the first pass. Now that I've had a few more cycles to think and read more closely, I see.

One important thing to understand: when attaching, you are sending two commands to the qemu monitor, one to create the "-drive" and the other to create the "-device". One way to "view" this is to add the device to the guest XML, start the guest, and note the "-drive" and "-device" arguments for your device. When you then stop the guest, remove the device, and try the hotplug, you can see the commands that libvirt uses to perform the adds in libvirtd.log:

  2016-05-20 05:45:01.346+0000: 31899: debug : virEventPollInterruptLocked:727 : Interrupting
  2016-05-20 05:45:01.346+0000: 31899: info : qemuMonitorSend:1007 : QEMU_MONITOR_SEND_MSG: mon=0x7f6c40004df0 msg={"execute":"__com.redhat_drive_add","arguments":{"file":"/dev/sdb","format":"qcow2","id":"drive-usb-disk0"},"id":"libvirt-12"} fd=-1

which succeeds, then:

  2016-05-20 05:45:01.354+0000: 31899: info : qemuMonitorSend:1007 : QEMU_MONITOR_SEND_MSG: mon=0x7f6c40004df0 msg={"execute":"device_add","arguments":{"driver":"usb-storage","bus":"usb.0","port":"5","drive":"drive-usb-disk0","id":"usb-disk0","removable":"off"},"id":"libvirt-13"} fd=-1

which fails. If you check the arguments for a guest with the disk already present, you will note pretty much the same list of arguments and parameters.

Anyway, my assumption for the port='2' failure is that when the port='5' attempt failed, the '-drive' for the device (e.g. drive-usb-disk0) was not removed after the '-device' failed. So when port='2' is attempted, you get the failure because the '-drive' already exists.
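The failure sequence described above can be modeled with a small mock. This is a hypothetical Python sketch, not libvirt or QEMU code: `MockMonitor`, `attach_disk`, `disk_index` and `try_attach` are illustrative names, the mock only tracks drive IDs and which USB ports exist, and the drive string format is made up. It assumes the drive ID is derived from the target dev name as described in this thread ('sda' maps to disk0, 'sdb' to disk1), and it assumes a drive_del given anything other than the exact alias leaves the drive in place. The `cleanup` parameter compares three error paths: no drive_del at all, a drive_del passed the whole drive string, and a drive_del passed only the alias.

```python
"""Hypothetical model of a two-step disk hotplug (not libvirt or QEMU code)."""


class MonitorError(Exception):
    """A failed mock monitor command."""


class MockMonitor:
    """Tracks only drive IDs and the USB ports that exist on bus usb.0."""

    def __init__(self, ports=(1, 2)):
        self.drives = set()
        self.ports = set(ports)

    def drive_add(self, drive_id):
        # Mirrors the "Duplicate ID ... for drive" error seen in this thread.
        if drive_id in self.drives:
            raise MonitorError(f"Duplicate ID '{drive_id}' for drive")
        self.drives.add(drive_id)

    def device_add(self, drive_id, port):
        # The frontend refers to drive_id; only the port is validated here.
        if port not in self.ports:
            raise MonitorError(f"usb port {port} (bus usb.0) not found (in use?)")

    def drive_del(self, drive_id):
        # Assumption: only an exact ID match removes the drive.
        if drive_id not in self.drives:
            raise MonitorError(f"drive '{drive_id}' not found")
        self.drives.remove(drive_id)


def disk_index(target_dev):
    """Map 'sda' -> 0, 'sdb' -> 1, ..., 'sdz' -> 25, 'sdaa' -> 26."""
    suffix = target_dev[2:]
    if not target_dev.startswith("sd") or not suffix or not all("a" <= c <= "z" for c in suffix):
        raise ValueError(f"unsupported target dev {target_dev!r}")
    idx = 0
    for i, ch in enumerate(suffix):
        idx = (idx + (1 if i else 0)) * 26 + (ord(ch) - ord("a"))
    return idx


def attach_disk(mon, target_dev, port, cleanup="alias"):
    """Add the -drive backend, then the -device frontend.

    cleanup picks the error path: "none" (no drive_del), "drvstr" (drive_del
    gets the whole drive string) or "alias" (drive_del gets only the alias).
    """
    alias = f"drive-usb-disk{disk_index(target_dev)}"
    drvstr = f"file=/dev/sdb,format=raw,id={alias}"  # hypothetical drive string
    mon.drive_add(alias)
    try:
        mon.device_add(alias, port)
    except MonitorError:
        if cleanup != "none":
            try:
                mon.drive_del(alias if cleanup == "alias" else drvstr)
            except MonitorError:
                pass  # swallowed here; the thread notes libvirt only logs it via VIR_WARN
        raise  # report the original device_add failure


def try_attach(mon, target_dev, port, cleanup):
    """Return 'attached' or the monitor error text."""
    try:
        attach_disk(mon, target_dev, port, cleanup)
        return "attached"
    except MonitorError as err:
        return str(err)


for mode in ("none", "drvstr", "alias"):
    mon = MockMonitor(ports=(1, 2))
    try_attach(mon, "sda", 5, mode)  # bad port: device_add fails
    print(mode, "->", try_attach(mon, "sda", 2, mode))
# none -> Duplicate ID 'drive-usb-disk0' for drive
# drvstr -> Duplicate ID 'drive-usb-disk0' for drive
# alias -> attached
```

Only the alias mode lets the retry with target dev='sda' succeed. In the other two modes the leftover 'drive-usb-disk0' triggers the Duplicate ID error, while a retry with target dev='sdb' would still work in this model because it uses 'drive-usb-disk1'.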
My more recent qemu version (2.4.1) seems to remove it when the -device add fails, and thus the second addition succeeds for me.

FWIW: That ID is built from the 'target dev=sdX' value, where "X" is 'a', 'b', 'c', etc., which maps to disk0, disk1, disk2, etc. (IOW: 'a' means the 0th disk on the target bus). If in the port='2' case you had used target dev='sdb' (or anything other than 'sda'), it should have worked. When you stop/start the domain and attach the port='2' disk, it works because the drive from the first failed attempt is no longer there.

BTW: Initially, when I tried to reproduce your scenario, I was getting failures, but then again my source device /dev/sdb isn't in 'qcow2' format. Using the latest libvirt, once I changed that to 'raw', the port='5' add failed as expected, but I then successfully added port='2' (without any other changes). Hence, my assumption is that some failure path in more recent qemu versions may "clean up" the '-drive' when the '-device' fails.

I could add code to libvirt to clean that up and make it more obvious, but I think the USB case should be retested using the 7.3 libvirt/qemu, and if it's not working, a bug should be created.

Thanks John for your detailed analysis.

Re-tested under the latest RHEL 7.3 libvirt and qemu using both qcow2 and raw format USB devices. The test results are the same as in comment#2.

  libvirt-1.3.5-1.el7.x86_64
  qemu-kvm-rhev-2.6.0-9.el7.x86_64
  3.10.0-445.el7.x86_64

1. Start the VM with USB controller #0:

  # virsh dumpxml vm1 | grep usb -a5
      <controller type='usb' index='0'>
        <alias name='usb'/>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
      </controller>

2. Hot-plug the USB disk with a bad address:

  # lsscsi
  [0:0:0:0]    disk    ATA      SAMSUNG HD322GM  0101  /dev/sda
  [2:0:0:0]    cd/dvd  PLDS     DVD-ROM DH-16D5S VD15  /dev/sr0
  [6:0:0:0]    disk    Alcor    Flash Disk       8.07  /dev/sdb

  # qemu-img info /dev/sdb
  image: /dev/sdb
  file format: raw
  virtual size: 999M (1047527424 bytes)
  disk size: 0

  # cat usb.xml
  <disk type='file' device='disk'>
    <driver name='qemu' type='raw'/>
    <source file='/dev/sdb'/>
    <target dev='sda' bus='usb'/>
    <address type='usb' bus='0' port='5'/>
  </disk>

  # virsh attach-device vm1 usb.xml
  error: Failed to attach device from usb.xml
  error: internal error: unable to execute QEMU command 'device_add': usb port 5 (bus usb.0) not found (in use?)

3. Hot-plug the USB disk with a correct address:

  # cat usb.xml
  <disk type='file' device='disk'>
    <driver name='qemu' type='raw'/>
    <source file='/dev/sdb'/>
    <target dev='sda' bus='usb'/>
    <address type='usb' bus='0' port='2'/>
  </disk>

  # virsh attach-device vm1 usb.xml
  error: Failed to attach device from usb.xml
  error: internal error: unable to execute QEMU command '__com.redhat_drive_add': Duplicate ID 'drive-usb-disk0' for drive

So the '-drive' is not cleaned up after 'device_add' fails. I got the same results when using a qcow2 format USB device.

Again, this is a SCSI bug... The USB infrastructure (code path) is different. Then of course there is RHEL-specific code in play too (e.g. the '__com.redhat_drive_add' in the output).

FWIW: I did post a series that will do that drive_del for USB:

http://www.redhat.com/archives/libvir-list/2016-June/msg02207.html

Cole quickly noted that there is bz 1336225, which indicates that both USB and SCSI are missing the drive del. I've updated the above series as:

http://www.redhat.com/archives/libvir-list/2016-July/msg00730.html

Patches 1-5 work to resolve the USB drive add failure.
Patch 6 works to resolve the SCSI drive add failure.
Patches 7-9 work to resolve the SCSI_HOST drive add failure.

At least from a libvirt viewpoint.
As it turns out, the code did make the qemuMonitorDriveDel call, but it passed the "whole" drvstr rather than just the drive alias, which I assume caused things to go belly up in qemu. I reserved a RHEL 7 beaker system and was able to reproduce the issue and prove to myself that I fixed it.

Patches have been pushed upstream:

  git show 1149fe4c15feba1a2970bd69c3d3d2884cd72938
  qemu: Use the hostdev alias in qemuDomainAttachHostSCSIDevice error path
  ...
  Rather than pass the whole drive string (which contained the alias), pass
  only the alias for the qemuMonitorDriveDel call in the error path when
  adding a host device in the monitor fails.

Verified version:

  libvirt-2.0.0-6.el7.x86_64
  qemu-kvm-rhev-2.6.0-22.el7.x86_64

Verified steps:

1. Start a guest with one SCSI controller like the following:

  # virsh dumpxml vm2 | grep scsi -A 3
      <controller type='scsi' index='0' model='virtio-scsi'>
        <alias name='scsi0'/>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
      </controller>

2. Prepare a scsi_host hostdev XML with a wrong address, as follows:

  # cat scsi_hostdev.xml
  <hostdev mode='subsystem' type='scsi' managed='no'>
    <source>
      <adapter name='scsi_host4'/>
      <address bus='0' target='0' unit='0'/>
    </source>
    <address type='drive' controller='0' bus='1' target='1' unit='0'/>   <== no scsi1 in guest
  </hostdev>

It should fail to attach, like the following:

  # virsh attach-device vm2 scsi_hostdev.xml
  error: Failed to attach device from scsi_hostdev.xml
  error: internal error: unable to execute QEMU command 'device_add': bad scsi channel id: 1

3. Edit the scsi_host XML and delete the wrong address:

  # cat scsi_hostdev1.xml
  <hostdev mode='subsystem' type='scsi' managed='no'>
    <source>
      <adapter name='scsi_host4'/>
      <address bus='0' target='0' unit='0'/>
    </source>
  </hostdev>

Attach again; it should attach successfully:
  # virsh attach-device vm2 scsi_hostdev1.xml
  Device attached successfully

Check with dumpxml:

  # virsh dumpxml vm2 | grep hostdev -A 9
      <hostdev mode='subsystem' type='scsi' managed='no'>
        <source>
          <adapter name='scsi_host4'/>
          <address bus='0' target='0' unit='0'/>
        </source>
        <alias name='hostdev0'/>
        <address type='drive' controller='0' bus='0' target='0' unit='0'/>
      </hostdev>

Now the SCSI host device can be attached successfully after a failed attempt. According to comment 12, moving this bug to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2577.html