Bug 596279
Summary: | QMP: does not report the real cause of PCI device assignment failure | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Daniel Berrangé <berrange> |
Component: | qemu-kvm | Assignee: | Luiz Capitulino <lcapitulino> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | medium | Docs Contact: | |
Priority: | low | ||
Version: | 6.0 | CC: | alex.williamson, armbru, juzhang, lcapitulino, mjenner, tburke, virt-maint |
Target Milestone: | rc | Flags: | juzhang:
needinfo-
|
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | qemu-kvm-0.12.1.2-2.85.el6 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2010-11-10 21:24:58 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 559201 |
Description
Daniel Berrangé
2010-05-26 14:10:22 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Major release. This request is not yet committed for inclusion. This is 90% a QMP issue. pci-assign is a victim of an error reporting infrastructure that doesn't provide a means for a driver to report a useful error message via QMP. This specific case surfers of two problems. The first is our problematic error infrastructure and the other is that, when planning conversions to QMP, we decided to do 'shallow' conversions, that's, we decided to get the major error cause instead of digging too deep in call chains. The reason is simple: we will need several releases to do a full conversion to whatever error infrastructure we use. Now, there are two ways of doing what is proposed in 'Expected results', we could: 1. Add a new error for this specific case (easy) 2. Add optional members to the QError object and use it to fill the info we want (not trivial) Item 1 is an ugly workaround which is not suitable for upstream (and if we do it here, we would need it in other places too). Item 2 is something we would like to have, but I'm afraid we won't be able to do it on time for rhel6 (furthermore this solution is more like a wish have than a bug). I propose moving this one to rhel6.1. We could do a semi-shallow conversion, as suggested by Dan Berrange. Something like: if (qdev_init(qdev) < 0) { - qerror_report(QERR_DEVICE_INIT_FAILED, driver); + if (!qerror_is_set()) + qerror_report(QERR_DEVICE_INIT_FAILED, driver); return NULL; } This permits converting drivers one by one instead of all at once. Maybe that's what you meant by 1. I wouldn't call it an ugly work-around, and I think it's worth a try upstream. > I propose moving this one to rhel6.1.
Ordinarily I'd agree, but PCI device assignment is one of the most troublesome, likely to fail, operations the user can do with QEMU hotplug. So anything we can do to improve error reporting upon failure will reduce bogus bug reports about PCI assignment not working.
(In reply to comment #6) > > I propose moving this one to rhel6.1. > > Ordinarily I'd agree, but PCI device assignment is one of the most troublesome, > likely to fail, operations the user can do with QEMU hotplug. So anything we > can do to improve error reporting upon failure will reduce bogus bug reports > about PCI assignment not working. Ok, let's try to improve it then. (In reply to comment #5) > We could do a semi-shallow conversion, as suggested by Dan Berrange. Something > like: > > if (qdev_init(qdev) < 0) { > - qerror_report(QERR_DEVICE_INIT_FAILED, driver); > + if (!qerror_is_set()) > + qerror_report(QERR_DEVICE_INIT_FAILED, driver); > return NULL; > } > > This permits converting drivers one by one instead of all at once. Maybe > that's what you meant by 1. I wouldn't call it an ugly work-around, and I > think it's worth a try upstream. Yes, it's exactly what I meant by 1. I think this is not ideal because we will keep spreading QError the wrong way, maybe upstream will care, maybe it won't. I will try it anyway, as this is our only way to report error today. Ok, was working on this and found two problems: 1. There are lots of errors in hw/device-assigment.c that are reported with fprintf() (27 calls) or perror() (11 calls). Of course that the numbers are very different if we consider initialization code only 2. This code is in qemu-kvm, but not in upstream qemu. Which means that we would be extending a bad interface into a downstream fork So, I think the most reasonable thing to do is: 1. Identify the most important errors we want to report to users, ie. the ones that will avoid what is described in comment #6 2. Let's introduce those errors in rhel6 only, as vendor extensions (assuming this is not a problem to libvirt, of course) I tried all unused PCI device on my machine,no luck.can't reproduce this issue.in order to reproduce this issue.I borrowed a 82541nic(don't support MSI/MSIX).however,I still not sure cover this issue. Tested with qemu-kvm-0.12.1.2-2.82.el6(unfixed version) --------------------------------------------------------- steps: 1. Unbind device from host(82541nic,don't support MSI/MSIX) echo "8086 107c" > /sys/bus/pci/drivers/pci-stub/new_id echo 0000:37:04.0 > /sys/bus/pci/devices/0000\:37\:04.0/driver/unbind echo 0000:37:04.0 > /sys/bus/pci/drivers/pci-stub/bind 2. boot guest #/usr/libexec/qemu-kvm -S -M rhel6.0.0 -enable-kvm -m 1024 -smp 2,sockets=2,cores=1,threads=1 -name test -rtc base=utc -boot c -drive file=/root/zhangjunyi/RHEL-Server-6.0-64-virtio.qcow2,if=none,id=drive-virtio-disk0,boot=on,format=qcow2,cache=none -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,mac=76:00:40:3F:20:20,bus=pci.0,addr=0x4,id=hello -boot c -vnc :10 -qmp tcp:0:4444,server,nowait -monitor stdio 3. telnet the qmp server #{"execute":"qmp_capabilities"} 4.hotplug 82541nic using qmp command #{"execute":"device_add","arguments":{"driver":"pci-assign","host":"37:04.0","id":"test1"}} results: 1.{"error": {"class": "DeviceInitFailed", "desc": "Device 'pci-assign' could not be initialized", "data": {"device": "pci-assign"}}} 2.(qemu) Failed to assign irq for "test1": Input/output error Perhaps you are assigning a device that shares an IRQ with another device? Tested with qemu-kvm-0.12.1.2-2.109.el6(fixed version) using the same steps --------------------------------------------------------- After step4. Results: 1.{"error": {"class": "DeviceInitFailed", "desc": "Device 'pci-assign' could not be initialized: Failed to assign irq: Input/output error", "data": {"device": "pci-assign", "__com.redhat_strerror": "Input/output error", "__com.redhat_reason": "Failed to assign irq"}}} 2.(qemu) Failed to assign irq for "test1": Input/output error Perhaps you are assigning a device that shares an IRQ with another device? I have two question 1.why not allowed to assign device when guest all ready exist the same IRQ value.in my option,in modern pci device almost support IRQ sharing. 2.If my scenarios not cover this issue.would you please give some tips? (In reply to comment #14) > 1.why not allowed to assign device when guest all ready exist the same IRQ > value.in my option,in modern pci device almost support IRQ sharing. Not sure if I got it. > 2.If my scenarios not cover this issue.would you please give some tips? It does, you can see that "desc" has been extended in the fixed package. (In reply to comment #15) > (In reply to comment #14) > > > 1.why not allowed to assign device when guest all ready exist the same IRQ > > value.in my option,in modern pci device almost support IRQ sharing. > > Not sure if I got it. > > > 2.If my scenarios not cover this issue.would you please give some tips? > > It does, you can see that "desc" has been extended in the fixed package. Thanks Luiz,about first question,if you got it,please let me via mail or IRC. Based on comment15,mark this issue as verified. Any reason not to close this one? Red Hat Enterprise Linux 6.0 is now available and should resolve the problem described in this bug report. This report is therefore being closed with a resolution of CURRENTRELEASE. You may reopen this bug report if the solution does not work for you. |