Bug 616415
Summary: | QMP: does not report the real cause of VFIO device assignment failure | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | juzhang <juzhang> |
Component: | qemu-kvm | Assignee: | Eric Auger <eric.auger> |
Status: | CLOSED WONTFIX | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | medium | Docs Contact: | |
Priority: | low | ||
Version: | 7.0 | CC: | alex.williamson, areis, armbru, chayang, hhuang, huding, juzhang, knoel, lcapitulino, lersek, michen, mkenneth, qzhang, virt-maint, xfu |
Target Milestone: | rc | ||
Target Release: | 7.0 | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2017-02-16 10:33:14 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1044815 | ||
Bug Blocks: | 559201, 580953, 580954 |
Description
juzhang
2010-07-20 12:05:22 UTC
I've agreed on introducing a special case for bug 596279 because Daniel said it was a very frequent error, but we just can't add the fix for bug 596279 everywhere. What we really need is a better QError; this is not suitable for RHEL6 and isn't going to happen anytime soon.

Moving to tier3. Unless customers hit this often, this is low priority.

Proposing for rhel6.2.

*** Bug 678963 has been marked as a duplicate of this bug. ***

This is a very old bug. Could you please test it on RHEL7 and post the results here? Thanks.

(In reply to Luiz Capitulino from comment #9)
> This is a very old bug. Could you please test it on RHEL7 and post the
> results here? Thanks.

NEEDINFO(QA)

(In reply to Luiz Capitulino from comment #9)
> This is a very old bug. Could you please test it on RHEL7 and post the
> results here? Thanks.

Just saw this, I will post the result soon.

Hi Xfu,
Can you give it a test and paste the result?

Re-tested this issue on RHEL7 with qemu-kvm-1.5.3-3.el7.x86_64. The bug can still be reproduced.

Steps to reproduce:

1. Boot the guest:
    /usr/libexec/qemu-kvm -M pc -enable-kvm -m 4G -smp 2,sockets=2,cores=1,threads=1 -no-kvm-pit-reinjection -usb -device usb-tablet,id=input0 -name virtual-blk-device -rtc base=localtime,clock=host,driftfix=slew -drive file=/home/rhel7cp2.qcow2_v3,if=none,id=drive-system-disk,format=qcow2,cache=writeback,aio=native,werror=stop,rerror=stop -device virtio-scsi-pci,id=scsi0 -device scsi-hd,drive=drive-system-disk,bus=scsi0.0,id=system-disk,bootindex=1 -device virtio-balloon-pci,id=ballooning -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -k en-us -boot menu=on -qmp tcp:0:5555,server,nowait -serial unix:/tmp/ttyS0,server,nowait -monitor stdio -vnc :2
2. {"execute":"qmp_capabilities"}
3. {"execute":"device_add","arguments":{"driver":"pci-assign","host":"86:10.1","id":"test1","bus":"pci.0","addr":"0x6"}}
4. Hot-plug another network device with the same PCI address:
    {"execute":"device_add","arguments":{"driver":"pci-assign","host":"86:10.2","id":"test2","bus":"pci.0","addr":"0x6"}}

Result:
    {"error": {"class": "GenericError", "desc": "Device initialization failed."}}

Thanks for this information. So yes, the bug is still there. We want this error message to tell us the reason for the failure. What we have to do is change device_add to propagate errors. That is, errors should be propagated up from the pci-assign driver to device_add. This is listed as a QMP TODO list item.

I think there's an additional upstream item. See

    commit 249d41720b7dfbb5951b430b9eefdbee7464f515
    Author: Andreas Färber <afaerber>
    Date: Wed Jan 9 03:58:11 2013 +0100

        qdev: Prepare "realized" property

The DeviceRealize function type is in fact able to emit an Error object. The item you mention is about propagating the Error object to the QMP caller. However, devices that are still only qdevs don't even generate a detailed enough Error object. For them, device_realize() simply wraps the traditional qdev_initfn member function called "init", which cannot produce detailed errors.

So I think we must first convert the pci-assign driver from qdev to QOM (it should have a genuine realize function which can emit good Error objects), and then we can think about propagating it.

OTOH I wonder if we could take a shortcut here, by simply calling qerror_report() rather than error_report() in the qdev init function in question, which is assigned_initfn() in "hw/i386/kvm/pci-assign.c". qerror_report() branches on monitor_cur_is_qmp(). If the monitor is QMP, then monitor_set_error() is called (which I assume will reach the caller). Otherwise qerror_report() calls qerror_print(), which is a fallback to error_report(), i.e. the current behavior.

Re comment#14: You're correct, device_add propagating Error is just part of the problem. The other part is everything below it. The big incomplete piece is realization.
For most devices, it's still a wrapper around old qdev's init() method, which reports errors via error_report() and returns success / failure. All the wrapper can do on failure is make up a generic error. Upstream has been working on converting init() to realize(). However, if I count correctly, some 140 out of 180 device models in current upstream's qemu-system-x86_64 still have init().

Re comment#15: I don't remember qerror_report()'s exact behavior. A quick glance at the code suggests you're right: it sets cur_mon->error when cur_mon is QMP, and monitor_protocol_emitter() picks it up and turns it into an error response.

Use of qerror_report() in hw/ is rare: there are 14 calls of qerror_report() and qerror_report_err(), but some 100 calls of error_set() and almost 400 calls of error_report(). Nevertheless, there's precedent: serial_pci_init() is the qdev init() method of "pci-serial", and it fails like this:

    serial_realize_core(s, &err);
    if (err != NULL) {
        qerror_report_err(err);
        error_free(err);
        return -1;
    }

Serial devices have been partially converted to realize() methods, but this one hasn't.

This BZ could depend on (or be affected by) work done for bug 1044815, so setting dependency for later evaluation.

(In reply to Laszlo Ersek from comment #22)
> This BZ could depend on (or be affected by) work done for bug 1044815, so
> setting dependency for later evaluation.

bug 1044815 is only one potential error path, I don't see why it would block this bug.

If you reorganize error paths / error handling in "hw/i386/kvm/pci-assign.c", then any conversion there to qerror_report() should happen either before or after, not in parallel. That's all I had in mind. I got the idea that both BZs could affect the same code parts from your status report, but I haven't looked into it; the above was just a note to self. It could turn out that the BZs are completely unrelated. Maybe the BZ-level dependency is a stretch, please feel free to undo it. Thanks.
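
For readers following the init() versus realize() discussion above, here is a minimal sketch of why a wrapper around a legacy init() can only report a generic error, while a native realize() can hand the real cause up to a device_add-like caller. This is not QEMU code: DemoError and every demo_* name are hypothetical stand-ins, and the "PCI slot already in use" message is invented for illustration. Only the generic "Device initialization failed." text comes from the result reported in this bug.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an error object; NOT QEMU's real Error type. */
typedef struct DemoError {
    char msg[128];
} DemoError;

/* Allocate an error carrying msg and hand it to the caller through errp. */
void demo_error_set(DemoError **errp, const char *msg)
{
    if (errp == NULL) {
        return;
    }
    DemoError *err = malloc(sizeof(*err));
    if (err == NULL) {
        return;
    }
    snprintf(err->msg, sizeof(err->msg), "%s", msg);
    *errp = err;
}

/* Legacy-style init: prints the cause locally, returns only a status code. */
int demo_legacy_init(int slot_in_use)
{
    if (slot_in_use) {
        fprintf(stderr, "PCI slot already in use\n"); /* invented message */
        return -1;
    }
    return 0;
}

/* Wrapper around the legacy init: the cause is already lost at this point,
 * so the wrapper can only make up a generic error. */
void demo_realize_via_init(int slot_in_use, DemoError **errp)
{
    if (demo_legacy_init(slot_in_use) < 0) {
        demo_error_set(errp, "Device initialization failed.");
    }
}

/* Native realize: reports the specific cause through errp. */
void demo_realize_native(int slot_in_use, DemoError **errp)
{
    if (slot_in_use) {
        demo_error_set(errp, "PCI slot already in use"); /* invented message */
    }
}

/* device_add-like caller: copies whatever error it received into the reply
 * buffer. Returns 0 on success and -1 on failure. */
int demo_device_add(void (*realize)(int, DemoError **), int slot_in_use,
                    char *reply, size_t reply_len)
{
    DemoError *err = NULL;

    realize(slot_in_use, &err);
    if (err != NULL) {
        snprintf(reply, reply_len, "%s", err->msg);
        free(err);
        return -1;
    }
    snprintf(reply, reply_len, "ok");
    return 0;
}
```

Calling demo_device_add() with demo_realize_via_init and a busy slot fills the reply with the generic message, whereas demo_realize_native fills it with the specific cause. That is the gap this bug is about: propagation in device_add only helps once the device itself produces a detailed error.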
Posted upstream patchset:
http://thread.gmane.org/gmane.comp.emulators.qemu/266388

Laszlo's patches are upstream, but as mentioned in the private comments, our focus for RHEL7 is on VFIO device assignment, which needs similar work to propagate Error objects for new style error reporting. Since that work isn't done yet, deferring this to 7.2.

I'm not really keyed into the RHEL process, so resetting assignee.

vfio-pci needs to be converted to realize() upstream. Such conversions usually aren't hard, just tedious. It's too late for upstream 2.7, though.

Currently trying to test the fix on ARM Cavium ThunderX. I updated the QMP command to:
    {"execute":"device_add","arguments":{"driver":"vfio-pci","host":"0005:90:00.0","id":"net0"}}
But I get the error "Bus 'pcie.0' does not support hotplugging" returned. Investigating further ...

I posted "[PATCH 0/3] Convert VFIO-PCI to realize". This was tested on ARM Seattle with an I350 NIC, using a direct qemu-system-aarch64 command line with the virt machine and an additional pci-bridge:
    -device pci-bridge,addr=12.0,chassis_nr=2,id=head.2
On QMP shell:
    device_add driver=vfio-pci host=0000:01:10.4 bus=head.2 addr=3

We had a discussion with Alex, and it looks like backporting the associated upstream series is not really sensible on qemu-kvm. It involves quite a lot of changes in the VFIO code that would bring many conflicts with the qemu-kvm vfio code (which has an old and different structure compared to upstream code). We tend to think it is more sensible to get this fixed on qemu-kvm-rhev. And actually it already is, since it was fixed in QEMU 2.8.

Does that make sense?

Thanks

Eric

Can we close that BZ, following comment #39?

Thanks

Eric

(In reply to Eric Auger from comment #40)
> Can we close that BZ, following comment #39?
>
> Thanks
>
> Eric

Hi Eric,

Makes sense. I will close this BZ as WONTFIX, and QE will not test this scenario against the qemu-kvm component.

Best Regards,
Junyi