Bug 616415 - QMP: does not report the real cause of VFIO device assignment failure
QMP: does not report the real cause of VFIO device assignment failure
Status: ASSIGNED
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm (Show other bugs)
7.0
All Linux
low Severity medium
: rc
: 7.0
Assigned To: Eric Auger
Virtualization Bugs
:
: 678963 (view as bug list)
Depends On: 1044815
Blocks: 559201 580953 580954
  Show dependency treegraph
 
Reported: 2010-07-20 08:05 EDT by juzhang
Modified: 2016-09-07 00:19 EDT (History)
15 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description juzhang 2010-07-20 08:05:22 EDT
Description of problem:
Attempting to hot add a PCI device to a guest fails, returning a generic error
message "device 'pci-assign' could not be initialized".

Version-Release number of selected component (if applicable):
#rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.96.el6.x86_64

How reproducible:


Steps to Reproduce:
1. boot guest 
#/usr/libexec/qemu-kvm -m 2G -smp 2 -drive file=/root/zhangjunyi/winxp_32.raw,if=none,id=test,boot=on,cache=none,format=raw -device ide-drive,drive=test -cpu qemu64,-kvmclock -monitor stdio -boot order=cdn,menu=on -netdev tap,id=hostnet0,vhost=on -device rtl8139,netdev=hostnet0,id=net0,mac=22:11:22:45:66:93 -vnc :9 -qmp tcp:0:4445,server,nowait

2. connect to qmp server and issue the following commands
#{"execute":"qmp_capabilities"}

3. hot add physical nic to guest with addr=0x4
#{"execute":"device_add","arguments":{"driver":"pci-assign","host":"28:00.0","id":"test1","bus":"pci.0","addr":"0x4"}}

4.hot add another physical nic to guest with same addr=0x4
#{"execute":"device_add","arguments":{"driver":"pci-assign","host":"03:00.0","id":"test2","bus":"pci.0","addr":"0x4"}}

Actual results:
After step 4
{"error": {"class": "DeviceInitFailed", "desc": "Device 'pci-assign' could not be initialized", "data": {"device": "pci-assign"}}}

Expected results:
{"error": {"class": "DeviceInitFailed", "desc": "Device 'pci-assign' could not
be initialized. Failed to assign  addr is using for device.","data": {"device": "pci-assign"}}}
  


Additional info:
I know exist bug596279 QMP: does not report the real cause of PCI device assignment failure.but that is about fixing the problem with IRQ sharing,this is different.So I separate two issue temporary.if any mistake,please fix me.
Comment 2 Luiz Capitulino 2010-07-28 15:06:14 EDT
I've agreed on introducing a special case for bug 596279 because Daniel said it was a very frequent error, we just can't add the fix for bug 596279 everywhere.

What we really need is a better QError, this is not suitable for RHEL6 and isn't going to happen anytime soon.

Moving to tier3.
Comment 3 Luiz Capitulino 2010-11-24 11:38:59 EST
Unless custumers hit this often, this is low priority. Proposing for rhel6.2.
Comment 5 Luiz Capitulino 2011-04-25 09:23:39 EDT
*** Bug 678963 has been marked as a duplicate of this bug. ***
Comment 9 Luiz Capitulino 2013-07-16 10:53:43 EDT
This is a very old bug. Could you please test it on RHEL7 and post the results here? Thanks.
Comment 10 Ademar Reis 2013-09-26 10:02:00 EDT
(In reply to Luiz Capitulino from comment #9)
> This is a very old bug. Could you please test it on RHEL7 and post the
> results here? Thanks.

NEEDINFO(QA)
Comment 11 juzhang 2013-09-26 21:42:44 EDT

(In reply to Luiz Capitulino from comment #9)
> This is a very old bug. Could you please test it on RHEL7 and post the
> results here? Thanks.

Just saw this, I will post the result soon.

Hi Xfu,

Can you give a test and paste the result?
Comment 12 FuXiangChun 2013-09-26 23:12:01 EDT
Re-test this issue on RHEL7 with qemu-kvm-1.5.3-3.el7.x86_64. Still can reproduce this bug.

Reproduce to step:
1.Boot guest
/usr/libexec/qemu-kvm -M pc  -enable-kvm -m 4G -smp 2,sockets=2,cores=1,threads=1 -no-kvm-pit-reinjection -usb -device usb-tablet,id=input0 -name virtual-blk-device -rtc base=localtime,clock=host,driftfix=slew -drive file=/home/rhel7cp2.qcow2_v3,if=none,id=drive-system-disk,format=qcow2,cache=writeback,aio=native,werror=stop,rerror=stop -device virtio-scsi-pci,id=scsi0 -device scsi-hd,drive=drive-system-disk,bus=scsi0.0,id=system-disk,bootindex=1 -device virtio-balloon-pci,id=ballooning -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -k en-us -boot menu=on -qmp tcp:0:5555,server,nowait -serial unix:/tmp/ttyS0,server,nowait -monitor stdio -vnc :2 

2. {"execute":"qmp_capabilities"}

3. {"execute":"device_add","arguments":{"driver":"pci-assign","host":"86:10.1","id":"test1","bus":"pci.0","addr":"0x6"}}

4. hot-plug another net with the same PCI address
{"execute":"device_add","arguments":{"driver":"pci-assign","host":"86:10.2","id":"test2","bus":"pci.0","addr":"0x6"}}

result:
{"error": {"class": "GenericError", "desc": "Device initialization failed."}}
Comment 13 Luiz Capitulino 2013-09-30 09:11:42 EDT
Thanks for this information. So yes, the bug is still there. We want this error message to tell us the reason for the failure.

What we have to do is to change device_add to propagate errors. That is, errors should be propagated up from the pci-assign driver to device_add. This is listed as a QMP TODO list item.
Comment 14 Laszlo Ersek 2013-12-17 13:19:23 EST
I think there's an additional upstream item. See

commit 249d41720b7dfbb5951b430b9eefdbee7464f515
Author: Andreas Färber <afaerber@suse.de>
Date:   Wed Jan 9 03:58:11 2013 +0100

    qdev: Prepare "realized" property

The DeviceRealize function type is in fact able to emit an Error object. The item you mention is about propagating the Error object to the QMP caller.

However, devices that are still only qdev's don't even generate a detailed enough Error object. For them device_realize() simply wraps the traditional qdev_initfn member function called "init", which cannot produce detailed errors.

So I think we must first convert the pci-assign driver from qdev to QOM (it should have a genuine realize function which can emit good Error objects), and then we can think about propagating it.
Comment 15 Laszlo Ersek 2013-12-17 13:24:55 EST
OTOH I wonder if we could take a shortcut here, by simply calling qerror_report() rather than error_report() in the qdev init function in question, which is assigned_initfn() in "hw/i386/kvm/pci-assign".

qerror_report() branches on monitor_cur_is_qmp(). If the monitor is QMP, then monitor_set_error() is called (which I assume will reach the caller). Otherwise qerror_report() calls qerror_print(), which is a fallback to error_report() -- ie the current behavior.
Comment 18 Markus Armbruster 2013-12-17 16:27:55 EST
Re comment#14: You're correct, device_add propagating Error is just
part of the problem.  The other part is everything below it.  The big
incomplete piece is realization.  For most devices, it's still a
wrapper around old qdev's init() method, which reports errors via
error_report() and returns success / failure.  All the wrapper can do
on failure is make up a generic error.

Upstream has been working on converting init() to realize.  However,
if I count correctly, some 140 out of 180 device models in current
upstream's qemu-system-x86_64 still have init().
Comment 19 Markus Armbruster 2013-12-18 03:27:58 EST
Re comment#15: I don't remember qerror_report()'s exact behavior.  A
quick glance at the code suggests you're right: it sets cur_mon->error
when cur_mon is QMP, and monitor_protocol_emitter() picks it up turns
it into an error response.

Use of qerror_report() in hw/ is rare: there are 14 calls of
qerror_report() and qerror_report_err(), but some 100 calls of
error_set() and almost 400 calls of error_report().  Nevertheless,
there's precedence: serial_pci_init() is the qdev init() method of
"pci-serial", and it fails like this:

    serial_realize_core(s, &err);
    if (err != NULL) {
        qerror_report_err(err);
        error_free(err);
        return -1;
    }

Serial devices have been partially converted to realize() methods, but
this one hasn't.
Comment 22 Laszlo Ersek 2013-12-24 11:56:18 EST
This BZ could depend on (or be affected by) work done for bug 1044815, so setting dependency for later evaluation.
Comment 23 Alex Williamson 2013-12-24 13:01:26 EST
(In reply to Laszlo Ersek from comment #22)
> This BZ could depend on (or be affected by) work done for bug 1044815, so
> setting dependency for later evaluation.

bug 1044815 is only one potential error path, I don't see why it would block this bug
Comment 24 Laszlo Ersek 2013-12-25 05:35:37 EST
If you reorganize error paths / error handling in "hw/i386/kvm/pci-assign.c", then any conversion there to qerror_report() should happen either before or after, not in parallel. That's all I had in mind. I got the idea that both BZs could affect the same code parts from your status report, but I haven't looked into it; the above was just a note to self. It could turn out that the BZs are completely unrelated. Maybe the BZ-level dependency is a stretch, please feel free to undo it. Thanks.
Comment 25 Laszlo Ersek 2014-01-13 15:18:46 EST
http://mid.gmane.org/52B0A6D3.4030408@redhat.com
Comment 27 Laszlo Ersek 2014-04-10 04:29:47 EDT
posted upstream patchset:
http://thread.gmane.org/gmane.comp.emulators.qemu/266388
Comment 32 Cole Robinson 2014-07-29 15:13:52 EDT
Laszlo's patches are upstream, but as mentioned in the private comments, our focus for RHEL7 is on VFIO device assignment, which needs similar work to propagate Error objects for new style error reporting. Since that work isn't done yet, deferring this to 7.2
Comment 35 Cole Robinson 2016-05-03 17:06:12 EDT
I'm not really keyed into the RHEL process, so resetting assignee
Comment 36 Markus Armbruster 2016-07-27 04:15:22 EDT
vfio-pci needs to be converted to realize() upstream.  Such conversions usually aren't hard, just tedious.  It's too late for upstream 2.7, though.
Comment 37 Eric Auger 2016-08-21 10:04:20 EDT
Currently trying to test the fix on ARM Cavium ThunderX. I updated the QMD command to:

{"execute":"device_add","arguments":{"driver":"vfio-pci","host":"0005:90:00.0","id":"net0"}}

But I get the "Bus 'pcie.0' does not support hotplugging" returned error. Investigating further ...
Comment 38 Eric Auger 2016-09-06 03:24:41 EDT
I Posted "[PATCH 0/3] Convert VFIO-PCI to realize". This was tested on ARM Seattle with I350 NIC using qemu-system-aarch64 direct command line with virt machine and additional pci_bridge:

-device pci-bridge,addr=12.0,chassis_nr=2,id=head.2

On QMP shell:
device_add driver=vfio-pci host=0000:01:10.4 bus=head.2 addr=3

Note You need to log in before you can comment on or make changes to this bug.