Bug 689002 - guest with assigned nic got kernel panic when send system_reset signal in QEMU monitor
Summary: guest with assigned nic got kernel panic when send system_reset signal in QEM...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Alex Williamson
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On: 685147
Blocks:
Reported: 2011-03-18 18:51 UTC by Alex Williamson
Modified: 2013-01-09 23:40 UTC (History)

Fixed In Version: libvirt-0.8.7-14.el6
Doc Type: Bug Fix
Doc Text:
Clone Of: 685147
Environment:
Last Closed: 2011-05-19 13:29:21 UTC
Target Upstream Version:
Embargoed:



Links
System: Red Hat Product Errata
ID: RHBA-2011:0596
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: libvirt bug fix and enhancement update
Last Updated: 2011-05-18 17:56:36 UTC

Comment 2 Eric Blake 2011-03-18 19:36:28 UTC
If I'm reading this bz correctly, then the libvirt piece of this patch is already upstream and just needs to be backported:


commit 2090b0f52d8270c38c6157b6f8fdd00fa265c213
Author: Alex Williamson <alex.williamson>
Date:   Thu Mar 17 14:26:36 2011 -0600

    Add PCI sysfs reset access
    
    I'm proposing we make use of $PCIDIR/reset in qemu-kvm to reset
    devices on VM reset.  We need to add it to libvirt's list of
    files that get ownership for device assignment.
    
    Signed-off-by: Alex Williamson <alex.williamson>

Comment 3 Alex Williamson 2011-03-18 20:01:05 UTC
(In reply to comment #2)
> If I'm reading this bz correctly, then the libvirt piece of this patch is
> already upstream and just needs to be backported:

Yes, exactly.

Comment 6 Alex Jia 2011-03-25 14:43:17 UTC
I can reproduce the bug in the above test environment
(qemu-kvm-0.12.1.2-2.150.el6.x86_64; the NIC is a BCM5709 with MSI
capability).

The bug has been verified on RHEL 6.1 (2.6.32-122.el6.x86_64) with
qemu-kvm-0.12.1.2-2.152.el6.x86_64.

From libvirt's point of view, I cannot reset the device with the virsh
command:
# virsh nodedev-dettach pci_0000_01_00_1
Device pci_0000_01_00_1 dettached

# ls /sys/bus/pci/drivers/bnx2/
0000:02:00.0  0000:02:00.1  bind  module  new_id  remove_id  uevent  unbind

# ls /sys/bus/pci/drivers/pci-stub/
0000:01:00.0  0000:01:00.1  0000:09:00.0  bind  new_id  remove_id  uevent  unbind

# virsh nodedev-reset pci_0000_01_00_1
error: Failed to reset device pci_0000_01_00_1
error: this function is not supported by the connection driver: Unable to reset
PCI device 0000:01:00.1: this function is not supported by the connection
driver: Active 0000:01:00.0 devices on bus with 0000:01:00.1, not doing bus
reset

Hot-plugging or cold-plugging the device into the guest with virt-manager also
fails, and the same error is raised:

Error starting domain: this function is not supported by the connection driver:
Unable to reset PCI device 0000:01:00.1: this function is not supported by the
connection driver: Active 0000:01:00.0 devices on bus with 0000:01:00.1, not
doing bus reset

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 45, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/engine.py", line 956, in asyncfunc
    vm.startup()
  File "/usr/share/virt-manager/virtManager/domain.py", line 1048, in startup
    self._backend.create()
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 325, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirtError: this function is not supported by the connection driver: Unable
to reset PCI device 0000:01:00.1: this function is not supported by the
connection driver: Active 0000:01:00.0 devices on bus with 0000:01:00.1, not
doing bus reset


Because the NIC cannot be assigned to the guest, I cannot use virsh qemu-monitor-command to send the system_reset command to the guest.


# uname -r
2.6.32-122.el6.x86_64

# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.152.el6.x86_64

# rpm -q libvirt
libvirt-0.8.7-14.el6.x86_64

# rpm -q virt-manager
virt-manager-0.8.6-3.el6.noarch

# rpm -q python-virtinst
python-virtinst-0.500.5-2.el6.noarch

Comment 7 Alex Jia 2011-03-25 14:50:12 UTC
I need to clarify the following statement from my previous comment:

"Because the NIC cannot be assigned to the guest, I cannot use
virsh qemu-monitor-command to send the system_reset command to the guest."


I mean that I have not been able to recreate the earlier test environment. I can still use virsh qemu-monitor-command to send the system_reset command to the guest, but doing so is meaningless without the NIC assigned.


Alex

Comment 8 Alex Williamson 2011-03-25 15:32:16 UTC
Hi Alex,

Sorry, I'm still confused why this is going back to ON_DEV.  Apologies if libvirt has a different bug life cycle that I'm not familiar with.

With respect to nodedev-reset, this patch makes no changes to the behavior of that interface.  For hot-plug/cold-plug failing, is this a regression caused by this change, or is this a pre-existing condition?  The changes for this patch should only affect the permissions of an extra PCI sysfs file for the device and should not change whether or not a device can be assigned.

To verify the libvirt side of things, I think it would be sufficient to check the permissions of the file /sys/bus/pci/devices/ssss:bb:dd.f/reset before and after the patch is applied.  Before, the file should be owned by root, after by the qemu user.  If using the corresponding qemu-kvm from bz685147, the reset should be triggered any time the guest reboots, or if a reset is triggered via the virsh qemu-monitor-command --hmp system_reset.

Please clarify what you're seeing and whether you expect any further fixes from ON_DEV.  Thanks,

Alex
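
The ownership check suggested in this comment can be scripted. The following is a minimal sketch, assuming GNU coreutils stat is available. The helper name reset_owner and its optional sysfs-root argument are hypothetical and exist only so that the function can be tried against a scratch directory; neither is part of libvirt or qemu-kvm.

```shell
#!/bin/sh
# Print the owning user of a PCI device's sysfs "reset" file.
# Usage: reset_owner <dddd:bb:ss.f> [sysfs_root]
# sysfs_root defaults to /sys; overriding it is an assumption of this
# sketch, made so the helper can be tested without real hardware.
reset_owner() {
    addr=$1
    root=${2:-/sys}
    file="$root/bus/pci/devices/$addr/reset"
    if [ ! -e "$file" ]; then
        # Some devices do not expose a reset file at all.
        echo "no reset file for $addr" >&2
        return 1
    fi
    # GNU stat: %U prints the owner's user name.
    stat -c %U "$file"
}
```

Calling reset_owner with the device address before and after the change lets the two owners be compared directly.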

Comment 14 Alex Williamson 2011-03-30 03:27:47 UTC
(In reply to comment #6) 
> # virsh nodedev-reset pci_0000_01_00_1
> error: Failed to reset device pci_0000_01_00_1
> error: this function is not supported by the connection driver: Unable to reset
> PCI device 0000:01:00.1: this function is not supported by the connection
> driver: Active 0000:01:00.0 devices on bus with 0000:01:00.1, not doing bus
> reset
> 
> Hot-plugging or cold-plugging the device into the guest with virt-manager also
> fails, and the same error is raised:
> 
> Error starting domain: this function is not supported by the connection driver:
> Unable to reset PCI device 0000:01:00.1: this function is not supported by the
> connection driver: Active 0000:01:00.0 devices on bus with 0000:01:00.1, not
> doing bus reset

I believe these are telling you that the device you're trying to assign does not support any method of reset other than a secondary bus reset of the parent PCI bridge, but that option is unavailable because it's a multi-function device.  This is not a bug, unless you want to file one for the clarity of the error message.  I would suggest testing with a device known to work with assignment, such as an 82576, or even most e1000 variants.
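
To see which other devices share a bus with the one being assigned, the sysfs device directory can be listed. This is a rough sketch only: it lists neighbours by address and does not reproduce libvirt's own check, which also considers whether those devices are active. The helper name bus_siblings and the optional sysfs-root argument are hypothetical.

```shell
#!/bin/sh
# List the other PCI devices on the same domain:bus as the given device.
# Usage: bus_siblings <dddd:bb:ss.f> [sysfs_root]
bus_siblings() {
    addr=$1
    root=${2:-/sys}
    prefix=${addr%:*}                 # "0000:01:00.1" -> "0000:01"
    for dev in "$root/bus/pci/devices/$prefix":*; do
        [ -e "$dev" ] || continue     # the glob matched nothing
        name=${dev##*/}               # keep only the address part
        [ "$name" = "$addr" ] || echo "$name"
    done
}
```

For the device in the quoted error, bus_siblings 0000:01:00.1 would be expected to include 0000:01:00.0, the function named in the message.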

Comment 16 Alex Jia 2011-03-30 09:29:12 UTC
(In reply to comment #14)
> (In reply to comment #6) 
> > # virsh nodedev-reset pci_0000_01_00_1
> > error: Failed to reset device pci_0000_01_00_1
> > error: this function is not supported by the connection driver: Unable to reset
> > PCI device 0000:01:00.1: this function is not supported by the connection
> > driver: Active 0000:01:00.0 devices on bus with 0000:01:00.1, not doing bus
> > reset
> > 
> > Hot-plugging or cold-plugging the device into the guest with virt-manager also
> > fails, and the same error is raised:
> > 
> > Error starting domain: this function is not supported by the connection driver:
> > Unable to reset PCI device 0000:01:00.1: this function is not supported by the
> > connection driver: Active 0000:01:00.0 devices on bus with 0000:01:00.1, not
> > doing bus reset
> 
> I believe these are telling you that the device you're trying to assign does
> not support any method of reset other than a secondary bus reset of the parent
> PCI bridge, but that option is unavailable because it's a multi-function
> device.  This is not a bug, unless you want to file one for the clarity of the
> error message.  I would suggest testing with a device known to work with
> assignment, such as an 82576, or even most e1000 variants.

Hi Alex,
As you said, the BCM5709 NIC is a multi-function device, and it has no reset file under /sys/bus/pci/devices/ssss:bb:dd.f/. The Intel 82576 NIC is okay, so I can only use the Intel 82576 to verify the bug again.

Thanks,
Alex Jia

Comment 17 Alex Jia 2011-03-30 11:01:10 UTC
The bug has been verified on rhel6.1(2.6.32-122.el6.x86_64) with qemu-kvm-0.12.1.2-2.152.el6.x86_64 and libvirt-0.8.7-14.el6.x86_64.

# lspci |grep 82576
09:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
09:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)

# virsh nodedev-dumpxml pci_0000_09_10_1
<device>
  <name>pci_0000_09_10_1</name>
  <parent>pci_0000_00_09_0</parent>
  <capability type='pci'>
    <domain>0</domain>
    <bus>9</bus>
    <slot>16</slot>
    <function>1</function>
    <product id='0x10ca'>82576 Virtual Function</product>
    <vendor id='0x8086'>Intel Corporation</vendor>
    <capability type='phys_function'>
      <address domain='0x0000' bus='0x09' slot='0x00' function='0x1'/>
    </capability>
  </capability>
</device>
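
Note that the XML above reports bus and slot in decimal (9 and 16), while the node device name and the sysfs path use hexadecimal (09 and 10). A small sketch of the conversion follows; the helper names pci_addr and nodedev_name are hypothetical.

```shell
#!/bin/sh
# Format decimal domain, bus, slot and function values as they appear in
# a sysfs PCI address and in a libvirt node device name.
# Pass plain decimal numbers (9, not 09): printf would read a leading
# zero as an octal prefix.
pci_addr() {
    printf '%04x:%02x:%02x.%x\n' "$1" "$2" "$3" "$4"
}

nodedev_name() {
    printf 'pci_%04x_%02x_%02x_%x\n' "$1" "$2" "$3" "$4"
}
```

With the values from the XML, pci_addr 0 9 16 1 gives 0000:09:10.1 and nodedev_name 0 9 16 1 gives pci_0000_09_10_1.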


Add the following XML to the guest XML configuration, or run virsh attach-device VM pf.xml while the guest is running:

# cat pf.xml 
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address bus='0x09' slot='0x0' function='0x1'/>
  </source>
</hostdev>
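
A fragment like pf.xml can also be generated from a sysfs-style address. The sketch below assumes the address has the form dddd:bb:ss.f; the helper name hostdev_xml is hypothetical. Unlike the hand-written file above, it also emits the domain attribute and zero-pads the slot, which should describe the same device.

```shell
#!/bin/sh
# Print a libvirt <hostdev> fragment for a PCI address given as dddd:bb:ss.f.
hostdev_xml() {
    addr=$1
    domain=${addr%%:*}      # "0000"
    rest=${addr#*:}         # "09:00.1"
    bus=${rest%%:*}         # "09"
    slotfn=${rest#*:}       # "00.1"
    slot=${slotfn%.*}       # "00"
    fn=${slotfn#*.}         # "1"
    cat <<EOF
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x$domain' bus='0x$bus' slot='0x$slot' function='0x$fn'/>
  </source>
</hostdev>
EOF
}
```

For example, hostdev_xml 0000:09:00.1 > pf.xml writes a file that could then be passed to virsh attach-device.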

# virsh edit vr-rhel6u1-x86_64-kvm
Domain vr-rhel6u1-x86_64-kvm XML configuration edited.

# ll /sys/bus/pci/devices/0000\:09\:00.1/reset 
--w-------. 1 root root 4096 Mar 31 01:55 /sys/bus/pci/devices/0000:09:00.1/reset
# virsh start vr-rhel6u1-x86_64-kvm
Domain vr-rhel6u1-x86_64-kvm started

# ll /sys/bus/pci/devices/0000\:09\:00.1/reset 
--w-------. 1 qemu qemu 4096 Mar 31 01:55 /sys/bus/pci/devices/0000:09:00.1/reset

The ownership does change from root to qemu once the NIC is assigned to the guest, which seems sufficient according to your advice. If so, I will change the bug status to VERIFIED. Otherwise, do I need to run virsh qemu-monitor-command and then check the ownership of /sys/bus/pci/devices/0000\:09\:00.1/reset again?

In addition, some messages are raised when I run the following virsh command; of course, this may be a separate issue:
# virsh qemu-monitor-command vr-rhel6u1-x86_64-kvm --hmp system_reset

#
Message from syslogd@amd-6168-16-1 at Mar 31 02:29:20 ...
 kernel:Uhhuh. NMI received for unknown reason 21 on CPU 0.

Message from syslogd@amd-6168-16-1 at Mar 31 02:29:20 ...
 kernel:Do you have a strange power saving mode enabled?

Message from syslogd@amd-6168-16-1 at Mar 31 02:29:20 ...
 kernel:Dazed and confused, but trying to continue


Alex Jia

Comment 18 Alex Jia 2011-04-14 05:48:47 UTC
Hi Alex, 
There are open questions in Comment 17. I'm not sure whether the result is sufficient for this bug, so we need your help to confirm.


Thanks,
Alex Jia

Comment 19 Alex Williamson 2011-04-14 15:06:00 UTC
(In reply to comment #17)
> The ownership does change from root to qemu once the NIC is assigned to the
> guest, which seems sufficient according to your advice. If so, I will change
> the bug status to VERIFIED. Otherwise, do I need to run virsh
> qemu-monitor-command and then check the ownership of
> /sys/bus/pci/devices/0000\:09\:00.1/reset again?

The file permissions aren't going to be changed by qemu.  I think seeing that it's now owned by qemu is sufficient.  bz685147 is the qemu side of the patch that has already been verified that when qemu does have access to the reset file, it does what it's supposed to do.  This bz is primarily around setting up those permissions.

> In addition, some messages are raised when I run the following virsh command;
> of course, this may be a separate issue:
> # virsh qemu-monitor-command vr-rhel6u1-x86_64-kvm --hmp system_reset
> 
> #
> Message from syslogd@amd-6168-16-1 at Mar 31 02:29:20 ...
>  kernel:Uhhuh. NMI received for unknown reason 21 on CPU 0.
> 
> Message from syslogd@amd-6168-16-1 at Mar 31 02:29:20 ...
>  kernel:Do you have a strange power saving mode enabled?
> 
> Message from syslogd@amd-6168-16-1 at Mar 31 02:29:20 ...
>  kernel:Dazed and confused, but trying to continue

Did the device continue to function correctly after the guest was reset?  I'm guessing this is an HP test system.  ISTR something with the hpwdt driver.  Do you still get these messages if you unload/blacklist the hpwdt module?
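
Whether a module such as hpwdt is currently loaded can be checked from /proc/modules. This is a minimal sketch; the helper name module_loaded and its optional file argument are hypothetical, the latter added only so the function can be tried against a sample file.

```shell
#!/bin/sh
# Succeed (exit status 0) if the named kernel module is listed as loaded.
# Usage: module_loaded <name> [modules_file]
module_loaded() {
    mod=$1
    list=${2:-/proc/modules}
    # /proc/modules has one module per line with the name in the first field.
    awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$list"
}
```

If module_loaded hpwdt succeeds, the module could be unloaded as root with modprobe -r hpwdt, or kept from loading with a "blacklist hpwdt" line in a file under /etc/modprobe.d/, before repeating the reset test.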

Comment 20 Alex Jia 2011-04-15 02:44:10 UTC
> The file permissions aren't going to be changed by qemu.  I think seeing that
> it's now owned by qemu is sufficient.  bz685147 is the qemu side of the patch
> that has already been verified that when qemu does have access to the reset
> file, it does what it's supposed to do.  This bz is primarily around setting
> up those permissions.

Based on the above comment, I am setting the bug status to VERIFIED.


> Did the device continue to function correctly after the guest was reset?  I'm
> guessing this is an HP test system.  ISTR something with the hpwdt driver.  Do
> you still get these messages if you unload/blacklist the hpwdt module?

Hi Alex,
It is a Dell machine, not HP, and I no longer have access to it. If I get it again, I will try your suggestion.


Thanks,
Alex Jia

Comment 24 errata-xmlrpc 2011-05-19 13:29:21 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0596.html

