Bug 520572

Summary: SR-IOV -- Guest exits and host hangs when booting a VM with 8 VFs assigned
Product: Red Hat Enterprise Linux 5
Reporter: Yolkfull Chow <yzhou>
Component: kvm
Assignee: Don Dutile (Red Hat) <ddutile>
Status: CLOSED ERRATA
QA Contact: Lawrence Lim <llim>
Severity: medium
Docs Contact:
Priority: high
Version: 5.4
CC: cpelland, ehabkost, juzhang, ndai, qzhang, tburke, tools-bugs, virt-maint, ykaul
Target Milestone: rc
Keywords: ZStream
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: kvm-83-165.el5
Doc Type: Bug Fix
Last Closed: 2011-01-13 23:11:46 UTC
Type: ---
Bug Blocks: 579862, 579863    

Description Yolkfull Chow 2009-09-01 07:44:06 UTC
Description of problem:
Booting a guest with 8 VFs assigned reliably hangs the host, and the guest process exits as well. The following message appears on the host:

assigned_dev_enable_msix: assign irq: Bad address


Version-Release number of selected component (if applicable):
kvm-83-105

How reproducible:
Every time

Steps to Reproduce:
1. Set up SR-IOV: pass max_vfs=7 when loading igb with modprobe
2. Bind 8 VFs to the pci-stub driver
3. Boot a guest with these 8 VFs as pass-through devices
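
Step 2 can be sketched in shell as below. This is a minimal sketch rather than the procedure the reporter actually used: it assumes the kernel's usual sysfs driver interface (new_id, unbind, bind) and a PCI domain of 0000. bind_to_stub and the SYSFS override are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Sketch only: hand a PCI function to pci-stub via sysfs. Needs root on a real host.
# SYSFS is an assumption of this sketch; it can be overridden so the function
# can be tried against a scratch directory instead of the live /sys tree.
SYSFS="${SYSFS:-/sys}"

bind_to_stub() {
    dev="$1"                                   # full address, e.g. 0000:42:10.0
    devdir="$SYSFS/bus/pci/devices/$dev"
    stub="$SYSFS/bus/pci/drivers/pci-stub"

    # Take the IDs from sysfs instead of hard-coding them; drop the 0x prefix.
    vendor=$(sed 's/^0x//' "$devdir/vendor")
    device=$(sed 's/^0x//' "$devdir/device")

    # Register the vendor/device pair with pci-stub. A repeated registration
    # may be rejected by some kernels, which does not matter here.
    echo "$vendor $device" > "$stub/new_id" 2>/dev/null || true

    # Detach the function from its current driver, if it has one.
    if [ -e "$devdir/driver/unbind" ]; then
        echo "$dev" > "$devdir/driver/unbind"
    fi

    # Attach to pci-stub unless registering the ID already did so.
    if [ ! -e "$stub/$dev" ]; then
        echo "$dev" > "$stub/bind"
    fi
}

# Hypothetical usage for eight VFs at 42:10.0 through 42:10.7 (domain 0000 assumed):
# for fn in 0 1 2 3 4 5 6 7; do bind_to_stub "0000:42:10.$fn"; done
```

Reading the vendor and device IDs from sysfs avoids guessing the VF device ID for a particular NIC.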
  
Actual results:
Guest process exits and the host hangs

Expected results:
Guest boots up successfully

Additional info:
External bug reference: http://sourceforge.net/tracker/?func=detail&aid=2847560&group_id=180599&atid=893831

Comment 1 Yolkfull Chow 2009-09-03 08:46:38 UTC
Sometimes a Tx unit hang shows up in the host dmesg when booting a guest with 8 VFs:

...
kvm: exhaust allocatable IRQ sources!
kvm: exhaust allocatable IRQ sources!
NETDEV WATCHDOG: eth0: transmit timed out
igb 0000:28:00.0: Detected Tx Unit Hang
  Tx Queue             <0>
  TDH                  <a3>
  TDT                  <8d>
  next_to_use          <8d>
  next_to_clean        <a3>
buffer_info[next_to_clean]
  time_stamp           <10029ad8b>
  next_to_watch        <a3>
  jiffies              <10029e06b>
  desc.status          <a8000>
breth0: port 1(eth0) entering disabled state
igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
breth0: topology change detected, propagating
breth0: port 1(eth0) entering forwarding state
NETDEV WATCHDOG: eth0: transmit timed out
igb 0000:28:00.0: Detected Tx Unit Hang
  Tx Queue             <0>
  TDH                  <d7>
  TDT                  <c1>
  next_to_use          <c1>
  next_to_clean        <d7>
buffer_info[next_to_clean]
  time_stamp           <1002a263f>
  next_to_watch        <d7>
  jiffies              <1002a4e33>
  desc.status          <a8000>
breth0: port 1(eth0) entering disabled state
igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
breth0: topology change detected, propagating
breth0: port 1(eth0) entering forwarding state
...

Comment 3 Lawrence Lim 2010-02-24 01:53:33 UTC
This bug was reported a while ago. Could you please retest and find out whether the problem still exists?

Comment 4 Yolkfull Chow 2010-02-24 07:59:33 UTC
Just re-tested this problem on 83-159. The guest did not hang this time, but the following error messages still appear:

kvm: exhaust allocatable IRQ sources!
kvm: exhaust allocatable IRQ sources!

Comment 6 Don Dutile (Red Hat) 2010-03-04 21:10:58 UTC
Backported the patch "KVM: fix irq_source_id size verification".

You can pull the rpms from here:

http://people.redhat.com/~ddutile/rhel5/bz520572/

Please install these rpms and test, and let me know whether they fix the problem.

Comment 7 Don Dutile (Red Hat) 2010-03-04 21:12:34 UTC
Additional note: the test kvm rpms were built against the -191 kernel.

If you can pull & test with -191, that'd be optimal.
One of the later ones (past, say, -186) ought to do as well.

- Don

Comment 8 Yolkfull Chow 2010-03-09 03:14:58 UTC
Just tested the patch. It doesn't work, and things are even worse: the guest hangs while booting up:

# rpm -qa |grep kvm
kvm-qemu-img-83-161.el5bz520572v1
etherboot-zroms-kvm-5.4.4-13.el5
kvm-83-161.el5bz520572v1
kvm-debuginfo-83-161.el5bz520572v1
kvm-tools-83-161.el5bz520572v1
kmod-kvm-83-161.el5bz520572v1
[root@virtlab-66-84-58 ~]# uname -a
Linux virtlab-66-84-58.englab.nay.redhat.com 2.6.18-191.el5 #1 SMP Mon Mar 1 15:59:02 EST 2010 x86_64 x86_64 x86_64 GNU/Linux

Command:


#qemu-kvm -drive file=/tmp/kvm_autotest_root/images/RHEL-Server-5.5-32.qcow2,if=ide,boot=on -m 512 -smp 1 -vnc :0 -pcidevice host=42:10.0 -pcidevice host=42:10.1 -pcidevice host=42:10.2 -pcidevice host=42:10.3 -pcidevice host=42:10.4 -pcidevice host=42:10.5 -pcidevice host=42:10.6 -pcidevice host=42:10.7
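
The eight -pcidevice options in the command above differ only in the PCI function number, so they could be generated instead of typed out. A minimal sketch, assuming the VFs occupy consecutive functions of one bus:slot as they do here; pcidevice_args is a hypothetical helper, not part of qemu-kvm.

```shell
#!/bin/sh
# Sketch only: build "-pcidevice host=BUS:SLOT.FN" options for functions 0..COUNT-1.
pcidevice_args() {
    slot="$1"      # bus:slot prefix, e.g. 42:10
    count="$2"     # number of consecutive functions, starting at 0
    fn=0
    args=""
    while [ "$fn" -lt "$count" ]; do
        args="$args -pcidevice host=$slot.$fn"
        fn=$((fn + 1))
    done
    # Drop the leading space left by the first append.
    echo "${args# }"
}

# Hypothetical usage (left unquoted on purpose so the options split into words):
# qemu-kvm -m 512 -smp 1 -vnc :0 $(pcidevice_args 42:10 8)
```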

Comment 9 Don Dutile (Red Hat) 2010-03-09 15:11:07 UTC
Please attach the following logs for both failing cases:

(a) /var/log/messages (dmesg) (on host)
(b) /var/log/libvirt/qemu/<guest-name>.log (on host)

If possible, run libvirt with debug on:
export LIBVIRT_DEBUG=1
export LIBVIRT_LOG_OUTPUTS="1:file:<dir/filename>"

and post the libvirt output.

btw -- please try with virsh commands & xml for the guest,
       vs. the qemu-kvm cmdline directly, so libvirt can capture
       the qemu log.

Comment 10 Yolkfull Chow 2010-03-11 03:33:00 UTC
Hi Don,

Strangely, when I re-tested this problem with the patched RPMs, the guest worked fine and I could not find any extra error messages in dmesg except:

BUG: kvm_destroy_phys_mem: invalid parameters (slot=-1)


I also tested the unpatched 83-161 RPMs, where these error messages appear:

kvm: exhaust allocatable IRQ sources!

assigned_dev_enable_msix: assign irq: Bad address


So we can say the patch fixed the problem. I don't know what I did wrong last time...

Comment 11 Don Dutile (Red Hat) 2010-03-12 18:57:57 UTC
Maybe you forgot to reboot before retesting? Installing the rpm
won't affect the currently running kernel unless you rmmod the kvm modules,
then modprobe the new ones back in.
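
A module reload of that kind could look like the sketch below. It assumes the usual module names kvm, kvm_intel and kvm_amd, and that all guests are shut down first, since modules that are in use cannot be removed. kvm_vendor_module, reload_kvm and the CPUINFO override are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sketch only: reload the KVM modules after installing new kmod-kvm rpms.
# Run as root with all guests shut down.
# CPUINFO is an assumption of this sketch, overridable for dry runs.
CPUINFO="${CPUINFO:-/proc/cpuinfo}"

kvm_vendor_module() {
    # The vmx CPU flag marks Intel VT-x, svm marks AMD-V.
    if grep -qw vmx "$CPUINFO"; then
        echo kvm_intel
    elif grep -qw svm "$CPUINFO"; then
        echo kvm_amd
    else
        return 1
    fi
}

reload_kvm() {
    mod=$(kvm_vendor_module) || { echo "no vmx/svm CPU flag found" >&2; return 1; }
    # Remove the vendor module first, because it depends on kvm.
    modprobe -r "$mod" || return 1
    # Remove the core module only if it is still loaded.
    if lsmod | grep -q '^kvm '; then
        modprobe -r kvm || return 1
    fi
    # Loading the vendor module pulls kvm back in as a dependency.
    modprobe "$mod"
}

# Hypothetical usage:
# reload_kvm && lsmod | grep kvm
```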

Anyhow, glad to see the second test effort showed positive results.

I'll dig into the kvm_destroy_phys_mem message to see if it
truly indicates a possible bug or an unexpected code path for dev assignment.

Will post patch shortly, and ask if it should be a candidate for 5.5-z.

- Don

Comment 12 Yolkfull Chow 2010-03-15 02:08:36 UTC
No, I ran `modprobe -r` on the old kvm modules, loaded the new kvm modules, and checked that they were good before testing. Strange; I don't know what went wrong...

Anyway, let's ignore the first test results. :)

Comment 23 Qunfang Zhang 2010-04-08 09:54:04 UTC
While verifying this bug I hit a blocker: the host kernel panics when booted with "intel_iommu=on" on the kernel line.
See:
https://bugzilla.redhat.com/show_bug.cgi?id=580425

Comment 24 Qunfang Zhang 2010-04-12 09:38:23 UTC
Can reproduce the issue on kvm-83-164.el5:
1. The qemu monitor displays: "assigned_dev_enable_msix: assign irq: Bad address"
2. The dmesg of the guest contains: "kvm: exhaust allocatable IRQ sources!"

Re-tested with kvm-83-165.el5 and kvm-83-169.el5 on kernel 2.6.18-194.el5: the issue does not occur.

Command line:
/usr/libexec/qemu-kvm -no-hpet -usbdevice tablet -rtc-td-hack -smp 2 -m 2G -drive file=RHEL5.5-Server-64.qcow2,media=disk,if=ide,cache=off -net none -vnc :10 -monitor stdio -cpu qemu64,+sse2 -pcidevice host=03:10.0 -pcidevice host=03:10.1 -pcidevice host=03:10.2 -pcidevice host=03:10.3 -pcidevice host=03:10.4 -pcidevice host=03:10.5 -pcidevice host=03:10.6 -pcidevice host=03:10.7

Steps:
1. rmmod igb
   modprobe igb max_vfs=7
2. Bind 8 VFs to the pci-stub driver.
3. Boot a guest with the above command line.
4. Check the dmesg of guest and host, and also check whether there are error
   messages in the qemu monitor.
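
The check in step 4 can be partly scripted. A minimal sketch that filters a log for the three error strings quoted in this report; irq_error_lines is a hypothetical helper name, and the pattern covers only those strings, not every possible failure.

```shell
#!/bin/sh
# Sketch only: print log lines that match the error messages seen in this bug.
# With no file argument, grep reads standard input.
irq_error_lines() {
    grep -E 'exhaust allocatable IRQ sources|assign irq: Bad address|Detected Tx Unit Hang' "$@"
}

# Hypothetical usage on the host or in the guest:
# dmesg | irq_error_lines || echo "no known error messages found"
```

The function returns a non-zero status when nothing matches, so a clean log is the failing branch of the pipeline.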

Comment 26 juzhang 2010-11-03 06:06:51 UTC
Verified on kvm-83-206.el5:

/usr/libexec/qemu-kvm -no-hpet -usbdevice tablet -rtc-td-hack -m 4G -smp 2 -monitor stdio -drive file=/root/zhangjunyi/rhel5.6ide.raw,if=ide,boot=on,werror=stop,format=raw -net nic,vlan=0,macaddr=22:11:22:45:66:83,model=e1000 -net tap,vlan=0,script=/etc/qemu-ifup -uuid `uuidgen` -cpu qemu64,+sse2 -balloon none -boot c  -vnc :10 -notify all -boot c -pcidevice host=03:10.0 -pcidevice host=03:10.1 -pcidevice host=03:10.2 -pcidevice host=03:10.3 -pcidevice host=03:10.4 -pcidevice host=03:10.5 -pcidevice host=03:10.6 -pcidevice host=03:10.7
QEMU 0.9.1 monitor - type 'help' for more information
(qemu) BUG: kvm_destroy_phys_mem: invalid parameters (slot=-1)
BUG: kvm_destroy_phys_mem: invalid parameters (slot=-1)


The guest process and the host both work well, and the VFs work well in the guest.
However, lots of "BUG: kvm_destroy_phys_mem: invalid parameters (slot=-1)" messages are emitted in the qemu monitor. I have filed Bug 645322 - Emit message "BUG: kvm_destroy_phys_mem: invalid parameters (slot=-1)" when boot guest with vf/pf attached.

Comment 29 errata-xmlrpc 2011-01-13 23:11:46 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0028.html