Bug 1007712 - host call trace while changing the number of VFs through sysfs with VF in use
Summary: host call trace while changing the number of VFs through sysfs with VF in use
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.5
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: Alex Williamson
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-09-13 07:59 UTC by mazhang
Modified: 2016-09-20 04:39 UTC (History)

Fixed In Version: kernel-2.6.32-422.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-10-14 06:51:17 UTC
Target Upstream Version:




Links
System: Red Hat Product Errata
ID: RHBA-2014:1490
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: qemu-kvm bug fix and enhancement update
Last Updated: 2014-10-14 01:28:27 UTC

Description mazhang 2013-09-13 07:59:24 UTC
Description of problem:
Boot a guest with an assigned VF, rebind the parent PF, then change the number of VFs through sysfs. The host kernel outputs a call trace.


Version-Release number of selected component (if applicable):

host:
RHEL6.5-20130905.1
qemu-kvm-0.12.1.2-2.400.el6.x86_64
kernel-2.6.32-417.el6.x86_64
# lspci -v -s 06:00.0
06:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
	Subsystem: Intel Corporation Gigabit ET Dual Port Server Adapter
	Flags: bus master, fast devsel, latency 0, IRQ 38
	Memory at dd740000 (32-bit, non-prefetchable) [size=128K]
	Memory at dd800000 (32-bit, non-prefetchable) [size=4M]
	I/O ports at ecc0 [size=32]
	Memory at dd738000 (32-bit, non-prefetchable) [size=16K]
	Expansion ROM at dd000000 [disabled] [size=4M]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
	Capabilities: [70] MSI-X: Enable+ Count=10 Masked-
	Capabilities: [a0] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Device Serial Number 90-e2-ba-ff-ff-05-63-5e
	Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
	Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
	Kernel driver in use: igb
	Kernel modules: igb
# lspci -v -s 06:10.0
06:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
	Subsystem: Intel Corporation Device a03c
	Flags: fast devsel
	[virtual] Memory at dd400000 (64-bit, non-prefetchable) [size=16K]
	[virtual] Memory at dd420000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [70] MSI-X: Enable- Count=3 Masked-
	Capabilities: [a0] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
	Kernel driver in use: pci-stub
	Kernel modules: igbvf


guest:
RHEL6.5-20130905.1

How reproducible:
50%

Steps to Reproduce:
1. Bring up VFs through sysfs, then unbind one VF from its driver and bind it to pci-stub.
#echo 2 > /sys/bus/pci/devices/0000\:06\:00.0/sriov_numvfs
#echo "8086 10ca" >/sys/bus/pci/drivers/pci-stub/new_id
#echo 0000:06:10.0 >/sys/bus/pci/devices/0000\:06\:10.0/driver/unbind
#echo 0000:06:10.0 >/sys/bus/pci/drivers/pci-stub/bind

2. Boot the guest with this VF.
cli:
...
-device pci-assign,host=06:10.0,id=vf,romfile=/home/808610ca.rom \
...

3. Rebind its parent PF.
#echo "8086 10c9" >/sys/bus/pci/drivers/pci-stub/new_id 
#echo 0000:06:00.0 >/sys/bus/pci/devices/0000\:06\:00.0/driver/unbind 
#echo 0000:06:00.0 >/sys/bus/pci/drivers/pci-stub/bind
#echo "8086 10c9" >/sys/bus/pci/drivers/igb/new_id 
#echo 0000:06:00.0 >/sys/bus/pci/drivers/pci-stub/unbind 
#echo 0000:06:00.0 >/sys/bus/pci/drivers/igb/bind

4. Change the number of VFs through sysfs.
#echo 0 > /sys/bus/pci/devices/0000\:06\:00.0/sriov_numvfs
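
Step 4 removes the VFs while one of them is still assigned to the guest. As a minimal sketch (not part of the original report), a guard like the one below could list the VFs of a PF that are still bound to pci-stub before sriov_numvfs is written. The function name vfs_bound_to and its third argument (an alternative sysfs root, only useful for trying the function against a mock tree) are assumptions made for illustration; the virtfnN and driver symlinks are the usual sysfs layout. Being bound to pci-stub does not prove that a guest is using the VF, so treat the result as a hint only.

```shell
#!/bin/sh
# Print the PCI addresses of the VFs of a PF that are bound to a given driver.
#   $1  PF address, e.g. 0000:06:00.0
#   $2  driver name to look for (default: pci-stub)
#   $3  sysfs root (default: /sys); hypothetical parameter for mock testing
vfs_bound_to() {
    pf="$1"
    drv="${2:-pci-stub}"
    root="${3:-/sys}"
    # Each VF appears as a virtfnN symlink inside the PF's device directory.
    for link in "$root/bus/pci/devices/$pf"/virtfn*; do
        # Skip VFs without a driver (and the unexpanded glob when there are no VFs).
        [ -e "$link/driver" ] || continue
        name=$(basename "$(readlink "$link/driver")")
        if [ "$name" = "$drv" ]; then
            basename "$(readlink "$link")"
        fi
    done
}

# Example guard (assumption: run as root on the host from this report):
#   if [ -n "$(vfs_bound_to 0000:06:00.0)" ]; then
#       echo "VFs still bound to pci-stub; not changing sriov_numvfs" >&2
#   else
#       echo 0 > /sys/bus/pci/devices/0000:06:00.0/sriov_numvfs
#   fi
```

With the setup from steps 1 and 2, the function would be expected to print 0000:06:10.0 for the PF 0000:06:00.0.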

Actual results:
The host kernel logs a call trace:

kernel: ------------[ cut here ]------------
kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26b/0x280() (Not tainted)
kernel: Hardware name: PowerEdge R710
kernel: NETDEV WATCHDOG: p4p1 (igb): transmit queue 6 timed out
kernel: Modules linked in: igbvf ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables autofs4 bridge stp llc ipv6 vhost_net macvtap macvlan tun kvm_intel kvm power_meter microcode dcdbas serio_raw lpc_ich mfd_core i7core_edac edac_core be2net igb dca i2c_algo_bit i2c_core ptp pps_core ses enclosure sg bnx2x libcrc32c mdio bnx2 ext4 jbd2 mbcache sr_mod cdrom usb_storage sd_mod crc_t10dif pata_acpi ata_generic ata_piix megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
kernel: Pid: 0, comm: swapper Not tainted 2.6.32-417.el6.x86_64 #1
kernel: Call Trace:
kernel: <IRQ>  [<ffffffff81071f47>] ? warn_slowpath_common+0x87/0xc0
kernel: [<ffffffff81072036>] ? warn_slowpath_fmt+0x46/0x50
kernel: [<ffffffff8147b8ab>] ? dev_watchdog+0x26b/0x280
kernel: [<ffffffff8105df0e>] ? scheduler_tick+0x11e/0x260
kernel: [<ffffffff8147b640>] ? dev_watchdog+0x0/0x280
kernel: [<ffffffff81084c27>] ? run_timer_softirq+0x197/0x340
kernel: [<ffffffff810aca95>] ? tick_dev_program_event+0x65/0xc0
kernel: [<ffffffff8107aa01>] ? __do_softirq+0xc1/0x1e0
kernel: [<ffffffff810acb6a>] ? tick_program_event+0x2a/0x30
kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
kernel: [<ffffffff8100fa75>] ? do_softirq+0x65/0xa0
kernel: [<ffffffff8107a8b5>] ? irq_exit+0x85/0x90
kernel: [<ffffffff8153129a>] ? smp_apic_timer_interrupt+0x4a/0x60
kernel: [<ffffffff8100bb93>] ? apic_timer_interrupt+0x13/0x20
kernel: <EOI>  [<ffffffff812e0d0e>] ? intel_idle+0xde/0x170
kernel: [<ffffffff812e0cf1>] ? intel_idle+0xc1/0x170
kernel: [<ffffffff814269c7>] ? cpuidle_idle_call+0xa7/0x140
kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
kernel: [<ffffffff81520fe7>] ? start_secondary+0x2ac/0x2ef
kernel: ---[ end trace 26d94eabc924252b ]---
kernel: igb 0000:06:00.0: p4p1: Reset adapter
kernel: igb: p4p1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
dhclient[2987]: DHCPDISCOVER on p4p1 to 255.255.255.255 port 67 interval 19 (xid=0x178e9a46)
kernel: igb 0000:06:00.0: Detected Tx Unit Hang
kernel:  Tx Queue             <6>
kernel:  TDH                  <0>
kernel:  TDT                  <1>
kernel:  next_to_use          <1>
kernel:  next_to_clean        <0>
kernel: buffer_info[next_to_clean]
kernel:  time_stamp           <1000a5342>
kernel:  next_to_watch        <ffff88012a8a8000>
kernel:  jiffies              <1000a5733>
kernel:  desc.status          <558000>
kernel: igb 0000:06:00.0: Detected Tx Unit Hang
kernel:  Tx Queue             <6>
kernel:  TDH                  <0>
kernel:  TDT                  <1>
kernel:  next_to_use          <1>
kernel:  next_to_clean        <0>
kernel: buffer_info[next_to_clean]
kernel:  time_stamp           <1000a5342>
kernel:  next_to_watch        <ffff88012a8a8000>
kernel:  jiffies              <1000a5b1b>
kernel:  desc.status          <558000>
kernel: igb 0000:06:00.0: Detected Tx Unit Hang
kernel:  Tx Queue             <6>
kernel:  TDH                  <0>
kernel:  TDT                  <1>
kernel:  next_to_use          <1>
kernel:  next_to_clean        <0>
kernel: buffer_info[next_to_clean]
kernel:  time_stamp           <1000a5342>
kernel:  next_to_watch        <ffff88012a8a8000>
kernel:  jiffies              <1000a5f03>
kernel:  desc.status          <558000>
kernel: igb 0000:06:00.0: Detected Tx Unit Hang
kernel:  Tx Queue             <6>
kernel:  TDH                  <0>
kernel:  TDT                  <1>
kernel:  next_to_use          <1>
kernel:  next_to_clean        <0>
kernel: buffer_info[next_to_clean]
kernel:  time_stamp           <1000a5342>
kernel:  next_to_watch        <ffff88012a8a8000>
kernel:  jiffies              <1000a62eb>
kernel:  desc.status          <558000>
kernel: igb 0000:06:00.0: Detected Tx Unit Hang
kernel:  Tx Queue             <6>
kernel:  TDH                  <0>
kernel:  TDT                  <1>
kernel:  next_to_use          <1>
kernel:  next_to_clean        <0>
kernel: buffer_info[next_to_clean]
kernel:  time_stamp           <1000a5342>
kernel:  next_to_watch        <ffff88012a8a8000>
kernel:  jiffies              <1000a66d3>
kernel:  desc.status          <558000>
kernel: igb 0000:06:00.0: p4p1: Reset adapter
kernel: igb: p4p1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
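
In every "Detected Tx Unit Hang" report above, time_stamp and next_to_clean stay the same while jiffies keeps advancing, which suggests that the same descriptor on queue 6 never completes. A small sketch of that arithmetic follows; the helper name hang_age_jiffies is an assumption for illustration, and the conversion to seconds depends on the kernel's HZ value, which the report does not state.

```shell
#!/bin/sh
# Age of the stuck descriptor in jiffies: jiffies - time_stamp.
# Both values are copied from the log as hex strings without the 0x prefix.
hang_age_jiffies() {
    echo $(( 0x$2 - 0x$1 ))
}

# First and last hang reports from the log above.
hang_age_jiffies 1000a5342 1000a5733   # prints 1009
hang_age_jiffies 1000a5342 1000a66d3   # prints 5009
```

If the kernel were built with HZ=1000, these values would correspond to roughly 1 and 5 seconds between queuing the descriptor and the report, after which the adapter is reset.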


Expected results:
no call trace

Additional info:

Comment 3 Alex Williamson 2014-06-25 20:52:35 UTC
This seems to be fixed by bug 985733

Comment 4 mazhang 2014-06-26 06:15:17 UTC
Tested this scenario on kernel-2.6.32-486.el6.x86_64 with an X540-AT2 NIC; did not hit the problem.

[root@dell-per720-02 ~]# lspci -v -s 01:00.0
01:00.0 Ethernet controller: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01)
	Subsystem: Dell Ethernet 10G 4P X540/I350 rNDC
	Flags: bus master, fast devsel, latency 0, IRQ 109
	Memory at d5000000 (64-bit, prefetchable) [size=2M]
	Memory at d55f8000 (64-bit, prefetchable) [size=16K]
	Expansion ROM at d8000000 [disabled] [size=512K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable+ Count=1/1 Maskable+ 64bit+
	Capabilities: [70] MSI-X: Enable- Count=64 Masked-
	Capabilities: [a0] Express Endpoint, MSI 00
	Capabilities: [e0] Vital Product Data
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
	Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
	Capabilities: [1d0] Access Control Services
	Kernel driver in use: ixgbe
	Kernel modules: ixgbe

Comment 5 Ademar Reis 2014-06-26 13:46:17 UTC
(In reply to Alex Williamson from comment #3)
> This seems to be fixed by bug 985733

Marking this one TestOnly.

Comment 6 mazhang 2014-07-01 06:23:46 UTC
Tested this bug on kernel-2.6.32-488.el6.x86_64 with an 82576 NIC; the host works well.

[root@amd-6168-256-1 pci-stub]# lspci -v -s 23:00.0
23:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
	Subsystem: Intel Corporation Gigabit ET Dual Port Server Adapter
	Flags: bus master, fast devsel, latency 0, IRQ 64
	Memory at e53a0000 (32-bit, non-prefetchable) [size=128K]
	Memory at e4400000 (32-bit, non-prefetchable) [size=4M]
	I/O ports at ccc0 [size=32]
	Memory at e53f8000 (32-bit, non-prefetchable) [size=16K]
	Expansion ROM at e4c00000 [disabled] [size=4M]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
	Capabilities: [70] MSI-X: Enable+ Count=10 Masked-
	Capabilities: [a0] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Device Serial Number 90-e2-ba-ff-ff-05-63-5e
	Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
	Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
	Kernel driver in use: igb
	Kernel modules: igb

[root@amd-6168-256-1 pci-stub]# lspci -v -s 23:10.0
23:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
	Subsystem: Intel Corporation Device a03c
	Flags: bus master, fast devsel, latency 0
	[virtual] Memory at e5000000 (64-bit, non-prefetchable) [size=16K]
	[virtual] Memory at e5020000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [70] MSI-X: Enable+ Count=3 Masked-
	Capabilities: [a0] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
	Kernel driver in use: pci-stub
	Kernel modules: igbvf

So this bug has been fixed.

Comment 7 errata-xmlrpc 2014-10-14 06:51:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1490.html

