Bug 1007712 - Host call trace while changing the number of VFs through sysfs with a VF in use
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.5
Hardware: Unspecified OS: Unspecified
Priority: low Severity: low
Target Milestone: rc
Target Release: ---
Assigned To: Alex Williamson
QA Contact: Virtualization Bugs
Keywords: TestOnly
Depends On:
Blocks:
Reported: 2013-09-13 03:59 EDT by mazhang
Modified: 2016-09-20 00:39 EDT

See Also:
Fixed In Version: kernel-2.6.32-422.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-10-14 02:51:17 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2014:1490 normal SHIPPED_LIVE qemu-kvm bug fix and enhancement update 2014-10-13 21:28:27 EDT

Description mazhang 2013-09-13 03:59:24 EDT
Description of problem:
Boot a guest with an assigned VF, rebind the parent PF, then change the number of VFs through sysfs. The host kernel outputs a call trace.


Version-Release number of selected component (if applicable):

host:
RHEL6.5-20130905.1
qemu-kvm-0.12.1.2-2.400.el6.x86_64
kernel-2.6.32-417.el6.x86_64
# lspci -v -s 06:00.0
06:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
	Subsystem: Intel Corporation Gigabit ET Dual Port Server Adapter
	Flags: bus master, fast devsel, latency 0, IRQ 38
	Memory at dd740000 (32-bit, non-prefetchable) [size=128K]
	Memory at dd800000 (32-bit, non-prefetchable) [size=4M]
	I/O ports at ecc0 [size=32]
	Memory at dd738000 (32-bit, non-prefetchable) [size=16K]
	Expansion ROM at dd000000 [disabled] [size=4M]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
	Capabilities: [70] MSI-X: Enable+ Count=10 Masked-
	Capabilities: [a0] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Device Serial Number 90-e2-ba-ff-ff-05-63-5e
	Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
	Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
	Kernel driver in use: igb
	Kernel modules: igb
# lspci -v -s 06:10.0
06:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
	Subsystem: Intel Corporation Device a03c
	Flags: fast devsel
	[virtual] Memory at dd400000 (64-bit, non-prefetchable) [size=16K]
	[virtual] Memory at dd420000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [70] MSI-X: Enable- Count=3 Masked-
	Capabilities: [a0] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
	Kernel driver in use: pci-stub
	Kernel modules: igbvf


guest:
RHEL6.5-20130905.1

How reproducible:
50%

Steps to Reproduce:
1. Bring up VFs through sysfs, then unbind a VF and bind it to pci-stub.
#echo 2 > /sys/bus/pci/devices/0000\:06\:00.0/sriov_numvfs
#echo "8086 10ca" >/sys/bus/pci/drivers/pci-stub/new_id
#echo 0000:06:10.0 >/sys/bus/pci/devices/0000\:06\:10.0/driver/unbind
#echo 0000:06:10.0 >/sys/bus/pci/drivers/pci-stub/bind

2. Boot a guest with this VF.
cli:
...
-device pci-assign,host=06:10.0,id=vf,romfile=/home/808610ca.rom \ 
...

3. Rebind its parent PF.
#echo "8086 10c9" >/sys/bus/pci/drivers/pci-stub/new_id 
#echo 0000:06:00.0 >/sys/bus/pci/devices/0000\:06\:00.0/driver/unbind 
#echo 0000:06:00.0 >/sys/bus/pci/drivers/pci-stub/bind
#echo "8086 10c9" >/sys/bus/pci/drivers/igb/new_id 
#echo 0000:06:00.0 >/sys/bus/pci/drivers/pci-stub/unbind 
#echo 0000:06:00.0 >/sys/bus/pci/drivers/igb/bind

4. Change the number of VFs through sysfs.
#echo 0 > /sys/bus/pci/devices/0000\:06\:00.0/sriov_numvfs
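
The unbind/bind and sriov_numvfs writes in the steps above could be wrapped in small helpers, roughly as sketched below. The function names and the SYSFS_ROOT variable are hypothetical and not part of the original report; SYSFS_ROOT exists only so the helpers can be dry-run against a scratch directory instead of the real /sys. The new_id writes from the steps are still needed before binding.

```shell
#!/bin/sh
# Sketch only: helper names and SYSFS_ROOT are hypothetical, not from the report.
# SYSFS_ROOT defaults to /sys; point it at a scratch directory to dry-run.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# rebind_pci BDF DRIVER: unbind BDF from its current driver (if any),
# then bind it to DRIVER. DRIVER must already accept the device ID
# (the report uses new_id for that).
rebind_pci() {
    bdf="$1"
    drv="$2"
    dev="$SYSFS_ROOT/bus/pci/devices/$bdf"
    if [ -e "$dev/driver/unbind" ]; then
        printf '%s\n' "$bdf" > "$dev/driver/unbind" || return 1
    fi
    printf '%s\n' "$bdf" > "$SYSFS_ROOT/bus/pci/drivers/$drv/bind"
}

# set_numvfs PF_BDF COUNT: write COUNT to the PF's sriov_numvfs attribute.
set_numvfs() {
    printf '%s\n' "$2" > "$SYSFS_ROOT/bus/pci/devices/$1/sriov_numvfs"
}
```

With these, step 1 would look like `set_numvfs 0000:06:00.0 2` followed by `rebind_pci 0000:06:10.0 pci-stub`, and step 4 like `set_numvfs 0000:06:00.0 0`.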

Actual results:
host call trace:

kernel: ------------[ cut here ]------------
kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26b/0x280() (Not tainted)
kernel: Hardware name: PowerEdge R710
kernel: NETDEV WATCHDOG: p4p1 (igb): transmit queue 6 timed out
kernel: Modules linked in: igbvf ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables autofs4 bridge stp llc ipv6 vhost_net macvtap macvlan tun kvm_intel kvm power_meter microcode dcdbas serio_raw lpc_ich mfd_core i7core_edac edac_core be2net igb dca i2c_algo_bit i2c_core ptp pps_core ses enclosure sg bnx2x libcrc32c mdio bnx2 ext4 jbd2 mbcache sr_mod cdrom usb_storage sd_mod crc_t10dif pata_acpi ata_generic ata_piix megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
kernel: Pid: 0, comm: swapper Not tainted 2.6.32-417.el6.x86_64 #1
kernel: Call Trace:
kernel: <IRQ>  [<ffffffff81071f47>] ? warn_slowpath_common+0x87/0xc0
kernel: [<ffffffff81072036>] ? warn_slowpath_fmt+0x46/0x50
kernel: [<ffffffff8147b8ab>] ? dev_watchdog+0x26b/0x280
kernel: [<ffffffff8105df0e>] ? scheduler_tick+0x11e/0x260
kernel: [<ffffffff8147b640>] ? dev_watchdog+0x0/0x280
kernel: [<ffffffff81084c27>] ? run_timer_softirq+0x197/0x340
kernel: [<ffffffff810aca95>] ? tick_dev_program_event+0x65/0xc0
kernel: [<ffffffff8107aa01>] ? __do_softirq+0xc1/0x1e0
kernel: [<ffffffff810acb6a>] ? tick_program_event+0x2a/0x30
kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
kernel: [<ffffffff8100fa75>] ? do_softirq+0x65/0xa0
kernel: [<ffffffff8107a8b5>] ? irq_exit+0x85/0x90
kernel: [<ffffffff8153129a>] ? smp_apic_timer_interrupt+0x4a/0x60
kernel: [<ffffffff8100bb93>] ? apic_timer_interrupt+0x13/0x20
kernel: <EOI>  [<ffffffff812e0d0e>] ? intel_idle+0xde/0x170
kernel: [<ffffffff812e0cf1>] ? intel_idle+0xc1/0x170
kernel: [<ffffffff814269c7>] ? cpuidle_idle_call+0xa7/0x140
kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
kernel: [<ffffffff81520fe7>] ? start_secondary+0x2ac/0x2ef
kernel: ---[ end trace 26d94eabc924252b ]---
kernel: igb 0000:06:00.0: p4p1: Reset adapter
kernel: igb: p4p1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
dhclient[2987]: DHCPDISCOVER on p4p1 to 255.255.255.255 port 67 interval 19 (xid=0x178e9a46)
kernel: igb 0000:06:00.0: Detected Tx Unit Hang
kernel:  Tx Queue             <6>
kernel:  TDH                  <0>
kernel:  TDT                  <1>
kernel:  next_to_use          <1>
kernel:  next_to_clean        <0>
kernel: buffer_info[next_to_clean]
kernel:  time_stamp           <1000a5342>
kernel:  next_to_watch        <ffff88012a8a8000>
kernel:  jiffies              <1000a5733>
kernel:  desc.status          <558000>
kernel: igb 0000:06:00.0: Detected Tx Unit Hang
kernel:  Tx Queue             <6>
kernel:  TDH                  <0>
kernel:  TDT                  <1>
kernel:  next_to_use          <1>
kernel:  next_to_clean        <0>
kernel: buffer_info[next_to_clean]
kernel:  time_stamp           <1000a5342>
kernel:  next_to_watch        <ffff88012a8a8000>
kernel:  jiffies              <1000a5b1b>
kernel:  desc.status          <558000>
kernel: igb 0000:06:00.0: Detected Tx Unit Hang
kernel:  Tx Queue             <6>
kernel:  TDH                  <0>
kernel:  TDT                  <1>
kernel:  next_to_use          <1>
kernel:  next_to_clean        <0>
kernel: buffer_info[next_to_clean]
kernel:  time_stamp           <1000a5342>
kernel:  next_to_watch        <ffff88012a8a8000>
kernel:  jiffies              <1000a5f03>
kernel:  desc.status          <558000>
kernel: igb 0000:06:00.0: Detected Tx Unit Hang
kernel:  Tx Queue             <6>
kernel:  TDH                  <0>
kernel:  TDT                  <1>
kernel:  next_to_use          <1>
kernel:  next_to_clean        <0>
kernel: buffer_info[next_to_clean]
kernel:  time_stamp           <1000a5342>
kernel:  next_to_watch        <ffff88012a8a8000>
kernel:  jiffies              <1000a62eb>
kernel:  desc.status          <558000>
kernel: igb 0000:06:00.0: Detected Tx Unit Hang
kernel:  Tx Queue             <6>
kernel:  TDH                  <0>
kernel:  TDT                  <1>
kernel:  next_to_use          <1>
kernel:  next_to_clean        <0>
kernel: buffer_info[next_to_clean]
kernel:  time_stamp           <1000a5342>
kernel:  next_to_watch        <ffff88012a8a8000>
kernel:  jiffies              <1000a66d3>
kernel:  desc.status          <558000>
kernel: igb 0000:06:00.0: p4p1: Reset adapter
kernel: igb: p4p1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX


Expected results:
no call trace
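
To check a host log for the two symptoms shown in the actual results, something like the following sketch could be used. The function name is hypothetical, and the log path in the usage note is an assumption about where kernel messages are written on the test host.

```shell
#!/bin/sh
# Sketch only: the function name is an assumption, not from the report.
# Prints how many lines of the given log file contain each symptom string.
count_symptoms() {
    log="$1"
    # grep -c exits non-zero when the count is 0, so ignore its status.
    hangs=$(grep -c 'Detected Tx Unit Hang' "$log" || true)
    timeouts=$(grep -c 'NETDEV WATCHDOG' "$log" || true)
    printf 'tx_unit_hang=%s netdev_watchdog=%s\n' "$hangs" "$timeouts"
}
```

For example, `count_symptoms /var/log/messages` should report zero for both counters on a host that does not hit the problem.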

Additional info:
Comment 3 Alex Williamson 2014-06-25 16:52:35 EDT
This seems to be fixed by bug 985733
Comment 4 mazhang 2014-06-26 02:15:17 EDT
Tested this scenario on kernel-2.6.32-486.el6.x86_64 with an X540-AT2 NIC; did not hit the problem.

[root@dell-per720-02 ~]# lspci -v -s 01:00.0
01:00.0 Ethernet controller: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01)
	Subsystem: Dell Ethernet 10G 4P X540/I350 rNDC
	Flags: bus master, fast devsel, latency 0, IRQ 109
	Memory at d5000000 (64-bit, prefetchable) [size=2M]
	Memory at d55f8000 (64-bit, prefetchable) [size=16K]
	Expansion ROM at d8000000 [disabled] [size=512K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable+ Count=1/1 Maskable+ 64bit+
	Capabilities: [70] MSI-X: Enable- Count=64 Masked-
	Capabilities: [a0] Express Endpoint, MSI 00
	Capabilities: [e0] Vital Product Data
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
	Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
	Capabilities: [1d0] Access Control Services
	Kernel driver in use: ixgbe
	Kernel modules: ixgbe
Comment 5 Ademar Reis 2014-06-26 09:46:17 EDT
(In reply to Alex Williamson from comment #3)
> This seems to be fixed by bug 985733

Marking this one TestOnly.
Comment 6 mazhang 2014-07-01 02:23:46 EDT
Tested this bug on kernel-2.6.32-488.el6.x86_64 with an 82576 NIC; the host works well.

[root@amd-6168-256-1 pci-stub]# lspci -v -s 23:00.0
23:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
	Subsystem: Intel Corporation Gigabit ET Dual Port Server Adapter
	Flags: bus master, fast devsel, latency 0, IRQ 64
	Memory at e53a0000 (32-bit, non-prefetchable) [size=128K]
	Memory at e4400000 (32-bit, non-prefetchable) [size=4M]
	I/O ports at ccc0 [size=32]
	Memory at e53f8000 (32-bit, non-prefetchable) [size=16K]
	Expansion ROM at e4c00000 [disabled] [size=4M]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
	Capabilities: [70] MSI-X: Enable+ Count=10 Masked-
	Capabilities: [a0] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Device Serial Number 90-e2-ba-ff-ff-05-63-5e
	Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
	Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
	Kernel driver in use: igb
	Kernel modules: igb

[root@amd-6168-256-1 pci-stub]# lspci -v -s 23:10.0
23:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
	Subsystem: Intel Corporation Device a03c
	Flags: bus master, fast devsel, latency 0
	[virtual] Memory at e5000000 (64-bit, non-prefetchable) [size=16K]
	[virtual] Memory at e5020000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [70] MSI-X: Enable+ Count=3 Masked-
	Capabilities: [a0] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
	Kernel driver in use: pci-stub
	Kernel modules: igbvf

So this bug has been fixed.
Comment 7 errata-xmlrpc 2014-10-14 02:51:17 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1490.html
