Bug 703416 - host kernel panic while guest running on 10G public bridge.
Summary: host kernel panic while guest running on 10G public bridge.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.7
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: 5.7
Assignee: Herbert Xu
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Duplicates: 706051
Depends On:
Blocks: Rhel5KvmTier1 629795
 
Reported: 2011-05-10 10:08 UTC by Quan Wenli
Modified: 2011-07-21 10:07 UTC
CC List: 15 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-07-21 10:07:56 UTC
Target Upstream Version:
Embargoed:


Attachments
host-rhel5.7-kernelpanic-virtio.png (123.20 KB, image/png)
2011-05-10 10:08 UTC, Quan Wenli
hsot-rhel5.7-kernelpanic-e1000.png (118.67 KB, image/png)
2011-05-10 10:09 UTC, Quan Wenli


Links
Red Hat Product Errata RHSA-2011:1065 (priority: normal; status: SHIPPED_LIVE; last updated 2011-07-21 09:21:37 UTC): Important: Red Hat Enterprise Linux 5.7 kernel security and bug fix update

Description Quan Wenli 2011-05-10 10:08:15 UTC
Created attachment 497980
host-rhel5.7-kernelpanic-virtio.png

Description of problem:

Tested the following scenarios:

1. netserver running in the guest with the virtio driver on the 10G public bridge, netperf (TCP) running on the ex-host (external host) -> host kernel panic; see the attachment.
2. Guest switched to the e1000 driver on the 10G public bridge -> very low throughput (about 0.5 Mbit/s) on the ex-host at first; running netperf again then caused a host kernel panic. See the attachment.
3. Guest switched to a 1G public bridge -> no host kernel panic.
4. Scenario 1 on a RHEL 5.5/5.6 host -> very low throughput (about 3 Mbit/s) on the ex-host. netperf results on the RHEL 5.6 host:
#cat exhost2guest_TCP_STREAM.log.1 
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.58.13 (192.168.58.13) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384     32    60.00         3.67   0.18     2.18     31.585  97.011 
 87380  16384     64    60.00         4.28   0.13     2.05     19.766  78.451 
 87380  16384    128    60.01         4.37   0.11     2.22     16.231  83.015 
 87380  16384    256    60.00         4.08   0.09     2.00     14.061  80.394 
 87380  16384    512    60.00         2.70   0.06     1.75     14.166  106.262 
 87380  16384   1024    60.00         2.87   0.05     2.32     10.923  132.027 
 87380  16384   1460    60.00         2.79   0.05     2.58     10.751  151.512 
 87380  16384   2048    60.01         2.96   0.04     2.73     9.700   151.529 
 87380  16384   4096    60.03         0.30   0.01     1.52     27.538  840.206 
 87380  16384   8192    60.01         3.15   0.05     1.77     9.961   91.813 
 87380  16384   9000    60.00         3.02   0.04     1.93     9.491   104.434 
 87380  16384  16384    60.01         2.76   0.04     2.72     9.900   161.333 
 87380  16384  32768    60.00         2.91   0.05     2.88     10.794  162.398 
 87380  16384  65495    60.15         0.22   0.02     1.01     56.834  770.551 
 87380  16384  65507    60.00         2.55   0.04     2.74     11.235  176.034 
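A sketch of one way to collect a message-size sweep like the table above. This is an assumption about the procedure, not the reporter's actual script: the function name 'sweep_tcp_stream' and the 'NETPERF' override are hypothetical, while -H, -l, -t, -c, -C and -P are standard netperf global options and -m (after "--") sets the TCP_STREAM send message size.

```shell
#!/bin/sh
# sweep_tcp_stream HOST [SECONDS]   (hypothetical helper)
# Runs one netperf TCP_STREAM test per send-message size and prints the
# test banner only for the first run, so all rows share one header.
# NETPERF may be set to another command (e.g. a stub) for a dry run.
sweep_tcp_stream() {
    host=$1
    secs=${2:-60}
    banner=1
    for size in 32 64 128 256 512 1024 1460 2048 4096 8192 9000 \
                16384 32768 65495 65507; do
        # Global options come before "--": -c/-C report local/remote CPU
        # utilization (the Utilization and Service Demand columns).
        # The test-specific -m after "--" sets the send message size.
        "${NETPERF:-netperf}" -H "$host" -l "$secs" -t TCP_STREAM -c -C \
            -P "$banner" -- -m "$size" || return 1
        banner=0
    done
}
```

For example, 'sweep_tcp_stream 192.168.58.13 60 > exhost2guest_TCP_STREAM.log' run from the ex-host against the guest in scenario 4.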

Version-Release number of selected component (if applicable):

netperf-2.4.5
kernel-2.6.18-259.el5
kvm-83-232.el5

How reproducible:

100%

Steps to Reproduce:
1. Start the guest:
/usr/libexec/qemu-kvm  -name 'vm1'  -drive file=/root/RHEL-Server-5.7-64-virtio.raw,index=0,if=virtio,boot=on,media=disk,cache=none,format=raw -net nic,vlan=0,model=virtio,macaddr='9a:3b:dd:52:d9:d7' -net tap,vlan=0,script=/etc/qemu-ifup -m 4096 -smp 2,cores=1,threads=1,sockets=2  -cpu qemu64,+sse2   -vnc :0 -rtc-td-hack  -boot c  -usbdevice tablet -no-kvm-pit-reinjection
2. Run netserver in the guest.
3. Run netperf on the ex-host:
#netperf -H 192.168.0.123 (guest_ip)
  
Actual results:

Host kernel panic while the guest is running on the 10G public bridge.

Expected results:


Additional info:

# ethtool eth2
Settings for eth2:
	Supported ports: [ FIBRE ]
	Supported link modes:   1000baseT/Full 
                               10000baseT/Full 
	Supports auto-negotiation: Yes
	Advertised link modes:  1000baseT/Full 
	                        10000baseT/Full 
	Advertised auto-negotiation: Yes
	Speed: 10000Mb/s
	Duplex: Full
	Port: FIBRE
	PHYAD: 0
	Transceiver: external
	Auto-negotiation: on
	Supports Wake-on: d
	Wake-on: d
	Current message level: 0x00000007 (7)
	Link detected: yes

# ethtool -k  eth2
Offload parameters for eth2:
Cannot get device udp large send offload settings: Operation not supported
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: off
generic-receive-offload: on

# brctl show
bridge name	bridge id		STP enabled	interfaces
switch		8000.001b216e074c	no		tap0
							eth2


Note: this bug blocked kvm network performance testing.

Comment 1 Quan Wenli 2011-05-10 10:09:13 UTC
Created attachment 497981
hsot-rhel5.7-kernelpanic-e1000.png

Comment 3 Quan Wenli 2011-05-11 05:14:40 UTC
Backtrace with the virtio driver on the 10G public bridge:

crash> bt
PID: 5649   TASK: ffff81022f4df860  CPU: 2   COMMAND: "qemu-kvm"
 #0 [ffff81021c7dfae0] crash_kexec at ffffffff800afb0e
 #1 [ffff81021c7dfba0] __die at ffffffff80065127
 #2 [ffff81021c7dfbe0] do_page_fault at ffffffff80067474
 #3 [ffff81021c7dfcd0] error_exit at ffffffff8005dde9
    [exception RIP: list_del+11]
    RIP: ffffffff80158688  RSP: ffff81021c7dfd88  RFLAGS: 00010096
    RAX: 0000000000000002  RBX: 0000000000000002  RCX: 0000000000000000
    RDX: 0000000000000001  RSI: ffff81022fefcc00  RDI: ffff81022fefcc00
    RBP: ffff81022fefcc00   R8: ffff810237177580   R9: ffff810107aeac00
    R10: 0000000000000000  R11: ffff81022c47b820  R12: ffff810237177580
    R13: ffff810107ace5c0  R14: 000000000000000a  R15: ffff810107ad8340
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #4 [ffff81021c7dfd90] cache_alloc_refill at ffffffff8005bd63
 #5 [ffff81021c7dfdd0] kmem_cache_alloc at ffffffff8000ad28
 #6 [ffff81021c7dfdf0] audit_alloc at ffffffff8004949b
 #7 [ffff81021c7dfe30] copy_process at ffffffff8001f72d
 #8 [ffff81021c7dfec0] do_fork at ffffffff80031076
 #9 [ffff81021c7dff50] ptregscall_common at ffffffff8005d427
    RIP: 000000382aad4481  RSP: 00000000416cbd48  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: 000000006e8dd940  RCX: ffffffffffffffff
    RDX: 000000006e8dd9d0  RSI: 000000006e8dd210  RDI: 00000000003d0f00
    RBP: 0000000000000000   R8: 000000006e8dd940   R9: 000000006e8dd940
    R10: 000000006e8dd9d0  R11: 0000000000000202  R12: 0000000000000000
    R13: 0000000000000003  R14: 00000000004172f0  R15: 0000000000001000
    ORIG_RAX: 0000000000000038  CS: 0033  SS: 002b


Backtrace with the e1000 driver on the 10G public bridge:

crash> bt
PID: 5257   TASK: ffff81022ff197e0  CPU: 2   COMMAND: "qemu-kvm"
 #0 [ffff810107b9ba70] crash_kexec at ffffffff800afb0e
 #1 [ffff810107b9bb30] __die at ffffffff80065127
 #2 [ffff810107b9bb70] die at ffffffff8006c729
 #3 [ffff810107b9bba0] do_invalid_op at ffffffff8006cce9
 #4 [ffff810107b9bc60] error_exit at ffffffff8005dde9
    [exception RIP: __list_add+36]
    RIP: ffffffff8015870c  RSP: ffff810107b9bd10  RFLAGS: 00010086
    RAX: 0000000000000058  RBX: 0000000000000001  RCX: ffffffff8031df28
    RDX: ffffffff8031df28  RSI: 0000000000000000  RDI: ffffffff8031df20
    RBP: ffff81022f98f9c0   R8: ffffffff8031df28   R9: 0000000000000001
    R10: 0000000000000000  R11: ffffffff8017600c  R12: ffff810237177380
    R13: ffff810107ace6c0  R14: 000000000000000b  R15: ffff810107ada3c0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #5 [ffff810107b9bd18] cache_alloc_refill at ffffffff8005bd78
 #6 [ffff810107b9bd58] __kmalloc at ffffffff800de344
 #7 [ffff810107b9bd78] __alloc_skb at ffffffff8002dedf
 #8 [ffff810107b9bdb8] __netdev_alloc_skb at ffffffff802318ab
 #9 [ffff810107b9bdc8] ixgbe_alloc_rx_buffers at ffffffff882d86a3
#10 [ffff810107b9be08] ixgbe_clean_rx_irq at ffffffff882d9fed
#11 [ffff810107b9be98] ixgbe_clean_rxtx_many at ffffffff882dd04c
#12 [ffff810107b9bef8] net_rx_action at ffffffff8000ca51
#13 [ffff810107b9bf38] __do_softirq at ffffffff80012557
#14 [ffff810107b9bf68] call_softirq at ffffffff8005e2fc
#15 [ffff810107b9bf80] do_softirq at ffffffff8006d5e6
#16 [ffff810107b9bf90] do_IRQ at ffffffff8006d476
--- <IRQ stack> ---
#17 [ffff810211f87d38] ret_from_intr at ffffffff8005d615
    [exception RIP: kvm_arch_vcpu_ioctl_run+928]
    RIP: ffffffff884968c5  RSP: ffff810211f87de8  RFLAGS: 00000202
    RAX: 0000000000000000  RBX: ffff810211f44040  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: ffff810211f44040
    RBP: 00000002083e701e   R8: 0000000000000001   R9: 000000000000003f
    R10: ffff810104758008  R11: ffff810136ed2718  R12: 0000000111f44040
    R13: ffff810211f44040  R14: ffffffff884994ca  R15: 0000000000000000
    ORIG_RAX: ffffffffffffff6d  CS: 0010  SS: 0018
#18 [ffff810211f87e20] kvm_vcpu_ioctl at ffffffff88491e5c
#19 [ffff810211f87eb0] do_ioctl at ffffffff80041f05
#20 [ffff810211f87ed0] vfs_ioctl at ffffffff80030007
#21 [ffff810211f87f40] sys_ioctl at ffffffff8004c29e
#22 [ffff810211f87f80] tracesys at ffffffff8005d28d (via system_call)
    RIP: 000000382aaccda7  RSP: 0000000042044f58  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: ffffffff8005d28d  RCX: ffffffffffffffff
    RDX: 0000000000000000  RSI: 000000000000ae80  RDI: 000000000000000f
    RBP: 0000000000000000   R8: 0000000000500af0   R9: 0000000000001489
    R10: 172d1104f2cbfcff  R11: 0000000000000246  R12: 0000000000000001
    R13: 0000000000000001  R14: 00000000153e1010  R15: 0000000000000000
    ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b

Comment 4 Quan Wenli 2011-05-11 06:27:04 UTC
This bug can also be reproduced with kvm-83-229.el5.

Comment 12 Quan Wenli 2011-05-19 06:08:44 UTC
Retested with the debug kernel, 2.6.18-260.el5debug.

With TSO disabled and GSO enabled:

ethtool -k eth2
Offload parameters for eth2:
Cannot get device udp large send offload settings: Operation not supported
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: on
generic-receive-offload: on

ethtool -k breth0
Offload parameters for breth0:
Cannot get device rx csum settings: Operation not supported
Cannot get device udp large send offload settings: Operation not supported
rx-checksumming: off
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: on
generic-receive-offload: off
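To check these toggles from a script rather than by eye, a small parser can pull one feature's state out of 'ethtool -k' output. A minimal sketch, assuming the "name: state" line format shown above; 'offload_state' is a hypothetical helper name. The toggles themselves are changed with 'ethtool -K', e.g. 'ethtool -K eth2 tso off gso on' for the configuration used in this retest.

```shell
#!/bin/sh
# offload_state FEATURE   (hypothetical helper)
# Reads 'ethtool -k IFACE' output on stdin and prints the state of FEATURE
# (e.g. "tcp segmentation offload"). Returns 1 if the feature is absent.
offload_state() {
    awk -v feat="$1" '
        {
            i = index($0, ": ")          # split at the first ": "
            if (i == 0) next             # skip headers like "Offload parameters for eth2:"
            name = substr($0, 1, i - 1)
            if (name == feat) {
                state = substr($0, i + 2)
                sub(/[ \t]+$/, "", state)   # drop trailing blanks
                print state
                found = 1
                exit
            }
        }
        END { exit found ? 0 : 1 }
    '
}
```

For the eth2 output above, 'ethtool -k eth2 | offload_state "tcp segmentation offload"' would print "off".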

Backtrace with the virtio driver on the 10G public bridge:

crash /usr/lib/debug/lib/modules/2.6.18-260.el5debug/vmlinux  /var/crash/2011-05-19-18\:58/vmcore 
....................
crash> bt
PID: 0      TASK: ffff81010d3820c0  CPU: 1   COMMAND: "swapper"
 #0 [ffff81023c7c3a60] crash_kexec at ffffffff800ba270
 #1 [ffff81023c7c3b20] __die at ffffffff80069047
 #2 [ffff81023c7c3b60] die at ffffffff8007070f
 #3 [ffff81023c7c3b90] do_invalid_op at ffffffff80070d09
 #4 [ffff81023c7c3c50] error_exit at ffffffff80060e9d
    [exception RIP: __list_add+36]
    RIP: ffffffff80166257  RSP: ffff81023c7c3d00  RFLAGS: 00010082
    RAX: 0000000000000058  RBX: 0000000000000001  RCX: 0000000000000058
    RDX: ffff81010d3820c0  RSI: 0000000000000000  RDI: ffffffff803368e0
    RBP: ffff81020f1f2ae0   R8: 0000000000000002   R9: ffffffff80017f4d
    R10: ffffffff80017f4d  R11: ffffffff80183dfe  R12: ffff81010d361118
    R13: ffff81010d2ea3c0  R14: ffff81010d2ded90  R15: 000000000000000b
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #5 [ffff81023c7c3d08] cache_alloc_refill at ffffffff8005e9f8
 #6 [ffff81023c7c3d48] __kmalloc_track_caller at ffffffff80011741
 #7 [ffff81023c7c3d78] __alloc_skb at ffffffff8002f976
 #8 [ffff81023c7c3db8] __netdev_alloc_skb at ffffffff8024179e
 #9 [ffff81023c7c3dc8] ixgbe_alloc_rx_buffers at ffffffff882ca6ee
#10 [ffff81023c7c3e08] ixgbe_clean_rx_irq at ffffffff882cc03e
#11 [ffff81023c7c3e98] ixgbe_clean_rxtx_many at ffffffff882cf0e1
#12 [ffff81023c7c3ef8] net_rx_action at ffffffff8000d311
#13 [ffff81023c7c3f38] __do_softirq at ffffffff8001307a
#14 [ffff81023c7c3f68] call_softirq at ffffffff800613d0
#15 [ffff81023c7c3f80] do_softirq at ffffffff8007164c
#16 [ffff81023c7c3f90] do_IRQ at ffffffff80071612
--- <IRQ stack> ---
#17 [ffff81023c7bfdf8] ret_from_intr at ffffffff80060652
    [exception RIP: acpi_processor_idle_simple+328]
    RIP: ffffffff801afbb5  RSP: ffff81023c7bfea8  RFLAGS: 00000246
    RAX: ffff81023c7bffd8  RBX: 0000000000000050  RCX: 0000000000000050
    RDX: 0000000000000050  RSI: 0000000000000000  RDI: 0000000000013880
    RBP: ffff81023c7bfee8   R8: 0000000000039746   R9: 0000000000000001
    R10: 000000004dd4f794  R11: ffffffff80065fed  R12: 0000000000402100
    R13: ffff81000906cea0  R14: 0000000000000000  R15: ffff81000906cea0
    ORIG_RAX: ffffffffffffff55  CS: 0010  SS: 0018
#18 [ffff81023c7bfea0] acpi_processor_idle_simple at ffffffff801afbab
#19 [ffff81023c7bfef0] cpu_idle at ffffffff8004b7bd

Comment 13 Quan Wenli 2011-05-19 09:15:08 UTC
Messages from the serial console before the host kernel crashed:

slab error in verify_redzone_free(): cache `size-64': memory outside object was overwritten
Call Trace:
 <IRQ>  [<ffffffff800330df>] cache_free_debugcheck+0x108/0x21a
 [<ffffffff8000ba6e>] kfree+0xce/0x261
 [<ffffffff80033edd>] ip_output+0x2a8/0x2d7
 [<ffffffff8026461e>] ip_push_pending_frames+0x3f3/0x461
 [<ffffffff802781b2>] icmp_send+0x54a/0x5c0
 [<ffffffff80264968>] ip_fragment+0x82/0x756
 [<ffffffff8880af9d>] :bridge:br_dev_queue_push_xmit+0x0/0x200
 [<ffffffff8880fb48>] :bridge:br_nf_post_routing+0x17c/0x197
 [<ffffffff800361a0>] nf_iterate+0x41/0x7d
 [<ffffffff8880af9d>] :bridge:br_dev_queue_push_xmit+0x0/0x200
 [<ffffffff800590dc>] nf_hook_slow+0x58/0xbc
 [<ffffffff8880af9d>] :bridge:br_dev_queue_push_xmit+0x0/0x200
 [<ffffffff8880b1dc>] :bridge:br_forward_finish+0x3f/0x51
 [<ffffffff8880f9c4>] :bridge:br_nf_forward_finish+0xf7/0xff
 [<ffffffff88810262>] :bridge:br_nf_forward_ip+0x150/0x160
 [<ffffffff800361a0>] nf_iterate+0x41/0x7d
 [<ffffffff8880b19d>] :bridge:br_forward_finish+0x0/0x51
 [<ffffffff800590dc>] nf_hook_slow+0x58/0xbc
 [<ffffffff8880b19d>] :bridge:br_forward_finish+0x0/0x51
 [<ffffffff8880b246>] :bridge:__br_forward+0x58/0x9c
 [<ffffffff8880be7c>] :bridge:br_handle_frame_finish+0x12b/0x1d6
 [<ffffffff88810037>] :bridge:br_nf_pre_routing_finish+0x2e9/0x2f8
 [<ffffffff8880fd4e>] :bridge:br_nf_pre_routing_finish+0x0/0x2f8
 [<ffffffff800590dc>] nf_hook_slow+0x58/0xbc
 [<ffffffff8880fd4e>] :bridge:br_nf_pre_routing_finish+0x0/0x2f8
 [<ffffffff88810c34>] :bridge:br_nf_pre_routing+0x600/0x61c
 [<ffffffff8024179e>] __netdev_alloc_skb+0x12/0x2d
 [<ffffffff800361a0>] nf_iterate+0x41/0x7d
 [<ffffffff8880bd51>] :bridge:br_handle_frame_finish+0x0/0x1d6
 [<ffffffff800590dc>] nf_hook_slow+0x58/0xbc
 [<ffffffff8880bd51>] :bridge:br_handle_frame_finish+0x0/0x1d6
 [<ffffffff8880c095>] :bridge:br_handle_frame+0x16e/0x1a1
 [<ffffffff80021e4a>] netif_receive_skb+0x387/0x4b0
 [<ffffffff88289e00>] :ixgbe:ixgbe_clean_rx_irq+0x4bd/0x733
 [<ffffffff8828d0e1>] :ixgbe:ixgbe_clean_rxtx_many+0xf5/0x244
 [<ffffffff8000d311>] net_rx_action+0xb6/0x1d0
 [<ffffffff8001307a>] __do_softirq+0x94/0x152
 [<ffffffff800613d0>] call_softirq+0x1c/0x28
 [<ffffffff8007164c>] do_softirq+0x31/0x94
 [<ffffffff80071612>] do_IRQ+0xfd/0x106
 [<ffffffff8847ab1a>] :kvm:kvm_arch_vcpu_ioctl_run+0x474/0x682
 [<ffffffff80060652>] ret_from_intr+0x0/0xf
 <EOI>  [<ffffffff8000c1d1>] __down_read_trylock+0x15/0x44
 [<ffffffff8847ab1a>] :kvm:kvm_arch_vcpu_ioctl_run+0x474/0x682
 [<ffffffff8847ab5a>] :kvm:kvm_arch_vcpu_ioctl_run+0x4b4/0x682
 [<ffffffff8847ab1a>] :kvm:kvm_arch_vcpu_ioctl_run+0x474/0x682
 [<ffffffff88475f26>] :kvm:kvm_vcpu_ioctl+0xf8/0x44e
 [<ffffffff80092da9>] default_wake_function+0x0/0xe
 [<ffffffff800444cd>] do_ioctl+0x21/0x6b
 [<ffffffff80031c56>] vfs_ioctl+0x45d/0x4bf
 [<ffffffff800c4127>] audit_syscall_entry+0x1a8/0x1d3
 [<ffffffff8004eafa>] sys_ioctl+0x59/0x78
 [<ffffffff800602a6>] tracesys+0xd5/0xdf

ffff81022f6fe818: redzone 1:0x170fc2a5, redzone 2:0x6500a8c0.

slab e
Memory for crash kernel (0x0 to 0x0) notwithin permissible range

Warning: pci_mmcfg_init marking 256MB space uncacheable.

PCI: Failed to allocate mem resource #12:100000@0 for 0000:0f:00.0

PCI: Failed to allocate mem resource #15:100000@0 for 0000:0f:00.0

PCI: Failed to allocate mem resource #12:100000@0 for 0000:0f:00.1

PCI: Failed to allocate mem resource #15:100000@0 for 0000:0f:00.1

Comment 14 Michael S. Tsirkin 2011-05-19 10:01:41 UTC
Note: setting

net.bridge.bridge-nf-filter-vlan-tagged = 0
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0

makes the crash go away, so it's a netfilter issue.
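The four settings above can be applied in one go by writing to the files behind the net.bridge.* sysctls. A minimal sketch: 'disable_bridge_nf' is a hypothetical helper, and its optional directory argument exists only so it can be tried against a scratch directory instead of /proc. Running 'sysctl -w net.bridge.bridge-nf-call-iptables=0' (and likewise for each key) does the same thing. In this bug it was a diagnostic step that pointed at netfilter, not the fix.

```shell
#!/bin/sh
# disable_bridge_nf [DIR]   (hypothetical helper)
# Writes 0 to the four bridge-netfilter knobs from comment 14.
# DIR defaults to /proc/sys/net/bridge, where the net.bridge.* sysctls
# live; pass another directory only to try the function out safely.
# Against /proc this needs root and the bridge module loaded.
disable_bridge_nf() {
    dir=${1:-/proc/sys/net/bridge}
    for key in bridge-nf-filter-vlan-tagged bridge-nf-call-ip6tables \
               bridge-nf-call-iptables bridge-nf-call-arptables; do
        if [ ! -e "$dir/$key" ]; then
            echo "disable_bridge_nf: $dir/$key not found" >&2
            return 1
        fi
        echo 0 > "$dir/$key" || return 1
    done
}
```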

Comment 15 Michael S. Tsirkin 2011-05-19 10:09:58 UTC
OK, so ixgbe + bridge netfilter clearly corrupts the slab here.
Assigning to the kernel folks to look at. We'll need a
separate BZ to track the low-throughput issue, which might
be a separate problem; make it depend on this one.

Comment 16 Herbert Xu 2011-05-20 07:28:35 UTC
You need to disable RSC (hardware LRO).  It should be controlled by the ethtool lro flag (not to be confused with the gro flag).

However, we may have removed the lro option from some versions of ethtool, so if you don't have an lro toggle, you can do it by directly modifying the coalesce parameters through ethtool -C.

When RSC is enabled, tcpdump will show coalesced (i.e., larger than MTU) packets arriving through ixgbe; with RSC and GRO both turned off, you should not see anything above the MTU.

Note that you only have to turn GRO off to verify that RSC is really off; you can turn GRO back on afterwards.
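The check described above (nothing above the MTU once RSC and GRO are off) can be scripted by counting oversized frames in a capture. A minimal sketch under stated assumptions: 'count_over_mtu' is a hypothetical helper, and it assumes 'tcpdump -e -nn' text output on untagged Ethernet, where the first "length N" on each line is the frame length including the 14-byte Ethernet header.

```shell
#!/bin/sh
# count_over_mtu MTU   (hypothetical helper)
# Reads 'tcpdump -e -nn' text on stdin and prints how many frames carry
# more than MTU bytes above the Ethernet header. Assumes untagged
# Ethernet (14-byte header) and that the first "length N" on each line is
# the link-layer frame length, which is what tcpdump prints with -e.
count_over_mtu() {
    awk -v mtu="$1" '
        BEGIN { limit = mtu + 0; over = 0 }
        match($0, /length [0-9]+/) {
            frame = substr($0, RSTART + 7, RLENGTH - 7) + 0   # skip "length "
            if (frame - 14 > limit) over++
        }
        END { print over }
    '
}
```

For example, with GRO off ('ethtool -K eth2 gro off') and netperf running from the ex-host: 'tcpdump -i eth2 -e -nn -c 2000 src host EXHOST_IP 2>/dev/null | count_over_mtu 1500', where EXHOST_IP is a placeholder for the ex-host's address. Filtering on the sender keeps locally generated TSO/GSO frames out of the count. A non-zero result suggests coalescing is still happening; 'ethtool -K eth2 gro on' restores GRO afterwards.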

Comment 17 Quan Wenli 2011-05-20 09:39:09 UTC
(In reply to comment #16)

I could confirm that RSC is enabled from the hw_rsc_aggregated/hw_rsc_flushed counters:
#ethtool -S eth2 | grep rsc
     hw_rsc_aggregated: 437
     hw_rsc_flushed: 340242
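Because these counters are cumulative since the driver loaded, a non-zero value only shows that RSC was active at some point; comparing two snapshots taken around a netperf run shows whether it is still coalescing. A minimal sketch; 'ethtool_stat' and 'rsc_delta' are hypothetical helper names, and the parser assumes the "name: value" layout shown above.

```shell
#!/bin/sh
# ethtool_stat NAME   (hypothetical helper)
# Reads 'ethtool -S IFACE' output on stdin and prints counter NAME.
# Returns 1 if the counter is not present.
ethtool_stat() {
    awk -v name="$1" '
        {
            key = $1
            sub(/:$/, "", key)        # "hw_rsc_aggregated:" -> "hw_rsc_aggregated"
            if (key == name) { print $2; found = 1; exit }
        }
        END { exit found ? 0 : 1 }
    '
}

# rsc_delta BEFORE AFTER   (hypothetical helper)
# Takes two saved 'ethtool -S' snapshots and prints how much
# hw_rsc_aggregated grew between them.
rsc_delta() {
    before=$(ethtool_stat hw_rsc_aggregated < "$1") || return 1
    after=$(ethtool_stat hw_rsc_aggregated < "$2") || return 1
    echo $((after - before))
}
```

For example: 'ethtool -S eth2 > before', run netperf from the ex-host, 'ethtool -S eth2 > after', then 'rsc_delta before after'. A result of 0 under load is consistent with RSC being off.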

But after checking the ethtool command and the driver documentation (ftp://supermicro.com/CDR-APLUS2_1.12_for_A+_AMD_SP5100_platform/Intel/LAN/v15.8/PROXGB/DOCS/LINUX/ixgb.htm), I'm still not sure how to disable HW RSC. Based on that document, would 'modprobe ixgbe InterruptThrottleRate=0' disable HW RSC? Could you give any hints?

The following is the output of 'ethtool -c eth2':
ethtool -c eth2
Coalesce parameters for eth2:
Adaptive RX: off  TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 1
rx-frames: 0
rx-usecs-irq: 0
rx-frames-irq: 0

tx-usecs: 0
tx-frames: 0
tx-usecs-irq: 0
tx-frames-irq: 512

rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0

rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0

Comment 18 Herbert Xu 2011-05-20 10:07:35 UTC
Try setting rx-usecs to 0.
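Applying this suggestion and reading the value back can be scripted. A minimal sketch: 'coalesce_value' is a hypothetical helper that extracts one key from 'ethtool -c' output, matching the key exactly so that "rx-usecs" does not also pick up "rx-usecs-irq", "rx-usecs-low" or "rx-usecs-high". That rx-usecs 0 turns RSC off on this ixgbe driver is taken from this thread, not checked independently.

```shell
#!/bin/sh
# coalesce_value NAME   (hypothetical helper)
# Reads 'ethtool -c IFACE' output on stdin and prints the value of NAME.
# The key must match exactly, so "rx-usecs" ignores "rx-usecs-irq" etc.
# Returns 1 if NAME is not present.
coalesce_value() {
    awk -F': *' -v name="$1" '
        $1 == name {
            v = $2
            sub(/[ \t]+$/, "", v)    # drop trailing blanks
            print v
            found = 1
            exit
        }
        END { exit found ? 0 : 1 }
    '
}
```

For example: 'ethtool -C eth2 rx-usecs 0', after which 'ethtool -c eth2 | coalesce_value rx-usecs' should print 0. The hw_rsc_aggregated counter from comment 17 can then confirm that coalescing has stopped.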

Comment 19 Quan Wenli 2011-05-20 10:48:40 UTC
The crashes go away after disabling RSC with 'ethtool -C eth2 rx-usecs 0', even with netfilter enabled on the bridge, and throughput also improves:

#netperf -H 192.168.0.101
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.101 (192.168.0.101) port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    10.01    2902.13

Comment 25 Michael S. Tsirkin 2011-06-02 12:26:34 UTC
*** Bug 706051 has been marked as a duplicate of this bug. ***

Comment 35 Jarod Wilson 2011-06-15 15:48:34 UTC
Patch(es) available in kernel-2.6.18-268.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5
Detailed testing feedback is always welcomed.

Comment 36 Quan Wenli 2011-06-16 05:11:39 UTC
(In reply to comment #35)

The crashes no longer occur with test kernel-2.6.18-268.el5 and the ixgbe driver's default settings.

Steps:
1. Boot the guest on the 10G bridge:
/usr/libexec/qemu-kvm -name 'vm1' \
  -drive file=/root/RHEL-Server-5.7-64-virtio.raw,index=0,if=virtio,boot=on,media=disk,cache=none,format=raw \
  -net nic,vlan=0,model=virtio,macaddr='9a:3b:dd:52:d9:d7' \
  -net tap,vlan=0,script=/etc/qemu-ifup \
  -m 4096 -smp 2,cores=1,threads=1,sockets=2 \
  -cpu qemu64,+sse2 -vnc :0 -rtc-td-hack -boot c -usbdevice tablet \
  -no-kvm-pit-reinjection
2. Run netserver in the guest.
3. Run netperf on the ex-host:
#netperf -H 192.168.0.13 -l 60 (guest_ip)
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.13 (192.168.0.13) port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    60.00    5784.94

Comment 38 juzhang 2011-06-17 01:59:10 UTC
According to comment 36, marking this issue as VERIFIED.

Comment 39 errata-xmlrpc 2011-07-21 10:07:56 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1065.html

