Bug 826101 - [Mellanox 6.3] Kernel Panic under heavy traffic load.
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.3
Hardware: x86_64 Linux
Priority: high Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Red Hat Kernel Manager
QA Contact: Network QE
Reported: 2012-05-29 11:30 EDT by Yevgeny Petrilin
Modified: 2012-07-13 10:47 EDT (History)
Doc Type: Bug Fix
Last Closed: 2012-07-13 10:47:28 EDT
Type: Bug


Description Yevgeny Petrilin 2012-05-29 11:30:28 EDT
Description of problem:

A kernel panic occurs when running multiple (over 100) netperf UDP_RR/TCP_RR streams over 10G NICs.
The same issue was reproduced with drivers from different NIC vendors (mlx4_en and ixgbe).
The panic trace:


console [netcon0] enabled
netconsole: network logging started
RTNL: assertion failed at net/ipv4/igmp.c (1211)
ADDRCONF(NETDEV_UP): p7p1: link is not ready
8021q: adding VLAN 0 to HW filter on device p7p1
ixgbe 0000:06:00.0: p7p1: detected SFP+: 5
ixgbe 0000:06:00.0: p7p1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
ADDRCONF(NETDEV_CHANGE): p7p1: link becomes ready
network todo '????{?????' but state -2119098720
general protection fault: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0/net/em1/broadcast
CPU 10
Modules linked in: netconsole nfs lockd fscache nfs_acl auth_rpcgss autofs4 sunrpc target_core_iblock target_core_file target_core_pscsi target_core_mod configfs bnx2fc cnic uio fcoe libfcoe libfc 8021q scsi_transport_fc scsi_tgt garp stp llc ipv6 uinput acpi_pad power_meter sg dcdbas microcode sb_edac edac_core iTCO_wdt iTCO_vendor_support shpchp ixgbe dca mdio tg3 ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif ahci megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: netconsole]

Pid: 12575, comm: netserver Not tainted 2.6.32-262.el6.x86_64 #1 Dell Inc. PowerEdge R720/061P35
RIP: 0010:[<ffffffff81439685>]  [<ffffffff81439685>] netdev_run_todo+0x25/0x220
RSP: 0018:ffff88100679db48  EFLAGS: 00010296
RAX: dead000000100100 RBX: ffff8810051dd680 RCX: 0000000000000000
RDX: 00000000fffffffc RSI: 0000000000000014 RDI: ffff8810051dd680
RBP: ffff88100679db88 R08: 0000000000000003 R09: ffff8810051e70a1
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000014
R13: ffff8810051dd680 R14: ffff88100679db48 R15: ffff88100679dbd0
FS:  00007fc41f127700(0000) GS:ffff880061940000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000039cfe9b6b0 CR3: 00000010048f7000 CR4: 00000000000406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process netserver (pid: 12575, threadinfo ffff88100679c000, task ffff881006bb8ae0)
Stack:
 dead000000100100 ffff88100679dbd0 ffff88100679db88 ffff8810051dd680
<d> 0000000000000014 ffff8810051dd680 ffff881006b4c800 ffff88100679dbd0
<d> ffff88100679dba8 ffffffff8144675a 00000000000000d0 ffff881008d24800
Call Trace:
 [<ffffffff8144675a>] rtnetlink_rcv+0x2a/0x40
 [<ffffffff81461a66>] netlink_unicast+0x2e6/0x300
 [<ffffffff814623f0>] netlink_sendmsg+0x200/0x2e0
 [<ffffffff814260e3>] sock_sendmsg+0x123/0x150
 [<ffffffff81091f90>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81161bf7>] ? cache_grow+0x217/0x320
 [<ffffffff814265b9>] sys_sendto+0x139/0x190
 [<ffffffff81177cd7>] ? fd_install+0x47/0x90
 [<ffffffff815002ee>] ? do_page_fault+0x3e/0xa0
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
Code: ff c9 c3 0f 1f 00 55 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 18 0f 1f 44 00 00 48 8b 05 83 88 6d 00 4c 8d 75 c0 48 89 45 c0 <4c> 89 70 08 48 8b 05 78 88 6d 00 48 89 45 c8 4c 89 30 48 c7 05
RIP  [<ffffffff81439685>] netdev_run_todo+0x25/0x220
 RSP <ffff88100679db48>
---[ end trace d09fc193952f15d7 ]---
Kernel panic - not syncing: Fatal exception
Pid: 12575, comm: netserver Tainted: G      D    ---------------    2.6.32-262.el6.x86_64 #1
Call Trace:
 [<ffffffff814fa1a0>] ? panic+0xa0/0x168
 [<ffffffff814fe334>] ? oops_end+0xe4/0x100
 [<ffffffff8100f26b>] ? die+0x5b/0x90
 [<ffffffff814fdea2>] ? do_general_protection+0x152/0x160
 [<ffffffff814fd675>] ? general_protection+0x25/0x30
 [<ffffffff81439685>] ? netdev_run_todo+0x25/0x220
 [<ffffffff81446770>] ? rtnetlink_rcv_msg+0x0/0x220
 [<ffffffff8144675a>] ? rtnetlink_rcv+0x2a/0x40
 [<ffffffff81461a66>] ? netlink_unicast+0x2e6/0x300
 [<ffffffff814623f0>] ? netlink_sendmsg+0x200/0x2e0
 [<ffffffff814260e3>] ? sock_sendmsg+0x123/0x150
 [<ffffffff81091f90>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81161bf7>] ? cache_grow+0x217/0x320
 [<ffffffff814265b9>] ? sys_sendto+0x139/0x190
 [<ffffffff81177cd7>] ? fd_install+0x47/0x90
 [<ffffffff815002ee>] ? do_page_fault+0x3e/0xa0
 [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
general protection fault: 0000 [#2] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0/net/em1/broadcast
CPU 4
Modules linked in: netconsole nfs lockd fscache nfs_acl auth_rpcgss autofs4 sunrpc target_core_iblock target_core_file target_core_pscsi target_core_mod configfs bnx2fc cnic uio fcoe libfcoe libfc 8021q scsi_transport_fc scsi_tgt garp stp llc ipv6 uinput acpi_pad power_meter sg dcdbas microcode sb_edac edac_core iTCO_wdt iTCO_vendor_support shpchp ixgbe dca mdio tg3 ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif ahci megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: netconsole]

Pid: 12576, comm: netserver Tainted: G      D    ---------------    2.6.32-262.el6.x86_64 #1 Dell Inc. PowerEdge R720/061P35
RIP: 0010:[<ffffffff81439685>]  [<ffffffff81439685>] netdev_run_todo+0x25/0x220
RSP: 0018:ffff8810067afb48  EFLAGS: 00010296
RAX: dead000000100100 RBX: ffff880ff5872b80 RCX: 0000000000000000
RDX: 00000000fffffffc RSI: 0000000000000014 RDI: ffff880ff5872b80
RBP: ffff8810067afb88 R08: 0000000000000003 R09: ffff880ff58760a1
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000014
R13: ffff880ff5872b80 R14: ffff8810067afb48 R15: ffff8810067afbd0
FS:  00007fc41f127700(0000) GS:ffff880061880000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000039cfe9b6b0 CR3: 0000000ff6680000 CR4: 00000000000406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process netserver (pid: 12576, threadinfo ffff8810067ae000, task ffff881003134080)
Stack:
 dead000000100100 ffff8810067afbd0 ffff8810067afb88 ffff880ff5872b80
<d> 0000000000000014 ffff880ff5872b80 ffff88100569cc00 ffff8810067afbd0
<d> ffff8810067afba8 ffffffff8144675a 00000000000000d0 ffff881008d24800
Call Trace:
 [<ffffffff8144675a>] rtnetlink_rcv+0x2a/0x40
 [<ffffffff81461a66>] netlink_unicast+0x2e6/0x300
 [<ffffffff814623f0>] netlink_sendmsg+0x200/0x2e0
 [<ffffffff814260e3>] sock_sendmsg+0x123/0x150
 [<ffffffff81091f90>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81161bf7>] ? cache_grow+0x217/0x320
 [<ffffffff814265b9>] sys_sendto+0x139/0x190
 [<ffffffff81177cd7>] ? fd_install+0x47/0x90
 [<ffffffff815002ee>] ? do_page_fault+0x3e/0xa0
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
Code: ff c9 c3 0f 1f 00 55 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 18 0f 1f 44 00 00 48 8b 05 83 88 6d 00 4c 8d 75 c0 48 89 45 c0 <4c> 89 70 08 48 8b 05 78 88 6d 00 48 89 45 c8 4c 89 30 48 c7 05
RIP  [<ffffffff81439685>] netdev_run_todo+0x25/0x220
 RSP <ffff8810067afb48>
general protection fault: 0000 [#3] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0/net/em1/broadcast
CPU 11
Modules linked in: netconsole nfs lockd fscache nfs_acl auth_rpcgss autofs4 sunrpc target_core_iblock target_core_file target_core_pscsi target_core_mod configfs bnx2fc cnic uio fcoe libfcoe libfc 8021q scsi_transport_fc scsi_tgt garp stp llc ipv6 uinput acpi_pad power_meter sg dcdbas microcode sb_edac edac_core iTCO_wdt iTCO_vendor_support shpchp ixgbe dca mdio tg3 ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif ahci megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: netconsole]

Pid: 12577, comm: netserver Tainted: G      D    ---------------    2.6.32-262.el6.x86_64 #1 Dell Inc. PowerEdge R720/061P35
RIP: 0010:[<ffffffff81439685>]  [<ffffffff81439685>] netdev_run_todo+0x25/0x220
RSP: 0018:ffff881006bc1b48  EFLAGS: 00010296
RAX: dead000000100100 RBX: ffff8810075560c0 RCX: 0000000000000000
RDX: 00000000fffffffc RSI: 0000000000000014 RDI: ffff8810075560c0
RBP: ffff881006bc1b88 R08: 0000000000000003 R09: ffff8810075660a1
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000014
R13: ffff8810075560c0 R14: ffff881006bc1b48 R15: ffff881006bc1bd0
FS:  00007fc41f127700(0000) GS:ffff880061960000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000039cfe9b6b0 CR3: 0000001006bf1000 CR4: 00000000000406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process netserver (pid: 12577, threadinfo ffff881006bc0000, task ffff88100263eae0)
Stack:
 dead000000100100 ffff881006bc1bd0 ffff881006bc1b88 ffff8810075560c0
<d> 0000000000000014 ffff8810075560c0 ffff880ff5d63c00 ffff881006bc1bd0
<d> ffff881006bc1ba8 ffffffff8144675a 00000000000000d0 ffff881008d24800
Call Trace:
 [<ffffffff8144675a>] rtnetlink_rcv+0x2a/0x40
 [<ffffffff81461a66>] netlink_unicast+0x2e6/0x300
 [<ffffffff814623f0>] netlink_sendmsg+0x200/0x2e0
 [<ffffffff814260e3>] sock_sendmsg+0x123/0x150
 [<ffffffff81091f90>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81161bf7>] ? cache_grow+0x217/0x320
 [<ffffffff814265b9>] sys_sendto+0x139/0x190
 [<ffffffff81177cd7>] ? fd_install+0x47/0x90
 [<ffffffff815002ee>] ? do_page_fault+0x3e/0xa0
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
Code: ff c9 c3 0f 1f 00 55 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 18 0f 1f 44 00 00 48 8b 05 83 88 6d 00 4c 8d 75 c0 48 89 45 c0 <4c> 89 70 08 48 8b 05 78 88 6d 00 48 89 45 c8 4c 89 30 48 c7 05
RIP  [<ffffffff81439685>] netdev_run_todo+0x25/0x220
 RSP <ffff881006bc1b48>
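
In all three oopses above, RAX holds dead000000100100 and the faulting instruction bytes (<4c> 89 70 08) decode to a store to 0x8(%rax). That value matches LIST_POISON1 from include/linux/poison.h when POISON_POINTER_DELTA is 0xdead000000000000, which is assumed (not confirmed by the report) to be the configuration of this kernel. A possible reading is that netdev_run_todo is touching a list entry that was already removed, but that is an interpretation rather than something the reporter states. The following is a minimal Python sketch for flagging such registers in a pasted oops; the function name and the sample text are illustrative only.

```python
import re

# List-poison values as in include/linux/poison.h, assuming the x86_64
# POISON_POINTER_DELTA of 0xdead000000000000 (an assumption about this build).
LIST_POISON = {
    0xdead000000100100: "LIST_POISON1",
    0xdead000000200200: "LIST_POISON2",
}

# Matches general-purpose register dumps such as "RAX: dead000000100100".
# RIP/RSP lines carry a "0010:" segment prefix and are not matched.
REG_RE = re.compile(r"\b(R[A-Z0-9]{1,2}):\s*([0-9a-f]{16})\b")


def poisoned_registers(oops_text):
    """Return {register: poison name} for registers holding a list-poison value."""
    hits = {}
    for name, value in REG_RE.findall(oops_text):
        label = LIST_POISON.get(int(value, 16))
        if label:
            hits[name] = label
    return hits


# Two register lines copied from the first oops in this report.
sample = """\
RAX: dead000000100100 RBX: ffff8810051dd680 RCX: 0000000000000000
RDX: 00000000fffffffc RSI: 0000000000000014 RDI: ffff8810051dd680
"""

print(poisoned_registers(sample))
```

Run against the register block of any of the three oopses, this should report RAX only.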

Version-Release number of selected component (if applicable):

The problem happens with RHEL 6.3 snapshot 2.
[root@mtlprf045 scripts]# uname -r
2.6.32-262.el6.x86_64
[root@mtlprf045 scripts]# cat /etc/issue
Red Hat Enterprise Linux Server release 6.3 Beta (Santiago)
Kernel \r on an \m



How reproducible:
Using multiple netperf connections

Steps to Reproduce:
1. Run netserver on the server side.
2. Run 100 instances of "netperf -H <server_ip> -t UDP_RR" on the client side.
  
Actual results:
Kernel panic on server side
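
The client side of the reproduction steps can be sketched in Python as below. This is only an illustration of step 2, assuming netperf is installed on the client and netserver is already running on the server; the helper names netperf_command and run_streams are hypothetical, and the report does not say how the 100 instances were actually launched.

```python
import subprocess


def netperf_command(server_ip, test="UDP_RR"):
    """Build the netperf command line quoted in step 2."""
    return ["netperf", "-H", server_ip, "-t", test]


def run_streams(server_ip, count=100, test="UDP_RR"):
    """Start `count` netperf clients in parallel and wait for all of them.

    Returns the list of exit codes, one per client.
    """
    procs = [
        subprocess.Popen(netperf_command(server_ip, test),
                         stdout=subprocess.DEVNULL)
        for _ in range(count)
    ]
    return [p.wait() for p in procs]
```

For example, run_streams("<server_ip>") starts 100 UDP_RR streams, and passing test="TCP_RR" covers the TCP case mentioned in the description.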

Expected results:


Additional info:
Comment 3 Yevgeny Petrilin 2012-06-03 07:36:14 EDT
Hello,
The issue does not occur with snapshot 4 (kernel: 2.6.32-272.el6.x86_64).

Thanks,
Yevgeny
Comment 4 RHEL Product and Program Management 2012-07-10 04:28:23 EDT
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.
Comment 5 RHEL Product and Program Management 2012-07-10 19:50:05 EDT
This request was erroneously removed from consideration in Red Hat Enterprise Linux 6.4, which is currently under development.  This request will be evaluated for inclusion in Red Hat Enterprise Linux 6.4.
Comment 6 Larry Troan 2012-07-13 10:47:28 EDT
Per comment #3, looks like this was fixed in 6.3 (beta:snapshot 4).
Closing CURRENTRELEASE.
