Bug 592449

Summary: [Intel 6.0 Bug] Ethtool diagnostics on 82576/82580 devices causes kernel panic
Product: Red Hat Enterprise Linux 6
Reporter: Jeff Pieper <jeffrey.e.pieper>
Component: kernel
Assignee: Stefan Assmann <sassmann>
Status: CLOSED CURRENTRELEASE
QA Contact: Petr Beňas <pbenas>
Severity: high
Priority: low
Version: 6.0
CC: agospoda, bobby.suber, bzeranski, cward, ddugger, dhoward, dnelson, emil.s.tantilov, fleitner, gasmith, haicheng.li, jane.lv, jeffrey.e.pieper, jesse.brandeburg, jjarvis, jlv, john.ronciak, jpirko, jvillalo, jwest, keve.a.gabbert, ltroan, luyu, martinez, martin.wilck, notting, pbenas, peterm, plyons, pm-eus, pm-rhel, pstehlik, qcai, rdoty, rlerch, rpacheco, sandy.garza, sputhenp, syeghiay, tao
Target Milestone: rc
Keywords: Reopened
Target Release: 6.0
Hardware: All
OS: Linux
Fixed In Version: kernel-2.6.32-25.el6
Doc Type: Bug Fix
Last Closed: 2010-11-10 20:54:28 UTC
Bug Blocks: 580574    

Description Jeff Pieper 2010-05-14 21:15:23 UTC
When ethtool diagnostics are run on 82576/82580 devices, I'm seeing a kernel panic. This behavior is only seen on x86_64, not on i386. Tested with both Snap2 and Snap3. To reproduce, run 'ethtool -t ethX'; the system will panic.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000158
IP: [<ffffffffa02eca57>] igb_alloc_rx_buffers_adv+0x207/0x400 [igb]
PGD 1e6bd0067 PUD 1e2386067 PMD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
CPU 2 
Modules linked in: igb(U) nfsd(U) lockd(U) nfs_acl(U) auth_rpcgss(U) exportfs(U) autofs4(U) sunrpc(U) cpufreq_ondemand(U) acpi_cpufreq(U) freq_table(U) ipv6(U) dm_mirror(U) dm_region_hash(U) dm_log(U) uinput(U) sr_mod(U) mdio(U) ioatdma(U) i2c_i801(U) sg(U) i2c_core(U) iTCO_wdt(U) dca(U) iTCO_vendor_support(U) cdrom(U) ext3(U) jbd(U) mbcache(U) sd_mod(U) crc_t10dif(U) ata_generic(U) pata_acpi(U) usb_storage(U) ata_piix(U) dm_mod(U) [last unloaded: igb]
Pid: 3297, comm: ethtool Not tainted 2.6.32-23.el6.x86_64 #1 S5520HC
RIP: 0010:[<ffffffffa02eca57>]  [<ffffffffa02eca57>] igb_alloc_rx_buffers_adv+0x207/0x400 [igb]
RSP: 0018:ffff8801e554f938  EFLAGS: 00010246
RAX: 0000000000000080 RBX: ffffc900127af9d8 RCX: 0000000000000001
RDX: ffffc900127af000 RSI: 0000000000000040 RDI: ffff880366615098
RBP: ffff8801e554f9a8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: ffff880366615098
R13: 000000000000003f R14: 000000000000003f R15: ffff8801e6222000
FS:  00007f5c9ca6f700(0000) GS:ffff880028220000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000158 CR3: 00000001e55c0000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ethtool (pid: 3297, threadinfo ffff8801e554e000, task ffff8801e7a0e100)
Stack:
 0000000000000000 0000080268a1a090 ffff880366614000 0000000000000800
<0> ffffc900127ab3d8 0000000000000080 ffff880366615098 0000080066615008
<0> ffff8801e554f988 ffff8801e7c32964 ffff880366615008 ffff880366615098
Call Trace:
 [<ffffffffa02f3a5b>] igb_diag_test+0xa5b/0x10c0 [igb]
 [<ffffffff8114e5de>] ? cache_alloc_refill+0x15e/0x240
 [<ffffffff81408405>] dev_ethtool+0xbc5/0x18c0
 [<ffffffff81126f0d>] ? zone_statistics+0x7d/0xa0
 [<ffffffff81126f0d>] ? zone_statistics+0x7d/0xa0
 [<ffffffff8110d751>] ? get_page_from_freelist+0x3d1/0x820
 [<ffffffff8110dc98>] ? __alloc_pages_nodemask+0xf8/0x6d0
 [<ffffffff8110791e>] ? find_get_page+0x1e/0xa0
 [<ffffffff8110944e>] ? filemap_fault+0xbe/0x530
 [<ffffffff81158f08>] ? __mem_cgroup_try_charge+0x58/0x1f0
 [<ffffffff81404c30>] ? __dev_get_by_name+0xa0/0xd0
 [<ffffffff81405bf8>] dev_ioctl+0x358/0x5d0
 [<ffffffff813f004d>] sock_ioctl+0x9d/0x280
 [<ffffffff811769d2>] vfs_ioctl+0x22/0xa0
 [<ffffffff8114f0b2>] ? kmem_cache_alloc+0x182/0x190
 [<ffffffff81176b74>] do_vfs_ioctl+0x84/0x580
 [<ffffffff8112fe5e>] ? handle_mm_fault+0x1ee/0x2b0
 [<ffffffff811770f1>] sys_ioctl+0x81/0xa0
 [<ffffffff81013132>] system_call_fastpath+0x16/0x1b
Code: 0f 84 06 02 00 00 41 83 ed 01 0f ae f8 49 8b 44 24 40 44 89 28 48 83 c4 48 5b 41 5c 41 5d 41 5e 41 5f c9 c3 48 8b 45 b8 45 31 d2 <48> 8b 80 d8 00 00 00 48 89 45 c0 49 8b 44 24 10 48 85 c0 0f 84 
RIP  [<ffffffffa02eca57>] igb_alloc_rx_buffers_adv+0x207/0x400 [igb]
 RSP <ffff8801e554f938>
CR2: 0000000000000158
---[ end trace 82328d54f17ad8db ]---
Kernel panic - not syncing: Fatal exception
Pid: 3297, comm: ethtool Tainted: G      D    2.6.32-23.el6.x86_64 #1
Call Trace:
 [<ffffffff814c6944>] panic+0x78/0x137
 [<ffffffff814ca8fc>] oops_end+0xdc/0xf0
 [<ffffffff8104226b>] no_context+0xfb/0x260
 [<ffffffff810424f5>] __bad_area_nosemaphore+0x125/0x1e0
 [<ffffffff8104261e>] bad_area+0x4e/0x60
 [<ffffffff814cc476>] do_page_fault+0x3d6/0x3e0
 [<ffffffff814c9c55>] page_fault+0x25/0x30
 [<ffffffffa02eca57>] ? igb_alloc_rx_buffers_adv+0x207/0x400 [igb]
 [<ffffffff8125f20c>] ? is_swiotlb_buffer+0x3c/0x50
 [<ffffffffa02f3a5b>] igb_diag_test+0xa5b/0x10c0 [igb]
 [<ffffffff8114e5de>] ? cache_alloc_refill+0x15e/0x240
 [<ffffffff81408405>] dev_ethtool+0xbc5/0x18c0
 [<ffffffff81126f0d>] ? zone_statistics+0x7d/0xa0
 [<ffffffff81126f0d>] ? zone_statistics+0x7d/0xa0
 [<ffffffff8110d751>] ? get_page_from_freelist+0x3d1/0x820
 [<ffffffff8110dc98>] ? __alloc_pages_nodemask+0xf8/0x6d0
 [<ffffffff8110791e>] ? find_get_page+0x1e/0xa0
 [<ffffffff8110944e>] ? filemap_fault+0xbe/0x530
 [<ffffffff81158f08>] ? __mem_cgroup_try_charge+0x58/0x1f0
 [<ffffffff81404c30>] ? __dev_get_by_name+0xa0/0xd0
 [<ffffffff81405bf8>] dev_ioctl+0x358/0x5d0
 [<ffffffff813f004d>] sock_ioctl+0x9d/0x280
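The oops above already pins down the faulting access without symbols. In the Code: line, the bytes in angle brackets, <48> 8b 80 d8 00 00 00, are the faulting instruction: REX.W, opcode 8B (MOV r64, r/m64) and a ModRM byte selecting [RAX + disp32] with disp32 = 0xd8, i.e. mov 0xd8(%rax),%rax. Adding that displacement to RAX from the register dump (0x80) gives 0x80 + 0xd8 = 0x158, which is exactly CR2. The sketch below checks this; it is an illustrative hand decoder for this single encoding only, not a general x86 disassembler, and the names CODE_LINE, REGS, faulting_bytes and decode_fault_address are hypothetical.

```python
# Values copied verbatim from the oops in the description above.
CODE_LINE = ("0f 84 06 02 00 00 41 83 ed 01 0f ae f8 49 8b 44 24 40 44 89 28 "
             "48 83 c4 48 5b 41 5c 41 5d 41 5e 41 5f c9 c3 48 8b 45 b8 45 31 d2 "
             "<48> 8b 80 d8 00 00 00 48 89 45 c0 49 8b 44 24 10 48 85 c0 0f 84")
REGS = {"RAX": 0x0000000000000080}  # RAX from the register dump at fault time
CR2 = 0x0000000000000158            # faulting address reported by the CPU

# ModRM r/m numbers 0-7 (REX.B extension is not handled by this sketch).
REG_NAMES = ["RAX", "RCX", "RDX", "RBX", "RSP", "RBP", "RSI", "RDI"]


def faulting_bytes(code_line: str) -> bytes:
    """Return the bytes starting at the <xx> marker that flags the faulting RIP."""
    tokens = code_line.split()
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return bytes(int(t.strip("<>"), 16) for t in tokens[start:])


def decode_fault_address(insn: bytes, regs: dict) -> tuple:
    """Hypothetical helper: decode only 'REX.W 8B /r' with mod=10 ([base + disp32]).

    Returns (base register name, displacement, effective address).
    """
    rex, opcode, modrm = insn[0], insn[1], insn[2]
    if rex != 0x48 or opcode != 0x8B:
        raise ValueError("only REX.W MOV r64, r/m64 is handled")
    mod, rm = modrm >> 6, modrm & 0b111
    if mod != 0b10 or rm == 0b100:  # rm=100 would require a SIB byte
        raise ValueError("only [base + disp32] addressing is handled")
    disp = int.from_bytes(insn[3:7], "little", signed=True)
    base = REG_NAMES[rm]
    return base, disp, regs[base] + disp


if __name__ == "__main__":
    base, disp, addr = decode_fault_address(faulting_bytes(CODE_LINE), REGS)
    print(f"load from {disp:#x}(%{base.lower()}) -> effective address {addr:#x}")
    print("matches CR2" if addr == CR2 else "does not match CR2")
```

So the "NULL pointer dereference" is a load through a small bogus pointer value (0x80) rather than through 0 itself; x86 kernels of this era print that message for any faulting address below PAGE_SIZE.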

Comment 2 RHEL Program Management 2010-05-17 15:45:20 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 4 Bill Nottingham 2010-05-17 22:48:41 UTC
*** Bug 592300 has been marked as a duplicate of this bug. ***

Comment 5 Stefan Assmann 2010-05-18 12:04:48 UTC
This should already be fixed in Snap4 and later. Please retest with a later snapshot and reopen if the problem persists.

Comment 9 Jeff Pieper 2010-05-24 17:08:32 UTC
Retested with Snap4 and can no longer reproduce.

Comment 10 Petr Beňas 2010-07-08 12:02:45 UTC
VERIFIED.
Reproduced on 2.6.32-23 (Snap2):
[root@intel-s3ea2-02 ~]# uname -r
2.6.32-23.el6.x86_64
[root@intel-s3ea2-02 ~]# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:30:48:C6:34:BA  
          inet addr:10.16.65.160  Bcast:10.16.71.255  Mask:255.255.248.0
          inet6 addr: fec0:0:a10:4000:230:48ff:fec6:34ba/64 Scope:Site
          inet6 addr: fe80::230:48ff:fec6:34ba/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3149 errors:0 dropped:0 overruns:3 frame:0
          TX packets:308 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:270891 (264.5 KiB)  TX bytes:218984 (213.8 KiB)
          Memory:fbba0000-fbbc0000 

[root@intel-s3ea2-02 ~]# lspci | grep Ethernet
01:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
01:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
[root@intel-s3ea2-02 ~]# ethtool -t eth0
BUG: unable to handle kernel NULL pointer dereference at 0000000000000158
IP: [<ffffffffa025fa57>] igb_alloc_rx_buffers_adv+0x207/0x400 [igb]
PGD 270bf2067 PUD 273cf0067 PMD 0 
Oops: 0000 [#1] SMP 

On 2.6.32-24 the test produced the errors below and the system hung rather than panicking. However, this test was performed on a different machine:
RHEL6.0-Snapshot-3_nfs-Server-x86_64
[root@hp-sl2x160zg6-01 ~]# uname -r
2.6.32-24.el6.x86_64
[root@hp-sl2x160zg6-01 ~]# lspci | grep Ethernet
05:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
05:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
[root@hp-sl2x160zg6-01 ~]# ethtool -t eth0
Error sending SOL data: FAIL
SOL session closed by BMC
Error: Unable to establish IPMI v2 / RMCP+ session
Error: No response de-activating SOL payload

Verified on 2.6.32-25 (Snap4):
[root@intel-s3ea2-02 ~]# uname -r
2.6.32-25.el6.x86_64
[root@intel-s3ea2-02 ~]# ethtool -t eth0
The test result is PASS
The test extra info:
Register test  (offline)         0
Eeprom test    (offline)         0
Interrupt test (offline)         0
Loopback test  (offline)         0
Link test   (on/offline)         0
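When the same verification is repeated across several machines, the pass/fail decision can be scripted from the ethtool -t output shown above. The sketch below assumes the output format printed in this run: a "The test result is PASS/FAIL" line followed by one line per test ending in an integer result code, where 0 means that test passed. Other ethtool versions or drivers may format the output differently, and parse_selftest and selftest_passed are hypothetical helpers, not part of ethtool.

```python
import re

# Output copied from the verification run above.
SAMPLE = """\
The test result is PASS
The test extra info:
Register test  (offline)         0
Eeprom test    (offline)         0
Interrupt test (offline)         0
Loopback test  (offline)         0
Link test   (on/offline)         0
"""

RESULT_RE = re.compile(r"^The test result is (PASS|FAIL)$")
# A test line: a name, a parenthesised mode, then an integer result code.
ROW_RE = re.compile(r"^(?P<name>.+?)\s*\((?P<mode>[^)]*)\)\s+(?P<code>-?\d+)$")


def parse_selftest(text: str) -> tuple:
    """Hypothetical helper: return (overall verdict, {test name: result code})."""
    overall = None
    results = {}
    for line in text.splitlines():
        line = line.strip()
        m = RESULT_RE.match(line)
        if m:
            overall = m.group(1)
            continue
        m = ROW_RE.match(line)
        if m:
            results[m.group("name")] = int(m.group("code"))
    if overall is None:
        raise ValueError("no 'The test result is' line found")
    return overall, results


def selftest_passed(text: str) -> bool:
    """Treat the run as good only if ethtool reports PASS and every code is 0."""
    overall, results = parse_selftest(text)
    return overall == "PASS" and all(code == 0 for code in results.values())


if __name__ == "__main__":
    overall, results = parse_selftest(SAMPLE)
    print(overall, results)
    print("passed" if selftest_passed(SAMPLE) else "failed")
```

Requiring both the overall verdict and every per-test code to agree is a deliberately conservative choice; on an affected kernel the command never returns, so a wrapper would also need a timeout or an external console to notice the panic.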

Comment 11 releng-rhel@redhat.com 2010-11-10 20:54:28 UTC
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.