Bug 592449 - [Intel 6.0 Bug] Ethtool diagnostics on 82576/82580 devices causes kernel panic
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.0
Hardware: All Linux
Priority: low
Severity: high
Target Milestone: rc
Target Release: 6.0
Assigned To: Stefan Assmann
QA Contact: Petr Beňas
Keywords: Reopened
Duplicates: 592300
Depends On:
Blocks: 580574
 
Reported: 2010-05-14 17:15 EDT by Jeff Pieper
Modified: 2015-01-04 17:59 EST

See Also:
Fixed In Version: kernel-2.6.32-25.el6
Doc Type: Bug Fix
Last Closed: 2010-11-10 15:54:28 EST


Description Jeff Pieper 2010-05-14 17:15:23 EDT
When ethtool diagnostics are run on 82576/82580 devices, I see a kernel panic. This behavior is seen only on x86_64, not on i386. Tested with both Snap2 and Snap3. To reproduce, run 'ethtool -t ethX'; the system will panic.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000158
IP: [<ffffffffa02eca57>] igb_alloc_rx_buffers_adv+0x207/0x400 [igb]
PGD 1e6bd0067 PUD 1e2386067 PMD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
CPU 2 
Modules linked in: igb(U) nfsd(U) lockd(U) nfs_acl(U) auth_rpcgss(U) exportfs(U) autofs4(U) sunrpc(U) cpufreq_ondemand(U) acpi_cpufreq(U) freq_table(U) ipv6(U) dm_mirror(U) dm_region_hash(U) dm_log(U) uinput(U) sr_mod(U) mdio(U) ioatdma(U) i2c_i801(U) sg(U) i2c_core(U) iTCO_wdt(U) dca(U) iTCO_vendor_support(U) cdrom(U) ext3(U) jbd(U) mbcache(U) sd_mod(U) crc_t10dif(U) ata_generic(U) pata_acpi(U) usb_storage(U) ata_piix(U) dm_mod(U) [last unloaded: igb]
Pid: 3297, comm: ethtool Not tainted 2.6.32-23.el6.x86_64 #1 S5520HC
RIP: 0010:[<ffffffffa02eca57>]  [<ffffffffa02eca57>] igb_alloc_rx_buffers_adv+0x207/0x400 [igb]
RSP: 0018:ffff8801e554f938  EFLAGS: 00010246
RAX: 0000000000000080 RBX: ffffc900127af9d8 RCX: 0000000000000001
RDX: ffffc900127af000 RSI: 0000000000000040 RDI: ffff880366615098
RBP: ffff8801e554f9a8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: ffff880366615098
R13: 000000000000003f R14: 000000000000003f R15: ffff8801e6222000
FS:  00007f5c9ca6f700(0000) GS:ffff880028220000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000158 CR3: 00000001e55c0000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ethtool (pid: 3297, threadinfo ffff8801e554e000, task ffff8801e7a0e100)
Stack:
 0000000000000000 0000080268a1a090 ffff880366614000 0000000000000800
<0> ffffc900127ab3d8 0000000000000080 ffff880366615098 0000080066615008
<0> ffff8801e554f988 ffff8801e7c32964 ffff880366615008 ffff880366615098
Call Trace:
 [<ffffffffa02f3a5b>] igb_diag_test+0xa5b/0x10c0 [igb]
 [<ffffffff8114e5de>] ? cache_alloc_refill+0x15e/0x240
 [<ffffffff81408405>] dev_ethtool+0xbc5/0x18c0
 [<ffffffff81126f0d>] ? zone_statistics+0x7d/0xa0
 [<ffffffff81126f0d>] ? zone_statistics+0x7d/0xa0
 [<ffffffff8110d751>] ? get_page_from_freelist+0x3d1/0x820
 [<ffffffff8110dc98>] ? __alloc_pages_nodemask+0xf8/0x6d0
 [<ffffffff8110791e>] ? find_get_page+0x1e/0xa0
 [<ffffffff8110944e>] ? filemap_fault+0xbe/0x530
 [<ffffffff81158f08>] ? __mem_cgroup_try_charge+0x58/0x1f0
 [<ffffffff81404c30>] ? __dev_get_by_name+0xa0/0xd0
 [<ffffffff81405bf8>] dev_ioctl+0x358/0x5d0
 [<ffffffff813f004d>] sock_ioctl+0x9d/0x280
 [<ffffffff811769d2>] vfs_ioctl+0x22/0xa0
 [<ffffffff8114f0b2>] ? kmem_cache_alloc+0x182/0x190
 [<ffffffff81176b74>] do_vfs_ioctl+0x84/0x580
 [<ffffffff8112fe5e>] ? handle_mm_fault+0x1ee/0x2b0
 [<ffffffff811770f1>] sys_ioctl+0x81/0xa0
 [<ffffffff81013132>] system_call_fastpath+0x16/0x1b
Code: 0f 84 06 02 00 00 41 83 ed 01 0f ae f8 49 8b 44 24 40 44 89 28 48 83 c4 48 5b 41 5c 41 5d 41 5e 41 5f c9 c3 48 8b 45 b8 45 31 d2 <48> 8b 80 d8 00 00 00 48 89 45 c0 49 8b 44 24 10 48 85 c0 0f 84 
RIP  [<ffffffffa02eca57>] igb_alloc_rx_buffers_adv+0x207/0x400 [igb]
 RSP <ffff8801e554f938>
CR2: 0000000000000158
---[ end trace 82328d54f17ad8db ]---
Kernel panic - not syncing: Fatal exception
Pid: 3297, comm: ethtool Tainted: G      D    2.6.32-23.el6.x86_64 #1
Call Trace:
 [<ffffffff814c6944>] panic+0x78/0x137
 [<ffffffff814ca8fc>] oops_end+0xdc/0xf0
 [<ffffffff8104226b>] no_context+0xfb/0x260
 [<ffffffff810424f5>] __bad_area_nosemaphore+0x125/0x1e0
 [<ffffffff8104261e>] bad_area+0x4e/0x60
 [<ffffffff814cc476>] do_page_fault+0x3d6/0x3e0
 [<ffffffff814c9c55>] page_fault+0x25/0x30
 [<ffffffffa02eca57>] ? igb_alloc_rx_buffers_adv+0x207/0x400 [igb]
 [<ffffffff8125f20c>] ? is_swiotlb_buffer+0x3c/0x50
 [<ffffffffa02f3a5b>] igb_diag_test+0xa5b/0x10c0 [igb]
 [<ffffffff8114e5de>] ? cache_alloc_refill+0x15e/0x240
 [<ffffffff81408405>] dev_ethtool+0xbc5/0x18c0
 [<ffffffff81126f0d>] ? zone_statistics+0x7d/0xa0
 [<ffffffff81126f0d>] ? zone_statistics+0x7d/0xa0
 [<ffffffff8110d751>] ? get_page_from_freelist+0x3d1/0x820
 [<ffffffff8110dc98>] ? __alloc_pages_nodemask+0xf8/0x6d0
 [<ffffffff8110791e>] ? find_get_page+0x1e/0xa0
 [<ffffffff8110944e>] ? filemap_fault+0xbe/0x530
 [<ffffffff81158f08>] ? __mem_cgroup_try_charge+0x58/0x1f0
 [<ffffffff81404c30>] ? __dev_get_by_name+0xa0/0xd0
 [<ffffffff81405bf8>] dev_ioctl+0x358/0x5d0
 [<ffffffff813f004d>] sock_ioctl+0x9d/0x280
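One way to read the register dump above is to check the reported fault address (CR2) against the instruction marked with <..> in the Code: line. The sketch below is an illustration, not part of the original report: the helper names are hypothetical, the sample text is copied from the oops with the Code: line trimmed, and the decoder handles only the one instruction form that appears here (REX.W 8b with a 32-bit displacement and RAX as base). Under those assumptions, the marked bytes decode to a load from RAX + 0xd8, and RAX holds 0x80, which matches CR2 = 0x158. That would suggest the driver read a field at offset 0xd8 through a pointer that held a small bogus value rather than a valid address; this is an inference from the dump, not a confirmed root cause.

```python
import re

# Lines copied from the oops in this report; the Code: line is trimmed to the
# bytes around the marked (faulting) instruction.
OOPS = """\
RAX: 0000000000000080 RBX: ffffc900127af9d8 RCX: 0000000000000001
CR2: 0000000000000158 CR3: 00000001e55c0000 CR4: 00000000000006e0
Code: 48 8b 45 b8 45 31 d2 <48> 8b 80 d8 00 00 00 48 89 45 c0
"""


def parse_registers(text):
    """Return {name: value} for every 'NAME: 16-hex-digit' pair in the dump."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def faulting_displacement(text):
    """Decode the disp32 of the instruction marked with <..> in the Code: line.

    Assumption: the instruction has the form 'REX.W 8b modrm disp32' with
    RAX as the base register (mov rax-based load). Anything else is rejected.
    """
    code_line = next(l for l in text.splitlines() if l.startswith("Code:"))
    tokens = code_line.split()[1:]  # drop the "Code:" label
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    insn = bytes(int(t.strip("<>"), 16) for t in tokens[start:start + 7])
    rex, opcode, modrm = insn[0], insn[1], insn[2]
    if (rex, opcode) != (0x48, 0x8B) or (modrm >> 6) != 0b10 or (modrm & 0b111) != 0:
        raise ValueError("unexpected instruction form")
    return int.from_bytes(insn[3:7], "little", signed=True)


if __name__ == "__main__":
    regs = parse_registers(OOPS)
    disp = faulting_displacement(OOPS)
    print(f"RAX + disp = {regs['RAX'] + disp:#x}, CR2 = {regs['CR2']:#x}")
```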
Comment 2 RHEL Product and Program Management 2010-05-17 11:45:20 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.
Comment 4 Bill Nottingham 2010-05-17 18:48:41 EDT
*** Bug 592300 has been marked as a duplicate of this bug. ***
Comment 5 Stefan Assmann 2010-05-18 08:04:48 EDT
This should already be fixed in Snap4 and later. Please retest with a later snapshot and reopen if the problem persists.
Comment 9 Jeff Pieper 2010-05-24 13:08:32 EDT
Retested with Snap4 and can no longer reproduce.
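For anyone checking whether a given RHEL 6 kernel predates the fix, the Fixed In Version field of this report (kernel-2.6.32-25.el6) can be compared against the output of `uname -r`. The following is a minimal sketch, assuming that only the upstream version and the leading build number matter; the function names are hypothetical, and the comparison is only meaningful for RHEL 6 kernel release strings.

```python
import re

# From the "Fixed In Version" field of this report, without the "kernel-" prefix.
FIXED_IN = "2.6.32-25.el6"


def kernel_key(release):
    """Turn '2.6.32-23.el6.x86_64' into a comparable tuple ((2, 6, 32), 23)."""
    m = re.match(r"(\d+(?:\.\d+)*)-(\d+)", release)
    if not m:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    version = tuple(int(part) for part in m.group(1).split("."))
    return version, int(m.group(2))


def has_fix(release, fixed_in=FIXED_IN):
    """True if the release is at or after the build named in fixed_in."""
    return kernel_key(release) >= kernel_key(fixed_in)


if __name__ == "__main__":
    # The three kernels that appear in this report.
    for rel in ("2.6.32-23.el6.x86_64", "2.6.32-24.el6.x86_64", "2.6.32-25.el6.x86_64"):
        print(rel, "has fix" if has_fix(rel) else "predates fix")
```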
Comment 10 Petr Beňas 2010-07-08 08:02:45 EDT
VERIFIED.
Reproduced on 2.6.32-23 (Snap2):
[root@intel-s3ea2-02 ~]# uname -r
2.6.32-23.el6.x86_64
[root@intel-s3ea2-02 ~]# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:30:48:C6:34:BA  
          inet addr:10.16.65.160  Bcast:10.16.71.255  Mask:255.255.248.0
          inet6 addr: fec0:0:a10:4000:230:48ff:fec6:34ba/64 Scope:Site
          inet6 addr: fe80::230:48ff:fec6:34ba/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3149 errors:0 dropped:0 overruns:3 frame:0
          TX packets:308 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:270891 (264.5 KiB)  TX bytes:218984 (213.8 KiB)
          Memory:fbba0000-fbbc0000 

[root@intel-s3ea2-02 ~]# lspci | grep Ethernet
01:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
01:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
[root@intel-s3ea2-02 ~]# ethtool -t eth0
BUG: unable to handle kernel NULL pointer dereference at 0000000000000158
IP: [<ffffffffa025fa57>] igb_alloc_rx_buffers_adv+0x207/0x400 [igb]
PGD 270bf2067 PUD 273cf0067 PMD 0 
Oops: 0000 [#1] SMP 

On 2.6.32-24 the test produced errors and the system hung, but it did not panic. Note that this test was
performed on a different machine.
RHEL6.0-Snapshot-3_nfs-Server-x86_64
[root@hp-sl2x160zg6-01 ~]# uname -r
2.6.32-24.el6.x86_64
[root@hp-sl2x160zg6-01 ~]# lspci | grep Ethernet
05:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
05:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
[root@hp-sl2x160zg6-01 ~]# ethtool -t eth0
Error sending SOL data: FAIL
SOL session closed by BMC
Error: Unable to establish IPMI v2 / RMCP+ session
Error: No response de-activating SOL payload

Verified on 2.6.32-25 (Snap4):
[root@intel-s3ea2-02 ~]# uname -r
2.6.32-25.el6.x86_64
[root@intel-s3ea2-02 ~]# ethtool -t eth0
The test result is PASS
The test extra info:
Register test  (offline)         0
Eeprom test    (offline)         0
Interrupt test (offline)         0
Loopback test  (offline)         0
Link test   (on/offline)         0
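When this verification is repeated across several machines, the `ethtool -t` output shown above can be checked mechanically. The sketch below parses that text into an overall verdict and per-test values. It is an illustration under stated assumptions: the function name is hypothetical, the sample is the output from this comment, and a nonzero value is treated as a failure of that test.

```python
import re

# Output copied from the verification run in this comment.
SAMPLE = """\
The test result is PASS
The test extra info:
Register test  (offline)         0
Eeprom test    (offline)         0
Interrupt test (offline)         0
Loopback test  (offline)         0
Link test   (on/offline)         0
"""


def parse_selftest(output):
    """Return (passed, results) where results maps test name to its value."""
    passed = None
    results = {}
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("The test result is"):
            passed = line.split()[-1] == "PASS"
            continue
        # e.g. "Register test  (offline)         0"
        m = re.match(r"(.+?)\s*\(([^)]*)\)\s+(-?\d+)$", line)
        if m:
            results[m.group(1).strip()] = int(m.group(3))
    return passed, results


if __name__ == "__main__":
    passed, results = parse_selftest(SAMPLE)
    failed = [name for name, value in results.items() if value != 0]
    print("PASS" if passed else "FAIL", "failed tests:", failed)
```

In practice the text would come from running `ethtool -t eth0` as root. Keep in mind that the offline tests interrupt traffic on the interface, and that on an affected kernel the command panics the system before any output is produced.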
Comment 11 releng-rhel@redhat.com 2010-11-10 15:54:28 EST
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.
