Bug 728317 - kernel panic on resume
Summary: kernel panic on resume
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 15
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-08-04 17:00 UTC by Dhaval Giani
Modified: 2012-04-11 15:22 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-04-11 15:22:46 UTC


Attachments:
panic picture (431.95 KB, image/jpeg)
2011-08-04 17:02 UTC, Dhaval Giani

Description Dhaval Giani 2011-08-04 17:00:31 UTC
Description of problem:
Kernel panic on resume (not sure whether the resume is a red herring). I have not been able to reproduce it so far, and even if I do, I am not sure how to get more information out, since it's a hard hang.

Version-Release number of selected component (if applicable):
[dhaval@mordor ~]$ uname -r
2.6.40-4.fc15.x86_64
[dhaval@mordor ~]$ 

How reproducible:
I have not been able to reproduce it yet.

Steps to Reproduce:
What I did this specific time:
1. Suspend laptop
2. Remove the connected USB devices (mouse, phone, external disk)
3. Resume
(I had to leave immediately afterwards, so I could not check whether this reproduces; for now I am assuming it is not reproducible.)
  
Actual results:
Kernel Panic

Expected results:
System should resume

Additional info:
1. mcelog does show hardware issues, but those relate to thermal trip limits.
2. I have an image that shows the panic; I will attach it to this bug.
[dhaval@mordor ~]$ lscpu 
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
CPU socket(s):         1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 42
Stepping:              7
CPU MHz:               800.000
BogoMIPS:              5382.51
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              4096K
NUMA node0 CPU(s):     0-3
[dhaval@mordor ~]$ lspci 
00:00.0 Host bridge: Intel Corporation Sandy Bridge DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation Device 0126 (rev 09)
00:16.0 Communication controller: Intel Corporation Cougar Point HECI Controller #1 (rev 04)
00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection (rev 04)
00:1a.0 USB Controller: Intel Corporation Cougar Point USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation Cougar Point High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 1 (rev b4)
00:1c.1 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 2 (rev b4)
00:1c.3 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 4 (rev b4)
00:1c.4 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 5 (rev b4)
00:1c.6 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 7 (rev b4)
00:1d.0 USB Controller: Intel Corporation Cougar Point USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation Cougar Point LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation Cougar Point 6 port SATA AHCI Controller (rev 04)
00:1f.3 SMBus: Intel Corporation Cougar Point SMBus Controller (rev 04)
03:00.0 Network controller: Intel Corporation Centrino Wireless-N 1000
0d:00.0 System peripheral: Ricoh Co Ltd Device e823 (rev 04)
0e:00.0 USB Controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 04)
[dhaval@mordor ~]$ 

If you need additional data, please let me know. I will try to reproduce later on tonight, once I am back and have the same setup again.

Comment 1 Dhaval Giani 2011-08-04 17:02:39 UTC
Created attachment 516748 [details]
panic picture

Comment 2 Josh Boyer 2011-08-04 18:07:06 UTC
(In reply to comment #1)
> Created attachment 516748 [details]
> panic picture

Bummer.  That doesn't really show much in the way of a backtrace.  I seem to have the same laptop you do, so I'll try and recreate it tomorrow.

Comment 3 Dhaval Giani 2011-08-06 16:10:51 UTC
So, I managed to hit something, but I cannot confirm that it is the same thing. The steps:

1. Suspend laptop
2. Disconnect USB devices
3. Resume laptop

That crashed everything again. However, I cannot confirm that it was the same crash, since it did not fall through to the console. It is not 100% reproducible yet, but I am sure I will figure it out.

Comment 4 Dhaval Giani 2011-11-02 16:45:49 UTC
So I finally have a trace that made it to the hard disk.

Same bug, but a different kernel version:
[dhaval@mordor ~]$ uname -r
2.6.40.6-0.fc15.x86_64
[dhaval@mordor ~]$ 

Backtrace:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000002
IP: [<ffffffff81116b0e>] kmem_cache_alloc_node+0x10c/0x137
PGD 1f3753067 PUD 1f34c6067 PMD 0 
Oops: 0000 [#2] SMP 
CPU 0 
Modules linked in: tcp_lp usb_storage uas fuse ppdev parport_pc lp parport 8021q garp stp llc cpufreq_ondemand acpi_cpufreq mperf bnep bluetooth xts gf128mul dm_crypt snd_hda_codec_hdmi snd_hda_codec_conexant arc4 iwlagn snd_hda_intel snd_hda_codec virtio_net snd_hwdep snd_seq snd_seq_device snd_pcm xhci_hcd thinkpad_acpi uvcvideo videodev media v4l2_compat_ioctl32 snd_timer e1000e i2c_i801 mac80211 cfg80211 joydev snd iTCO_wdt iTCO_vendor_support soundcore kvm_intel kvm rfkill snd_page_alloc microcode ipv6 sdhci_pci sdhci mmc_core wmi i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
Pid: 9855, comm: bash Tainted: G      D     2.6.40.6-0.fc15.x86_64 #1 LENOVO 4286CTO/4286CTO
RIP: 0010:[<ffffffff81116b0e>]  [<ffffffff81116b0e>] kmem_cache_alloc_node+0x10c/0x137
RSP: 0018:ffff8801d00e7dc0  EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000001200011 RCX: 00000000000013b6
RDX: 00000000000013b5 RSI: 00000000000158c0 RDI: ffffffff817b4cc3
RBP: ffff8801d00e7e10 R08: ffff88021e2158c0 R09: 0000000000000000
R10: 00007ffdb5ea99f0 R11: 0000000000000246 R12: ffff88021dc07600
R13: 0000000000000002 R14: 0000000000000000 R15: 00000000000000d0
FS:  00007ffdb5ea9720(0000) GS:ffff88021e200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000002 CR3: 0000000149fb1000 CR4: 00000000000406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process bash (pid: 9855, threadinfo ffff8801d00e6000, task ffff8801d0971730)
Stack:
 ffff8801f34c6010 ffff8801f360b0b0 ffff88020d4cad80 ffffffff81053066
 ffff8801d00e7df0 0000000001200011 ffffffffffffffea 0000000000000000
 ffff8801d0971730 00007ffdb5ea99f0 ffff8801d00e7ea0 ffffffff81053066
Call Trace:
 [<ffffffff81053066>] ? copy_process+0xd2/0x1171
 [<ffffffff81053066>] copy_process+0xd2/0x1171
 [<ffffffff81486eb5>] ? _cond_resched+0xe/0x22
 [<ffffffff811f39f6>] ? security_file_alloc+0x16/0x18
 [<ffffffff81054244>] do_fork+0x104/0x2c8
 [<ffffffff81063c9a>] ? recalc_sigpending+0x7e/0x82
 [<ffffffff81064291>] ? __set_task_blocked+0x66/0x6e
 [<ffffffff8112fba4>] ? path_put+0x20/0x24
 [<ffffffff810101e9>] sys_clone+0x28/0x2a
 [<ffffffff8148f023>] stub_clone+0x13/0x20
 [<ffffffff8148ed02>] ? system_call_fastpath+0x16/0x1b
Code: 24 49 83 c4 10 49 83 3c 24 00 eb 46 48 83 c4 28 4c 89 e8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 49 63 44 24 20 49 8b 34 24 48 8d 4a 01 
RIP  [<ffffffff81116b0e>] kmem_cache_alloc_node+0x10c/0x137
 RSP <ffff8801d00e7dc0>
CR2: 0000000000000002
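
For triage, the key fields of a trace like the one above (faulting function, offset within it, and fault address) can be pulled out mechanically. The following is only a minimal sketch: the helper name `summarize_oops` is hypothetical, and the regular expressions assume the `IP:` and `CR2:` line layout seen in this particular oops.

```python
import re

# Three lines copied from the trace above; a full oops text works the same way.
OOPS = """\
BUG: unable to handle kernel NULL pointer dereference at 0000000000000002
IP: [<ffffffff81116b0e>] kmem_cache_alloc_node+0x10c/0x137
CR2: 0000000000000002
"""

def summarize_oops(text):
    """Return (symbol, offset, size, fault_address) parsed from an oops text.

    Hypothetical helper: assumes 'IP: [<addr>] sym+0xOFF/0xSIZE' and a line
    starting with 'CR2: <hex>', as in the trace in this report.
    """
    ip = re.search(r"^IP: \[<[0-9a-f]+>\] (\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)",
                   text, re.M)
    cr2 = re.search(r"^CR2: ([0-9a-f]+)", text, re.M)
    if ip is None or cr2 is None:
        raise ValueError("no IP or CR2 line found")
    return (ip.group(1), int(ip.group(2), 16),
            int(ip.group(3), 16), int(cr2.group(1), 16))

symbol, offset, size, fault_addr = summarize_oops(OOPS)
print(f"{symbol}+{offset:#x} of {size:#x}, fault at {fault_addr:#x}")
```

Here the fault address (CR2) is 0x2, which matches the address named in the `BUG:` line.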

Comment 5 Dave Jones 2011-11-03 18:15:04 UTC
Something that should have been NULL had a single bit set, which does sound like it could plausibly be a hardware issue.
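
The single-bit observation can be checked against the fault address from the oops in comment 4 (CR2 = 0x2). This is just an illustrative sketch; `is_single_bit` is a hypothetical helper name, and a single set bit is only suggestive of a bit flip, not proof of a hardware fault.

```python
def is_single_bit(value):
    """True when exactly one bit is set, i.e. value is a power of two."""
    return value != 0 and (value & (value - 1)) == 0

fault_addr = 0x0000000000000002  # CR2 from the oops in comment 4

# A NULL pointer is 0; a value with exactly one bit set differs from it by one bit.
print(is_single_bit(fault_addr))    # True
print(fault_addr.bit_length() - 1)  # index of the set bit: 1
```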

Comment 6 Dave Jones 2012-04-11 15:22:46 UTC
This is probably the i915 memory corruption issue that was fixed in 2.6.43.1.
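
To see whether a given `uname -r` string predates that version, a rough comparison can be sketched as below. The helper `kernel_tuple` is hypothetical, and the sketch assumes that comparing the dotted numeric part (everything before the first `-`) is good enough; it ignores the Fedora release suffix and any backported patches.

```python
def kernel_tuple(release):
    """Turn '2.6.40.6-0.fc15.x86_64' into (2, 6, 40, 6).

    Hypothetical helper: everything after the first '-' is ignored.
    """
    upstream = release.split("-", 1)[0]
    return tuple(int(part) for part in upstream.split("."))

FIXED_IN = kernel_tuple("2.6.43.1")  # version named in this comment

# The two kernels reported in this bug.
for release in ("2.6.40-4.fc15.x86_64", "2.6.40.6-0.fc15.x86_64"):
    print(release, "predates fix:", kernel_tuple(release) < FIXED_IN)
```

Both kernels reported in this bug compare as older than 2.6.43.1 under this scheme.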

