Bug 728317

Summary: kernel panic on resume
Product: [Fedora] Fedora
Reporter: Dhaval Giani <dhaval.giani>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Docs Contact:
Priority: unspecified
Version: 15
CC: gansalmon, itamar, jonathan, jwboyer, kernel-maint, madhu.chinakonda, sergio
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-04-11 15:22:46 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Attachments:
Description Flags
panic picture none

Description Dhaval Giani 2011-08-04 17:00:31 UTC
Description of problem:
Kernel panic on resume (not sure whether the resume is a red herring). I have not been able to reproduce it so far, and even if I do, I am not sure how to get more information out, since it is a hard hang.

Version-Release number of selected component (if applicable):
[dhaval@mordor ~]$ uname -r
2.6.40-4.fc15.x86_64
[dhaval@mordor ~]$ 

How reproducible:
I have not been able to reproduce it yet.

Steps to Reproduce:
What I did this specific time:
1. Suspend laptop
2. Remove the connected USB devices (mouse, phone, external disk)
3. Resume
(I had to leave immediately afterwards, so I could not check whether this reproduces; for now I am assuming it is not reproducible.)
  
Actual results:
Kernel Panic

Expected results:
System should resume

Additional info:
1. mcelog does show hardware issues, but those relate to thermal trip limits.
2. I have an image that shows the panic; I will attach it to the bug.
[dhaval@mordor ~]$ lscpu 
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
CPU socket(s):         1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 42
Stepping:              7
CPU MHz:               800.000
BogoMIPS:              5382.51
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              4096K
NUMA node0 CPU(s):     0-3
[dhaval@mordor ~]$ lspci 
00:00.0 Host bridge: Intel Corporation Sandy Bridge DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation Device 0126 (rev 09)
00:16.0 Communication controller: Intel Corporation Cougar Point HECI Controller #1 (rev 04)
00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection (rev 04)
00:1a.0 USB Controller: Intel Corporation Cougar Point USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation Cougar Point High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 1 (rev b4)
00:1c.1 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 2 (rev b4)
00:1c.3 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 4 (rev b4)
00:1c.4 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 5 (rev b4)
00:1c.6 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 7 (rev b4)
00:1d.0 USB Controller: Intel Corporation Cougar Point USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation Cougar Point LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation Cougar Point 6 port SATA AHCI Controller (rev 04)
00:1f.3 SMBus: Intel Corporation Cougar Point SMBus Controller (rev 04)
03:00.0 Network controller: Intel Corporation Centrino Wireless-N 1000
0d:00.0 System peripheral: Ricoh Co Ltd Device e823 (rev 04)
0e:00.0 USB Controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 04)
[dhaval@mordor ~]$ 

If you need additional data, please let me know. I will try to reproduce later on tonight, once I am back and have the same setup again.

Comment 1 Dhaval Giani 2011-08-04 17:02:39 UTC
Created attachment 516748 [details]
panic picture

Comment 2 Josh Boyer 2011-08-04 18:07:06 UTC
(In reply to comment #1)
> Created attachment 516748 [details]
> panic picture

Bummer.  That doesn't really show much in the way of a backtrace.  I seem to have the same laptop you do, so I'll try and recreate it tomorrow.

Comment 3 Dhaval Giani 2011-08-06 16:10:51 UTC
So, I managed to hit something, though I cannot confirm it is the same thing. The steps:

1. Suspend laptop
2. Disconnect USB devices
3. Resume laptop

That crashed it all again. However, I cannot confirm that it was the same crash, since it did not fall through to the console. It is not 100% reproducible yet, but I am sure I will figure it out.

Comment 4 Dhaval Giani 2011-11-02 16:45:49 UTC
So I finally have a trace that made it to the hard disk.

Same bug, but a different kernel version:
[dhaval@mordor ~]$ uname -r
2.6.40.6-0.fc15.x86_64
[dhaval@mordor ~]$ 

Backtrace:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000002
IP: [<ffffffff81116b0e>] kmem_cache_alloc_node+0x10c/0x137
PGD 1f3753067 PUD 1f34c6067 PMD 0 
Oops: 0000 [#2] SMP 
CPU 0 
Modules linked in: tcp_lp usb_storage uas fuse ppdev parport_pc lp parport 8021q garp stp llc cpufreq_ondemand acpi_cpufreq mperf bnep bluetooth xts gf128mul dm_crypt snd_hda_codec_hdmi snd_hda_codec_conexant arc4 iwlagn snd_hda_intel snd_hda_codec virtio_net snd_hwdep snd_seq snd_seq_device snd_pcm xhci_hcd thinkpad_acpi uvcvideo videodev media v4l2_compat_ioctl32 snd_timer e1000e i2c_i801 mac80211 cfg80211 joydev snd iTCO_wdt iTCO_vendor_support soundcore kvm_intel kvm rfkill snd_page_alloc microcode ipv6 sdhci_pci sdhci mmc_core wmi i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
Pid: 9855, comm: bash Tainted: G      D     2.6.40.6-0.fc15.x86_64 #1 LENOVO 4286CTO/4286CTO
RIP: 0010:[<ffffffff81116b0e>]  [<ffffffff81116b0e>] kmem_cache_alloc_node+0x10c/0x137
RSP: 0018:ffff8801d00e7dc0  EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000001200011 RCX: 00000000000013b6
RDX: 00000000000013b5 RSI: 00000000000158c0 RDI: ffffffff817b4cc3
RBP: ffff8801d00e7e10 R08: ffff88021e2158c0 R09: 0000000000000000
R10: 00007ffdb5ea99f0 R11: 0000000000000246 R12: ffff88021dc07600
R13: 0000000000000002 R14: 0000000000000000 R15: 00000000000000d0
FS:  00007ffdb5ea9720(0000) GS:ffff88021e200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000002 CR3: 0000000149fb1000 CR4: 00000000000406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process bash (pid: 9855, threadinfo ffff8801d00e6000, task ffff8801d0971730)
Stack:
 ffff8801f34c6010 ffff8801f360b0b0 ffff88020d4cad80 ffffffff81053066
 ffff8801d00e7df0 0000000001200011 ffffffffffffffea 0000000000000000
 ffff8801d0971730 00007ffdb5ea99f0 ffff8801d00e7ea0 ffffffff81053066
Call Trace:
 [<ffffffff81053066>] ? copy_process+0xd2/0x1171
 [<ffffffff81053066>] copy_process+0xd2/0x1171
 [<ffffffff81486eb5>] ? _cond_resched+0xe/0x22
 [<ffffffff811f39f6>] ? security_file_alloc+0x16/0x18
 [<ffffffff81054244>] do_fork+0x104/0x2c8
 [<ffffffff81063c9a>] ? recalc_sigpending+0x7e/0x82
 [<ffffffff81064291>] ? __set_task_blocked+0x66/0x6e
 [<ffffffff8112fba4>] ? path_put+0x20/0x24
 [<ffffffff810101e9>] sys_clone+0x28/0x2a
 [<ffffffff8148f023>] stub_clone+0x13/0x20
 [<ffffffff8148ed02>] ? system_call_fastpath+0x16/0x1b
Code: 24 49 83 c4 10 49 83 3c 24 00 eb 46 48 83 c4 28 4c 89 e8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 49 63 44 24 20 49 8b 34 24 48 8d 4a 01 
RIP  [<ffffffff81116b0e>] kmem_cache_alloc_node+0x10c/0x137
 RSP <ffff8801d00e7dc0>
CR2: 0000000000000002
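
For anyone reading the trace above: each frame is printed as symbol+0xOFFSET/0xSIZE, i.e. the offset of the instruction inside the function and the function's total size. The following is only a minimal sketch of pulling those fields out of a line of the oops text; the helper name parse_frame and the regular expression are illustrative assumptions, not part of any kernel tooling.

```python
import re

# Matches the "symbol+0xOFFSET/0xSIZE" form used in kernel oops output.
# (Illustrative pattern; not taken from any kernel tool.)
FRAME_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frame(line):
    """Return (symbol, offset, size) for an oops frame line, or None."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return m.group("symbol"), int(m.group("offset"), 16), int(m.group("size"), 16)


# The faulting instruction pointer line, copied from the trace above.
rip_line = "RIP  [<ffffffff81116b0e>] kmem_cache_alloc_node+0x10c/0x137"
symbol, offset, size = parse_frame(rip_line)

# Start address of the function = faulting address minus the offset.
function_start = 0xFFFFFFFF81116B0E - offset

print(symbol, hex(offset), hex(size), hex(function_start))
```

Here the fault is 0x10c bytes into kmem_cache_alloc_node, which is 0x137 bytes long, so the faulting instruction is near the end of the function.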

Comment 5 Dave Jones 2011-11-03 18:15:04 UTC
Something that should have been NULL had a single bit set, which does sound like it could plausibly be a hardware issue.
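
To illustrate the single-bit observation: the faulting address in the trace (CR2, also seen in R13) is 0x2. A value has exactly one bit set when it is non-zero and clearing its lowest set bit leaves zero. The snippet below is only a sketch of that check on the value from the trace; the helper name is_single_bit is an assumption made for illustration, and the check by itself says nothing about whether the cause is hardware or software.

```python
def is_single_bit(value):
    """True if exactly one bit is set (value is a power of two)."""
    # value & (value - 1) clears the lowest set bit; zero means it was the only one.
    return value != 0 and (value & (value - 1)) == 0


# Faulting address reported in CR2 in the trace from comment 4.
cr2 = 0x0000000000000002

single = is_single_bit(cr2)
# Index of the set bit, counting from bit 0 (only meaningful if single is True).
bit_index = cr2.bit_length() - 1

print(single, bit_index)
```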

Comment 6 Dave Jones 2012-04-11 15:22:46 UTC
This is probably the i915 memory corruption issue that was fixed in 2.6.43.1.