Bug 735404

Summary: Crash from time to time after resume
Product: [Fedora] Fedora
Reporter: Éric Brunet <eric.brunet>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 15
CC: gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2012-06-06 19:54:28 UTC
Attachments:
  Part of /var/log/messages from resume to crash (flags: none)

Description Éric Brunet 2011-09-02 15:10:43 UTC
Description of problem:

Since I upgraded my laptop to F15, I occasionally get a crash after suspend/resume (I would say it happens 10 to 20% of the time). Everything was working perfectly in F14.

When the bug happens, it happens between 0 and 20 seconds after the resume. Most of the time it is very fast and the screen is still black. The wifi LED blinks a little, I can toggle Caps Lock, and then everything goes dead (screen black, Caps Lock no longer togglable). Occasionally, seeing that X takes too long to redraw, I hit Ctrl-Alt-F2 and then Ctrl-Alt-F1 and get X up and running (keyboard, mouse and applications working). But then the computer freezes some time after that (5 or 10 seconds). Usually, there is nothing in the logs. Today, the computer survived 20 seconds and I have an Oops in /var/log/messages:

Sep  2 16:52:27 romarin kernel: [15209.404903] BUG: unable to handle kernel paging request at 000006ca000006ca
Sep  2 16:52:27 romarin kernel: [15209.404928] IP: [<ffffffff811184ef>] __kmalloc_track_caller+0xb7/0x111
Sep  2 16:52:27 romarin kernel: [15209.404944] PGD 0 
Sep  2 16:52:27 romarin kernel: [15209.404950] Oops: 0000 [#2] SMP 
Sep  2 16:52:27 romarin kernel: [15209.404959] CPU 1 
Sep  2 16:52:27 romarin kernel: [15209.404962] Modules linked in: ppdev parport_pc lp parport cpufreq_ondemand acpi_cpufreq freq_table mperf ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack arc4 dell_wmi sparse_keymap snd_hda_codec_hdmi snd_hda_codec_idt dell_laptop microcode dcdbas snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device iwlagn uvcvideo i2c_i801 snd_pcm videodev iTCO_wdt joydev iTCO_vendor_support media v4l2_compat_ioctl32 mac80211 e1000e cfg80211 snd_timer snd rfkill soundcore snd_page_alloc ipv6 firewire_ohci sdhci_pci sdhci mmc_core firewire_core crc_itu_t wmi i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
Sep  2 16:52:27 romarin kernel: [15209.405008] 
Sep  2 16:52:27 romarin kernel: [15209.405008] Pid: 1457, comm: kmail Tainted: G      D W   2.6.40.3-0.fc15.x86_64 #1 Dell Inc. Latitude E4200                  /02GMRH
Sep  2 16:52:27 romarin kernel: [15209.405008] RIP: 0010:[<ffffffff811184ef>]  [<ffffffff811184ef>] __kmalloc_track_caller+0xb7/0x111
Sep  2 16:52:27 romarin kernel: [15209.405008] RSP: 0018:ffff8800b41afe68  EFLAGS: 00010206
Sep  2 16:52:27 romarin kernel: [15209.405008] RAX: 0000000000000000 RBX: ffff880091bc03e0 RCX: 000000000024f0a7
Sep  2 16:52:27 romarin kernel: [15209.405008] RDX: 000000000024f0a6 RSI: 00000000000152c0 RDI: ffffffff817b46be
Sep  2 16:52:27 romarin kernel: [15209.405008] RBP: ffff8800b41afea8 R08: ffff8800bcf152c0 R09: 0000003303f2f9c0
Sep  2 16:52:27 romarin kernel: [15209.405008] R10: 0000000000000003 R11: 0000000000000202 R12: ffff8800bc402600
Sep  2 16:52:27 romarin kernel: [15209.405008] R13: 000006ca000006ca R14: 00000000000000d0 R15: 0000000000000018
Sep  2 16:52:27 romarin kernel: [15209.405008] FS:  000

(Yes, the last line is incomplete.)

Additional info:

Up to date F15 on a Dell E4200
  CPU: Intel(R) Core(TM)2 Duo CPU     U9600  @ 1.60GHz
  Integrated Graphics Chipset: Intel(R) GM45
  Linux romarin 2.6.40.3-0.fc15.x86_64 #1 SMP Tue Aug 16 04:10:59 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
(but the bug was present with previous kernels, too)

Comment 1 Dave Jones 2011-09-02 16:31:36 UTC
Damn, that information should be followed by the backtrace, which is the most important part.

It's not clear from this info what happened at all.

Keep trying; perhaps you'll get lucky and more of the dump will make it into the logs?

Comment 2 Dave Jones 2011-09-02 16:32:02 UTC
Running the kernel-debug build might be worth trying too.

Comment 3 Éric Brunet 2011-09-02 20:05:42 UTC
Created attachment 521278 [details]
Part of /var/log/messages from resume to crash

Actually, I hadn't looked carefully enough: there is much more in /var/log/messages. I am quite surprised because, as I understand it, the log says the computer woke up at 16:49:35 and the crash I reported occurred at 16:52:27, but I don't believe it lasted three minutes. Oh, well. There are 5 WARNINGs and 1 BUG preceding the crash. I have put everything from the resume up to the BUG in my original post in the attached file.

Comment 4 Éric Brunet 2011-09-02 20:20:41 UTC
... and there are more in /var/log/messages-2011xxxx, all following the same pattern.
Here is the output of
grep -h WARNING:\\\|BUG: /var/log/messages-2011*
Of course, the full logs are available if you think they would be useful, but it looks like the same thing repeated many times.

Aug 15 21:32:18 romarin kernel: [ 9589.303317] WARNING: at lib/list_debug.c:47 __list_del_entry+0x8d/0x98()
Aug 15 21:32:18 romarin kernel: [ 9589.303865] WARNING: at lib/list_debug.c:47 __list_del_entry+0x8d/0x98()
Aug 15 21:32:18 romarin kernel: [ 9589.318085] WARNING: at lib/list_debug.c:47 __list_del_entry+0x8d/0x98()
Aug 15 21:32:18 romarin kernel: [ 9589.318657] WARNING: at lib/list_debug.c:47 __list_del_entry+0x8d/0x98()
Aug 15 21:32:18 romarin kernel: [ 9589.334097] WARNING: at fs/sysfs/group.c:138 sysfs_remove_group+0x52/0x9b()
Aug 15 21:32:18 romarin kernel: [ 9589.334688] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
Aug 15 21:32:22 romarin kernel: [ 9593.283496] BUG: unable to handle kernel paging request at 0000031200000312
Aug 15 21:32:23 romarin kernel: [ 9594.615382] BUG: unable to handle kernel paging request at 0000031200000312
Aug 15 21:32:23 romarin kernel: [ 9594.647342] BUG: unable to handle kernel paging request at 0000031200000312
Aug 15 21:32:23 romarin kernel: [ 9594.701696] BUG: unable to handle kernel paging request at 0000031200000312
Aug 15 21:32:23 romarin kernel: [ 9594.727401] BUG: unable to handle kernel paging request at 0000031200000312
Aug 15 21:32:23 romarin kernel: [ 9594.735753] BUG: unable to handle kernel paging request at 0000031200000312
Aug 15 21:32:23 romarin kernel: [ 9594.766690] BUG: unable to handle kernel paging request at 0000031200000312
Aug 15 21:32:23 romarin kernel: [ 9594.794812] BUG: unable to handle kernel paging request at 0000031200000312
Aug 15 21:32:23 romarin kernel: [ 9594.819675] BUG: unable to handle kernel paging request at 0000031200000312
Aug 15 21:32:23 romarin kernel: [ 9594.934851] BUG: unable to handle kernel paging request at 0000031200000312
Aug 28 20:34:54 romarin kernel: [33556.855584] WARNING: at lib/list_debug.c:47 __list_del_entry+0x8d/0x98()
Aug 28 20:34:54 romarin kernel: [33556.856411] WARNING: at lib/list_debug.c:47 __list_del_entry+0x8d/0x98()
Aug 28 20:34:54 romarin kernel: [33556.858051] WARNING: at lib/list_debug.c:47 __list_del_entry+0x8d/0x98()
Aug 28 20:34:54 romarin kernel: [33556.858577] WARNING: at lib/list_debug.c:47 __list_del_entry+0x8d/0x98()
Aug 28 20:34:54 romarin kernel: [33556.859174] WARNING: at fs/sysfs/group.c:138 sysfs_remove_group+0x52/0x9b()
Aug 28 20:34:54 romarin kernel: [33556.859715] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
Aug 28 20:35:10 romarin kernel: [33573.476601] BUG: unable to handle kernel paging request at 000006d5000006d5
Aug 28 20:35:10 romarin kernel: [33573.477517] BUG: unable to handle kernel paging request at 000006d5000006d5
Aug 28 20:35:16 romarin kernel: [33578.647905] WARNING: at lib/list_debug.c:56 __list_del_entry+0x8d/0x98()
Aug 28 20:35:16 romarin kernel: [33578.648175] WARNING: at lib/list_debug.c:56 __list_del_entry+0x8d/0x98()
Aug 28 20:35:16 romarin kernel: [33578.648401] WARNING: at lib/list_debug.c:56 __list_del_entry+0x8d/0x98()
Aug 28 20:35:16 romarin kernel: [33578.648635] WARNING: at lib/list_debug.c:56 __list_del_entry+0x8d/0x98()
Aug 28 20:35:16 romarin kernel: [33578.648857] WARNING: at lib/list_debug.c:56 __list_del_entry+0x8d/0x98()
Aug 28 20:35:16 romarin kernel: [33578.649609] BUG: unable to handle kernel NULL pointer dereference at           (null)

Comment 5 Éric Brunet 2011-09-04 12:16:02 UTC
My bug is on an x86-64 kernel, but it looks extremely similar to bug 726983, which was reported on an i686 kernel. Should my bug be marked as a duplicate of the other even though they affect different architectures?

Comment 6 Éric Brunet 2011-09-09 00:09:31 UTC
As suggested by Dave Jones, I tried running kernel-debug for a while. It was interesting: the system would crash at every resume rather than once in a while. It would also crash very quickly, and nothing of interest would appear in /var/log/messages.

I got tired of rebooting after each suspend/resume, so I am now running the latest Fedora 14 kernel, 2.6.35.6-39.fc14.x86_64, on my Fedora 15 system.

It works perfectly and I have had no crash in the last couple of days.

Please suggest what I should do now.

Comment 7 Éric Brunet 2011-09-09 07:46:48 UTC
An additional point, which may or may not be relevant.

I noticed that under F15, upon resume, the battery indicator on my desktop (KDE) would display a discharged battery for one or two seconds before the real charge level was displayed correctly. This does not seem to happen with the F14 kernel: the correct charge level is displayed immediately on resume.

I thought this might be relevant, as the call chain in the WARNING includes sysfs_remove_battery.

One last very small point: /proc/acpi/battery has two subdirectories: BAT0, which describes the physical battery, and BAT1, whose three files (alarm, info, state) only contain "present: no". I have only one physical battery. Could the newer kernels be confused by the nonexistent battery that made its way into sysfs?

Comment 8 Josh Boyer 2012-06-06 19:12:02 UTC
Are you still seeing this with 2.6.43/3.3?

Comment 9 Éric Brunet 2012-06-06 19:38:22 UTC
(In reply to comment #8)
> Are you still seeing this with 2.6.43/3.3?

No, I haven't seen it in a while. I am currently running 3.3.7-1.fc16.x86_64.

Is there a patch that you think fixed this?

Comment 10 Josh Boyer 2012-06-06 19:54:28 UTC
(In reply to comment #9)
> (In reply to comment #8)
> > Are you still seeing this with 2.6.43/3.3?
> 
> No, I haven't seen it in a while. I am currently running 3.3.7-1.fc16.x86_64.
> 
> Is there a patch that you think fixed this?

There have been a huge number of patches between 3.0 and 3.3. If you haven't seen it in a while, we'll close this out for now. Please reopen or file a new bug against F16 if you hit it again.