Description of problem:
Since I upgraded my laptop to F15, it occasionally crashes after suspend/resume (roughly 10 to 20% of the time). Everything worked perfectly in F14.

When the bug occurs, it does so between 0 and 20 seconds after the resume. Most of the time it is very fast and the screen is still black: the wifi LED blinks a little, I can toggle Caps Lock, and then everything goes dead (screen black, Caps Lock no longer toggles). Occasionally, seeing that X is taking too long to redraw, I hit Ctrl-Alt-F2 and then Ctrl-Alt-F1 and get X up and running (keyboard, mouse and applications working). But the computer then freezes 5 or 10 seconds later.

Usually there is nothing in the logs. Today the computer survived 20 seconds and I have an Oops in /var/log/messages:

Sep 2 16:52:27 romarin kernel: [15209.404903] BUG: unable to handle kernel paging request at 000006ca000006ca
Sep 2 16:52:27 romarin kernel: [15209.404928] IP: [<ffffffff811184ef>] __kmalloc_track_caller+0xb7/0x111
Sep 2 16:52:27 romarin kernel: [15209.404944] PGD 0
Sep 2 16:52:27 romarin kernel: [15209.404950] Oops: 0000 [#2] SMP
Sep 2 16:52:27 romarin kernel: [15209.404959] CPU 1
Sep 2 16:52:27 romarin kernel: [15209.404962] Modules linked in: ppdev parport_pc lp parport cpufreq_ondemand acpi_cpufreq freq_table mperf ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack arc4 dell_wmi sparse_keymap snd_hda_codec_hdmi snd_hda_codec_idt dell_laptop microcode dcdbas snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device iwlagn uvcvideo i2c_i801 snd_pcm videodev iTCO_wdt joydev iTCO_vendor_support media v4l2_compat_ioctl32 mac80211 e1000e cfg80211 snd_timer snd rfkill soundcore snd_page_alloc ipv6 firewire_ohci sdhci_pci sdhci mmc_core firewire_core crc_itu_t wmi i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
Sep 2 16:52:27 romarin kernel: [15209.405008]
Sep 2 16:52:27 romarin kernel: [15209.405008] Pid: 1457, comm: kmail Tainted: G D W 2.6.40.3-0.fc15.x86_64 #1 Dell Inc. Latitude E4200 /02GMRH
Sep 2 16:52:27 romarin kernel: [15209.405008] RIP: 0010:[<ffffffff811184ef>] [<ffffffff811184ef>] __kmalloc_track_caller+0xb7/0x111
Sep 2 16:52:27 romarin kernel: [15209.405008] RSP: 0018:ffff8800b41afe68 EFLAGS: 00010206
Sep 2 16:52:27 romarin kernel: [15209.405008] RAX: 0000000000000000 RBX: ffff880091bc03e0 RCX: 000000000024f0a7
Sep 2 16:52:27 romarin kernel: [15209.405008] RDX: 000000000024f0a6 RSI: 00000000000152c0 RDI: ffffffff817b46be
Sep 2 16:52:27 romarin kernel: [15209.405008] RBP: ffff8800b41afea8 R08: ffff8800bcf152c0 R09: 0000003303f2f9c0
Sep 2 16:52:27 romarin kernel: [15209.405008] R10: 0000000000000003 R11: 0000000000000202 R12: ffff8800bc402600
Sep 2 16:52:27 romarin kernel: [15209.405008] R13: 000006ca000006ca R14: 00000000000000d0 R15: 0000000000000018
Sep 2 16:52:27 romarin kernel: [15209.405008] FS: 000

(Yes, the last line is incomplete.)

Additional info:
Up-to-date F15 on a Dell E4200
CPU: Intel(R) Core(TM)2 Duo CPU U9600 @ 1.60GHz
Integrated graphics chipset: Intel(R) GM45
Linux romarin 2.6.40.3-0.fc15.x86_64 #1 SMP Tue Aug 16 04:10:59 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
(but the bug was present with previous kernels, too)
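One detail of the Oops above can be checked mechanically: the faulting address 000006ca000006ca, which is also the value in R13, appears to be the same 32-bit value repeated twice. The sketch below (Python, with a hypothetical helper name `split_halves`) only splits the address into halves. It is an observation about the numbers in the log, not a diagnosis of the cause.

```python
def split_halves(address_hex):
    """Split a 64-bit address given as a hex string into (upper, lower) 32-bit halves."""
    value = int(address_hex, 16)
    return value >> 32, value & 0xFFFFFFFF


# Faulting address copied from the Oops above (also the content of R13).
addr = "000006ca000006ca"
high, low = split_halves(addr)
print(f"{addr}: high=0x{high:x} low=0x{low:x} identical={high == low}")
```

A pointer-sized slot holding two identical small integers is at least consistent with memory having been overwritten by unrelated 32-bit data, but the log alone does not show where that would have happened.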
Damn. The backtrace should follow that information, and it is the most important part. From this info alone it is not at all clear what happened. Keep trying; with luck you will get more of the dump in the logs.
Running the kernel-debug build might be worth trying too.
Created attachment 521278
Part of /var/log/messages from resume to crash

Actually, I wasn't very attentive: there is much more in /var/log/messages. I am quite surprised because, as I understand it, the log says the computer woke up at 16:49:35 and the crash I reported occurred at 16:52:27, but I don't believe it lasted three minutes. Oh, well. There are 5 WARNINGs and 1 BUG preceding the crash. The attached file contains everything from the resume to the BUG of my original post.
... and there are more in /var/log/messages-2011xxxx, all following the same pattern. Of course, full logs are available if you think they are interesting, but it looks like the same thing many times over. Here is the output of
grep -h WARNING:\\\|BUG: /var/log/messages-2011*

Aug 15 21:32:18 romarin kernel: [ 9589.303317] WARNING: at lib/list_debug.c:47 __list_del_entry+0x8d/0x98()
Aug 15 21:32:18 romarin kernel: [ 9589.303865] WARNING: at lib/list_debug.c:47 __list_del_entry+0x8d/0x98()
Aug 15 21:32:18 romarin kernel: [ 9589.318085] WARNING: at lib/list_debug.c:47 __list_del_entry+0x8d/0x98()
Aug 15 21:32:18 romarin kernel: [ 9589.318657] WARNING: at lib/list_debug.c:47 __list_del_entry+0x8d/0x98()
Aug 15 21:32:18 romarin kernel: [ 9589.334097] WARNING: at fs/sysfs/group.c:138 sysfs_remove_group+0x52/0x9b()
Aug 15 21:32:18 romarin kernel: [ 9589.334688] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
Aug 15 21:32:22 romarin kernel: [ 9593.283496] BUG: unable to handle kernel paging request at 0000031200000312
Aug 15 21:32:23 romarin kernel: [ 9594.615382] BUG: unable to handle kernel paging request at 0000031200000312
Aug 15 21:32:23 romarin kernel: [ 9594.647342] BUG: unable to handle kernel paging request at 0000031200000312
Aug 15 21:32:23 romarin kernel: [ 9594.701696] BUG: unable to handle kernel paging request at 0000031200000312
Aug 15 21:32:23 romarin kernel: [ 9594.727401] BUG: unable to handle kernel paging request at 0000031200000312
Aug 15 21:32:23 romarin kernel: [ 9594.735753] BUG: unable to handle kernel paging request at 0000031200000312
Aug 15 21:32:23 romarin kernel: [ 9594.766690] BUG: unable to handle kernel paging request at 0000031200000312
Aug 15 21:32:23 romarin kernel: [ 9594.794812] BUG: unable to handle kernel paging request at 0000031200000312
Aug 15 21:32:23 romarin kernel: [ 9594.819675] BUG: unable to handle kernel paging request at 0000031200000312
Aug 15 21:32:23 romarin kernel: [ 9594.934851] BUG: unable to handle kernel paging request at 0000031200000312
Aug 28 20:34:54 romarin kernel: [33556.855584] WARNING: at lib/list_debug.c:47 __list_del_entry+0x8d/0x98()
Aug 28 20:34:54 romarin kernel: [33556.856411] WARNING: at lib/list_debug.c:47 __list_del_entry+0x8d/0x98()
Aug 28 20:34:54 romarin kernel: [33556.858051] WARNING: at lib/list_debug.c:47 __list_del_entry+0x8d/0x98()
Aug 28 20:34:54 romarin kernel: [33556.858577] WARNING: at lib/list_debug.c:47 __list_del_entry+0x8d/0x98()
Aug 28 20:34:54 romarin kernel: [33556.859174] WARNING: at fs/sysfs/group.c:138 sysfs_remove_group+0x52/0x9b()
Aug 28 20:34:54 romarin kernel: [33556.859715] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
Aug 28 20:35:10 romarin kernel: [33573.476601] BUG: unable to handle kernel paging request at 000006d5000006d5
Aug 28 20:35:10 romarin kernel: [33573.477517] BUG: unable to handle kernel paging request at 000006d5000006d5
Aug 28 20:35:16 romarin kernel: [33578.647905] WARNING: at lib/list_debug.c:56 __list_del_entry+0x8d/0x98()
Aug 28 20:35:16 romarin kernel: [33578.648175] WARNING: at lib/list_debug.c:56 __list_del_entry+0x8d/0x98()
Aug 28 20:35:16 romarin kernel: [33578.648401] WARNING: at lib/list_debug.c:56 __list_del_entry+0x8d/0x98()
Aug 28 20:35:16 romarin kernel: [33578.648635] WARNING: at lib/list_debug.c:56 __list_del_entry+0x8d/0x98()
Aug 28 20:35:16 romarin kernel: [33578.648857] WARNING: at lib/list_debug.c:56 __list_del_entry+0x8d/0x98()
Aug 28 20:35:16 romarin kernel: [33578.649609] BUG: unable to handle kernel NULL pointer dereference at (null)
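Since the grep output in this comment repeats the same few messages many times, a small script can collapse it into counts per distinct message. The following is a minimal sketch in Python. The regular expression and the `summarize` helper are illustrative assumptions rather than a complete syslog parser, and the sample lines are copied from the output in this comment.

```python
import re
from collections import Counter

# Matches kernel WARNING:/BUG: lines as selected by the grep in this comment.
# Assumed layout: "... kernel: [ seconds.micros] KIND: message".
EVENT_RE = re.compile(r"kernel: \[\s*\d+\.\d+\] (WARNING|BUG): (.*)$")


def summarize(lines):
    """Count how often each distinct (kind, message) pair occurs."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            kind, message = match.groups()
            counts[(kind, message.strip())] += 1
    return counts


sample = [
    "Aug 28 20:34:54 romarin kernel: [33556.855584] WARNING: at lib/list_debug.c:47 __list_del_entry+0x8d/0x98()",
    "Aug 28 20:34:54 romarin kernel: [33556.856411] WARNING: at lib/list_debug.c:47 __list_del_entry+0x8d/0x98()",
    "Aug 28 20:34:54 romarin kernel: [33556.859174] WARNING: at fs/sysfs/group.c:138 sysfs_remove_group+0x52/0x9b()",
    "Aug 28 20:35:10 romarin kernel: [33573.476601] BUG: unable to handle kernel paging request at 000006d5000006d5",
]

for (kind, message), count in summarize(sample).most_common():
    print(f"{count:3d}  {kind}: {message}")
```

To run it on real logs, the lines of the rotated /var/log/messages files could be passed to `summarize` in place of `sample`.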
My bug is on an x86-64 kernel, but it looks extremely similar to bug 726983, which was reported on an i686 kernel. Should my bug be marked as a duplicate of the other one even though they affect different architectures?
As suggested by Dave Jones, I ran kernel-debug for a while. It was interesting: the system crashed at every resume rather than once in a while. It also crashed very quickly, and nothing of interest appeared in /var/log/messages.

I got tired of rebooting after each suspend/resume, so I am now running the latest Fedora 14 kernel, 2.6.35.6-39.fc14.x86_64, on my Fedora 15. It works perfectly and I have had no crash in the last couple of days.

Please suggest what I should do now.
An additional point which may or may not be relevant: I noticed that under F15, the battery indicator on my desktop (KDE) shows a discharged battery for one or two seconds after resume before displaying the real charge level. This does not seem to happen with the F14 kernel, where the correct charge level is displayed immediately on resume. I thought this might be relevant because the call chain in the WARNING includes sysfs_remove_battery.

One last, very small point: /proc/acpi/battery has two subdirectories, BAT0, which describes the physical battery, and BAT1, where the three files (alarm, info, state) contain only "present: no". I have only one physical battery. Could it be that the newer kernels are confused by the non-existent battery which made its way into sysfs?
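For anyone wanting to compare machines, the BAT0/BAT1 situation described above can be listed with a short script. This is a minimal sketch in Python. The helper names `battery_present` and `list_batteries` are hypothetical, and the assumption that a populated slot reports "present: yes" is mine (the comment above only shows the "present: no" case). On kernels without /proc/acpi/battery the script simply reports nothing.

```python
from pathlib import Path


def battery_present(state_text):
    """Return True if the text of a battery 'state' file says 'present: yes'.

    Assumption: a populated slot reports 'yes'; only 'no' appears in this report.
    """
    for line in state_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "present":
            return value.strip() == "yes"
    return False


def list_batteries(root="/proc/acpi/battery"):
    """Map each battery directory name (e.g. BAT0, BAT1) to its presence flag."""
    result = {}
    base = Path(root)
    if not base.is_dir():
        return result  # interface not available on this kernel
    for entry in sorted(base.iterdir()):
        state = entry / "state"
        if state.is_file():
            result[entry.name] = battery_present(state.read_text())
    return result


print(list_batteries())
```

On the laptop described in this report, the expectation would be one entry for BAT0 marked present and one for BAT1 marked absent.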
Are you still seeing this with 2.6.43/3.3?
(In reply to comment #8)
> Are you still seeing this with 2.6.43/3.3?

Now I haven't seen it in a while. I am running 3.3.7-1.fc16.x86_64 right now.

Has there been a patch which you think fixed this?
(In reply to comment #9)
> (In reply to comment #8)
> > Are you still seeing this with 2.6.43/3.3?
>
> Now I haven't seen it in awhile. I am running right now 3.3.7-1.fc16.x86_64.
>
> Has there been a patch which you think fixed this ?

There's been a huge number of patches between 3.0 and 3.3. If you haven't seen it in a while we'll close this out for now. Please reopen or file a new bug against F16 if you hit it again.