Description of problem:
When the system resumes from suspend, X appears to be non-functional (a black screen with what appear to be blinking or garbled blocks or random characters). I pressed Ctrl-Alt-Backspace to try to recover the system. The system flashes to text mode, and then, when it tries to start X again, I get this panic. The panic happens fairly frequently, but actually capturing it with kdump is rarer.

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
00000000
*pde = 759fe067
Oops: 0000 [#1]
SMP
last sysfs file: /devices/pci0000:00/0000:00:00.0/resource
Modules linked in: cpufreq_powersave nfs lockd fscache nfs_acl i915 drm autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 xfrm_nalgo crypto_api cpufreq_ondemand acpi_cpufreq dm_multipath scsi_dh video backlight sbs i2c_ec button battery asus_acpi ac parport_pc lp parport joydev snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq sg snd_seq_device snd_pcm_oss snd_mixer_oss serio_raw snd_pcm snd_timer snd_page_alloc snd_hwdep tg3 snd ide_cd libphy soundcore cdrom pcspkr sdhci mmc_core i2c_i801 i2c_core dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
CPU:    0
EIP:    0060:[<00000000>]    Not tainted VLI
EFLAGS: 00210282   (2.6.18-121.el5 #1)
EIP is at _stext+0x3fbfee10/0x3c
eax: ca16c600   ebx: c062b040   ecx: 00000000   edx: ca17f380
esi: f759fea8   edi: ca16c600   ebp: f759fe98   esp: f759fc28
ds: 007b   es: 007b   ss: 0068
Process setroubleshootd (pid: 2888, ti=f759f000 task=f7674aa0 task.ti=f759f000)
Stack: c05af017 00000000 c0482bce f759ffb0 0816f5a0 0816f5b8 00000000 f759fe98
       f759fe98 f759fe98 f759feb8 00000000 c04836a7 00000000 00000000 00000003
       f75b3680 00000000 f7674aa0 c041e457 f7602404 f7602404 f7602400 ca16c600
Call Trace:
 [<c05af017>] sock_poll+0xc/0xe
 [<c0482bce>] do_sys_poll+0x198/0x339
 [<c04836a7>] __pollwait+0x0/0xb2
 [<c041e457>] default_wake_function+0x0/0xc
 [<c041e457>] default_wake_function+0x0/0xc
 [<c041e457>] default_wake_function+0x0/0xc
 [<c041e457>] default_wake_function+0x0/0xc
 [<f8862386>] ext3_mark_iloc_dirty+0x2d8/0x333 [ext3]
 [<c041d87e>] __wake_up+0x2a/0x3d
 [<f88c167c>] journal_stop+0x1b0/0x1ba [jbd]
 [<f886973c>] __ext3_journal_stop+0x19/0x34 [ext3]
 [<c045cc56>] __pagevec_lru_add+0x80/0x8b
 [<c04581aa>] generic_file_buffered_write+0x6dc/0x713
 [<c04fd706>] vgacon_set_cursor_size+0x39/0xd0
 [<c053a968>] set_cursor+0x50/0x5c
 [<c053e6ab>] vt_console_print+0x202/0x212
 [<c04cd899>] constraint_expr_eval+0x3a6/0x42a
 [<c042444c>] release_console_sem+0x17e/0x1b8
 [<c04cdb41>] context_struct_compute_av+0x224/0x284
 [<c04c142f>] avc_alloc_node+0x16/0x150
 [<c04c16ea>] avc_has_perm_noaudit+0x181/0x322
 [<c04613e5>] do_wp_page+0x3bf/0x40a
 [<c05b3a47>] skb_dequeue+0x39/0x3f
 [<c05b1e58>] sk_free+0xa7/0xdf
 [<c0488107>] destroy_inode+0x36/0x45
 [<c044a2b7>] audit_syscall_entry+0x14b/0x17d
 [<c0486e06>] dput+0x22/0xed
 [<c0482db0>] sys_poll+0x41/0x44
 [<c0404f17>] syscall_call+0x7/0xb
 =======================
Code: Bad EIP value.
EIP: [<00000000>] _stext+0x3fbfee10/0x3c SS:ESP 0068:f759fc28

Version-Release number of selected component (if applicable):
RHEL5.3-Client-20081020.1 for i386, -121 kernel (possibly also the -120 kernel, but I could never get a dump out of it)

How reproducible:
Frequently

Steps to Reproduce:
1. Initiate suspend via either the GNOME applet or by closing the lid.
2. Open the lid to resume.
3. Press Ctrl-Alt-Backspace once it is clear that X isn't going to recover.

Actual results:
Kernel panic

Expected results:
The system recovers.

Additional info:
vmcore and logs are here: http://test185.test.redhat.com/crashdumps/Z61t/2008-10-31/
As an extra data point, I have tried hibernating and resuming the system, and I cannot get it to crash or otherwise fail to resume from hibernation.
Looks like memory corruption? Is the backtrace always the same?
It probably is memory corruption. I got a different backtrace this time:

BUG: unable to handle kernel paging request at virtual address 06fd05fd
 printing eip:
06fd05fd
*pde = 00000000
Oops: 0000 [#1]
SMP
last sysfs file: /devices/pci0000:00/0000:00:00.0/resource
Modules linked in: i915 drm autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 xfrm_nalgo crypto_api cpufreq_ondemand acpi_cpufreq dm_multipath scsi_dh video backlight sbs i2c_ec button battery asus_acpi ac parport_pc lp parport joydev sg snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq tg3 serio_raw snd_seq_device sdhci ide_cd libphy snd_pcm_oss mmc_core pcspkr snd_mixer_oss i2c_i801 snd_pcm cdrom i2c_core snd_timer snd_page_alloc snd_hwdep snd soundcore dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
CPU:    0
EIP:    0060:[<06fd05fd>]    Not tainted VLI
EFLAGS: 00010246   (2.6.18-121.el5 #1)
EIP is at 0x6fd05fd
eax: f6058efc   ebx: c062b040   ecx: f6058eb4   edx: f58d6680
esi: f6058eb4   edi: f58d6680   ebp: 0000001e   esp: f6058e6c
ds: 007b   es: 007b   ss: 0068
Process acpid (pid: 3236, ti=f6058000 task=f7d86000 task.ti=f6058000)
Stack: c05af004 0000001e f6058efc f6058efc 0000001e f6058e94 00000000 c05af48a
       f6058ed0 00000001 f6058e98 ffffffff ffffffff 0000001e f58d6680 00000001
       00000000 f6058eb4 00000000 00000000 f6058ed0 00000001 00000000 00000000
Call Trace:
 [<c05af004>] do_sock_write+0xa3/0xaa
 [<c05af48a>] sock_aio_write+0x53/0x61
 [<c047240e>] do_sync_write+0xb6/0xf1
 [<c043466b>] autoremove_wake_function+0x0/0x2d
 [<c0472cd8>] vfs_write+0xb2/0x143
 [<c04732b9>] sys_write+0x3c/0x63
 [<c0404f17>] syscall_call+0x7/0xb
 =======================
Code: Bad EIP value.
EIP: [<06fd05fd>] 0x6fd05fd SS:ESP 0068:f6058e6c

The dmesg log and crash dump are available at: http://test185.test.redhat.com/crashdumps/Z61t/2008-11-5.1/

I'll try to reproduce it with the -122 kernel. Any other ideas?
If you try suspending from single-user mode, does it resume reliably? The best way to test this is:

dbus-send --system --print-reply --dest=org.freedesktop.Hal /org/freedesktop/Hal/devices/computer org.freedesktop.Hal.Device.SystemPowerManagement.Suspend int32:0
I can't get that dbus command to work; it looks like word wrapping may have mangled it. In single-user mode, one of the services it needs doesn't seem to be running, and in runlevel 3 I just get the usage summary back. I tried 'echo mem > /sys/power/state', which suspended the system, but the system did not return from suspend, nor did it appear to crash, no matter what I did.
Can you attach the output of lshal?
Created attachment 322626: lshal output
Hm. Did this ever work? Does running the following from X (as root) result in any improvement?

pm-suspend --quirk-s3-bios --quirk-s3-mode
I have a Z61t here, too. There is a regression in 5.3, filed as bug 468289, where the screen is blank on resume. VT switching works around it, but if I press Ctrl-Alt-Backspace instead, I do get a kernel panic.
Zack, can you try the same thing?
I'll try it, but this laptop never needed any quirks before.
Er, actually it does seem to help. It resumes fine, and there's no kernel panic if I press Ctrl-Alt-Backspace.
Hm. OK, in that case, if it's a regression, it's in the Intel graphics driver: presumably it's now doing slightly less reprogramming than it did before, and then manages to stomp over memory it shouldn't touch when it is killed. We can add quirks for the Z61t easily enough, but other systems may have been affected as well.
*** This bug has been marked as a duplicate of bug 468289 ***
This has been marked as a duplicate of bug 468289, but attempting to access that bug returns "You are not authorized to access bug #468289."