Bug 469770

Summary: kernel panic on Lenovo Z61t laptop when returning from suspend
Product: Red Hat Enterprise Linux 5
Reporter: Mike Gahagan <mgahagan>
Component: xorg-x11-drv-i810
Assignee: Adam Jackson <ajax>
Status: CLOSED DUPLICATE
QA Contact: desktop-bugs <desktop-bugs>
Severity: high
Docs Contact:
Priority: high
Version: 5.3
CC: bill.muller, dzickus, jfeeney, zcerza
Target Milestone: rc   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2008-11-05 22:05:25 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
Attachments:
lshal output (flags: none)

Description Mike Gahagan 2008-11-03 22:19:02 UTC
Description of problem:

When the system resumes from suspend, X appears to be non-functional (black screen with what look like blinking or garbled blocks or random characters). I pressed Ctrl-Alt-Backspace to try to recover the system; it flashes to text mode, and then when it tries to start X again I get this panic. The panic happens fairly frequently, but getting it to kdump successfully is rarer.

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
00000000
*pde = 759fe067
Oops: 0000 [#1]
SMP 
last sysfs file: /devices/pci0000:00/0000:00:00.0/resource
Modules linked in: cpufreq_powersave nfs lockd fscache nfs_acl i915 drm autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 xfrm_nalgo crypto_api cpufreq_ondemand acpi_cpufreq dm_multipath scsi_dh video backlight sbs i2c_ec button battery asus_acpi ac parport_pc lp parport joydev snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq sg snd_seq_device snd_pcm_oss snd_mixer_oss serio_raw snd_pcm snd_timer snd_page_alloc snd_hwdep tg3 snd ide_cd libphy soundcore cdrom pcspkr sdhci mmc_core i2c_i801 i2c_core dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
CPU:    0
EIP:    0060:[<00000000>]    Not tainted VLI
EFLAGS: 00210282   (2.6.18-121.el5 #1) 
EIP is at _stext+0x3fbfee10/0x3c
eax: ca16c600   ebx: c062b040   ecx: 00000000   edx: ca17f380
esi: f759fea8   edi: ca16c600   ebp: f759fe98   esp: f759fc28
ds: 007b   es: 007b   ss: 0068
Process setroubleshootd (pid: 2888, ti=f759f000 task=f7674aa0 task.ti=f759f000)
Stack: c05af017 00000000 c0482bce f759ffb0 0816f5a0 0816f5b8 00000000 f759fe98 
       f759fe98 f759fe98 f759feb8 00000000 c04836a7 00000000 00000000 00000003 
       f75b3680 00000000 f7674aa0 c041e457 f7602404 f7602404 f7602400 ca16c600 
Call Trace:
 [<c05af017>] sock_poll+0xc/0xe
 [<c0482bce>] do_sys_poll+0x198/0x339
 [<c04836a7>] __pollwait+0x0/0xb2
 [<c041e457>] default_wake_function+0x0/0xc
 [<c041e457>] default_wake_function+0x0/0xc
 [<c041e457>] default_wake_function+0x0/0xc
 [<c041e457>] default_wake_function+0x0/0xc
 [<f8862386>] ext3_mark_iloc_dirty+0x2d8/0x333 [ext3]
 [<c041d87e>] __wake_up+0x2a/0x3d
 [<f88c167c>] journal_stop+0x1b0/0x1ba [jbd]
 [<f886973c>] __ext3_journal_stop+0x19/0x34 [ext3]
 [<c045cc56>] __pagevec_lru_add+0x80/0x8b
 [<c04581aa>] generic_file_buffered_write+0x6dc/0x713
 [<c04fd706>] vgacon_set_cursor_size+0x39/0xd0
 [<c053a968>] set_cursor+0x50/0x5c
 [<c053e6ab>] vt_console_print+0x202/0x212
 [<c04cd899>] constraint_expr_eval+0x3a6/0x42a
 [<c042444c>] release_console_sem+0x17e/0x1b8
 [<c04cdb41>] context_struct_compute_av+0x224/0x284
 [<c04c142f>] avc_alloc_node+0x16/0x150
 [<c04c16ea>] avc_has_perm_noaudit+0x181/0x322
 [<c04613e5>] do_wp_page+0x3bf/0x40a
 [<c05b3a47>] skb_dequeue+0x39/0x3f
 [<c05b1e58>] sk_free+0xa7/0xdf
 [<c0488107>] destroy_inode+0x36/0x45
 [<c044a2b7>] audit_syscall_entry+0x14b/0x17d
 [<c0486e06>] dput+0x22/0xed
 [<c0482db0>] sys_poll+0x41/0x44
 [<c0404f17>] syscall_call+0x7/0xb
 =======================
Code:  Bad EIP value.
EIP: [<00000000>] _stext+0x3fbfee10/0x3c SS:ESP 0068:f759fc28
 


Version-Release number of selected component (if applicable):
RHEL5.3-Client-20081020.1 for i386
-121 kernel (possibly also -120 kernel, but I could never get a dump out of it)

How reproducible:
Frequently

Steps to Reproduce:
1. Initiate suspend, either via the GNOME applet or by closing the lid.
2. Open the lid to resume.
3. Press Ctrl-Alt-Backspace once it is clear that X isn't going to recover.
  
Actual results:
panic

Expected results:
system recovers

Additional info:
vmcore and logs are here:
http://test185.test.redhat.com/crashdumps/Z61t/2008-10-31/

Comment 1 Mike Gahagan 2008-11-04 14:18:27 UTC
As an extra data point: I have tried hibernating and resuming the system, and I cannot get it to crash or otherwise fail to resume from hibernate.

Comment 2 Matthew Garrett 2008-11-04 16:03:38 UTC
Looks like memory corruption? Is the backtrace always the same?

Comment 3 Mike Gahagan 2008-11-05 16:47:47 UTC
It probably is memory corruption. I got a different backtrace this time:

BUG: unable to handle kernel paging request at virtual address 06fd05fd
 printing eip:
06fd05fd
*pde = 00000000
Oops: 0000 [#1]
SMP 
last sysfs file: /devices/pci0000:00/0000:00:00.0/resource
Modules linked in: i915 drm autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 xfrm_nalgo crypto_api cpufreq_ondemand acpi_cpufreq dm_multipath scsi_dh video backlight sbs i2c_ec button battery asus_acpi ac parport_pc lp parport joydev sg snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq tg3 serio_raw snd_seq_device sdhci ide_cd libphy snd_pcm_oss mmc_core pcspkr snd_mixer_oss i2c_i801 snd_pcm cdrom i2c_core snd_timer snd_page_alloc snd_hwdep snd soundcore dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
CPU:    0
EIP:    0060:[<06fd05fd>]    Not tainted VLI
EFLAGS: 00010246   (2.6.18-121.el5 #1) 
EIP is at 0x6fd05fd
eax: f6058efc   ebx: c062b040   ecx: f6058eb4   edx: f58d6680
esi: f6058eb4   edi: f58d6680   ebp: 0000001e   esp: f6058e6c
ds: 007b   es: 007b   ss: 0068
Process acpid (pid: 3236, ti=f6058000 task=f7d86000 task.ti=f6058000)
Stack: c05af004 0000001e f6058efc f6058efc 0000001e f6058e94 00000000 c05af48a 
       f6058ed0 00000001 f6058e98 ffffffff ffffffff 0000001e f58d6680 00000001 
       00000000 f6058eb4 00000000 00000000 f6058ed0 00000001 00000000 00000000 
Call Trace:
 [<c05af004>] do_sock_write+0xa3/0xaa
 [<c05af48a>] sock_aio_write+0x53/0x61
 [<c047240e>] do_sync_write+0xb6/0xf1
 [<c043466b>] autoremove_wake_function+0x0/0x2d
 [<c0472cd8>] vfs_write+0xb2/0x143
 [<c04732b9>] sys_write+0x3c/0x63
 [<c0404f17>] syscall_call+0x7/0xb
 =======================
Code:  Bad EIP value.
EIP: [<06fd05fd>] 0x6fd05fd SS:ESP 0068:f6058e6c
 

dmesg log and crash dump available at:
http://test185.test.redhat.com/crashdumps/Z61t/2008-11-5.1/

I'll try to reproduce it with the -122 kernel. Any other ideas?
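
To answer the "is the backtrace always the same?" question mechanically, the symbol names can be pulled out of each call trace and compared. The following is only a rough sketch: the helper names `trace_symbols` and `same_trace` are made up for illustration, and the two sample files contain just the top frames copied from the oopses in this report.

```shell
#!/bin/sh
# Reduce an oops text file to the symbol names in its call trace.
# Matches lines such as " [<c05af017>] sock_poll+0xc/0xe" and keeps "sock_poll".
trace_symbols() {
    sed -n 's/^ *\[<[0-9a-f]\{8\}>\] \([^+ ]*\)+0x.*/\1/p' "$1"
}

# Succeeds when both files yield the same sequence of symbols.
same_trace() {
    [ "$(trace_symbols "$1")" = "$(trace_symbols "$2")" ]
}

# Sample input: the first frames of the two oopses quoted above.
oops_a=$(mktemp)
oops_b=$(mktemp)
cat > "$oops_a" <<'EOF'
Call Trace:
 [<c05af017>] sock_poll+0xc/0xe
 [<c0482bce>] do_sys_poll+0x198/0x339
EOF
cat > "$oops_b" <<'EOF'
Call Trace:
 [<c05af004>] do_sock_write+0xa3/0xaa
 [<c05af48a>] sock_aio_write+0x53/0x61
EOF

if same_trace "$oops_a" "$oops_b"; then
    echo "same call trace"
else
    echo "different call traces"
fi
rm -f "$oops_a" "$oops_b"
```

Comparing only symbol names ignores offsets and addresses, which seems good enough for a quick "same or different" check across several crash dumps.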

Comment 4 Matthew Garrett 2008-11-05 17:02:28 UTC
If you try suspending from single user mode, does it resume reliably? Best way to test this is:

dbus-send --system --print-reply --dest=org.freedesktop.Hal /org/freedesktop/Hal/devices/computer org.freedesktop.Hal.Device.SystemPowerManagement.Suspend int32:0
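
A long one-line command like this is easy to mangle when it gets wrapped, so here is the same call split with backslash continuations. This is a sketch only: `hal_suspend` and the `RUNNER` variable are hypothetical names added for illustration, and setting `RUNNER=echo` prints the command instead of suspending the machine.

```shell
#!/bin/sh
# The HAL suspend call from above, one argument per line so that
# wrapping cannot merge or split the arguments.
# RUNNER is a hypothetical dry-run hook: with RUNNER=echo the command
# is printed; with RUNNER unset it is executed.
hal_suspend() {
    ${RUNNER:-} dbus-send --system --print-reply \
        --dest=org.freedesktop.Hal \
        /org/freedesktop/Hal/devices/computer \
        org.freedesktop.Hal.Device.SystemPowerManagement.Suspend \
        int32:0
}

# Dry run: show the exact command line that would be sent.
RUNNER=echo
hal_suspend
```

To actually suspend, one would unset `RUNNER` and call `hal_suspend` as root with the HAL and D-Bus services running.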

Comment 5 Mike Gahagan 2008-11-05 18:11:43 UTC
I can't get that dbus command to work; it looks like word wrapping may have mangled it. In single-user mode it looks like one of the services it needs isn't running, and in runlevel 3 I just get back the usage summary.

I tried 'echo mem > /sys/power/state', which suspended the system, but it did not return from suspend, nor did it appear to crash, no matter what I did.
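
Before writing to /sys/power/state directly, it can help to confirm which sleep states the kernel advertises there. A minimal sketch, assuming the usual space-separated list in that file; `supports_state` and the `STATE_FILE` override are hypothetical names used only so the check can be tried against a sample file.

```shell
#!/bin/sh
# Check whether a sleep state (e.g. "mem" for suspend-to-RAM) is listed
# in the kernel's power state file.
# STATE_FILE can be overridden to point at a sample file for testing.
STATE_FILE="${STATE_FILE:-/sys/power/state}"

supports_state() {
    # $1 = state name, $2 = file containing space-separated state names
    [ -r "$2" ] && grep -qw -- "$1" "$2"
}

if supports_state mem "$STATE_FILE"; then
    echo "mem is listed in $STATE_FILE"
else
    echo "mem is not listed in $STATE_FILE (or the file is unreadable)"
fi
```

Note that writing to this file bypasses the HAL and pm-utils paths, so any video quirks those layers would apply are not applied on resume.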

Comment 6 Matthew Garrett 2008-11-05 18:21:46 UTC
Can you attach the output of lshal?

Comment 7 Mike Gahagan 2008-11-05 18:32:14 UTC
Created attachment 322626
lshal output

Comment 8 Matthew Garrett 2008-11-05 18:56:02 UTC
Hm. Did this ever work? From X (as root) does:

pm-suspend --quirk-s3-bios --quirk-s3-mode

result in any improvement?
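
To see whether HAL already reports any suspend quirks for the machine, the lshal output can be filtered for the quirk keys. This is a sketch under assumptions: `quirk_keys` is a made-up helper, and the sample lines are invented to show the expected shape of lshal output, not taken from the attached file.

```shell
#!/bin/sh
# Print the distinct suspend quirk keys found in lshal-style output on stdin.
quirk_keys() {
    grep -o 'power_management\.quirk\.[a-z0-9_]*' | sort -u
}

# On the affected machine one might run:  lshal | quirk_keys
# The sample below is illustrative input only.
quirk_keys <<'EOF'
  power_management.quirk.s3_bios = true  (bool)
  power_management.quirk.s3_mode = true  (bool)
  power_management.type = 'acpi'  (string)
EOF
```

If nothing is printed for the real lshal output, no quirks are being applied, which would be consistent with the manual --quirk-* options making a difference.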

Comment 9 Zack Cerza 2008-11-05 19:25:32 UTC
I have a Z61t here, too. There is a regression in 5.3, filed as bug 468289, where the screen is blank on resume. VT switching works around it, but if I Ctrl-Alt-Backspace instead, I do get a kernel panic.

Comment 10 Matthew Garrett 2008-11-05 19:37:47 UTC
Zack, can you try the same thing (the pm-suspend command from comment 8)?

Comment 11 Zack Cerza 2008-11-05 20:00:15 UTC
I'll try it, but this laptop never needed any quirks before.

Comment 12 Zack Cerza 2008-11-05 20:08:44 UTC
Er, actually it does seem to help. Resumes fine, and there's no kernel panic if I Ctrl-Alt-Backspace.

Comment 13 Matthew Garrett 2008-11-05 22:03:51 UTC
Hm. OK, in that case, if it's a regression, it's in the Intel graphics driver: presumably it's now doing slightly less reprogramming than it was before, and then manages to stomp over inappropriate memory when killed. We can add quirks for the Z61t easily enough, but other systems may have been affected as well.

Comment 14 Matthew Garrett 2008-11-05 22:05:25 UTC

*** This bug has been marked as a duplicate of bug 468289 ***

Comment 15 Bill Muller 2012-03-13 23:11:13 UTC
This has been marked as a duplicate of bug 468289, but attempting to access that bug returns "You are not authorized to access bug #468289."