469770 – kernel panic on Lenovo Z61t laptop when returning from suspend

Summary: kernel panic on Lenovo Z61t laptop when returning from suspend

Keywords:
Status:	CLOSED DUPLICATE of bug 468289
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	xorg-x11-drv-i810
Sub Component:
Version:	5.3
Hardware:	i386
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Adam Jackson
QA Contact:	desktop-bugs@redhat.com
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2008-11-03 22:19 UTC by Mike Gahagan
Modified:	2013-01-10 07:58 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2008-11-05 22:05:25 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
lshal output (105.33 KB, text/plain) 2008-11-05 18:32 UTC, Mike Gahagan

Description Mike Gahagan 2008-11-03 22:19:02 UTC

Description of problem:

When the system resumes from suspend, X appears to be non-functional (a black screen with what look like blinking, garbled blocks or random characters). I pressed Ctrl-Alt-Backspace to try to recover the system; it flashes to text mode and then, when it tries to start X again, I get this panic. The panic happens fairly frequently, but actually getting it to kdump is rarer.

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
00000000
*pde = 759fe067
Oops: 0000 [#1]
SMP 
last sysfs file: /devices/pci0000:00/0000:00:00.0/resource
Modules linked in: cpufreq_powersave nfs lockd fscache nfs_acl i915 drm autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 xfrm_nalgo crypto_api cpufreq_ondemand acpi_cpufreq dm_multipath scsi_dh video backlight sbs i2c_ec button battery asus_acpi ac parport_pc lp parport joydev snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq sg snd_seq_device snd_pcm_oss snd_mixer_oss serio_raw snd_pcm snd_timer snd_page_alloc snd_hwdep tg3 snd ide_cd libphy soundcore cdrom pcspkr sdhci mmc_core i2c_i801 i2c_core dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
CPU:    0
EIP:    0060:[<00000000>]    Not tainted VLI
EFLAGS: 00210282   (2.6.18-121.el5 #1) 
EIP is at _stext+0x3fbfee10/0x3c
eax: ca16c600   ebx: c062b040   ecx: 00000000   edx: ca17f380
esi: f759fea8   edi: ca16c600   ebp: f759fe98   esp: f759fc28
ds: 007b   es: 007b   ss: 0068
Process setroubleshootd (pid: 2888, ti=f759f000 task=f7674aa0 task.ti=f759f000)
Stack: c05af017 00000000 c0482bce f759ffb0 0816f5a0 0816f5b8 00000000 f759fe98 
       f759fe98 f759fe98 f759feb8 00000000 c04836a7 00000000 00000000 00000003 
       f75b3680 00000000 f7674aa0 c041e457 f7602404 f7602404 f7602400 ca16c600 
Call Trace:
 [<c05af017>] sock_poll+0xc/0xe
 [<c0482bce>] do_sys_poll+0x198/0x339
 [<c04836a7>] __pollwait+0x0/0xb2
 [<c041e457>] default_wake_function+0x0/0xc
 [<c041e457>] default_wake_function+0x0/0xc
 [<c041e457>] default_wake_function+0x0/0xc
 [<c041e457>] default_wake_function+0x0/0xc
 [<f8862386>] ext3_mark_iloc_dirty+0x2d8/0x333 [ext3]
 [<c041d87e>] __wake_up+0x2a/0x3d
 [<f88c167c>] journal_stop+0x1b0/0x1ba [jbd]
 [<f886973c>] __ext3_journal_stop+0x19/0x34 [ext3]
 [<c045cc56>] __pagevec_lru_add+0x80/0x8b
 [<c04581aa>] generic_file_buffered_write+0x6dc/0x713
 [<c04fd706>] vgacon_set_cursor_size+0x39/0xd0
 [<c053a968>] set_cursor+0x50/0x5c
 [<c053e6ab>] vt_console_print+0x202/0x212
 [<c04cd899>] constraint_expr_eval+0x3a6/0x42a
 [<c042444c>] release_console_sem+0x17e/0x1b8
 [<c04cdb41>] context_struct_compute_av+0x224/0x284
 [<c04c142f>] avc_alloc_node+0x16/0x150
 [<c04c16ea>] avc_has_perm_noaudit+0x181/0x322
 [<c04613e5>] do_wp_page+0x3bf/0x40a
 [<c05b3a47>] skb_dequeue+0x39/0x3f
 [<c05b1e58>] sk_free+0xa7/0xdf
 [<c0488107>] destroy_inode+0x36/0x45
 [<c044a2b7>] audit_syscall_entry+0x14b/0x17d
 [<c0486e06>] dput+0x22/0xed
 [<c0482db0>] sys_poll+0x41/0x44
 [<c0404f17>] syscall_call+0x7/0xb
 =======================
Code:  Bad EIP value.
EIP: [<00000000>] _stext+0x3fbfee10/0x3c SS:ESP 0068:f759fc28
 


Version-Release number of selected component (if applicable):
RHEL5.3-Client-20081020.1 for i386
-121 kernel (possibly also -120 kernel, but I could never get a dump out of it)

How reproducible:
Frequently

Steps to Reproduce:
1. Initiate suspend either via the GNOME applet or by closing the lid.
2. Open the lid to resume.
3. Press Ctrl-Alt-Backspace when it is clear that X isn't going to recover.
  
Actual results:
panic

Expected results:
system recovers

Additional info:
vmcore and logs are here:
http://test185.test.redhat.com/crashdumps/Z61t/2008-10-31/

Comment 1 Mike Gahagan 2008-11-04 14:18:27 UTC

As an extra data point, I have tried hibernating the system and resuming it and cannot get it to crash or otherwise fail to resume from a hibernate.

Comment 2 Matthew Garrett 2008-11-04 16:03:38 UTC

Looks like memory corruption? Is the backtrace always the same?
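
One way to check whether two oopses share a backtrace is to reduce each Call Trace to its list of symbol names and compare the lists. The following is only a sketch, assuming the oops text has been saved to a file or is piped in on stdin; `trace_symbols` is a hypothetical helper name, not an existing tool:

```shell
# Hypothetical helper: print the symbol name from every call-trace line of
# the form " [<c05af017>] sock_poll+0xc/0xe", one symbol per line.
# Reads the files given as arguments, or stdin if none are given.
trace_symbols() {
    sed -n 's/^ *\[<[0-9a-f]*>] \([^+]*\)+.*$/\1/p' "$@"
}
```

Running it over two saved oopses and comparing the two outputs with diff shows at a glance whether the traces match.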

Comment 3 Mike Gahagan 2008-11-05 16:47:47 UTC

It probably is memory corruption. I got a different backtrace this time:

BUG: unable to handle kernel paging request at virtual address 06fd05fd
 printing eip:
06fd05fd
*pde = 00000000
Oops: 0000 [#1]
SMP 
last sysfs file: /devices/pci0000:00/0000:00:00.0/resource
Modules linked in: i915 drm autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 xfrm_nalgo crypto_api cpufreq_ondemand acpi_cpufreq dm_multipath scsi_dh video backlight sbs i2c_ec button battery asus_acpi ac parport_pc lp parport joydev sg snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq tg3 serio_raw snd_seq_device sdhci ide_cd libphy snd_pcm_oss mmc_core pcspkr snd_mixer_oss i2c_i801 snd_pcm cdrom i2c_core snd_timer snd_page_alloc snd_hwdep snd soundcore dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
CPU:    0
EIP:    0060:[<06fd05fd>]    Not tainted VLI
EFLAGS: 00010246   (2.6.18-121.el5 #1) 
EIP is at 0x6fd05fd
eax: f6058efc   ebx: c062b040   ecx: f6058eb4   edx: f58d6680
esi: f6058eb4   edi: f58d6680   ebp: 0000001e   esp: f6058e6c
ds: 007b   es: 007b   ss: 0068
Process acpid (pid: 3236, ti=f6058000 task=f7d86000 task.ti=f6058000)
Stack: c05af004 0000001e f6058efc f6058efc 0000001e f6058e94 00000000 c05af48a 
       f6058ed0 00000001 f6058e98 ffffffff ffffffff 0000001e f58d6680 00000001 
       00000000 f6058eb4 00000000 00000000 f6058ed0 00000001 00000000 00000000 
Call Trace:
 [<c05af004>] do_sock_write+0xa3/0xaa
 [<c05af48a>] sock_aio_write+0x53/0x61
 [<c047240e>] do_sync_write+0xb6/0xf1
 [<c043466b>] autoremove_wake_function+0x0/0x2d
 [<c0472cd8>] vfs_write+0xb2/0x143
 [<c04732b9>] sys_write+0x3c/0x63
 [<c0404f17>] syscall_call+0x7/0xb
 =======================
Code:  Bad EIP value.
EIP: [<06fd05fd>] 0x6fd05fd SS:ESP 0068:f6058e6c
 

dmesg log and crash dump available at:
http://test185.test.redhat.com/crashdumps/Z61t/2008-11-5.1/

I'll try to reproduce it with the -122 kernel. Any other ideas?

Comment 4 Matthew Garrett 2008-11-05 17:02:28 UTC

If you try suspending from single user mode, does it resume reliably? Best way to test this is:

dbus-send --system --print-reply --dest=org.freedesktop.Hal /org/freedesktop/Hal/devices/computer org.freedesktop.Hal.Device.SystemPowerManagement.Suspend int32:0

Comment 5 Mike Gahagan 2008-11-05 18:11:43 UTC

I can't get that dbus command to work; it looks like the word wrapping may have mangled it. In single user mode it looks like one of the services it needs isn't running, and in runlevel 3 I just get back the usage summary.

I tried 'echo mem > /sys/power/state', which suspended the system, but it did not return from suspend, nor did it appear to crash, no matter what I did.
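
Because the dbus-send command from comment 4 is long enough to be wrapped when copied, it may help to build it from one argument per line and print it back as a single line. This is a minimal sketch, assuming a POSIX shell; `hal_suspend_cmd` is a hypothetical helper name, and it only prints the command rather than running it:

```shell
# Hypothetical helper: print the HAL suspend request from comment 4 as a
# single line. Nothing is sent over D-Bus; the output has to be run
# deliberately for the system to suspend.
hal_suspend_cmd() {
    printf '%s ' \
        dbus-send --system --print-reply \
        --dest=org.freedesktop.Hal \
        /org/freedesktop/Hal/devices/computer \
        org.freedesktop.Hal.Device.SystemPowerManagement.Suspend \
        int32:0
    printf '\n'
}
```

The printed line should contain exactly seven whitespace-separated words; fewer or more means an argument was split or joined somewhere along the way.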

Comment 6 Matthew Garrett 2008-11-05 18:21:46 UTC

Can you attach the output of lshal?

Comment 7 Mike Gahagan 2008-11-05 18:32:14 UTC

Created attachment 322626
lshal output

Comment 8 Matthew Garrett 2008-11-05 18:56:02 UTC

Hm. Did this ever work? From X (as root) does:

pm-suspend --quirk-s3-bios --quirk-s3-mode

result in any improvement?

Comment 9 Zack Cerza 2008-11-05 19:25:32 UTC

I have a Z61t here, too. There is a regression in 5.3, filed as bug 468289, where the screen is blank on resume. VT switching works around it, but if I Ctrl-Alt-Backspace instead, I do get a kernel panic.

Comment 10 Matthew Garrett 2008-11-05 19:37:47 UTC

Zack, can you try the same thing?

Comment 11 Zack Cerza 2008-11-05 20:00:15 UTC

I'll try it, but this laptop never needed any quirks before.

Comment 12 Zack Cerza 2008-11-05 20:08:44 UTC

Er, actually it does seem to help. Resumes fine, and there's no kernel panic if I Ctrl-Alt-Backspace.

Comment 13 Matthew Garrett 2008-11-05 22:03:51 UTC

Hm. OK, in that case, if it's a regression, it's in the Intel graphics driver. Presumably it's now doing slightly less reprogramming than it was before, and then managed to stomp over inappropriate memory when killed. We can add quirks for the Z61t easily enough, but other systems may have been affected as well.

Comment 14 Matthew Garrett 2008-11-05 22:05:25 UTC


*** This bug has been marked as a duplicate of bug 468289 ***

Comment 15 Bill Muller 2012-03-13 23:11:13 UTC

This has been marked as a duplicate of bug 468289, but attempting to access that bug returns "You are not authorized to access bug #468289."
