Bug 612757

Summary: Occasional blank screen on hibernate/thaw with ThinkPad T510
Product: [Fedora] Fedora
Component: kernel
Version: 13
Hardware: All
OS: Linux
Status: CLOSED WONTFIX
Severity: medium
Priority: low
Reporter: Bojan Smojver <bojan>
Assignee: Kernel Maintainer List <kernel-maint>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: anton, dougsland, gansalmon, itamar, jonathan, kernel-maint, lists, madhu.chinakonda
Doc Type: Bug Fix
Last Closed: 2011-06-29 13:47:39 UTC

Description Bojan Smojver 2010-07-08 23:51:51 UTC
Description of problem:
Occasionally, especially when /sys/power/image_size is left at the default of 512 MB, a kernel oops causes a blank screen on thaw after hibernate.

Version-Release number of selected component (if applicable):
2.6.33.6-147.fc13.x86_64

How reproducible:
Sometimes.

Steps to Reproduce:
1. Hibernate.
2. Thaw.
  
Actual results:
Blank screen on resume. Can SSH into the system, so it's not entirely dead, but neither text consoles nor X work.

Expected results:
X/consoles in normal state.

Additional info:
When /sys/power/image_size is set to a larger value, between 1 and 2 GB, the problem cannot be readily reproduced. Hardware profile: http://www.smolts.org/client/show/pub_ebd16c9b-ba21-4d39-964a-cfd361713146
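
For anyone wanting to try the same workaround, here is a minimal sketch of reading and changing the hibernation image size. It assumes the usual sysfs behaviour, where /sys/power/image_size holds a value in bytes and writing it requires root. The helper names (gib_to_bytes, read_image_size, write_image_size) are made up for illustration and are not part of any existing tool.

```python
from pathlib import Path

# Real sysfs knob; the kernel expects a plain decimal number of bytes.
IMAGE_SIZE_PATH = Path("/sys/power/image_size")


def gib_to_bytes(gib: float) -> int:
    """Convert a size in GiB to bytes."""
    return int(gib * 1024 ** 3)


def read_image_size(path: Path = IMAGE_SIZE_PATH) -> int:
    """Return the currently configured image size in bytes."""
    return int(path.read_text().strip())


def write_image_size(size_bytes: int, path: Path = IMAGE_SIZE_PATH) -> None:
    """Write a new image size in bytes (needs root for the real sysfs file)."""
    if size_bytes < 0:
        raise ValueError("image size must not be negative")
    path.write_text(f"{size_bytes}\n")


if __name__ == "__main__":
    print("current image_size:", read_image_size())
    # 1.5 GiB sits inside the 1 to 2 GB range mentioned above.
    write_image_size(gib_to_bytes(1.5))
    print("new image_size:", read_image_size())
```

The path is a parameter so the helpers can be exercised against an ordinary file before touching the real setting. The value does not survive a reboot, so it has to be reapplied at boot if the workaround is to stick.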

OOPS:

Jul  9 09:30:33 shrek kernel: BUG: unable to handle kernel paging request at ffffc90017d61004
Jul  9 09:30:33 shrek kernel: IP: [<ffffffffa007cd9b>] i915_gem_do_execbuffer+0x785/0xfb3 [i915]
Jul  9 09:30:33 shrek kernel: PGD 23bc07067 PUD 23bc28067 PMD 22e3ed067 PTE 0
Jul  9 09:30:33 shrek kernel: Oops: 0002 [#1] SMP
Jul  9 09:30:33 shrek kernel: last sysfs file: /sys/devices/virtual/backlight/acpi_video0/brightness
Jul  9 09:30:33 shrek kernel: CPU 3
Jul  9 09:30:33 shrek kernel: Pid: 2437, comm: compiz Not tainted 2.6.33.6-147.fc13.x86_64 #1 4313CTO/4313CTO
Jul  9 09:30:33 shrek kernel: RIP: 0010:[<ffffffffa007cd9b>]  [<ffffffffa007cd9b>] i915_gem_do_execbuffer+0x785/0xfb3 [i915]
Jul  9 09:30:33 shrek kernel: RSP: 0018:ffff880230a1fc18  EFLAGS: 00010286
Jul  9 09:30:33 shrek kernel: RAX: 0000000002bf7000 RBX: ffff88021f147420 RCX: ffffc90017d61000
Jul  9 09:30:33 shrek kernel: RDX: ffff88022ec3a780 RSI: ffffc90017d61004 RDI: ffff88023325c000
Jul  9 09:30:33 shrek kernel: RBP: ffff880230a1fd18 R08: 0000000000000000 R09: 0000000000600000
Jul  9 09:30:33 shrek kernel: R10: ffff88022ec3a900 R11: ffff880213cd3840 R12: ffff8802187ce6c0
Jul  9 09:30:33 shrek kernel: R13: ffff88022cc998a8 R14: 0000000000000000 R15: ffff8802187ce600
Jul  9 09:30:33 shrek kernel: FS:  00007f6f88bb0740(0000) GS:ffff880009180000(0000) knlGS:0000000000000000
Jul  9 09:30:33 shrek kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jul  9 09:30:33 shrek kernel: CR2: ffffc90017d61004 CR3: 00000002304ba000 CR4: 00000000000006e0
Jul  9 09:30:33 shrek kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul  9 09:30:33 shrek kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul  9 09:30:33 shrek kernel: Process compiz (pid: 2437, threadinfo ffff880230a1e000, task ffff880217f245f0)
Jul  9 09:30:33 shrek kernel: Stack:
Jul  9 09:30:33 shrek kernel: ffff88022d2f7480 ffff880233102ea0 ffff880230a1fc68 ffffffff81044114
Jul  9 09:30:33 shrek kernel: <0> ffff88022ec3a900 ffff88022ec3a780 ffff88021f147420 ffff88023325c000
Jul  9 09:30:33 shrek kernel: <0> 000000002d2f7480 ffff88021849a780 ffff88023325c000 00000001a007c5fe
Jul  9 09:30:33 shrek kernel: Call Trace:
Jul  9 09:30:33 shrek kernel: [<ffffffff81044114>] ? finish_task_switch+0x43/0xb2
Jul  9 09:30:33 shrek kernel: [<ffffffffa007c5fe>] ? drm_malloc_ab+0x33/0x4b [i915]
Jul  9 09:30:33 shrek kernel: [<ffffffffa0077dd7>] ? might_fault+0x1c/0x1e [i915]
Jul  9 09:30:33 shrek kernel: [<ffffffffa007d692>] i915_gem_execbuffer2+0xc9/0x129 [i915]
Jul  9 09:30:33 shrek kernel: [<ffffffff8109f7cd>] ? __delayacct_blkio_end+0x39/0x3b
Jul  9 09:30:33 shrek kernel: [<ffffffffa002b19b>] drm_ioctl+0x254/0x365 [drm]
Jul  9 09:30:33 shrek kernel: [<ffffffff810e588d>] ? read_swap_cache_async+0x3c/0x11c
Jul  9 09:30:33 shrek kernel: [<ffffffffa007d5c9>] ? i915_gem_execbuffer2+0x0/0x129 [i915]
Jul  9 09:30:33 shrek kernel: [<ffffffff810e67fd>] ? swap_entry_free+0x6f/0xd3
Jul  9 09:30:33 shrek kernel: [<ffffffff810c0bf8>] ? unlock_page+0x22/0x27
Jul  9 09:30:33 shrek kernel: [<ffffffff8110d8cf>] vfs_ioctl+0x2d/0xa1
Jul  9 09:30:33 shrek kernel: [<ffffffff8110de38>] do_vfs_ioctl+0x47e/0x4c4
Jul  9 09:30:33 shrek kernel: [<ffffffff810dc305>] ? find_vma+0x2c/0x5a
Jul  9 09:30:33 shrek kernel: [<ffffffff8110decf>] sys_ioctl+0x51/0x74
Jul  9 09:30:33 shrek kernel: [<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
Jul  9 09:30:33 shrek kernel: Code: 77 5c 48 8b bd 38 ff ff ff 8b 43 04 41 03 42 5c 48 89 f1 81 e6 ff 0f 00 00 48 81 e1 00 f0 ff ff 48 03 8f d8 0f 00 00 48 8d 34 31 <89> 06 45 8b 52 5c 4c 89 53 10 48 89 d7 41 ff c6 e8 72 ae ff ff 
Jul  9 09:30:34 shrek kernel: RIP  [<ffffffffa007cd9b>] i915_gem_do_execbuffer+0x785/0xfb3 [i915]
Jul  9 09:30:34 shrek kernel: RSP <ffff880230a1fc18>
Jul  9 09:30:34 shrek kernel: CR2: ffffc90017d61004
Jul  9 09:30:34 shrek kernel: ---[ end trace a758ec0c73680f30 ]---
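
As a reading aid for the "Oops: 0002" line above, here is a small sketch that decodes the low bits of the x86 page-fault error code. The bit meanings (bit 0: protection fault rather than a not-present page, bit 1: write access, bit 2: fault taken in user mode) are the standard x86 ones; the function name is made up for this example and the higher bits are ignored.

```python
def decode_page_fault_error_code(code_hex: str) -> dict:
    """Decode the three low bits of an x86 page-fault error code.

    code_hex is the hex string printed after "Oops:", for example "0002".
    """
    code = int(code_hex, 16)
    return {
        "page_present": bool(code & 0x1),  # 0 means no page was mapped
        "write": bool(code & 0x2),         # 1 means the access was a write
        "user_mode": bool(code & 0x4),     # 0 means the fault came from kernel mode
    }


if __name__ == "__main__":
    print(decode_page_fault_error_code("0002"))
    # From the register dump: RSI equals RCX + 4, and CR2 equals RSI.
    print(hex(0xffffc90017d61000 + 4))
```

For this trace the code 0002 means a kernel-mode write to an address with no page mapped, which matches the "PTE 0" line. The faulting address in CR2 is the value in RSI, which is 4 bytes past the page-aligned address in RCX.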

Comment 1 Chuck Ebbert 2010-07-09 07:22:09 UTC
Did this just start with this latest kernel version?
Does the latest 2.6.34 kernel from koji work any better?

Comment 2 Bojan Smojver 2010-07-09 07:44:31 UTC
Earlier kernels had a different hibernate problem, so I avoided hibernating with them. Recently, the Intel graphics vs. hibernate issue was fixed, so I gave it a try. This is when I noticed this new problem.

As for 2.6.34, I did not try that. Do we have F-13 builds? Or do I just use rawhide?

Comment 3 Bojan Smojver 2010-07-09 09:27:02 UTC
This kernel (the most recent 2.6.34):

http://koji.fedoraproject.org/koji/buildinfo?buildID=181064

It doesn't appear to have the Intel graphics hibernation fix. Can we get that applied and rebuilt? No point testing with 2.6.34 when we already know it's going to cause trouble without that patch.

Comment 4 Bojan Smojver 2010-07-09 09:31:57 UTC
This was closed upstream:

https://bugzilla.kernel.org/show_bug.cgi?id=15114

But, it looks somewhat similar to what I'm seeing here.

Comment 5 Bojan Smojver 2010-07-09 09:32:41 UTC
Have another trace:

BUG: unable to handle kernel paging request at ffffc90020300004
IP: [<ffffffffa007cd9b>] i915_gem_do_execbuffer+0x785/0xfb3 [i915]
PGD 23bc07067 PUD 23bc28067 PMD 22e3e0067 PTE 0
Oops: 0002 [#1] SMP 
last sysfs file: /sys/devices/pci0000:00/0000:00:1c.1/0000:03:00.0/ieee80211/phy0/rfkill2/uevent
CPU 2 
Pid: 1941, comm: Xorg Not tainted 2.6.33.6-147.fc13.x86_64 #1 4313CTO/4313CTO
RIP: 0010:[<ffffffffa007cd9b>]  [<ffffffffa007cd9b>] i915_gem_do_execbuffer+0x785/0xfb3 [i915]
RSP: 0018:ffff880221b37c18  EFLAGS: 00010286
RAX: 000000000deff000 RBX: ffffc90011a4cbe0 RCX: ffffc90020300000
RDX: ffff88021ab07480 RSI: ffffc90020300004 RDI: ffff88022e390000
RBP: ffff880221b37d18 R08: ffff880221b36000 R09: ffff880221b36000
R10: ffff88021ab07300 R11: ffff880000000000 R12: ffff88021ab073c0
R13: ffff8801f68327a8 R14: 0000000000000000 R15: ffff88021ab07cc0
FS:  00007fef63e83840(0000) GS:ffff880009100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc90020300004 CR3: 00000002219bd000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process Xorg (pid: 1941, threadinfo ffff880221b36000, task ffff880230fbc5f0)
Stack:
ffff880221b37c90 ffff880221d35080 0000000000000000 ffff880230fbc5f0
 ffff88021ab07300 ffff88021ab07480 ffffc90011a4cbe0 ffff88022e390000
 0000000000000000 ffff8802218f49c0 ffff88022e390000 000001dfa007c5fe
Call Trace:
[<ffffffffa007c5fe>] ? drm_malloc_ab+0x33/0x4b [i915]
[<ffffffffa0077dd7>] ? might_fault+0x1c/0x1e [i915]
[<ffffffffa007d692>] i915_gem_execbuffer2+0xc9/0x129 [i915]
[<ffffffffa0078188>] ? i915_gem_sw_finish_ioctl+0x7f/0x8a [i915]
[<ffffffffa002b19b>] drm_ioctl+0x254/0x365 [drm]
[<ffffffffa007d5c9>] ? i915_gem_execbuffer2+0x0/0x129 [i915]
[<ffffffff811010fd>] ? do_sync_read+0xbf/0xfc
[<ffffffff8110d8cf>] vfs_ioctl+0x2d/0xa1
[<ffffffff8110de38>] do_vfs_ioctl+0x47e/0x4c4
[<ffffffff8104f0b0>] ? do_setitimer+0xbd/0x1de
[<ffffffff8110decf>] sys_ioctl+0x51/0x74
[<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
Code: 77 5c 48 8b bd 38 ff ff ff 8b 43 04 41 03 42 5c 48 89 f1 81 e6 ff 0f 00 00 48 81 e1 00 f0 ff ff 48 03 8f d8 0f 00 00 48 8d 34 31 <89> 06 45 8b 52 5c 4c 89 53 10 48 89 d7 41 ff c6 e8 72 ae ff ff 
RIP  [<ffffffffa007cd9b>] i915_gem_do_execbuffer+0x785/0xfb3 [i915]
RSP <ffff880221b37c18>
CR2: ffffc90020300004

Comment 6 Chuck Ebbert 2010-07-09 11:58:10 UTC
(In reply to comment #3)
> This kernel (the most recent 2.6.34):
> 
> http://koji.fedoraproject.org/koji/buildinfo?buildID=181064
> 
> Doesn't appear to have the Intel graphics hibernation fix. Can we get that
> applied and rebuilt? No point testing with 2.6.34 when we already know it's
> going to cause trouble without that patch.    

The fix is in CVS, but there's been no build since then. F-13 has been rebased to 2.6.34 and there will be a build soon.

Comment 7 Bojan Smojver 2010-07-11 08:09:03 UTC
(In reply to comment #6)
 
> The fix is in CVS, but there's been no build since then. F-13 has been rebased
> to 2.6.34 and there will be a build soon.    

Thank you for building 2.6.34.1-9.fc13.x86_64. I have tried to hibernate/thaw with this kernel several times and so far, so good. Image size was left at default (512 MB). Given this is an occasional problem, I'm not going to pronounce it fixed. I'm guessing this may have something to do with total memory allocated on the system over time, so I'll give it a good bashing over the next week.

PS. I also did one suspend/resume while writing this comment. So, that looks to be working still.

Comment 8 Bojan Smojver 2010-07-11 08:26:12 UTC
(In reply to comment #7)
> I have tried to hibernate/thaw
> with this kernel several times and so far, so good.

And, of course, on the next hibernate/thaw, it hung the box. Black screen with a cursor in the upper left corner and an artifact in the upper right part of the screen. Nothing in the logs. Not pingable from another machine.

So, we're back to square one with 2.6.34.1.

Comment 9 Andrew Duggan 2010-07-14 16:38:37 UTC
Sorry to be a PIA, but after the segfault fix I upgraded my Inspiron E1505 to F13 and installed the 2.6.34.1-9 PAE kernel. I am seeing something similar to what is described in this bug.
It is quite random: sometimes I can get up to 14 hibernate/thaw cycles, other times not even 1. Other than this one comment, I'll keep quiet.

Here is the ABRT report. The call trace is a little different, so just ignore it if this really is a different problem.

BUG: unable to handle kernel paging request at a5e89046
IP: [<f7f8ec27>] drm_mode_getconnector+0x295/0x2b9 [drm]
*pdpt = 0000000036091001 *pde = 0000000000000000 
Oops: 0002 [#1] SMP 
last sysfs file: /sys/devices/pci0000:00/0000:00:1c.0/0000:0b:00.0/ssb0:0/ieee80211/phy0/rfkill1/uevent
Modules linked in: aes_i586 aes_generic coretemp ipv6 cpufreq_ondemand acpi_cpufreq fuse uinput arc4 snd_hda_codec_idt ecb snd_hda_intel snd_hda_codec b43 snd_hwdep snd_seq snd_seq_device snd_pcm mac80211 snd_timer cfg80211 snd b44 ssb dell_laptop soundcore dell_wmi i2c_i801 iTCO_wdt snd_page_alloc iTCO_vendor_support rfkill wmi mii sdhci_pci sdhci mmc_core joydev microcode dcdbas firewire_ohci firewire_core crc_itu_t i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: kvm]
Pid: 1516, comm: Xorg Not tainted 2.6.34.1-9.fc13.i686.PAE #1 0KD882/MM061                           
EIP: 0060:[<f7f8ec27>] EFLAGS: 00013293 CPU: 1
EIP is at drm_mode_getconnector+0x295/0x2b9 [drm]
EAX: f36d313b EBX: 00000001 ECX: 00000003 EDX: f36d3e7c
ESI: f69d4000 EDI: f8032b74 EBP: f36d3e60 ESP: f36d3dec
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process Xorg (pid: 1516, ti=f36d2000 task=f3df5940 task.ti=f36d2000)
Stack:
000000d0 f69d7688 000a3e0c f69d4154 0000033b 00000003 00000001 f36d3e7c
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Call Trace:
[<f7f85ba3>] ? drm_ioctl+0x23c/0x31d [drm]
[<f7f8e992>] ? drm_mode_getconnector+0x0/0x2b9 [drm]
[<c04c64a7>] ? get_swap_bio+0x3b/0x6b
[<c057f0e8>] ? file_has_perm+0x8c/0xa6
[<c04c64a7>] ? get_swap_bio+0x3b/0x6b
[<c04e485d>] ? vfs_ioctl+0x2c/0x96
[<f7f85967>] ? drm_ioctl+0x0/0x31d [drm]
[<c04e4df3>] ? do_vfs_ioctl+0x488/0x4c6
[<c04c64a7>] ? get_swap_bio+0x3b/0x6b
[<c057f38c>] ? selinux_file_ioctl+0x43/0x46
[<c04c64a7>] ? get_swap_bio+0x3b/0x6b
[<c04e4e77>] ? sys_ioctl+0x46/0x66
[<c04c64a7>] ? get_swap_bio+0x3b/0x6b
[<c0408cdf>] ? sysenter_do_call+0x12/0x28
[<c04c64a7>] ? get_swap_bio+0x3b/0x6b
[<c04c64a7>] ? get_swap_bio+0x3b/0x6b
Code: 00 74 17 e8 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 28 eb 05 bf f2 ff ff <ff> 8b 45 90 e8 a5 1f 81 c8 89 f8 8b 55 f0 65 33 15 14 00 00 00 
EIP: [<f7f8ec27>] drm_mode_getconnector+0x295/0x2b9 [drm] SS:ESP 0068:f36d3dec
CR2: 00000000a5e89046

Comment 10 Bojan Smojver 2010-07-20 00:46:01 UTC
Just tried 2.6.34.1-20.fc13.x86_64 (http://koji.fedoraproject.org/koji/buildinfo?buildID=184570). Performed hibernate/thaw 10 times. Each time on thaw, I started a different program in Gnome - no segfaults. After that, I rebooted the machine into the old kernel and ran file system check. No errors.

So, looks like that other hibernation fix made a difference.

Please push to testing, so that more folks can verify.

Comment 11 Bug Zapper 2011-06-01 14:23:03 UTC
This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '13'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 13's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 13 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 12 Bug Zapper 2011-06-29 13:47:39 UTC
Fedora 13 changed to end-of-life (EOL) status on 2011-06-25. Fedora 13 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.