Bug 537494

Summary: With compiz and KMS enabled, hibernate/thaw doesn't work on Dell Inspiron 6400 (Intel Corporation Mobile 945GM/GMS/GME, 943/940GML Express Integrated Graphics Controller)
Product: [Fedora] Fedora
Reporter: Bojan Smojver <bojan>
Component: xorg-x11-drv-intel
Assignee: Adam Jackson <ajax>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: medium
Version: 13
CC: ajax, awilliam, corsac, dougsland, gansalmon, HarrietSeverino, itamar, kernel-maint, lists, manisandro, mishu, redhat, sergio, theholyettlz, xgl-maint
Keywords: Reopened, Triaged
Hardware: All
OS: Linux
Fixed In Version: kernel-2.6.33.6-147.fc13
Doc Type: Bug Fix
Last Closed: 2010-07-08 14:24:06 EDT
Attachments:
Description Flags
Output of lspci -vv
none
Duggan lspci -vv
none
Photo of a crash upon suspend, resume, hibernate, thaw cycle
none
X log from F-13: lockup on resume
none
F-13 messages showing segfaults after a few hibernate/thaw cycles
none
F-13 Xorg log file showing inability of X to run after a few hibernate/thaw cycles
none

Description Bojan Smojver 2009-11-13 15:46:54 EST
Created attachment 369484 [details]
Output of lspci -vv

Description of problem:
Here is the matrix of what works and what doesn't on this machine with F-12:

1. KMS + compiz: works, but hibernate/thaw doesn't
2. noKMS + compiz: desktop becomes unbearably slow (opening a menu takes 30 sec)
3. KMS + metacity: works
4. noKMS + metacity: works

I have also tried with PAE kernel. That's similar. Essentially, after about 3 hibernate/thaw cycles, there is a blank or flickering screen and the machine is hung (no ping, Ctrl + Alt + Fn does nothing etc.)

Version-Release number of selected component (if applicable):
2.6.31.5-127.fc12.i686

How reproducible:
After a few hibernate/thaw cycles.

Steps to Reproduce:
1. Run KMS + compiz on this hardware.
2. Hibernate/thaw several times.
  
Actual results:
Machine freezes on thaw.

Expected results:
A variant of this used to work on F-11. With KMS disabled, compiz ran fine. Now that's unusable too (desktop becomes terribly slow).

Additional info:
Output of lspci -vv attached.
Comment 1 Andrew Duggan 2009-11-13 16:25:47 EST
History:

F9 on this hardware ran compiz and hibernate/thaw great. Fast and reliable.
F10 on this hardware only ran compiz at speed if the F9 intel Xorg driver was used.
F11 on this hardware panics on thaw with KMS and compiz.  
F12 on this hardware same as F11 except compiz is so slow as to be unusable without KMS.

Guess I'll camp on this bug too. I guess my question is if ANY Intel graphics chips are usable with KMS/Compiz and hibernate/thaw. I wonder if F12 will run compiz (with enough speed to be usable) and hibernate/thaw on any currently available laptop hardware. 

When does the KMS stuff hit the mainline kernel?  Maybe then a kernel.org bz as a regression can finally be opened.  Torvalds shouldn't take any KMS stuff for intel until they fix this.   At least he gives some lip service to the idea that regressions are really very bad.
Comment 2 Bojan Smojver 2009-11-13 16:43:01 EST
(In reply to comment #1)
 
> F12 on this hardware same as F11 except compiz is so slow as to be unusable
> without KMS.

Yep, I'm back to metacity (which pegs Xorg CPU usage quite a bit) on both of my 6400 machines. Booting every time from scratch and then starting dozens of apps is just too tedious.
Comment 3 Bojan Smojver 2009-11-13 19:08:54 EST
(In reply to comment #0)

> 3. KMS + metacity: works

Actually, I'm exaggerating here. On one of the machines even this causes trouble. Flickering screen on thaw or dead machine.
Comment 4 Bojan Smojver 2009-11-23 01:07:51 EST
Anything in 2.6.31.6-134.fc12 worth testing regarding this?
Comment 5 Bojan Smojver 2009-11-23 01:11:40 EST
Or maybe 2.6.31.6-145.fc12?
Comment 6 Andrew Duggan 2009-11-27 10:53:58 EST
Here is some nice irony:

http://www.redhat.com/archives/fedora-devel-list/2009-November/msg02020.html
http://www.redhat.com/archives/fedora-devel-list/2009-November/msg02023.html

These are both by Airlie. I know it would be a lot easier to believe that if this bug did not exist.

With this bug we're only assured of what he labels a) and b), even though their goal, by stating that's what they're being paid for, goes a) through d) inclusive.

It would also be a lot easier to believe if it weren't for the fact that this bug is a signal of an increasing level of regression from F9 through present, and that Intel Graphics chips were what everyone said to buy if you wanted good support in Fedora.  

Longing for the good old days, when I could recommend Linux and esp Fedora to people.
Comment 7 Bojan Smojver 2009-11-27 18:35:30 EST
(In reply to comment #6)
> Longing for the good old days, when I could recommend Linux and esp Fedora to
> people.

Personally, I was hoping KMS + compiz + hibernate/thaw would finally work in F-12. Instead, I get no compiz at all if I use hibernate/thaw. Clearly a regression.

Some people say "don't use hibernate/thaw or suspend/resume". I find that to be terribly inconvenient. I don't want to be starting all these apps every time - it's a waste of time. I'd rather not have 3D graphics, which turns my desktop into a '90s thing, but at least it works.

PS. On the topic of hibernate/thaw, a long time ago I was trying to convince kernel folks to merge Suspend2/TuxOnIce into the kernel. They were telling me that better hibernate/thaw and suspend/resume support was coming. All these years later, we still have a shitty "blank screen, no progress bar" swsusp in F-12 and as far as I can see that uswsusp thing is as clumsy as it always was. Just ranting...
Comment 8 Bojan Smojver 2009-11-30 23:04:43 EST
Aha! Workaround from bug #541670 appears to fix the slowness with noKMS + compiz for me. I'll check if the latest kernel (-156) works out of the box.
Comment 9 Matěj Cepl 2009-12-01 10:37:25 EST
(In reply to comment #8)
> Aha! Workaround from bug #541670 appears to fix the slowness with noKMS +
> compiz for me. I'll check if the latest kernel (-156) works out of the box.  

So, isn't this just a dupe? What remains specific for this bug?
Comment 10 Andrew Duggan 2009-12-01 10:56:17 EST
Well, Bojan can certainly comment, but the MAIN force of this bug is the segfaults on thaw from hibernate with KMS. So no, it is not a dupe. It is really an F12 version of bug 500983 (https://bugzilla.redhat.com/show_bug.cgi?id=500983), which received precious little love.
Comment 11 Bojan Smojver 2009-12-01 15:46:13 EST
(In reply to comment #10)
> Well Bojan can certainly comment, but the MAIN force of this bug is the
> segfaults on thaw from hibernate with KMS. So no it is not a dupe. It is really
> an F12 version of bug 500983
> (https://bugzilla.redhat.com/show_bug.cgi?id=500983), which received precious
> little love.

Correct. The regression without KMS was just a cherry on top. The underlying problem is that with KMS enabled, hibernate/thaw is unusable. Which then makes KMS unusable. At least for me.
Comment 12 Bojan Smojver 2009-12-07 15:38:57 EST
(In reply to comment #8)
> Aha! Workaround from bug #541670 appears to fix the slowness with noKMS +
> compiz for me. I'll check if the latest kernel (-156) works out of the box.  

Nope. I'm running -162 now, which has:

* Mon Nov 30 2009 Kyle McMartin <kyle@redhat.com>
- drm-i915-fix-sync-to-vbl-when-vga-is-off.patch: add (rhbz#541670)

And I still have to set sync_to_vblank to false. Otherwise, things are slow.
Comment 13 Adam Williamson 2009-12-13 12:54:56 EST
This bug would be a lot easier to deal with if there were a lot less verbiage and a lot more details. We don't have any X or kernel logs from either of you. We have no idea what hardware Andrew Duggan has. Can you please read the instructions at https://fedoraproject.org/wiki/How_to_debug_Xorg_problems and provide some of the information that's missing? Andrew, can you please explain what hardware you have and what your actual problem is? Thanks.

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers
Comment 14 Bojan Smojver 2009-12-13 16:42:05 EST
The machine hangs on resume/thaw. There is nothing special in the kernel or X.org logs. Sometimes there is a flickering screen, sometimes a blank screen. That's all I can get out of it.
Comment 15 Andrew Duggan 2009-12-13 22:03:04 EST
(In reply to comment #13)
> This bug would be a lot easier to deal with if there were a lot less verbiage
> and a lot more details. We don't have any X or kernel logs from either of you.
> We have no idea what hardware Andrew Duggan has. Can you please read the


This is not an X bug it is a kernel bug, specifically related to KMS on Intel graphics.  

Bug can be reproduced from single user.  

Symptoms for me include user space segfaults on thaw from hibernate when KMS is enabled. So the "hanging", at least in my case, is due to segfaults in userspace on thaw from hibernate, usually once the cycle count is > 1. Sometimes it takes 5 cycles before it happens, but then it does. It used to happen on every hibernate/thaw cycle. Not all of user space segfaults, but usually all new processes immediately segfault.

In short user space segfaults on thaw from hibernate with KMS on Intel Graphics.

Personally I believe this is the same bug as kernel.org bug 13811:

http://bugzilla.kernel.org/show_bug.cgi?id=13811

Did anybody ever look at RHBZ bug 500983? If you fix that one, this one will likely be fixed too. It's been going on for more than 8 months, since before F11 was released.

I have a Dell Inspiron E1505 (MM061), which is the same HW as a Dell Inspiron 6400. I'll attach an lspci -vv.

My graphics HW is:
00:02.1 Display controller: Intel Corporation Mobile 945GM/GMS/GME, 943/940GML Express Integrated Graphics Controller (rev 03)
	Subsystem: Dell Device 01bd
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Region 0: Memory at eff80000 (32-bit, non-prefetchable) [size=512K]
	Capabilities: [d0] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Comment 16 Andrew Duggan 2009-12-13 22:04:09 EST
Created attachment 378136 [details]
Duggan lspci -vv

As promised
Comment 17 Bojan Smojver 2009-12-14 00:00:38 EST
(In reply to comment #15)

> Symptoms for me include user space segfaults on thaw from hibernate, when KMS
> is enabled. So, the "hanging" at least in my case is due to segfaults in
> userspace on thaw from hibernate where cycle count usually > 1.  Sometimes it
> takes 5 cycles before it happens, but then it does.  It used to happen on every
> hibernate/thaw cycle.  Not all of user space segfaults, but usually all new
> processes immediately segfault

I occasionally see the above as well. You kinda limp back into X, but then everything is broken. Every single binary crashes etc. So, I think we are describing one and the same thing (I've seen it too in F-11 as well). And, we also have the same hardware, so it makes sense.
Comment 18 Adam Williamson 2009-12-16 19:07:13 EST
andrew: "specifically related to KMS on Intel graphics.  "

we keep KMS bugs filed on the driver components rather than the kernel at present; while technically incorrect it's much easier for the developers to track.


Comment 19 Bojan Smojver 2010-02-21 02:23:17 EST
Just a quick update here. Latest Fedora 12 kernel (2.6.32.8-58.fc12.i686) still does pretty much the same thing on this hardware. I booted, suspended, resumed, hibernated and then thawed. At that point, the system was dead. I'll attach the photo of the screen.
Comment 20 Bojan Smojver 2010-02-21 02:24:14 EST
Created attachment 395323 [details]
Photo of a crash upon suspend, resume, hibernate, thaw cycle
Comment 21 Bojan Smojver 2010-02-23 04:54:49 EST
The reason I'm getting particularly worried about this bug is because I read that as of Intel graphics driver 2.10.0 (which is in F-13) there will be no more user mode switching at all (i.e. removed completely from the source). Is that correct?

http://marc.info/?l=freedesktop-xorg-announce&m=126264406722609&w=2

If so, we will not be able to suspend/resume and hibernate/thaw on this hardware, unless something really changed in the new driver/kernel combo which is current in F-13. So, does anyone have any first hand knowledge of what the status here is?
Comment 22 Andrew Duggan 2010-02-23 08:02:58 EST
> If so, we will not be able to suspend/resume and hibernate/thaw on this
> hardware, unless something really changed in the new driver/kernel combo which
> is current in F-13. So, does anyone have any first hand knowledge of what the
> status here is?    

Just to pile on, I'm further concerned that this affects a broader range of intel hw than just the chips in our HW. (Based on the kernel.org bug I linked to above) Personally I think this bug should have been marked as a *regression* because KMS was advertised as a feature, and it broke existing functionality - Hibernate/Thaw.  

Is there any Intel Video hardware that doesn't have this bug? Can any Intel HW survive *multiple* hibernate/resume cycles with KMS enabled?  I fear not.  

F13 should be blocked on this issue, considering it broke in F11.

You first reported the same issue/bug in F11 on 2009-05-15 06:56 EDT, (500983) so we're 9 plus months with a serious regression. 

[Didn't want to clear your "Need Info" flag, so I put it back on. - Sorry]
Comment 23 Harriet Severino 2010-03-02 11:42:22 EST
I am getting a blank screen after booting into Fedora 12 on a Dell Inspiron, model PP31L, with the Intel Centrino chip. This is probably related, so I am putting it here.

I can boot into CentOS 5.3 and all is well, but on a fresh install of Fedora 12 the screen goes blank less than a minute after booting. I am happy to add info on the Fedora release, but I have just installed from the Fedora 12 (final) DVD and the system is unusable when booted into Fedora. I can get info by mounting the Fedora partition from CentOS.

Hardware info:
17" Dell Inspiron with 4GB and dual 250GB drives.
	
Quantity        Item Number     Description
1       N831M   Base,Notebook,Core Penryn T6600,2.2,1737
1       1YTJV   Module,Kit,Software,Roxio,Easy Cd & Dvd,10.3,Upsell
1       9R578   Module,Information,DHS,High, Value
1       C010J   Ship Group,Notebook,North England,1737,Dell Americas Organization
1       C635J   Module,Bezel,Cold Cathode Fluorescent Lamp,With,Camera 1737
1       CP1VF   Module,Software,W7HP64 Consumer Desktop,English Dao/bcc
1       D119D   Module,Liquid Crystal Display 17WXGA+,Cold Cathode Fluorescent Lamp,AUO
1       D850M   Module,Dual In-Line Memory Module,6G,800,DDR2,1X2G/1X4G
1       D9KVP   Module,Software,Quick Fix Engineering,W7HB32,Consumer Notebook
1       F112C   Module,Card (circuit),Network 1X2,Inspiron,Mercer Oliver Wyman,M09
1       F236N   Module,Assembly,Base,Discrete 1737,512F
1       HW426   Module,Adapter,Alternating Current,Delta - Ac Adapt,90W 3P,World Wide
1       J019D   Module,Software,DELL-DOCK Consumer
1       J206C   Module,Software,Dell Connect 2.1,Dell Americas OrganizationEMEA,Brazil Customer Center
1       J6XND   Module,Software,W7HP64,Digital Video Disk Drive,Multiple User Interface,NO-E
1       M017R   Module,Software,Powerdvd,8.3,Digital Video Disk Drive,True Theatre High Definition,Factory Install
1       M239M   Module,Software,LTG DELL-DOWNLOAD-FLAG
1       M412C   Module,Cord,Power,125V,1M,C5 E,United States
1       MT343   Module,Battery,Primary,85WHR 9C,Dynapack International Technology Corp
1       N316C   Module,Keyboard,United States English,Inspiron Lightweight Number Pad,PAO,D/E
1       N767N   Module,Hard Drive,5.4,250,#2 SGT-WYAT,XLO
1       P578F   Module,Information,Liquid Crystal Display,WXGA+,TLF,CCFLH
1       P578X   Module,Cover,Liquid Crystal Display,Paint,Blue,Black,1737
1       R164D   Module,Carrier,Hard Drive Second,Left,PACINO
1       T013C   Module,Software,Works,9 English
1       T22TT   Module,Software,Creative Camera,Consumer,1.4,Factory Install
1       T707G   Module,Media,Digital Video Disk Drive,Driver,Resource Dvd,1737
1       T708G   Service Install Module Software,Inspiron,1737
1       TG7TH   Module,Dvd+/-rw Pacino/hepburn,Hitachi Lg DataStorage,Inspiron
1       VR08H   Module,Software,Certificate Of Authenticity,W7HP32/64
1       8U335   Ship,Notebook,Inspiron Sputnik, 250N,US
1       W3HMK   Module,Label,Microsoft,Notebook,Windows7
1       W871N   Module,Software,WINDOWS-LIVE Consumer
1       WU264   Module,Hard Drive,250G,5.4K Samsung-M6,Inspiron
1       WXCT8   Module,Label,INTEL,Notebook CMT,Rebranding
1       X479D   Module,Information,With,Camera
1       X922N   Module,Card,Network,370,Latin Consumer Notebook,Dell Americas Organization
1       XJ7JR   Module,Software,Roxio,EZCD,10.3,Upsell,Factory Install
1       XM544   Module,Software,PC-RESTORE Transactional Line Of Business
1       Y373R   Module,Kit,Software,Powerdvd,8.3,True Theatre High Definition
1       Y720M   Module,Software,DSPRT-CTR 64BIT,2.0
1       YP943   Module,Palmrest,W/O FPRDR Pacino
1       950-3337        1 Year Limited Warranty
1       992-4137        Dell Limited Hardware Warranty Plus Service, Initial Year
1       950-9057        No Warranty, Year 2 and 3
1       991-3680        Mail-in Service after Remote Diagnosis , Initial Year
1       960-2780        Dell Limited Hardware Warranty 7X24 Technical Support, Initial Year
1       3E476   Information,Equipment
Comment 24 Adam Williamson 2010-03-02 11:56:50 EST
"I am getting a blank screen after booting into fedora 12 on a dell insperion,
model pp31L with the Intel Centrino chip. This is probably related, so I am
putting it here."

Why do you say it's probably related? you're using a different system and your problem is different. Please file a new bug.
Comment 25 Harriet Severino 2010-03-02 12:02:30 EST
Comment 26 Bojan Smojver 2010-03-09 19:43:32 EST
(In reply to comment #21)
> Intel graphics driver 2.10.0 (which is in F-13) there will be no
> more user mode switching

I just tried F-13 Alpha Live i686 on this hardware and as suspected nomodeset passed to the kernel renders X unusable (no screens found). When booting as-is and trying to suspend, on resume no programs would start any more, although the desktop shows up (tried only once). Errors about squashfs flash by when trying to login as user fedora on one of the text consoles.

Guys, are we heading into a serious regression land here with F-13?
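
For readers comparing the KMS and noKMS cases in this report: KMS is switched off at boot with the nomodeset kernel parameter mentioned above. A minimal Python sketch that checks a kernel command line for it follows. The spelling `i915.modeset=0` is not mentioned in this report and is included as an assumption, and the helper name `kms_disabled` is purely illustrative.

```python
def kms_disabled(cmdline: str) -> bool:
    """Return True if the kernel command line turns kernel mode setting off.

    Assumption: KMS is disabled either by the generic "nomodeset" parameter
    (the one used in this report) or by the Intel-specific "i915.modeset=0".
    """
    # Kernel parameters are separated by whitespace.
    params = cmdline.split()
    return "nomodeset" in params or "i915.modeset=0" in params
```

On a running system the string to check would come from /proc/cmdline, which holds the parameters the current kernel was booted with.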
Comment 27 Adam Williamson 2010-03-09 20:25:17 EST
If the errors are about squashfs, they're likely specific to the live boot environment (the filesystem in the live boot environment is a big squashfs). I'm not actually sure if we support suspending/resuming in the live environment, it may simply be broken by design in all cases.

If you have a spare partition, try installing the Alpha and see if it works.



Comment 28 Bojan Smojver 2010-03-09 20:35:06 EST
I will try this, eventually. However, I haven't seen anything anywhere (yet) that would convince me that suspend/resume and hibernate/thaw will work on this hardware with F-13 (i.e. under KMS). Still makes me feel worried.
Comment 29 Bojan Smojver 2010-03-10 04:00:35 EST
Things are a tiny bit better with kernel-2.6.32.9-70.fc12.i686.

I just upgraded to it in F-12 and did suspend/resume and hibernate/thaw cycles some 7 or 8 times. After that I got:
-------------------
Mar 10 19:50:42 shrek kernel: gnome-screensav[1903]: segfault at 1 ip 00c5b914 sp bfcc9b74 error 6 in libc-2.11.1.so[c09000+16f000]
Mar 10 19:50:42 shrek kernel: console-kit-dae[1336]: segfault at 1 ip 00c5b914 sp bf864be0 error 6 in libc-2.11.1.so[c09000+16f000]
Mar 10 19:54:28 shrek kernel: console-kit-dae[4714]: segfault at 1 ip 00c5b914 sp bfabe300 error 6 in libc-2.11.1.so[c09000+16f000]
Mar 10 19:54:28 shrek kernel: console-kit-dae[4873]: segfault at 1 ip 00c5b914 sp bfc011d0 error 6 in libc-2.11.1.so[c09000+16f000]
Mar 10 19:55:46 shrek kernel: ps[4970]: segfault at 1 ip 00162914 sp bfe2d3f0 error 6 in libc-2.11.1.so[110000+16f000]
-------------------

So, still not quite right. If you suspend/hibernate a few times only, you may not notice anything (I'm guessing that's what many people do). But, if you keep suspending/hibernating, eventually the problem appears.
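
The kernel's segfault lines above give the faulting instruction pointer (ip) and, in brackets, the base address and size of the mapped object. Subtracting the base from ip gives an offset into the object that can be compared across processes. A rough Python sketch of that calculation follows; the regular expression and the `fault_offset` helper are illustrative and not taken from any existing tool.

```python
import re

# Matches lines such as:
#   ps[4970]: segfault at 1 ip 00162914 sp bfe2d3f0 error 6 in libc-2.11.1.so[110000+16f000]
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offset(line):
    """Return (process name, mapped object, ip - base) or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    # All numeric fields in the message are hexadecimal without a 0x prefix.
    offset = int(m["ip"], 16) - int(m["base"], 16)
    return m["comm"], m["obj"], offset
```

For the lines above this gives offset 0x52914 in libc-2.11.1.so for every entry, including the ps process where libc is mapped at a different base, which suggests the processes are faulting at the same instruction.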
Comment 30 Adam Williamson 2010-03-10 16:07:51 EST
you could try building the current f13 kernel on f12, you may have to whack a few things to make it fly but I've done that kind of build several times in the past to test things and you can usually get it going well enough to test something. that would give you a decent indication of whether it'll work, because all the heavy lifting for suspend is done in the kernel, so the fact that the rest of the system is f12 shouldn't matter.
Comment 31 Bojan Smojver 2010-03-11 06:30:34 EST
(In reply to comment #28)
> I will try this, eventually.

As promised, I now have F-13 on one of the partitions. I installed Alpha and fully updated.

Here is what I found (this is all with compiz + KMS, of course):

1. Artefacts are left over the screen. Things like Gnome terminal do it all the time.

2. Things don't appear in windows until something moves (e.g. a Google search in FF may not appear until you start scrolling). Switching to a text console may clear things up for a bit.

3. Hibernate/thaw doesn't work at all. The hibernate process writes image to swap and turns the machine off, but thaw whizzes by that and the boot process later reports that there was an image on swap, so it's going to get reinitialised. F-12 continues to hibernate/thaw just fine on this Dell Inspiron 6400.

4. I was able to suspend a few times, without seeing segfaults (the real failures usually occur with hibernate/thaw, so this is not really a proper test). However, after the latest suspend/resume after a clean boot, X locked up. I'll attach the log.

So, not so good, actually.
Comment 32 Bojan Smojver 2010-03-11 06:31:20 EST
Created attachment 399319 [details]
X log from F-13: lockup on resume
Comment 33 Bojan Smojver 2010-03-11 07:42:05 EST
(In reply to comment #31)
> F-12 continues to hibernate/thaw just fine on this Dell Inspiron 6400.

Just to be clear - without KMS only.
Comment 34 Andrew Duggan 2010-03-11 10:16:18 EST
(In reply to comment #31)

> 
> 3. Hibernate/thaw doesn't work at all. The hibernate process writes image to
> swap and turns the machine off, but thaw whizzes by that and the boot process
> later reports that there was an image on swap, so it's going to get
> reinitialised. F-12 continues to hibernate/thaw just fine on this Dell Inspiron
> 6400.
> 

So the F13 kernel 2.6.33-1.fc13.i686 is worse than the F12 update kernel-2.6.32.9-70.fc12.i686, which at least will hibernate / thaw in the 7 times range (with KMS).  I'm getting about 7 successful thaws on 2.6.32.9-70 too.  So to sum up 2.6.33.1 is totally broken for hibernate/thaw which is worse than 2.6.32.9. 

Would you mind adding your 2.6.33.1 experience to the kernel.org bug http://bugzilla.kernel.org/show_bug.cgi?id=13811

It looks like this has moved from simple regression to regressing regression.
Comment 35 Bojan Smojver 2010-03-11 16:00:36 EST
(In reply to comment #34)
 
> Would you mind adding your 2.6.33.1 experience to the kernel.org bug
> http://bugzilla.kernel.org/show_bug.cgi?id=13811
> 
> It looks like this has moved from simple regression to regressing regression.    

I see that bug is about hibernate/thaw, which I cannot do at all with F-13. I'll add a link to comment #31 of this bug, pointing out that my experience is (for now) with suspend/resume only.
Comment 36 Adam Williamson 2010-03-11 17:04:58 EST
bojan: i'd say your f13 experience is worth a new report for f13, since it seems to have introduced new problems. thanks!



Comment 37 Bojan Smojver 2010-03-11 17:41:35 EST
(In reply to comment #36)
> bojan: i'd say your f13 experience is worth a new report for f13, since it
> seems to have introduced new problems. thanks!

Yeah, I'll open another bug report, or should I open two: one for lack of hibernate/thaw, the other for artefacts on the screen and no refresh?

PS. Incidentally, I'm going to be changing machines in the next few weeks. The new one will have Radeon graphics (most likely). So, if RH engineers want to get their hands on this hardware for a couple of weeks, I can have it brought to Sydney office. Or organise for you guys to SSH in, once I get my ducks in a row with the new machine. After that, my daughter gets it to run Windows on it.
Comment 38 Adam Williamson 2010-03-11 17:51:23 EST
Yeah, two bugs would be best.

If we could get the hardware to the Sydney office that might be great. Ajax, would that help you?



Comment 39 Bojan Smojver 2010-03-11 19:06:28 EST
(In reply to comment #38)
> Yeah, two bugs would be best.

Bug #572771 and bug #572772.

> If we could get the hardware to the Sydney office that might be great. Ajax,
> would that help you?

When I purchase and set up the new hardware, I'll let you know via this bug. If you want to have the machine for a few hours before I do that without dismantling my F-12 installation, I can drop by the North Sydney office. I'm only 10 minutes away.
Comment 40 Matěj Cepl 2010-03-15 15:27:59 EDT

*** This bug has been marked as a duplicate of bug 570517 ***
Comment 41 Matěj Cepl 2010-03-20 19:41:34 EDT
(In reply to comment #40)
> 
> *** This bug has been marked as a duplicate of bug 570517 ***    

bug 570517 comment 16
Comment 42 Bojan Smojver 2010-04-06 03:12:48 EDT
Latest koji kernel, 2.6.32.11-99.fc12.i686, gives black screen and hung machine on second thaw.
Comment 43 Bojan Smojver 2010-05-04 02:48:03 EDT
After fixing hibernate/thaw in F-13 (see bug #572771), I can now confirm that this same problem affects F-13 as well. After a few hibernate/thaw cycles, segfaults start. I'll attach messages file.

One other interesting behaviour is that after about the second thaw, I could no longer get into X. It kept restarting. So, I ran pm-hibernate from one of the text consoles. After a few more cycles, I could get back into X, but then the segfaults started. I'll attach Xorg log as well.
Comment 44 Bojan Smojver 2010-05-04 02:49:02 EDT
Created attachment 411190 [details]
F-13 messages showing segfaults after a few hibernate/thaw cycles
Comment 45 Bojan Smojver 2010-05-04 02:49:53 EDT
Created attachment 411192 [details]
F-13 Xorg log file showing inability of X to run after a few hibernate/thaw cycles
Comment 46 Bojan Smojver 2010-05-13 21:42:25 EDT
Just a quick note that I have upgraded this machine to F-13, so I won't be able to test anything related to F-12 here any more. However, the problems persist in F-13 (I pointed to the kernel bug that tracks that), so I will be able to get at least some feedback to you.

Note that if Dell get their act together and actually send me a new machine that works, this hardware will be retired, at which point I won't be able to submit any more feedback on this.
Comment 47 Matěj Cepl 2010-05-18 06:25:38 EDT
Moving to F13 then.
Comment 48 Bojan Smojver 2010-05-26 19:39:28 EDT
Maybe we should get 2.6.34 kernel for F-13 in testing:

https://bugzilla.kernel.org/show_bug.cgi?id=13811#c26
Comment 49 Adam Williamson 2010-05-26 19:56:42 EDT
you can boot the current F-14 2.6.34 kernel in F-13 fine, I'm doing this on one of my systems.



-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers
Comment 50 Bojan Smojver 2010-05-26 20:11:05 EDT
OK, I'll give that a go and report back.
Comment 51 Bojan Smojver 2010-05-26 20:32:55 EDT
Nope, that's still no good:

----------------------------
May 27 10:24:24 shrek kernel: 99video[3780]: segfault at 2400aaaa ip 0806b9db sp bf97bc80 error 6 in bash[8047000+d1000]
May 27 10:24:24 shrek kernel: 99hd-apm-restor[3781]: segfault at 2400aaaa ip 0806b9db sp bfdc0f00 error 6 in bash[8047000+d1000]
May 27 10:24:24 shrek kernel: 98smart-kernel-[3782]: segfault at 2400aaaa ip 0806b9db sp bfcea8a0 error 6 in bash[8047000+d1000]
May 27 10:24:24 shrek kernel: 95packagekit[3783]: segfault at 2400aaaa ip 0806b9db sp bf871b20 error 6 in bash[8047000+d1000]
May 27 10:24:24 shrek kernel: 95led[3784]: segfault at 2400aaaa ip 0806b9db sp bfa80e10 error 6 in bash[8047000+d1000]
May 27 10:24:24 shrek kernel: 94cpufreq[3785]: segfault at 2400aaaa ip 0806b9db sp bfd8b6b0 error 6 in bash[8047000+d1000]
May 27 10:24:24 shrek kernel: 90clock[3786]: segfault at 2400aaaa ip 0806b9db sp bf991bd0 error 6 in bash[8047000+d1000]
May 27 10:24:24 shrek kernel: 75modules[3787]: segfault at 2400aaaa ip 0806b9db sp bf9bb510 error 6 in bash[8047000+d1000]
May 27 10:24:24 shrek kernel: 56dhclient[3788]: segfault at 2400aaaa ip 0806b9db sp bfd977b0 error 6 in bash[8047000+d1000]
May 27 10:24:24 shrek kernel: 56atd[3789]: segfault at 2400aaaa ip 0806b9db sp bfd4bd40 error 6 in bash[8047000+d1000]
May 27 10:26:11 shrek kernel: rc[3838]: segfault at 2400aaaa ip 0806b9db sp bfc8f4e0 error 6 in bash[8047000+d1000]
May 27 10:26:12 shrek kernel: Default[3840]: segfault at 2400aaaa ip 0806b9db sp bf92fd00 error 6 in bash[8047000+d1000]
May 27 10:26:28 shrek kernel: shutdown[3837]: segfault at bf9cea9f ip 00d51518 sp bf9c31d4 error 6 in libnss_files-2.12.so[d4b000+c000]
May 27 10:26:37 shrek kernel: rc[3856]: segfault at 2400aaaa ip 0806b9db sp bfe51750 error 6 in bash[8047000+d1000]
----------------------------

That happened on the 4th hibernate/thaw cycle.

BTW, bug #593669 is still in 2.6.34-11.fc14.
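
In the excerpt above, every fault inside bash reports the same fault address (2400aaaa) and the same instruction pointer (0806b9db), regardless of which process hit it. A minimal sketch for tallying that from a messages file follows. The function name `summarize_segfaults` and the grouping key are illustrative choices, and the regular expression assumes the line layout shown in the excerpt.

```python
import re
from collections import Counter

# Matches kernel "segfault at" lines in the layout seen in the excerpt above.
# The pattern is an assumption based on those lines, not a general syslog parser.
SEGFAULT_RE = re.compile(
    r"kernel: (?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<obj>[^\[\s]+)\["
)


def summarize_segfaults(lines):
    """Count segfaults per (fault address, instruction pointer, mapped object)."""
    counts = Counter()
    for line in lines:
        match = SEGFAULT_RE.search(line)
        if match:
            counts[(match["addr"], match["ip"], match["obj"])] += 1
    return counts
```

Passing an open messages file (for example `open("/var/log/messages", errors="replace")`) to `summarize_segfaults` and printing `most_common()` on the result shows at a glance whether unrelated processes are all faulting at one location.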
Comment 52 Bojan Smojver 2010-06-24 23:09:20 EDT
Anything here that can help us?

http://intellinuxgraphics.org/2010Q2.html
Comment 53 Bojan Smojver 2010-07-02 23:58:46 EDT
Holy Batman! Using 2.6.33.6-142.rc1.fc13.i686 from koji (http://koji.fedoraproject.org/koji/buildinfo?buildID=181106), I was able to hibernate 10+ times and everything still works. Even had a suspend thrown in the mix once, for good measure.

I'm not going to pronounce this fixed yet (because I'm really paranoid about this one), but it's looking good so far!
Comment 54 Yves-Alexis Perez 2010-07-03 12:47:42 EDT
(In reply to comment #53)
> Holy Batman! Using 2.6.33.6-142.rc1.fc13.i686 from koji
> (http://koji.fedoraproject.org/koji/buildinfo?buildID=181106), I was able to
> hibernate 10+ times and everything still works. Even had a suspend thrown in
> the mix once, for good measure.
> 
> I'm not going to pronounce this fixed yet (because I'm really paranoid about
> this one), but it's looking good so far!    

Any idea which patch fixed that?
Comment 56 Fedora Update System 2010-07-07 03:21:04 EDT
kernel-2.6.33.6-147.fc13 has been submitted as an update for Fedora 13.
http://admin.fedoraproject.org/updates/kernel-2.6.33.6-147.fc13
Comment 57 Yves-Alexis Perez 2010-07-07 04:22:18 EDT
And has the patch been submitted upstream?
Comment 58 Andrew Duggan 2010-07-07 07:33:58 EDT
(In reply to comment #57)
> And has the patch been submitted upstream?

Yes - 
From 0121d50088a9e04f3bbbee14043cd89164bdf4e6 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 2 Jul 2010 09:56:19 +1000
Subject: [PATCH] drm/i915: fix hibernation since 4bdadb9785696439c6e2b3efe34aa76df1149c83
Comment 59 Fedora Update System 2010-07-07 13:43:31 EDT
kernel-2.6.33.6-147.fc13 has been pushed to the Fedora 13 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update kernel'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/kernel-2.6.33.6-147.fc13
Comment 60 Bojan Smojver 2010-07-08 09:09:24 EDT
Hmm, I tested hibernate twice with -147 on my Dell Inspiron 6400 and that was OK. I'll test more.

What worries me is that my ThinkPad T510 with Intel graphics now hangs every time on thaw after hibernate. I am not sure if this is something to do with Intel graphics, but it's annoying nevertheless. Nothing in logs that would make it obvious. There was this iwlagn bug some time ago that used to do this kind of thing, but that was fixed, AFAIK...
Comment 61 Andrew Duggan 2010-07-08 10:15:11 EDT
I have an E1505, which is the same as the 6400, but am still running F12. The patch in question should be the fix for 2.6.32 and probably back to at least .30, since this bug is 14 months old.

Nevertheless, I took the latest F12 kernel from koji as of the 3rd of July, rediffed the patch, applied it and rebuilt the kernel. Basically I see the same sort of hang every 2nd to 4th time as well. The segfaults are completely gone, but the end result is not really better.

Here is what I saw in ABRT 

BUG: unable to handle kernel NULL pointer dereference at 0000008c
IP: [<f7f85a9d>] drm_ioctl+0x2a3/0x2fa [drm]
*pdpt = 00000000359da001 *pde = 000000003da40067 
Oops: 0002 [#1] SMP 
last sysfs file: /sys/power/state
Modules linked in: aes_i586 aes_generic coretemp ipv6 cpufreq_ondemand acpi_cpufreq fuse dm_multipath uinput snd_hda_codec_idt arc4 snd_hda_intel ecb snd_hda_codec b43 snd_hwdep mac80211 b44 snd_seq cfg80211 ssb iTCO_wdt snd_seq_device iTCO_vendor_support i2c_i801 snd_pcm mii sdhci_pci sdhci snd_timer snd mmc_core dell_wmi wmi soundcore snd_page_alloc dell_laptop rfkill joydev dcdbas firewire_ohci firewire_core crc_itu_t i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: kvm]
Pid: 1539, comm: Xorg Not tainted (2.6.32.14-151.x.arnor.fc12.i686.PAE #1) MM061                           
EIP: 0060:[<f7f85a9d>] EFLAGS: 00010246 CPU: 1
EIP is at drm_ioctl+0x2a3/0x2fa [drm]
EAX: f6bfde78 EBX: f68ef000 ECX: 000000d2 EDX: ffffffff
ESI: 00000060 EDI: 00000000 EBP: f6bfdf08 ESP: f6bfde64
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process Xorg (pid: 1539, ti=f6bfc000 task=f6bb4c80 task.ti=f6bfc000)
Stack:
bfceba9c f8014b9b f803b690 40046460 f6bfde78 000000d2 f6960450 c19915e0
 00000163 f6bfdeb0 c042be51 f7097000 80000000 f6bfdea4 c0446302 e11c4ea0
 f6bfdec0 c04f3117 c1dd7d60 f0b21980 e11c4f50 80000000 c1dd7d60 f6bfdf14
Call Trace:
[<f8014b9b>] ? i915_gem_sw_finish_ioctl+0x0/0x7b [i915]
[<c042be51>] ? kmap_atomic_prot+0x10f/0x111
[<c0446302>] ? current_fs_time+0x1b/0x1e
[<c04f3117>] ? file_update_time+0x34/0xd2
[<c04c304e>] ? __do_fault+0x3ba/0x3f4
[<c058fb84>] ? file_has_perm+0x89/0xa3
[<f7f857fa>] ? drm_ioctl+0x0/0x2fa [drm]
[<c04ee52a>] ? vfs_ioctl+0x1d/0x76
[<c04eeac4>] ? do_vfs_ioctl+0x493/0x4d1
[<c058fe28>] ? selinux_file_ioctl+0x43/0x46
[<c04eeb48>] ? sys_ioctl+0x46/0x66
[<c040907b>] ? sysenter_do_call+0x12/0x28
Code: ff 3f 00 00 e8 5f 16 64 c8 85 c0 74 05 bf f2 ff ff ff 8d 85 70 ff ff ff 39 85 6c ff ff ff 74 0b 8b 85 6c ff ff ff e8 05 18 55 c8 <f0> ff 4e 2c 85 ff 74 1a 57 68 4f 31 f9 f7 68 14 22 f9 f7 68 07 
EIP: [<f7f85a9d>] drm_ioctl+0x2a3/0x2fa [drm] SS:ESP 0068:f6bfde64
CR2: 000000000000008c

The ONLY difference between the kernel I am running and 2.6.32.14-127.fc12.i686.PAE is the application of the patch below which is the fix for the segfaults on resume problem.

and my rpmbuild command line is:

rpmbuild -ba --without smp --without up --without kdump --without debug \
        --without debuginfo --with firmware --target i686 kernel.spec 

[So that it only builds the PAE kernel & firmware]

FWIW here is the patch rediffed for 2.6.32.14 (that way the fuzz is 0)

--- linux-2.6.32.i686-orig/drivers/gpu/drm/i915/i915_gem.c	2010-07-03 20:18:37.000000000 -0400
+++ linux-2.6.32.i686/drivers/gpu/drm/i915/i915_gem.c	2010-07-03 20:19:48.000000000 -0400
@@ -2244,7 +2244,7 @@ i915_gem_object_get_pages(struct drm_gem
 	mapping = inode->i_mapping;
 	for (i = 0; i < page_count; i++) {
 		page = read_cache_page_gfp(mapping, i,
-					   mapping_gfp_mask (mapping) |
+					   GFP_HIGHUSER |
 					   __GFP_COLD |
 					   gfpmask);
 		if (IS_ERR(page))

Since the latest f12 kernel 2.6.32.16-141 in Koji built yesterday didn't include this patch, I'm doing the same thing just to make sure, but while it builds, I thought I would pile on.... Of course, you can all write this off as just noise, but clearly there are still i915_gem bugs that don't interoperate with hibernate and resume, and it is likely they are in at least 2.6.33 and 2.6.32.

Right now my thinking is that this bug has, for the past 14 months, been masking what we are now seeing.
Comment 62 Fedora Update System 2010-07-08 14:22:51 EDT
kernel-2.6.33.6-147.fc13 has been pushed to the Fedora 13 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 63 Bojan Smojver 2010-07-08 19:01:18 EDT
I just went through the hibernate/thaw cycle a dozen times on my Dell Inspiron 6400 using the 2.6.33.6-147.fc13.i686 kernel and I could not fault it. Each time I thawed, I started a different program in GNOME, switched to a text console and logged in there - it all worked. So, yeah, looks like this really is fixed for F-13 on my hardware.
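
A repeated hibernate/thaw check like the one described above could be scripted roughly as follows. This is only a sketch under stated assumptions: it assumes `pm-hibernate` from pm-utils is installed and run as root, and that new "segfault at" lines in `dmesg` are the failure signal. The names `run_cycles`, `hibernate` and `count_segfaults` are made up for illustration. It cannot detect a hard hang on thaw, and the count can be thrown off if the kernel ring buffer wraps.

```python
import subprocess


def count_segfaults():
    """Count 'segfault at' lines currently in the kernel ring buffer."""
    out = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    ).stdout
    return sum("segfault at" in line for line in out.splitlines())


def hibernate():
    """Hibernate via pm-utils; returns once the machine has thawed (needs root)."""
    subprocess.run(["pm-hibernate"], check=True)


def run_cycles(n, hibernate=hibernate, count_segfaults=count_segfaults):
    """Run up to n hibernate/thaw cycles.

    Returns the number of the first cycle after which new segfaults were
    logged, or None if all n cycles completed without any.
    """
    baseline = count_segfaults()
    for cycle in range(1, n + 1):
        hibernate()
        if count_segfaults() > baseline:
            return cycle
    return None
```

Calling `run_cycles(12)` mirrors the dozen cycles above. The two callables are parameters so the loop logic can be exercised without actually hibernating a machine.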

Now back to chasing as to why ThinkPad hangs...
Comment 64 Bojan Smojver 2010-07-08 19:24:17 EDT
(In reply to comment #63)

> Now back to chasing as to why ThinkPad hangs...    

Just for the record, updated the rest of the system, which brought in new version of iwl6000-firmware. Did hibernate/thaw several times and no hang. Hmm, curious.
Comment 65 Bojan Smojver 2010-07-08 19:52:38 EDT
(In reply to comment #64)
 
> Just for the record, updated the rest of the system, which brought in new
> version of iwl6000-firmware. Did hibernate/thaw several times and no hang. Hmm,
> curious.

Actually, it is Intel graphics, just a different problem (it surfaces more readily when default /sys/power/image_size is used on my hardware). New bug #612757.
Comment 66 Bojan Smojver 2010-09-25 22:42:54 EDT
Although I don't run Fedora on this machine any more, I run it on another machine with Intel graphics. And it looks like hibernation still causes trouble, just of a slightly different kind, see bug #635868. Looks like we've been celebrating too early :-(
Comment 67 Andrew Duggan 2010-09-25 23:22:08 EDT
I can say that on my E1505, running 2.6.34.7-56.fc13.i686.PAE on a fully updated Fedora 13, it is completely fixed. As it turns out, there seems to be another bug or bugs in the pm handling of other modules. I had to add the following to SUSPEND_MODULES in
/etc/pm/config.d/50-modules:

dell_wmi wmi dell_laptop dcdbas iTCO_wdt

FWIW, I've had these in the list dating back from FC8 

b43 ohci_hcd kvm kvm-intel

While the Dell-specific modules are not applicable to your ThinkPad, and that new bug does seem to be quite different, I can say the Dell HW still works for me. The bigger problem remains, is there any laptop HW that is currently on the market that F13/14 will hibernate/thaw?
Comment 68 Bojan Smojver 2010-09-25 23:30:44 EDT
(In reply to comment #67)
> The bigger problem remains, is there any laptop HW that is currently on
> the market that F13/14 will hibernate/thaw?

My ThinkPad T510 was OK for a while with nothing blacklisted. Then it got borked again. I've seen some talk on LKML and recent -rc kernels about more fixes coming, but I cannot be sure.

I'll wait, I guess...