Description of problem:
In FC5+6, both suspend to RAM and suspend to disk worked fine on my X41, with its i915GM. Now in F7, the laptop will suspend, but on resume it hangs after a few seconds of X activity. The display freezes, I can't change to a VT, the network appears to be dead... etc. If X is not running at all when I suspend, I can sometimes resume correctly. I haven't seen anything in /var/log/messages when this happens.

Version-Release number of selected component (if applicable):
kernel-2.6.20-1.3088.fc7.i686
pm-utils-0.99.3-1.fc7.i386
xorg-x11-drv-i810-1.6.5-19.fc7.i386

How reproducible:
Always

Additional info:
When I said "the laptop will suspend", this includes tapping various keys on the keyboard to get the new tickless kernel to wake up. Should I file another bug on this? Hibernate sometimes works; it will hang while going down, but if I start tapping keys sometimes it will finish hibernating. Other times it won't, and I have to cut power manually.
Created attachment 153351 [details] tail -f /var/log/{messages,pm-suspend.log,Xorg.0.log}
Created attachment 153455 [details] dmesg output of boot sequence with apic=debug

When running kernel-2.6.20-1.3110.fc7.i686, booting with 'noapic' seems to be a workaround. Attaching apic=debug dmesg output from boot.
Note: everything above is the result of testing with either no quirks, or just s3_bios. This is different from the default in hal-info, which is to use both vbe_post and vbestate_restore (I believe the default is wrong, and so does thinkwiki.org, but that's another issue).
Created attachment 153534 [details] dmesg containing noapic suspend/resume and soft lockup

I tested suspend/resume with kernel-2.6.20-1.3111.fc7.i686 and noapic, using the vbe quirks which are default. During general use there were periods where my load average was over 6 with an 85% idle CPU and no disk or network activity. I don't know if this is a known side-effect of noapic, but it was weird. Sometimes these slowdowns even locked up the entire system temporarily.

Attached is the dmesg output from that testing, which includes two "BUG: soft lockup detected on CPU#0!" traces at the bottom. A few minutes after the "soft lockup" messages showed up in dmesg, the entire system hung, much like it does a few seconds after resume without noapic, only this was a good hour or so after resume.

Further up in that log you can see some "<device> LATE suspend" and "<device> EARLY resume" messages; I don't know the significance of those.
Created attachment 153535 [details] xorg log from X crash

During the same boot cycle described in the previous comment, X crashed maybe 15 minutes after resuming. Here is the log from that session. I doubt if it's useful, but take a look if you want.
I realize the last two comments might be confusing. Here is what happened in chronological order:

1. boot 3111 with noapic
2. login and suspend with VBE quirks (and not S3 BIOS)
3. resume
4. notice periodic, unexplained slowdowns and load jumps (may have started before suspend/resume, not sure)
5. maybe 15 minutes later, X crashes. gdm pops up.
6. log back in
7. about an hour (and lots of the strange slowdowns) later, soft lockup occurs.
8. machine recovers, and I save dmesg output
9. immediately go to bugzilla to add this information
10. machine hardlocks (similar to previous lockups on resume)
11. reboot, login, return to bugzilla :)
Ajax thought this might be a drm problem, so I disabled it in xorg.conf. With drm disabled, even booting into runlevel 1, suspending/resuming, *then* starting X causes the system to hang.
Well, I think I've got it now. Booting without noapic but with nohz=off seems to *really* work around the problem.
Hmm, some of the dmesg outputs are really confusing:

Time: tsc clocksource has been installed.
pnp: 00:00: iomem range 0x0-0x9ffff could not be reserved
<SNIP>
checking if image is initramfs... it is
Switched to high resolution mode on CPU 0

Usually the switch happens right after the clocksource install.

Can you please boot with

nohz=off highres=off apic=verbose

on the commandline and do

# cat /proc/interrupts; sleep 10; cat /proc/interrupts

and provide the output ?
Created attachment 153631 [details] /proc/interrupts on a 10 second interval with nohz=off highres=off apic=verbose
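For anyone repeating the two-snapshot /proc/interrupts test requested above, the per-IRQ counts can be subtracted mechanically instead of by eye. The following is only a rough sketch: the function name irq_delta and the temporary file names are made up for illustration, and it assumes the usual /proc/interrupts layout, where each row starts with a label ending in ":" followed by one numeric counter column per CPU.

```shell
#!/bin/sh
# irq_delta BEFORE AFTER
# Print, for every row of the AFTER snapshot, how much its interrupt count
# grew relative to the BEFORE snapshot. (Hypothetical helper, not part of
# any package mentioned in this bug.)
irq_delta() {
    awk '
        # Sum the leading numeric (per-CPU) counter columns of the current row.
        function total(    i, sum) {
            sum = 0
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
                sum += $i
            return sum
        }
        $1 !~ /:$/ { next }                          # skip the CPU header line
        NR == FNR  { before[$1] = total(); next }    # first file: remember counts
                   { printf "%-6s %d\n", $1, total() - before[$1] }
    ' "$1" "$2"
}

# Example use (file names are arbitrary):
#   cat /proc/interrupts > /tmp/irq.before
#   sleep 10
#   cat /proc/interrupts > /tmp/irq.after
#   irq_delta /tmp/irq.before /tmp/irq.after
```

A timer row whose delta stays near zero over the 10 seconds would be the thing to look for here; which row carries the tick depends on the boot options in use.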
Looks sane. When the hang happens, is the machine still responding to SysRq ? If yes, the output of sysrq-t and sysrq-q would probably be helpful.
It's hard to say if it responds; this only happens when I'm on X's VT. How would I get that output saved? My laptop has no serial port. (and I've never done the serial console thing before)
Dave, how close is kernel-2.6.20-1.3088.fc7.i686 to the hrtimer/dyntick code in 2.6.21 ?
Thomas, this is still a problem with kernel-2.6.21-1.3116.fc7.i686 which is 2.6.21 final according to its changelog. I guess there could theoretically be patches to that code in our packages, though.
Zack, are you still tapping keys to get the box out of suspend / hibernate or did this change at least after the update to 2.6.21 ?
From RAM, no. I'll test disk later today since I can't remember.
Ok, can you please add "nolapic_timer" to the command line and try again ? If my suspicion is correct, then your box should freeze right after resume.
OK, so this seems a little odd. I'd been booting with nohz=off to work around this problem, and it's been working fine. I rebooted with nolapic_timer (and without nohz=off) and this appears to work around the problem also. I'm so far not seeing the strangeness that I'd seen with noapic either. No freezes yet, and it's been over 20 minutes since I resumed.
Hmm, I'm getting more confused. That's quite the contrary to the behaviour which I expected. Before you switched to 2.6.20 (+hres/dyntick) was it necessary to have noapic on the command line ? If yes, then it looks more like an apic problem, but the confusing thing is why does this only hurt after resume.
I don't recall ever having to use noapic before. The last kernels I used before 2.6.20 were the FC6 kernels. Also, to be clear, noapic didn't seem to work around this problem properly, whereas nolapic_timer does.
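Since several boot options have been tried across these comments (noapic, nohz=off, highres=off, nolapic_timer), it is easy to lose track of which ones a given test boot actually used. A small sketch for recording that alongside each test result follows; the function name timer_workarounds is hypothetical, and the option list is just the set mentioned in this bug.

```shell
#!/bin/sh
# timer_workarounds [CMDLINE]
# Report which of the timer/APIC boot options discussed in this bug appear
# on a kernel command line. With no argument, the running kernel's
# /proc/cmdline is used. (Hypothetical helper for illustration only.)
timer_workarounds() {
    cmdline=${1-$(cat /proc/cmdline)}
    for opt in noapic nolapic_timer nohz=off highres=off; do
        # Pad with spaces so only whole, space-separated options match.
        case " $cmdline " in
            *" $opt "*) echo "$opt: set" ;;
            *)          echo "$opt: not set" ;;
        esac
    done
}

# Example use: append the result to the notes for the current test boot.
#   timer_workarounds >> ~/suspend-test-notes.txt
```

Matching on whole words keeps an option such as noapic from being reported just because a longer parameter happens to contain the same letters.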
Looks like this got fixed somehow, and replaced with a different regression.