Bug 144415
Summary: | kernel-2.6.9-1.724_FC3 breaks APM suspend on Thinkpad | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Matthew Saltzman <mjs> |
Component: | kernel | Assignee: | Dave Jones <davej> |
Status: | CLOSED CANTFIX | QA Contact: | |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 3 | CC: | andrejoh, balay, barryn, david, ed, edgar.hoch, egcp, emmanuel.druon, gneeki, imc, jhmail, joe.christy, lance, nayfield, p.dalgaard, pfrields, roessler, tvfischer, typrase, wtogami |
Hardware: | i686 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Last Closed: | 2005-09-30 10:34:44 UTC | Type: | --- |
Description
Matthew Saltzman
2005-01-06 20:15:16 UTC
I've encountered this issue with kernel-2.6.10-1.1063_FC4 from rawhide (on FC3/T40).

I'm seeing *exactly* the same problem (including the flashing caps-lock) on my ThinkPad A22p (2629-Y1U). It happens frequently, but not on every suspend-resume cycle. I suspect that the bug was introduced between 2.6.9-1.681_FC3 and 2.6.9-1.724_FC3 because the latter is the first kernel that triggered the problem.

I see the same effect on my workstation; .681 works fine, .724 hangs on resume, and the lights on the keyboard just blink. (HW: Athlon Thunderbird on an ABit KT7A motherboard.)

kernel-2.6.10-1.735_FC3 is still broken (on T40).

This bug is a major problem also for people using SWSUSP2. There ACPI suspend to disk has the same problem on resume. This is with vanilla 2.6.10 and swsusp2 patch. Seems to be dominantly ThinkPads, I have a T42 but it is affecting others too.

Re: comment #5
Try 2.6.10 with the following two patches:
http://www.ussg.iu.edu/hypermail/linux/kernel/0501.0/0284.html
http://www.ussg.iu.edu/hypermail/linux/kernel/0501.0/0548.html
(or you can run 2.6.10-mm2 instead of 2.6.10 + the two patches)
I have no idea if these patches are compatible with swsusp2, so you may need to test without swsusp2 if that's possible.

So far, things seem to be working with latest release kernel-2.6.10-1.737_FC3.

All my crashes happened with overnight suspend. I've had 736 crash on me - and it doesn't look very different than 737 (+ debugging). Will try 737 tonight and see what happens..

Ok - 737 also crashes.. Not sure how 'overnight' suspend is different from 'short' suspend of minutes/1hour. There is a note about the bios setting "Hibernate after suspend expires -> Enabled" causing this problem - but I don't have this enabled. So something else (which must be similar) is causing this problem.
http://mailman.linux-thinkpad.org/pipermail/linux-thinkpad/2005-January/022832.html

It's true, kernel-2.6.10-1.737_FC3 still hangs on resume (this time after about 3 hours in suspension). Interestingly, the caps lock LED doesn't flash now.

Probably to nobody's surprise, kernel-2.6.10-1.1076_FC4 also fails as above.

I tried using 2.6.10-ac8 - and it survived a 6 hour suspend. I guess I'll have to wait and see if it can survive a few more long suspends before assuming this version is free of this bug.. Basically I grabbed http://kernel.org/pub/linux/kernel/people/alan/linux-2.6/2.6.9/RPM/SRPMS.kernel/kernel-2.6.10-0.ac7.1.src.rpm and rebuilt it with 'ac-8' patch.

ok - 2.6.10-ac8 survived another 6hour suspend.

Just curious, is there anything in the comments for -ac8 that indicate they fixed something related to this? Or any other differences between -ac* and the -bk* series they seem to use for FC?

I just realized I was using ac7 not ac8. (error building kernel rpm on my part) And kernel-2.6.10-1.737_FC3 is supposed to be 2.6.10-ac8 + additional patches - so I'm guessing the bug is in some of the fedora specific patches in 737. (I'm assuming differences in ac7 vs ac8 don't affect this bug)

Created attachment 109680 [details] Oops - init
Can confirm that kernel-2.6.10-1.737_FC3 with swsusp2 2.1.5.12 displays a similar problem on my Thinkpad T40 - Oops on init, capslock flashing, system still functional but init taking >90% cpu and unresponsive to telinit, reboot etc. Have tried to apply the patches at comment #6, which I can get to apply cleanly with some line number changes, but they make no difference - same symptoms. I know swsusp2 isn't part of FC3 - but oops attached (above! oops indeed) if of help to anyone.

Would you mind trying again without swsusp2, with just the kernel's built-in swsusp1 + the comment #6 patches? I'm not suggesting this as a permanent change, just as another test to gather more information about the problem.

OK, I've done so - complete recompile from SRPM, without and with the patches above. Note that I haven't tried swsusp1 with 2.6.10, as I've been happily using swsusp2 - both with and without patches worked fine for me for suspend to ram and disk on a Thinkpad T40. Both messed up X on resume, which I can't recall how to fix in swsusp1 - been using swsusp2 for too long. I'll attach versions of the two patches above reformatted and line-numbered for 2.6.10 plain kernel, if they're helpful to someone here. Still working on swsusp2 and my oops above - my init-in-a-spin problem may be different (or only vaguely related) to the main problem here.

Created attachment 109723 [details] patch1 for 2.6.10 - see comment #6

Created attachment 109724 [details] patch2 for 2.6.10 - see comment #6

Problem persists with kernel-2.6.10-1.741_FC3, even though that is supposed to include -ac9 patches. A one-hour suspend produced a hard freeze on resume.

Re: comment #22
It's always possible that the problem is being introduced by one of the FC3 patches...

*** Bug 145203 has been marked as a duplicate of this bug. ***

Just an update.. I haven't seen this problem yet since moving to the 'ac' kernel builds. Currently using ac9. (Earlier I've used ac7, ac8)

Problem is not limited to the Thinkpad. I have a Dell Latitude C840 and I have an identical problem. Kernel 2.6.10-1.741_FC3 still has the problem with this laptop too.

Confirmed. 741_FC3 locks up my T41 Thinkpad.

I used to use 2.2.x on a ThinkPad 380 and APM worked perfectly. Since I upgraded to a ThinkPad T22 with Fedora Core 2 I've had only limited success (see for example bug 13095). ACPI seemed a non-starter when I first tried it, so I don't know whether it has improved since then. On upgrading from 2.6.9-1.6_FC2 to 2.6.10-1.9_FC2 it appeared that most of the issues had been resolved as the machine did a number of textbook suspend/resume cycles in situations that hadn't worked before. However, I then discovered that suspending for more than a few minutes reliably kills the machine. There are two possible results, which seem to occur at random: in one the machine seems completely dead although SysRq can be used to reboot it; in the other the caps-lock and scroll-lock LEDs flash and if I jiggle the SysRq key enough I can get it to print stuff on the screen - though none of the actions work except reboot. I tried booting with serial console enabled and logging what came through on another machine, but I didn't get anything at all after the machine was suspended. Anyway, to get to the actual point of this comment, I did copy down the results of SysRq's "show PC" function in the hope that it would be useful. The appearance of RTC functions in the traceback does seem to tally with the idea that it's dependent on how long the machine was suspended for. I hope there aren't too many transcription errors in the output, which I'll attach in just a moment.

Created attachment 110004 [details] "Show PC" output after resume has crashed the machine
See comment 28.

Just adding a "me too" for my T40 (2373-94U). Short APM suspends recover fine, but longer ones lead to a kernel panic.

Just an extra data point.
If I switch to a VC before I do a long suspend, when I resume, I get a kernel panic on the screen. This is some of what I see:
====================
Warning: CPU frequency is 16000000, cpufreq assumed 600000 kHz.
Kernel panic - not syncing: arch/i386/kernel/time.c:178: spin_lock(arch/i386/kernel/time.c:c0342be8) already locked by arch/i386/kernel/time.c/310
Badness in panic at kernel/panic.c:117
[what looks like a stack trace]
====================
This is a 1.6GHz Pentium M with SpeedStep enabled, but it went from an AC connection to another AC connection, so it shouldn't have down-shifted to 600MHz. Then again, that warning might just be a red herring.

Created attachment 110348 [details] text of kernel panic on resume from APM suspend

I'm duplicating comment 31 but without the initial cpufreq complaint. The attached is the entire text of the kernel panic which appears on screen after resuming the machine. In this case the caps and scroll lock lights were not flashing (and I have another failure almost exactly the same but with some minor differences at the bottom of the trace).

Thanks to comment 31 I've been able to switch to VT-1 before suspend (This is with 2.6.10-1.753_FC3 on a thinkpad 600E - 366MHz P-II). In my case - the caps & scroll-lock lights blink. I get the same stack trace as the attachment in comment 32.. Something like (written down manually):
[<c0112dc0>] suspend+0x3c6/0x513
do_ioctl recalc_task_prio sys_ioctl syscall_call
Badness in i8042_panic_blink at drivers/input/serio/i8042.c:917
i8042_panic_blink panic set_rtc_mmss timer_interrupt handle_IRQ_event __do_IRQ do_IRQ
=============================================
common_interrupt get_cmos_time timer_resume sysdev_resume device_power
[few more lines which I couldn't write down - as the screen went blank]

My T41 survived a 2-hour suspend using kernel-2.6.10-1.1115_FC4 from Rawhide. Unfortunately, that kernel's built with gcc-3.4 so I can't build VMware modules against it.
And it has too many Rawhide dependencies for me to rebuild myself. But it still does *not* work with kernel-2.6.10-1.760_FC3, even though it is rebased to 2.6.10-ac11. So still no completely functional kernel on my Thinkpad.

Based on what I'm reading here, I suspect the problem is being triggered by one of the kernel-2.6.10-1.xxx_FC3 patches, aside from the -ac patch. Unfortunately I haven't been able to reproduce this on any of my hardware (I don't have an IBM ThinkPad), otherwise I would be able to narrow things down more.

Anything else we can do to help test, let us know. BTW, the devel kernel-2.6.10-1.1115_FC4 seems to work fine WRT suspend-to-RAM, although it's not a complete solution for other reasons.

Here's a strange thing... my 2.6.9-1.6_FC2 kernel is now printing spin_lock messages in the syslog. My message log begins on Jan 2. I have seven successful overnight suspends, then on Jan 12:
Jan 12 00:18:03 starbright apmd[1481]: System Suspend
Jan 12 08:06:34 starbright kernel: arch/i386/kernel/time.c:178: spin_lock(arch/i386/kernel/time.c:0235b028) already locked by arch/i386/kernel/time.c/310
Jan 12 08:06:34 starbright kernel: arch/i386/kernel/time.c:317: spin_unlock(arch/i386/kernel/time.c:0235b028) not locked
This happens every time except three since then, with the latest one this morning. I've no idea what's changed (if anything) or whether the messages were happening before then (as the logs have been rotated out of existence), but I've been running this kernel since Nov 24, except for mid-January when I tried 2.6.10-1.9_FC2. The difference with 2.6.10 is linux-2.6.9-spinlock-debug-panic.patch which means that instead of this message we get a kernel panic, as documented at length in this bug report. What this means is the underlying bug isn't new in late-2.6.9 and 2.6.10 kernels; only the panic is new. (And this morning for the first time since installing the system last May, I was hit by bug 142329 - grr!)
I am not a kernel hacker, but I would think that if it's possible to block the timer interrupt while executing timer_resume() then that would fix the problem.

[Now that my T40 is back from IBM-repair] I've tried kernel-2.6.10-1.760_FC3 (rebuilt with CONFIG_X86_HZ=100) & kernel-2.6.10-0.ac11. The experience is similar to comment 38.

kernel-2.6.10-0.ac11 gives the following on APM resume:
Feb 4 09:02:07 asterix kernel: arch/i386/kernel/time.c:178: spin_lock(arch/i386/kernel/time.c:c03bebe8) already locked by arch/i386/kernel/time.c/310
Feb 4 09:02:07 asterix kernel: arch/i386/kernel/time.c:317: spin_unlock(arch/i386/kernel/time.c:c03bebe8) not locked
[root@asterix ~]# grep spin_lock /var/log/messages* | wc -l
42

kernel-2.6.10-1.760_FC3 gives:
recalc_task_prio sys_ioctl syscall_call
Badness in i8042_panic_blink
i8042_panic_blink panic set_rtc_mmss timer_interrupt handle_IRQ_event __do_IRQ
===================================
common_interrupt get_cmos_time cpufreq_cpu_put cpufreq_resume timer_resume sysdev_resume device_power_up suspend do_ioctl recalc_task_prio sys_ioctl syscall_call

Re: Comment #36: Note that it's not just Thinkpads. #146457 looks like a dup of this, and I've seen at least two reports of problems with Dell Latitude C840s (one in comment #26 above).

DaveJ: please remove linux-2.6.9-spinlock-debug-panic.patch from future kernels! If not, maybe someone could update the patch so it is possible to turn off this feature with a kernel parameter? At http://people.redhat.com/davej/patchlist-fc3.txt the patch is described like this: "panic() instantly instead of printing a warning when spinlock debugging is triggered. This reduces the possibility of silent data corruption." What "silent data corruption" is this? Is it related to the annoying bug 142329? And -- maybe most important -- why is this "spinlock debugging" triggered at all?
Re: comment #41
According to comment #37, a recent rawhide kernel isn't showing this problem -- and I looked at that kernel's specfile a few days ago; it still seems to be applying the spinlock debug panic patch. In my next comment I'll post quick instructions for recompiling a rawhide/FC4 kernel for FC3. Right now I don't have a convenient place for posting compiled kernel binaries, so my instructions will have to do.
> What "silent data corruption" is this?
Depends on what causes the panic...

Created attachment 110750 [details] patch that converts FC4 kernel specfile for FC3 recompile

I decided it would be easier to do it as a patch than to write out instructions for changing the specfile. Basically:
rpm -ivh kernel-2.6.XX-1.YYYY_FC4.src.rpm
cd /usr/src/redhat/SPECS (or wherever, if you've changed your RPM configuration)
patch -p0 -i /path/to/kernel-spec-fc4-to-fc3.patch (i.e. this attachment)
rpmbuild -ba --target i686 kernel-2.6.spec
If/when bug 147281 is fixed, this patch will no longer be necessary (and will probably no longer apply to the specfile either). Anyway, this should let people recompile the FC4 kernels for FC3, without dependency or gcc version problems. That way, other people can test and see if the FC4 kernels really fix this bug.

I've tried 2.6.10-1.1126 on the T40 [with a couple of mods: 1000Hz -> 100Hz, 2.6.11-rc3-bk2 -> 2.6.11-rc3-bk4]. After an overnight suspend - I get a crash [capslock blink]. On VT-1 - the stack scrolls by - and I see repeated prints of the form:
atkbd.c: Spurious %s on %s. Some program,like XFree86, might be trying access hardware directly.
I had to powercycle the machine.

Kernel 2.6.10-1.1126 compiled for FC3 i686 as per comment 43:
http://users.comlab.ox.ac.uk/ian.collier/linuxkernel/
Obviously, since I haven't signed it, you use it at your own risk. I installed it on my ThinkPad T22 last night [it's an FC2 system so the post-install script failed - easily fixed by running new-kernel-pkg manually].
Then suspended for 8 hours and got a successful wakeup this morning with no untoward messages at all. The machine was on a text console at the time of the suspend (though a gdm login screen was present on another VT) and my boot command line contains "atkbd.reset" in case it matters.

I installed Ian Collier's 2.6.10-1.1126 RPM on a Dell Latitude C840 and still apmsleep does not work. I gave it apmsleep +1:00 and it went to sleep all right. However, it did not wake up. On pressing the power button, the screen went on to its dull mode, and nothing else happened. I did not set atkbd.reset=1, though I will try that.

atkbd.reset=1 makes no difference....

OK, here is something more. If I go to a text console (Ctrl-Alt-F1-F2) the system does wake up, but with the comment that: CPU frequency is 2400000Hz, but cpufreq is assumed/set at 1200000Hz. However, switching (Ctrl-Alt-F1-F7) to get back to the X screen reverts it back to the dull state, so pretty useless IMO.

I built kernel-2.6.10-1.1137_FC3 using Barry's spec file. (What are kernel-xen0 and kernel-xenU?) It survived an overnight suspend with my T41. I have two (unrelated) issues with it, though. (1) rhgb hangs occasionally (radeon driver, 7500 Mobility, only change from default config is I've turned the DynamicClocks option on). (2) vmware modules won't build for this kernel, even in its FC3 form. This makes it not a solution for me, as I need vmware. (I haven't investigated vmware patches yet, though.)

Re: Comment #48: I see that sort of message when resuming from ACPI suspends. It seems not to be a problem there. It's related to SpeedStep. Also, does it help to do the suspend/resume from a VC and then switch to X instead of suspending directly from X?

Re Comment 49:
kernel-xen0, kernel-xenU - they are related to the xen virtual machine [similar to vmware] - an upcoming feature for FC4. I just disable these two variables when building kernel for FC3.
vmware modules: did you install 'kernel-devel' package?
-------------------
I've built 1141 kernel [aka 2.6.11-rc4-bk1] with the following changes:
- 1000Hz -> 100Hz
- Disable linux-2.6.9-spinlock-debug-panic.patch
It survived an overnight suspend - without any spinlock messages.

Re: Comment #50: Interesting. I'll be interested to see how xen works. Meanwhile, disabling should cut kernel build time a bit 8^). Yes, I did install kernel-devel. (Oh, dear. Yet another new model for getting kernel build environments. That will just thrill all the fedora-list denizens who are finally just getting used to the FC3 model...) The issue is some undefined symbols and failure of the module to insmod. I suppose I ought to file a separate bug for that, though. It's a bit off-topic here.

Just another data point: 2.6.10-1.1126_FC3 (cf comment #45) has been running OK for me on a Toshiba Portege 3440CT for a couple of days now, surviving several suspend/resume events. This is considerably better than any of the update kernels since 2.6.9-1.681_FC3. I did see the effect of comment #45 when attempting a shutdown at one point, though. FWIW:
kernel /vmlinuz-2.6.10-1.1126_FC3 ro root=/dev/VolGroup00/LogVol00 apic=no acpi=off rhgb quiet
(without acpi=off, the resume problem disappears - the system refuses to enter suspend mode ;-) )

Re: Comment #49: vmware-any-any-update89 fixes the vmware-config.pl issues. So far, no further hangs (fingers crossed).

Re: Comment #49, I tried apmsleep from VC and then it went to sleep and woke up all right, but when I switched to X, it never went to X, but the dull screen I mentioned. Maybe I should kill X completely?

I experience pretty much the same problems on Toshiba Tecra8100, including caps lock blinking and that a longer suspend is necessary to trigger the bug, so it's not limited to ThinkPads.

I think I've hit the problem described in comment 46 with my rebuilt rawhide kernels [2.6.11-rc4-bk6, bk8].
On recovery the screen is blank - and nothing works [except Fn-F3 - which makes the screen completely dark]. There were no blinking caps-lock or num-lock - but none of the following worked: Fn-F4, Alt-Ctl-Backspace, Alt-Ctl-F1/Alt-Ctl-Del. I had to powercycle to reboot. I'll start suspending in VT-1 to see if there is a trace. Currently there is none - in /var/log/messages. [but then - I disabled the flags DEBUG_SLAB, DEBUG_BUGVERBOSE, DEBUG_PAGEALLOC, DEBUG_HIGHMEM]
Feb 21 00:11:56 asterix apmd[3263]: System Suspend
< I guess reboot at this point>
Feb 21 08:51:42 asterix syslogd 1.4.1: restart.
Feb 21 08:51:42 asterix syslog: syslogd startup succeeded
Feb 21 08:51:42 asterix syslog: klogd startup succeeded

Tried again [ref comment 56] - this time suspending in VT-1. It's now the same problem as comment 44. The stack-trace scrolls by so fast that I don't know if it's the same spinlock issue or some new problem [the atkbd.c messages - which cause the scroll - are new]. I'll disable the linux-2.6.9-spinlock-debug-panic.patch and try again.

Just a followup to comment 57 - I had an uptime of a week with the kernel I rebuilt without linux-2.6.9-spinlock-debug-panic.patch. There were 3 variables I changed here:
- disable linux-2.6.9-spinlock-debug-panic.patch
- update to 2.6.11-rc4-bk9 [from bk8]
- unload uhci_hcd/ehci_hcd after a couple of days.
I strongly suspect linux-2.6.9-spinlock-debug-panic.patch causing the initial problem - but I'm not sure.. Today I've rebuilt by upping to 2.6.11 [with 1154 kernel from rawhide] and all the mods mentioned in comment 56 & 57. Hoping the good uptime won't be affected.

All that removing the spinlock-debug-panic should do is make the spinlock bug non-fatal. I.e., you should still find a message in dmesg output/logs saying something bad happened. Continuing to run after such a situation is very risky.
I didn't see any spin_lock messages after disabling the
linux-2.6.9-spinlock-debug-panic.patch [hence my hesitation about assuming this
to be the problem]
And the diff from bk8 to bk9 shows:
[jantu@asterix tmp]$ diff patch-2.6.11-rc4-bk8 patch-2.6.11-rc4-bk9 |grep diff
> diff -Nru a/arch/i386/kernel/setup.c b/arch/i386/kernel/setup.c
> diff -Nru a/drivers/ide/Kconfig b/drivers/ide/Kconfig
> diff -Nru a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
> diff -Nru a/drivers/ide/ide.c b/drivers/ide/ide.c
> diff -Nru a/drivers/net/natsemi.c b/drivers/net/natsemi.c
> diff -Nru a/drivers/net/s2io.c b/drivers/net/s2io.c
> diff -Nru a/drivers/net/wireless/strip.c b/drivers/net/wireless/strip.c
Maybe it's the ide stuff?
Since the previous formula worked - I'm a bit hesitant to change the variables -
and try them all - one at a time [and see when things break]
*** Bug 146457 has been marked as a duplicate of this bug. ***

Interesting note: for me at least, disabling NTP (system-config-date -> Network Time Protocol) appears to prevent my laptop (Tecra M2) from freezing after APM resume, at least the first time I tried it. If this can be confirmed by someone else, consider it one workaround for the bug. I suspect that this is because do_timer_interrupt() (in arch/i386/kernel/time.c) calls the apparently problematic set_rtc_mmss() only in case STA_UNSYNC is off (i.e. sync on, acc. to timex.h), which I am guessing is true only for NTP users (?). See bug #146457 for a few other notes, possibly of interest to someone debugging this. I have symptoms similar to comments #29-33 in this bug.

More info: the stack traces seem to say that get_cmos_time() is getting an interrupt somewhere inside its body while holding a lock, presumably really in the inline mach_get_cmos_time() (why? is this predictable?), and timer_interrupt() is handling that by calling set_rtc_mmss(). But both get_cmos_time() and set_rtc_mmss() use the same spinlock, so of course it results in an error. See Linus' explanation (ss. 3) in Documentation/spinlocks.txt:
  ... IFF you know that the spinlocks are never used in interrupt handlers, you can use the non-irq versions:
    spin_lock(&lock);
    ...
    spin_unlock(&lock);
Note the caveat. But this spinlock *is* being used in an interrupt handler. An oversight? Would using spin_lock_irqsave() + spin_unlock_irqrestore() in get_cmos_time() help? Or should get_cmos_time() just temporarily set some static flag which disables the attempted call to set_rtc_mmss() in do_timer_interrupt() until it is done with the lock - since the synching from software clock to CMOS in this case is happening only after a resume, in which case it is presumably useless because we just recently loaded the SW clock from CMOS?

Created attachment 111719 [details]
*UNTRIED* patch for sake of experimentation
My thinking is along the lines of the attached patch (have not tried it yet).
It might succeed in preventing unwanted NTP-triggered synch to CMOS during an
APM resume. Note that this use of spinlocks is not 100% safe since it is still
possible for a timer interrupt to come between the call to spin_lock() and
setting the flag, or between unsetting the flag and spin_unlock(); but I am
guessing that would be far less likely than hitting a timer interrupt while
inside mach_get_cmos_time(). Not sure what a completely safe version would look
like but I guess it would have to use IRQ blocking of some sort.
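
Editor's sketch of the analysis in the comments above, as a userspace C model rather than kernel code. It assumes the behaviour suggested by the syslog excerpts: the debug spinlock reports misuse and carries on, so one interrupted CMOS read produces an "already locked" report followed by a "not locked" report. Every name here (`model_lock`, `model_timer_resume`, and so on) is hypothetical, and the `ntp_synced` flag reflects only the commenter's guess about STA_UNSYNC.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only. Every name below is hypothetical; nothing here is
 * the kernel's real code. It mimics a debug spinlock that reports misuse
 * instead of spinning forever. */
struct model_lock { bool held; };

struct model_lock rtc_lock_model;   /* stands in for the CMOS/RTC lock */
bool irqs_enabled = true;           /* stands in for the CPU interrupt flag */
int lock_errors;                    /* misuse reports, like the syslog lines */

void model_lock_take(struct model_lock *l)
{
    if (l->held)
        lock_errors++;              /* "spin_lock(...) already locked" */
    l->held = true;
}

void model_lock_drop(struct model_lock *l)
{
    if (!l->held)
        lock_errors++;              /* "spin_unlock(...) not locked" */
    l->held = false;
}

/* Timer tick: writes the software clock back to CMOS only when NTP has
 * the clock marked as synchronised (the thread's reading of STA_UNSYNC). */
void model_timer_interrupt(bool ntp_synced)
{
    if (!irqs_enabled || !ntp_synced)
        return;                     /* masked, or nothing to write back */
    model_lock_take(&rtc_lock_model);
    /* ... a set_rtc_mmss()-like CMOS write would go here ... */
    model_lock_drop(&rtc_lock_model);
}

/* Resume path: reads CMOS under the lock while a tick arrives mid-read.
 * With irqsave set, interrupts are masked for the critical section, which
 * is the effect spin_lock_irqsave()/spin_unlock_irqrestore() would have. */
int model_timer_resume(bool irqsave, bool ntp_synced)
{
    bool saved = true;

    rtc_lock_model.held = false;
    irqs_enabled = true;
    lock_errors = 0;

    if (irqsave) {
        saved = irqs_enabled;
        irqs_enabled = false;
    }
    model_lock_take(&rtc_lock_model);
    model_timer_interrupt(ntp_synced);  /* tick lands inside the CMOS read */
    model_lock_drop(&rtc_lock_model);
    if (irqsave)
        irqs_enabled = saved;

    return lock_errors;
}

/* The three cases discussed in the thread. */
void model_self_check(void)
{
    assert(model_timer_resume(false, true) == 2);   /* plain lock, NTP on */
    assert(model_timer_resume(true, true) == 0);    /* interrupts masked */
    assert(model_timer_resume(false, false) == 0);  /* NTP off: no write-back */
}
```

In this model the plain-lock case yields two misuse reports, matching the pair of syslog lines quoted earlier, while masking interrupts around the read yields none. The NTP-off case only shows why the workaround might help; it says nothing about other causes of a hang on resume.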
Based on comments 62--64 which suggest ntpd as a culprit, I've run:
/etc/init.d/ntpd stop ; /etc/init.d/pcmcia stop
apm -s
and have gone through three successful suspend/resume cycles with 2.6.10-1.770_FC3. All suspends lasted less than two hours, but this is still an improvement! Thanks! I'll try a longer suspend tonight.

With ntpd off, a 3+-hr suspend resulted in yet another lock-up (w/ blinking shift-lock light) on a TP A22p running 2.6.10-1.770_FC3.

I have been experiencing the problem with a Dell Latitude c610 also for a while now (Sorry, been very busy to participate in this discussion). I tried Ed's recommendation of turning off ntpd but it does not work for me (#65). So I guess my next question is should I try the FC4 kernel or should I look at something else (like the spinlock-debug-panic.patch)?

Had a blinking-caps-lock incident with the 2.6.10-1.1126 kernel the day before yesterday. This was after several suspend/resume cycles per day for almost four weeks. So it seems that it hasn't cured the problem but certainly reduced the frequency with which it appears.

Hi, I experienced the same problem once I switched from FC2 to FC3 on a Compaq Evo N600c notebook. Well, the solution to my APM problem was to install vanilla 2.6.10, and APM started working like a charm again. I'm 100% sure that the source of the APM problem lies somewhere in the FC3 patches to the kernel. It would be nice if people confirm that the vanilla kernel doesn't have the APM problem.

My experience so far:
- with fedora kernels, [currently using modified 2.6.11-1.14_FC3] disabling spinlock-debug-panic.patch avoids the crash&burn scenario [replaced with friendly messages in /var/log/messages]. If this happens one can reboot at a convenient time.
- disabling ntpd appears to get rid of the messages in /var/log/messages. I'm guessing my earlier success report was perhaps because ntpd couldn't start [due to a disabled network at boot] - and it remained disabled.
- I've briefly tried kernel 2.6.11.6 without fedora patches - it survived one overnight suspend. I've had shutdown problems with 2.6.11.6 - a hang at shutting down 'iptables'. So I switched to 2.6.11*FC3 kernels. Then I had this shutdown issue with one of the modified 2.6.11*FC3 kernels as well. [This is not always reproducible. When I manually try 'service iptables restart' it always works]

Created attachment 113161 [details] Show PC output from crash in 2.6.10-1.1126

I echo comment 68. I installed 2.6.10-1.1126 towards the beginning of February and had several successful overnight suspend/resume cycles until suddenly it crashed on the 27th. Attachment shows the text which appeared when I pressed SysRq+P, though it was difficult to catch because it was spewing the atkbd message (at the bottom) about once every second. Then it didn't crash again until March 27th (with the same traceback).

Created attachment 113163 [details]
Replacement spinlock-debug-panic patch
...However, I seem to have an obscure hardware fault (memory problem?) which
makes my machine suddenly crash for no apparent reason maybe about once a month
(it's getting worse, though :-(). That's only relevant because it made me
reboot my machine last week - but since then, 2.6.10-1.1126 has panicked every
time I've tried an overnight suspend.
Which is why I now intend to run the standard (currently 2.6.10-11_FC2) kernel
with the spinlock-debug patch replaced by the attached. It's a horrible hack
which makes the spinlock error cause a panic *except* when it happens during
i386/kernel/time.c. So now I do sometimes get a spew of messages in the syslog
when I resume, but it doesn't crash any more. (On the other hand, it does
sometimes set the clock to a stupid value.)
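
Editor's sketch of the policy described in the comment above, assuming the replacement patch amounts to "panic on a spinlock-debug report unless the report comes from i386/kernel/time.c". The attachment itself is not reproduced in this report, so the function and enum names below are hypothetical and the suffix match is only one plausible way to express the exemption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical outcome of a spinlock-debug report. */
enum violation_action { VIOLATION_WARN, VIOLATION_PANIC };

/* Returns nonzero when `path` ends with `suffix`. */
static int ends_with(const char *path, const char *suffix)
{
    size_t plen = strlen(path);
    size_t slen = strlen(suffix);

    return plen >= slen && strcmp(path + plen - slen, suffix) == 0;
}

/* Decide what to do with a spinlock-debug report raised from `file`.
 * Assumption: only reports from i386/kernel/time.c are downgraded to a
 * warning; everything else keeps the panic behaviour. */
enum violation_action spinlock_violation_action(const char *file)
{
    if (ends_with(file, "i386/kernel/time.c"))
        return VIOLATION_WARN;
    return VIOLATION_PANIC;
}

/* Paths taken from log lines quoted in this report. */
void violation_self_check(void)
{
    assert(spinlock_violation_action("arch/i386/kernel/time.c") == VIOLATION_WARN);
    assert(spinlock_violation_action("drivers/block/cfq-iosched.c") == VIOLATION_PANIC);
}
```

The trade-off is the one raised earlier in the thread: downgrading to a warning keeps the machine alive after resume, but the lock misuse still happened, which may be why the clock is sometimes left at a bad value.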
<with my modified 2.6.11-1.14_FC3 kernel comment #70> I've noticed the following in my /var/log/messages [happened during an APM recovery-from-suspend]:
Apr 18 08:20:45 asterix kernel: drivers/block/cfq-iosched.c:1065: spin_is_locked on uninitialized spinlock f7bb481c.
[and a stack trace with a taint flag for madwifi]
Perhaps this one is a completely unrelated issue...

Created attachment 113573 [details]
spinlock trace from /var/log/messages
I added the patch from comment 64 to my kernel (2.6.10-11_FC2 with the patch from comment 72) and for some reason it made my system clock run at double speed!

An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which may contain a fix for your problem. Please update to this new kernel, and report whether or not it fixes your problem. If you have updated to Fedora Core 4 since this bug was opened, and the problem still occurs with the latest updates for that release, please change the version field of this bug to 'fc4'. Thank you.

Since disabling 'ntpd' - I don't remember having a single crash.. Now I've enabled ntpd with 2.6.12-1.1372_FC3 - and it survived an 8 hour APM suspend. [which is a good sign] Will keep monitoring - and report if I see a crash.

Seems to be working better on my workstation as well; it has survived two overnight suspends so far.

kernel-2.6.12-1.1372_FC3 solved the problem on my Dell laptop, see comment #26 above. However, the latest kernel, kernel-2.6.12-1.1376_FC3 breaks it again. The laptop does not recover from sleep. It didn't even survive a 2 min APM suspend. Did it happen on the Thinkpad too?
Just a couple of data/no-data points:
- my ThinkPad A22p was sold so I no longer have it for testing purposes
- the replacement ThinkPad T42p (2373-HTU) has suspended very reliably (using ACPI with the kernel options "pci=noacpi acpi_sleep=s3_bios") with all FC4 kernel updates including 2.6.12-1.1447_FC4

I'll admit that, once I had a solution to the Radeon hot-suspend (bug #142928), I switched to ACPI. I've had no problems since on my T41 (2373-JHU)--don't even need the kernel options in comment #82. Note that kernel-2.6.12-1.1447_FC4 doesn't do ACPI suspend properly in my case (bug #165819), but 2.6.12-1398_FC4 works fine.

I'm still using APM with FC3 on my T40. Currently using 2.6.12-1.1378_FC3 - no crashes yet.

I've been using 2.6.12-1.1376 backported to FC2 for just under a week with no crashes on a ThinkPad T22. I'm still using the patch from comment 72, but haven't seen any spinlock messages since August 23, when I was running 2.6.11-1.14. (I never said, but comment 75 was a false alarm - apparently my clock always goes at double speed when I boot on battery power and then connect the AC adaptor.)

there are a number of different problems reported here, across a variety of kernels including a bunch of self compiled ones with add-ons, involving various features nothing to do with apm. if you're still having apm problems with the current fc3 errata, please file a new bug, as this one has become far too cluttered to make any coherent analysis upon.