Bug 117032
| Summary: | 4G/4G problems | S3 suspend works but resume reboots instantly | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Sergey V. Udaltsov <sergey_udaltsov> |
| Component: | kernel | Assignee: | Arjan van de Ven <arjanv> |
| Status: | CLOSED RAWHIDE | Severity: | medium |
| Priority: | medium | Version: | rawhide |
| Hardware: | i686 | OS: | Linux |
| Doc Type: | Bug Fix | Last Closed: | 2004-04-16 09:11:45 UTC |

Description
Sergey V. Udaltsov
2004-02-27 16:49:02 UTC
Can you access the system from the network after resume? (i.e. is just the video dead, or is the system totally inaccessible?)
How about if you "init 3" before suspend to kill the window system?
thanks,
-Len

Well, just took latest kernel rpms. After wake, the system just reboots - and that's it. No time to check whether it was accessible from the network:)

*** Bug 118607 has been marked as a duplicate of this bug. ***

How about if you disable acpid before you suspend?

    # /etc/init.d/acpid stop

Also, can you supply a version # for the kernel you tested?
thanks,
-Len

This behavior is independent of using apm versus acpi. In bug https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=118607 Bill Nottingham had suggested that this is due to the 4G/4G split. I can confirm this. I compiled the kernel with 4G off and now resume works (for both apm and acpi). I will try killing acpid. The kernel is ...

    # rpm -q kernel
    kernel-2.6.3-2.1.253

Tried killing acpid before sleep - no changes.

Well, that's what people call kernel modularity and maintainability. When changes in the memory mapping kill acpi:) Anyway, is there any possibility to disable this 4G feature? Just not using the patch is not an option - other patches depend on it (cannot be applied properly without it).

Change the kernel config to reflect the following:

    # CONFIG_X86_4G is not set
    # CONFIG_X86_4G_VM_LAYOUT is not set
    # CONFIG_HIGHMEM4G is not set
    # CONFIG_HIGHMEM64G is not set

I will try and report.

Just checked the kernel with disabled 4G - the situation returned to the original story. The machine does not reboot on resume - but does not wake up either. It is in some semi-frozen semi-awake state.

Just for the record: it looks like this also happens on a Dell Latitude D800, BIOS rev.
A07, kernel-2.6.3-2.1.253.2.1 when I do this:

    echo -n 'mem' > /sys/power/state

which should achieve the same as:

    echo 3 > /proc/acpi/sleep

If I try a mere suspend (S1):

    echo -n 'suspend' > /sys/power/state

it seems to suspend, but without switching off the display (I've typed this from the screen so there may be some typos):

--- 8< ---
Stopping tasks: ===============================================================================================================================|
hdc: start_power_step(step: 0)
hdc: completing PM request, suspend
hda: start_power_step(step: 0)
hda: start_power_step(step: 1)
hda: complete_power_step(step: 1, stat: 50, err: 0)
hda: completing PM request, suspend
PM: Entering state.
--- >8 ---

When pressing the power button, the system seems to try to wake up, but then freezes:

--- 8< ---
Back to C!
PM: Finishing up.
PCI: Setting latency timer of device 000:00:1d.0 to 64
PCI: Setting latency timer of device 000:00:1d.1 to 64
PCI: Setting latency timer of device 000:00:1d.2 to 64
--- >8 ---

The mentioned PCI devices are my onboard USB1.1 controllers.

On my IBM ThinkPad T41, it fails to wake up from S3 sleep unless I first unload the USB and network modules. When 4G/4G is disabled in my custom rebuilt test kernel, this ugly script works for me:

    #!/bin/bash
    ifdown eth0
    ifdown eth1
    sleep 1
    rmmod -f airo
    rmmod e1000
    rmmod uhci-hcd
    rmmod ehci-hcd
    echo -n 3 > /proc/acpi/sleep
    sleep 1
    ifup eth0

*** Bug 119279 has been marked as a duplicate of this bug. ***

I saw some messages in the console when unsuspending but the reboot was too quick to transcribe them. I prepared a serial console using a second machine and a cable.

Run minicom on the second machine.
Add console=ttyS0,9600 to boot arguments of test machine.
Allow boot to start X.
Suspend by closing lid; the following is seen in the console:

    hde: start_power_step(step: 0)
    hde: start_power_step(step: 1)
    hde: complete_power_step(step: 1, stat: 50, err: 0)
    hde: completing PM request, suspend
    hdc: start_power_step(step: 0)
    hdc: completing PM request, suspend
    hda: start_power_step(step: 0)
    hda: start_power_step(step: 1)
    hda: complete_power_step(step: 1, stat: 50, err: 0)
    hda: completing PM request, suspend

Open lid to exit suspend. Enter BIOS password at prompt. The following messages appear:

    arch/i386/kernel/time.c:178: spin_lock(arch/i386/kernel/time.c:02318008) already locked
    8arch/i386/kernel/time.c:305: spin_unlock(arch/i386/kernel/time.c:02318008) not locked

Does that shed any light on the problem? I can test further patches etc. if necessary.

*** Bug 119448 has been marked as a duplicate of this bug. ***

- Tested with stock FC2test2 kernel 2.6.3-2.1.253.2.1 on FC2test1 distrib : reboots after suspend with both APM and ACPI.
- Recompiled kernel 2.6.3-2.1.253.2.1 with CONFIG_X86_4G, CONFIG_X86_4G_VM_LAYOUT, CONFIG_HIGHMEM4G=n : after booting in single user mode (no extra modules loaded), both APM and ACPI resume after suspend. (For ACPI, this is a first on my IBM ThinkPad A30p.)

Remarks :
1. APM yields spin_lock as in comment #15 ;
2. ACPI also suspends/resumes in initlevel 3 ;
3. ACPI suspend in initlevel 5 yields :

    localhost kernel: Stopping tasks: ===========
    localhost kernel: stopping tasks failed (1 tasks remaining)
    localhost kernel: Restarting tasks...<6> Strange, khubd not stopped

4. uhci_hcd gets corrupted in both APM and ACPI, making e.g. mouse operations after resume impossible :

4a. "# rmmod uhci_hcd" OOPSes after APM resume :

    Mar 31 00:20:08 localhost kernel: uhci_hcd 0000:00:1d.2: USB bus 3 deregistered
    Mar 31 00:20:08 localhost kernel: slab error in kmem_cache_destroy(): cache `uhci_urb_priv': Can't free all objects
    Mar 31 00:20:08 localhost kernel: Call Trace:
    Mar 31 00:20:08 localhost kernel: [<c014e954>] kmem_cache_destroy+0x90/0x103
    Mar 31 00:20:08 localhost kernel: [<f187c45f>] uhci_hcd_cleanup+0x14/0x44 [uhci_hcd]
    Mar 31 00:20:08 localhost kernel: [<c013f7f1>] sys_delete_module+0xfe/0x11e
    Mar 31 00:20:08 localhost kernel: [<c015a4c1>] unmap_vma_list+0xe/0x17
    Mar 31 00:20:08 localhost kernel: [<c015a9e8>] do_munmap+0x1dc/0x1e6
    Mar 31 00:20:08 localhost kernel: [<c02cef0b>] syscall_call+0x7/0xb
    Mar 31 00:20:08 localhost kernel:
    Mar 31 00:20:08 localhost kernel: drivers/usb/host/uhci-hcd.c: not all urb_priv's were freed!

4b. resume after ACPI suspend :

    Mar 30 21:21:10 localhost kernel: uhci_hcd 0000:00:1d.2: host system error, PCI problems?
    Mar 30 21:21:10 localhost kernel: uhci_hcd 0000:00:1d.2: host controller halted, very bad!
    Mar 30 21:21:10 localhost kernel: uhci_hcd 0000:00:1d.0: host system error, PCI problems?
    Mar 30 21:21:10 localhost kernel: uhci_hcd 0000:00:1d.0: host controller halted, very bad!
    Mar 30 21:21:10 localhost kernel: uhci_hcd 0000:00:1d.1: host system error, PCI problems?
    Mar 30 21:21:10 localhost kernel: uhci_hcd 0000:00:1d.1: host controller halted, very bad!

HTH.

Update : tested with kernel 2.6.4-1.300 ; major improvements.
1. Test with stock 4G/4G : both APM and ACPI reboot the machine when resuming from suspend (no change in behaviour wrt previous kernels).
2. APM test with disabled 4G/4G :
   - suspend : fails if module 'yenta_socket' is loaded (see bug #117574)
   - resume : works (no uhci_hcd corruption)
3. ACPI test with disabled 4G/4G :
   - suspend : works regardless of loaded modules
   - resume :
     3a. with module 'ohci_hcd' loaded : resume hangs, SysRq events are honoured
     3b.
without module 'ohci_hcd' loaded :
       * telinit 3 : everything OK
       * telinit 5 : switching to either VT1 or VT7 locks the display (CTRL-ALT-DEL and SysRq events are honoured)

Note to Arjan, Dave : are RedHat engineers interested in continued testing of current development kernels with disabled 4G/4G ?

Tested with FC2t2 & kernel-2.6.4-1.305 (4G/4G enabled) : both APM and ACPI reboot the machine (IBM ThinkPad A30p) when resuming from suspend.

kernel-2.6.5-1.308 appears to have resolved the problem for me (Toshiba Satellite Pro 4300).

Tested with kernel-2.6.5-1.308, with 4G/4G both enabled and disabled (IBM ThinkPad A30p).
Synopsis : "echo -n mem > /sys/power/state" works with FC2t2 (excluding FC1), with issues.
1. On FC1 (initlevel 5), resume after suspend blocks both screen and keyboard (remote access, SysRq and CTRLALTDEL are honoured ; there is no error output to a serial console), both with and without 4G/4G ; see comment #18, test 3b.
2. On FC2t2, suspending/resuming in both VT1 (console) and VT7 (X) works, with the following caveats :
   - with a VESA VGA framebuffer console (e.g. vga=884 as kernel parameter) screen and keyboard are blocked (see above) if 4G/4G is enabled; this is independent of the loaded X modules (tested with 'radeon' and 'svga'). This does not happen with disabled 4G/4G;
   - if a suspend/resume cycle is initiated in initlevel=1, a subsequent "telinit 5" :
     * spews USB error messages : "hub 3-0:1.0: Cannot enable port 2 ; USB cable bad ? hub 3-0:1.0: over-current charge on port 2" (and 1, etc.)
     * locks the machine hard when switching to X.
     This happens both with and without kernel frame buffer console.
Note : during the approx. 20 permutations I tested, I encountered severe screen corruption twice or thrice when resuming from suspend.

Tested with stock kernel-2.6.5-1.315, 4G/4G enabled ; IBM ThinkPad A30p.
1. On FC1, S3 suspend/resume does not function (see comment #21, testcase #1).
2. On FC2t2,
   a.
framebuffer issues when switching between VT1/VT7 (initlevel 5) are resolved ;
   b. telinit=6 shows "mm/memory.c:102: bad pmd" errors (see log) ;
   c. S3 in init1, then telinit=5 yields USB errors (USB mouse unusable, trackpoint functions) and some oopses (sorry, no serial console : I'll try to reproduce next tuesday).

Created attachment 99299
mm/memory.c and USB errors after ACPI S3
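The error signatures quoted in this report can be searched for mechanically in a saved console or syslog capture. The following is only a minimal sketch, assuming the log was saved to a plain text file; the function name `scan_resume_log` and the choice of patterns are illustrative (the patterns are copied from messages quoted above and are not an exhaustive list).

```shell
#!/bin/bash
# Sketch: count lines matching resume-related error signatures in a saved log.
# The patterns come from messages quoted in this report; the function name
# and the output format are hypothetical.

scan_resume_log() {
    local logfile="$1"
    local pattern
    for pattern in \
        "host controller halted" \
        "host system error, PCI problems?" \
        "slab error in kmem_cache_destroy" \
        "bad pmd" \
        "already locked"
    do
        # grep -F treats the pattern as a fixed string; -c counts matching lines.
        printf '%s: %s\n' "$pattern" "$(grep -cF -- "$pattern" "$logfile")"
    done
}
```

For example, `scan_resume_log /var/log/messages` prints one count per signature (the log path is an assumption; a minicom capture file works the same way).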
For USB errors, see also http://bugme.osdl.org/show_bug.cgi?id=1373 "uhci-hcd fails after software suspend" which has a patch to fix some of the problems.

Another data point: using kernel 2.6.5-1.315, suspend and resume sometimes works, with no special attention to usb or pcmcia devices (other than whatever Fedora Core 2 test 2 does).
Sharp PC-AR50 notebook, Pentium III / Intel 440BX (82371AB/EB/MB PIIX4 ISA (rev 02))
Twice I have seen suspend and resume fail; in these cases the battery state had changed from charging to fully charged or from fully charged to discharging. The instant the machine tried to unsuspend, the fan started (which is unusual, the machine was not hot) and the machine hung. Plenty of error messages on a successful resume but they have already been reported here.

Hardware: IBM ThinkPad T41
2.6.5-1.315 is fully stable for me until I use X. Then most of the time X locks up the entire system before getting to the gdm login screen. Sometimes it does work and I am able to get into the system, and S3 sleep & resume works. If it does get to the gdm screen, CTRL-ALT-F1 works to VT1, but when I type root then <ENTER> I get a kernel panic every time. Arjan has seen photos of these kernel panics, but they do not appear useful. I may need to get a serial cable to capture a complete panic.
When I commented out `Load "dri"' from /etc/X11/XF86Config, I was able to fully use X with S3 sleep and resume, and all appears to be stable.
cam, please try disabling DRI and see if any behavior improves for you too?

    01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility M7 LW [Radeon Mobility 7500]
    01:00.0 Class 0300: 1002:4c57

I forgot to mention, my ThinkPad T41 has this video card.

Warren, I have disabled dri by commenting out the Load "dri" line in the config file. It seems to have no bearing on the bug.
I was running 2.6.5-1.315 with
    acpi=off psmouse.proto=imps root=LABEL=/ rhgb
and xorg-x11-0.0.6.6-0.0.2004_03_11.9
ATI Technologies Inc Rage Mobility P/M AGP 2x (rev 64)
I still get random hangs on unsuspend (I am away from my serial cable now but can get a log on wednesday if it helps). So far I have had about three or four successful unsuspends and three hangs. The last hang was when X without dri was running and I was in a VT (hoping to see some error messages). I had previously seen error messages on a successful unsuspend, but that time it hung.
When unsuspend works there is no problem switching VTs, or switching in and out of X etc. Also the wireless network and other PCMCIA card come up OK.
This could be two separate issues that were exposed by the 4G/4G "fix". We do have very different hardware.

Created attachment 99355
my .config file
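When comparing reports like these, it helps to confirm which of the 4G/4G-related options named earlier in this report are actually enabled in a given kernel `.config`. The following is a rough sketch under that assumption; the function name `check_4g_config` is made up for illustration, and the option list is taken only from the config fragment quoted above.

```shell
#!/bin/bash
# Sketch: report whether the 4G/4G-related options named in this report are
# enabled in a kernel .config file. The function name is hypothetical.

check_4g_config() {
    local config="$1"
    local opt
    for opt in CONFIG_X86_4G CONFIG_X86_4G_VM_LAYOUT CONFIG_HIGHMEM4G CONFIG_HIGHMEM64G; do
        # An enabled boolean option appears as "CONFIG_FOO=y" at line start.
        if grep -q "^${opt}=y" "$config"; then
            echo "$opt enabled"
        else
            # Covers both "# CONFIG_FOO is not set" and a missing entry.
            echo "$opt disabled"
        fi
    done
}
```

Typical use would be `check_4g_config .config` from the kernel source tree; pointing it at a distribution-installed config file is also possible, but the location of that file is an assumption that varies by distribution.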
Sorry, my fault. I attached my config file first.
HW: Toshiba Satellite M20
Distribution: SuSE SLES 9
kernel: vanilla 2.6.5, ACPI enabled, 4G/4G enabled
After I send "echo -n 3 > sleep", the system goes to sleep fine. Then I press the power button once; the system does wake up but shuts down immediately. It seems that pressing the power button takes effect twice: first it resumes from S3, then it halts the machine. A test with S4 mode is successful.
If I "rmmod button", the shutdown no longer happens. Of course, the power button then can't put the machine into S5. Is there any issue in GPE ops?

> Distribution: SuSE SLES 9

uhm I think you meant to use bugzilla.suse.com instead ;)

Created attachment 99356
lspci -v and /proc/interrupt
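Several commenters attach the same kind of information (kernel version, `lspci -v`, the interrupt table). A small helper can gather it in one step before filing a report. This is only a sketch: the function name `collect_suspend_report` and the output file names are invented for illustration, and `lspci` is skipped if it is not installed.

```shell
#!/bin/bash
# Sketch: collect basic diagnostics into a directory for a suspend/resume
# bug report. Function name and output file names are hypothetical.

collect_suspend_report() {
    local outdir="$1"
    mkdir -p "$outdir" || return 1

    # Running kernel release, e.g. the version string testers quote.
    uname -r > "$outdir/kernel-version.txt"

    # Interrupt table, if the proc file is readable on this system.
    if [ -r /proc/interrupts ]; then
        cat /proc/interrupts > "$outdir/interrupts.txt"
    fi

    # Verbose PCI listing, only when lspci is available.
    if command -v lspci >/dev/null 2>&1; then
        lspci -v > "$outdir/lspci.txt" 2>/dev/null || true
    fi
    return 0
}
```

For example, `collect_suspend_report /tmp/suspend-report` leaves the files ready to attach.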
http://people.redhat.com/arjanv/2.6/
Try the 319 kernel from here. It seems to have solved all problems on my IBM ThinkPad T41.

With kernel-2.6.5-1.319, no essential ACPI improvements WRT comment #22 are noticed ; the mm/memory.c errors (comment #23) seem to have disappeared, though.
Sidenote : After an ACPI suspend/resume, the USB mouse (MS Optical) functions for approx. 3-8 seconds, after which it ceases to function. The optical light stays on ; there are no serial console messages.

Just tested 2.6.5-1.319. Still no proper wake up. Even without X Window loaded - the wake up process does not complete. The screen is black, the keyboard is not responding (sorry, no local net around me).

For those of you with laptops that are not IBM ThinkPad T40 series models, please try this:
http://people.redhat.com/wtogami/temp/
kernel-2.6.5-1.322 test i686 RPMS at the above URL for your convenience. Only difference in configuration is the disabled 4G/4G memory split. If behavior between this kernel and the official development 2.6.5-1.322 is identical, then your remaining issues may be unrelated to the 4G/4G memory split problems. Please test and report back.

Warren, I couldn't find your disabled 4G/4G kernels but repeated the test on 2.6.5-1.321. For a successful suspend/resume I got the following error messages:

    hermes @ IO 0x100: Timeout waiting for command completion.
    eth1: Error -110 disabling MAC port
    [suspended at this point -Cam]
    arch/i386/kernel/time.c:178: spin_lock(arch/i386/kernel/time.c:0232ee08) alread8arch/i386/kernel/time.c:305: spin_unlock(arch/i386/kernel/time.c:0232ee08) not deth1: get_wireless_stats() called while device not present
    blk: queue 19d92c00, I/O limit 4095Mb (mask 0xffffffff)
    ip_tables: (C) 2000-2002 Netfilter core team
    ip_tables: (C) 2000-2002 Netfilter core team
    eth1: error -110 reading info frame. Frame dropped.

On an unsuccessful attempt I got no output from the kernel.
The fan starts but nothing else happens - I don't even get asked for the BIOS password. Is it possible that the fault is caused at the point where suspend is entered, and through not suspending properly, there is no hope of unsuspend?

This looks like your wireless driver doesn't support suspend/resume yet. Entirely different bug than this one so please open a separate bug instead...

I am convinced with a great deal of certainty that the remaining issues are unrelated to the original problem of this report, which has been solved. As Arjan indicated in comment #40, other drivers very often have issues with suspend. Another common blocker may be lack of power management in the X driver. For example, in Bug #117690, radeon DRI prior to xorg-x11 in FC2 lacked power management. I could work around the issue back then by commenting out the dri line from my X config file.
In any case I recommend reporting your problem to upstream mailing lists, discussing it on fedora-test-list, and doing further testing to attempt to isolate individual problems. Only after you have done all of this, file new reports in bugzilla.

Warren, look at my report #37. Exactly what I started with. So X is not involved in my case. Well, drivers can be involved - but in this case they were in charge in the initial report as well. So why did you close this report? At least this bug allowed tracking the activity on this subject... Could you please reopen it? IMO this is still a kernel problem, even if it happens in drivers.

I'll reopen this bug as an "umbrella bug" for individual drivers. Sergey, myself, ... should open individual bug reports for each driver involved that each block this bug.

eh please no. This bug is a mess and confusing already, if you feel the need for some umbrella bug please make a new one. Please don't start adding more confusion to this messy bug already.

Nils, could you please open an umbrella bug and post the id here. Or I can do it myself.
Arjan: well, it is kind of strange to see the bug closed while the original problem still persists.

Sergey: well yes; however this bug got rather messy. I think the right way forward is to have the umbrella bug (warren is making one) and have bugs for individual components that break suspend/resume. This bug has a far too confusing/messy history to be useful for that; I'd really prefer new bug(s).

OK. So who will open the new bug? Lads, if you don't have time in an hour or so - I will do it myself and post the id here.

121020 is the tracker bug; the idea is that individual problems get their own bug, but get marked as blocker for that one; that way bugzilla can make nice overview graphs etc etc.

Thanks lads. I will attach my info there tonight. Or should I start my VERY OWN bug and put it under this umbrella?

Please start your own bug and be as specific as possible both in the subject of it and in the text, to avoid the situation where everyone with a suspend/resume issue thinks your new bug is exactly the same as what they see.