Bug 117032

Summary: 4G/4G problems | S3 suspend works but resume reboots instantly
Product: [Fedora] Fedora
Reporter: Sergey V. Udaltsov <sergey_udaltsov>
Component: kernel
Assignee: Arjan van de Ven <arjanv>
Status: CLOSED RAWHIDE
Severity: medium
Priority: medium    
Version: rawhide   
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2004-04-16 09:11:45 UTC
Attachments:
mm/memory.c and USB errors after ACPI S3 (flags: none)
my .config file (flags: none)
lspci -v and /proc/interrupt (flags: none)

Description Sergey V. Udaltsov 2004-02-27 16:49:02 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6)
Gecko/20040116 Galeon/1.3.13

Description of problem:
"echo 3 > /proc/acpi/sleep" actually works - the system goes to sleep.
It then tries to wake up on a keypress, but fails: the screen
remains black and the Caps Lock light does not toggle. Only power-cycling
the system helps at this stage.

Intel Centrino platform, Acer TravelMate 803

Version-Release number of selected component (if applicable):
kernel-2.6.3-1.100

How reproducible:
Always

Steps to Reproduce:
1. echo 3 > /proc/acpi/sleep
2. Try to wake up

Actual Results:  Some semi-dead state.

Expected Results:  Resume of work.

Additional info:

Comment 1 Len Brown 2004-03-13 04:03:51 UTC
can you access the system from the network after resume?
(i.e. is just the video dead, or is the system totally inaccessible?)

how about if you "init 3" before suspend to kill the window system?

thanks,
-Len



Comment 2 Sergey V. Udaltsov 2004-03-13 23:32:20 UTC
Well, just took latest kernel rpms. After wake, the system just
reboots - and that's it. No time to check whether it was accessible
from the network:)

Comment 3 Bill Nottingham 2004-03-19 04:09:34 UTC
*** Bug 118607 has been marked as a duplicate of this bug. ***

Comment 4 Len Brown 2004-03-20 08:44:32 UTC
How about if you disable acpid before you suspend? 
# /etc/init.d/acpid stop 
 
also, can you supply a version # for the kernel you tested? 
 
thanks, 
-Len 
 
 

Comment 5 Jost Diederichs 2004-03-22 04:25:19 UTC
This behavior is independent of using apm versus acpi.  
In bug https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=118607 
Bill Nottingham had suggested that this is due to the 4G/4G split.
I can confirm this. I compiled the kernel with 4G off and now resume 
works (for both, apm and acpi).  
  

Comment 6 Sergey V. Udaltsov 2004-03-22 22:33:07 UTC
I will try killing acpid. The kernel is ...
# rpm -q kernel
kernel-2.6.3-2.1.253

Comment 7 Sergey V. Udaltsov 2004-03-23 14:34:57 UTC
Tried killing acpid before sleep - no changes.

Comment 8 Sergey V. Udaltsov 2004-03-23 14:38:18 UTC
Well, that's what people call kernel modularity and maintainability.
When changes in the memory mapping kill acpi:)

Anyway, is there any possibility to disable this 4G feature? Just not
using the patch is not an option - other patches depend on it (cannot
be applied properly without it).

Comment 9 Jost Diederichs 2004-03-23 15:45:39 UTC
change the kernel config to reflect the following:
# CONFIG_X86_4G is not set
# CONFIG_X86_4G_VM_LAYOUT is not set
# CONFIG_HIGHMEM4G is not set
# CONFIG_HIGHMEM64G is not set
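
To check whether a given kernel was built with the 4G/4G split, the options above can be looked up in its config file. A minimal sketch, assuming the config is installed as /boot/config-$(uname -r) (an assumption, not confirmed in this report); check_4g_split is a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper: report whether the 4G/4G split options named in
# this comment are enabled in a kernel config file.
# Usage: check_4g_split [CONFIG_FILE]
check_4g_split() {
    # Assumption: the running kernel's config lives under /boot.
    local config="${1:-/boot/config-$(uname -r)}"
    if [ ! -r "$config" ]; then
        echo "cannot read $config" >&2
        return 2
    fi
    # An enabled option appears as CONFIG_FOO=y; a disabled one appears as
    # "# CONFIG_FOO is not set" and therefore does not match.
    if grep -Eq '^CONFIG_X86_4G(_VM_LAYOUT)?=y' "$config"; then
        echo "4G/4G split enabled"
    else
        echo "4G/4G split disabled"
    fi
}
```

Calling check_4g_split with no argument inspects the running kernel; passing a path inspects the .config of a kernel about to be rebuilt.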


Comment 10 Sergey V. Udaltsov 2004-03-23 15:50:00 UTC
I will try and report.

Comment 11 Sergey V. Udaltsov 2004-03-23 23:42:21 UTC
Just checked the kernel with disabled 4G - the situation returned to
the original story. The machine does not reboot on resume - but does
not wake up either. It is in some semi-frozen semi-awake state.

Comment 12 Nils Philippsen 2004-03-25 11:37:26 UTC
Just for the record: it looks like this also happens on a Dell
Latitude D800, BIOS rev. A07, kernel-2.6.3-2.1.253.2.1 when I do this:

echo -n 'mem' > /sys/power/state

which should achieve the same as:

echo 3 > /proc/acpi/sleep

If I try a mere suspend (S1):

echo -n 'suspend' > /sys/power/state

it seems to suspend, but without switching off the display (I've typed
this from the screen so there may be some typos):

--- 8< ---
Stopping tasks:
===============================================================================================================================|
hdc: start_power_step(step: 0)
hdc: completing PM request, suspend
hda: start_power_step(step: 0)
hda: start_power_step(step: 1)
hda: complete_power_step(step: 1, stat: 50, err: 0)
hda: completing PM request, suspend
PM: Entering state.
--- >8 ---

When pressing the power button, the system seems to try to wake up,
but then freezes:

--- 8< ---
Back to C!
PM: Finishing up.
PCI: Setting latency timer of device 000:00:1d.0 to 64
PCI: Setting latency timer of device 000:00:1d.1 to 64
PCI: Setting latency timer of device 000:00:1d.2 to 64
--- >8 ---

The mentioned PCI devices are my onboard USB1.1 controllers.
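
Reading /sys/power/state lists the state names the running kernel accepts, so a name can be checked before it is written. A minimal sketch, assuming the file holds a whitespace-separated list of names; state_supported is a hypothetical helper name, and the set of names varies with kernel version and hardware:

```shell
#!/bin/bash
# Hypothetical helper: succeed if STATE is listed in the state file.
# Usage: state_supported STATE [STATE_FILE]
state_supported() {
    local state="$1"
    local file="${2:-/sys/power/state}"
    local word
    [ -r "$file" ] || return 2
    # Assumption: the file is a whitespace-separated list of state names.
    # Compare each word exactly against the requested state.
    for word in $(cat "$file"); do
        [ "$word" = "$state" ] && return 0
    done
    return 1
}

# Example use (not run here):
#   state_supported mem && echo -n mem > /sys/power/state
```

Checking first makes it easier to tell a rejected state name apart from a suspend that was entered and then failed.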

Comment 13 Warren Togami 2004-03-25 16:17:21 UTC
On my IBM ThinkPad T41, it fails to wake up from S3 sleep unless I
first unload the USB and network modules.  When 4G/4G is disabled in
my custom rebuilt test kernel, this ugly script works for me:

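#!/bin/bash
# Bring the network interfaces down before unloading their drivers.
ifdown eth0
ifdown eth1
sleep 1
# Unload the network and USB host controller modules that otherwise
# prevent wakeup from S3 on this machine.
rmmod -f airo
rmmod e1000
rmmod uhci-hcd
rmmod ehci-hcd
# Enter ACPI S3; the script continues here after resume.
echo -n 3 > /proc/acpi/sleep
sleep 1
# Bring eth0 back up.
ifup eth0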

Comment 14 Bill Nottingham 2004-03-29 21:08:01 UTC
*** Bug 119279 has been marked as a duplicate of this bug. ***

Comment 15 cam 2004-03-30 13:08:53 UTC
I saw some messages in the console when unsuspending but the reboot
was too quick to transcribe them. I prepared a serial console using a
second machine and a cable.

Run minicom on the second machine.
Add console=ttyS0,9600 to boot arguments of test machine

Allow boot to start X

Suspend by closing lid, the following is seen in the console:
hde: start_power_step(step: 0)
hde: start_power_step(step: 1)
hde: complete_power_step(step: 1, stat: 50, err: 0)
hde: completing PM request, suspend
hdc: start_power_step(step: 0)
hdc: completing PM request, suspend
hda: start_power_step(step: 0)
hda: start_power_step(step: 1)
hda: complete_power_step(step: 1, stat: 50, err: 0)
hda: completing PM request, suspend


Open the lid to exit suspend. Enter BIOS password at prompt. The following
messages appear:

arch/i386/kernel/time.c:178:
spin_lock(arch/i386/kernel/time.c:02318008) already locked
8arch/i386/kernel/time.c:305:
spin_unlock(arch/i386/kernel/time.c:02318008) not locked

Does that shed any light on the problem? I can test further patches
etc. if necessary.
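
Before reproducing with a serial console, it can help to confirm that the test machine really booted with the console= argument. A minimal sketch, assuming the boot arguments can be read from /proc/cmdline; has_serial_console is a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper: succeed if the kernel command line contains a
# console=ttySN argument, e.g. console=ttyS0,9600.
# Usage: has_serial_console [CMDLINE_FILE]
has_serial_console() {
    local cmdline_file="${1:-/proc/cmdline}"
    [ -r "$cmdline_file" ] || return 2
    # Match console=ttyS<number> at the start of the line or after whitespace.
    grep -Eq '(^|[[:space:]])console=ttyS[0-9]+' "$cmdline_file"
}
```

If the check fails, the argument did not reach the kernel and nothing will appear in minicom on the second machine.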

Comment 16 Bill Nottingham 2004-03-30 16:51:49 UTC
*** Bug 119448 has been marked as a duplicate of this bug. ***

Comment 17 Didier 2004-03-30 21:03:09 UTC
- Tested with stock FC2test2 kernel 2.6.3-2.1.253.2.1 on FC2test1
distrib : reboots after suspend with both APM and ACPI.

- Recompiled kernel 2.6.3-2.1.253.2.1 with
CONFIG_X86_4G,CONFIG_X86_4G_VM_LAYOUT,CONFIG_HIGHMEM4G=n :
after booting in single user mode (no extra modules loaded), both APM
and ACPI resume after suspend. (For ACPI, this is a first on my IBM
ThinkPad A30p.)

Remarks : 

1. APM yields spin_lock as in comment #15 ;

2. ACPI also suspends/resumes in initlevel 3 ;

3. ACPI suspend in initlevel 5 yields :
localhost kernel: Stopping tasks: ===========
localhost kernel:  stopping tasks failed (1 tasks remaining)
localhost kernel: Restarting tasks...<6> Strange, khubd not stopped

4. uhci_hcd gets corrupted in both APM and ACPI, making e.g. mouse
operations impossible after resume :

4a. "# rmmod uhci_hcd" OOPSes after APM resume :

Mar 31 00:20:08 localhost kernel: uhci_hcd 0000:00:1d.2: USB bus 3 deregistered
Mar 31 00:20:08 localhost kernel: slab error in kmem_cache_destroy(): cache `uhci_urb_priv': Can't free all objects
Mar 31 00:20:08 localhost kernel: Call Trace:
Mar 31 00:20:08 localhost kernel:  [<c014e954>] kmem_cache_destroy+0x90/0x103
Mar 31 00:20:08 localhost kernel:  [<f187c45f>] uhci_hcd_cleanup+0x14/0x44 [uhci_hcd]
Mar 31 00:20:08 localhost kernel:  [<c013f7f1>] sys_delete_module+0xfe/0x11e
Mar 31 00:20:08 localhost kernel:  [<c015a4c1>] unmap_vma_list+0xe/0x17
Mar 31 00:20:08 localhost kernel:  [<c015a9e8>] do_munmap+0x1dc/0x1e6
Mar 31 00:20:08 localhost kernel:  [<c02cef0b>] syscall_call+0x7/0xb
Mar 31 00:20:08 localhost kernel:
Mar 31 00:20:08 localhost kernel: drivers/usb/host/uhci-hcd.c: not all urb_priv's were freed!


4b. resume after ACPI suspend :

Mar 30 21:21:10 localhost kernel: uhci_hcd 0000:00:1d.2: host system error, PCI problems?
Mar 30 21:21:10 localhost kernel: uhci_hcd 0000:00:1d.2: host controller halted, very bad!
Mar 30 21:21:10 localhost kernel: uhci_hcd 0000:00:1d.0: host system error, PCI problems?
Mar 30 21:21:10 localhost kernel: uhci_hcd 0000:00:1d.0: host controller halted, very bad!
Mar 30 21:21:10 localhost kernel: uhci_hcd 0000:00:1d.1: host system error, PCI problems?
Mar 30 21:21:10 localhost kernel: uhci_hcd 0000:00:1d.1: host controller halted, very bad!


HTH.

Comment 18 Didier 2004-04-01 13:09:40 UTC
Update : tested with kernel 2.6.4-1.300 ; major improvements.

1. Test with stock 4G/4G : both APM and ACPI reboot the machine when
resuming from suspend (no change in behaviour wrt previous kernels).

2. APM test with disabled 4G/4G :

- suspend : fails if module 'yenta_socket' is loaded (see bug #117574)
- resume : works (no uhci_hcd corruption)

3. ACPI test with disabled 4G/4G :

- suspend : works regardless of loaded modules
- resume :
  3a. with module 'ohci_hcd' loaded :
   resume hangs, SysRq events are honoured
  3b. without module 'ohci_hcd' loaded :
   * telinit 3 : everything OK
   * telinit 5 : switching to either VT1 or VT7 locks the display
(CTRL-ALT-DEL and SysRq events are honoured)


Note to Arjan, Dave : are Red Hat engineers interested in continued
testing of current development kernels with disabled 4G/4G ?

Comment 19 Didier 2004-04-05 07:58:53 UTC
Tested with FC2t2 & kernel-2.6.4-1.305 (4G/4G enabled) : both APM and
ACPI reboot the machine (IBM ThinkPad A30p) when resuming from suspend.


Comment 20 Jost Diederichs 2004-04-09 07:10:04 UTC
kernel-2.6.5-1.308 appears to have resolved the problem for me 
(Toshiba Satellite Pro 4300). 

Comment 21 Didier 2004-04-09 13:35:50 UTC
Tested with kernel-2.6.5-1.308, with 4G/4G both enabled and disabled
(IBM ThinkPad A30p).

Synopsis : "echo -n mem > /sys/power/state" works with FC2t2
(excluding FC1), with issues.


1. On FC1 (initlevel 5), resume after suspend blocks both screen and
keyboard (remote access, SysRq and CTRLALTDEL are honoured ; there is
no error output to a serial console), both with and without 4G/4G ;
see comment #18, test 3b.


2. On FC2t2, suspending/resuming in both VT1(console) and VT7(X)
works, with the following caveats :

- with a VESA VGA framebuffer console (e.g. vga=884 as kernel
parameter) screen and keyboard are blocked (see above) if 4G/4G is
enabled; this is independent from the loaded X modules (tested with
'radeon' and 'svga'). This does not happen with disabled 4G/4G;

- if a suspend/resume cycle is initiated in initlevel=1, a subsequent
"telinit 5" :

* spews USB error messages :
  "hub 3-0:1.0: Cannot enable port 2 ; USB cable bad ?
   hub 3-0:1.0: over-current change on port 2" (and 1, etc.)

* locks the machine hard when switching to X.
This happens both with and without kernel frame buffer console.


Note : during the approx. 20 permutations I tested, I encountered
twice or thrice severe screen corruption when resuming from suspend.

Comment 22 Didier 2004-04-10 14:50:59 UTC
Tested with stock kernel-2.6.5-1.315, 4G/4G enabled ; IBM ThinkPad A30p.

1. On FC1, S3 suspend/resume does not function (see comment #21,
testcase #1).

2. On FC2t2, 
a. framebuffer issues when switching between VT1/VT7 (initlevel 5) are
resolved ;
b. telinit=6 shows "mm/memory.c:102: bad pmd" errors (see log) ;
c. S3 in init1, then telinit=5 yields USB errors (USB mouse unusable,
trackpoint functions) and some oopses (sorry, no serial console : I'll
try to reproduce next Tuesday).



Comment 23 Didier 2004-04-10 14:51:57 UTC
Created attachment 99299 [details]
mm/memory.c and USB errors after ACPI S3

Comment 24 Aleksey Nogin 2004-04-10 18:42:08 UTC
For USB errors, see also http://bugme.osdl.org/show_bug.cgi?id=1373
"uhci-hcd fails after software suspend" which has a patch to fix some
of the problems.

Comment 25 cam 2004-04-10 22:08:21 UTC
Another data point; using kernel 2.6.5-1.315 suspend and resume
sometimes works, with no special attention to usb or pcmcia devices
(other than whatever Fedora Core 2 test 2 does). Sharp PC-AR50
notebook, Pentium III / Intel 440BX (82371AB/EB/MB PIIX4 ISA (rev 02))

Twice I have seen suspend and resume fail; in these cases the battery
state had changed from charging to fully charged or from fully charged
to discharging. The instant the machine tried to unsuspend, the fan
started (which is unusual, the machine was not hot) and the machine hung.

Plenty of error messages on a successful resume but they have already
been reported here.

Comment 26 Warren Togami 2004-04-10 22:51:20 UTC
Hardware: IBM ThinkPad T41

2.6.5-1.315 is fully stable for me until I use X.  Then most of the
time X locks up the entire system before getting to the gdm login
screen.  Sometimes it does work and I am able to get into the system,
and S3 sleep & resume works.  If it does get to the gdm screen,
CTRL-ALT-F1 works to VT1, but when I type root then <ENTER> I get a
kernel panic every time.  Arjan has seen photos of these kernel
panics, but they do not appear useful.  I may need to get a serial
cable to capture a complete panic.

When I commented out `Load  "dri"' from /etc/X11/XF86Config, I am able
to fully use X with S3 sleep and resume, and all appears to be stable.
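
The DRI workaround described above can be applied without hand-editing the file. A minimal sketch, assuming the config path named in this comment and a sed that supports in-place editing with a backup suffix; disable_dri is a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper: comment out the Load "dri" line in an X config
# file, keeping a backup copy with a .bak suffix.
# Usage: disable_dri [CONFIG_FILE]
disable_dri() {
    local config="${1:-/etc/X11/XF86Config}"
    if [ ! -w "$config" ]; then
        echo "cannot write $config" >&2
        return 2
    fi
    # Prefix any not-yet-commented Load "dri" line with '#'.
    sed -i.bak 's/^\([[:space:]]*Load[[:space:]]*"dri"\)/#\1/' "$config"
}
```

Restoring the .bak copy undoes the change, which makes it easy to compare behavior with and without DRI.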

Comment 27 Warren Togami 2004-04-10 22:57:29 UTC
cam, please try disabling DRI and see if any behavior improves for
you too.

Comment 28 Warren Togami 2004-04-11 08:16:00 UTC
01:00.0 VGA compatible controller: ATI Technologies Inc Radeon
Mobility M7 LW [Radeon Mobility 7500]
01:00.0 Class 0300: 1002:4c57
I forgot to mention, my Thinkpad T41 has this video card.

Comment 29 cam 2004-04-11 12:38:03 UTC
Warren,
I have disabled dri by commenting out the Load "dri" line in the
config file. It seems to have no bearing on the bug.

I was running 2.6.5-1.315 with acpi=off psmouse.proto=imps
root=LABEL=/ rhgb
and xorg-x11-0.0.6.6-0.0.2004_03_11.9
ATI Technologies Inc Rage Mobility P/M AGP 2x (rev 64)

I still get random hangs on unsuspend (I am away from my serial cable
now but can get a log on Wednesday if it helps). So far I have had
about three or four successful unsuspends and three hangs. The last
hang was when X without dri was running and I was in a VT (hoping to
see some error messages). I had previously seen error messages on a
successful unsuspend, but that time it hung.

When unsuspend works there is no problem switching VTs, or switching
in and out of X etc. Also the wireless network and other PCMCIA card
come up OK.


Comment 30 Warren Togami 2004-04-11 12:43:55 UTC
This could be two separate issues that were exposed by the 4G/4G
"fix".  We do have very different hardware.

Comment 31 Wang, Zhenyu Z 2004-04-13 09:02:51 UTC
Created attachment 99355 [details]
my .config file

Comment 32 Wang, Zhenyu Z 2004-04-13 09:09:42 UTC
Sorry, my mistake: I attached my config file first.

HW: Toshiba Satellite M20
Distribution: SuSE SLES 9
kernel: vanilla 2.6.5, acpi enabled, 4g/4g enabled

After I run "echo -n 3 > sleep", the system goes to sleep fine.
Then I press the power button once; the system does wake up but
shuts down immediately.
It seems that one press of the power button takes effect twice:
first it resumes from S3, then it halts the machine.
A test with S4 mode is successful.

If I "rmmod button", the shutdown no longer happens.
Of course, the power button then can't put the machine into S5.

Is there an issue in the GPE ops?


Comment 33 Arjan van de Ven 2004-04-13 09:11:08 UTC
Distribution: SuSE SLES 9

uhm I think you meant to use bugzilla.suse.com instead ;)


Comment 34 Wang, Zhenyu Z 2004-04-13 09:12:39 UTC
Created attachment 99356 [details]
lspci -v and /proc/interrupt

Comment 35 Warren Togami 2004-04-13 09:55:15 UTC
http://people.redhat.com/arjanv/2.6/

Try the 319 kernel from here.  It seems to have solved all problems on
my IBM Thinkpad T41.

Comment 36 Didier 2004-04-13 12:43:28 UTC
With kernel-2.6.5-1.319, no essential ACPI improvements WRT comment
#22 are noticed ; the mm/memory.c errors (comment #23) seem to have
disappeared, though.

Sidenote : After an ACPI suspend/resume, the USB mouse (MS Optical)
functions for approx. 3-8 seconds, after which it ceases to function.
The optical light stays on ; there are no serial console messages.

Comment 37 Sergey V. Udaltsov 2004-04-13 23:36:38 UTC
Just tested 2.6.5-1.319. Still no proper wake up. Even without X
Window loaded - the wake up process does not complete. The screen is
black, the keyboard is not responding (sorry, no local net around me).

Comment 38 Warren Togami 2004-04-15 01:49:48 UTC
For those of you with laptops that are not IBM Thinkpad T40 series
models, please try this:

http://people.redhat.com/wtogami/temp/
kernel-2.6.5-1.322 test i686 RPMS at the above URL for your
convenience.  Only difference in configuration is the disabled 4G/4G
memory split.

If behavior between this kernel and the official development
2.6.5-1.322 is identical, then your remaining issues may be unrelated
to the 4G/4G memory split problems.

Please test and report back.

Comment 39 cam 2004-04-15 11:42:49 UTC
Warren, I couldn't find your disabled 4G/4G kernels but repeated the
test on 2.6.5-1.321. For a successful suspend/resume I got the
following error messages:

hermes @ IO 0x100: Timeout waiting for command completion.
eth1: Error -110 disabling MAC port

[suspended at this point -Cam]

arch/i386/kernel/time.c:178:
spin_lock(arch/i386/kernel/time.c:0232ee08)
alread8arch/i386/kernel/time.c:305:
spin_unlock(arch/i386/kernel/time.c:0232ee08) not deth1:
get_wireless_stats() called while device not present
blk: queue 19d92c00, I/O limit 4095Mb (mask 0xffffffff)
ip_tables: (C) 2000-2002 Netfilter core team
ip_tables: (C) 2000-2002 Netfilter core team
eth1: error -110 reading info frame. Frame dropped.


On an unsuccessful attempt I got no output from the kernel. The fan
starts but nothing else happens - I don't even get asked for the BIOS
password. Is it possible that the fault is caused at the point where
suspend is entered, and through not suspending properly, there is no
hope of unsuspend?


Comment 40 Arjan van de Ven 2004-04-15 11:44:27 UTC
this looks like your wireless driver doesn't support suspend/resume yet.
Entirely different bug than this one so please open a separate bug
instead...

Comment 41 Warren Togami 2004-04-16 08:23:00 UTC
I am convinced with a great deal of certainty that the remaining
issues are unrelated to the original problem of this report, which has
been solved.  As Arjan indicated in comment #40 other drivers very
often have issues with suspend.  Another common blocker may be lack of
power management in the X driver.  For example, Bug #117690 radeon DRI
prior to xorg-x11 in FC2 lacked power management.  I could work around
the issue back then by commenting out the dri line from my X config file.

In any case I recommend reporting your problem to upstream mailing
lists, discussing it on fedora-test-list, and doing further testing to
attempt to isolate individual problems.  Only after you have done all
of this should you file new reports in bugzilla.

Comment 42 Sergey V. Udaltsov 2004-04-16 08:33:34 UTC
Warren, look at my comment #37. Exactly what I started with. So X is
not involved in my case. Well, drivers can be involved - but in this
case they were involved in the initial report as well. So why did you
close this report? At least this bug allowed us to track the activity on
this subject... Could you please reopen it?

Comment 43 Nils Philippsen 2004-04-16 09:06:35 UTC
IMO this is still a kernel problem, even if it happens in drivers.
I'll reopen this bug as an "umbrella bug" for individual drivers.
Sergey, myself, ... should open individual bug reports for each driver
involved that each block this bug.

Comment 44 Arjan van de Ven 2004-04-16 09:11:18 UTC
eh please no.
This bug is a mess and confusing already, if you feel the need for
some umbrella bug please make a new one. Please don't start adding
more confusion to this messy bug already.

Comment 46 Sergey V. Udaltsov 2004-04-16 09:19:00 UTC
Nils, could you please open umbrella bug and post the id here. Or I
can do it myself.

Arjan: well, it is kind of strange to see the bug closed while the
original problem still persists.

Comment 47 Arjan van de Ven 2004-04-16 09:32:19 UTC
Sergey: well yes; however this bug got rather messy.
I think the right way forward is to have the umbrella bug (warren is
making one) and have bugs for individual components that break
suspend/resume. This bug has a far too confusing/messy history to be
useful for that; I'd really prefer new bug(s).



Comment 48 Sergey V. Udaltsov 2004-04-16 09:37:47 UTC
OK. So who will open the new bug? Lads, if you don't have time in an
hour or so - I will do it myself and post the id here.

Comment 49 Arjan van de Ven 2004-04-16 09:40:52 UTC
121020 is the tracker bug
the idea is that individual problems get their own bug, but get marked
as blocker for that one; that way bugzilla can make nice overview
graphs etc etc.

Comment 50 Sergey V. Udaltsov 2004-04-16 09:48:10 UTC
Thanks lads. I will attach my info there tonight. Or should I start my
VERY OWN bug and put it under this umbrella?

Comment 51 Arjan van de Ven 2004-04-16 09:49:16 UTC
please start your own bug and be as specific as possible, both in its
subject and in the text, to avoid the situation where everyone
with a suspend/resume issue thinks your new bug is exactly the same as
what they see.