Bug 140112 - RHEL4 U1: init 0/poweroff not working
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Hardware: i386 Linux
Priority: medium  Severity: high
Assigned To: Jim Paradis
Brian Brock
Depends On:
Blocks: 147461
Reported: 2004-11-19 15:29 EST by jordan hargrave
Modified: 2013-08-05 21:09 EDT

Doc Type: Bug Fix
Last Closed: 2005-04-13 21:24:19 EDT

Attachments
ACPI debug output during init 0 on PE1750 (466.29 KB, text/plain)
2004-12-07 20:04 EST, jordan hargrave
Description jordan hargrave 2004-11-19 15:29:07 EST
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET 
CLR 1.1.4322)

Description of problem:
init 0 and poweroff no longer work on Dell PowerEdge 4600, 1600SC, 
1750.  These functions did work properly on RHEL4 Beta 1.

The function acpi_power_off is in /drivers/acpi/sleep/poweroff.c. It 
is supposed to place the system in the S5 power state.  There were 
several changes made between Beta1 and Beta2 
in /drivers/acpi/hardware/hwsleep.c, specifically in 
acpi_enter_sleep_state().  When these changes were reverted to Beta1 
code, the system powers down properly.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Install RHEL4 Beta 2 on a Dell PowerEdge (4600, 1600SC, 1750)
2. Issue an 'init 0' or 'poweroff'

Actual Results:  System shuts down services, displays:
Power down
acpi_power_off called

then hangs. The system does not power down.

Additional info:
Comment 4 Susan Denham 2004-11-30 12:38:48 EST
Per Dell today:  Amit says that they will retest this with the 751
kernel and post status.
Comment 5 jordan hargrave 2004-11-30 14:31:15 EST
The 751 kernel does not work, the system still hangs at 
Comment 6 Tim Burke 2004-11-30 14:35:04 EST
Is it OK with Dell if we allow Len Brown of Intel (upstream ACPI maintainer)
to access this issue?
Comment 7 Dale Kaisner 2004-11-30 18:19:53 EST
Yes, these are shipping systems.
Comment 8 Susan Denham 2004-12-01 09:30:43 EST
Adding Len Brown and Geoff Gustafson of Intel to the cc: list.   Len:
 need you to jump in here, please!
Comment 9 jordan hargrave 2004-12-02 11:51:47 EST
init 0/poweroff work with the following kernels:

does not work with:
Comment 11 jordan hargrave 2004-12-07 18:00:43 EST
There's definitely a difference in the _PTS method on our systems:

The 1750 and 4600 (systems that are failing) have similar (complex) 
_PTS methods.  All other systems that work have simple _PTS methods.

;; 1750 
000001aa: Method _PTS (\_PTS)
000001b1:   ArgCount 1; NotSerialized
000001b2:   If
000001b4:     LEqual
000001b5:       Arg0
000001b6:       0x04
000001b8:     Store
000001b9:       0x18
000001bb:       WKSL (000001a0)
000001bf:     Or
000001c0:       WKEN (000001a5)
000001c4:       0x01
000001c6:       WKEN (000001a5)
000001ca:   Else
000001cc:     Store
000001cd:       0x18
000001cf:       WKSL (000001a0)
000001d3:     And
000001d4:       WKEN (000001a5)
000001d8:       0xfe
000001da:       WKEN (000001a5)
000001de:   Sleep
000001e0:     0x0f
;; 2800
00000118: Method _PTS (\_PTS)
0000011f:   ArgCount 1; NotSerialized
00000120:   Sleep
00000122:     0x0f
;; 2600
0000008d: Method _PTS (\_PTS)
00000094:   ArgCount 1; NotSerialized
00000095:   Sleep
00000097:     0x0f
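
For readers who do not read AML dumps, the 1750 _PTS method above can be paraphrased as the following sketch. It is a plain-Python model, not firmware code: WKSL and WKEN are the field names from the dump (their hardware meaning is not stated in this report), the register dictionary and the sleep_ms callback are hypothetical stand-ins, and the Sleep operand is interpreted as milliseconds.

```python
def pts_1750(arg0, regs, sleep_ms):
    """Model of the 1750 _PTS dump: arg0 is the target sleep state (Arg0)."""
    # Both branches of the If/Else store 0x18 into WKSL.
    regs["WKSL"] = 0x18
    if arg0 == 0x04:
        # If (Arg0 == 4): Or(WKEN, 0x01, WKEN) sets bit 0.
        regs["WKEN"] |= 0x01
    else:
        # Else: And(WKEN, 0xfe, WKEN) clears bit 0.
        regs["WKEN"] &= 0xFE
    # Sleep(0x0f) follows the If/Else; the 2800 and 2600 methods
    # consist of this Sleep call only.
    sleep_ms(0x0F)
    return regs


# Hypothetical starting register values, chosen only for illustration.
requested_sleeps = []
print(pts_1750(5, {"WKSL": 0x00, "WKEN": 0xFF}, requested_sleeps.append))
print(requested_sleeps)
```

All three dumps end in the same Sleep call, which is the step that depends on the OS sleep implementation.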
Comment 12 jordan hargrave 2004-12-07 20:04:16 EST
Created attachment 108082 [details]
ACPI debug output during init 0 on PE1750
Comment 13 jordan hargrave 2004-12-07 20:05:23 EST
System is hanging within acpi_enter_sleep_state_prep; the call to 
acpi_evaluate_object("_PTS") never returns.
Comment 14 jordan hargrave 2004-12-09 15:03:12 EST
Bumping this to a Sev 1 as we reproduced this on a 2800 (X26 BIOS), 
but have not seen it on a 2850.

Reproduced on the following systems:

init 0 works on the base 2.6.9 and 2.6.10 kernels, as well as the beta1 
kernel.  Reverting to the base 2.6.9 ACPI code did NOT fix the issue, so 
it is being caused by something else.
Comment 16 Len Brown 2004-12-09 17:22:02 EST
Are there any kernel.org kernels that fail,
or is this regression specific to the RHEL4 kernels
after 2.6.9-1.648?

Is it always true that upon a failure, the system
never returns from acpi_evaluate_object("_PTS")?

I don't have an archive of RHEL kernel trees here;
and it isn't clear that there are analogous upstream
kernel changes associated with this failure.
Can you attach the before/after of the source changes
you made to make the regression go away?
Comment 17 Len Brown 2004-12-09 17:55:17 EST
re: comment #11
is the 2600 an example of a system with a simple _PTS
that does not fail?
Comment 18 Len Brown 2004-12-09 18:12:12 EST
I wonder if this is due to the changes in acpi_os_sleep()...
Comment 19 jordan hargrave 2004-12-09 18:51:50 EST
2850 works, 2800 does not. Both have simple _PTS methods.
Correct: in the testing that I've seen, the system does not return 
from the _PTS method.
Comment 20 jordan hargrave 2004-12-10 04:00:46 EST
It looks like the problem might be in linux-2.6.9-kexec.patch.  At 
shutdown, all interrupts on the 8259 are masked off. 
The i8259A_shutdown function is a new function 
in /arch/i386/kernel/i8259.c.  

So far I've been able to isolate it to masking off IRQ4 or IRQ5. (Is 
this disabling the SMI?)
Comment 21 Len Brown 2004-12-12 01:54:41 EST
Huh? i8259A_shutdown? -- in the (remote) event you mean "lapic_shutdown", 
then note if this 2.6.10-ism got into RHEL4, then it may need to be 
updated per these two 2.6.10 patches: 
Where can I find a copy of  linux-2.6.9-kexec.patch? 
No, masking IRQ5 or IRQ6 should have no effect on SMI. 
However, SMM is very tricky, and sometimes it is tricked 
out when the OS makes changes to hardware state that 
the BIOS didn't expect. 
BTW, are all the machines that fail SMP? 
If so, do they still fail if booted with maxcpus=1, 
or "maxcpus=1" "nolapic"? 
Comment 22 jordan hargrave 2004-12-13 10:16:33 EST
Yeah, it's bizarre. There are two lines in i8259A_shutdown, which 
mask off the interrupt bits:

If I comment out the first line (0x21) then the system shuts down 
properly.  Also have tried various combinations.
0xFF - fails
0xF0 - system shuts down
0xC0 - fails

which seems odd to me. Maybe a combination of interrupts working 
together here?

The problem is seen in both SMP and UP kernels.

The linux-2.6.9-kexec.patch is in the source of the -648 kernel.  
Install the RPM; the patch is then in /usr/src/redhat/SOURCES
Comment 23 jordan hargrave 2004-12-14 12:39:01 EST
0xFE works, so I'm wondering whether this is a timer tick issue. Does the
ACPI code require the timer tick to be active?
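
The mask values tried in comments 22 and 23 can be decoded with a short sketch. It assumes the standard PC layout, where the byte written to port 0x21 is the master 8259A interrupt mask register, a set bit n masks IRQ n, and IRQ0 is the timer. The function name and the table of reported outcomes are illustrative only; the outcomes are copied from the comments above.

```python
TIMER_IRQ = 0  # IRQ0 is the timer tick on a standard PC


def unmasked_irqs(mask):
    """Return the master-PIC IRQ lines (0-7) left enabled by a mask byte."""
    return [irq for irq in range(8) if not mask & (1 << irq)]


# Outcomes as reported in this bug, keyed by the value written to port 0x21.
reported = {0xFF: "fails", 0xF0: "shuts down", 0xC0: "fails", 0xFE: "works"}

for mask, outcome in reported.items():
    enabled = unmasked_irqs(mask)
    timer = "enabled" if TIMER_IRQ in enabled else "masked"
    print(f"0x{mask:02X}: IRQs enabled {enabled}, timer {timer}, reported: {outcome}")
```

Both values reported as working (0xF0 and 0xFE) leave IRQ0 enabled, and 0xFF masks it. 0xC0 also leaves IRQ0 enabled yet is reported as failing, which this decoding does not explain.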
Comment 24 jordan hargrave 2004-12-20 11:58:35 EST
The problem occurs during the call to acpi_os_sleep.  By that point
i8259A_shutdown has already masked the timer tick interrupt, so
the call to schedule_timeout() in osl.c:acpi_os_sleep never returns.
The result is a hang instead of a system power off.
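
The failure mode described here can be illustrated with a toy model. This is not kernel code: the tick counter, the function name, and the max_polls bound (added only so the example terminates) are all hypothetical. It shows only that a timeout measured in timer ticks cannot expire if the tick counter stops advancing.

```python
def sleep_for_ticks(timeout_ticks, timer_irq_enabled, max_polls=1000):
    """Toy model of a tick-based sleep.

    Returns True if the timeout expires, or False if the simulated tick
    counter never reaches the deadline within max_polls iterations.
    """
    ticks = 0
    deadline = ticks + timeout_ticks
    for _ in range(max_polls):
        if ticks >= deadline:
            return True  # timeout expired, the sleeper wakes up
        if timer_irq_enabled:
            ticks += 1  # each timer interrupt advances the tick counter
    return False  # the counter never advanced far enough: the sleeper is stuck


print(sleep_for_ticks(2, timer_irq_enabled=True))
print(sleep_for_ticks(2, timer_irq_enabled=False))
```

In the real kernel there is no polling bound, so the equivalent of the second call simply never returns.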
Comment 25 jordan hargrave 2005-01-21 11:12:37 EST
Any updates on this?  Does anyone know why interrupts are masked off 
in i8259A_shutdown?
Comment 26 Amit Bhutani 2005-01-25 15:05:09 EST
Changing the title to reflect the Update in which a fix for this 
issue has been committed or is being tracked.
Comment 27 Len Brown 2005-03-01 02:42:33 EST
same issue tracked upstream in mm tree: 
Comment 28 Dave Jones 2005-03-01 14:03:51 EST
I dropped the kexec patch in the latest builds. This should fix the issue.
(it did when the same thing affected Fedora).
Comment 29 Amit Bhutani 2005-04-13 21:24:19 EDT
Raghavendra from Dell has regression-tested this and confirms that the issue
is resolved in U1 beta. Closing. Thanks!
Comment 30 David Ruggiero 2005-05-24 16:08:48 EDT
I am still seeing this issue with a PowerEdge 1600SC (latest A12 BIOS) and
RHEL4, kernel 2.6.9-5.0.5.EL. Does the "U1 beta" mentioned above as containing
the fix imply a later patch/version of the kernel than this?
Comment 31 Geoff Gustafson 2005-05-26 18:08:12 EDT
Yes, U1 beta was kernel-2.6.9-6.37.EL.
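
As a rough check of the two release strings in comments 30 and 31, the numeric fields can be compared in order. This is a simplified sketch and not RPM's actual version-comparison algorithm: the helper name is hypothetical, and non-numeric fields such as "EL" are simply ignored.

```python
import re


def release_key(version):
    """Split a kernel release string on '.' and '-' and keep numeric fields."""
    return [int(part) for part in re.split(r"[.-]", version) if part.isdigit()]


reported = "2.6.9-5.0.5.EL"  # kernel from comment 30
u1_beta = "2.6.9-6.37.EL"    # U1 beta kernel from comment 31

print(release_key(reported), release_key(u1_beta))
print(release_key(reported) < release_key(u1_beta))
```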
