Bug 140112 - RHEL4 U1: init 0/poweroff not working
Summary: RHEL4 U1: init 0/poweroff not working
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i386
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Jim Paradis
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 147461
Reported: 2004-11-19 20:29 UTC by jordan hargrave
Modified: 2013-08-06 01:09 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-04-14 01:24:19 UTC
Target Upstream Version:
Embargoed:


Attachments
ACPI debug output during init 0 on PE1750 (466.29 KB, text/plain)
2004-12-08 01:04 UTC, jordan hargrave

Description jordan hargrave 2004-11-19 20:29:07 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET 
CLR 1.1.4322)

Description of problem:
init 0 and poweroff no longer work on Dell PowerEdge 4600, 1600SC, 
1750.  These functions did work properly on RHEL4 Beta 1.

The function acpi_power_off is in /drivers/acpi/sleep/poweroff.c. It 
is supposed to place the system in the S5 power state.  There were 
several changes made between Beta1 and Beta2 
in /drivers/acpi/hardware/hwsleep.c, specifically in 
acpi_enter_sleep_state().  When these changes were reverted to Beta1 
code, the system powers down properly.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Install RHEL4 Beta 2 on a Dell PowerEdge (4600, 1600SC, 1750)
2. Issue an 'init 0' or 'poweroff'
    

Actual Results:  System shuts down services, displays:
Power down
acpi_power_off called

then hangs. System does not power down

Additional info:

Comment 4 Susan Denham 2004-11-30 17:38:48 UTC
Per Dell today:  Amit says that they will retest this with the 751
kernel and post status.

Comment 5 jordan hargrave 2004-11-30 19:31:15 UTC
The 751 kernel does not work; the system still hangs at 
acpi_power_off.

Comment 6 Tim Burke 2004-11-30 19:35:04 UTC
OK with Dell if we allow Len Brown of Intel (upstream ACPI maintainer)
to access this issue?

Comment 7 Dale Kaisner 2004-11-30 23:19:53 UTC
Yes, these are shipping systems.

Comment 8 Susan Denham 2004-12-01 14:30:43 UTC
Adding Len Brown and Geoff Gustafson of Intel to the cc: list.   Len:
 need you to jump in here, please!

Comment 9 jordan hargrave 2004-12-02 16:51:47 UTC
init 0/poweroff work with the following kernels:
2.6.10-rc2
2.6.8-1.528-2.10

does not work with:
2.6.9-1.648
2.6.9-1.751


Comment 11 jordan hargrave 2004-12-07 23:00:43 UTC
There's definitely a difference in the _PTS method on our systems:

The 1750 and 4600 (systems that are failing) have similar (complex) 
_PTS methods.  All other systems that work have simple _PTS methods.

;; 1750 
000001aa: Method _PTS (\_PTS)
000001b1:   ArgCount 1; NotSerialized
000001b2:   If
000001b4:     LEqual
000001b5:       Arg0
000001b6:       0x04
000001b8:     Store
000001b9:       0x18
000001bb:       WKSL (000001a0)
000001bf:     Or
000001c0:       WKEN (000001a5)
000001c4:       0x01
000001c6:       WKEN (000001a5)
000001ca:   Else
000001cc:     Store
000001cd:       0x18
000001cf:       WKSL (000001a0)
000001d3:     And
000001d4:       WKEN (000001a5)
000001d8:       0xfe
000001da:       WKEN (000001a5)
000001de:   Sleep
000001e0:     0x0f
                                                                      
                                                
;; 2800
00000118: Method _PTS (\_PTS)
0000011f:   ArgCount 1; NotSerialized
00000120:   Sleep
00000122:     0x0f
                                                                      
                                                
;; 2600
0000008d: Method _PTS (\_PTS)
00000094:   ArgCount 1; NotSerialized
00000095:   Sleep
00000097:     0x0f
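
As a reading aid, the decoded methods can be modelled in a few lines of Python. This is only a sketch of the dump above, not firmware or kernel code: the `state` dict stands in for the WKSL and WKEN fields, `sleep_ms` is a hypothetical callback, and reading the Sleep operand as milliseconds is an assumption based on the ACPI definition of the Sleep operator. It shows that the complex and the simple variants both end in the same Sleep(0x0f) call.

```python
def pts_1750(arg0, state, sleep_ms):
    """Model of the PE1750 _PTS method as decoded in the dump above."""
    if arg0 == 0x04:
        state["WKSL"] = 0x18
        state["WKEN"] |= 0x01      # Or(WKEN, 0x01, WKEN)
    else:
        state["WKSL"] = 0x18
        state["WKEN"] &= 0xFE      # And(WKEN, 0xfe, WKEN)
    sleep_ms(0x0F)                 # Sleep(0x0f), assumed to mean 15 ms


def pts_simple(arg0, state, sleep_ms):
    """Model of the 2800/2600 _PTS method: nothing but the Sleep call."""
    sleep_ms(0x0F)
```

Calling `pts_1750(5, state, sleep_ms)` (assuming S5 is passed as Arg0 = 5) takes the Else branch and then sleeps, exactly like the simple variant.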


Comment 12 jordan hargrave 2004-12-08 01:04:16 UTC
Created attachment 108082
ACPI debug output during init 0 on PE1750

Comment 13 jordan hargrave 2004-12-08 01:05:23 UTC
System is hanging within acpi_enter_sleep_state_prep; the call to 
acpi_evaluate_object("_PTS") never returns.

Comment 14 jordan hargrave 2004-12-09 20:03:12 UTC
Bumping this to a Sev 1 as we reproduced this on a 2800 (X26 BIOS), 
but have not seen it on a 2850.

Reproduced on the following systems:
2800
1750
4600

init 0 works on the base 2.6.9 and 2.6.10 kernels, as well as the beta1 
kernel.  Reverting to the base 2.6.9 ACPI code did NOT fix the issue, so 
it is being caused by something else.

Comment 16 Len Brown 2004-12-09 22:22:02 UTC
Are there any kernel.org kernels that fail,
or is this regression specific to the RHEL4 kernels
after 2.6.9-1.648?

Is it always true that upon a failure, the system
never returns from acpi_evaluate_object("_PTS")?

I don't have an archive of RHEL kernel trees here;
and it isn't clear that there are analogous upstream
kernel changes associated with this failure.
Can you attach the before/after of the source changes
you made to make the regression go away?


Comment 17 Len Brown 2004-12-09 22:55:17 UTC
re: comment #11
is the 2600 an example of a system with a simple _PTS
that does not fail?


Comment 18 Len Brown 2004-12-09 23:12:12 UTC
I wonder if this is due to the changes in acpi_os_sleep()...

Comment 19 jordan hargrave 2004-12-09 23:51:50 UTC
2850 works, 2800 does not. Both have simple _PTS methods.
Correct: in the testing that I've seen, the system does not return 
from the _PTS method.

Comment 20 jordan hargrave 2004-12-10 09:00:46 UTC
Looks like the problem might be in linux-2.6.9-kexec.patch.  At 
shutdown, all interrupts are masked off at the 8259.  
The i8259A_shutdown function is a new function 
in /arch/i386/kernel/i8259.c.  

So far I've been able to isolate it to masking off IRQ4 or IRQ5. (Is 
this disabling the SMI?)

Comment 21 Len Brown 2004-12-12 06:54:41 UTC
Huh? i8259A_shutdown? -- in the (remote) event you mean "lapic_shutdown", 
then note if this 2.6.10-ism got into RHEL4, then it may need to be 
updated per these two 2.6.10 patches: 
 
http://linux.bkbits.net:8080/linux-2.6/cset@41ae020fyPpvz9mhbi1ycuqRZH6kJQ 
http://linux.bkbits.net:8080/linux-2.6/cset@41ae14advqqMGMgR3rtIQw0iN6c29w 
 
Where can I find a copy of  linux-2.6.9-kexec.patch? 
No, masking IRQ5 or IRQ6 should have no effect on SMI. 
However, SMM is very tricky, and sometimes it is tricked 
out when the OS makes changes to hardware state that 
the BIOS didn't expect. 
 
BTW. are all the machines that fail SMP? 
If so, do they still fail if booted with maxcpus=1, 
or "maxcpus=1" "nolapic"? 
 

Comment 22 jordan hargrave 2004-12-13 15:16:33 UTC
Yeah, it's bizarre. There are two lines in i8259A_shutdown that 
mask off the interrupt bits:
outb(0xFF, 0x21);   /* master 8259A: mask IRQ0-7 */
outb(0xFF, 0xA1);   /* slave 8259A: mask IRQ8-15 */

If I comment out the first line (0x21) then the system shuts down 
properly.  Also have tried various combinations.
0xFF - fails
0xF0 - system shuts down
0xC0 - fails

which seems odd to me. Maybe a combination of interrupts working 
together here?
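
To make the mask values easier to compare, here is a small Python sketch that decodes a mask byte into the IRQ lines it leaves enabled. It assumes the values above were written to port 0x21 (the master 8259A), where a set bit masks the corresponding IRQ; `unmasked_irqs` is a hypothetical helper name, not a kernel function.

```python
def unmasked_irqs(mask):
    """Return the master-PIC IRQ lines (0-7) left enabled by a mask byte.

    On the 8259A a set bit in the mask register disables that IRQ line.
    """
    return [irq for irq in range(8) if not (mask >> irq) & 1]


# Observed results from the tests listed above.
for mask, result in [(0xFF, "fails"), (0xF0, "shuts down"), (0xC0, "fails")]:
    print(f"0x{mask:02X}: enabled IRQs {unmasked_irqs(mask)} -> {result}")
```

The decoding only restates which lines each mask leaves open; it does not by itself explain why 0xC0 fails while 0xF0 works.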

The problem is seen in both SMP and UP kernels.

The linux-2.6.9-kexec.patch is in the source of the -648 kernel.  
Install the RPM; the patch is then in /usr/src/redhat/SOURCES


Comment 23 jordan hargrave 2004-12-14 17:39:01 UTC
0xFE works, so I'm wondering: is this a timer tick issue? Does the ACPI
code require the timer tick to be active?

Comment 24 jordan hargrave 2004-12-20 16:58:35 UTC
Problem occurs during call to acpi_os_sleep.  Unfortunately
i8259A_shutdown has already turned off the timer tick interrupt, so
the call to schedule_timeout() in osl.c:acpi_os_sleep never returns.
This causes a hang instead of the system power off.
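
The dependency can be illustrated with a toy user-space model in Python. It is a sketch under stated assumptions, not the kernel implementation: `sleep_returns` and `max_polls` are hypothetical names, the timer tick is assumed to arrive on IRQ0 of the master 8259A, and the loop is bounded so that a hang is reported rather than reproduced.

```python
TIMER_IRQ = 0  # assumption: the timer tick is delivered on IRQ0


def sleep_returns(master_mask, ticks_needed, max_polls=1000):
    """Simulate a tick-driven sleep; return True if the sleeper wakes up.

    master_mask is the byte written to port 0x21 (set bit = IRQ masked).
    """
    timer_enabled = not (master_mask >> TIMER_IRQ) & 1
    jiffies = 0
    for _ in range(max_polls):
        if timer_enabled:
            jiffies += 1           # the tick handler advances the clock
        if jiffies >= ticks_needed:
            return True            # timeout expired, sleeper is woken
    return False                   # clock never advanced: sleeper hangs
```

With the timer masked (`sleep_returns(0xFF, 15)`) the clock never advances and the sleep never completes; with only IRQ0 left enabled (`sleep_returns(0xFE, 15)`) it does. The model covers only the timer line and does not account for the 0xC0 result in comment 22.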

Comment 25 jordan hargrave 2005-01-21 16:12:37 UTC
Any updates on this?  Does anyone know why interrupts are masked off 
in i8259A_shutdown?

Comment 26 Amit Bhutani 2005-01-25 20:05:09 UTC
Changing the title to reflect the Update in which a fix for this 
issue has been committed or is being tracked.

Comment 27 Len Brown 2005-03-01 07:42:33 UTC
Same issue tracked upstream in the mm tree: 
http://bugzilla.kernel.org/show_bug.cgi?id=4041 

Comment 28 Dave Jones 2005-03-01 19:03:51 UTC
I dropped the kexec patch in the latest builds. This should fix the issue
(it did when the same thing affected Fedora).


Comment 29 Amit Bhutani 2005-04-14 01:24:19 UTC
Raghavendra from Dell has regression-tested and confirms that this issue is 
resolved in U1 beta. Closing. Thanks!

Comment 30 David Ruggiero 2005-05-24 20:08:48 UTC
I am still seeing this issue with a PowerEdge 1600SC (latest A12 BIOS) and
RHEL4, kernel 2.6.9-5.0.5.EL. Is the "U1 beta" mentioned above, which contains
the fix, a later patch/version of the kernel than this?


Comment 31 Geoff Gustafson 2005-05-26 22:08:12 UTC
Yes, U1 beta was kernel-2.6.9-6.37.EL.
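
For anyone checking their own system, the comparison can be sketched in Python. This is a simplified illustration, not RPM's real version-comparison algorithm: `release_key` and `at_least_u1_beta` are hypothetical helpers that compare only the leading numeric fields of the version and release strings quoted in this thread.

```python
def release_key(kernel):
    """Turn 'V-R' (optionally prefixed with 'kernel-') into comparable tuples."""
    if kernel.startswith("kernel-"):
        kernel = kernel[len("kernel-"):]
    version, release = kernel.split("-", 1)

    def nums(part):
        # Keep the leading numeric fields only, e.g. '6.37.EL' -> (6, 37).
        out = []
        for field in part.split("."):
            if not field.isdigit():
                break
            out.append(int(field))
        return tuple(out)

    return nums(version), nums(release)


def at_least_u1_beta(kernel, fixed="kernel-2.6.9-6.37.EL"):
    """True if `kernel` is at least as new as the U1 beta kernel."""
    return release_key(kernel) >= release_key(fixed)
```

Under these assumptions `at_least_u1_beta("2.6.9-5.0.5.EL")` is False, which matches the answer above: that kernel predates the U1 beta build.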


