From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)

Description of problem:
init 0 and poweroff no longer work on Dell PowerEdge 4600, 1600SC, 1750. These functions did work properly on RHEL4 Beta 1. The function acpi_power_off is in /drivers/acpi/sleep/poweroff.c. It is supposed to place the system in the S5 power state. There were several changes made between Beta1 and Beta2 in /drivers/acpi/hardware/hwsleep.c, specifically in acpi_enter_sleep_state(). When these changes were reverted to Beta1 code, the system powers down properly.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Install RHEL4 Beta 2 on Dell PowerEdge (4600, 1600SC, 1750)
2. Issue an 'init 0' or 'poweroff'

Actual Results:
System shuts down services, displays:
  Power down
  acpi_power_off called
then hangs. System does not power down.

Additional info:
Per Dell today: Amit says that they will retest this with the 751 kernel and post status.
The 751 kernel does not work, the system still hangs at acpi_power_off.
OK with Dell if we allow Len Brown of Intel (upstream ACPI maintainer) to access this issue?
Yes, these are shipping systems.
Adding Len Brown and Geoff Gustafson of Intel to the cc: list. Len: need you to jump in here, please!
init 0/poweroff work with the following kernels:
  2.6.10-rc2
  2.6.8-1.528-2.10

Does not work with:
  2.6.9-1.648
  2.6.9-1.751
There's definitely a difference in the _PTS method on our systems: The 1750 and 4600 (systems that are failing) have similar (complex) PTS methods. All other systems that work have simple PTS methods.

;; 1750
000001aa: Method _PTS (\_PTS)
000001b1: ArgCount 1; NotSerialized
000001b2: If
000001b4: LEqual
000001b5: Arg0
000001b6: 0x04
000001b8: Store
000001b9: 0x18
000001bb: WKSL (000001a0)
000001bf: Or
000001c0: WKEN (000001a5)
000001c4: 0x01
000001c6: WKEN (000001a5)
000001ca: Else
000001cc: Store
000001cd: 0x18
000001cf: WKSL (000001a0)
000001d3: And
000001d4: WKEN (000001a5)
000001d8: 0xfe
000001da: WKEN (000001a5)
000001de: Sleep
000001e0: 0x0f

;; 2800
00000118: Method _PTS (\_PTS)
0000011f: ArgCount 1; NotSerialized
00000120: Sleep
00000122: 0x0f

;; 2600
0000008d: Method _PTS (\_PTS)
00000094: ArgCount 1; NotSerialized
00000095: Sleep
00000097: 0x0f
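For readers who do not read AML listings, here is a rough C rendering of what the two kinds of _PTS above appear to do. It is only a sketch under stated assumptions: the struct and function names are hypothetical, WKSL and WKEN are modelled as 8-bit fields (the dump does not give their widths), and the trailing Sleep is assumed to run after the If/Else on the 1750, because the flat listing does not show nesting. The AML Sleep operand is in milliseconds, so 0x0f is a 15 ms sleep.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the firmware fields named in the dump. */
struct pts_state {
    uint8_t  wksl;      /* WKSL field; 8-bit width is an assumption */
    uint8_t  wken;      /* WKEN field; 8-bit width is an assumption */
    unsigned slept_ms;  /* total time requested via Sleep()         */
};

/* "Complex" _PTS as listed for the 1750. */
static void pts_1750(struct pts_state *s, unsigned sleep_state)
{
    if (sleep_state == 0x04) {
        s->wksl = 0x18;        /* Store(0x18, WKSL)       */
        s->wken |= 0x01;       /* Or(WKEN, 0x01, WKEN)    */
    } else {
        s->wksl = 0x18;        /* Store(0x18, WKSL)       */
        s->wken &= 0xfe;       /* And(WKEN, 0xfe, WKEN)   */
    }
    /* Assumption: Sleep(0x0f) follows the If/Else on both paths. */
    s->slept_ms += 0x0f;
}

/* "Simple" _PTS as listed for the 2800 and 2600: only Sleep(0x0f). */
static void pts_simple(struct pts_state *s, unsigned sleep_state)
{
    (void)sleep_state;         /* the argument is not examined */
    s->slept_ms += 0x0f;
}
```

Under these assumptions, both variants end in the same 15 ms Sleep, and the complex one differs only in how it sets the wake bits beforehand.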
Created attachment 108082 [details] ACPI debug output during init 0 on PE1750
System is hanging within acpi_enter_sleep_state_prep; the call to acpi_evaluate_object("_PTS") never returns.
Bumping this to a Sev 1 as we reproduced this on a 2800 (X26 BIOS), but have not seen with a 2850.

Reproduced on the following systems:
  2800
  1750
  4600

init 0 works on the base 2.6.9 and 2.6.10 kernels, as well as beta1 kernel. Reverting to base 2.6.9 ACPI code did NOT fix the issue; so it is being caused by something else.
Are there any kernel.org kernels that fail, or is this regression specific to the RHEL4 kernels after 2.6.9-1.648? Is it always true that upon a failure, the system never returns from acpi_evaluate_object("_PTS")? I don't have an archive of RHEL kernel trees here; and it isn't clear that there are analogous upstream kernel changes associated with this failure. Can you attach the before/after of the source changes you made to make the regression go away?
Re: comment #11: is the 2600 an example of a system with a simple _PTS that does not fail?
I wonder if this is due to the changes in acpi_os_sleep()...
2850 works, 2800 does not. Both have simple _PTS methods. Correct in the testing that I've seen that the system does not return from _PTS method.
Looks like the problem might be in the linux-2.6.9-kexec.patch. At shutdown, all interrupts are masked off at the 8259. The i8259A_shutdown function is a new function in /arch/i386/kernel/i8259.c. So far I've been able to isolate it to masking off IRQ4 or IRQ5. (Is this disabling the SMI?)
Huh? i8259A_shutdown? -- in the (remote) event you mean "lapic_shutdown", then note if this 2.6.10-ism got into RHEL4, then it may need to be updated per these two 2.6.10 patches:
http://linux.bkbits.net:8080/linux-2.6/cset@41ae020fyPpvz9mhbi1ycuqRZH6kJQ
http://linux.bkbits.net:8080/linux-2.6/cset@41ae14advqqMGMgR3rtIQw0iN6c29w

Where can I find a copy of linux-2.6.9-kexec.patch?

No, masking IRQ5 or IRQ6 should have no effect on SMI. However, SMM is very tricky, and sometimes it is tricked out when the OS makes changes to hardware state that the BIOS didn't expect.

BTW, are all the machines that fail SMP? If so, do they still fail if booted with maxcpus=1, or "maxcpus=1" "nolapic"?
Yeah, it's bizarre. There are two lines in i8259A_shutdown, which mask off the interrupt bits:
  out(0xFF,0x21);
  out(0xFF,0xA1);

If I comment out the first line (0x21) then the system shuts down properly. I have also tried various mask values on that line:
  0xFF - fails
  0xF0 - system shuts down
  0xC0 - fails

which seems odd to me. Maybe a combination of interrupts working together here?

The problem is seen in both SMP and UP kernels.

The linux-2.6.9-kexec.patch is in the source of the -648 kernel. Install the source RPM; the patch is then in /usr/src/redhat/SOURCES
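To make the mask values above easier to compare, here is a small decoding sketch. It relies on the 8259A convention that a set bit N in the mask byte written to port 0x21 masks IRQ N on the master PIC, and on the standard PC wiring that puts the system timer on IRQ0. The helper names are hypothetical and this is not code from the kexec patch.

```c
#include <stdint.h>

/* Nonzero if the given IRQ (0-7) is masked by this master-PIC mask byte.
 * In the 8259A interrupt mask register, bit N set means IRQ N is masked. */
static int irq_is_masked(uint8_t imr, unsigned irq)
{
    return (imr >> irq) & 1u;
}

/* Bitmap of IRQs that remain deliverable: bit N set means IRQ N is unmasked. */
static uint8_t unmasked_irqs(uint8_t imr)
{
    return (uint8_t)~imr;
}
```

Decoded this way, 0xFF leaves nothing unmasked, 0xF0 leaves IRQ0-3, 0xC0 leaves IRQ0-5, and 0xFE leaves only IRQ0. The decode by itself does not explain why 0xC0 was seen to fail while 0xF0 worked.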
0xFE works, so I'm wondering: is this a timer tick issue? Does the ACPI code require the timer tick to be active?
Problem occurs during call to acpi_os_sleep. Unfortunately i8259A_shutdown has already turned off the timer tick interrupt, so the call to schedule_timeout() in osl.c:acpi_os_sleep never returns. This causes a hang instead of the system power off.
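The hang described here can be illustrated with a toy user-space model. This is not kernel code: the struct, the function names, and the max_periods bail-out are all hypothetical, added only so the example terminates. It assumes that the tick counter advances only when IRQ0 is unmasked at the master PIC, and that a tick-based sleep waits for that counter to reach a deadline, which is the general shape of a schedule_timeout()-style wait.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy machine: just the master PIC mask and a tick counter. */
struct toy_machine {
    uint8_t       master_imr;  /* value last written to port 0x21        */
    unsigned long jiffies;     /* advances only when IRQ0 gets delivered */
};

/* One hardware timer period elapses. The tick is counted only if
 * IRQ0 (bit 0 of the mask) is not masked. */
static void toy_timer_period(struct toy_machine *m)
{
    if ((m->master_imr & 0x01u) == 0)
        m->jiffies++;
}

/* Tick-based sleep: wait until jiffies reaches the deadline.
 * Returns true if the sleeper woke up, false if it gave up after
 * max_periods timer periods (a real kernel would wait forever). */
static bool toy_tick_sleep(struct toy_machine *m, unsigned long ticks,
                           unsigned long max_periods)
{
    unsigned long deadline = m->jiffies + ticks;

    for (unsigned long i = 0; i < max_periods; i++) {
        if (m->jiffies >= deadline)
            return true;
        toy_timer_period(m);
    }
    return m->jiffies >= deadline;
}
```

With master_imr set to 0xFE the sleep completes, because IRQ0 still ticks. With 0xFF the counter never moves and the sleep never finishes on its own, which matches the observed behaviour of the two mask values.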
Any updates on this? Does anyone know why interrupts are masked off in i8259A_shutdown?
Changing the title to reflect the Update in which a fix for this issue has been committed or is being tracked.
same issue tracked upstream in mm tree: http://bugzilla.kernel.org/show_bug.cgi?id=4041
I dropped the kexec patch in the latest builds. This should fix the issue (it did when the same thing affected Fedora).
Raghavendra from Dell has regression-tested and confirms that this issue is resolved in U1 beta. Closing. Thanks!
I am still seeing this issue with a PowerEdge 1600SC (latest A12 BIOS) and RHEL4, kernel 2.6.9-5.0.5.EL. Does the "U1 beta" mentioned above as containing the fix imply a later patch/version of the kernel than this?
Yes, U1 beta was kernel-2.6.9-6.37.EL.