Bug 103813

Summary: kernel crash at shutdown
Product: Red Hat Enterprise Linux 3 Reporter: Shinya Narahara <naraha_s>
Component: kernel Assignee: Larry Woodman <lwoodman>
Status: CLOSED NOTABUG QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.0 CC: anderson, jparadis
Target Milestone: ---   
Target Release: ---   
Hardware: ia64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2004-04-21 15:02:35 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Shinya Narahara 2003-09-05 10:25:55 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.0.2) Gecko/20021216

Description of problem:
On kernel-2.4.21-1.1931.2.421.ent, the kernel crashes during shutdown,
so the machine never powers off.
"shutdown -r now" always succeeds, but "shutdown -h now"
always fails.

Version-Release number of selected component (if applicable):
kernel-2.4.21-1.1931.2.421.ent (also the beta1 kernel)

How reproducible:
Always

Steps to Reproduce:
1. shutdown -h now

Actual Results:  kernel crash

Expected Results:  power off

Additional info:

The log below was captured at shutdown time.
The kernel crashes while trying to power off.
I suspect the ACPI code in this kernel causes the crash.
Being unable to power off is a serious problem
for remote administration.

Sending all processes the TERM signal... 
Sending all processes the KILL signal... 
md: recovery thread got woken up ...
Syncing hardwaremd: recovery thread finished ...
clock to system time 
Turning off swap:  
Turning off quotas:  
Unmounting file systems:  
Halting system...
md: stopping all md devices.
flushing ide devices: hda hdb 
Power down.
halt[3691]: Unsupported data reference 8821862825984
Pid: 3691, comm:                 halt
EIP is at acpi_ev_acquire_global_lock [kernel] 0x190 (2.4.21-1.1931.2.349.2.2.ent)
psr : 0000101008026018 ifs : 8000000000000185 ip  : [<e0000000045c9e30>]    Not tainted
unat: 0000000000000000 pfs : 0000000000000184 rsc : 0000000000000003
rnat: 0000101008026018 bsps: e0000000045c3ae0 pr  : 80000000f5569555
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270433f
b0  : e0000000045d9100 b6  : e0000000046b9340 b7  : e0000000045d3080
f6  : 1003ecacacacacacacaca f7  : 1003ecccccccccccccccd
f8  : 1003e00000000000025e1 f9  : 1003e0000000000010927
r1  : e000000004c9fd00 r2  : 0000000000000000 r3  : 0000000000980000
r8  : 0000000000004f30 r9  : 0000000000000000 r10 : 0000000000000000
r11 : 0000000000004f00 r12 : e00000007af7fc20 r13 : e00000007af78000
r14 : 000000000000004f r15 : 0000000000000030 r16 : e000000004c5fb39
r17 : 0000000000000098 r18 : e000000004c5fb3a r19 : c00000007f984f30
r20 : 0000000000000000 r21 : c000000000000000 r22 : 000000007f984f30
r23 : 0000000000000000 r24 : 000000007f984f30 r25 : e000000004b6cacd
r26 : 000000000000007f r27 : 0000000000000000 r28 : 000000007f984f30
r29 : 0000000000000002 r30 : 0000000000000000 r31 : 000000007f000000
Call Trace: 
[<e0000000044155c0>] sp=0xe00000007af7f7c0 bsp=0xe00000007af79838 show_stack [kernel] 0x80
[<e00000000442fe10>] sp=0xe00000007af7f980 bsp=0xe00000007af79810 die [kernel] 0x1b0
[<e000000004430ef0>] sp=0xe00000007af7f980 bsp=0xe00000007af797c8 ia64_fault [kernel] 0x170
[<e00000000440e680>] sp=0xe00000007af7fa80 bsp=0xe00000007af797c8 ia64_leave_kernel [kernel] 0x0
[<e0000000045c9e30>] sp=0xe00000007af7fc20 bsp=0xe00000007af797a0 acpi_ev_acquire_global_lock [kernel] 0x190
[<e0000000045d9100>] sp=0xe00000007af7fc20 bsp=0xe00000007af79788 acpi_ex_acquire_global_lock [kernel] 0x60
[<e0000000045cffd0>] sp=0xe00000007af7fc20 bsp=0xe00000007af79750 acpi_ex_write_data_to_field [kernel] 0x190
[<e0000000045d80a0>] sp=0xe00000007af7fc20 bsp=0xe00000007af79720 acpi_ex_store_object_to_node [kernel] 0xe0
[<e0000000045d7c80>] sp=0xe00000007af7fc30 bsp=0xe00000007af796f8 acpi_ex_store [kernel] 0x1a0
[<e0000000045d3220>] sp=0xe00000007af7fc30 bsp=0xe00000007af796b8 acpi_ex_opcode_1A_1T_1R [kernel] 0x1a0
[<e0000000045c43e0>] sp=0xe00000007af7fc50 bsp=0xe00000007af79678 acpi_ds_exec_end_op [kernel] 0x620
[<e0000000045e5c40>] sp=0xe00000007af7fc50 bsp=0xe00000007af79618 acpi_ps_parse_loop [kernel] 0xb00
[<e0000000045e6d40>] sp=0xe00000007af7fd00 bsp=0xe00000007af795d8 acpi_ps_parse_aml [kernel] 0x3e0
[<e0000000045e83c0>] sp=0xe00000007af7fd00 bsp=0xe00000007af79580 acpi_psx_execute [kernel] 0x2a0
[<e0000000045de510>] sp=0xe00000007af7fd00 bsp=0xe00000007af79550 acpi_ns_execute_control_method [kernel] 0xb0
[<e0000000045de430>] sp=0xe00000007af7fd00 bsp=0xe00000007af79520 acpi_ns_evaluate_by_handle [kernel] 0x170
[<e0000000045de2b0>] sp=0xe00000007af7fd10 bsp=0xe00000007af794f0 acpi_ns_evaluate_by_name [kernel] 0x150
[<e0000000045e2350>] sp=0xe00000007af7fd20 bsp=0xe00000007af794a8 acpi_evaluate_object [kernel] 0x2b0
[<e0000000045dbea0>] sp=0xe00000007af7fd30 bsp=0xe00000007af79490 acpi_enter_sleep_state_prep [kernel] 0xe0
[<e000000004604180>] sp=0xe00000007af7fd60 bsp=0xe00000007af79480 acpi_power_off [kernel] 0x20
[<e000000004417340>] sp=0xe00000007af7fd60 bsp=0xe00000007af79468 machine_power_off [kernel] 0x80
[<e0000000044abf20>] sp=0xe00000007af7fd60 bsp=0xe00000007af79418 sys_reboot [kernel] 0x5c0
[<e00000000440e660>] sp=0xe00000007af7fe60 bsp=0xe00000007af79410 ia64_ret_from_syscall [kernel] 0x0

Comment 1 Shinya Narahara 2003-10-02 07:58:36 UTC
You are right, this issue doesn't occur on HP IPF systems;
it occurs only on the HITACHI ColdFusion-2 system.
We suspect either the ACPI BIOS or the Linux ACPI handler,
so we are investigating to determine whether the cause
lies in the OS or in the hardware/firmware.


Comment 2 Shinya Narahara 2003-10-14 10:16:25 UTC
The ACPI Global Lock must reside on a page whose memory attribute is WB
(write-back), because the lock variable is updated with cmpxchg.
But since kernel-2.4.19, Linux sets the memory attribute in units of 64MB.
On our Itanium machine, the ACPI Global Lock ends up with the UC (uncacheable)
attribute because of a memory hole near it. The cmpxchg therefore faults and
the kernel panics with the "Unsupported data reference" error shown in the log.
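
For readers unfamiliar with why a cmpxchg is involved at all, here is a
minimal sketch of the Global Lock acquire/release protocol as the ACPI
specification describes it. It is written in portable C11, not taken from
the 2.4.21 kernel source, and the function and macro names are hypothetical.
On ia64 the compare-and-swap loop would compile down to a cmpxchg on the
lock dword, which is the access that, per the comment above, faults when
the page is UC.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Low two bits of the Global Lock dword, as laid out in the ACPI spec. */
#define GL_PENDING 0x1u /* someone is waiting for the lock */
#define GL_OWNED   0x2u /* the lock is currently held */

/* Hypothetical helper: try to take the Global Lock.
 * Returns true if the lock was acquired. Returns false if it was already
 * owned; in that case the Pending bit has been set and the caller must
 * wait until the current owner signals the release. */
bool global_lock_acquire(_Atomic uint32_t *lock)
{
    uint32_t old = atomic_load(lock);
    uint32_t new_val;

    do {
        /* Always set Owned; set Pending only if it was already Owned. */
        new_val = (old & ~(GL_PENDING | GL_OWNED)) | GL_OWNED |
                  ((old & GL_OWNED) ? GL_PENDING : 0u);
        /* On failure, old is refreshed with the current value and we retry. */
    } while (!atomic_compare_exchange_weak(lock, &old, new_val));

    return (old & GL_OWNED) == 0;
}

/* Hypothetical helper: release the Global Lock.
 * Returns true if a waiter was pending, in which case the caller must
 * notify the other side that the lock is free. */
bool global_lock_release(_Atomic uint32_t *lock)
{
    uint32_t old = atomic_load(lock);
    uint32_t new_val;

    do {
        /* Clear both Owned and Pending in a single atomic update. */
        new_val = old & ~(GL_PENDING | GL_OWNED);
    } while (!atomic_compare_exchange_weak(lock, &old, new_val));

    return (old & GL_PENDING) != 0;
}
```

Both operations are read-modify-write updates of memory that firmware also
touches, so a plain load and store cannot replace the atomic instruction.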

The EFI spec doesn't say our original SAL is wrong, but we have decided to
change the SAL to avoid this issue.
So this will not occur anymore.