Bug 463421

Summary: Unable to Reboot
Product: Red Hat Enterprise Linux 4
Reporter: Qian Cai <qcai>
Component: kernel
Assignee: Peter Martuccelli <peterm>
Status: CLOSED WONTFIX
QA Contact: Martin Jenner <mjenner>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.7.z
CC: duck, ebdoran, michael.hagmann, vgoyal
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-06-20 15:53:34 UTC
Type: ---
Bug Depends On:    
Bug Blocks: 473047    
Attachments:
Description  Flags
dmidecode    none
dmesg        none

Description Qian Cai 2008-09-23 11:50:57 UTC
Created attachment 317462
dmidecode

Description of problem:
I have seen machines hang forever at this point during reboot:

Unmounting pipe file systems:  
Unmounting file systems:  
Please stand by while rebooting the system...
md: stopping all md devices.
md: md0 switched to read-only mode.
Synchronizing SCSI cache for disk sda: 
Restarting system.

I have seen it on both UP and SMP kernels.

I have also tried the following parameters without luck:

reboot=a
reboot=k
reboot=t (always hangs)
reboot=b (always panics; a bug?)

Turning off quotas:  
Unmounting pipe file systems:  
Unmounting file systems:  
Please stand by while rebooting the system...
md: stopping all md devices.
md: md0 switched to read-only mode.
Synchronizing SCSI cache for disk sda: 
Restarting system.
warm reboot
general protection fault: 0000 [1] 
CPU 0 
Modules linked in: md5 ipv6 parport_pc lp parport autofs4 sunrpc cpufreq_powersave loop button battery ac ohci_hcd ehci_hcd tg3 floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod sata_svw libata mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod
Pid: 4067, comm: reboot Not tainted 2.6.9-78.0.1.EL
RIP: 0010:[<ffffffff80119338>] <ffffffff80119338>{machine_restart+248}
RSP: 0018:000001022b7f1e20  EFLAGS: 00010006
RAX: 00000000000006e0 RBX: 0000000000001000 RCX: 0000000000000000
RDX: 0000000000000660 RSI: ffffffff80119427 RDI: 0000010000001087
RBP: 0000000001234567 R08: 00000000000927bf R09: 00000000000927c0
R10: 0000000000000246 R11: 0000ffff8046b980 R12: 00000000fee1dead
R13: 0000000000000008 R14: 0000000000000001 R15: 0000000000000000
FS:  0000002a95561b00(0000) GS:ffffffff80573b00(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003241ac1f50 CR3: 0000000000101000 CR4: 00000000000006e0
Process reboot (pid: 4067, threadinfo 000001022b7f0000, task 000001022ffea230)
Stack: 0000000000001000 0000000000000008 0000000000000006 0000000000002000 
       0000000000000000 0000000028121969 ffffffff8014e620 0000000000000246 
       ffffffff802ffc8c 0000003241ac1f50 
Call Trace:<ffffffff8014e620>{sys_reboot+871} <ffffffff802ffc8c>{lock_sock+599} 
       <ffffffff80125da1>{do_page_fault+577} <ffffffff801ad52e>{dput+56} 
       <ffffffff8019178a>{__fput+257} <ffffffff8018fd1b>{filp_close+103} 
       <ffffffff80110b5a>{system_call+126} 

Code: 48 cf 31 db 31 d2 e4 64 e6 80 a8 02 74 0a ff c2 81 fa ff ff 
RIP <ffffffff80119338>{machine_restart+248} RSP <000001022b7f1e20>
 <0>Kernel panic - not syncing: Oops
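
When comparing the reboot= options listed above, it can help to confirm that the option actually reached the running kernel before each test. The following is a minimal sketch, assuming a POSIX shell; the function names has_boot_param and boot_param_value are hypothetical helpers, not part of any RHEL tool, and the optional file argument exists only so the helpers can be tried against a sample file instead of /proc/cmdline.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting the kernel command line.

# has_boot_param NAME [CMDLINE_FILE]
# Succeeds if the command line contains NAME or NAME=value.
# CMDLINE_FILE defaults to /proc/cmdline.
has_boot_param() {
    name=$1
    file=${2:-/proc/cmdline}
    # Word splitting on whitespace is intended here.
    for word in $(cat "$file"); do
        case $word in
            "$name"|"$name"=*) return 0 ;;
        esac
    done
    return 1
}

# boot_param_value NAME [CMDLINE_FILE]
# Prints the value of the last NAME=value on the command line.
# Returns non-zero if NAME has no value or is absent.
boot_param_value() {
    name=$1
    file=${2:-/proc/cmdline}
    value=
    for word in $(cat "$file"); do
        case $word in
            "$name"=*) value=${word#"$name"=} ;;
        esac
    done
    [ -n "$value" ] && printf '%s\n' "$value"
}
```

For example, after booting with reboot=k appended to the kernel line, "boot_param_value reboot" should print k; if it prints nothing, the option was not applied to that boot.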

Version-Release number of selected component (if applicable):
kernel-2.6.9-78.0.1.EL

How reproducible:
I could reproduce it on amd-shanghai-01.rhts.bos.redhat.com. I have seen some other machines fail the repetitive reboot test as well, but I am not sure whether they hit the same problem:

http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=4395967
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=4396074

Steps to Reproduce:
1. Reserve amd-shanghai-01.rhts.bos.redhat.com from RHTS.
2. Install the crontab file below with "crontab cron":
   === cron ===
   @reboot echo 1 >>/root/reboot-count; sleep 60; reboot
  
Actual results:
The machine hangs after around 20-40 reboots.
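
The reproduction loop from the steps above can be set up and monitored with two small helpers. This is a minimal sketch, assuming a POSIX shell; the function names write_reboot_cron and reboot_count are hypothetical, and only the crontab line itself comes from this report.

```shell
#!/bin/sh
# Hypothetical helpers for the repetitive reboot test.

# write_reboot_cron FILE
# Writes the one-line crontab from step 2 to FILE.
write_reboot_cron() {
    cat > "$1" <<'EOF'
@reboot echo 1 >>/root/reboot-count; sleep 60; reboot
EOF
}

# reboot_count [FILE]
# Prints how many boots have been logged. The crontab entry appends
# one line per boot, so the line count equals the number of boots.
# FILE defaults to /root/reboot-count; a missing file counts as 0.
reboot_count() {
    file=${1:-/root/reboot-count}
    if [ -f "$file" ]; then
        wc -l < "$file" | tr -d ' '
    else
        echo 0
    fi
}
```

Running "write_reboot_cron cron" followed by "crontab cron" as root starts the loop at the next boot, and "reboot_count" prints how many boots have been logged so far. "crontab -r" removes the crontab and stops the loop.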

Comment 1 Qian Cai 2008-09-23 11:51:32 UTC
Created attachment 317464
dmesg

Comment 2 Brian Doran 2008-10-07 19:04:13 UTC
I am seeing the exact same issue on an HP xw8600 workstation running RHEL 5.1 32-bit. The only way to get it to reboot was to add acpi=off to the kernel stanza. Was there ever a resolution for this?

Comment 3 Brian Maly 2008-10-07 19:38:54 UTC
Does anyone know if the problem exists with the RHEL5 kernel? And if so, how about the most recent upstream kernel?

Comment 4 Qian Cai 2008-10-08 11:17:55 UTC
It also fails with the latest RHEL 5.3 Beta and Fedora 8 kernels, i.e. 2.6.18-118.el5 and 2.6.23.1-42.fc8.

Comment 6 Jiri Pallich 2012-06-20 15:53:34 UTC
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release for which you requested us to review is now End of Life. 
Please see https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.