Bug 672621

Summary: System PANICS during RHEL5u5 install
Product: Red Hat Enterprise Linux 5 Reporter: PaulB <pbunyan>
Component: kernel Assignee: David Milburn <dmilburn>
Status: CLOSED NEXTRELEASE QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium Docs Contact:
Priority: medium    
Version: 5.5 CC: dmilburn, jarod, jburke, jfeeney, peterm, shiyer
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-10-24 01:29:11 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description Flags
SerialOutput none
per515_RHEL5u5_PANIC_Reproduced none

Description PaulB 2011-01-25 18:19:41 UTC
Created attachment 475235 [details]
SerialOutput

Description of problem:
When installing RHEL5-Server-U5_nfs-x86_64 on the systems listed in comment#2, we get the following kernel panic:
=========================================================
Code: 7e f9 e9 f9 fe ff ff f3 90 83 3f 00 7e f9 e9 f8 fe ff ff f3 
Kernel panic - not syncing: nmi watchdog
 BUG: warning at kernel/panic.c:137/panic() (Not tainted)

Call Trace:
 <NMI>  [<ffffffff80092d73>] panic+0x1da/0x1eb
 [<ffffffff8006caef>] _show_stack+0xdb/0xea
 [<ffffffff8006cbe2>] show_registers+0xe4/0x100
 [<ffffffff800662c5>] die_nmi+0x66/0xa3
 [<ffffffff80066a0b>] nmi_watchdog_tick+0x157/0x1d3
 [<ffffffff80066629>] default_do_nmi+0x81/0x225
 [<ffffffff80066896>] do_nmi+0x43/0x61
 [<ffffffff80065eef>] nmi+0x7f/0x88
 [<ffffffff8003c71c>] __ide_dma_off_quietly+0x0/0x26
 [<ffffffff80065c0b>] .text.lock.spinlock+0x11/0x30
 <<EOE>>  [<ffffffff801d243c>] atiixp_ide_dma_host_off+0x23/0x8d
 [<ffffffff8003c738>] __ide_dma_off_quietly+0x1c/0x26
 [<ffffffff801df787>] do_reset1+0x50/0x1c1
 [<ffffffff801deb9b>] __ide_error+0x1bc/0x1d7
 [<ffffffff80026891>] ide_wait_stat+0xfb/0x110
 [<ffffffff8000ef44>] ide_do_request+0x43a/0x77d
 [<ffffffff80143dc4>] elv_insert+0xac/0x1c0
 [<ffffffff80041e16>] ide_do_drive_cmd+0xc0/0x116
 [<ffffffff88254605>] :ide_cd:cdrom_queue_packet_command+0x46/0xe2
 [<ffffffff801de6bc>] ide_init_drive_cmd+0x10/0x24
 [<ffffffff88254914>] :ide_cd:cdrom_lockdoor+0x64/0xe1
 [<ffffffff801452e1>] blk_end_sync_rq+0x0/0x2e
 [<ffffffff8012f99f>] selinux_socket_unix_may_send+0x52/0x5e
 [<ffffffff88237526>] :cdrom:cdrom_release+0x190/0x1f4
 [<ffffffff8002e511>] __wake_up+0x38/0x4f
 [<ffffffff80047be1>] skb_dequeue+0x48/0x50
 [<ffffffff88254de5>] :ide_cd:idecd_release+0x2c/0x43
 [<ffffffff800e5da2>] __blkdev_put+0x6d/0x169
 [<ffffffff80012ac5>] __fput+0xd3/0x1bd
 [<ffffffff80023bd1>] filp_close+0x5c/0x64
 [<ffffffff8001dff3>] sys_close+0x88/0xbd
 [<ffffffff8005e28d>] tracesys+0xd5/0xe0

BUG: warning at drivers/input/serio/i8042.c:846/i8042_panic_blink() (Not tainte)
=========================================================
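To read the trace above as a call chain, the frame lines can be reduced to their symbol names. The following is a minimal Python sketch and not part of the original report: the `parse_frames` helper and the regular expression are assumptions made for illustration, and the sample text is a subset of the frames quoted above.

```python
import re

# Matches oops frame lines such as
#   [<ffffffff801d243c>] atiixp_ide_dma_host_off+0x23/0x8d
#   [<ffffffff88254605>] :ide_cd:cdrom_queue_packet_command+0x46/0xe2
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{16})>\]\s+"
    r"(?::(?P<module>\w+):)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(trace_text):
    """Return (module, symbol) pairs in the order they appear in the trace."""
    return [
        (m.group("module"), m.group("symbol"))
        for m in FRAME_RE.finditer(trace_text)
    ]


# Subset of the frames quoted in this report.
SAMPLE = """\
 [<ffffffff80065c0b>] .text.lock.spinlock+0x11/0x30
 <<EOE>>  [<ffffffff801d243c>] atiixp_ide_dma_host_off+0x23/0x8d
 [<ffffffff8003c738>] __ide_dma_off_quietly+0x1c/0x26
 [<ffffffff801df787>] do_reset1+0x50/0x1c1
 [<ffffffff88254914>] :ide_cd:cdrom_lockdoor+0x64/0xe1
"""

frames = parse_frames(SAMPLE)
for module, symbol in frames:
    # Frames without a ":module:" prefix belong to the core kernel image.
    print(f"{module or 'kernel':8s} {symbol}")
```

The most recent call is at the top. Traces from kernels of this era may also contain stale stack entries, so not every listed symbol is necessarily part of the live call chain.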

Version-Release number of selected component (if applicable):
2.6.18-194.el5

How reproducible:
Reserve system listed in comment#2 and install RHEL5-Server-U5_nfs-x86_64.

Actual results:
System PANICS.

Expected results:
Installation should be successful.

Additional info:
I have attached a file containing the serial output to this BZ.
Prior to the PANIC, a few lines in the output caught my attention:
<-SNIP->
Red Hat nash version 5.1.19.6 starting
mpt2sas0: failure at drivers/scsi/mpt2sas/mpt2sas_scsih.c:4046/_scsih_add_devic!
<-SNIP->
Starting kernel logger: [  OK  ]
powernow-k8: Pre-initialization of ACPI failed
powernow-k8: Your BIOS does not provide _PSS objects.  PowerNow! does not work .
powernow-k8: Your BIOS does not provide _PSS objects.  PowerNow! does not work .
<-SNIP->
Starting anamon: [  OK  ]
Starting smartd: hda: drive_cmd: status=0x58 { DriveReady SeekComplete DataRequ}
ide: failed opcode was: 0xa1
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hda: drive not ready for command
hda: status timeout: status=0xd8 { Busy }
ide: failed opcode was: unknown
NMI Watchdog detected LOCKUP on CPU 10
CPU 10 
<-SNIP->
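The status values in these messages are bit masks from the ATA status register. As a reading aid, here is a minimal Python sketch that decodes them. The `decode_status` helper is hypothetical; the bit names follow the `*_STAT` macros in the Linux kernel header include/linux/hdreg.h, which is an assumption about naming and does not come from this report.

```python
# Bit values of the ATA status register, most significant bit first.
# Names follow the *_STAT macros in include/linux/hdreg.h.
STATUS_BITS = [
    (0x80, "BUSY_STAT"),
    (0x40, "READY_STAT"),
    (0x20, "WRERR_STAT"),
    (0x10, "SEEK_STAT"),
    (0x08, "DRQ_STAT"),
    (0x04, "ECC_STAT"),
    (0x02, "INDEX_STAT"),
    (0x01, "ERR_STAT"),
]


def decode_status(status):
    """Return the names of all bits set in an ATA status byte."""
    return [name for bit, name in STATUS_BITS if status & bit]


# The two status values that appear in the serial output.
for value in (0x58, 0xD8):
    print(f"0x{value:02x}: {' | '.join(decode_status(value))}")
```

The value 0xd8 is 0x58 with the busy bit added. The kernel message lists only Busy for it, which is consistent with the ATA rule that the other status bits are not valid while busy is set.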


-pbunyan

Comment 2 Prarit Bhargava 2011-02-04 18:44:48 UTC
Paul, is this reproducible?

P.

Comment 3 David Milburn 2011-02-04 18:55:37 UTC
Prarit,

I can reproduce these errors on dell-per415-01 on an upstream kernel

hda: drive not ready for command
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
hda: possibly failed opcode: 0xa1


[root@dell-per415-01 ~]# uname -a
Linux dell-per415-01.lab.bos.redhat.com 2.6.37 #1 SMP Tue Feb 1 10:12:42 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
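For reference, the failed opcode 0xa1 is the ATA command IDENTIFY PACKET DEVICE, the identify command for ATAPI devices such as CD/DVD drives. The small Python lookup below is only a reading aid: the table is deliberately incomplete, and `describe_opcode` is a hypothetical helper, not something taken from the kernel or from this report.

```python
# A few opcodes from the ATA/ATAPI command set, for reading log lines.
# This table is intentionally incomplete.
ATA_OPCODES = {
    0xA0: "PACKET",
    0xA1: "IDENTIFY PACKET DEVICE",
    0xB0: "SMART",
    0xEC: "IDENTIFY DEVICE",
}


def describe_opcode(opcode):
    """Map an ATA command opcode to its name, if this sketch knows it."""
    return ATA_OPCODES.get(opcode, "unknown to this sketch")


print(f"0x{0xA1:02x} -> {describe_opcode(0xA1)}")
```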

Comment 5 PaulB 2011-02-07 13:35:27 UTC
Created attachment 477411 [details]
per515_RHEL5u5_PANIC_Reproduced

Comment 6 Prarit Bhargava 2011-02-10 12:34:50 UTC
(In reply to comment #3)
> Prarit,
> 
> I can reproduce these errors on dell-per415-01 on an upstream kernel
> 
> hda: drive not ready for command
> hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
> hda: possibly failed opcode: 0xa1
> 
> 
> [root@dell-per415-01 ~]# uname -a
> Linux dell-per415-01.lab.bos.redhat.com 2.6.37 #1 SMP Tue Feb 1 10:12:42 EST
> 2011 x86_64 x86_64 x86_64 GNU/Linux

David ... I wonder if this is the "running smartd on a non-smartd capable drive leads to a system panic" issue I've heard about?

I'll try and grab a system to see what is going on...

P.

Comment 7 David Milburn 2011-02-10 14:12:26 UTC
Prarit,

You are correct. I stopped smartd and no longer saw the dmesg output.

[root@dell-per415-01 ~]# smartctl -i /dev/hda
smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     [non-printable characters]
Serial Number:    10100405173221  [non-printable characters]
Firmware Version: [non-printable characters]
User Capacity:    2,199,023,255,040 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   1
ATA Standard is:  Not recognized. Minor revision code: 0xffff
Local Time is:    Thu Feb 10 09:08:06 2011 EST
SMART is only available in ATA Version 3 Revision 3 or greater.
We will try to proceed in spite of this.
SMART support is: Unavailable - Packet Interface Devices [this device: CD/DVD] don't support ATA SMART
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

Previously, I had noticed a weird status (BUSY_STAT | READY_STAT) and had
tried increasing the wait time, but I could still reproduce the problem.
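The wait mentioned here is, in outline, a poll of the status register until the busy bit clears or a deadline passes. Below is a minimal Python model under stated assumptions: `wait_not_busy`, `stuck_device`, and the timing values are hypothetical and do not reproduce the kernel's ide_wait_stat. Only the bit values 0x80 and 0x40 and the status 0xd8 reported earlier in this bug are taken from real data. It sketches why a longer timeout would not help if the device never drops busy.

```python
import time

BUSY_STAT = 0x80
READY_STAT = 0x40


def wait_not_busy(read_status, timeout_s, poll_interval_s=0.001):
    """Poll read_status() until BUSY_STAT clears; return (ok, last_status)."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = read_status()
        if not status & BUSY_STAT:
            return True, status
        if time.monotonic() >= deadline:
            return False, status
        time.sleep(poll_interval_s)


def stuck_device():
    # Hypothetical device that keeps returning 0xd8 (busy and ready both
    # set), the status value reported earlier in this bug.
    return 0xD8


# A longer timeout only delays the failure for a device that stays busy.
for timeout_s in (0.01, 0.05):
    ok, status = wait_not_busy(stuck_device, timeout_s)
    print(f"timeout={timeout_s}s ok={ok} status=0x{status:02x}")
```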

Thanks,
David

Comment 8 David Milburn 2011-02-10 14:17:15 UTC
Also of note: John mentioned in a previous email that this system had been
certified with a Samsung SH-S162L, a different drive, so we didn't see the
problem before.

Comment 9 David Milburn 2011-02-10 15:02:05 UTC
Looks like it may be a spinlock issue in the ide/ati driver.
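For readers unfamiliar with this failure class, here is a purely illustrative Python sketch of one way a lock problem can hang a CPU: a code path requests a non-recursive lock that it already holds. This scenario is an assumption chosen for illustration, not a diagnosis. The report does not establish which lock is involved or how it is taken. The names `outer_path` and `inner_path` are hypothetical, and a timeout stands in for the NMI watchdog so that the example terminates.

```python
import threading

lock = threading.Lock()  # non-recursive, like a kernel spinlock


def inner_path(timeout_s=0.1):
    # A real spinlock would spin here forever; the timeout plays the role
    # of the watchdog so that this sketch can report the hang and return.
    acquired = lock.acquire(timeout=timeout_s)
    if acquired:
        lock.release()
    return acquired


def outer_path():
    with lock:               # the lock is held for the whole call ...
        return inner_path()  # ... so the nested request cannot succeed


print("nested acquire succeeded:", outer_path())
print("standalone acquire succeeded:", inner_path())
```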