Created attachment 475235 [details]
SerialOutput

Description of problem:
Installing RHEL5-Server-U5_nfs-x86_64 on the systems listed in comment#2 we get the following kernel PANIC:

=========================================================
Code: 7e f9 e9 f9 fe ff ff f3 90 83 3f 00 7e f9 e9 f8 fe ff ff f3
Kernel panic - not syncing: nmi watchdog
BUG: warning at kernel/panic.c:137/panic() (Not tainted)

Call Trace:
 <NMI>  [<ffffffff80092d73>] panic+0x1da/0x1eb
 [<ffffffff8006caef>] _show_stack+0xdb/0xea
 [<ffffffff8006cbe2>] show_registers+0xe4/0x100
 [<ffffffff800662c5>] die_nmi+0x66/0xa3
 [<ffffffff80066a0b>] nmi_watchdog_tick+0x157/0x1d3
 [<ffffffff80066629>] default_do_nmi+0x81/0x225
 [<ffffffff80066896>] do_nmi+0x43/0x61
 [<ffffffff80065eef>] nmi+0x7f/0x88
 [<ffffffff8003c71c>] __ide_dma_off_quietly+0x0/0x26
 [<ffffffff80065c0b>] .text.lock.spinlock+0x11/0x30
 <<EOE>>  [<ffffffff801d243c>] atiixp_ide_dma_host_off+0x23/0x8d
 [<ffffffff8003c738>] __ide_dma_off_quietly+0x1c/0x26
 [<ffffffff801df787>] do_reset1+0x50/0x1c1
 [<ffffffff801deb9b>] __ide_error+0x1bc/0x1d7
 [<ffffffff80026891>] ide_wait_stat+0xfb/0x110
 [<ffffffff8000ef44>] ide_do_request+0x43a/0x77d
 [<ffffffff80143dc4>] elv_insert+0xac/0x1c0
 [<ffffffff80041e16>] ide_do_drive_cmd+0xc0/0x116
 [<ffffffff88254605>] :ide_cd:cdrom_queue_packet_command+0x46/0xe2
 [<ffffffff801de6bc>] ide_init_drive_cmd+0x10/0x24
 [<ffffffff88254914>] :ide_cd:cdrom_lockdoor+0x64/0xe1
 [<ffffffff801452e1>] blk_end_sync_rq+0x0/0x2e
 [<ffffffff8012f99f>] selinux_socket_unix_may_send+0x52/0x5e
 [<ffffffff88237526>] :cdrom:cdrom_release+0x190/0x1f4
 [<ffffffff8002e511>] __wake_up+0x38/0x4f
 [<ffffffff80047be1>] skb_dequeue+0x48/0x50
 [<ffffffff88254de5>] :ide_cd:idecd_release+0x2c/0x43
 [<ffffffff800e5da2>] __blkdev_put+0x6d/0x169
 [<ffffffff80012ac5>] __fput+0xd3/0x1bd
 [<ffffffff80023bd1>] filp_close+0x5c/0x64
 [<ffffffff8001dff3>] sys_close+0x88/0xbd
 [<ffffffff8005e28d>] tracesys+0xd5/0xe0

BUG: warning at drivers/input/serio/i8042.c:846/i8042_panic_blink() (Not tainte)
=========================================================

Version-Release number of selected component (if applicable):
2.6.18-194.el5

How reproducible:
Reserve system listed in comment#2 and install RHEL5-Server-U5_nfs-x86_64.

Actual results:
System PANICS.

Expected results:
Installation should be successful.

Additional info:
I have attached file containing serial output to this BZ. Prior to the PANIC, a few lines in the output caught my attention:

<-SNIP->
Red Hat nash version 5.1.19.6 starting
mpt2sas0: failure at drivers/scsi/mpt2sas/mpt2sas_scsih.c:4046/_scsih_add_devic!
<-SNIP->
Starting kernel logger: [ OK ]
powernow-k8: Pre-initialization of ACPI failed
powernow-k8: Your BIOS does not provide _PSS objects. PowerNow! does not work .
powernow-k8: Your BIOS does not provide _PSS objects. PowerNow! does not work .
<-SNIP->
Starting anamon: [ OK ]
Starting smartd: hda: drive_cmd: status=0x58 { DriveReady SeekComplete DataRequ}
ide: failed opcode was: 0xa1
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hda: drive not ready for command
hda: status timeout: status=0xd8 { Busy }
ide: failed opcode was: unknown
NMI Watchdog detected LOCKUP on CPU 10
CPU 10
<-SNIP->

-pbunyan
Paul, is this reproducible? P.
Prarit,

I can reproduce these errors on dell-per415-01 on an upstream kernel:

hda: drive not ready for command
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
hda: possibly failed opcode: 0xa1

[root@dell-per415-01 ~]# uname -a
Linux dell-per415-01.lab.bos.redhat.com 2.6.37 #1 SMP Tue Feb 1 10:12:42 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
Created attachment 477411 [details] per515_RHEL5u5_PANIC_Reproduced
(In reply to comment #3)
> Prarit,
>
> I can reproduce these errors on dell-per415-01 on an upstream kernel
>
> hda: drive not ready for command
> hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
> hda: possibly failed opcode: 0xa1
>
> [root@dell-per415-01 ~]# uname -a
> Linux dell-per415-01.lab.bos.redhat.com 2.6.37 #1 SMP Tue Feb 1 10:12:42 EST
> 2011 x86_64 x86_64 x86_64 GNU/Linux

David ... I wonder if this is the "running smartd on a non-smartd capable drive leads to a system panic" issue I've heard about? I'll try and grab a system to see what is going on...

P.
Prarit,

You are correct. I stopped smartd and no longer saw the dmesg output.

[root@dell-per415-01 ~]# smartctl -i /dev/hda
smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     ����������������������������������������
Serial Number:    10100405173221 ����
Firmware Version: ��������
User Capacity:    2,199,023,255,040 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   1
ATA Standard is:  Not recognized. Minor revision code: 0xffff
Local Time is:    Thu Feb 10 09:08:06 2011 EST
SMART is only available in ATA Version 3 Revision 3 or greater.
We will try to proceed in spite of this.
SMART support is: Unavailable - Packet Interface Devices [this device: CD/DVD] don't support ATA SMART
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

Previously, I had noticed a weird status (BUSY_STAT | READY_STAT) and had tried increasing the wait time, but I could still reproduce the problem.

Thanks,
David
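For reference, the status bytes quoted in this report (0x58 and 0xd8) can be decoded by hand. The following is only a rough sketch in Python, not the kernel's code: it assumes the standard ATA status register bit layout and borrows the flag names from the log lines above. `STATUS_BITS` and `decode_status` are hypothetical names made up for this illustration.

```python
# Illustrative only: assumes the standard ATA status register layout.
# Flag names are taken from the kernel log messages in this report;
# STATUS_BITS and decode_status are hypothetical helper names.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_status(status):
    """Return the names of every bit set in an ATA status byte."""
    return [name for mask, name in STATUS_BITS if status & mask]


if __name__ == "__main__":
    for value in (0x58, 0xD8):
        print(f"status=0x{value:02x} {{ {' '.join(decode_status(value))} }}")
```

Decoded this way, 0xd8 is 0x58 with the Busy bit added, which matches the BUSY_STAT | READY_STAT combination mentioned above. The kernel log itself prints only { Busy } for 0xd8, presumably because the remaining bits are not considered meaningful while the drive reports busy.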
Also of note: John mentioned in a previous email that this system had been certified with a Samsung sh-s162L, which is a different drive, so we didn't see the problem before.
This looks like it may be a spinlock issue in the ide/ati driver.