Bug 506242 - irq timeout message resulting in system hanging
Summary: irq timeout message resulting in system hanging
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: x86_64
OS: Linux
low
medium
Target Milestone: rc
: ---
Assignee: Prarit Bhargava
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-06-16 10:53 UTC by cormac
Modified: 2009-10-20 18:37 UTC
1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-10-20 18:37:13 UTC
Target Upstream Version:
Embargoed:


Attachments

Description cormac 2009-06-16 10:53:26 UTC
Description of problem:
The root filesystem disk reports an irq timeout and the system becomes unresponsive. A hard reboot is required to recover.


Version-Release number of selected component (if applicable):  

Kernel version 2.6.18-53.el5


How reproducible:

System set up with Oracle 10G running. There are no actual reproduction steps, as the problem occurs over time; this is the second instance in a number of weeks.


Steps to Reproduce:
1.  IBM x3950M2 hardware setup with external SAN.
2.  System has Oracle 10G running on it.
3.  A number of weeks ago the system was unresponsive and required a hardware reset.
4.  Examined /var/log/messages and there was no information related to any issues with the system.
5.  Increased the logging level for /var/log/messages in case the issue recurred.
6.  The issue occurred again overnight, leaving the system totally unresponsive. /var/log/messages contains a number of unfamiliar messages, as below:

Jun 15 20:32:11 $HOSTNAME setroubleshoot:      SELinux is preventing access to files with the label, file_t.      For complete SELinux messages. run sealert -l c6f5dcfc-9982-4261-bfae-330a6f231206

Jun 15 20:54:17 $HOSTNAME kernel: hda: irq timeout: status=0xd0 { Busy }
Jun 15 20:54:17 $HOSTNAME kernel: ide: failed opcode was: unknown
Jun 15 20:54:47 $HOSTNAME kernel: hda: ATAPI reset timed-out, status=0xd0
Jun 15 20:55:18 $HOSTNAME kernel: ide0: reset timed-out, status=0xd0
Jun 15 20:55:22 $HOSTNAME kernel: hda: status timeout: status=0xd0 { Busy }
Jun 15 20:55:22 $HOSTNAME kernel: ide: failed opcode was: unknown
Jun 15 20:55:22 $HOSTNAME kernel: hda: drive not ready for command
Jun 15 20:55:52 $HOSTNAME kernel: hda: ATAPI reset timed-out, status=0xd0
Jun 15 20:56:22 $HOSTNAME kernel: ide0: reset timed-out, status=0xd0
Jun 15 20:56:31 $HOSTNAME kernel: BUG: soft lockup detected on CPU#21!
Jun 15 20:56:31 $HOSTNAME kernel:
Jun 15 20:56:31 $HOSTNAME kernel: Call Trace:
Jun 15 20:56:31 $HOSTNAME kernel:  <IRQ>  [<ffffffff800b50fa>] softlockup_tick+0xd5/0xe7
Jun 15 20:56:31 $HOSTNAME kernel:  [<ffffffff800930e2>] update_process_times+0x42/0x68
Jun 15 20:56:31 $HOSTNAME kernel:  [<ffffffff800746e3>] smp_local_timer_interrupt+0x23/0x47
Jun 15 20:56:31 $HOSTNAME kernel:  [<ffffffff80074da5>] smp_apic_timer_interrupt+0x41/0x47
Jun 15 20:56:31 $HOSTNAME kernel:  [<ffffffff8005bc8e>] apic_timer_interrupt+0x66/0x6c
Jun 15 20:56:31 $HOSTNAME kernel:  <EOI>  [<ffffffff80062ad0>] _spin_unlock_irqrestore+0x8/0x9
Jun 15 20:56:31 $HOSTNAME kernel:  [<ffffffff8000ae66>] ide_end_request+0xf0/0xfc
Jun 15 20:56:31 $HOSTNAME kernel:  [<ffffffff8000edc7>] ide_do_request+0x708/0x787
Jun 15 20:56:31 $HOSTNAME kernel:  [<ffffffff8003d03b>] lock_timer_base+0x1b/0x3c
Jun 15 20:56:31 $HOSTNAME kernel:  [<ffffffff80031c4f>] del_timer+0x4e/0x57
Jun 15 20:56:31 $HOSTNAME kernel:  [<ffffffff80134f37>] elv_insert+0xd6/0x1f7
Jun 15 20:56:31 $HOSTNAME kernel:  [<ffffffff801367d6>] blk_execute_rq_nowait+0x7e/0x9a
Jun 15 20:56:31 $HOSTNAME kernel:  [<ffffffff80136890>] blk_execute_rq+0x9e/0xce
Jun 15 20:56:31 $HOSTNAME kernel:  [<ffffffff80139ac0>] sg_io+0x235/0x333
Jun 15 20:56:31 $HOSTNAME kernel:  [<ffffffff8013a036>] scsi_cmd_ioctl+0x1c3/0x3a6
Jun 15 20:56:31 $HOSTNAME kernel:  [<ffffffff801c0af4>] generic_ide_ioctl+0x1f/0x50c
Jun 15 20:56:31 $HOSTNAME kernel:  [<ffffffff881e0d8b>] :cdrom:cdrom_ioctl+0x31/0xc18
Jun 15 20:56:31 $HOSTNAME kernel:  [<ffffffff8000a2e0>] __link_path_walk+0xdf8/0xf42
Jun 15 20:56:31 $HOSTNAME kernel:  [<ffffffff881fad4e>] :ide_cd:idecd_ioctl+0x13f/0x159
Jun 15 20:56:31 $HOSTNAME kernel:  [<ffffffff8011c70b>] avc_has_perm+0x43/0x55
Jun 15 20:56:31 $HOSTNAME kernel:  [<ffffffff80138055>] blkdev_driver_ioctl+0x5d/0x72
Jun 15 20:56:31 $HOSTNAME kernel:  [<ffffffff801386a9>] blkdev_ioctl+0x63f/0x69a
Jun 15 20:56:31 $HOSTNAME kernel:  [<ffffffff8011d242>] inode_has_perm+0x56/0x63
Jun 15 20:56:31 $HOSTNAME kernel:  [<ffffffff800da67d>] blkdev_open+0x0/0x4f
Jun 15 20:56:31 $HOSTNAME kernel:  [<ffffffff800da6b7>] blkdev_open+0x3a/0x4f
Jun 15 20:56:31 $HOSTNAME kernel:  [<ffffffff8001e11c>] __dentry_open+0x101/0x1dc
Jun 15 20:56:31 $HOSTNAME kernel:  [<ffffffff800d9af4>] block_ioctl+0x1b/0x1f
Jun 15 20:56:31 $HOSTNAME kernel:  [<ffffffff8003fc22>] do_ioctl+0x21/0x6b
Jun 15 20:56:31 $HOSTNAME kernel:  [<ffffffff8002fc67>] vfs_ioctl+0x248/0x261
Jun 15 20:56:31 $HOSTNAME kernel:  [<ffffffff8004a242>] sys_ioctl+0x59/0x78
Jun 15 20:56:31 $HOSTNAME kernel:  [<ffffffff8005b28d>] tracesys+0xd5/0xe0
Jun 15 20:56:31 $HOSTNAME kernel:
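The repeated `status=0xd0 { Busy }` lines can be read by decoding the ATA status byte. Below is a minimal Python sketch, assuming the standard ATA status register bit layout and the bit names the legacy Linux IDE driver prints; `ATA_STATUS_BITS` and `decode_ata_status` are hypothetical illustrations, not kernel API. Per the ATA specification, the other status bits are not valid while BSY is set, which is why the driver reports only `Busy`.

```python
from typing import List

# Bit masks of the 8-bit ATA status register, highest bit first, named the
# way the legacy Linux IDE driver prints them (illustrative table).
ATA_STATUS_BITS = [
    (0x80, "Busy"),            # BSY: drive owns the task file
    (0x40, "DriveReady"),      # DRDY
    (0x20, "DeviceFault"),     # DF / write fault
    (0x10, "SeekComplete"),    # DSC
    (0x08, "DataRequest"),     # DRQ
    (0x04, "CorrectedError"),  # CORR (obsolete in later ATA revisions)
    (0x02, "Index"),           # IDX (obsolete in later ATA revisions)
    (0x01, "Error"),           # ERR: see the error register
]


def decode_ata_status(status: int, mask_when_busy: bool = True) -> List[str]:
    """Return the names of the bits set in an ATA status byte.

    While BSY is set the remaining bits are not valid, so with
    mask_when_busy=True only "Busy" is reported, mirroring the
    '{ Busy }' text in the kernel log.
    """
    if not 0 <= status <= 0xFF:
        raise ValueError("ATA status is an 8-bit value")
    if mask_when_busy and status & 0x80:
        return ["Busy"]
    return [name for bit, name in ATA_STATUS_BITS if status & bit]


if __name__ == "__main__":
    print(decode_ata_status(0xD0))                        # ['Busy']
    print(decode_ata_status(0xD0, mask_when_busy=False))  # raw bits
```

With the mask disabled, 0xd0 breaks down as BSY, DRDY and DSC together. Because BSY stayed set across the ATAPI and channel resets in the log, the drive apparently never released the bus, which is consistent with the reset timeouts that follow.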

  
Actual results:


Expected results:


Additional info:

Comment 1 Prarit Bhargava 2009-06-29 13:35:01 UTC
Cormac,

I've seen similar reports to this BZ -- IBM x3XXX systems hanging in IDE/CDROM access.

Can you verify that you are running the latest FW and that there are no outstanding HW upgrades? In some of the other reported cases, updating to the latest FW or performing a HW upgrade seems to have resolved the problem.

P.

Comment 2 cormac 2009-07-10 13:12:42 UTC
I will take a look at the current firmware version on the system and see if it needs updating. Thankfully, we have not encountered the issue since we last saw it.

I will provide feedback early next week (week starting 13th July 2009).

Cormac.

Comment 3 Prarit Bhargava 2009-09-23 14:16:43 UTC
Cormac, any updates?

Thanks,

P.

