Bug 173196 - machine hangs with lpfc intensive IO+HT
Summary: machine hangs with lpfc intensive IO+HT
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Tom Coughlan
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2005-11-14 21:21 UTC by didi
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-12-15 13:37:54 UTC
Target Upstream Version:
Embargoed:



Description didi 2005-11-14 21:21:41 UTC
Description of problem:
An HP DL380 with an Emulex LP10000 connected to an EMC Clariion CX300
hangs when doing intensive IO. Disabling HyperThreading in the BIOS solves
this.

Version-Release number of selected component (if applicable):
Kernel version 2.4.21-37.ELsmp, with the built-in lpfc 7.3.2 driver.

How reproducible:
Not easy to reproduce, but it does happen eventually.
Two examples:
One machine runs Informix, and sometimes hangs when doing a dbimport of a DB
of a few GB. When running such imports in a loop, it usually hangs after
a few hours (10-20 iterations).
Another machine has a tape library connected and NetBackup installed. When
running a backup of data on the FC storage in a loop, it hangs after
a few hours. Backup of local disks works well.
I tried running various file copies from/to it and did not manage to trigger
a hang in a shorter time.
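
For anyone trying to reproduce this, the "run it in a loop" approach can be
sketched roughly as below. This is only a minimal sketch, not the script that
was actually used: the function name run_in_loop, the log path, and the
command passed in are all hypothetical. Since the machine freezes completely,
each progress line is fsync'ed before the command starts, so the log (kept on
a local disk, not on the FC storage) shows how many iterations completed
before the hang.

```python
import os
import subprocess
import time


def run_in_loop(cmd, max_iterations, log_path):
    """Run cmd repeatedly and return the number of successful runs.

    A timestamped line is written and fsync'ed before every run, so the
    last line in the log identifies the iteration during which a hang hit.
    """
    completed = 0
    with open(log_path, "a") as log:
        for i in range(1, max_iterations + 1):
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log.write("%s starting iteration %d\n" % (stamp, i))
            log.flush()
            os.fsync(log.fileno())  # force the line to disk before the IO load starts
            rc = subprocess.call(cmd)
            if rc != 0:
                # Stop on an ordinary failure; a hang never reaches this point.
                log.write("iteration %d exited with status %d\n" % (i, rc))
                break
            completed += 1
    return completed
```

A call could look like run_in_loop(["sh", "import_once.sh"], 50, "/var/tmp/io_loop.log"),
where import_once.sh is a placeholder for whatever wraps a single dbimport or
backup run.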

Steps to Reproduce:
1.
2.
3.
  
Actual results:
The machine freezes completely. SysRq key combinations do not work.

Expected results:
The machine should continue working normally.

Additional info:
As mentioned, disabling HT prevents the hangs. I did not check if it also
happens with two real CPUs.

Comment 1 didi 2005-11-15 10:35:43 UTC
A small update - I tried the same dbimport loop with the older driver
lpfc_703, which is shipped as part of kernel-smp-2.4.21-37.EL. The machine
hung after 14 hours.

Comment 2 didi 2005-12-15 04:55:14 UTC
The problem was solved by upgrading EMC PowerPath (a non-free product that does
multipath) from 4.3.0 to 4.3.4.

Didi


