Bug 239376 - SCSI Driver timeouts during journal replay
Status: CLOSED NOTABUG
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: gfs
Version: 4
Hardware: ppc64  OS: Linux
Priority: medium  Severity: medium
Assigned To: Abhijith Das
QA Contact: GFS Bugs
Reported: 2007-05-07 18:56 EDT by Nate Straz
Modified: 2010-11-12 10:19 EST

Doc Type: Bug Fix
Last Closed: 2010-11-12 10:19:54 EST


Attachments: None
Description Nate Straz 2007-05-07 18:56:31 EDT
Description of problem:

While doing GFS recovery testing on ppc64, I am seeing SCSI errors on the nodes
that are attempting journal replays.

Configuration:

4 pSeries servers with Emulex LP11000 HBAs
Winchester FlashDisk SATA storage
QLogic FC switch

The cluster is using fence_apc for fencing.

Scenario:

Run an I/O load on nodes A, B, C, and D in the cluster.
Node A is rebooted immediately (with reboot -fin).
Node B fences node A with fence_apc.
Node C acquires the journal lock and attempts to replay the journal.
Node C hits the SCSI errors shown below.

GFS: fsid=ppcs:ppcs2.2: jid=0: Looking at journal...
SCSI error : <2 0 0 0> return code = 0x6000000
end_request: I/O error, dev sdc, sector 3417316297
SCSI error : <2 0 0 4> return code = 0x6000000
end_request: I/O error, dev sdg, sector 3417340481
GFS: fsid=ppcs:ppcs2.2: fatal: I/O error
GFS: fsid=ppcs:ppcs2.2:   block = 427165504
GFS: fsid=ppcs:ppcs2.2:   function = gfs_dreread
GFS: fsid=ppcs:ppcs2.2:   file = /builddir/build/BUILD/gfs-kernel-2.6.9-72/largesmp/src/gfs/dio.c, line = 576
GFS: fsid=ppcs:ppcs2.2:   time = 1178570667
GFS: fsid=ppcs:ppcs2.2: about to withdraw from the cluster

I have run the same tests with HP MSA1000 storage without hitting the SCSI errors.
When I switched back to the Winchester, the errors returned.


Version-Release number of selected component (if applicable):
kernel-largesmp-2.6.9-55.EL
GFS-kernel-largesmp-2.6.9-72.2

How reproducible:
100%

Additional info:

The Emulex LP11000 is the only card certified for use in the pSeries hardware.
Comment 1 Kiersten (Kerri) Anderson 2007-11-02 14:42:24 EDT
Is this still happening?
Comment 2 Kiersten (Kerri) Anderson 2008-11-11 16:19:38 EST
Can we close this one?
Comment 3 Nate Straz 2008-11-11 16:27:53 EST
I'll probably hit this the next time I do recovery testing. Leave it open with needinfo set so I can find it again.
Comment 5 Nate Straz 2010-11-12 10:19:54 EST
Probably due to the non-IBM Emulex Fibre Channel card.
