Red Hat Bugzilla – Bug 239376
SCSI Driver timeouts during journal replay
Last modified: 2010-11-12 10:19:54 EST
Description of problem:
While doing GFS recovery testing on ppc64 I'm seeing SCSI errors on the nodes
which are trying to do journal replays.
4 pSeries servers w/ Emulex LP11000 HBAs.
Winchester FlashDisk SATA
QLogic FC Switch.
The cluster is using fence_apc for fencing.
Run an I/O load on nodes A, B, C, and D in the cluster.
Node A is rebooted immediately (reboot -fin).
Node B fences node A with fence_apc
Node C gets the journal lock and attempts to replay the journal
Node C hits a SCSI error as shown below.
GFS: fsid=ppcs:ppcs2.2: jid=0: Looking at journal...
SCSI error : <2 0 0 0> return code = 0x6000000
end_request: I/O error, dev sdc, sector 3417316297
SCSI error : <2 0 0 4> return code = 0x6000000
end_request: I/O error, dev sdg, sector 3417340481
GFS: fsid=ppcs:ppcs2.2: fatal: I/O error
GFS: fsid=ppcs:ppcs2.2: block = 427165504
GFS: fsid=ppcs:ppcs2.2: function = gfs_dreread
GFS: fsid=ppcs:ppcs2.2: file = /builddir/build/BUILD/gfs-kernel-2.6.9-72/largesmp/src/gfs/dio.c, line = 576
GFS: fsid=ppcs:ppcs2.2: time = 1178570667
GFS: fsid=ppcs:ppcs2.2: about to withdraw from the cluster
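For reference when reading the log above, the "return code" printed by the SCSI layer packs four bytes into one value: status, message, host, and driver. The sketch below splits 0x6000000 into those bytes. The name tables are copied from memory of the 2.6-era include/scsi/scsi.h and should be treated as an assumption to check against the kernel actually running on the nodes. The function name decode_scsi_result is hypothetical.

```python
# Assumed driver-byte names (low nibble of bits 24-31), per 2.6-era scsi.h.
DRIVER_BYTE = {
    0x00: "DRIVER_OK", 0x01: "DRIVER_BUSY", 0x02: "DRIVER_SOFT",
    0x03: "DRIVER_MEDIA", 0x04: "DRIVER_ERROR", 0x05: "DRIVER_INVALID",
    0x06: "DRIVER_TIMEOUT", 0x07: "DRIVER_HARD", 0x08: "DRIVER_SENSE",
}

# Assumed host-byte names (bits 16-23), per 2.6-era scsi.h.
HOST_BYTE = {
    0x00: "DID_OK", 0x01: "DID_NO_CONNECT", 0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT", 0x04: "DID_BAD_TARGET", 0x05: "DID_ABORT",
    0x06: "DID_PARITY", 0x07: "DID_ERROR", 0x08: "DID_RESET",
    0x09: "DID_BAD_INTR", 0x0A: "DID_PASSTHROUGH", 0x0B: "DID_SOFT_ERROR",
}


def decode_scsi_result(result):
    """Split a 32-bit SCSI result into its four component bytes."""
    status = result & 0xFF          # SCSI status byte from the target
    msg = (result >> 8) & 0xFF      # message byte
    host = (result >> 16) & 0xFF    # host adapter (low-level driver) byte
    driver = (result >> 24) & 0xFF  # mid-layer driver byte
    return {
        "status": status,
        "msg": msg,
        "host": HOST_BYTE.get(host, hex(host)),
        # The low nibble carries the driver code; the high nibble is a
        # retry/abort suggestion in kernels of this era.
        "driver": DRIVER_BYTE.get(driver & 0x0F, hex(driver & 0x0F)),
        "suggest": driver & 0xF0,
    }


if __name__ == "__main__":
    # Value taken from the "SCSI error" lines in the log.
    print(decode_scsi_result(0x6000000))
```

Under those assumptions the driver byte is 0x06 (DRIVER_TIMEOUT) with the host and status bytes both zero, which is consistent with the timeouts named in the bug title.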
I have run the tests with HP MSA1000 hardware without hitting the SCSI errors.
When I switched back to the Winchester the errors returned.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
The Emulex LP11000 is the only card certified for use in the pSeries hardware.
Is this still happening?
Can we close this one?
I'll probably hit this the next time I do recovery testing. Leave it open with needinfo set so I can find it again.
Probably due to a non-IBM Emulex Fibre Channel card.