Description of problem:
While doing GFS recovery testing on ppc64 I'm seeing SCSI errors on the nodes which are trying to do journal replays.

Configuration:
4 pSeries servers w/ Emulex LP11000 HBAs
Winchester FlashDisk SATA
QLogic FC Switch
The cluster is using fence_apc for fencing.

Scenario:
Run an I/O load on nodes A, B, C, and D in the cluster.
Node A gets rebooted immediately (with -fin).
Node B fences node A with fence_apc.
Node C gets the journal lock and attempts to replay the journal.
Node C hits a SCSI error as shown below.

GFS: fsid=ppcs:ppcs2.2: jid=0: Looking at journal...
SCSI error : <2 0 0 0> return code = 0x6000000
end_request: I/O error, dev sdc, sector 3417316297
SCSI error : <2 0 0 4> return code = 0x6000000
end_request: I/O error, dev sdg, sector 3417340481
GFS: fsid=ppcs:ppcs2.2: fatal: I/O error
GFS: fsid=ppcs:ppcs2.2:   block = 427165504
GFS: fsid=ppcs:ppcs2.2:   function = gfs_dreread
GFS: fsid=ppcs:ppcs2.2:   file = /builddir/build/BUILD/gfs-kernel-2.6.9-72/largesmp/src/gfs/dio.c, line = 576
GFS: fsid=ppcs:ppcs2.2:   time = 1178570667
GFS: fsid=ppcs:ppcs2.2: about to withdraw from the cluster

I have run the tests with HP MSA1000 hardware without hitting the SCSI errors. When I switched back to the Winchester the errors returned.

Version-Release number of selected component (if applicable):
kernel-largesmp-2.6.9-55.EL
GFS-kernel-largesmp-2.6.9-72.2

How reproducible:
100%

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
The Emulex LP11000 is the only card certified for use in the pSeries hardware.
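For anyone triaging the log above, the return code can be split into its component bytes. The sketch below assumes the standard 2.6-era Linux layout of the SCSI result word (status, message, host and driver bytes, low to high) and the driver-byte names from the kernel's include/scsi/scsi.h. The function names are made up for illustration and are not part of any kernel or GFS tool.

```python
# Sketch: decode a Linux 2.6-era SCSI result word such as 0x6000000.
# Assumed layout: bits 0-7 status, 8-15 message, 16-23 host, 24-31 driver.

# Driver-byte codes as defined in include/scsi/scsi.h for 2.6 kernels.
DRIVER_CODES = {
    0x00: "DRIVER_OK",
    0x01: "DRIVER_BUSY",
    0x02: "DRIVER_SOFT",
    0x03: "DRIVER_MEDIA",
    0x04: "DRIVER_ERROR",
    0x05: "DRIVER_INVALID",
    0x06: "DRIVER_TIMEOUT",
    0x07: "DRIVER_HARD",
    0x08: "DRIVER_SENSE",
}


def decode_scsi_result(result):
    """Split the 32-bit result word into its four bytes (hypothetical helper)."""
    return {
        "status": result & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "host": (result >> 16) & 0xFF,
        "driver": (result >> 24) & 0xFF,
    }


def driver_name(result):
    """Name the driver code; the low nibble holds the code, the high nibble a suggestion."""
    code = decode_scsi_result(result)["driver"] & 0x0F
    return DRIVER_CODES.get(code, "UNKNOWN")


if __name__ == "__main__":
    fields = decode_scsi_result(0x6000000)
    print(fields, driver_name(0x6000000))
```

Under these assumptions, 0x6000000 has a driver byte of 0x06 (DRIVER_TIMEOUT) with the host, message and status bytes all zero. That would be consistent with the command timing out rather than the array returning a check condition, but it does not by itself show whether the HBA, the switch or the storage is at fault.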
Is this still happening?
Can we close this one?
I'll probably hit this next time I try to do recovery testing. Leave it open with needinfo set to so I can find it again.
Probably due to the non-IBM Emulex Fibre Channel card.