Description of problem:
It appears that device-mapper-multipath, when used with fence_scsi (SCSI persistent reservations) and the tur path checker, suffers from path failures and reservation conflicts.
From what I could find, this configuration should be supported. Specific failure messages:
kernel: sd 1:0:7:0: reservation conflict
kernel: sd 1:0:7:0: SCSI error: return code = 0x00000018
kernel: end_request: I/O error, dev sds, sector 118904128
kernel: device-mapper: multipath: Failing path 65:32.
From the fence agent itself:
fenced: agent "fence_scsi" reports: Execuing [sg_persist -n -d /dev/dm-9 -o -A -K 63b40001 -S 63b40004 -T 5] Unable to execute sg_persist (/dev/dm-9).
fenced: fence "node-b" failed
fenced: fencing node "node-b"
kernel: sd 3:0:1:0: reservation conflict
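The kernel messages above can be decoded mechanically, which helps confirm that the path failures and the reservation conflicts refer to the same devices. The sketch below is a minimal illustration under two stated assumptions: the Linux SCSI result word is packed as driver byte, host byte, message byte, status byte (most to least significant), and sd disks use the traditional numbering (major 8 for the first 16 disks, majors 65 to 71 for the next 112, 16 minors per disk). Under those assumptions, "return code = 0x00000018" carries SCSI status 0x18 (RESERVATION CONFLICT) with no host or driver error, and "Failing path 65:32" corresponds to sds, the same device named in the end_request line. The helper names (decode_result, sd_name, summarize) are hypothetical.

```python
import re
from collections import Counter

# SAM status codes of interest (low byte of the Linux SCSI result word).
SCSI_STATUS = {
    0x00: "GOOD",
    0x02: "CHECK CONDITION",
    0x08: "BUSY",
    0x18: "RESERVATION CONFLICT",
}

CONFLICT_RE = re.compile(r"sd (\d+:\d+:\d+:\d+): reservation conflict")
FAILED_RE = re.compile(r"multipath: Failing path (\d+):(\d+)")


def decode_result(result: int) -> dict:
    """Split a SCSI result word, assuming driver<<24 | host<<16 | msg<<8 | status."""
    status = result & 0xFF
    return {
        "driver": (result >> 24) & 0xFF,
        "host": (result >> 16) & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "status": SCSI_STATUS.get(status, hex(status)),
    }


def sd_name(major: int, minor: int) -> str:
    """Map an sd major:minor to its sdX name (only majors 8 and 65-71 handled)."""
    if major == 8:
        index = minor // 16
    elif 65 <= major <= 71:
        index = 16 + (major - 65) * 16 + minor // 16
    else:
        raise ValueError(f"unsupported sd major: {major}")
    # Bijective base-26 naming: 0 -> a, 25 -> z, 26 -> aa.
    letters = ""
    while index >= 0:
        letters = chr(ord("a") + index % 26) + letters
        index = index // 26 - 1
    partition = minor % 16
    return "sd" + letters + (str(partition) if partition else "")


def summarize(lines):
    """Count reservation conflicts per H:C:T:L and failed paths per sd name."""
    conflicts, failed = Counter(), Counter()
    for line in lines:
        if m := CONFLICT_RE.search(line):
            conflicts[m.group(1)] += 1
        elif m := FAILED_RE.search(line):
            failed[sd_name(int(m.group(1)), int(m.group(2)))] += 1
    return conflicts, failed


if __name__ == "__main__":
    print(decode_result(0x00000018)["status"])  # RESERVATION CONFLICT
    print(sd_name(65, 32))  # sds
```

Feeding the full kernel log of each node through summarize gives a per-device tally that can be compared across the four hosts.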
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Install RHEL 5.3.
2. Add an HBA and connect it through a Fibre Channel switch to the SAN storage, with multiple paths.
3. Use the tur path checker (selected for the device).
4. Configure the cluster.
5. Enable fence_scsi.
Actual results:
Reservation conflicts and path failures.
Expected results:
Paths should remain up, and SCSI reservations should be passed transparently through multipath to the real devices and managed properly alongside the tur path checker.
We took a divide-and-conquer approach and tested the following scenarios:
1. Remove device-mapper-multipath from the equation and present only 1 device from the SAN instead of 16, then try to reproduce the problem.
Result: no problems; SCSI reservations appear to work.
2. Keep the zoning with only 1 disk path and re-add device-mapper-multipath, then try to reproduce the problem with multipath and a single path.
Result: no problems; SCSI reservations work.
3. Re-add more paths (2-4) and try to reproduce the problem.
Result: inconsistent. Sometimes only one host in the cluster sees reservation conflicts, logged at a regular interval; at other times all four hosts see the errors.
The problems appear once there are multiple paths to the active/passive storage processors on the SAN. More details will be provided in an update.
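Scenario 3 notes that the conflicts are sometimes logged at a regular interval. Measuring that interval could help correlate it with periodic activity on the host, for example multipathd path checking, whose frequency is set by polling_interval in multipath.conf. The sketch below is illustrative only and assumes classic syslog timestamps of the form "Mmm dd HH:MM:SS" at the start of each line (the excerpts above have them stripped) and that all lines fall within a single year. The function names and the 10% tolerance are hypothetical choices, not part of any existing tool.

```python
import re
from datetime import datetime
from statistics import median

# Assumed syslog prefix, e.g. "Mar  3 10:15:02 node-a kernel: ...".
STAMP_RE = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2})")


def conflict_intervals(lines, device=None):
    """Seconds between consecutive 'reservation conflict' messages,
    optionally restricted to one H:C:T:L address such as '1:0:7:0'."""
    times = []
    for line in lines:
        if "reservation conflict" not in line:
            continue
        if device is not None and f"sd {device}:" not in line:
            continue
        m = STAMP_RE.match(line)
        if m:
            # Syslog stamps carry no year; a fixed leap year is assumed so that
            # Feb 29 parses. Only differences are used, so the year is irrelevant
            # as long as the log does not cross a year boundary.
            times.append(datetime.strptime("2000 " + m.group(1), "%Y %b %d %H:%M:%S"))
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


def looks_periodic(intervals, tolerance=0.1):
    """True if every interval lies within `tolerance` (a fraction) of the median."""
    if len(intervals) < 2:
        return False
    mid = median(intervals)
    return mid > 0 and all(abs(x - mid) <= tolerance * mid for x in intervals)
```

Running conflict_intervals per device and per host, then checking looks_periodic, separates the "regular period" case from the occasions when all four hosts report errors.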