Description of problem:

This is from my understanding; I haven't looked at the code yet. The multipath time check for good paths was modified to use the SCSI block timeout, on the assumption that this timeout is below the default 20-second good-path check. The typical SCSI block timeout is 60 seconds, so the default multipath time check for good paths is 20 seconds. This can be validated with multipathd -k:

    multipathd> show paths
    2:0:0:16 sdq 65:0  0 [active][ready] XXXXXXX... 15/20
    2:0:0:17 sdr 65:16 0 [active][ready] XXXXXXX... 15/20
    2:0:0:18 sds 65:32 1 [active][ready] XXXXXXX... 15/20
    2:0:0:19 sdt 65:48 0 [active][ready] XXXXXXX... 15/20
    2:0:0:20 sdu 65:64 0 [active][ready] XXXXXXX... 15/20

We changed the SCSI block timeout to 10 seconds using /etc/rc.local:

    for i in $(ls -d /sys/block/sd*/device/timeout); do echo "10" > $i; done
    # validate they are changed with
    cat /sys/block/*/device/timeout

Then we restarted multipath:

    /etc/init.d/multipathd restart
    multipathd -k
    multipathd> show paths
    2:0:0:15 sdp 8:240  0 [active][ready] XXX....... 3/10
    2:0:0:16 sdq 65:0   0 [active][ready] XXX....... 3/10
    2:0:0:17 sdr 65:16  0 [active][ready] XXX....... 3/10
    2:0:0:18 sds 65:32  1 [active][ready] XXX....... 3/10
    2:0:0:19 sdt 65:48  0 [active][ready] XXX....... 3/10
    2:0:0:20 sdu 65:64  0 [active][ready] XXX....... 3/10
    2:0:0:21 sdv 65:80  1 [active][ready] XXX....... 3/10
    2:0:0:22 sdw 65:96  1 [active][ready] XXX....... 3/10
    2:0:0:23 sdx 65:112 1 [active][ready] XXX....... 3/10
    2:0:0:24 sdy 65:128 1 [active][ready] XXX....... 3/10
    2:0:1:0  sdz 65:144 0 [active][ready] XXX....... 3/10

After the system has been running for a while, the value flips back to the default of 20 seconds. There must be something in the code that flips it back, such as a hardware handler recheck.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
RHEL5u6; the backend storage is an EMC CX4-240.
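The rc.local loop in the description could be written a little more defensively. The following is only a sketch: the function names and the optional sysfs root argument are hypothetical additions (the second argument exists so the loop can be tried against a scratch directory before touching /sys), and it assumes the same /sys/block/sd*/device/timeout layout used above.

```shell
# Sketch only: set the SCSI command timeout (in seconds) on every sd* device.
# $1 = timeout in seconds
# $2 = sysfs root (hypothetical override for dry runs; defaults to /sys/block)
set_scsi_timeout() {
    timeout_secs=$1
    sysfs_root=${2:-/sys/block}
    for f in "$sysfs_root"/sd*/device/timeout; do
        # If the glob matched nothing, the literal pattern is left; skip it.
        [ -e "$f" ] || continue
        echo "$timeout_secs" > "$f"
    done
}

# Print "<file>: <value>" for each device so the change can be verified.
show_scsi_timeout() {
    sysfs_root=${1:-/sys/block}
    for f in "$sysfs_root"/sd*/device/timeout; do
        [ -e "$f" ] || continue
        printf '%s: %s\n' "$f" "$(cat "$f")"
    done
}
```

Running set_scsi_timeout 10 followed by show_scsi_timeout as root would correspond to the two commands in the description. Using the shell glob directly avoids parsing the output of ls.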
This is not correct. The only thing that determines how often multipathd checks the paths is the multipath.conf polling_interval parameter. This defaults to 5 seconds, but if a path is active, it will increase to 4 * polling_interval (or 20 seconds). You are thinking of the checker_timeout. If this parameter is not set in multipath.conf, multipathd will use the scsi timeout. This is used as a timeout for scsi commands. If the device doesn't respond to a scsi command within this time, the checker assumes the device has failed.
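To illustrate how the check interval for an active path could grow from polling_interval to 4 * polling_interval, here is a rough shell sketch. It assumes the interval doubles after each successful check until it reaches the cap, which is how upstream multipath-tools behaves as far as I know; the function names are made up for illustration and are not part of multipathd.

```shell
# Sketch only: model how the path check interval might ramp up.
# Assumption: the interval doubles after each successful check and is
# capped at 4 * polling_interval.

# next_checkint CURRENT MAX: print the interval after one successful check.
next_checkint() {
    current=$1
    max=$2
    if [ "$current" -lt $((max / 2)) ]; then
        echo $((current * 2))
    else
        echo "$max"
    fi
}

# checkint_ramp POLLING_INTERVAL: print each interval until the cap is hit.
checkint_ramp() {
    interval=$1
    max=$((4 * $1))
    while :; do
        echo "$interval"
        if [ "$interval" -ge "$max" ]; then
            break
        fi
        interval=$(next_checkint "$interval" "$max")
    done
}
```

With the default polling_interval of 5, checkint_ramp 5 prints 5, 10 and 20 on separate lines. Under this assumption, a value of 10 in the last column of show paths shortly after a restart would be an intermediate step of the ramp, and 20 the steady state, independent of the SCSI timeout.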
Sorry if I was not clear. I was referring to the active path check, i.e. checker_timeout and the 4 * polling_interval period. We don't set checker_timeout, which means it should use the SCSI timeout. As soon as I restart multipath I see 10 seconds, but after a while it switches back to 20 seconds automatically. I'll take a look at setting checker_timeout.
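For reference, setting both parameters explicitly in multipath.conf might look like the fragment below. This is a sketch: the values are illustrative only (5 is the default polling_interval mentioned above, 10 matches the SCSI timeout used in the description), and it assumes both options go in the defaults section of this multipath version.

```
defaults {
    # Base path check interval in seconds; active paths are checked
    # at up to 4 * polling_interval.
    polling_interval 5
    # Timeout in seconds for the checker's SCSI commands. If unset,
    # multipathd uses the SCSI timeout from sysfs.
    checker_timeout  10
}
```

multipathd would need to be restarted after the change, as with the /etc/init.d/multipathd restart step above.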