+++ This bug was initially created as a clone of Bug #566685 +++

Description of problem:
On ALUA-enabled setups, dm-multipath fails to update its maps after a path state transition (when an active/optimized path transitions to an active/non-optimized path, and vice versa).

Version-Release number of selected component (if applicable):
RHEL 5.4 GA (2.6.18-164.el5)
device-mapper-multipath-0.4.7-30.el5
iscsi-initiator-utils-6.2.0.871-0.10.el5

ALUA settings are used in multipath.conf: the ALUA priority callout (/sbin/mpath_prio_alua) is used with group-by-prio enabled, along with the ALUA hardware handler (as per bug 562080).

How reproducible:
Always

Steps to Reproduce:
1. Map an iSCSI LUN (with ALUA enabled) to a RHEL 5.4 host. In this case, there is 1 active/optimized path + 4 active/non-optimized paths to the LUN. Configure dm-multipath on it as follows:

# multipath -ll
mpath1 (360a98000572d42746b4a555039386553) dm-3 NETAPP,LUN
[size=2.0G][features=1 queue_if_no_path][hwhandler=1 alua][rw]
\_ round-robin 0 [prio=50][enabled]
 \_ 11:0:0:1 sdk 8:160 [active][ready]
\_ round-robin 0 [prio=40][enabled]
 \_ 7:0:0:1  sdg 8:96  [active][ready]
 \_ 8:0:0:1  sdh 8:112 [active][ready]
 \_ 9:0:0:1  sdi 8:128 [active][ready]
 \_ 10:0:0:1 sdj 8:144 [active][ready]

The individual path priority weights & RTPGs are as follows:

# /sbin/mpath_prio_alua -v /dev/sdk
Target port groups are implicitly supported.
Reported target port group is 4 [active/optimized]
50

# /sbin/mpath_prio_alua -v /dev/sdg
Target port groups are implicitly supported.
Reported target port group is 2 [active/non-optimized]
10

# /sbin/mpath_prio_alua -v /dev/sdh
Target port groups are implicitly supported.
Reported target port group is 1 [active/non-optimized]
10

# /sbin/mpath_prio_alua -v /dev/sdi
Target port groups are implicitly supported.
Reported target port group is 3 [active/non-optimized]
10

# /sbin/mpath_prio_alua -v /dev/sdj
Target port groups are implicitly supported.
Reported target port group is 1 [active/non-optimized]
10

2. Run I/O on the above multipath device. Before the path state transition, iostat shows:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          48.25    0.00   51.75    0.00    0.00    0.00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sdg               0.00         0.00         0.00          0          0
sdi               0.00         0.00         0.00          0          0
sdh               0.00         0.00         0.00          0          0
sdk             614.50         0.00     17184.00          0      34368
sdj               0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          28.32    0.00   43.86   17.29    0.00   10.53

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sdg               0.00         0.00         0.00          0          0
sdi               0.00         0.00         0.00          0          0
sdh               0.00         0.00         0.00          0          0
sdk            5826.37         0.00    204573.13          0     411192
sdj               0.00         0.00         0.00          0          0

I/O is running fine up to this point.

3. Trigger a path state transition on the target storage array. In this case, the active/optimized path transitions to RTPG 2, i.e. sdg, and the original active/optimized path in RTPG 4, i.e. sdk, transitions to an active/non-optimized path, as shown below:

# /sbin/mpath_prio_alua -v /dev/sdk
Target port groups are implicitly supported.
Reported target port group is 4 [active/non-optimized]
10

# /sbin/mpath_prio_alua -v /dev/sdg
Target port groups are implicitly supported.
Reported target port group is 2 [active/optimized]
50

# /sbin/mpath_prio_alua -v /dev/sdh
Target port groups are implicitly supported.
Reported target port group is 1 [active/non-optimized]
10

# /sbin/mpath_prio_alua -v /dev/sdi
Target port groups are implicitly supported.
Reported target port group is 3 [active/non-optimized]
10

# /sbin/mpath_prio_alua -v /dev/sdj
Target port groups are implicitly supported.
Reported target port group is 1 [active/non-optimized]
10

But iostat now shows:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.99    0.00   12.22   83.29    0.00    0.50

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sdg             725.50         0.00     23552.00          0      47104
sdi             736.00         0.00     25656.00          0      51312
sdh             639.50         0.00     23552.00          0      47104
sdk               0.00         0.00         0.00          0          0
sdj             752.00         0.00     26992.00          0      53984

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          31.58    0.00   40.60   27.57    0.00    0.25

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sdg             619.00         0.00     22016.00          0      44032
sdi             702.50         0.00     22016.00          0      44032
sdh             672.50         0.00     22016.00          0      44032
sdk               0.00         0.00         0.00          0          0
sdj             694.00         0.00     22168.00          0      44336

And multipath -ll now shows:

# multipath -ll
mpath1 (360a98000572d42746b4a555039386553) dm-3 NETAPP,LUN
[size=2.0G][features=1 queue_if_no_path][hwhandler=1 alua][rw]
\_ round-robin 0 [prio=10][enabled]
 \_ 11:0:0:1 sdk 8:160 [active][ready]
\_ round-robin 0 [prio=80][active]
 \_ 7:0:0:1  sdg 8:96  [active][ready]
 \_ 8:0:0:1  sdh 8:112 [active][ready]
 \_ 9:0:0:1  sdi 8:128 [active][ready]
 \_ 10:0:0:1 sdj 8:144 [active][ready]

The multipath path groups are clearly wrong. I/O is now running through all the underlying devices of the 2nd path group, i.e. sdg, sdh, sdi & sdj, whereas it should be running on sdg alone (since that is the only active/optimized path available now).

Actual results:
After the path state transition, I/O runs on all underlying paths of the 2nd path group.

Expected results:
After the path state transition, I/O should run on the active/optimized path alone.

Additional info:
Restarting the multipathd daemon or running multipathd -k"reconfigure" properly reconfigures the multipath maps, but this should be handled automatically by dm-multipath.

--- Additional comment from marting on 2010-02-19 07:27:11 EST ---

Created an attachment (id=395088)
Multipath.conf for the above scenario
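For reference, the ALUA settings described above correspond to a multipath.conf device stanza along these lines. This is a minimal sketch, not the reporter's attached configuration (attachment id=395088); the vendor/product strings are taken from the multipath -ll output above, and the prio_callout argument format ("/dev/%n") is an assumption about this device-mapper-multipath version:

devices {
        device {
                vendor                  "NETAPP"
                product                 "LUN"
                path_grouping_policy    group_by_prio
                prio_callout            "/sbin/mpath_prio_alua /dev/%n"   # argument format assumed
                hardware_handler        "1 alua"
        }
}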
Tracking this for RHEL 4.9.
I'm not sure how much more work this is to do in RHEL4 than RHEL5, but I'll take a look.
There's significantly more work necessary to get this working properly in RHEL4 than in RHEL5.
This BZ requests a significant change to the established RHEL 4 behavior. Currently in RHEL 4, the path groups are established when the device is configured, and they remain unchanged until an explicit reconfiguration is done. If we make this change, then path groups will change dynamically when the storage configuration changes. Although it is true that this is generally desirable, it is a change in behavior that may come as a surprise to existing users. This sort of change is not appropriate at this very advanced stage in the life of RHEL 4. (We are, by the way, still planning to deal with this in RHEL5.) The right way to handle this in RHEL 4 is to document the workaround (multipathd -k"reconfigure") in the Release Notes.
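For illustration, the workaround on the setup from the original report would be applied as follows (the map name mpath1 is taken from that report):

# multipathd -k"reconfigure"
# multipath -ll mpath1

After the reconfigure, multipath -ll should again show the active/optimized path alone in its own, highest-priority path group.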
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:

Proposed RHEL 4.9 Release Note:

(Ben, please review.)

When dm-multipath is used on a storage device that implements ALUA, and group-by-prio is enabled, then the path groups are established when the device is configured. The paths with the same priority are grouped together, the group priority is calculated as the sum of the path priorities, and the path group with the highest sum is selected for I/O. If a path's priority changes, the group priority is re-calculated, and the active path group may change. The path grouping is not changed, eventhough some members of the group may now have different priorities. If you wish to re-establish the path grouping after a change, then you must enter the command

multipathd -k"reconfigure"
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,7 +1,3 @@
-Proposed RHEL 4.9 Release Note:
-
-(Ben, please review.)
-
-When dm-multipath is used on a storage device that implements ALUA, and group-by-prio is enabled, then the path groups are established when the device is configured. The paths with the same priority are grouped together, the group priority is calculated as the sum of the path priorities, and the path group with the highest sum is selected for I/O. If a path's priority changes, the group priority is re-calculated, and the active path group may change. The path grouping is not changed, eventhough some members of the group may now have different priorities. If you wish to re-establish the path grouping after a change, then you must enter the command
+When dm-multipath is used on a storage device that implements ALUA, and group-by-prio is enabled, then the path groups are established when the device is configured. The paths with the same priority are grouped together, the group priority is calculated as the sum of the path priorities, and the path group with the highest sum is selected for I/O. If a path's priority changes, the group priority is re-calculated, and the active path group may change. The path grouping is not changed, even though some members of the group may now have different priorities. If you wish to re-establish the path grouping after a change, then you must enter the command
 
 multipathd -k"reconfigure"
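As a worked example of the group-priority calculation in the note above, using the path priorities from the original report:

Before transition:  {sdk} = 50    {sdg,sdh,sdi,sdj} = 10+10+10+10 = 40  -> I/O on sdk
After transition:   {sdk} = 10    {sdg,sdh,sdi,sdj} = 50+10+10+10 = 80  -> I/O on all of sdg,sdh,sdi,sdj

Because the grouping itself is never rebuilt, the four-path group wins with a summed priority of 80, and I/O round-robins across all four of its members, including the three paths that are still active/non-optimized. This matches the [prio=10] and [prio=80] values in the multipath -ll output in the original report.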