Description of problem:

For Target Port Group IDs that occupy the full 2 bytes in the RTPG response (that is, where either byte is greater than 0x7f), the following group_id check in the alua_rtpg routine in scsi_dh_alua.c always fails:

	if (h->group_id == (ucp[2] << 8) + ucp[3]) {

This causes the ALUA handler to wrongly identify the AAS of a specified device and to misinterpret the supported AAS of the target, as seen in the following entries in /var/log/messages:

"alua: port group 3ea state A supports tousna"
"alua: port group 3e9 state A supports tousna"

Version-Release number of selected component (if applicable):

RHEL 5.6 (2.6.18-238.el5)

Notes:

This is because 'ucp' is wrongly declared in alua_rtpg as a char pointer instead of an unsigned char pointer. The patch below fixes the problem:

--- scsi_dh_alua.c.orig 2011-01-14 16:48:37.000000000 +0530
+++ scsi_dh_alua.c 2011-01-16 10:27:54.000000000 +0530
@@ -535,7 +535,7 @@ static int alua_rtpg(struct scsi_device
 {
 	struct scsi_sense_hdr sense_hdr;
 	int len, k, off, valid_states = 0;
-	char *ucp;
+	unsigned char *ucp;
 	unsigned err;
 	unsigned long expiry, interval = 10;
This bug can break AAS handling in the ALUA handler, so it is important that the fix makes it into 5.6.z itself.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
The fix is in kernel-2.6.18-243.el5. You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5. Detailed testing feedback is always welcome.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
For a device that used a Target Port Group (TPG) ID occupying the full 2 bytes in the RTPG (Report Target Port Groups) response (with either byte exceeding the maximum value that can be stored in a signed char), the TPG ID calculated by the kernel never matched the expected group_id. As a result, this sign-extension error also caused the ALUA handler to incorrectly identify the AAS (Asymmetric Access State) of the specified device and to misinterpret the supported AAS of the target. With this update, the pointer used to parse the RTPG response is declared as unsigned char, and the issue no longer occurs.
Mike, I've set up an ALUA NetApp ONTAP LUN with 8 paths, but noticed that multipath is still using scsi_dh as the hardware handler and scsi_dh_alua.ko is not loaded. Does testing this bug require any special NetApp storage? If possible, please also describe a way to test this bug. Thanks
(In reply to comment #9)
> Mike,
>
> I've setup a ALUA NetApp ONTAP LUN via 8 path, but noticed that multipath is
> still using scsi_dh for hardware handler and scsi_dh_alua.ko is not loaded.
>

You would have to set hardware_handler to '1 alua' in multipath.conf for that, and then restart the multipathd daemon so that the new setting takes effect.
I am curious why we don't enable it in the default settings. Does scsi_dh_alua.ko not work well when ALUA is disabled on the ONTAP side (by igroup)? If so, it's understandable not to use "1 alua" in the default multipath settings.
Could not reproduce the alua_rtpg problem or the incorrect AAS issue. Sanity-checked with the following tests:
1. ALUA correctly identifies the AAS.
2. ALUA correctly groups links.
3. multipath failover with the alua handler.
An advisory has been issued which should help resolve the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1065.html