Description of problem: The pp_alua path prioritiser callout sends a remote target port groups (RTPG) command to storage controllers to determine the asymmetric access state of LUNs using an initial buffer size of 128 bytes. If the returned table of target port descriptors (TPGs) exceeds this size the command will exit with failure and multipath will be unable to create a multipath map for the device. This prevents multipath being used with systems that support ALUA but return a TPGD table greater than 128 bytes. Version-Release number of selected component (if applicable): 0.4.7-23.el5 How reproducible: 100% with appropriate storage (e.g. Incipient NSP, Fujitsu ETERNUS8000). Steps to Reproduce: 1. Enable ALUA support on storage 2. Enable alua path prioritiser in multipath.conf 3. run multipath Actual results: multipath fails to configure maps for the ALUA device if the returned TPGD table is >80 bytes in length. Expected results: multipath succeeds in configuring maps for the ALUA device even if the returned TPGD table is >80 bytes in length. Additional info: Upstream fixed this two years ago: diff --git a/path_priority/pp_alua/rtpg.c b/path_priority/pp_alua/rtpg.c index 6922d9a..701f9d5 100644 commit fa74fbec129b3dc056d58ffc16155ccfd6863eab Author: Christophe Varoqui <cvaroqui> Date: Thu Nov 16 00:21:17 2006 +0100 [priority] adjust SCSI RTPG request buffer size in ALUA prioritizer The following patch adds error and size checking to the REPORT TARGET PORT GROUPS command issued by rtpg.c. The old code issued a buffer of 128 bytes, but never checked the return length. The new code starts with a buffer of 128 bytes, but reallocates it if the buffer is too small (SCSI returns the necessary length in the response data). This is more robust as it handles devices that consume more than 128 bytes for RTPG, like the Incipient NSP. I tried to handle errors, etc. as closely as I could to what is already there in the code. Let me know if you'd prefer anything written differently. Brian Geisel
Created attachment 335518 [details] Resend RTPG in pp_alua if buffer too small
Patch applied. Thanks.
~~ Attention - RHEL 5.4 Beta Released! ~~ RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner! If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity. Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value. Questions can be posted to this bug or your customer or partner representative.
The patch that was merged in CVS is not the same as the one posted in comment #1 and doesn't fix the problem. The original patch has this: + scsi_buflen = buf[0] << 24 | buf[1] << 16 | buf[2] << 8 | buf[3] + 4; + if (buflen < scsi_buflen) { But the version in 0.4.7-27.el5 changes this to: + scsi_buflen = buf[0] << 24 | buf[1] << 16 | buf[2] << 8 | buf[3]; + if (buflen < (scsi_buflen + 4)) { This is fine, but the scsi_buflen variable is also used inside the if {} block to give the size of the new buffer: if (buflen < (scsi_buflen + 4)) { free(buf); buf = (unsigned char *)malloc(scsi_buflen); if (!buf) { PRINT_DEBUG ("malloc failed: could not allocate" "%u bytes\n", scsi_buflen); return -RTPG_RTPG_FAILED; } buflen = scsi_buflen; memset(buf, 0, buflen); rc = do_rtpg(fd, buf, buflen); if (rc < 0) goto out; } If the "+ 4" is going to move out of the calculation of scsi_buflen then it needs to be added back in to each successive use of the variable. The current RHEL5 code ends up allocating a buffer that is 4 bytes too small and truncates the resulting rtpg buffer returned from the device. Upstream seems to have the same bug.
Err... the attached patch in comment #1 doesn't include the +4 in the initial calculation of scsi_buflen. Aside from some extra whitespace, it is exactly like the patch I committed. At any rate, I changed the code. Could you please point me to the original patch, so that I can see if I'm missing anything else.
Created attachment 350906 [details] resend rtpg in pp_alua if buffer too small (2) Sorry, that's my fault then - I took the patch from upstream which also has these changes and didn't notice the difference when I attached it to the bug. Attaching the original version which was described as being the same change as was submitted upstream but now I look at the dm-devel thread this wasn't the case: http://www.redhat.com/archives/dm-devel/2006-November/msg00138.html
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHEA-2009-1377.html