Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
For bugs related to Red Hat Enterprise Linux 5 product line. The current stable release is 5.10. For Red Hat Enterprise Linux 6 and above, please visit Red Hat JIRA https://issues.redhat.com/secure/CreateIssue!default.jspa?pid=12332745 to report new issues.

Bug 490633

Summary: multipath pp_alua: retry RTPG command if buffer too small
Product: Red Hat Enterprise Linux 5 Reporter: Bryn M. Reeves <bmr>
Component: device-mapper-multipathAssignee: Ben Marzinski <bmarzins>
Status: CLOSED ERRATA QA Contact: Cluster QE <mspqa-list>
Severity: urgent Docs Contact:
Priority: high    
Version: 5.2CC: agk, bmarzins, bmr, christophe.varoqui, coughlan, cward, dwysocha, edamato, egoggin, heinzm, junichi.nomura, kueda, lmb, mbroz, prockai, tao, tranlan
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-09-02 11:47:07 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Resend RTPG in pp_alua if buffer too small
none
resend rtpg in pp_alua if buffer too small (2) none

Description Bryn M. Reeves 2009-03-17 12:32:11 UTC
Description of problem:
The pp_alua path prioritiser callout sends a remote target port groups (RTPG) command to storage controllers to determine the asymmetric access state of LUNs using an initial buffer size of 128 bytes.

If the returned table of target port descriptors (TPGs) exceeds this size the command will exit with failure and multipath will be unable to create a multipath map for the device. This prevents multipath being used with systems that support ALUA but return a TPGD table greater than 128 bytes.


Version-Release number of selected component (if applicable):
0.4.7-23.el5

How reproducible:
100% with appropriate storage (e.g. Incipient NSP, Fujitsu ETERNUS8000).

Steps to Reproduce:
1. Enable ALUA support on storage
2. Enable alua path prioritiser in multipath.conf
3. run multipath
  
Actual results:
multipath fails to configure maps for the ALUA device if the returned TPGD table is >80 bytes in length.

Expected results:
multipath succeeds in configuring maps for the ALUA device even if the returned TPGD table is >80 bytes in length.

Additional info:
Upstream fixed this two years ago:

diff --git a/path_priority/pp_alua/rtpg.c b/path_priority/pp_alua/rtpg.c
index 6922d9a..701f9d5 100644
commit fa74fbec129b3dc056d58ffc16155ccfd6863eab
Author: Christophe Varoqui <cvaroqui>
Date:   Thu Nov 16 00:21:17 2006 +0100

    [priority] adjust SCSI RTPG request buffer size in ALUA prioritizer
    
    The following patch adds error and size checking to the REPORT TARGET
    PORT GROUPS command issued by rtpg.c.  The old code issued a buffer of
    128 bytes, but never checked the return length.  The new code starts
    with a buffer of 128 bytes, but reallocates it if the buffer is too
    small (SCSI returns the necessary length in the response data).  This is
    more robust as it handles devices that consume more than 128 bytes for
    RTPG, like the Incipient NSP.
    
    I tried to handle errors, etc. as closely as I could to what is already
    there in the code.  Let me know if you'd prefer anything written
    differently.
    
    Brian Geisel

Comment 1 Bryn M. Reeves 2009-03-17 12:33:40 UTC
Created attachment 335518 [details]
Resend RTPG in pp_alua if buffer too small

Comment 2 Ben Marzinski 2009-04-08 21:39:14 UTC
Patch applied. Thanks.

Comment 4 Chris Ward 2009-07-03 18:27:29 UTC
~~ Attention - RHEL 5.4 Beta Released! ~~

RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner!

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value.

Questions can be posted to this bug or your customer or partner representative.

Comment 5 Bryn M. Reeves 2009-07-06 18:52:08 UTC
The patch that was merged in CVS is not the same as the one posted in comment #1 and doesn't fix the problem.

The original patch has this:

+       scsi_buflen = buf[0] << 24 | buf[1] << 16 | buf[2] << 8 | buf[3] + 4;
+       if (buflen < scsi_buflen) {

But the version in 0.4.7-27.el5 changes this to:

+       scsi_buflen = buf[0] << 24 | buf[1] << 16 | buf[2] << 8 | buf[3];
+       if (buflen < (scsi_buflen + 4)) {

This is fine, but the scsi_buflen variable is also used inside the if {} block to give the size of the new buffer:

        if (buflen < (scsi_buflen + 4)) {
                free(buf);
                buf = (unsigned char *)malloc(scsi_buflen);
                if (!buf) {
                        PRINT_DEBUG ("malloc failed: could not allocate"
                                "%u bytes\n", scsi_buflen);
                        return -RTPG_RTPG_FAILED;
                }
                buflen = scsi_buflen;
                memset(buf, 0, buflen);
                rc = do_rtpg(fd, buf, buflen);
                if (rc < 0)
                        goto out;
        }

If the "+ 4" is going to move out of the calculation of scsi_buflen then it needs to be added back in to each successive use of the variable. The current RHEL5 code ends up allocating a buffer that is 4 bytes too small and truncates the resulting rtpg buffer returned from the device.

Upstream seems to have the same bug.

Comment 9 Ben Marzinski 2009-07-07 18:45:50 UTC
Err... the attached patch in comment #1 doesn't include the +4 in the initial calculation of scsi_buflen. Aside from some extra whitespace, it is exactly like the patch I committed.

At any rate, I changed the code.  Could you please point me to the original patch, so that I can see if I'm missing anything else.

Comment 11 Bryn M. Reeves 2009-07-08 09:06:17 UTC
Created attachment 350906 [details]
resend rtpg in pp_alua if buffer too small (2)

Sorry, that's my fault then - I took the patch from upstream which also has these changes and didn't notice the difference when I attached it to the bug.

Attaching the original version which was described as being the same change as was submitted upstream but now I look at the dm-devel thread this wasn't the case:

http://www.redhat.com/archives/dm-devel/2006-November/msg00138.html

Comment 14 errata-xmlrpc 2009-09-02 11:47:07 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1377.html