Bug 490633 - multipath pp_alua: retry RTPG command if buffer too small
multipath pp_alua: retry RTPG command if buffer too small
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: device-mapper-multipath (Show other bugs)
5.2
All Linux
high Severity urgent
: rc
: ---
Assigned To: Ben Marzinski
Cluster QE
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2009-03-17 08:32 EDT by Bryn M. Reeves
Modified: 2010-10-23 04:22 EDT (History)
17 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-09-02 07:47:07 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
Resend RTPG in pp_alua if buffer too small (2.35 KB, patch)
2009-03-17 08:33 EDT, Bryn M. Reeves
no flags Details | Diff
resend rtpg in pp_alua if buffer too small (2) (1.56 KB, patch)
2009-07-08 05:06 EDT, Bryn M. Reeves
no flags Details | Diff

  None (edit)
Description Bryn M. Reeves 2009-03-17 08:32:11 EDT
Description of problem:
The pp_alua path prioritiser callout sends a remote target port groups (RTPG) command to storage controllers to determine the asymmetric access state of LUNs using an initial buffer size of 128 bytes.

If the returned table of target port descriptors (TPGs) exceeds this size the command will exit with failure and multipath will be unable to create a multipath map for the device. This prevents multipath being used with systems that support ALUA but return a TPGD table greater than 128 bytes.


Version-Release number of selected component (if applicable):
0.4.7-23.el5

How reproducible:
100% with appropriate storage (e.g. Incipient NSP, Fujitsu ETERNUS8000).

Steps to Reproduce:
1. Enable ALUA support on storage
2. Enable alua path prioritiser in multipath.conf
3. run multipath
  
Actual results:
multipath fails to configure maps for the ALUA device if the returned TPGD table is >80 bytes in length.

Expected results:
multipath succeeds in configuring maps for the ALUA device even if the returned TPGD table is >80 bytes in length.

Additional info:
Upstream fixed this two years ago:

diff --git a/path_priority/pp_alua/rtpg.c b/path_priority/pp_alua/rtpg.c
index 6922d9a..701f9d5 100644
commit fa74fbec129b3dc056d58ffc16155ccfd6863eab
Author: Christophe Varoqui <cvaroqui@zezette.localdomain>
Date:   Thu Nov 16 00:21:17 2006 +0100

    [priority] adjust SCSI RTPG request buffer size in ALUA prioritizer
    
    The following patch adds error and size checking to the REPORT TARGET
    PORT GROUPS command issued by rtpg.c.  The old code issued a buffer of
    128 bytes, but never checked the return length.  The new code starts
    with a buffer of 128 bytes, but reallocates it if the buffer is too
    small (SCSI returns the necessary length in the response data).  This is
    more robust as it handles devices that consume more than 128 bytes for
    RTPG, like the Incipient NSP.
    
    I tried to handle errors, etc. as closely as I could to what is already
    there in the code.  Let me know if you'd prefer anything written
    differently.
    
    Brian Geisel
Comment 1 Bryn M. Reeves 2009-03-17 08:33:40 EDT
Created attachment 335518 [details]
Resend RTPG in pp_alua if buffer too small
Comment 2 Ben Marzinski 2009-04-08 17:39:14 EDT
Patch applied. Thanks.
Comment 4 Chris Ward 2009-07-03 14:27:29 EDT
~~ Attention - RHEL 5.4 Beta Released! ~~

RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner!

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value.

Questions can be posted to this bug or your customer or partner representative.
Comment 5 Bryn M. Reeves 2009-07-06 14:52:08 EDT
The patch that was merged in CVS is not the same as the one posted in comment #1 and doesn't fix the problem.

The original patch has this:

+       scsi_buflen = buf[0] << 24 | buf[1] << 16 | buf[2] << 8 | buf[3] + 4;
+       if (buflen < scsi_buflen) {

But the version in 0.4.7-27.el5 changes this to:

+       scsi_buflen = buf[0] << 24 | buf[1] << 16 | buf[2] << 8 | buf[3];
+       if (buflen < (scsi_buflen + 4)) {

This is fine, but the scsi_buflen variable is also used inside the if {} block to give the size of the new buffer:

        if (buflen < (scsi_buflen + 4)) {
                free(buf);
                buf = (unsigned char *)malloc(scsi_buflen);
                if (!buf) {
                        PRINT_DEBUG ("malloc failed: could not allocate"
                                "%u bytes\n", scsi_buflen);
                        return -RTPG_RTPG_FAILED;
                }
                buflen = scsi_buflen;
                memset(buf, 0, buflen);
                rc = do_rtpg(fd, buf, buflen);
                if (rc < 0)
                        goto out;
        }

If the "+ 4" is going to move out of the calculation of scsi_buflen then it needs to be added back in to each successive use of the variable. The current RHEL5 code ends up allocating a buffer that is 4 bytes too small and truncates the resulting rtpg buffer returned from the device.

Upstream seems to have the same bug.
Comment 9 Ben Marzinski 2009-07-07 14:45:50 EDT
Err... the attached patch in comment #1 doesn't include the +4 in the initial calculation of scsi_buflen. Aside from some extra whitespace, it is exactly like the patch I committed.

At any rate, I changed the code.  Could you please point me to the original patch, so that I can see if I'm missing anything else.
Comment 11 Bryn M. Reeves 2009-07-08 05:06:17 EDT
Created attachment 350906 [details]
resend rtpg in pp_alua if buffer too small (2)

Sorry, that's my fault then - I took the patch from upstream which also has these changes and didn't notice the difference when I attached it to the bug.

Attaching the original version which was described as being the same change as was submitted upstream but now I look at the dm-devel thread this wasn't the case:

http://www.redhat.com/archives/dm-devel/2006-November/msg00138.html
Comment 14 errata-xmlrpc 2009-09-02 07:47:07 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1377.html

Note You need to log in before you can comment on or make changes to this bug.