Bug 490633 - multipath pp_alua: retry RTPG command if buffer too small
Summary: multipath pp_alua: retry RTPG command if buffer too small
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: device-mapper-multipath
Version: 5.2
Hardware: All
OS: Linux
high
urgent
Target Milestone: rc
: ---
Assignee: Ben Marzinski
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2009-03-17 12:32 UTC by Bryn M. Reeves
Modified: 2018-10-20 01:35 UTC (History)
17 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-09-02 11:47:07 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Resend RTPG in pp_alua if buffer too small (2.35 KB, patch)
2009-03-17 12:33 UTC, Bryn M. Reeves
no flags Details | Diff
resend rtpg in pp_alua if buffer too small (2) (1.56 KB, patch)
2009-07-08 09:06 UTC, Bryn M. Reeves
no flags Details | Diff


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2009:1377 0 normal SHIPPED_LIVE device-mapper-multipath bug-fix and enhancement update 2009-09-01 12:41:23 UTC

Description Bryn M. Reeves 2009-03-17 12:32:11 UTC
Description of problem:
The pp_alua path prioritiser callout sends a remote target port groups (RTPG) command to storage controllers to determine the asymmetric access state of LUNs using an initial buffer size of 128 bytes.

If the returned table of target port descriptors (TPGs) exceeds this size the command will exit with failure and multipath will be unable to create a multipath map for the device. This prevents multipath being used with systems that support ALUA but return a TPGD table greater than 128 bytes.


Version-Release number of selected component (if applicable):
0.4.7-23.el5

How reproducible:
100% with appropriate storage (e.g. Incipient NSP, Fujitsu ETERNUS8000).

Steps to Reproduce:
1. Enable ALUA support on storage
2. Enable alua path prioritiser in multipath.conf
3. run multipath
  
Actual results:
multipath fails to configure maps for the ALUA device if the returned TPGD table is >80 bytes in length.

Expected results:
multipath succeeds in configuring maps for the ALUA device even if the returned TPGD table is >80 bytes in length.

Additional info:
Upstream fixed this two years ago:

diff --git a/path_priority/pp_alua/rtpg.c b/path_priority/pp_alua/rtpg.c
index 6922d9a..701f9d5 100644
commit fa74fbec129b3dc056d58ffc16155ccfd6863eab
Author: Christophe Varoqui <cvaroqui>
Date:   Thu Nov 16 00:21:17 2006 +0100

    [priority] adjust SCSI RTPG request buffer size in ALUA prioritizer
    
    The following patch adds error and size checking to the REPORT TARGET
    PORT GROUPS command issued by rtpg.c.  The old code issued a buffer of
    128 bytes, but never checked the return length.  The new code starts
    with a buffer of 128 bytes, but reallocates it if the buffer is too
    small (SCSI returns the necessary length in the response data).  This is
    more robust as it handles devices that consume more than 128 bytes for
    RTPG, like the Incipient NSP.
    
    I tried to handle errors, etc. as closely as I could to what is already
    there in the code.  Let me know if you'd prefer anything written
    differently.
    
    Brian Geisel

Comment 1 Bryn M. Reeves 2009-03-17 12:33:40 UTC
Created attachment 335518 [details]
Resend RTPG in pp_alua if buffer too small

Comment 2 Ben Marzinski 2009-04-08 21:39:14 UTC
Patch applied. Thanks.

Comment 4 Chris Ward 2009-07-03 18:27:29 UTC
~~ Attention - RHEL 5.4 Beta Released! ~~

RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner!

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value.

Questions can be posted to this bug or your customer or partner representative.

Comment 5 Bryn M. Reeves 2009-07-06 18:52:08 UTC
The patch that was merged in CVS is not the same as the one posted in comment #1 and doesn't fix the problem.

The original patch has this:

+       scsi_buflen = buf[0] << 24 | buf[1] << 16 | buf[2] << 8 | buf[3] + 4;
+       if (buflen < scsi_buflen) {

But the version in 0.4.7-27.el5 changes this to:

+       scsi_buflen = buf[0] << 24 | buf[1] << 16 | buf[2] << 8 | buf[3];
+       if (buflen < (scsi_buflen + 4)) {

This is fine, but the scsi_buflen variable is also used inside the if {} block to give the size of the new buffer:

        if (buflen < (scsi_buflen + 4)) {
                free(buf);
                buf = (unsigned char *)malloc(scsi_buflen);
                if (!buf) {
                        PRINT_DEBUG ("malloc failed: could not allocate"
                                "%u bytes\n", scsi_buflen);
                        return -RTPG_RTPG_FAILED;
                }
                buflen = scsi_buflen;
                memset(buf, 0, buflen);
                rc = do_rtpg(fd, buf, buflen);
                if (rc < 0)
                        goto out;
        }

If the "+ 4" is going to move out of the calculation of scsi_buflen then it needs to be added back in to each successive use of the variable. The current RHEL5 code ends up allocating a buffer that is 4 bytes too small and truncates the resulting rtpg buffer returned from the device.

Upstream seems to have the same bug.

Comment 9 Ben Marzinski 2009-07-07 18:45:50 UTC
Err... the attached patch in comment #1 doesn't include the +4 in the initial calculation of scsi_buflen. Aside from some extra whitespace, it is exactly like the patch I committed.

At any rate, I changed the code.  Could you please point me to the original patch, so that I can see if I'm missing anything else.

Comment 11 Bryn M. Reeves 2009-07-08 09:06:17 UTC
Created attachment 350906 [details]
resend rtpg in pp_alua if buffer too small (2)

Sorry, that's my fault then - I took the patch from upstream which also has these changes and didn't notice the difference when I attached it to the bug.

Attaching the original version which was described as being the same change as was submitted upstream but now I look at the dm-devel thread this wasn't the case:

http://www.redhat.com/archives/dm-devel/2006-November/msg00138.html

Comment 14 errata-xmlrpc 2009-09-02 11:47:07 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1377.html


Note You need to log in before you can comment on or make changes to this bug.