Bug 670572

Summary: [NetApp 6.0 Bug] Erroneous TPG ID check in SCSI ALUA Handler
Product: Red Hat Enterprise Linux 6 Reporter: Martin George <marting>
Component: kernel Assignee: Mike Snitzer <msnitzer>
Status: CLOSED ERRATA QA Contact: Gris Ge <fge>
Severity: high Docs Contact:
Priority: urgent    
Version: 6.0 CC: bdonahue, coughlan, dhoward, fge, mchristi, msnitzer, revers, xdl-redhat-bugzilla
Target Milestone: rc Keywords: ZStream
Target Release: ---   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: kernel-2.6.32-112.el6 Doc Type: Bug Fix
Doc Text:
For a device that used a Target Port Group (TPG) ID occupying the full 2 bytes in the RTPG (Report Target Port Groups) response, with either byte exceeding the maximum value that can be stored in a signed char, the TPG ID calculated by the kernel never matched the expected group_id. Because of this sign extension of signed char values, the ALUA handler incorrectly identified the Asymmetric Access State (AAS) of the specified device and incorrectly interpreted the supported AAS of the target. With this update, the RTPG response is read through an unsigned char pointer and the problem no longer occurs.
Story Points: ---
Clone Of: 669961 Environment:
Last Closed: 2011-05-19 12:05:30 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 669961    
Bug Blocks: 673978    

Description Martin George 2011-01-18 17:20:22 UTC
+++ This bug was initially created as a clone of Bug #669961 +++

Description of problem:
For Target Port Group IDs occupying the full 2 bytes in the RTPG response, the following group_id check in the alua_rtpg routine in scsi_dh_alua.c always fails:

if (h->group_id == (ucp[2] << 8) + ucp[3]) {

This causes the ALUA handler to wrongly identify the AAS of the specified device and to incorrectly interpret the supported AAS of the target, as seen in the following entries in /var/log/messages:

"alua: port group 3ea state A supports tousna"
"alua: port group 3e9 state A supports tousna"

Version-Release number of selected component (if applicable):
RHEL 5.6 (2.6.18-238.el5)

Notes:
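This is because 'ucp' is wrongly declared in alua_rtpg as a char pointer instead of an unsigned char pointer. Where char is signed, any response byte of 0x80 or above is read as a negative value, which corrupts the computed group ID. The patch below fixes the problem: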

--- scsi_dh_alua.c.orig 2011-01-14 16:48:37.000000000 +0530
+++ scsi_dh_alua.c      2011-01-16 10:27:54.000000000 +0530
@@ -535,7 +535,7 @@ static int alua_rtpg(struct scsi_device
 {
        struct scsi_sense_hdr sense_hdr;
        int len, k, off, valid_states = 0;
-       char *ucp;
+       unsigned char *ucp;
        unsigned err;
        unsigned long expiry, interval = 10;

Comment 1 Martin George 2011-01-18 17:21:38 UTC
Seen on RHEL 6.0 as well (kernel-2.6.32-71.el6).

Comment 3 Martin George 2011-01-18 17:40:39 UTC
This bug could break AAS handling in the ALUA handler, so it is important that
the fix makes it into 6.0.z.

Comment 4 RHEL Program Management 2011-01-18 18:00:53 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 7 Aristeu Rozanski 2011-02-03 16:20:15 UTC
Patch(es) available on kernel-2.6.32-112.el6

Comment 10 Gris Ge 2011-02-23 08:41:12 UTC
Is there any way for us to verify this patch?
Or just code review?

Comment 11 Martin Prpič 2011-02-23 15:13:12 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
For a device that used a Target Portal Group (TPG) ID which occupied the full 2 bytes in the RTPG (Report Target Port Groups) response (with either byte exceeding the maximum value that may be stored in a signed char), the kernel's calculated TPG ID would never match the group_id that it should. As a result, this signed char overflow also caused the ALUA handler to incorrectly identify the Asymmetric Access State (AAS) of the specified device as well as incorrectly interpret the supported AAS of the target. With this update, the aforementioned issue has been addressed and no longer occurs.

Comment 12 Gris Ge 2011-03-11 02:20:49 UTC
Code reviewed, patch has been applied into kernel-2.6.32-120.el6.

Set as Sanity Only

Comment 13 errata-xmlrpc 2011-05-19 12:05:30 UTC
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html