Bug 751274

Summary: SCSI-3 PR commands failing with invalid host byte (0x17) value
Product: Red Hat Enterprise Linux 6 Reporter: mukesh bafna <mukesh_bafna>
Component: kernel Assignee: Chris Leech <cleech>
Status: CLOSED DUPLICATE QA Contact: Storage QE <storage-qe>
Severity: high Docs Contact:
Priority: high    
Version: 6.0 CC: coughlan, dledford, dwysocha, fge, Jes.Sorensen, linux26port, msnitzer, mukesh_bafna, peterm, qcai, rprice, vincent
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-12-19 15:30:05 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 782183, 840683, 961026    

Description mukesh bafna 2011-11-04 06:47:25 UTC
Description of problem:

SCSI-3 PR commands fail with an invalid host byte (0x17) value.


Version-Release number of selected component (if applicable):
6.0

How reproducible:

Consistently reproducible.

Steps to Reproduce:

1. Perform SCSI-3 PR registrations on the paths of a LUN.
2. Issue a SCSI-3 PR OUT CLEAR command to clear all registrations on the LUN.
3. The command fails with host_byte set to 0x17.
  
Actual results:

The SCSI-3 PR OUT CLEAR command fails with host_byte set to 0x17.


Expected results:

The SCSI-3 PR OUT CLEAR command should clear the registrations as expected. If it fails, it should return a valid host_byte value; 0x17 appears to be an invalid value.


Additional info:

Detailed steps and logs for the issue, as reproduced with the Veritas Multi-Pathing product, follow.

1) Take LUN hitachi_vsp0_090f, which has 4 paths:

[root@punb200m2labs01vm9 ~]# vxdmpadm getsubpaths dmpnodename=hitachi_vsp0_090f
NAME         STATE[A]   PATH-TYPE[M] CTLR-NAME  ENCLR-TYPE   ENCLR-NAME    ATTRS
================================================================================
sdee         ENABLED(A)    -          c4         Hitachi_VSP  hitachi_vsp0     -
sdeo         ENABLED(A)    -          c3         Hitachi_VSP  hitachi_vsp0     -
sdmb         DISABLED      -          c3         Hitachi_VSP  hitachi_vsp0     -
sdmg         ENABLED(A)    -          c4         Hitachi_VSP  hitachi_vsp0     -

2) The LUN's current PR status:

[root@punb200m2labs01vm9 ~]# vxdmppr read /dev/vx/rdmp/hitachi_vsp0_090f
KEY-TYPE       RES-TYPE       ASCII-KEY        HEX-VALUE           PRgeneration
-------------------------------------------------------------------------------
REG            -              CPGR0012            0x4350475230303132  0x119
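The HEX-VALUE column is the 8-byte reservation key, and ASCII-KEY is the same bytes read as characters (0x43 = 'C', 0x50 = 'P', and so on). A minimal C sketch of that conversion, assuming the key is stored big-endian (most significant byte first); `pr_key_to_ascii` is a hypothetical helper, not part of vxdmppr:

```c
#include <stdint.h>

/* Render an 8-byte SCSI-3 PR reservation key as the ASCII string shown in
 * the ASCII-KEY column. The key is treated as a big-endian 64-bit value, so
 * the most significant byte comes first. Bytes are copied unchanged (no
 * filtering of non-printable values). out must hold at least 9 bytes. */
static void pr_key_to_ascii(uint64_t key, char out[9])
{
    for (int i = 0; i < 8; i++) {
        /* Byte i of the key, counting from the most significant byte. */
        out[i] = (char)((key >> (56 - 8 * i)) & 0xff);
    }
    out[8] = '\0';
}
```

Applied to 0x4350475230303132 this yields "CPGR0012", the key later passed to `vxdmppr clear -r` and, in hex, to `sg_persist --param-sark`.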

3) Issue the clear command to clear the registrations:

[root@punb200m2labs01vm9 ~]# vxdmppr clear -r CPGR0012 /dev/vx/rdmp/hitachi_vsp0_090f; date
Fri Oct 14 03:40:31 IST 2011

4) All the paths went into the failed (DISABLED) state:

[root@punb200m2labs01vm9 ~]# vxdmpadm getsubpaths dmpnodename=hitachi_vsp0_090f
NAME         STATE[A]   PATH-TYPE[M] CTLR-NAME  ENCLR-TYPE   ENCLR-NAME    ATTRS
================================================================================
sdee         DISABLED     -          c4         Hitachi_VSP  hitachi_vsp0     -
sdeo         DISABLED     -          c3         Hitachi_VSP  hitachi_vsp0     -
sdmb         DISABLED     -          c3         Hitachi_VSP  hitachi_vsp0     -
sdmg         DISABLED     -          c4         Hitachi_VSP  hitachi_vsp0     -


The reason is that the SCSI-3 PR OUT CLEAR command failed with the host_byte field set to 0x17 (msg_byte is 0). Valid host_byte values are 0x00 to 0x11, so a host_byte value of 0x17 is not expected.
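To make the "valid range" concrete, here is a small C sketch that extracts the host byte from a Linux SCSI result word and maps it to a symbolic name. It assumes the standard Linux layout (status in bits 0..7, message in bits 8..15, host in bits 16..23, driver in bits 24..31) and the DID_* values from the kernel's SCSI headers; the helper names are hypothetical. Codes 0x00 through 0x11 correspond to DID_OK through DID_NEXUS_FAILURE, matching the range quoted above.

```c
#include <stddef.h>
#include <stdint.h>

/* DID_* host-byte codes, indexed by value (0x00 .. 0x11). */
static const char *const host_byte_names[] = {
    "DID_OK", "DID_NO_CONNECT", "DID_BUS_BUSY", "DID_TIME_OUT",
    "DID_BAD_TARGET", "DID_ABORT", "DID_PARITY", "DID_ERROR",
    "DID_RESET", "DID_BAD_INTR", "DID_PASSTHROUGH", "DID_SOFT_ERROR",
    "DID_IMM_RETRY", "DID_REQUEUE", "DID_TRANSPORT_DISRUPTED",
    "DID_TRANSPORT_FAILFAST", "DID_TARGET_FAILURE", "DID_NEXUS_FAILURE",
};

/* Host byte lives in bits 16..23 of the result word. */
static unsigned host_byte_of(uint32_t result) { return (result >> 16) & 0xff; }

/* Message byte lives in bits 8..15. */
static unsigned msg_byte_of(uint32_t result) { return (result >> 8) & 0xff; }

/* Symbolic name for a host byte, or NULL if the value is outside the
 * defined range (as 0x17 is). */
static const char *host_byte_name(unsigned hb)
{
    size_t n = sizeof host_byte_names / sizeof host_byte_names[0];
    return hb < n ? host_byte_names[hb] : NULL;
}
```

With the values from the syslog (host_byte = 0x17, msg_byte = 0x0), `host_byte_name(0x17)` returns NULL, i.e. no defined DID_* code.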


VxVM syslog messages captured for additional info:


Oct 14 03:40:36 punb200m2labs01vm9 kernel: sd 3:0:0:69: reservation conflict
Oct 14 03:40:36 punb200m2labs01vm9 kernel: VxVM vxdmp V-5-3-0 dmp_recv_scsipkt: SCSI request failure host_byte = 0x17 msg_byte = 0x0
Oct 14 03:40:36 punb200m2labs01vm9 kernel:
Oct 14 03:40:36 punb200m2labs01vm9 kernel: VxVM vxdmp V-5-3-0 dmp_check_scsipkt: SCSI request failure host_byte = 0x17 msg_byte = 0x0 rq_status = 0x7
Oct 14 03:40:36 punb200m2labs01vm9 kernel:
Oct 14 03:40:36 punb200m2labs01vm9 kernel: VxVM vxdmp V-5-0-0 SCSI error opcode=0x5f returned rq_status=0x7 cdb_status=0x0 key=0x0 asc=0x0 ascq=0x0 on path 129/0x0
Oct 14 03:40:36 punb200m2labs01vm9 kernel:
Oct 14 03:40:36 punb200m2labs01vm9 kernel: VxVM vxdmp V-5-3-0 dmp_pr_send_cmd failed with transport error: uscsi_rqstatus = 7ret = -1 status = 0 on dev 129/0x0
Oct 14 03:40:36 punb200m2labs01vm9 kernel:
Oct 14 03:40:36 punb200m2labs01vm9 kernel: VxVM vxdmp V-5-0-112 disabled path 129/0x0 belonging to the dmpnode 201/0x120 due to path failure

A PR OUT CLEAR command issued with the sg_persist utility also fails, for a similar reason:

[root@punb200m2labs01vm9 include]# sg_persist --out --clear --param-sark=4350475230303132 --verbose /dev/sdee
    inquiry cdb: 12 00 00 00 24 00
  HITACHI   OPEN-V            7002
  Peripheral device type: disk
    Persistent Reservation Out cmd: 5f 03 00 00 00 00 00 00 18 00
persistent reserve out: transport: Host_status=0x17 is invalid
Driver_status=0x00 [DRIVER_OK, SUGGEST_OK]

PR out: command failed
=======

Here, too, the PR OUT command failed with host_status 0x17.
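The CDB that sg_persist printed, `5f 03 00 00 00 00 00 00 18 00`, decodes as opcode 5Fh (PERSISTENT RESERVE OUT), service action 03h (CLEAR) and a parameter list length of 0x18 = 24 bytes. A minimal C sketch that rebuilds that CDB and the 24-byte parameter list, assuming the SPC-3 layout (parameter list length in CDB bytes 5..8; RESERVATION KEY in parameter bytes 0..7, SERVICE ACTION RESERVATION KEY in bytes 8..15). The function and macro names are hypothetical; this is not sg_persist's source.

```c
#include <stdint.h>
#include <string.h>

#define PR_OUT_OPCODE    0x5f
#define PR_OUT_SA_CLEAR  0x03
#define PR_OUT_PARAM_LEN 24   /* basic PR OUT parameter list, SPC-3 */

/* Store v big-endian into 8 bytes starting at p. */
static void put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* 10-byte PERSISTENT RESERVE OUT CDB for service action CLEAR.
 * Byte 2 (scope/type) stays 0, as in the logged CDB. */
static void build_pr_out_clear_cdb(uint8_t cdb[10])
{
    memset(cdb, 0, 10);
    cdb[0] = PR_OUT_OPCODE;
    cdb[1] = PR_OUT_SA_CLEAR;                    /* service action, bits 4..0 */
    cdb[5] = (PR_OUT_PARAM_LEN >> 24) & 0xff;    /* parameter list length, */
    cdb[6] = (PR_OUT_PARAM_LEN >> 16) & 0xff;    /* big-endian in bytes 5..8 */
    cdb[7] = (PR_OUT_PARAM_LEN >> 8) & 0xff;
    cdb[8] = PR_OUT_PARAM_LEN & 0xff;
}

/* 24-byte parameter list: reservation key (rk) in bytes 0..7, service
 * action reservation key (sark) in bytes 8..15; remaining fields 0. */
static void build_pr_out_params(uint8_t params[PR_OUT_PARAM_LEN],
                                uint64_t rk, uint64_t sark)
{
    memset(params, 0, PR_OUT_PARAM_LEN);
    put_be64(params, rk);
    put_be64(params + 8, sark);
}
```

To mirror the command above, which passed only `--param-sark`, one would call `build_pr_out_params(p, 0, 0x4350475230303132ULL)`.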

Comment 2 RHEL Program Management 2011-11-08 06:47:37 UTC
Since RHEL 6.2 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 4 RHEL Program Management 2012-05-03 05:06:46 UTC
Since RHEL 6.3 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 5 Jes Sorensen 2012-07-25 14:14:02 UTC
Hi,

Could you provide the full dmesg output of this system? In particular
I would like to know which driver has registered "sd 3:0:0:69"

The host_byte is set by the hba driver, not by the general SCSI stack, so
this is likely to be driver error if I understand it correctly.

Cheers,
Jes

Comment 6 Jes Sorensen 2012-07-26 07:13:22 UTC
Hi,

In addition, could you please provide the output of the following command
from the troublesome system:

udevadm info -a -n /dev/vx/rdmp/hitachi_vsp0_090f | grep DRIVER

Thanks,
Jes
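Jes's udevadm query walks the device's parent chain to find the bound drivers. An alternative route, sketched here in C under the assumption that sysfs is mounted at /sys, is to take the host number from the `sd 3:0:0:69` log prefix (host:channel:target:LUN) and read `/sys/class/scsi_host/host3/proc_name`, which names the low-level driver registered for that SCSI host. The helper names are hypothetical; for this system, comment 14 later reported fnic.

```c
#include <stdio.h>
#include <string.h>

/* Parse the host number from an H:C:T:L address such as "3:0:0:69".
 * Returns -1 if the string does not parse. */
static int scsi_host_from_hctl(const char *hctl)
{
    int h, c, t, l;
    if (sscanf(hctl, "%d:%d:%d:%d", &h, &c, &t, &l) != 4)
        return -1;
    return h;
}

/* Build the sysfs path that names the driver of SCSI host `host`. */
static void proc_name_path(int host, char *buf, size_t len)
{
    snprintf(buf, len, "/sys/class/scsi_host/host%d/proc_name", host);
}

/* Read the driver name into out; returns 0 on success, -1 on failure. */
static int read_host_driver(int host, char *out, size_t len)
{
    char path[64];
    proc_name_path(host, path, sizeof path);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(out, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    out[strcspn(out, "\n")] = '\0';   /* strip the trailing newline */
    return 0;
}
```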

Comment 7 Mike Snitzer 2012-07-30 22:10:32 UTC
(In reply to comment #5)
> Hi,
> 
> Could you provide the full dmesg output of this system? In particular
> I would like to know which driver has registered "sd 3:0:0:69"
> 
> The host_byte is set by the hba driver, not by the general SCSI stack, so
> this is likely to be driver error if I understand it correctly.

That is not entirely correct. The host_byte can and will be manipulated by the SCSI mid layer, but such manipulation should be reset (e.g. to DID_OK) before returning to the process that invoked the ioctl.
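The pattern described here can be illustrated with a small C sketch. It assumes the Linux result-word layout (host byte in bits 16..23) and models the field replacement on the kernel's `set_host_byte()` helper; `with_host_byte` and `result_for_ioctl_caller` are hypothetical names. Using DID_NEXUS_FAILURE as the internal value is illustrative only (comment 17 names it as one of the codes set in the error-handling path).

```c
#include <stdint.h>

#define DID_OK            0x00
#define DID_NEXUS_FAILURE 0x11

/* Replace (not OR) the host byte in bits 16..23 of a SCSI result word. */
static uint32_t with_host_byte(uint32_t result, uint8_t hb)
{
    return (result & 0xff00ffffu) | ((uint32_t)hb << 16);
}

/* Hypothetical completion step for a passthrough (ioctl) command: any host
 * code the mid layer used internally is reset to DID_OK before the result
 * goes back to user space. The status byte (bits 0..7) is left untouched. */
static uint32_t result_for_ioctl_caller(uint32_t internal_result)
{
    return with_host_byte(internal_result, DID_OK);
}
```

For example, an internal result of 0x00110018 (DID_NEXUS_FAILURE plus SCSI status 0x18) would be returned to the caller as 0x00000018.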

comment#0 refers to RHEL6.0.  Is this issue reproducible on RHEL > 6.0?

Comment 10 Linux engineering teams - Veritas 2012-08-03 07:11:45 UTC
We are preparing the testbed again to provide the necessary information. The issue was first seen with RHEL 6.0; we still need to check with releases later than 6.0.


-- mukesh bafna, Symantec

Comment 11 Tom Coughlan 2012-08-30 20:58:28 UTC
(In reply to comment #10)
> We are preparing testbed again to provide necessary information. Issue was
> first seen with RHEL 6.0. Need to check with > 6.0..

Time is running out for 6.4.

Comment 12 mukesh bafna 2012-08-31 09:29:25 UTC
Sorry for the delay. The corresponding machine resources were released and it is taking time to reacquire them. We are working on it and hope to reply in a couple of days.

Comment 16 Tom Coughlan 2012-09-21 15:42:26 UTC
(In reply to comment #0)

> SCSI-3 PR clear command fails with host_byte value set to 0x17
...
> Reason being SCSI-3 PR OUT clear command failed with host_byte field set to
> 0x17. msg_byte is set to 0. Valid values the host_byte can contain are 0 to
> 0x11. 0x17 host_byte value is not expected.  

(In reply to comment #5)

> The host_byte is set by the hba driver, not by the general SCSI stack, so
> this is likely to be driver error if I understand it correctly.

(In reply to comment #14)

> [root@punb200m2labs01vm7 ~]# udevadm info -a -n /dev/sdp | grep DRIVER
>     DRIVER==""
>     DRIVERS=="sd"
>     DRIVERS==""
>     DRIVERS==""
>     DRIVERS==""
>     DRIVERS=="fnic"
>     DRIVERS=="pcieport"
...
> 3. We have hit this issue with RHEL6.0GA, RHEL-6.1. We need to setup
> resource and check for RHEL-6.2 and RHEL-6.3. 

Chris, 

Please take a look to see if the fnic driver can return host_byte set to 0x17 (and confirm that this is indeed not correct). Check for fixes in this area. (I think the last fnic update was in 6.1, but maybe the change is in common code?)

Tom

Comment 17 Chris Leech 2012-09-24 15:36:05 UTC
I suspect that this may be a duplicate of bug#787282, in which case it would be fixed in kernel-2.6.32-231.el6. It's certainly a case of the bits from two different error codes being set: most likely either DID_TARGET_FAILURE or DID_NEXUS_FAILURE being set in the scsi_eh thread, and a generic DID_ERROR being set in fnic.
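The arithmetic behind this diagnosis is easy to check. Host-byte values are enumerated codes, not bit flags, so OR-ing two of them yields a value that belongs to neither. With the kernel's values DID_ERROR = 0x07, DID_TARGET_FAILURE = 0x10 and DID_NEXUS_FAILURE = 0x11, both 0x10 | 0x07 and 0x11 | 0x07 equal 0x17, the value in the report. The C sketch below contrasts that with clearing the field before storing a new code; the function names are hypothetical, and this is not the actual fnic or scsi_eh code.

```c
#include <stdint.h>

#define DID_ERROR          0x07
#define DID_TARGET_FAILURE 0x10
#define DID_NEXUS_FAILURE  0x11

/* Buggy pattern: OR a host code into a result that may already carry one. */
static uint32_t or_host_byte(uint32_t result, uint8_t hb)
{
    return result | ((uint32_t)hb << 16);
}

/* Correct pattern: clear bits 16..23 first, then store the new code. */
static uint32_t set_host_byte_value(uint32_t result, uint8_t hb)
{
    return (result & 0xff00ffffu) | ((uint32_t)hb << 16);
}

/* Host byte lives in bits 16..23 of the result word. */
static unsigned host_byte_of(uint32_t result)
{
    return (result >> 16) & 0xff;
}
```

Because OR is commutative, the merged value is 0x17 regardless of whether the scsi_eh code or the fnic code is applied first; with the replace pattern, the last writer's code survives intact.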

Comment 19 Tom Coughlan 2013-12-19 15:30:05 UTC
I'm closing this, based on comment 17 "this may be a duplicate of bug#787282". Re-open if the problem is seen again with kernel >= kernel-2.6.32-231.el6.

*** This bug has been marked as a duplicate of bug 787282 ***