Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 467783

Summary: SAN Patchset needs merging into MRG
Product: Red Hat Enterprise MRG
Reporter: IBM Bug Proxy <bugproxy>
Component: realtime-kernel
Assignee: Clark Williams <williams>
Status: CLOSED ERRATA
QA Contact:
Severity: high
Docs Contact:
Priority: medium
Version: 1.1
CC: bhu, davids, williams
Target Milestone: 1.1
Target Release: ---
Hardware: x86_64
OS: All
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-01-22 10:44:59 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description                                        Flags
patch-buffer_io_fix                                none
patch-scsi_dh-use_in_dmmp                          none
qla-handle_sense_fix.patch                         none
patch-scsi_dh-add-lsi                              none
qla_nomsi.patch                                    none
patch-scsi_dh-remove_dm_hw_handlers                none
patch-scsi_dh-single_path_init                     none
patch-scsi_dh-add                                  none
patch-scsi_dh-path_failover                        none
patch-scsi_dh-remove_dm_hw_support                 none
patch-scsi_dh-remove_dm_pg_init_complete           none
renamed patch-scsi_dh-add                          none
renamed patch-scsi_dh-add-lsi                      none
renamed patch-scsi_dh-user_in_dmmp                 none
renamed patch-scsi_dh-signle_path_init             none
renamed patch-scsi_dh-remove_dm_pg_init_complete   none
renamed patch-scsi_dh-remove_dm_hw_support         none
renamed patch-scsi_dh-remove_dm_hw_handlers        none
renamed patch-scsi_dh-path_failover                none

Description IBM Bug Proxy 2008-10-20 21:01:09 UTC
=Comment: #0=================================================
John G. Stultz <johnstul.com> - 
The following patchset needs to be merged into MRG.
   patch-scsi_dh-add
   patch-scsi_dh-add-lsi
   patch-scsi_dh-use_in_dmmp
   patch-scsi_dh-single_path_init
   patch-scsi_dh-remove_dm_pg_init_complete
   patch-scsi_dh-remove_dm_hw_handlers
   patch-scsi_dh-remove_dm_hw_support
   qla-handle_sense_fix.patch
   qla_nomsi.patch
   patch-buffer_io_fix
   patch-scsi_dh-path_failover

I'll be attaching those patches shortly.
=Comment: #2=================================================
John G. Stultz <johnstul.com> - 

patch-scsi_dh-add

Subject: [PATCH 01/10] scsi_dh: add infrastructure for SCSI Device Handlers

From: Chandra Seetharaman <sekharan.com>

Some storage devices that can be accessed through multiple paths
need special handling to
        1. Activate the passive path to the storage.
        2. Decode and handle the special sense codes returned by the devices.
        3. Handle I/Os sent to the passive path, especially
           during device probe time.

As of today this special device handling is done at the dm-multipath
layer using dm-handlers. That works well for (1); for (2) to be handled
at the dm layer, SCSI sense information needs to be exported from SCSI to the
dm layer, which is not very attractive; (3) cannot be done at all at the dm layer.

The device handler has been moved to SCSI mainly to handle (2) and (3) properly.

--
This patch provides the infrastructure for moving the feature over to SCSI.

Signed-off-by: Chandra Seetharaman <sekharan.com>
Signed-off-by: Mike Anderson <andmike.ibm.com>
Signed-off-by: Mike Christie <michaelc.edu>
---
This patch was ported, tested, and
Signed-off-by: Keith Mannthey <kmannth.com>
---
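
As a rough illustration of the three responsibilities listed in the scsi_dh infrastructure description above, the sketch below models a device handler as a small Python class. It is only a conceptual model, not the kernel's C interface: the class and method names (`DeviceHandler`, `activate`, `check_sense`, `prep_io`), the `Action` values, and the sense labels are all hypothetical and chosen for readability.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    """What the caller should do with a command (hypothetical values)."""
    PASS_UP = "pass_up"      # not handled here; use default error handling
    RETRY = "retry"          # retry on the same path
    FAIL_PATH = "fail_path"  # let multipath switch to another path


@dataclass
class Path:
    name: str
    active: bool = False


@dataclass
class DeviceHandler:
    """Toy model of a per-device handler living below dm-multipath."""
    # Hypothetical device-specific sense labels mapped to actions.
    sense_table: dict = field(default_factory=lambda: {
        "passive-path": Action.FAIL_PATH,
        "controller-busy": Action.RETRY,
    })

    def activate(self, path: Path) -> None:
        # (1) Make a passive path usable for I/O.
        path.active = True

    def check_sense(self, sense: str) -> Action:
        # (2) Decode device-specific sense data; unknown codes pass through.
        return self.sense_table.get(sense, Action.PASS_UP)

    def prep_io(self, path: Path) -> bool:
        # (3) Refuse I/O aimed at a passive path, e.g. during device probe.
        return path.active
```

Keeping (2) and (3) in the same object that sits next to the SCSI layer is the point of the move: the handler sees sense data and passive-path I/O directly, without anything being exported up to dm.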
=Comment: #3=================================================
John G. Stultz <johnstul.com> - 

patch-scsi_dh-add-lsi

From: Chandra Seetharaman <sekharan.com>

This patch provides the device handler to support the LSI RDAC SCSI
based storage devices.

Signed-off-by: Chandra Seetharaman <sekharan.com>
---
This patch was ported, tested, and
Signed-off-by: Keith Mannthey <kmannth.com>
---

=Comment: #4=================================================
John G. Stultz <johnstul.com> - 

patch-scsi_dh-use_in_dmmp

Subject: [PATCH 05/10] scsi_dh: Use SCSI device handler in dm-multipath

From: Chandra Seetharaman <sekharan.com>

This patch converts dm-mpath to use scsi device handlers instead of
dm's hardware handlers.

This patch does not add any new functionality. Old behaviors remain and
userspace tools work as is except that arguments supplied with hardware
handler are ignored.

One behavioral exception: activation of a path is synchronous in this
patch, as opposed to the older asynchronous behavior (changed in
patch 07: scsi_dh: Add a single threaded workqueue for initializing a path).

Note: There is no need to get a reference for the device handler module
(as it was done in the dm hardware handler case) here as the reference
is held when the device was first found. Instead we check and make sure
that support for the specified device is present at table load time.

Signed-off-by: Chandra Seetharaman <sekharan.com>
Signed-off-by: Mike Christie <michaelc.edu>
---
This patch was ported, tested, and
Signed-off-by: Keith Mannthey <kmannth.com>
---
=Comment: #5=================================================
John G. Stultz <johnstul.com> - 

patch-scsi_dh-single_path_init

Subject: [PATCH 07/10] scsi_dh: Add a single threaded workqueue for initializing paths

From: Chandra Seetharaman <sekharan.com>

Before this patch set (SCSI hardware handlers), initialization of a
path was done asynchronously. Doing that requires a workqueue in each
device/hardware handler module and leads to unnecessary complication
in the device handler code, making it difficult to read the code and
follow the state diagram.

Moving that workqueue to this level makes the device handler code simpler.
Hence, the workqueue is moved to dm level.

A new workqueue is added instead of adding it to the existing workqueue
(kmpathd) for the following reasons:
        1. Device activation has to happen quickly; queueing it along
           with the other work might lead to unnecessary delay
           in the activation of the path.
        2. The effect could be felt the other way too, i.e., the
           events that are handled by the existing workqueue might get
           a delayed response.

Signed-off-by: Chandra Seetharaman <sekharan.com>
---
This patch was ported, tested, and
Signed-off-by: Keith Mannthey <kmannth.com>
---
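
The reasoning for a dedicated workqueue in the description above can be sketched in user space. The example below is a minimal model, assuming a hypothetical `SingleThreadedWorkqueue` class built on Python's `queue` and `threading` modules; it is not the kernel workqueue API, and the queue names are illustrative. It shows that a path activation queued on its own queue completes even while the general-purpose queue is stuck on a slow event.

```python
import queue
import threading


class SingleThreadedWorkqueue:
    """One worker thread draining a FIFO of callables."""

    def __init__(self, name: str) -> None:
        self._items: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, name=name, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        while True:
            work = self._items.get()
            try:
                work()
            finally:
                self._items.task_done()

    def queue_work(self, work) -> None:
        self._items.put(work)

    def flush(self) -> None:
        # Block until every queued item has finished.
        self._items.join()


def demo() -> list:
    order = []
    gate = threading.Event()

    general = SingleThreadedWorkqueue("general-events")      # stands in for kmpathd
    activation = SingleThreadedWorkqueue("path-activation")  # the new queue

    # A slow general event that blocks its own queue until released.
    general.queue_work(lambda: (gate.wait(), order.append("general event")))
    # Path activation is queued later but does not wait behind it.
    activation.queue_work(lambda: order.append("activate path"))

    activation.flush()
    gate.set()
    general.flush()
    return order
```

With a single shared queue, the activation would sit behind the slow event; with two queues, neither kind of work delays the other, which matches both reasons given in the list.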
=Comment: #6=================================================
John G. Stultz <johnstul.com> - 

patch-scsi_dh-remove_dm_pg_init_complete

Subject: [PATCH 08/10] scsi_dh: Remove dm_pg_init_complete
From: Chandra Seetharaman <sekharan.com>

This patch just removes the dm layer's path initialization completion
routine.  It is separated from the other patch (scsi_dh: Use SCSI
device handler in dm-multipath) just to make that patch more readable.

Signed-off-by: Chandra Seetharaman <sekharan.com>
---
This patch was ported, tested, and
Signed-off-by: Keith Mannthey <kmannth.com>
---
=Comment: #7=================================================
John G. Stultz <johnstul.com> - 

patch-scsi_dh-remove_dm_hw_handlers

Subject: [PATCH 09/10] scsi_dh: Remove hardware handlers from dm

From: Chandra Seetharaman <sekharan.com>

This patch removes the 3 hardware handlers that currently exist
under dm, as the functionality was moved to the SCSI layer in the earlier
patches.

Signed-off-by: Chandra Seetharaman <sekharan.com>
---
This patch was ported, tested, and
Signed-off-by: Keith Mannthey <kmannth.com>
=Comment: #8=================================================
John G. Stultz <johnstul.com> - 

patch-scsi_dh-remove_dm_hw_support

Subject: [PATCH 10/10] scsi_dh: Remove hardware handler infrastructure from dm

From: Chandra Seetharaman <sekharan.com>

This patch just removes infrastructure that provided support for hardware
handlers in the dm layer as it is not needed anymore.

Signed-off-by: Chandra Seetharaman <sekharan.com>
--
This patch was ported, tested, and
Signed-off-by: Keith Mannthey <kmannth.com>
=Comment: #9=================================================
John G. Stultz <johnstul.com> - 

qla-handle_sense_fix.patch

Patch to correct SCSI error sense code behavior in the QLA2xxx driver. 

Without this change the driver always returns an incorrect error code.  The
error leads to the sense data, which is used by the device handler, not
being checked.  This patch makes the QLA driver return the correct sense code
for writes to the ghost path on the DS4700.

This has been tested by Chandra and myself on 3 boxes and it is a needed fix. 

Somewhere around 2.5.25 a new version of the QLA driver was introduced.  
Earlier kernels need this fix to work with the SCSI device handler work. 

Submitted-by:  Keith Mannthey <kmannth.com>  
=Comment: #10=================================================
John G. Stultz <johnstul.com> - 

qla_nomsi.patch

Problem:
Bugzilla defect # 84842
Spurious mailbox timeouts and path failovers were observed under heavy IO load.

Analysis/Fix:
The R2 QLogic driver uses MSI-EDGE interrupts by default.
Suspecting an issue with this, we changed back to the old APIC interrupts.

Testing:
No mailbox timeouts were observed on a 20 hour run with APIC interrupts.

Signed-off-by: Venkateswararao Jujjuri <jvrao.com> 

=Comment: #11=================================================
John G. Stultz <johnstul.com> - 

patch-buffer_io_fix

From: Keith Mannthey <kmannth.com>

Allow the SCSI request REQ_QUIET flag to be propagated to the buffer
file system layer.  It is pretty simple: pass the flag from the SCSI
request to the bio (block IO) and then to the buffer layer.

This patch declutters the log by removing the 40-50 (per LUN) buffer IO
error messages seen during a boot. There is a good chance any real errors
will be missed in the "noise" in a customer environment.
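
The flag propagation described above can be pictured with a small model. This is only a sketch under stated assumptions: the `ScsiRequest` and `Bio` classes, the two helper functions, and the bit value given to `REQ_QUIET` are hypothetical stand-ins, not the kernel structures or the code in the attached patch.

```python
from dataclasses import dataclass

REQ_QUIET = 0x1  # hypothetical bit value, for illustration only


@dataclass
class ScsiRequest:
    flags: int = 0


@dataclass
class Bio:
    flags: int = 0


def complete_request(req: ScsiRequest, bio: Bio) -> None:
    # Step 1: carry the quiet bit from the request to the bio.
    if req.flags & REQ_QUIET:
        bio.flags |= REQ_QUIET


def end_buffer_io(bio: Bio, failed: bool, log: list) -> None:
    # Step 2: the buffer layer only logs a failure when the bio is not quiet.
    if failed and not (bio.flags & REQ_QUIET):
        log.append("Buffer I/O error")
```

A failed I/O that was marked quiet at the SCSI level then leaves the log untouched, while an unmarked failure is still reported, so real errors remain visible.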

I ran bonnie++ a little: 

Linux version 2.6.24.7-75ibmrt2.8:
[root@elm3c19 ~]# bonnie++ -d /home/test -s 40000 -x 1 -u root:root -q
elm3c19,40000M,36202,45,36358,10,26109,6,76973,79,133083,13,262.2,0,16,
+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++

Linux version 2.6.24.7-75ibmrt2.8 + this patch. 
[root@elm3c19 ~]# bonnie++ -d /home/test -s 40000 -x 1 -u root:root -q
elm3c19,40000M,36697,45,35942,10,26330,7,74368,77,132676,13,264.6,0,16,
+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++

For two single runs there is no real difference. The SAN does not run at a
constant speed the way local storage does, and the data footprint is the same.
No red flags here.
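
To put numbers on "no real difference", the two CSV result lines quoted above can be compared column by column. The sketch below assumes the bonnie++ 1.03-style CSV layout (name, size, then value and %CPU pairs per test); the column names and positions in `COLUMNS` are that assumption, not something stated in this report.

```python
# Column positions assume the bonnie++ 1.03-style CSV layout:
# name, size, then (value, %cpu) pairs for each test.
COLUMNS = {
    "putc": 2,
    "put_block": 4,
    "rewrite": 6,
    "getc": 8,
    "get_block": 10,
    "seeks": 12,
}

# First line of each result as posted above (continuation lines omitted).
BASELINE = "elm3c19,40000M,36202,45,36358,10,26109,6,76973,79,133083,13,262.2,0,16"
PATCHED = "elm3c19,40000M,36697,45,35942,10,26330,7,74368,77,132676,13,264.6,0,16"


def percent_change(baseline: str, patched: str) -> dict:
    """Return the relative change of each throughput column, in percent."""
    old = baseline.split(",")
    new = patched.split(",")
    return {
        name: (float(new[i]) - float(old[i])) / float(old[i]) * 100.0
        for name, i in COLUMNS.items()
    }


if __name__ == "__main__":
    for name, delta in percent_change(BASELINE, PATCHED).items():
        print(f"{name:10s} {delta:+.2f}%")
```

Under that column assumption, every value moves by only a few percent, some up and some down, which is consistent with run-to-run variation on a SAN rather than an effect of the patch.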

Chandra agrees it is a needed patch and thinks it is mainline ready.  

I built a kernel view and I am building a few kernel rpms right now. 

Notes: 
Updates from the last patch:
1.  Remove the int quiet flag I proposed last time.  There is a flag
state that can be used. 
2.  Refactor some printk_ratelimit code in the buffer code.  With the
first patch we would see the random printk suppression messages but not
the errors. 

The formatting of one of the .h files looks weird in this patch, which
was created via diff -urN, but it applies just fine via patch.
The tools treat it just fine even though it looks weird in patch format.


Signed-off-by:  Keith Mannthey <kmannth.com>

Reworked the printk_ratelimiting bits to make more sense.

Signed-off-by: John Stultz <johnstul.com>
=Comment: #12=================================================
John G. Stultz <johnstul.com> - 

patch-scsi_dh-path_failover

Moving the path activation to a workqueue along with the scsi_dh patches
introduced a race. The current_pgpath (in the multipath data structure) can be
modified if changes happen in any of the paths leading to the LUN. If the
changes lead to current_pgpath being set to NULL, the resulting invalid access
causes the panic below.

This patch fixes that by storing the pgpath to activate in the multipath data
structure and properly protecting it.

Note that if activate_path is called twice in succession with different pgpath,
with the second one being called before the first one is done, then activate
path will be called twice for the second pgpath, which is fine.

Signed-off-by: Chandra Seetharaman <sekharan.com>
--------------------
---
This patch was ported, tested, and
Signed-off-by: Venkateswararao Jujjuri <jvrao.com>
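
The race and its fix, as described in the path_failover patch text above, can be modeled in a few lines. This is a minimal sketch, assuming a hypothetical `Multipath` class with a `pgpath_to_activate` field guarded by a lock; the names mirror the description but are not taken from the patch itself.

```python
import threading


class Multipath:
    """Toy stand-in for the multipath structure; all names are illustrative."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.current_pgpath = None      # may be reset to None at any time
        self.pgpath_to_activate = None  # snapshot taken when work is queued
        self.activated = []

    def queue_activation(self) -> None:
        # Record the target under the lock, at the moment the work is queued.
        with self.lock:
            self.pgpath_to_activate = self.current_pgpath

    def path_changed(self) -> None:
        # Another path event clears current_pgpath while work is pending.
        with self.lock:
            self.current_pgpath = None

    def activate_path_work(self) -> None:
        # The worker reads the stored target, never current_pgpath.
        with self.lock:
            pgpath = self.pgpath_to_activate
        if pgpath is not None:
            self.activated.append(pgpath)
```

Because the worker never dereferences `current_pgpath`, a concurrent reset to `None` is harmless. In this model, as in the description, queueing twice with different targets before the first work item runs makes both work items activate the second target.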

Comment 1 IBM Bug Proxy 2008-10-20 21:01:17 UTC
Created attachment 320920 [details]
patch-buffer_io_fix

Comment 2 IBM Bug Proxy 2008-10-20 21:01:20 UTC
Created attachment 320921 [details]
patch-scsi_dh-use_in_dmmp

Comment 3 IBM Bug Proxy 2008-10-20 21:01:25 UTC
Created attachment 320922 [details]
qla-handle_sense_fix.patch

Comment 4 IBM Bug Proxy 2008-10-20 21:01:28 UTC
Created attachment 320923 [details]
patch-scsi_dh-add-lsi

Comment 5 IBM Bug Proxy 2008-10-20 21:01:31 UTC
Created attachment 320924 [details]
qla_nomsi.patch

Comment 6 IBM Bug Proxy 2008-10-20 21:01:34 UTC
Created attachment 320925 [details]
patch-scsi_dh-remove_dm_hw_handlers

Comment 7 IBM Bug Proxy 2008-10-20 21:01:37 UTC
Created attachment 320926 [details]
patch-scsi_dh-single_path_init

Comment 8 IBM Bug Proxy 2008-10-20 21:01:40 UTC
Created attachment 320927 [details]
patch-scsi_dh-add

Comment 9 IBM Bug Proxy 2008-10-20 21:01:44 UTC
Created attachment 320928 [details]
patch-scsi_dh-path_failover

Comment 10 IBM Bug Proxy 2008-10-20 21:01:48 UTC
Created attachment 320929 [details]
patch-scsi_dh-remove_dm_hw_support

Comment 11 IBM Bug Proxy 2008-10-20 21:01:51 UTC
Created attachment 320930 [details]
patch-scsi_dh-remove_dm_pg_init_complete

Comment 12 IBM Bug Proxy 2008-10-20 21:11:33 UTC
Clark Williams is aware these are headed his way for review.

Comment 13 Clark Williams 2008-10-22 15:18:44 UTC
Comment on attachment 320921 [details]
patch-scsi_dh-use_in_dmmp

changed MIME type

Comment 14 Clark Williams 2008-10-23 18:12:24 UTC
patches merged into kernel-rt

queued for -88 build

Comment 15 Clark Williams 2008-10-24 18:36:57 UTC
Created attachment 321447 [details]
renamed patch-scsi_dh-add

Comment 16 Clark Williams 2008-10-24 18:37:49 UTC
Created attachment 321448 [details]
renamed patch-scsi_dh-add-lsi

Comment 17 Clark Williams 2008-10-24 18:39:18 UTC
Created attachment 321449 [details]
renamed patch-scsi_dh-user_in_dmmp

Comment 18 Clark Williams 2008-10-24 18:40:24 UTC
Created attachment 321450 [details]
renamed patch-scsi_dh-signle_path_init

Comment 19 Clark Williams 2008-10-24 18:42:06 UTC
Created attachment 321451 [details]
renamed patch-scsi_dh-remove_dm_pg_init_complete

Comment 20 Clark Williams 2008-10-24 18:43:20 UTC
Created attachment 321452 [details]
renamed patch-scsi_dh-remove_dm_hw_support

Comment 21 Clark Williams 2008-10-24 18:44:31 UTC
Created attachment 321453 [details]
renamed patch-scsi_dh-remove_dm_hw_handlers

Comment 22 Clark Williams 2008-10-24 18:45:23 UTC
Created attachment 321454 [details]
renamed patch-scsi_dh-path_failover

Comment 23 Clark Williams 2008-10-24 18:48:21 UTC
It was my assumption (could be wrong) that the SCSI_DH series of patches was intended for 64-bit use only, so we only enabled them for x86_64 kernel builds. Unfortunately this causes a link failure in the x86 kernel build. 

Are the SCSI_DH patches 32-bit safe and tested on 32-bit systems?

Comment 25 David Sommerseth 2008-11-25 21:18:53 UTC
Verified by code review against mrg-rt.git (mrg-rt-2.6.24.7-93)

** Patches uploaded by Clark:

cbcaa5991d91d8060394fdfd4178d52cd4c5e1fb - "renamed patch-scsi_dh-path_failover"
    Moving the path activation to workqueue along with scsi_dh patches introduced

7e6da39ca511ee768104ccc3842df4c0501f82ae - "renamed patch-scsi_dh-remove_dm_hw_handlers"
    [PATCH 09/10] scsi_dh: Remove hardware handlers from dm

0e82579bb9915b805730d9d8c19acaf53513549b - "renamed patch-scsi_dh-remove_dm_hw_support"
    [PATCH 10/10] scsi_dh: Remove hardware handler infrastructure from dm

6c0d3d283b044e81145565b0a91879b9e8a922fb - "renamed patch-scsi_dh-remove_dm_pg_init_complete"
    [PATCH 08/10] scsi_dh: Remove dm_pg_init_complete

4c5f8c5727f95fcb293220f1955205074f307ea1 - "renamed patch-scsi_dh-signle_path_init"
    [PATCH 07/10] scsi_dh: Add a single threaded workqueue for initializing paths

bbe2227fa91c519b53271b1b321a3096d430a2e6 - "renamed patch-scsi_dh-user_in_dmmp"
    [PATCH 05/10] scsi_dh: Use SCSI device handler in dm-multipath

5db1193a051042d1d986581aa39e1dfbdc72970d - "renamed patch-scsi_dh-add"
    [PATCH 01/10] scsi_dh: add infrastructure for SCSI Device Handlers

d82bd52ac9388f1c2058979c1d9b9b85a39fa976 - "renamed patch-scsi_dh-add-lsi"
    This patch provides the device handler to support the LSI RDAC SCSI


** Patches uploaded by IBM

ab4a7a258d4d5213953f2da612d462ec79671bce - "qla_nomsi.patch"
    Problem: Bugzilla defect # 84842 Spurious mailbox timeouts and path failovers

7f0c4cab36684dec0a9cd0f1a386844cf44c260e - "patch-buffer_io_fix"
    Allow the scsi request REQ_QUIET flag to be propagated to the buffer


** Not implemented patches:
qla-handle_sense_fix.patch


** Deleted patches (ported):
patch-scsi_dh-use_in_dmmp
   >> renamed patch-scsi_dh-user_in_dmmp
   
patch-scsi_dh-add-lsi
   >> renamed patch-scsi_dh-add-lsi
   
patch-scsi_dh-remove_dm_hw_handlers 
   >> renamed patch-scsi_dh-remove_dm_hw_handlers
   
patch-scsi_dh-single_path_init
   >> renamed patch-scsi_dh-signle_path_init
   
patch-scsi_dh-add
   >> renamed patch-scsi_dh-add
   
patch-scsi_dh-remove_dm_hw_support
   >> renamed patch-scsi_dh-remove_dm_hw_support
   
patch-scsi_dh-remove_dm_pg_init_complete
   >> renamed patch-scsi_dh-remove_dm_pg_init_complete


It was not possible to verify further because SAN hardware was unavailable.  I ran some quick and simple disk-stress tests on an ls20 box, using mdraid and LVM against a SCSI disk (3 partitions in RAID5), to check whether these patches affected the basic SCSI and dm layers.  No issues were found.

Comment 27 errata-xmlrpc 2009-01-22 10:44:59 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0009.html

Comment 28 Red Hat Bugzilla 2023-09-14 01:13:53 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days