Bug 467783
Created attachment 320920
patch-buffer_io_fix
Created attachment 320921
patch-scsi_dh-use_in_dmmp
Created attachment 320922
qla-handle_sense_fix.patch
Created attachment 320923
patch-scsi_dh-add-lsi
Created attachment 320924
qla_nomsi.patch
Created attachment 320925
patch-scsi_dh-remove_dm_hw_handlers
Created attachment 320926
patch-scsi_dh-single_path_init
Created attachment 320927
patch-scsi_dh-add
Created attachment 320928
patch-scsi_dh-path_failover
Created attachment 320929
patch-scsi_dh-remove_dm_hw_support
Created attachment 320930
patch-scsi_dh-remove_dm_pg_init_complete
Clark Williams is aware these are headed his way for review.
Comment on attachment 320921
patch-scsi_dh-use_in_dmmp
changed MIME type
Patches merged into kernel-rt; queued for the -88 build.
Created attachment 321447
renamed patch-scsi_dh-add
Created attachment 321448
renamed patch-scsi_dh-add-lsi
Created attachment 321449
renamed patch-scsi_dh-user_in_dmmp
Created attachment 321450
renamed patch-scsi_dh-signle_path_init
Created attachment 321451
renamed patch-scsi_dh-remove_dm_pg_init_complete
Created attachment 321452
renamed patch-scsi_dh-remove_dm_hw_support
Created attachment 321453
renamed patch-scsi_dh-remove_dm_hw_handlers
Created attachment 321454
renamed patch-scsi_dh-path_failover
I assumed (possibly wrongly) that the SCSI_DH series of patches was intended for 64-bit use only, so we enabled them only for x86_64 kernel builds. Unfortunately, this causes a link failure in the x86 kernel build. Are the SCSI_DH patches 32-bit safe, and have they been tested on 32-bit systems?
Verified by code review against mrg-rt.git (mrg-rt-2.6.24.7-93)
** Patches uploaded by Clark:
cbcaa5991d91d8060394fdfd4178d52cd4c5e1fb - "renamed patch-scsi_dh-path_failover"
Moving the path activation to workqueue along with scsi_dh patches introduced
7e6da39ca511ee768104ccc3842df4c0501f82ae - "renamed patch-scsi_dh-remove_dm_hw_handlers"
[PATCH 09/10] scsi_dh: Remove hardware handlers from dm
0e82579bb9915b805730d9d8c19acaf53513549b - "renamed patch-scsi_dh-remove_dm_hw_support"
[PATCH 10/10] scsi_dh: Remove hardware handler infrastructure from dm
6c0d3d283b044e81145565b0a91879b9e8a922fb - "renamed patch-scsi_dh-remove_dm_pg_init_complete"
[PATCH 08/10] scsi_dh: Remove dm_pg_init_complete
4c5f8c5727f95fcb293220f1955205074f307ea1 - "renamed patch-scsi_dh-signle_path_init"
[PATCH 07/10] scsi_dh: Add a single threaded workqueue for initializing paths
bbe2227fa91c519b53271b1b321a3096d430a2e6 - "renamed patch-scsi_dh-user_in_dmmp"
[PATCH 05/10] scsi_dh: Use SCSI device handler in dm-multipath
5db1193a051042d1d986581aa39e1dfbdc72970d - "renamed patch-scsi_dh-add"
[PATCH 01/10] scsi_dh: add infrastructure for SCSI Device Handlers
d82bd52ac9388f1c2058979c1d9b9b85a39fa976 - "renamed patch-scsi_dh-add-lsi"
This patch provides the device handler to support the LSI RDAC SCSI
** Patches uploaded by IBM:
ab4a7a258d4d5213953f2da612d462ec79671bce - "qla_nomsi.patch"
Problem: Bugzilla defect # 84842 Spurious mailbox timeouts and path failovers
7f0c4cab36684dec0a9cd0f1a386844cf44c260e - "patch-buffer_io_fix"
Allow the scsi request REQ_QUIET flag to be propagated to the buffer
** Not implemented patches:
qla-handle_sense_fix.patch
** Deleted patches (ported):
patch-scsi_dh-use_in_dmmp
>> renamed patch-scsi_dh-user_in_dmmp
patch-scsi_dh-add-lsi
>> renamed patch-scsi_dh-add-lsi
patch-scsi_dh-remove_dm_hw_handlers
>> renamed patch-scsi_dh-remove_dm_hw_handlers
patch-scsi_dh-single_path_init
>> renamed patch-scsi_dh-signle_path_init
patch-scsi_dh-add
>> renamed patch-scsi_dh-add
patch-scsi_dh-remove_dm_hw_support
>> renamed patch-scsi_dh-remove_dm_hw_support
patch-scsi_dh-remove_dm_pg_init_complete
>> renamed patch-scsi_dh-remove_dm_pg_init_complete
Further verification was not possible due to missing SAN hardware. I ran some quick and simple disk-stress tests on an ls20 box, using mdraid and LVM on a SCSI disk (3 partitions in RAID5), to check whether these patches affected the basic SCSI and dm layers. No issues were found.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2009-0009.html
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.
=Comment: #0=================================================
John G. Stultz <johnstul.com> -
The following patch set needs to be merged into MRG:
patch-scsi_dh-add
patch-scsi_dh-add-lsi
patch-scsi_dh-use_in_dmmp
patch-scsi_dh-single_path_init
patch-scsi_dh-remove_dm_pg_init_complete
patch-scsi_dh-remove_dm_hw_handlers
patch-scsi_dh-remove_dm_hw_support
qla-handle_sense_fix.patch
qla_nomsi.patch
patch-buffer_io_fix
patch-scsi_dh-path_failover
I'll be attaching those patches shortly.
=Comment: #2=================================================
John G. Stultz <johnstul.com> - patch-scsi_dh-add
Subject: [PATCH 01/10] scsi_dh: add infrastructure for SCSI Device Handlers
From: Chandra Seetharaman <sekharan.com>

Some storage devices that can be accessed through multiple paths need special handling to:
1. Activate the passive path of the storage access.
2. Decode and handle the special sense codes returned by the devices.
3. Handle the I/Os sent to the passive path, especially at device probe time.

Currently, this special device handling is done at the dm-multipath layer using dm-handlers. That works well for (1). Handling (2) at the dm layer would require exporting SCSI sense information from SCSI to the dm layer, which is not very attractive, and (3) cannot be done at the dm layer at all. The device handler has been moved to SCSI mainly to handle (2) and (3) properly.
--
This patch provides the infrastructure for moving the feature over to SCSI.

Signed-off-by: Chandra Seetharaman <sekharan.com>
Signed-off-by: Mike Anderson <andmike.ibm.com>
Signed-off-by: Mike Christie <michaelc.edu>
---
This patch was ported, tested and Signed-off-by: Keith Mannthey <kmannth.com>
---
=Comment: #3=================================================
John G. Stultz <johnstul.com> - patch-scsi_dh-add-lsi
From: Chandra Seetharaman <sekharan.com>

This patch provides the device handler to support the LSI RDAC SCSI-based storage devices.

Signed-off-by: Chandra Seetharaman <sekharan.com>
---
This patch was ported, tested and Signed-off-by: Keith Mannthey <kmannth.com>
---
=Comment: #4=================================================
John G. Stultz <johnstul.com> - patch-scsi_dh-use_in_dmmp
Subject: [PATCH 05/10] scsi_dh: Use SCSI device handler in dm-multipath
From: Chandra Seetharaman <sekharan.com>

This patch converts dm-mpath to use SCSI device handlers instead of dm's hardware handlers.

This patch does not add any new functionality. Old behaviors remain, and userspace tools work as is, except that arguments supplied with the hardware handler are ignored.

One behavioral exception: path activation is synchronous in this patch, as opposed to the older asynchronous behavior (this is changed in patch 07: scsi_dh: Add a single threaded workqueue for initializing a path).

Note: Unlike in the dm hardware handler case, there is no need to get a reference on the device handler module here, because the reference is already held from when the device was first found. Instead, we check at table load time that support for the specified device is present.

Signed-off-by: Chandra Seetharaman <sekharan.com>
Signed-off-by: Mike Christie <michaelc.edu>
---
This patch was ported, tested and Signed-off-by: Keith Mannthey <kmannth.com>
---
=Comment: #5=================================================
John G. Stultz <johnstul.com> - patch-scsi_dh-single_path_init
Subject: [PATCH 07/10] scsi_dh: Add a single threaded workqueue for initializing paths
From: Chandra Seetharaman <sekharan.com>

Before this patch set (SCSI hardware handlers), initialization of a path was done asynchronously.
Doing that requires a workqueue in each device/hardware handler module and leads to unnecessary complication in the device handler code, making it difficult to read the code and follow the state diagram. Moving that workqueue up to the dm level makes the device handler code simpler, so this patch moves the workqueue to the dm level.

A new workqueue is added, instead of reusing the existing workqueue (kmpathd), for the following reasons:
1. Device activation has to happen fast; stacking activations with the other work items might unnecessarily delay path activation.
2. The effect could be felt the other way too, i.e. the events currently handled by the existing workqueue might get delayed responses.

Signed-off-by: Chandra Seetharaman <sekharan.com>
---
This patch was ported, tested and Signed-off-by: Keith Mannthey <kmannth.com>
---
=Comment: #6=================================================
John G. Stultz <johnstul.com> - patch-scsi_dh-remove_dm_pg_init_complete
Subject: [PATCH 08/10] scsi_dh: Remove dm_pg_init_complete
From: Chandra Seetharaman <sekharan.com>

This patch just removes the dm layer's path initialization completion routine. It is separated from the other patch (scsi_dh: Use SCSI device handler in dm-multipath) just to make that patch more readable.

Signed-off-by: Chandra Seetharaman <sekharan.com>
---
This patch was ported, tested and Signed-off-by: Keith Mannthey <kmannth.com>
---
=Comment: #7=================================================
John G. Stultz <johnstul.com> - patch-scsi_dh-remove_dm_hw_handlers
Subject: [PATCH 09/10] scsi_dh: Remove hardware handlers from dm
From: Chandra Seetharaman <sekharan.com>

This patch removes the 3 hardware handlers that currently exist under dm, as their functionality has been moved to the SCSI layer in the earlier patches.
Signed-off-by: Chandra Seetharaman <sekharan.com>
---
This patch was ported, tested and Signed-off-by: Keith Mannthey <kmannth.com>
=Comment: #8=================================================
John G. Stultz <johnstul.com> - patch-scsi_dh-remove_dm_hw_support
Subject: [PATCH 10/10] scsi_dh: Remove hardware handler infrastructure from dm
From: Chandra Seetharaman <sekharan.com>

This patch just removes the infrastructure that provided support for hardware handlers in the dm layer, as it is no longer needed.

Signed-off-by: Chandra Seetharaman <sekharan.com>
---
This patch was ported, tested and Signed-off-by: Keith Mannthey <kmannth.com>
=Comment: #9=================================================
John G. Stultz <johnstul.com> - qla-handle_sense_fix.patch

Patch to correct the SCSI error sense code behavior in the QLA2xxx driver. Without this change, the driver always returns an incorrect error code, so the sense data used by the device handler is never checked. This patch makes the QLA driver return the correct sense code for writes to the ghost path on the DS4700. Chandra and I have tested it on 3 boxes, and it is a needed fix.

Somewhere around 2.6.25 a new version of the QLA driver was introduced. Earlier kernels need this fix to work with the SCSI device handler work.

Submitted-by: Keith Mannthey <kmannth.com>
=Comment: #10=================================================
John G. Stultz <johnstul.com> - qla_nomsi.patch

Problem: Bugzilla defect #84842: Spurious mailbox timeouts and path failovers were observed under heavy I/O load.
Analysis/Fix: The R2 QLogic driver uses MSI-EDGE interrupts by default. Suspecting an issue with this, we changed it to use the old APIC interrupts.
Testing: No mailbox timeouts were observed during a 20-hour run with APIC interrupts.

Signed-off-by: Venkateswararao Jujjuri <jvrao.com>
=Comment: #11=================================================
John G. Stultz <johnstul.com> - patch-buffer_io_fix
From: Keith Mannthey <kmannth.com>

Allow the SCSI request REQ_QUIET flag to be propagated to the buffer file system layer. It is pretty simple: pass the flag from the SCSI request to the bio (block I/O) and then to the buffer layer.

This patch declutters the log by removing the 40-50 (per LUN) buffer I/O error messages seen during boot. In a customer environment, there is a good chance that real errors would be missed in this "noise".

I ran bonnie++ a little:

Linux version 2.6.24.7-75ibmrt2.8:
[root@elm3c19 ~]# bonnie++ -d /home/test -s 40000 -x 1 -u root:root -q
elm3c19,40000M,36202,45,36358,10,26109,6,76973,79,133083,13,262.2,0,16, +++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++

Linux version 2.6.24.7-75ibmrt2.8 + this patch:
[root@elm3c19 ~]# bonnie++ -d /home/test -s 40000 -x 1 -u root:root -q
elm3c19,40000M,36697,45,35942,10,26330,7,74368,77,132676,13,264.6,0,16, +++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++

For two single runs there is no real difference. The SAN does not run at a constant speed the way local disks do, and the data footprint is the same. No red flags here. Chandra agrees it is a needed patch and thinks it is mainline ready. I built a kernel view and am building a few kernel RPMs right now.

Notes: Updates from the last patch:
1. Removed the int quiet flag I proposed last time; there is an existing flag state that can be used.
2. Refactored some printk_ratelimit code in the buffer code. With the first patch, we would see the random printk suppression messages but not the errors.

The formatting in one of the .h files looks weird in the patch, which was created via diff -urN, but it applies just fine via patch. The tools handle it fine even though it looks weird in patch format.

Signed-off-by: Keith Mannthey <kmannth.com>
Reworked the printk_ratelimiting bits to make more sense.
Signed-off-by: John Stultz <johnstul.com>
=Comment: #12=================================================
John G. Stultz <johnstul.com> - patch-scsi_dh-path_failover

Moving the path activation to a workqueue along with the scsi_dh patches introduced a race. It is due to the fact that current_pgpath (in the multipath data structure) can be modified if changes happen in any of the paths leading to the LUN. If the changes lead to current_pgpath being set to NULL, the result is an invalid access, which causes a panic.

This patch fixes that by storing the pgpath to activate in the multipath data structure and properly protecting it.

Note that if activate_path is called twice in succession with different pgpaths, with the second call made before the first one is done, then activate_path will be called twice for the second pgpath, which is fine.

Signed-off-by: Chandra Seetharaman <sekharan.com>
--------------------
---
This patch was ported, tested and Signed-off-by: Venkateswararao Jujjuri <jvrao.com>
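The patch-scsi_dh-path_failover fix changes the worker from reading current_pgpath when the work runs to using a pgpath recorded, under the lock, when the work is queued. A minimal userspace sketch of that pattern follows, assuming C11 atomics; it is not the dm-mpath code. The struct layouts, the pgpath_to_activate field, and the helpers queue_activation() and activate_worker() are hypothetical names chosen for illustration, an atomic_flag spinlock stands in for the multipath spinlock, and incrementing activations stands in for the real path activation call.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a dm-mpath path (illustrative only). */
struct pgpath {
	int id;          /* identifies the path */
	int activations; /* counts simulated activations */
};

/* Hypothetical stand-in for the multipath structure (illustrative only). */
struct multipath {
	atomic_flag lock;                  /* models the multipath spinlock */
	struct pgpath *current_pgpath;     /* may change, even to NULL, at any time */
	struct pgpath *pgpath_to_activate; /* path recorded when work is queued */
};

static void mp_init(struct multipath *m)
{
	atomic_flag_clear(&m->lock);
	m->current_pgpath = NULL;
	m->pgpath_to_activate = NULL;
}

static void mp_lock(struct multipath *m)
{
	/* Spin until this caller is the one that set the flag. */
	while (atomic_flag_test_and_set_explicit(&m->lock, memory_order_acquire))
		;
}

static void mp_unlock(struct multipath *m)
{
	atomic_flag_clear_explicit(&m->lock, memory_order_release);
}

/* Queue side: record which path to activate while holding the lock.
 * In the kernel, the work item would be queued after this point. */
static void queue_activation(struct multipath *m, struct pgpath *pg)
{
	assert(m != NULL);
	if (pg == NULL)
		return; /* never record a NULL path */
	mp_lock(m);
	m->pgpath_to_activate = pg;
	mp_unlock(m);
}

/* Worker side: read the recorded path under the lock instead of
 * dereferencing current_pgpath, which may have become NULL meanwhile.
 * Returns 1 if a path was activated, 0 if nothing was recorded. */
static int activate_worker(struct multipath *m)
{
	struct pgpath *pg;

	mp_lock(m);
	pg = m->pgpath_to_activate;
	mp_unlock(m);

	if (pg == NULL)
		return 0;
	pg->activations++; /* stands in for the real path activation */
	return 1;
}
```

Because the worker never looks at current_pgpath, setting it to NULL between queueing and execution is harmless. Two back-to-back queue_activation() calls followed by two worker runs activate the second path twice, which mirrors the behavior the patch description calls fine.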