Bug 537514
| Summary: | [LSI 5.5 feat] make scsi_dh_activate asynchronous to address the slower lun failovers with large number of luns |
|---|---|
| Product: | Red Hat Enterprise Linux 5 |
| Component: | kernel |
| Version: | 5.5 |
| Status: | CLOSED ERRATA |
| Severity: | high |
| Priority: | high |
| Reporter: | Babu Moger <babu.moger> |
| Assignee: | Rob Evers <revers> |
| QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| CC: | andriusb, clayton_walther, cward, dl-iop-bugzilla, jfeeney, martinez, mchristi, narayanan_d, revers, rlerch, sekharan, thenzl, vijay.chauhan, wwlinuxengineering, yanqing_liu |
| Target Milestone: | rc |
| Target Release: | 5.5 |
| Keywords: | FutureFeature, OtherQA |
| Hardware: | All |
| OS: | Linux |
| Doc Type: | Enhancement |
| Doc Text: | Fix enabling faster 'rdac' device handler device activation. Removes long delays on rdac device path activation noticed when paths to high number of luns activate simultaneously, such as a cable-pull in an active-passive multi-path environment. |
| Last Closed: | 2010-03-30 07:25:01 UTC |
| Bug Depends On: | 537257 |
| Bug Blocks: | 496328, 533941, 541103 |

Description
Babu Moger
2009-11-13 23:24:15 UTC
Babu - as stated in email, this feature will be evaluated if time and resources allow, since it is a late request for RHEL 5.5. Babu - can you please create a similar RHEL 6 bugzilla with this info to get this added there as well?

Andrius, Yes, we have already created a bugzilla for RHEL 6. The bugzilla number is 537257.

Thanks Babu! Updated!

*** Bug 523668 has been marked as a duplicate of this bug. ***

The patches do not apply to RHEL 5.4. They depend on additional upstream changes in the SCSI midlayer and rdac that are not present in RHEL 5. We will require these patches to be backported to a recent RHEL 5 version before we can take them. Considering how late this arrived for 5.5, it would be most helpful for the requestor to provide these backports. Tom

Dear RedHat, I ported the changes to the RHEL 5.5 tree (2.6.18-176) and it compiles cleanly. But there is an issue. Since the data structure scsi_device_handler has changed, some of the scsi_dh internal functions look different w.r.t. kABI compliance. The functions are: store_scsi_dh_data, scsi_unregister_device_handler, retrieve_scsi_dh_data, scsi_register_device_handler, scsi_dh_activate. Let me know if it is ok to have this breakage. If it is not ok, I will work on making it ABI compliant. Thanks

RedHat, Please advise on the ABI issue raised in the previous comment.

Jon, Can you comment on the kabi issue raised in comment 10? Thanks, Rob

Are those functions under kabi? My scsi_dh_emc patch modifies the scsi_device_handler struct, so it would break kabi if the symbols were listed. I do not think check_kabi is spitting out any errors though (it is hard to tell because there are already errors before the patch). IBM, were you seeing new check_kabi errors?

Mike, I just compared the generated Module.symvers with and without the patch, and it does show errors. It shows errors with your scsi_dh_emc patch too. Where do I get the check_kabi script/tool?

Will send you the tools.
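
For readers who want to reproduce the kind of check discussed in the comments above, here is a minimal sketch of comparing two Module.symvers listings. It is not the check_kabi tool mentioned in the thread. It assumes each line starts with a CRC followed by a symbol name, separated by whitespace. The CRC values, the `example/...` module paths, and the whitelist contents are made up for illustration, and the function names `parse_symvers` and `changed_symbols` are hypothetical. A changed CRC only matters for kABI if the symbol is on the whitelist, which is the question raised in the thread.

```python
def parse_symvers(text):
    """Map symbol name -> CRC, assuming 'CRC symbol ...' on each line."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        crc, symbol = fields[0], fields[1]
        table[symbol] = crc
    return table


def changed_symbols(before_text, after_text, whitelist=None):
    """Return sorted symbols whose CRC changed or that disappeared.

    If a whitelist is given, only symbols on it are reported.
    """
    before = parse_symvers(before_text)
    after = parse_symvers(after_text)
    changed = {name for name, crc in before.items() if after.get(name) != crc}
    if whitelist is not None:
        changed &= set(whitelist)
    return sorted(changed)


# Made-up CRCs and module paths, for illustration only.
BEFORE = (
    "0x11111111\tscsi_dh_activate\texample/scsi_dh\n"
    "0x22222222\tscsi_register_device_handler\texample/scsi_dh\n"
    "0x33333333\tscsi_device_put\texample/scsi_mod\n"
)
AFTER = (
    "0xaaaaaaaa\tscsi_dh_activate\texample/scsi_dh\n"
    "0xbbbbbbbb\tscsi_register_device_handler\texample/scsi_dh\n"
    "0x33333333\tscsi_device_put\texample/scsi_mod\n"
)
# Hypothetical whitelist; the real kABI list is not given in this report.
HYPOTHETICAL_WHITELIST = ["scsi_device_put"]

all_changed = changed_symbols(BEFORE, AFTER)
kabi_relevant = changed_symbols(BEFORE, AFTER, HYPOTHETICAL_WHITELIST)
print("CRC changed:", all_changed)
print("Changed and whitelisted:", kabi_relevant)
```

In this toy input the two scsi_dh symbols change CRC but neither is on the hypothetical whitelist, so the second list is empty. That mirrors the situation where a raw Module.symvers diff shows differences that a whitelist-aware check would not flag.
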
Ported the patches, tested them, and verified that they work as expected. Used the kabi tools Mike sent and verified that the two patches (below) do not break any kABI.

Created attachment 378839 [details]
Make scsi_dh_activate async
Created attachment 378840 [details]
Batch up MODE SELECTS in rdac scsi hardware handler
Both of the above patches apply cleanly on top of 2.6.18-180.

Chandra, Can you update the patch for the problem found in bz537257 comment 14, if it applies, and post the fix to linux-scsi? https://bugzilla.redhat.com/show_bug.cgi?id=537257#c14 Thanks, Rob

Created attachment 379276 [details]
Make scsi_dh_activate async
This patch is the same as the one above but applies cleanly on 2.6.18-182.

Created attachment 379279 [details]
Batch up MODE SELECTs in rdac scsi hardware handler
Addresses a suggestion Mike Christie provided in bug 537257. This also applies cleanly on 2.6.18-182.

(In reply to comment #20)
> Chandra,
>
> Can you update the patch with the problem found in bz537257 comment 14 if it
> applies and post the fix to linux-scsi?
>
> https://bugzilla.redhat.com/show_bug.cgi?id=537257#c14
>
> Thanks, Rob

Hi Rob, The new patch provided above handles the issue you pointed out. Thanks

Chandra - has this been posted upstream?

Yes, I have already posted the fix for this patch upstream. Here is the link: http://marc.info/?l=linux-scsi&m=126271209308764&w=2 All the other patches mentioned in the description are already upstream.

Babu - has this patch been /committed/ upstream? Also, bug 537257 has the rhel 6 patchset that seems to be committed already. Do the rhel 5 and 6 patchsets match?

Andrius, Yes, all these patches have been committed upstream. They were included in 2.6.33-rc1. Here are the commit IDs with links:

1) scsi_dh_activate interface changes http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=3ae31f6a7b6e442fc6a92f29330fbad230dc3992
2) rdac hardware handler changes http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=970f3f47e7c97c0bfe9f91356943b55ac389cb1d
3) hp hardware handler changes http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=4e2ef86cd5ce057b60acea33bb71c06676e71888
4) alua hardware handler changes http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=96e6586556dfa80112f42895be93c561582d9930

For bug 537514 (RHEL 5.5) - Chandra ported patches 1 and 2 to RHEL 5.5. Because of time and resource constraints we did not port patches 3 and 4.
For bug 537257 (RHEL 6) - I (Babu) ported all 4 patches to RHEL 6.
Please let me know if you have any more questions. Thanks

Andrius, Now I remember. There is one minor patch waiting for upstream commit.
Here is the link: http://marc.info/?l=linux-scsi&m=126463085821695&w=2 This is a pretty new patch (three weeks old). We will probably have to wait some more time to get it included upstream. Thanks Babu

If this patch is holding up everything, should we just not include it and move forward?

Hi Andrius, It is a one-line patch changing GFP_KERNEL to GFP_NOIO in a kzalloc() call. As a matter of fact, this was a result of the review made by Mike Christie for the patches we submitted for bug #537257 (see Comment 14 - https://bugzilla.redhat.com/show_bug.cgi?id=537257#c14). IMHO, you can include it, as I do not see any issues with it being accepted upstream.

(In reply to comment #30)
> Hi Andrius,
>
> It is a one line patch changing GFP_KERNEL to GFP_NOIO in a kzalloc() call.
>
> As a matter of fact, this was a result of review made by Mike Christie for the
> patches we submitted for bug #537257 (See Comment 14 -
> https://bugzilla.redhat.com/show_bug.cgi?id=537257#c14)
>
> IMHO, you can include it as I do not see there will be any issues of it being
> accepted upstream.

I agree that this particular issue should not hold up this patch. In fact, we want the change, as Chandra suggests.

Business Justification for inclusion of this issue in RHEL 5.5: This feature is required to support RHEL 5.5 with Hoggs storage enclosures. These storage enclosures are scheduled for release in June 2010. RHEL 5.5 support on this storage without this feature is not possible.

Patches 1 and 2 listed in the bugzilla are sufficient for Dell to test out this feature with our storage enclosure.

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: Fix enabling faster 'rdac' device handler device activation.
Removes long delays on rdac device path activation noticed when paths to high number of luns activate simultaneously, such as a cable-pull in an active-passive multi-path environment.

Partners -- This BZ was approved for *snapshot 1*, so please wait for snapshot 1 code to test it.

I verified with snapshot 1 and it works well. During failover, only two MODE SELECTs were issued to transfer ownership of 100 volumes. The ownership of the 100 LUNs was transferred in just 25 seconds. Thanks.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2010-0178.html
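
The verification result above (a handful of MODE SELECT commands covering 100 LUNs) follows from the batching idea in the second patch. The following toy model is only a sketch of that idea under stated assumptions and is not the rdac handler's code: the class `BatchingController`, its methods, and the `failover` helper are hypothetical names. It assumes activation requests are queued asynchronously and that one command can carry every LUN queued so far.

```python
class BatchingController:
    """Toy model of a controller that batches ownership transfers."""

    def __init__(self):
        self.pending = []       # (lun, callback) pairs awaiting a command
        self.commands_sent = 0  # number of simulated MODE SELECT commands
        self.owned = set()      # LUNs this controller now owns

    def request_activate(self, lun, on_done):
        # Asynchronous style: queue the request and return immediately.
        self.pending.append((lun, on_done))

    def flush(self):
        # One simulated command covers every LUN queued so far.
        if not self.pending:
            return
        batch, self.pending = self.pending, []
        self.commands_sent += 1
        for lun, on_done in batch:
            self.owned.add(lun)
            on_done(lun)  # report completion to the caller


def failover(luns, controller):
    """Queue an activation for every LUN, then let the controller flush."""
    completed = []
    for lun in luns:
        controller.request_activate(lun, completed.append)
    controller.flush()
    return completed


ctrl = BatchingController()
done = failover(range(100), ctrl)
print(len(done), "LUNs activated with", ctrl.commands_sent, "command(s)")
```

In this model all 100 requests arrive before the single flush, so one command suffices. In a real failover the count depends on when requests arrive relative to command submission, which may be why the test above observed two commands; the model does not try to reproduce that timing.
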