Description of problem: Stex driver update for RHEL 5.4 Version-Release number of selected component (if applicable): RHEL 5.5 How reproducible: Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info:
Ed, please post the backported patch from upstream by the end of Sep.
Created attachment 363272 [details] patch against kernel 2.6.18-166.el5 Patch against kernel 2.6.18-166.el5 is attached.
Thank you Ed! Tom - does your team need anything else from Ed on this?
@Promise, We need to confirm that there is commitment to test for the resolution of this request during the RHEL 5.5 test phase, if it is accepted into the release. Please post a confirmation before Oct 16th, 2009, including the contact information for testing engineers.
Promise will take the commitment to test this patch (driver) during RHEL 5.5 test phase. Right now I am the contact for this issue. Promise will provide test engineer/plan/result/contact in the future, as required by Red Hat.
Ed, I was looking over the update patch, I have a few questions. Basically, it is based upon this upstream commit. commit 69cb48750b02034350bc78d8053647d7464cdde0 Author: Ed Lin <ed.lin> Date: Tue Aug 18 12:15:14 2009 -0700 [SCSI] stex: Add reset code for st_yel (v2) Add reset related code for st_yel. 1. Set the SS_H2I_INT_RESET bit. 2. Wait for the SS_MU_OPERATIONAL flag. This is also part of normal handshake process so move it to handshake routine. 3. Continue handshake with the firmware. Comparing to upstream stex.c, I follow #2 and #3 (handshake with firmware) for st_yel we go thru stex_ss_handshake(), and this code looks consistent with upstream. I noticed that you set h->scratch_size in 2 steps, but that seems ok since your using scratch_size in the memset near the end of stex_ss_handshake. One question, the RHEL5 version flushes after writing to YH2I_REQ_HI and YH2I_REQ, upstream doesn't in stex_ss_handshake()? Also, in RHEL5 reset code there is a special case in stex_intr to handle MU_OUTBOUND_DOORBELL_REQUEST_RESET with workqueue as well as SS_I2H_REQUEST_RESET in stex_ss_intr(), why does the upstream code not handle this bit? And, I noticed you changed defining the MU register values and status code, some of the status codes have different values than upstream code, and MU_OUTBOUND_DOORBELL_REQUEST_RESET is not defined upstream, were you planning on changing these upstream? Just trying to reconcile the changes to upstream, thank you.
Hi David, There are three later upstream commits on which the RHEL 5.5 patch is based, along with the patch you mentioned, which was submitted earlier. http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commit;h=4be749e2da8fb1ca7a7689d0092f6833ff1cce9b http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commit;h=b8cfb36e7429fbfab684b5dbb36b6f46f16bb3ba http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commit;h=32d2b5e9480f3dcb6dd81403dcac0d796d0dbde9 The first is patch about the small buffer issue which you had added to RHEL 5 kernel. The last is version update. So the main difference you saw is introduced by the second: --- From b8cfb36e7429fbfab684b5dbb36b6f46f16bb3ba Mon Sep 17 00:00:00 2001 From: Ed Lin <ed.lin> Date: Mon, 28 Sep 2009 22:58:33 -0800 Subject: [PATCH] [SCSI] stex: add support for reset request from firmware Add support for reset request from firmware for controllers of st_shasta and st_yel type. Code adjustments necessary for this change are also included. --- This should answer your questions. Please let me know if you still have question with it. Thanks, Ed
Hi Ed, Thanks for pointing out those scsi-misc commits, I have built some test kernel rpms, please test and let me know if you encounter problems. I have started to regression test on SuperTrak EX8350/EX16300. http://people.redhat.com/dmilburn/.bz516881/
Created attachment 366180 [details] Patch to fix inconsitent usage of max_lun Hi David, I downloaded the kernel from the link and did some test. From the limited test (including I/O and reset on st_yel) result, it seems that this kernel and the driver are good. There is another upstream scsi patch. The commit link and information are here: http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commit;h=367703558825614f70992f424d68fc7408ff562f [SCSI] fix inconsistent usage of max_lunauthor Ed Lin <ed.lin> Fri, 9 Oct 2009 22:23:27 +0000 (15:23 -0700) committer James Bottomley <James.Bottomley> Tue, 13 Oct 2009 18:36:17 +0000 (13:36 -0500) max_lun in struct Scsi_Host is used as actual max lun plus 1 in scsi_sequential_lun_scan()(scsi_scan.c). However it is also used as actual max lun in some other cases. According to the comment in the definition of struct Scsi_Host, max_lun should be set to 1 more than the actual max lun, just like max_id. Fix the problem according to this definition. Signed-off-by: Ed Lin <ed.lin> Signed-off-by: James Bottomley <James.Bottomley> The patch of RHEL 5 version is attached. Without this patch, the stex driver and firmware may return wrong information when the scsi mid layer performs an invalid extra scan. However, it is a patch about scsi mid layer by nature. Let me know if you have concern about this patch. Thanks, Ed
Hi Ed, Since the patch in Comment #12 is a SCSI mid-layer change we will need to open up a seperate Bugzilla, would please provide more details concerning the failure? It does look like SCSI mid-layer code scsi_report_lun_scan doesn't intend to scan for a LUN larger than a host allows. Thanks, David
Hi David, For example, the stex driver may set max_lun to 256, which means valid lun range is from 0 to 255. If the mid layer scans (invalid) lun 256, then because the lun in stex payload message is u8, 256 will become 0, and the firmware will have no way to tell it's intended for lun 256 instead of lun 0, and will return wrong data. If lun 0 exists, this will make lun 256 also seem to exist. So we need to fix the problem. And the patch is a correct fix which is done in the scsi mid layer. The specialness of this problem is that it will not be triggered in normal driver loading, because max_lun is used correctly for that case. This problem may be triggered only when you manually scan a specific lun, because this time max_lun is used as max actual lun (wrong), instead of max actual lun plus 1 (correct). Thanks, Ed
Thank you Ed, I have opened BZ 531488 for inconsistent usage of max_lun. https://bugzilla.redhat.com/show_bug.cgi?id=531488 Please let me know when you have completed testing the stex driver update with kernel-2.6.18-169.el5-bz516881a.1.
Hi Ed, Qiming, Just checking in to see if testing was complete with the stex driver update, once completed we will need to have the update patch for RHEL5.5 reviewed internally, we will need to do this very soon. Thanks, David
Hi David, I finished the test on the controllers at my hand. All of them were good, no regression was detected. Considering the fact that the patch is mainly about adding reset handling of st_yel, I think the test was good enough. So now, maybe it's time for the next action. Thanks, Ed
Thank you Ed, the patch being reviewed.
in kernel-2.6.18-178.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5 Please update the appropriate value in the Verified field (cf_verified) to indicate this fix has been successfully verified. Include a comment with verification details.
Created attachment 379947 [details] patch to fix scan of nonexistent lun Due to the unexpected upstream reversion of the patch in BZ 531488, the stex driver must be modified to avoid the bug of adding a nonexistent lun. The upstream submission of this patch can be found at: http://marc.info/?l=linux-scsi&m=126118651528399&w=2 Here the patch for RHEL 5 (against kernel-2.6.18-182.el5) is attached. Hopefully the upstream and RHEL 5 patches can be accepted. Sorry for the late update, but it is really an unexpected event.
Ed, please create a new bugzilla for Comment #22 - this bug is already in ON_QA.
Obsoleting patch in Comment #22 since this Bugzilla is already included in 5.5 Beta.
Thanks for patch Ed, Andrius I have created some test rpms with the patch from Comment #22, I will make them available on BZ 531488, no need to create a new BZ.
~~ Attention Customers and Partners - RHEL 5.5 Beta is now available on RHN ~~ RHEL 5.5 Beta has been released! There should be a fix present in this release that addresses your request. Please test and report back results here, by March 3rd 2010 (2010-03-03) or sooner. Upon successful verification of this request, post your results and update the Verified field in Bugzilla with the appropriate value. If you encounter any issues while testing, please describe them and set this bug into NEED_INFO. If you encounter new defects or have additional patch(es) to request for inclusion, please clone this bug per each request and escalate through your support representative.
*** Bug 525350 has been marked as a duplicate of this bug. ***
Verified using The Red Hat Enterprise Linux 5.5 GA Snapshot 2 (*kernel-2.6.18-189.el5*). Installation, reset, and normal I/O are all good. The problem in BZ 531488 is also fixed. Set the Verified field. Thanks!
Sorry for muddying this bug, but Ed, are you requesting an updated driver for RHEL 5.6? If so we'll need that placeholder BZ created ASAP.
Hi Andrius, Thanks for reminding. I created a placeholder (BZ 573777), just in case. Regards, Ed
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2010-0178.html