Bug 516881 - [Promise 5.5 feat] Update stex driver to version 4.6.0102.4
Summary: [Promise 5.5 feat] Update stex driver to version 4.6.0102.4
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.5
Hardware: All
OS: Linux
high
high
Target Milestone: alpha
: 5.5
Assignee: David Milburn
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
: 525350 (view as bug list)
Depends On:
Blocks: 525215 533192 533941 557292
TreeView+ depends on / blocked
 
Reported: 2009-08-11 19:09 UTC by Ed lin
Modified: 2018-10-27 16:05 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-03-30 07:14:27 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
patch against kernel 2.6.18-166.el5 (12.89 KB, patch)
2009-10-01 00:08 UTC, Ed lin
no flags Details | Diff
Patch to fix inconsitent usage of max_lun (2.02 KB, patch)
2009-10-27 00:01 UTC, Ed lin
no flags Details | Diff
patch to fix scan of nonexistent lun (459 bytes, patch)
2009-12-22 23:15 UTC, Ed lin
no flags Details | Diff


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2010:0178 0 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 5.5 kernel security and bug fix update 2010-03-29 12:18:21 UTC

Description Ed lin 2009-08-11 19:09:51 UTC
Description of problem:
Stex driver update for RHEL 5.4

Version-Release number of selected component (if applicable):
RHEL 5.5

How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Andrius Benokraitis 2009-08-12 06:31:31 UTC
Ed, please post the backported patch from upstream by the end of Sep.

Comment 2 Ed lin 2009-10-01 00:08:47 UTC
Created attachment 363272 [details]
patch against kernel 2.6.18-166.el5

Patch against kernel 2.6.18-166.el5 is attached.

Comment 3 Andrius Benokraitis 2009-10-01 01:19:46 UTC
Thank you Ed!

Tom - does your team need anything else from Ed on this?

Comment 7 Chris Ward 2009-10-14 10:55:33 UTC
@Promise,

We need to confirm that there is commitment to test 
for the resolution of this request during the RHEL 5.5 test
phase, if it is accepted into the release. 

Please post a confirmation before Oct 16th, 2009, 
including the contact information for testing engineers.

Comment 8 Ed lin 2009-10-15 23:07:55 UTC
Promise will take the commitment to test this patch (driver) during RHEL 5.5 test phase.

Right now I am the contact for this issue. Promise will provide test engineer/plan/result/contact in the future, as required by Red Hat.

Comment 9 David Milburn 2009-10-23 16:05:02 UTC
Ed,

I was looking over the update patch, I have a few questions.

Basically, it is based upon this upstream commit.

commit 69cb48750b02034350bc78d8053647d7464cdde0
Author: Ed Lin <ed.lin>
Date:   Tue Aug 18 12:15:14 2009 -0700

    [SCSI] stex: Add reset code for st_yel (v2)

    Add reset related code for st_yel.
    1. Set the SS_H2I_INT_RESET bit.
    2. Wait for the SS_MU_OPERATIONAL flag. This is also part of
       normal handshake process so move it to handshake routine.
    3. Continue handshake with the firmware.

Comparing to upstream stex.c, I follow #2 and #3 (handshake with firmware) 
for st_yel we go thru stex_ss_handshake(), and this code looks consistent with
upstream. I noticed that you set h->scratch_size in 2 steps, but that seems
ok since your using scratch_size in the memset near the end of stex_ss_handshake.

One question, the RHEL5 version flushes after writing to YH2I_REQ_HI and 
YH2I_REQ, upstream doesn't in stex_ss_handshake()?

Also, in RHEL5 reset code there is a special case in stex_intr to handle
MU_OUTBOUND_DOORBELL_REQUEST_RESET with workqueue as well as 
SS_I2H_REQUEST_RESET in stex_ss_intr(), why does the upstream code not
handle this bit?

And, I noticed you changed defining the MU register values and status code,
some of the status codes have different values than upstream code, and
MU_OUTBOUND_DOORBELL_REQUEST_RESET is not defined upstream, were you planning
on changing these upstream? 

Just trying to reconcile the changes to upstream, thank you.

Comment 10 Ed lin 2009-10-23 18:22:19 UTC
Hi David,

There are three later upstream commits on which the RHEL 5.5 patch is based, along with the patch you mentioned, which was submitted earlier.

http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commit;h=4be749e2da8fb1ca7a7689d0092f6833ff1cce9b

http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commit;h=b8cfb36e7429fbfab684b5dbb36b6f46f16bb3ba

http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commit;h=32d2b5e9480f3dcb6dd81403dcac0d796d0dbde9

The first is patch about the small buffer issue which you had added to RHEL 5 kernel. The last is version update. So the main difference you saw is introduced by the second:

---
From b8cfb36e7429fbfab684b5dbb36b6f46f16bb3ba Mon Sep 17 00:00:00 2001
From: Ed Lin <ed.lin>
Date: Mon, 28 Sep 2009 22:58:33 -0800
Subject: [PATCH] [SCSI] stex: add support for reset request from firmware

Add support for reset request from firmware for controllers
of st_shasta and st_yel type. Code adjustments necessary
for this change are also included.
---

This should answer your questions. Please let me know if you still have question with it.

Thanks,
Ed

Comment 11 David Milburn 2009-10-26 22:19:09 UTC
Hi Ed,

Thanks for pointing out those scsi-misc commits, I have built some test kernel
rpms, please test and let me know if you encounter problems. I have started
to regression test on SuperTrak EX8350/EX16300.

 
http://people.redhat.com/dmilburn/.bz516881/

Comment 12 Ed lin 2009-10-27 00:01:36 UTC
Created attachment 366180 [details]
Patch to fix inconsitent usage of max_lun

Hi David,

I downloaded the kernel from the link and did some test. From the limited test (including I/O and reset on st_yel) result, it seems that this kernel and the driver are good.

There is another upstream scsi patch. The commit link and information are here:

http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commit;h=367703558825614f70992f424d68fc7408ff562f

[SCSI] fix inconsistent usage of max_lunauthor	Ed Lin <ed.lin>	
	Fri, 9 Oct 2009 22:23:27 +0000 (15:23 -0700)
committer	James Bottomley <James.Bottomley>	
	Tue, 13 Oct 2009 18:36:17 +0000 (13:36 -0500)

max_lun in struct Scsi_Host is used as actual max lun
plus 1 in scsi_sequential_lun_scan()(scsi_scan.c).
However it is also used as actual max lun in some
other cases.

According to the comment in the definition of struct
Scsi_Host, max_lun should be set to 1 more than the
actual max lun, just like max_id. Fix the problem
according to this definition.

Signed-off-by: Ed Lin <ed.lin>
Signed-off-by: James Bottomley <James.Bottomley>

The patch of RHEL 5 version is attached. Without this patch, the stex driver and firmware may return wrong information when the scsi mid layer performs an invalid extra scan. However, it is a patch about scsi mid layer by nature. Let me know if you have concern about this patch.

Thanks,
Ed

Comment 13 David Milburn 2009-10-27 17:20:16 UTC
Hi Ed,

Since the patch in Comment #12 is a SCSI mid-layer change we will need to
open up a seperate Bugzilla, would please provide more details concerning the
failure? It does look like SCSI mid-layer code scsi_report_lun_scan doesn't
intend to scan for a LUN larger than a host allows.

Thanks,
David

Comment 14 Ed lin 2009-10-27 18:29:08 UTC
Hi David,

For example, the stex driver may set max_lun to 256, which means valid lun range is from 0 to 255. If the mid layer scans (invalid) lun 256, then because the lun in stex payload message is u8, 256 will become 0, and the firmware will have no way to tell it's intended for lun 256 instead of lun 0, and will return wrong data. If lun 0 exists, this will make lun 256 also seem to exist. So we need to fix the problem. And the patch is a correct fix which is done in the scsi mid layer.

The specialness of this problem is that it will not be triggered in normal driver loading, because max_lun is used correctly for that case. This problem may be triggered only when you manually scan a specific lun, because this time max_lun is used as max actual lun (wrong), instead of max actual lun plus 1 (correct).

Thanks,
Ed

Comment 15 David Milburn 2009-10-28 15:26:23 UTC
Thank you Ed, I have opened BZ 531488 for inconsistent usage of max_lun.

https://bugzilla.redhat.com/show_bug.cgi?id=531488

Please let me know when you have completed testing the stex driver update with
kernel-2.6.18-169.el5-bz516881a.1.

Comment 16 David Milburn 2009-11-23 22:43:51 UTC
Hi Ed, Qiming,

Just checking in to see if testing was complete with the stex driver update,
once completed we will need to have the update patch for RHEL5.5 reviewed
internally, we will need to do this very soon.

Thanks,
David

Comment 17 Ed lin 2009-11-26 02:16:31 UTC
Hi David,

I finished the test on the controllers at my hand. All of them were good, no regression was detected. Considering the fact that the patch is mainly about adding reset handling of st_yel, I think the test was good enough. So now, maybe it's time for the next action.

Thanks,
Ed

Comment 19 David Milburn 2009-11-30 20:10:22 UTC
Thank you Ed, the patch being reviewed.

Comment 20 Don Zickus 2009-12-09 18:11:50 UTC
in kernel-2.6.18-178.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please update the appropriate value in the Verified field
(cf_verified) to indicate this fix has been successfully
verified. Include a comment with verification details.

Comment 22 Ed lin 2009-12-22 23:15:04 UTC
Created attachment 379947 [details]
patch to fix scan of nonexistent lun

Due to the unexpected upstream reversion of the patch in BZ 531488, the stex driver must be modified to avoid the bug of adding a nonexistent lun.

The upstream submission of this patch can be found at: http://marc.info/?l=linux-scsi&m=126118651528399&w=2

Here the patch for RHEL 5 (against kernel-2.6.18-182.el5) is attached. Hopefully the upstream and RHEL 5 patches can be accepted. Sorry for the late update, but it is really an unexpected event.

Comment 23 Andrius Benokraitis 2009-12-23 15:42:50 UTC
Ed, please create a new bugzilla for Comment #22 - this bug is already in ON_QA.

Comment 24 Andrius Benokraitis 2009-12-23 15:43:54 UTC
Obsoleting patch in Comment #22 since this Bugzilla is already included in 5.5 Beta.

Comment 25 David Milburn 2009-12-23 15:49:50 UTC
Thanks for patch Ed, Andrius I have created some test rpms with the patch
from Comment #22, I will make them available on BZ 531488, no need to 
create a new BZ.

Comment 26 Chris Ward 2010-02-11 10:16:24 UTC
~~ Attention Customers and Partners - RHEL 5.5 Beta is now available on RHN ~~

RHEL 5.5 Beta has been released! There should be a fix present in this 
release that addresses your request. Please test and report back results 
here, by March 3rd 2010 (2010-03-03) or sooner.

Upon successful verification of this request, post your results and update 
the Verified field in Bugzilla with the appropriate value.

If you encounter any issues while testing, please describe them and set 
this bug into NEED_INFO. If you encounter new defects or have additional 
patch(es) to request for inclusion, please clone this bug per each request
and escalate through your support representative.

Comment 27 David Milburn 2010-02-15 18:27:03 UTC
*** Bug 525350 has been marked as a duplicate of this bug. ***

Comment 28 Ed lin 2010-03-03 03:16:38 UTC
Verified using The Red Hat Enterprise Linux 5.5 GA Snapshot 2 (*kernel-2.6.18-189.el5*). Installation, reset, and normal I/O are all good. The problem in BZ 531488 is also fixed. Set the Verified field.

Thanks!

Comment 29 Andrius Benokraitis 2010-03-11 04:40:50 UTC
Sorry for muddying this bug, but Ed, are you requesting an updated driver for RHEL 5.6? If so we'll need that placeholder BZ created ASAP.

Comment 30 Ed lin 2010-03-15 18:56:54 UTC
Hi Andrius,

Thanks for reminding. I created a placeholder (BZ 573777), just in case.

Regards,
Ed

Comment 32 errata-xmlrpc 2010-03-30 07:14:27 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html


Note You need to log in before you can comment on or make changes to this bug.