Bug 435670 - RHEL5.2: USB stress test failure on AMD SBX00
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Bhavna Sarathy
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks: 253746
Reported: 2008-03-03 09:01 UTC by Shane Huang
Modified: 2018-10-19 21:50 UTC (History)

Fixed In Version: RHBA-2008-0314
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-05-21 15:11:12 UTC
Target Upstream Version:
Embargoed:


Attachments
backported patch to fix USB stress test failure (9.11 KB, patch)
2008-03-03 09:01 UTC, Shane Huang
backported patch to fix USB stress test failure(updated) (9.58 KB, patch)
2008-03-04 06:15 UTC, Shane Huang
Another USB patch to make things better (3.43 KB, patch)
2008-03-06 08:42 UTC, Shane Huang


Links
System: Red Hat Product Errata
ID: RHBA-2008:0314
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Updated kernel packages for Red Hat Enterprise Linux 5.2
Last Updated: 2008-05-20 18:43:34 UTC

Description Shane Huang 2008-03-03 09:01:49 UTC
Description of problem:

There is an SB600/SB700 USB bug that causes the USB stress test
to fail:
http://bugzilla.kernel.org/show_bug.cgi?id=8692

The workaround consists of three Linux patches:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=07d29b63ef6b39963ab37818653284d861cf55af

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=f8fa7571a928d6d0e1b7444b0ea69ec7dc7db3b6

http://lkml.org/lkml/2008/2/19/546

I backported these three patches to RHEL5.2; please add them to the kernel.

Thanks

Comment 1 Shane Huang 2008-03-03 09:01:49 UTC
Created attachment 296565 [details]
backported patch to fix USB stress test failure

Comment 2 Shane Huang 2008-03-03 09:04:41 UTC
If you apply this patch and build a kernel RPM package for us,
I can ask our QA to test it. Thanks

Comment 3 Bhavna Sarathy 2008-03-03 14:31:29 UTC
Russ, please add to the 5.2 master bug list.  Peter has indicated that this
patch will go into a snapshot build.
Bhavana

Comment 4 Russell Doty 2008-03-03 15:19:43 UTC
Requesting inclusion in a Beta snapshot.

Comment 5 Bhavna Sarathy 2008-03-03 20:16:59 UTC
Shane, thanks for attaching the backport.  Brew build uploaded for AMD chipset
QA team to test.

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=1193626

Comment 6 Shane Huang 2008-03-04 06:15:08 UTC
Created attachment 296703 [details]
backported patch to fix USB stress test failure(updated)

Comment 7 Shane Huang 2008-03-04 06:18:17 UTC
I'm sorry, there was a mistake in the patch in comment 1.
Could you rebuild the test kernel with the updated patch?

The updated patch has also been sent to Bhavana, with the suffix "_5.patch".

Thanks


Comment 8 Shane Huang 2008-03-06 08:42:52 UTC
Created attachment 297010 [details]
Another USB patch to make things better

Comment 9 Shane Huang 2008-03-06 08:45:33 UTC
The Linux USB developers have submitted another patch that improves things:
http://marc.info/?l=linux-usb&m=120469059715031&w=2

The new patch can also replace this one:
http://lkml.org/lkml/2008/2/19/546

So I backported the new patch as the attachment in comment #8.
Can you apply it in addition to the patch in comment #6?
Apply the patch in comment #6 first, then the one in comment #8.

Thanks


Comment 10 Bhavna Sarathy 2008-03-10 14:37:47 UTC
Patch set posted to RHML on 3/10.

Comment 11 Bhavna Sarathy 2008-03-10 14:40:08 UTC
This patchset will be included in a snapshot build.

Comment 12 Shane Huang 2008-03-11 03:03:33 UTC
The third patch (http://marc.info/?l=linux-usb&m=120469059715031&w=2)
has also been merged into Linus's source tree, as:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e82cc1288fa57857c6af8c57f3d07096d4bcd9d9

There is one small difference between the two, though:
the latter (the git commit) also modifies another line, as shown below:

@@ -781,7 +811,7 @@ static int ehci_urb_enqueue (
 static void unlink_async (struct ehci_hcd *ehci, struct ehci_qh *qh)
 {
        /* failfast */
-       if (!HC_IS_RUNNING(ehci_to_hcd(ehci)->state))
+       if (!HC_IS_RUNNING(ehci_to_hcd(ehci)->state) && ehci->reclaim)
                end_unlink_async(ehci);

        /* if it's not linked then there's nothing to do */


Can you make this change as well? I will NOT do a separate backport for this line.

Thanks
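
As an illustration of the one-line change quoted above, here is a minimal, self-contained C sketch of the two guard conditions. It is not the kernel code: `struct ehci_hcd`, `struct ehci_qh`, and `end_unlink_async_model()` are simplified, hypothetical stand-ins, the `running` flag models `HC_IS_RUNNING(ehci_to_hcd(ehci)->state)`, and the sketch assumes that `ehci->reclaim` is NULL when no queue head is waiting to be unlinked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct ehci_qh { int id; };

struct ehci_hcd {
    bool running;            /* models HC_IS_RUNNING(ehci_to_hcd(ehci)->state) */
    struct ehci_qh *reclaim; /* assumed NULL when nothing is pending unlink */
    int end_unlink_calls;    /* counts calls to the stand-in below */
};

/* Stand-in for end_unlink_async(): only records that it was called. */
void end_unlink_async_model(struct ehci_hcd *ehci)
{
    ehci->end_unlink_calls++;
}

/* Failfast check before the one-line change ("-" line in the hunk). */
void failfast_old(struct ehci_hcd *ehci)
{
    if (!ehci->running)
        end_unlink_async_model(ehci);
}

/* Failfast check after the change ("+" line): also require a pending reclaim. */
void failfast_new(struct ehci_hcd *ehci)
{
    if (!ehci->running && ehci->reclaim)
        end_unlink_async_model(ehci);
}

/* Runs one guard on a halted controller; returns the stand-in call count. */
int calls_when_halted(bool use_new_guard, struct ehci_qh *reclaim)
{
    struct ehci_hcd ehci = { .running = false, .reclaim = reclaim,
                             .end_unlink_calls = 0 };
    if (use_new_guard)
        failfast_new(&ehci);
    else
        failfast_old(&ehci);
    return ehci.end_unlink_calls;
}

/* Self-check of the four halted-controller cases; returns 0 on success. */
int check_guards(void)
{
    struct ehci_qh qh = { .id = 1 };
    assert(calls_when_halted(false, NULL) == 1); /* old guard: called anyway */
    assert(calls_when_halted(true, NULL) == 0);  /* new guard: skipped */
    assert(calls_when_halted(false, &qh) == 1);  /* pending QH: both call */
    assert(calls_when_halted(true, &qh) == 1);
    return 0;
}
```

In this model the two guards differ only when the controller is halted and nothing is pending: the old check still calls the stand-in, while the new one skips it. With a pending queue head both behave the same.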


Comment 14 Bhavna Sarathy 2008-03-11 18:34:47 UTC
Don, comment 12 lists a small change in the USB patch that went upstream.  How
should we deal with it?    Can you incorporate the change into the patch I
already posted? 

Comment 15 Don Zickus 2008-03-11 18:45:29 UTC
I haven't included the patch yet, please repost the whole thing.  Thanks.

Comment 16 Shane Huang 2008-03-17 01:51:11 UTC
Regarding the one-line change in comment #12, here is the explanation from
David Brownell, the USB EHCI driver maintainer and patch submitter:

Quoting David:
> It would be useful paranoia, yes.  I don't have any reason to think 
> that the relevant bug ever triggered, but I suspect you've been 
> surprised on occasion too.


So my suggestion for this one-line change is:
if we have missed the RHEL5.2 boat, we can ignore it;
otherwise we should add it as well. The git commit also contains it:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e82cc1288fa57857c6af8c57f3d07096d4bcd9d9


Comment 18 Don Zickus 2008-03-26 20:31:50 UTC
in kernel-2.6.18-87.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 20 John Poelstra 2008-04-02 21:34:01 UTC
Greetings Red Hat Partner,

A fix for this issue should be included in the latest packages contained in
RHEL5.2-Snapshot3--available now on partners.redhat.com.  

Please test and confirm that your issue is fixed.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to ASSIGNED.

If you are receiving this message in Issue Tracker, please reply with a message
to Issue Tracker about your results and I will update bugzilla for you.  If you
need assistance accessing ftp://partners.redhat.com, please contact your Partner
Manager.

Thank you


Comment 21 John Poelstra 2008-04-09 22:41:26 UTC
Greetings Red Hat Partner,

A fix for this issue should be included in the latest packages contained in
RHEL5.2-Snapshot4--available now on partners.redhat.com.  

Please test and confirm that your issue is fixed.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to ASSIGNED.

If you are receiving this message in Issue Tracker, please reply with a message
to Issue Tracker about your results and I will update bugzilla for you.  If you
need assistance accessing ftp://partners.redhat.com, please contact your Partner
Manager.

Thank you


Comment 22 David Aquilina 2008-05-02 19:16:44 UTC
HP has indicated that they're still seeing failures with their SB600-based
systems. We're gathering more information from them. 

-David

Comment 23 Shane Huang 2008-05-04 01:35:21 UTC
We can no longer reproduce the failure on our SB600 and SB700 boards
once the previous patch has been applied.

Can you ask HP to rebuild the failing kernel with CONFIG_USB_DEBUG
enabled, and to attach the dmesg output and the /var/log/messages
log file here when the bonnie/bonnie++ test fails again?


Comment 24 YC.Tung 2008-05-07 15:06:29 UTC
HP tested RHEL 5.2 SS7 x86_64 (kernel 2.6.18-91.el5) with bonnie, and the
test passed for over 24 hours. We think this issue can be closed.

Comment 25 Shane Huang 2008-05-08 01:09:40 UTC
OK, good news. Did you use the HP xw4550 platform for the test?


Comment 26 YC.Tung 2008-05-08 01:59:54 UTC
Yes, we used an HP xw4550 with a 512 MB DIMM for testing, and it passed for
over 24 hours.

Comment 27 Shane Huang 2008-05-08 07:46:27 UTC
YC, RHEL5.2 RC has been released as well, with kernel 2.6.18-92.
Can you also test whether the RC works correctly?  Thanks


Comment 28 YC.Tung 2008-05-08 13:39:21 UTC
Sure! I am downloading the image now and will test it next week.

Comment 29 YC.Tung 2008-05-12 12:33:00 UTC
Shane,

I tested RHEL5.2 RC on the xw4550, and it passed the 24-hour bonnie test.
Please close this issue. Thanks!

Comment 32 errata-xmlrpc 2008-05-21 15:11:12 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html


