Bug 435670 - RHEL5.2: USB stress test failure on AMD SBX00
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Bhavna Sarathy
QA Contact: Martin Jenner
Keywords: OtherQA
Depends On:
Blocks: 253746
Reported: 2008-03-03 04:01 EST by Shane Huang
Modified: 2010-10-22 18:57 EDT (History)

Fixed In Version: RHBA-2008-0314
Doc Type: Bug Fix
Last Closed: 2008-05-21 11:11:12 EDT


Attachments
backported patch to fix USB stress test failure (9.11 KB, patch)
2008-03-03 04:01 EST, Shane Huang
backported patch to fix USB stress test failure (updated) (9.58 KB, patch)
2008-03-04 01:15 EST, Shane Huang
Another USB patch to make things better (3.43 KB, patch)
2008-03-06 03:42 EST, Shane Huang

Description Shane Huang 2008-03-03 04:01:49 EST
Description of problem:

There is an SB600/SB700 USB bug that causes the USB stress test
to fail:
http://bugzilla.kernel.org/show_bug.cgi?id=8692

The workaround consists of three Linux patches:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=07d29b63ef6b39963ab37818653284d861cf55af

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=f8fa7571a928d6d0e1b7444b0ea69ec7dc7db3b6

http://lkml.org/lkml/2008/2/19/546

I backported these three patches for RHEL5.2; please add them to the kernel.

Thanks
Comment 1 Shane Huang 2008-03-03 04:01:49 EST
Created attachment 296565
backported patch to fix USB stress test failure
Comment 2 Shane Huang 2008-03-03 04:04:41 EST
If you apply this patch and build a kernel RPM package for us,
I can ask our QA team to test it. Thanks
Comment 3 Bhavna Sarathy 2008-03-03 09:31:29 EST
Russ, please add to the 5.2 master bug list.  Peter has indicated that this
patch will go into a snapshot build.
Bhavana
Comment 4 Russell Doty 2008-03-03 10:19:43 EST
Requesting inclusion in a Beta snapshot.
Comment 5 Bhavna Sarathy 2008-03-03 15:16:59 EST
Shane, thanks for attaching the backport.  Brew build uploaded for AMD chipset
QA team to test.

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=1193626
Comment 6 Shane Huang 2008-03-04 01:15:08 EST
Created attachment 296703
backported patch to fix USB stress test failure (updated)
Comment 7 Shane Huang 2008-03-04 01:18:17 EST
Sorry, there was a mistake in the patch in comment 1.
Could you rebuild the test kernel with the updated patch?

The updated patch has also been sent to Bhavana, with the suffix "_5.patch".

Thanks
Comment 8 Shane Huang 2008-03-06 03:42:52 EST
Created attachment 297010
Another USB patch to make things better
Comment 9 Shane Huang 2008-03-06 03:45:33 EST
The Linux USB developers have submitted another patch that improves things:
http://marc.info/?l=linux-usb&m=120469059715031&w=2

That new patch can also replace this one:
http://lkml.org/lkml/2008/2/19/546

So I backported the new patch as the attachment in comment #8.
Can you apply it along with the patch in comment #6?
Apply the patch in comment #6 first, then the one in comment #8.

Thanks
Comment 10 Bhavna Sarathy 2008-03-10 10:37:47 EDT
Patch set posted to RHML on 3/10.
Comment 11 Bhavna Sarathy 2008-03-10 10:40:08 EDT
This patchset will be included in a snapshot build.
Comment 12 Shane Huang 2008-03-10 23:03:33 EDT
The third patch, http://marc.info/?l=linux-usb&m=120469059715031&w=2,
has also been merged into Linus's source tree as:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e82cc1288fa57857c6af8c57f3d07096d4bcd9d9

There is one small difference between them:
the latter (the git commit) also modifies another line, as shown below:

@@ -781,7 +811,7 @@ static int ehci_urb_enqueue (
 static void unlink_async (struct ehci_hcd *ehci, struct ehci_qh *qh)
 {
        /* failfast */
-       if (!HC_IS_RUNNING(ehci_to_hcd(ehci)->state))
+       if (!HC_IS_RUNNING(ehci_to_hcd(ehci)->state) && ehci->reclaim)
                end_unlink_async(ehci);

        /* if it's not linked then there's nothing to do */


Can you make this change as well? I will NOT do a separate backport for this line.

Thanks
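
For readers following the one-line change above, here is a minimal stand-alone sketch of the two conditions. It is only a model, not the kernel code: `struct mock_ehci`, `struct mock_qh`, `failfast_old()`, `failfast_new()` and `check_guard_examples()` are hypothetical names introduced for illustration, and the `running` flag merely stands in for `HC_IS_RUNNING(ehci_to_hcd(ehci)->state)`. The apparent intent, as far as the diff shows, is that `end_unlink_async()` is only reached when the controller is not running and a QH is actually pending reclaim.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; NOT the real definitions. */
struct mock_qh {
    int id;
};

struct mock_ehci {
    bool running;            /* models HC_IS_RUNNING(ehci_to_hcd(ehci)->state) */
    struct mock_qh *reclaim; /* QH pending reclaim, or NULL if there is none */
};

/* Condition before the change: take the failfast path whenever the
 * controller is not running. */
bool failfast_old(const struct mock_ehci *ehci)
{
    return !ehci->running;
}

/* Condition after the change: additionally require a pending reclaim. */
bool failfast_new(const struct mock_ehci *ehci)
{
    return !ehci->running && ehci->reclaim != NULL;
}

/* Walks the three interesting states and checks both conditions. */
void check_guard_examples(void)
{
    struct mock_qh qh = { 1 };
    struct mock_ehci running = { true, NULL };
    struct mock_ehci halted_pending = { false, &qh };
    struct mock_ehci halted_idle = { false, NULL };

    /* Controller running: neither version takes the failfast path. */
    assert(!failfast_old(&running));
    assert(!failfast_new(&running));

    /* Controller halted with a pending reclaim: both versions agree. */
    assert(failfast_old(&halted_pending));
    assert(failfast_new(&halted_pending));

    /* Controller halted with nothing to reclaim: only the old condition
     * fires; this is the case the extra check filters out. */
    assert(failfast_old(&halted_idle));
    assert(!failfast_new(&halted_idle));
}
```

The two conditions differ only in the last case (controller halted, no reclaim pending), which is the situation the added `&& ehci->reclaim` check excludes.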
Comment 14 Bhavna Sarathy 2008-03-11 14:34:47 EDT
Don, comment 12 lists a small change in the USB patch that went upstream. How
should we deal with it? Can you incorporate the change into the patch I
already posted?
Comment 15 Don Zickus 2008-03-11 14:45:29 EDT
I haven't included the patch yet, please repost the whole thing.  Thanks.
Comment 16 Shane Huang 2008-03-16 21:51:11 EDT
Regarding the one-line change in comment #12, here is the explanation from
David Brownell, the USB EHCI driver maintainer and patch submitter:

Quoting David:
> It would be useful paranoia, yes.  I don't have any reason to think 
> that the relevant bug ever triggered, but I suspect you've been 
> surprised on occasion too.


So my suggestion for this line change is:
if we have missed the RHEL5.2 boat, we can ignore it;
otherwise we had better add it too. The git commit also contains it:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e82cc1288fa57857c6af8c57f3d07096d4bcd9d9
Comment 18 Don Zickus 2008-03-26 16:31:50 EDT
in kernel-2.6.18-87.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
Comment 20 John Poelstra 2008-04-02 17:34:01 EDT
Greetings Red Hat Partner,

A fix for this issue should be included in the latest packages contained in
RHEL5.2-Snapshot3--available now on partners.redhat.com.  

Please test and confirm that your issue is fixed.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to ASSIGNED.

If you are receiving this message in Issue Tracker, please reply with a message
to Issue Tracker about your results and I will update bugzilla for you.  If you
need assistance accessing ftp://partners.redhat.com, please contact your Partner
Manager.

Thank you
Comment 21 John Poelstra 2008-04-09 18:41:26 EDT
Greetings Red Hat Partner,

A fix for this issue should be included in the latest packages contained in
RHEL5.2-Snapshot4--available now on partners.redhat.com.  

Please test and confirm that your issue is fixed.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to ASSIGNED.

If you are receiving this message in Issue Tracker, please reply with a message
to Issue Tracker about your results and I will update bugzilla for you.  If you
need assistance accessing ftp://partners.redhat.com, please contact your Partner
Manager.

Thank you
Comment 22 David Aquilina 2008-05-02 15:16:44 EDT
HP has indicated that they're still seeing failures with their SB600-based
systems. We're gathering more information from them. 

-David
Comment 23 Shane Huang 2008-05-03 21:35:21 EDT
We have not been able to reproduce the failure on our SB600 and SB700 boards
since the previous patch was applied.

Can you ask HP to rebuild the failing kernel with CONFIG_USB_DEBUG
enabled, and attach the dmesg output and the /var/log/messages
log file here when the bonnie/bonnie++ test fails again?
Comment 24 YC.Tung 2008-05-07 11:06:29 EDT
HP tested RHEL 5.2 SS7 x86_64 (kernel 2.6.18-91.el5) with bonnie, and the
test passed for over 24 hours. We think this issue can be closed.
Comment 25 Shane Huang 2008-05-07 21:09:40 EDT
OK, good news. Did you use the HP xw4550 platform for the test?
Comment 26 YC.Tung 2008-05-07 21:59:54 EDT
Yes, we used an HP xw4550 with a 512 MB DIMM for testing, and it passed for
over 24 hours.
Comment 27 Shane Huang 2008-05-08 03:46:27 EDT
YC, RHEL5.2 RC has been released too; its kernel is 2.6.18-92.
Can you also test whether the RC works well? Thanks
Comment 28 YC.Tung 2008-05-08 09:39:21 EDT
Sure! I am downloading the image now and will test it next week.
Comment 29 YC.Tung 2008-05-12 08:33:00 EDT
Shane,

I tested RHEL5.2 RC on the xw4550, and it passed the 24-hour bonnie test.
Please close this issue. Thanks!
Comment 32 errata-xmlrpc 2008-05-21 11:11:12 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html
