Bug 435670

Summary: RHEL5.2: USB stress test failure on AMD SBX00
Product: Red Hat Enterprise Linux 5
Reporter: Shane Huang <shane.huang>
Component: kernel
Assignee: Bhavna Sarathy <bnagendr>
Status: CLOSED ERRATA
QA Contact: Martin Jenner <mjenner>
Severity: high
Priority: low
Version: 5.2
CC: bnagendr, dzickus, poelstra, rdoty, tom.gao, ying-chang.tung
Target Milestone: rc
Keywords: OtherQA
Hardware: All
OS: Linux
Fixed In Version: RHBA-2008-0314
Doc Type: Bug Fix
Last Closed: 2008-05-21 15:11:12 UTC
Bug Blocks: 253746
Attachments:
  backported patch to fix USB stress test failure (flags: none)
  backported patch to fix USB stress test failure(updated) (flags: none)
  Another USB patch to make things better (flags: none)

Description Shane Huang 2008-03-03 09:01:49 UTC
Description of problem:

There is an SB600/SB700 USB bug that causes the USB stress test to fail:
http://bugzilla.kernel.org/show_bug.cgi?id=8692

The workaround consists of three Linux patches:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=07d29b63ef6b39963ab37818653284d861cf55af

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=f8fa7571a928d6d0e1b7444b0ea69ec7dc7db3b6

http://lkml.org/lkml/2008/2/19/546

I backported these three patches to RHEL5.2; please add them to the kernel.

Thanks

Comment 1 Shane Huang 2008-03-03 09:01:49 UTC
Created attachment 296565 [details]
backported patch to fix USB stress test failure

Comment 2 Shane Huang 2008-03-03 09:04:41 UTC
If you add this patch and build a kernel RPM package for us,
I can ask our QA team to test it. Thanks

Comment 3 Bhavna Sarathy 2008-03-03 14:31:29 UTC
Russ, please add to the 5.2 master bug list.  Peter has indicated that this
patch will go into a snapshot build.
Bhavana

Comment 4 Russell Doty 2008-03-03 15:19:43 UTC
Requesting inclusion in a Beta snapshot.

Comment 5 Bhavna Sarathy 2008-03-03 20:16:59 UTC
Shane, thanks for attaching the backport.  Brew build uploaded for AMD chipset
QA team to test.

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=1193626

Comment 6 Shane Huang 2008-03-04 06:15:08 UTC
Created attachment 296703 [details]
backported patch to fix USB stress test failure(updated)

Comment 7 Shane Huang 2008-03-04 06:18:17 UTC
Sorry, there was a mistake in the patch in comment 1.
Could you build the test kernel again with the updated patch?

This updated patch has also been sent to Bhavana, with the suffix "_5.patch".

Thanks


Comment 8 Shane Huang 2008-03-06 08:42:52 UTC
Created attachment 297010 [details]
Another USB patch to make things better

Comment 9 Shane Huang 2008-03-06 08:45:33 UTC
The Linux USB developers have submitted another patch that improves things:
http://marc.info/?l=linux-usb&m=120469059715031&w=2

That new patch can also replace this one:
http://lkml.org/lkml/2008/2/19/546

So I backported the new patch; it is the attachment in comment #8.
Can you apply it together with the patch in comment #6?
The patch in comment #6 must be applied first, then the one in comment #8.

Thanks


Comment 10 Bhavna Sarathy 2008-03-10 14:37:47 UTC
Patch set posted to RHML on 3/10.

Comment 11 Bhavna Sarathy 2008-03-10 14:40:08 UTC
This patchset will be included in a snapshot build.

Comment 12 Shane Huang 2008-03-11 03:03:33 UTC
The third patch, http://marc.info/?l=linux-usb&m=120469059715031&w=2,
has also been merged into Linus's source tree as:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e82cc1288fa57857c6af8c57f3d07096d4bcd9d9

However, there is a small difference between the two:
the git commit also modifies one more line, as shown below:

@@ -781,7 +811,7 @@ static int ehci_urb_enqueue (
 static void unlink_async (struct ehci_hcd *ehci, struct ehci_qh *qh)
 {
        /* failfast */
-       if (!HC_IS_RUNNING(ehci_to_hcd(ehci)->state))
+       if (!HC_IS_RUNNING(ehci_to_hcd(ehci)->state) && ehci->reclaim)
                end_unlink_async(ehci);

        /* if it's not linked then there's nothing to do */


Can you also make this change? I will NOT post a separate backport for this one line.

Thanks
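
The effect of the one-line change quoted above can be modeled outside the kernel. The following is only a minimal sketch, not the actual driver code: `model_ehci`, `model_qh`, and the `failfast_*` functions are hypothetical stand-ins, and the sketch assumes that `ehci->reclaim` is non-NULL only while an asynchronous unlink is pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct model_qh { int id; };

struct model_ehci {
    bool running;              /* stands in for HC_IS_RUNNING(...->state) */
    struct model_qh *reclaim;  /* assumed: QH being unlinked, or NULL */
    int end_unlink_calls;      /* call counter, for illustration only */
};

/* Stand-in for end_unlink_async(): completes the pending unlink. */
void model_end_unlink_async(struct model_ehci *ehci)
{
    assert(ehci != NULL);
    ehci->end_unlink_calls++;
    ehci->reclaim = NULL;
}

/* Fail-fast check before the change: fires whenever the controller is halted. */
void failfast_old(struct model_ehci *ehci)
{
    assert(ehci != NULL);
    if (!ehci->running)
        model_end_unlink_async(ehci);
}

/* Fail-fast check after the change: fires only if an unlink is pending. */
void failfast_new(struct model_ehci *ehci)
{
    assert(ehci != NULL);
    if (!ehci->running && ehci->reclaim)
        model_end_unlink_async(ehci);
}
```

With the controller halted and nothing pending (`running = false`, `reclaim = NULL`), `failfast_old` still invokes the completion routine, while `failfast_new` does nothing. That is the extra guard the upstream commit adds.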


Comment 14 Bhavna Sarathy 2008-03-11 18:34:47 UTC
Don, comment 12 lists a small change in the USB patch that went upstream.  How
should we deal with it?    Can you incorporate the change into the patch I
already posted? 

Comment 15 Don Zickus 2008-03-11 18:45:29 UTC
I haven't included the patch yet, please repost the whole thing.  Thanks.

Comment 16 Shane Huang 2008-03-17 01:51:11 UTC
Regarding the one-line change in comment #12, here is the explanation from
David Brownell, the USB EHCI driver maintainer and patch submitter:

Quoting David:
> It would be useful paranoia, yes.  I don't have any reason to think 
> that the relevant bug ever triggered, but I suspect you've been 
> surprised on occasion too.


So my suggestion for this one-line change is:
if we have already missed the RHEL5.2 boat, we can ignore it;
otherwise we had better add it too. The git commit also contains it:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e82cc1288fa57857c6af8c57f3d07096d4bcd9d9


Comment 18 Don Zickus 2008-03-26 20:31:50 UTC
Included in kernel-2.6.18-87.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5
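
For anyone checking whether a given RHEL 5 kernel is at least the build named above, the comparison can be sketched as below. This is only an illustration: `kernel_has_usb_fix` and `FIXED_RELEASE` are hypothetical names, and the sketch assumes that every 2.6.18-based build with a release number of 87 or higher carries the patch set, which this report suggests but does not guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Release number of the first build reported to contain the patch set. */
#define FIXED_RELEASE 87

/*
 * Takes a kernel release string such as "2.6.18-92.el5" and reports
 * whether its release number is at least FIXED_RELEASE.
 * Strings that are not based on 2.6.18 are reported as false.
 */
bool kernel_has_usb_fix(const char *uname_release)
{
    assert(uname_release != NULL);

    int release = 0;
    /* The literal prefix must match before %d is converted. */
    if (sscanf(uname_release, "2.6.18-%d", &release) != 1)
        return false;

    return release >= FIXED_RELEASE;
}
```

The input is meant to be the kind of string printed by `uname -r`, for example `kernel_has_usb_fix("2.6.18-87.el5")`.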

Comment 20 John Poelstra 2008-04-02 21:34:01 UTC
Greetings Red Hat Partner,

A fix for this issue should be included in the latest packages contained in
RHEL5.2-Snapshot3, available now on partners.redhat.com.

Please test and confirm that your issue is fixed.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to ASSIGNED.

If you are receiving this message in Issue Tracker, please reply with a message
to Issue Tracker about your results and I will update bugzilla for you.  If you
need assistance accessing ftp://partners.redhat.com, please contact your Partner
Manager.

Thank you


Comment 21 John Poelstra 2008-04-09 22:41:26 UTC
Greetings Red Hat Partner,

A fix for this issue should be included in the latest packages contained in
RHEL5.2-Snapshot4, available now on partners.redhat.com.



Comment 22 David Aquilina 2008-05-02 19:16:44 UTC
HP has indicated that they're still seeing failures with their SB600-based
systems. We're gathering more information from them. 

-David

Comment 23 Shane Huang 2008-05-04 01:35:21 UTC
We can no longer reproduce the failure on our SB600 and SB700 boards
once the previous patches have been applied.

Could you ask HP to rebuild the failing kernel with CONFIG_USB_DEBUG
enabled and, when the bonnie/bonnie++ test fails again, attach the dmesg
output and the /var/log/messages log file here?
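
Before rerunning the stress test, it may be worth confirming that the rebuilt kernel's configuration really has the option turned on. The sketch below assumes the configuration text has already been read into memory (for example from the `.config` used for the rebuild); `config_option_enabled` is a hypothetical helper, and it only recognizes built-in options of the form `NAME=y`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns true if config_text contains a line that is exactly
 * "<option>=y". Lines such as "# <option> is not set" or options
 * that merely share the prefix do not match.
 */
bool config_option_enabled(const char *config_text, const char *option)
{
    assert(config_text != NULL && option != NULL);

    size_t opt_len = strlen(option);
    const char *line = config_text;

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        /* The whole line must be the option name followed by "=y". */
        if (len == opt_len + 2 &&
            strncmp(line, option, opt_len) == 0 &&
            line[opt_len] == '=' && line[opt_len + 1] == 'y')
            return true;

        if (eol == NULL)
            break;
        line = eol + 1;
    }
    return false;
}
```

A call such as `config_option_enabled(text, "CONFIG_USB_DEBUG")` would then tell whether the debug output requested above can be expected in dmesg.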


Comment 24 YC.Tung 2008-05-07 15:06:29 UTC
HP tested RHEL 5.2 SS7 x86_64 (kernel 2.6.18-91.el5) with bonnie, and the
test ran for over 24 hours without failing. We think this issue can be closed.

Comment 25 Shane Huang 2008-05-08 01:09:40 UTC
OK, good news. Did you use the HP xw4550 platform for the test?


Comment 26 YC.Tung 2008-05-08 01:59:54 UTC
Yes, we used an HP xw4550 with a 512 MB DIMM for testing, and it ran for
over 24 hours without failing.

Comment 27 Shane Huang 2008-05-08 07:46:27 UTC
YC, RHEL5.2 RC has been released as well, with kernel 2.6.18-92.
Can you also test whether the RC works correctly? Thanks


Comment 28 YC.Tung 2008-05-08 13:39:21 UTC
Sure! I am downloading the image now and will test it next week.

Comment 29 YC.Tung 2008-05-12 12:33:00 UTC
Shane,

I tested RHEL5.2 RC on the xw4550, and it passed the 24-hour bonnie test.
Please close this issue. Thanks!

Comment 32 errata-xmlrpc 2008-05-21 15:11:12 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html