Description of problem:
There is an SB600/SB700 USB bug that leads to USB stress test failures:
http://bugzilla.kernel.org/show_bug.cgi?id=8692
The workaround consists of three Linux patches:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=07d29b63ef6b39963ab37818653284d861cf55af
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=f8fa7571a928d6d0e1b7444b0ea69ec7dc7db3b6
http://lkml.org/lkml/2008/2/19/546
I backported these three patches for RHEL5.2. Please add them to the kernel. Thanks
Created attachment 296565: backported patch to fix USB stress test failure
You may add this patch and build a kernel RPM package for us; I can ask our QA to test it. Thanks
Russ, please add to the 5.2 master bug list. Peter has indicated that this patch will go into a snapshot build. Bhavana
Requesting inclusion in a Beta snapshot.
Shane, thanks for attaching the backport. Brew build uploaded for AMD chipset QA team to test. http://brewweb.devel.redhat.com/brew/taskinfo?taskID=1193626
Created attachment 296703: backported patch to fix USB stress test failure (updated)
I'm sorry, there is a mistake in the patch in Comment 1. Can you help build the test kernel again with the updated patch? The updated patch has also been sent to Bhavana with the suffix "_5.patch". Thanks
Created attachment 297010: Another USB patch to make things better
The Linux USB developers submitted another patch to make things better:
http://marc.info/?l=linux-usb&m=120469059715031&w=2
That new patch can also replace this one:
http://lkml.org/lkml/2008/2/19/546
So I backported this new patch into the patch in comment #8 above. Can you apply it as well as the patch in comment #6? You need to apply the patch in comment #6 first, then the one in comment #8. Thanks
Patch set posted to RHML on 3/10.
This patchset will be included in a snapshot build.
The third patch
http://marc.info/?l=linux-usb&m=120469059715031&w=2
has been added to Linus's source tree too, as:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e82cc1288fa57857c6af8c57f3d07096d4bcd9d9
But there is a small difference between them: the latter (the git commit) also modifies another line, as below:
@@ -781,7 +811,7 @@ static int ehci_urb_enqueue (
 static void unlink_async (struct ehci_hcd *ehci, struct ehci_qh *qh)
 {
 	/* failfast */
-	if (!HC_IS_RUNNING(ehci_to_hcd(ehci)->state))
+	if (!HC_IS_RUNNING(ehci_to_hcd(ehci)->state) && ehci->reclaim)
 		end_unlink_async(ehci);
 
 	/* if it's not linked then there's nothing to do */
Can you also make this change? I will NOT do any special backport for this line. Thanks
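To make the effect of that one-line change easier to see, here is a minimal user-space sketch of the two guard conditions. It is only a model: `struct model_ehci`, `struct model_qh`, `failfast_old()` and `failfast_new()` are hypothetical stand-ins invented for illustration, not the kernel's real types or functions, and the `running` flag merely stands in for `HC_IS_RUNNING(ehci_to_hcd(ehci)->state)`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ehci_qh; not the kernel definition. */
struct model_qh {
	int id;
};

/* Hypothetical stand-in for the parts of struct ehci_hcd the guard reads. */
struct model_ehci {
	bool running;             /* models HC_IS_RUNNING(hcd->state) */
	struct model_qh *reclaim; /* QH currently being unlinked, or NULL */
};

/* Guard before the change: fail fast whenever the controller is stopped. */
static bool failfast_old(const struct model_ehci *ehci)
{
	assert(ehci != NULL);
	return !ehci->running;
}

/*
 * Guard after the change: fail fast only when the controller is stopped
 * AND an unlink is actually pending (reclaim is non-NULL).
 */
static bool failfast_new(const struct model_ehci *ehci)
{
	assert(ehci != NULL);
	return !ehci->running && ehci->reclaim != NULL;
}
```

The two guards differ in exactly one case: controller stopped with no reclaim pending. There the old condition would still call end_unlink_async(), while the new one skips it.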
Don, comment 12 lists a small change in the USB patch that went upstream. How should we deal with it? Can you incorporate the change into the patch I already posted?
I haven't included the patch yet, please repost the whole thing. Thanks.
As to the one-line change in comment #12, here is the explanation from David Brownell, the USB EHCI driver maintainer and patch submitter:
Quoting David:
> It would be useful paranoia, yes. I don't have any reason to think
> that the relevant bug ever triggered, but I suspect you've been
> surprised on occasion too.
So my suggestion for this line change is: if we have missed the RHEL5.2 boat, we can ignore it; otherwise we'd better add it too. The git commit also contains it:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e82cc1288fa57857c6af8c57f3d07096d4bcd9d9
in kernel-2.6.18-87.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
Greetings Red Hat Partner,
A fix for this issue should be included in the latest packages contained in RHEL5.2-Snapshot3--available now on partners.redhat.com.
Please test and confirm that your issue is fixed.
After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)
If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to ASSIGNED.
If you are receiving this message in Issue Tracker, please reply with a message to Issue Tracker about your results and I will update bugzilla for you.
If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager.
Thank you
Greetings Red Hat Partner,
A fix for this issue should be included in the latest packages contained in RHEL5.2-Snapshot4--available now on partners.redhat.com.
Please test and confirm that your issue is fixed.
After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)
If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to ASSIGNED.
If you are receiving this message in Issue Tracker, please reply with a message to Issue Tracker about your results and I will update bugzilla for you.
If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager.
Thank you
HP has indicated that they're still seeing failures with their SB600-based systems. We're gathering more information from them. -David
We have not been able to reproduce the failure on our SB600 and SB700 boards since the previous patch was applied. Can you ask the HP team to rebuild the failing kernel with CONFIG_USB_DEBUG enabled and attach the dmesg output and the /var/log/messages log file here when the bonnie/bonnie++ test fails again?
After HP tested RHEL 5.2 SS7 x86_64 (kernel 2.6.18-91.el5) with bonnie, the test passed for over 24 hours. We think this issue can be closed.
OK, good news. Did you use the HP xw4550 platform to test it?
Yes, we used an HP xw4550 with a 512 MB DIMM for testing, and it passed for over 24 hours.
YC, the RHEL5.2 RC has been released too; its kernel is 2.6.18-92. Can you also test whether the RC works well? Thanks
Sure! I am downloading the image now and will test it next week.
Shane, I tested RHEL5.2 RC on the xw4550, and it passed the 24-hour bonnie test. Please close this issue. Thanks!
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0314.html