Summary: | bnx2: panic in bnx2_poll_work() | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | John Feeney <jfeeney> | |
Component: | kernel | Assignee: | John Feeney <jfeeney> | |
Status: | CLOSED ERRATA | QA Contact: | Red Hat Kernel QE team <kernel-qe> | |
Severity: | medium | Docs Contact: | ||
Priority: | urgent | |||
Version: | 5.4 | CC: | agospoda, anton, bzeranski, caiqian, davidkwood, dhoward, emcnabb, hjia, jane.lv, jburke, jpirko, jvillalo, lcm, luyu, mchan, nobody+PNT0273897, prarit, tis | |
Target Milestone: | rc | Keywords: | ZStream | |
Target Release: | 5.5 | |||
Hardware: | All | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 623265 | Environment: | |
Last Closed: | 2010-03-30 06:54:04 UTC | Type: | --- | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Bug Depends On: | ||||
Bug Blocks: | 532386, 515318, 533941, 539686, 623265 |
Description
John Feeney
2009-09-30 14:56:40 UTC
The RHEL5 patch being posted is:

--- linux-2.6.18.noarch/drivers/net/bnx2.c.orig
+++ linux-2.6.18.noarch/drivers/net/bnx2.c
@@ -2750,6 +2750,7 @@ bnx2_get_hw_tx_cons(struct bnx2_napi *bn
 	/* Tell compiler that status block fields can change. */
 	barrier();
 	cons = *bnapi->hw_tx_cons_ptr;
+	barrier();
 	if (unlikely((cons & MAX_TX_DESC_CNT) == MAX_TX_DESC_CNT))
 		cons++;
 	return cons;
@@ -3031,6 +3032,7 @@ bnx2_get_hw_rx_cons(struct bnx2_napi *bn
 	/* Tell compiler that status block fields can change. */
 	barrier();
 	cons = *bnapi->hw_rx_cons_ptr;
+	barrier();
 	if (unlikely((cons & MAX_RX_DESC_CNT) == MAX_RX_DESC_CNT))
 		cons++;
 	return cons;

Unfortunately, the patch provided in comment #3 does not fix the problem. The system under test still panicked.

This is most likely caused by a NULL skb when we are handling the tx interrupt in bnx2_tx_int(). A similar issue was reported upstream a while ago, as shown by the thread below.

http://marc.info/?t=121362387400001&r=1&w=2

This issue was ultimately fixed by the patch below. Does RHEL5.4 have this patch?

69747650c814a8a79fef412c7416adf823293a3e
pkt_sched: Fix return value corruption in HTB and TBF.

This problem was only seen when using HTB or TBF qdisc though.

(In reply to comment #6)
> This is most likely caused by NULL skb when we are handling tx interrupt in
> bnx2_tx_int(). A similar issue was reported upstream a while ago shown by the
> thread below.
>
> http://marc.info/?t=121362387400001&r=1&w=2
>
> This issue was ultimately fixed by the patch below. Does RHEL5.4 have this
> patch?

The first two chunks look like they're in RHEL5, but the latter one is not.

P.

>
> 69747650c814a8a79fef412c7416adf823293a3e
> pkt_sched: Fix return value corruption in HTB and TBF.
>
> This problem was only seen when using HTB or TBF qdisc though.

An x86_64 kernel rpm that has the patch provided by comment #6 can be found on my people page. An i686 version will be there as soon as it finishes building.
See http://people.redhat.com/jfeeney/.bz526481

------- Comment From kumarr@linux.ibm.com 2009-10-29 17:09 EDT-------
Mirroring over to IBM

I am available to test the kernel listed in comment #8 but it's not clear to me from the bug which Broadcom adapter the problem was discovered on. Does it only appear on the IBM Ghidorah?

Peter

Any Broadcom bnx2 NIC can potentially encounter this problem because the driver relies on the nr_frags in the SKB to not change when the SKB is queued for transmission. This problem is only known to exist when using HTB or TBF qdisc.

patch posted on 11/20/09: 2:40 PM.EDT

in kernel-2.6.18-175.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However feel free to provide a comment indicating that this fix has been verified.

*** Bug 523873 has been marked as a duplicate of this bug. ***

------- Comment From kumarr@linux.ibm.com 2010-03-02 15:38 EDT-------
(In reply to comment #4)
> I am available to test the kernel listed in comment #8 but it's not clear to me
> from the bug which Broadcom adapter the problem was discovered on. Does it only
> appear on the IBM Ghidorah?
>
> Peter

Peter,
Can you please verify this fix?
Thanks

------- Comment From coschult@us.ibm.com 2010-03-03 17:07 EDT-------
What kind of test is the nfs test in Tier1? Would fstress be a sufficient test for verifying this fix?

With regard to comment #21, running fstress would provide a level of sanity checking for the fix. I would appreciate knowing the results from its test run.

------- Comment From coschult@us.ibm.com 2010-03-05 18:52 EDT-------
I ran bonnie++ (instead of fstress) for several hours with no problems. It looks like the fix is good.
This is the command I used:

bonnie++ -d /test_dir/bonnie -s 10000 -n 5 -x 10 -u corinna -b -r 5000

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html
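
Note for readers following the nr_frags explanation earlier in this thread: the sketch below is a toy user-space model of why a TX completion path can hit a NULL skb when nr_frags changes after the packet was queued. It is not bnx2 code and it is not the patch shipped in the erratum. Every name in it (toy_skb, toy_tx_buf, toy_ring, toy_xmit, toy_reclaim, RING_SIZE) is hypothetical, and the assumption that each packet occupies nr_frags + 1 ring slots is a simplification made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 8 /* toy ring size, chosen arbitrarily */

/* Hypothetical stand-in for struct sk_buff; only the fragment count matters. */
struct toy_skb {
    unsigned int nr_frags;
};

/* One software ring entry per descriptor slot. */
struct toy_tx_buf {
    struct toy_skb *skb;   /* set only on the first slot of a packet */
    unsigned int nr_frags; /* snapshot of skb->nr_frags at queue time */
};

struct toy_ring {
    struct toy_tx_buf buf[RING_SIZE];
    unsigned int prod; /* next free slot */
    unsigned int cons; /* next slot to reclaim */
};

/* Queue a packet: one slot for the head plus one per fragment. */
static void toy_xmit(struct toy_ring *r, struct toy_skb *skb)
{
    unsigned int n = skb->nr_frags + 1;
    struct toy_tx_buf *first = &r->buf[r->prod % RING_SIZE];

    assert(r->prod - r->cons + n <= RING_SIZE); /* ring must have room */

    first->skb = skb;
    first->nr_frags = skb->nr_frags;
    /* Fragment slots carry no skb pointer. */
    for (unsigned int i = 1; i < n; i++)
        r->buf[(r->prod + i) % RING_SIZE].skb = NULL;
    r->prod += n;
}

/*
 * Reclaim one completed packet.
 * use_snapshot == 0: trust skb->nr_frags as it is now (fragile).
 * use_snapshot != 0: use the count recorded at queue time.
 * Returns 0 on success, -1 if the slot at cons holds no skb, which is
 * where a real driver would dereference a NULL pointer.
 */
static int toy_reclaim(struct toy_ring *r, int use_snapshot)
{
    struct toy_tx_buf *first = &r->buf[r->cons % RING_SIZE];
    unsigned int frags;

    if (first->skb == NULL)
        return -1;

    frags = use_snapshot ? first->nr_frags : first->skb->nr_frags;
    first->skb = NULL;
    r->cons += frags + 1;
    return 0;
}
```

In this model, if a packet is queued with two fragments and its nr_frags is then changed to one, reclaiming with use_snapshot == 0 advances cons by too few slots, so the next reclaim lands on a fragment slot and returns -1. Reclaiming with the queue-time snapshot keeps cons aligned with packet boundaries. Whether the real driver should snapshot the count or the stack should guarantee it never changes is a design question this sketch does not settle.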