Bug 733976 - Fix be2net tx completion polling for MRG 1.3 RT kernel.
Summary: Fix be2net tx completion polling for MRG 1.3 RT kernel.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: realtime-kernel
Version: 1.3
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: high
Target Milestone: 1.3.5
Assignee: John Kacur
QA Contact: David Sommerseth
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-08-29 02:43 UTC by Wade Mealing
Modified: 2018-11-26 18:47 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-10-14 02:20:30 UTC
Target Upstream Version:
Embargoed:


Attachments
Backport of upstream patch. (5.21 KB, patch)
2011-08-29 02:43 UTC, Wade Mealing
no flags


Links
System                 | ID             | Private | Priority | Status       | Summary                                            | Last Updated
Red Hat Bugzilla       | 660389         | 1       | None     | None         | None                                               | 2021-01-20 06:05:38 UTC
Red Hat Product Errata | RHBA-2011:1370 | 0       | normal   | SHIPPED_LIVE | Red Hat Enterprise MRG 1.3 Realtime bug fix update | 2011-10-14 02:20:01 UTC

Internal Links: 660389

Description Wade Mealing 2011-08-29 02:43:29 UTC
Created attachment 520275
Backport of upstream patch.

Description of problem:

Backport upstream commit f31e50a802baae939c49819b8acd8f077019d398 to the realtime kernel to fix this previously solved bug:

be2net: fix tx completion polling

In tx/mcc polling, napi_complete() is being incorrectly called
before reaping tx completions. This can cause tx compl processing
to be scheduled on another cpu concurrently which can result in a panic.
This is fixed by calling napi_complete() after tx/mcc compl processing
but before re-enabling interrupts (via a cq notify).

Version-Release number of selected component (if applicable):

2.6.33.9-rt31.64.el5rt


How reproducible:

Easily.


Steps to Reproduce:
1. Install MRG 1.3 on HW-certified hardware (https://hardware.redhat.com/show.cgi?id=691965).
2. Run iperf to test network bandwidth to a local switch
3. Wait.
  
Actual results:

As the iperf tests ramp up, the machine panics. The panic output is garbled and appears to be interleaved output from two threads:

bad: scheduling from the idle thread!, 

PID: 34     TASK: ffff880c089c0040  CPU: 2   COMMAND: "sirq-net-rx/2"
 #0 [ffff880c089c3a00] machine_kexec at ffffffff8101dccc
 #1 [ffff880c089c3a80] crash_kexec at ffffffff8107c4df
 #2 [ffff880c089c3b50] oops_end at ffffffff81357f03
 #3 [ffff880c089c3b80] die at ffffffff810064c5
 #4 [ffff880c089c3bb0] do_trap at ffffffff81357823
 #5 [ffff880c089c3c00] do_invalid_op at ffffffff81004439
 #6 [ffff880c089c3ca0] invalid_op at ffffffff81003915
    [exception RIP: be_tx_compl_process+66]
    RIP: ffffffffa01fd494  RSP: ffff880c089c3d50  RFLAGS: 00010246
    RAX: ffff880601501038  RBX: ffff880601500908  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 00000000000000df  RDI: ffff8806015006c0
    RBP: ffff880c089c3d90   R8: ffff880c089c2000   R9: dead000000200200
    R10: dead000000100100  R11: ffff880c06adbd60  R12: ffff8806015006c0
    R13: ffff8806015006c0  R14: ffff8806015000dd  R15: ffff880601500908
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff880c089c3d98] be_process_tx at ffffffffa01fdc01 [be2net]
 #8 [ffff880c089c3dd8] be_poll_tx_mcc at ffffffffa01fdcb8 [be2net]
 #9 [ffff880c089c3df8] net_rx_action at ffffffff812beaf6
#10 [ffff880c089c3e48] run_ksoftirqd at ffffffff81048bdc
#11 [ffff880c089c3eb8] kthread at ffffffff8105db79
#12 [ffff880c089c3f48] kernel_thread_helper at ffffffff81003a94


Expected results:

No panic under load.

Additional info:

Patch to be attached.  More messages/questions in private post.

Comment 9 David Sommerseth 2011-10-12 15:39:18 UTC
Verified by code review. Found the following commits applied to the kernel-rt-2.6.33.9-rt31.67 source RPM:

$ ~/MRG-RT-tools/check_commit_presence ~/rpmbuild/BUILD/kernel-rt-2.6.33.9-rt31.67/linux-2.6.33.9.x86_64 34cfbb350a6b74ba500c2cce090202a9d2f22e9e c671541682a5c33e6f1bd8c1d9b53041c2bb90f7
Reverting 34cfbb350a6b74ba500c2cce090202a9d2f22e9e (v2.6.33.9-rt31-mrg66_hotfix^2) ... Applied
Reverting c671541682a5c33e6f1bd8c1d9b53041c2bb90f7 (v2.6.33.9-rt31-mrg66_hotfix^2~1) ... Applied
Restoring .. Done
2 patch(es) was found applied.


34cfbb350a6b74ba500c2cce090202a9d2f22e9e is backported from upstream f31e50a802baae939c49819b8acd8f077019d398

c671541682a5c33e6f1bd8c1d9b53041c2bb90f7 is backported from upstream 7a1e9b2059d147461cff3dfbabbfb43f296a1eef


-> VERIFIED

Comment 10 errata-xmlrpc 2011-10-14 02:20:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1370.html

