Bug 733976

Summary: Fix be2net tx completion polling for MRG 1.3 RT kernel.
Product: Red Hat Enterprise MRG
Reporter: Wade Mealing <wmealing>
Component: realtime-kernel
Assignee: John Kacur <jkacur>
Status: CLOSED ERRATA
QA Contact: David Sommerseth <davids>
Severity: high
Priority: urgent
Version: 1.3
CC: bhu, bugzilla-redhat, cww, jkacur, jkastner, jwest, lgoncalv, ovasik, rdassen
Target Milestone: 1.3.5
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-10-14 02:20:30 UTC
Attachments: Backport of upstream patch. (flags: none)

Description Wade Mealing 2011-08-29 02:43:29 UTC
Created attachment 520275
Backport of upstream patch.

Description of problem:

Backport upstream commit f31e50a802baae939c49819b8acd8f077019d398 to the realtime kernel to fix a bug that is already solved upstream:

be2net: fix tx completion polling

In tx/mcc polling, napi_complete() is being incorrectly called
before reaping tx completions. This can cause tx compl processing
to be scheduled on another cpu concurrently which can result in a panic.
This is fixed by calling napi_complete() after tx/mcc compl processing
but before re-enabling interrupts (via a cq notify).
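
For readers unfamiliar with the ordering issue, here is a rough user-space model of the two call orders described in the commit message. It is only a sketch: struct poll_model and the model_*/poll_* functions are hypothetical names made up for illustration, not the real NAPI or be2net symbols, and the "scheduled" flag is a simplified stand-in for poll ownership.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; these are NOT the real NAPI or be2net symbols. */
struct poll_model {
    bool scheduled; /* true while this CPU owns the poll instance */
    bool irq_armed; /* true once completion interrupts are re-enabled */
    int  pending;   /* tx completions waiting to be reaped */
    bool raced;     /* set if completions were reaped without ownership */
};

/* Stand-in for napi_complete(): gives up ownership of the poll instance. */
static void model_napi_complete(struct poll_model *m)
{
    m->scheduled = false;
}

/* Stand-in for tx completion processing. */
static void model_reap_tx(struct poll_model *m)
{
    if (!m->scheduled)
        m->raced = true; /* another CPU could be polling the same queue now */
    m->pending = 0;
}

/* Stand-in for the cq notify that re-enables interrupts. */
static void model_cq_notify(struct poll_model *m)
{
    assert(!m->scheduled); /* interrupts are re-armed only after completing the poll */
    m->irq_armed = true;
}

/* Order before the fix: complete first, then reap. Returns true if a race window opened. */
static bool poll_buggy(struct poll_model *m)
{
    model_napi_complete(m);
    model_reap_tx(m);
    model_cq_notify(m);
    return m->raced;
}

/* Order after the fix: reap, then complete, then re-enable interrupts. */
static bool poll_fixed(struct poll_model *m)
{
    model_reap_tx(m);
    model_napi_complete(m);
    model_cq_notify(m);
    return m->raced;
}
```

In this model, poll_buggy() reaps completions after ownership has been released, which is the window in which a second CPU could process the same tx queue; poll_fixed() never reaps without ownership.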

Version-Release number of selected component (if applicable):

2.6.33.9-rt31.64.el5rt


How reproducible:

Easily.


Steps to Reproduce:
1. Install MRG 1.3 on certified hardware ( https://hardware.redhat.com/show.cgi?id=691965 )
2. Run iperf to test network bandwidth to a local switch
3. Wait.
  
Actual results:

As the iperf tests ramp up, the machine panics. The panic output is garbled and appears to be interleaved output from two threads.

bad: scheduling from the idle thread!, 

PID: 34     TASK: ffff880c089c0040  CPU: 2   COMMAND: "sirq-net-rx/2"
 #0 [ffff880c089c3a00] machine_kexec at ffffffff8101dccc
 #1 [ffff880c089c3a80] crash_kexec at ffffffff8107c4df
 #2 [ffff880c089c3b50] oops_end at ffffffff81357f03
 #3 [ffff880c089c3b80] die at ffffffff810064c5
 #4 [ffff880c089c3bb0] do_trap at ffffffff81357823
 #5 [ffff880c089c3c00] do_invalid_op at ffffffff81004439
 #6 [ffff880c089c3ca0] invalid_op at ffffffff81003915
    [exception RIP: be_tx_compl_process+66]
    RIP: ffffffffa01fd494  RSP: ffff880c089c3d50  RFLAGS: 00010246
    RAX: ffff880601501038  RBX: ffff880601500908  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 00000000000000df  RDI: ffff8806015006c0
    RBP: ffff880c089c3d90   R8: ffff880c089c2000   R9: dead000000200200
    R10: dead000000100100  R11: ffff880c06adbd60  R12: ffff8806015006c0
    R13: ffff8806015006c0  R14: ffff8806015000dd  R15: ffff880601500908
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff880c089c3d98] be_process_tx at ffffffffa01fdc01 [be2net]
 #8 [ffff880c089c3dd8] be_poll_tx_mcc at ffffffffa01fdcb8 [be2net]
 #9 [ffff880c089c3df8] net_rx_action at ffffffff812beaf6
#10 [ffff880c089c3e48] run_ksoftirqd at ffffffff81048bdc
#11 [ffff880c089c3eb8] kthread at ffffffff8105db79
#12 [ffff880c089c3f48] kernel_thread_helper at ffffffff81003a94


Expected results:

No panic under load.

Additional info:

Patch to be attached.  More messages/questions in private post.

Comment 9 David Sommerseth 2011-10-12 15:39:18 UTC
Verified by code review.  Found the following commits applied to kernel-rt-2.6.33.9-rt31.67 src rpm.

$ ~/MRG-RT-tools/check_commit_presence ~/rpmbuild/BUILD/kernel-rt-2.6.33.9-rt31.67/linux-2.6.33.9.x86_64 34cfbb350a6b74ba500c2cce090202a9d2f22e9e c671541682a5c33e6f1bd8c1d9b53041c2bb90f7
Reverting 34cfbb350a6b74ba500c2cce090202a9d2f22e9e (v2.6.33.9-rt31-mrg66_hotfix^2) ... Applied
Reverting c671541682a5c33e6f1bd8c1d9b53041c2bb90f7 (v2.6.33.9-rt31-mrg66_hotfix^2~1) ... Applied
Restoring .. Done
2 patch(es) was found applied.


34cfbb350a6b74ba500c2cce090202a9d2f22e9e is backported from upstream f31e50a802baae939c49819b8acd8f077019d398

c671541682a5c33e6f1bd8c1d9b53041c2bb90f7 is backported from upstream 7a1e9b2059d147461cff3dfbabbfb43f296a1eef


-> VERIFIED

Comment 10 errata-xmlrpc 2011-10-14 02:20:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1370.html