Created attachment 520275
Backport of upstream patch.

Description of problem:

Backport upstream commit f31e50a802baae939c49819b8acd8f077019d398 to the realtime kernel to fix a bug that has already been resolved upstream:

    be2net: fix tx completion polling

    In tx/mcc polling, napi_complete() is being incorrectly called before
    reaping tx completions. This can cause tx compl processing to be
    scheduled on another cpu concurrently which can result in a panic.
    This is fixed by calling napi_complete() after tx/mcc compl processing
    but before re-enabling interrupts (via a cq notify).

Version-Release number of selected component (if applicable):
2.6.33.9-rt31.64.el5rt

How reproducible:
Easily.

Steps to Reproduce:
1. Install MRG 1.3 on HW-certified hardware ( https://hardware.redhat.com/show.cgi?id=691965 )
2. Run iperf to test network bandwidth to a local switch
3. Wait.

Actual results:
As the iperf tests ramp up, the machine panics. The panic output is garbled and appears to be interleaved output from two threads.
bad: scheduling from the idle thread!,
PID: 34  TASK: ffff880c089c0040  CPU: 2  COMMAND: "sirq-net-rx/2"
 #0 [ffff880c089c3a00] machine_kexec at ffffffff8101dccc
 #1 [ffff880c089c3a80] crash_kexec at ffffffff8107c4df
 #2 [ffff880c089c3b50] oops_end at ffffffff81357f03
 #3 [ffff880c089c3b80] die at ffffffff810064c5
 #4 [ffff880c089c3bb0] do_trap at ffffffff81357823
 #5 [ffff880c089c3c00] do_invalid_op at ffffffff81004439
 #6 [ffff880c089c3ca0] invalid_op at ffffffff81003915
    [exception RIP: be_tx_compl_process+66]
    RIP: ffffffffa01fd494  RSP: ffff880c089c3d50  RFLAGS: 00010246
    RAX: ffff880601501038  RBX: ffff880601500908  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 00000000000000df  RDI: ffff8806015006c0
    RBP: ffff880c089c3d90   R8: ffff880c089c2000   R9: dead000000200200
    R10: dead000000100100  R11: ffff880c06adbd60  R12: ffff8806015006c0
    R13: ffff8806015006c0  R14: ffff8806015000dd  R15: ffff880601500908
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff880c089c3d98] be_process_tx at ffffffffa01fdc01 [be2net]
 #8 [ffff880c089c3dd8] be_poll_tx_mcc at ffffffffa01fdcb8 [be2net]
 #9 [ffff880c089c3df8] net_rx_action at ffffffff812beaf6
#10 [ffff880c089c3e48] run_ksoftirqd at ffffffff81048bdc
#11 [ffff880c089c3eb8] kthread at ffffffff8105db79
#12 [ffff880c089c3f48] kernel_thread_helper at ffffffff81003a94

Expected results:
No panic under load.

Additional info:
Patch to be attached. More messages/questions in private post.
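
For anyone reviewing the backport, the ordering problem described in the commit message can be illustrated with a toy user-space model. This is only a sketch and not the be2net driver code: the class ToyNapi, its fields, and the functions poll_buggy and poll_fixed are hypothetical names invented for illustration. The model assumes that once napi_complete() has run, another CPU is free to schedule the same poll, so any completion reaped after that point is counted as unsafe.

```python
class ToyNapi:
    """Toy stand-in for one NAPI poll context (hypothetical, not kernel code)."""

    def __init__(self, pending):
        self.scheduled = True    # True while this CPU still owns the poll
        self.pending = pending   # tx completions waiting to be reaped
        self.irq_enabled = False
        self.unsafe_reaps = 0    # completions reaped after ownership was released
        self.events = []         # order in which the steps ran

    def napi_complete(self):
        # After this point another CPU may schedule the same poll.
        self.scheduled = False
        self.events.append("napi_complete")

    def reap_tx_completions(self):
        while self.pending:
            self.pending -= 1
            if not self.scheduled:
                self.unsafe_reaps += 1
        self.events.append("reap")

    def cq_notify(self):
        # Stands in for re-enabling interrupts via a cq notify.
        self.irq_enabled = True
        self.events.append("cq_notify")


def poll_buggy(napi):
    """Ordering described as incorrect: complete first, then reap."""
    napi.napi_complete()
    napi.reap_tx_completions()
    napi.cq_notify()
    return napi.unsafe_reaps


def poll_fixed(napi):
    """Ordering described as the fix: reap, then complete, then notify."""
    napi.reap_tx_completions()
    napi.napi_complete()
    napi.cq_notify()
    return napi.unsafe_reaps
```

In this model poll_buggy reaps every completion without owning the poll, while poll_fixed reaps none that way. The real race depends on the timing between CPUs, which the model does not attempt to reproduce.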
Verified by code review. Found the following commits applied to the kernel-rt-2.6.33.9-rt31.67 src rpm.

$ ~/MRG-RT-tools/check_commit_presence ~/rpmbuild/BUILD/kernel-rt-2.6.33.9-rt31.67/linux-2.6.33.9.x86_64 34cfbb350a6b74ba500c2cce090202a9d2f22e9e c671541682a5c33e6f1bd8c1d9b53041c2bb90f7
Reverting 34cfbb350a6b74ba500c2cce090202a9d2f22e9e (v2.6.33.9-rt31-mrg66_hotfix^2) ... Applied
Reverting c671541682a5c33e6f1bd8c1d9b53041c2bb90f7 (v2.6.33.9-rt31-mrg66_hotfix^2~1) ... Applied
Restoring .. Done
2 patch(es) was found applied.

34cfbb350a6b74ba500c2cce090202a9d2f22e9e is backported from upstream f31e50a802baae939c49819b8acd8f077019d398
c671541682a5c33e6f1bd8c1d9b53041c2bb90f7 is backported from upstream 7a1e9b2059d147461cff3dfbabbfb43f296a1eef

-> VERIFIED
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2011-1370.html