Description of problem:
This is to track an additional issue with the fix for Bug 477945 - Kernel Panic with Bnx2 - Badness in local_bh_enable at kernel/softirq.c:141

I consistently see packet loss while running "echo t >/proc/sysrq-trigger" in a loop.

From the affected machine's serial console,
# while :; do echo t >/proc/sysrq-trigger; done

From another host,
$ ping hp-dl785g5-01.rhts.bos.redhat.com
...

I have seen a lot of packet loss here. It likely happens on machines using the bnx2 driver:
hp-dl785g5-01.rhts.bos.redhat.com
dell-pe1950-01.rhts.bos.redhat.com
dell-pe1950-01.rhts.englab.brq.redhat.com

Version-Release number of selected component (if applicable):
kernel-2.6.9-78.23.EL + patch from
https://bugzilla.redhat.com/show_bug.cgi?id=477945#c11

How reproducible:
always

Steps to Reproduce:
1. Reserve one of the affected machines.
2. On the affected machine, run: while :; do echo t >/proc/sysrq-trigger; done
3. From another host, run: ping <the affected machine>

Actual results:
packet loss

Expected results:
no packet loss
This isn't a bug; you're exercising the pessimal case of netpoll. In the prior bug that you mention, we found a problem wherein shared data was accessed from multiple contexts, causing a panic. The fix for that was to enforce the needed mutual exclusion between those contexts. Since one of the contexts was the nominal receive fast path (net_rx_action), netpoll now (correctly) blocks receive operations while calling the poll_controller/poll methods of a driver. Doing this puts us at risk of frame loss. By sending multiple sysrq-t's, you effectively create multiple windows of time during which we can't receive frames, leading to overflow and frame drops. This is working as it should.
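
The overflow mechanism described above can be illustrated with a toy model. This is only a sketch, not the netpoll or bnx2 code: the function name simulate_rx, the ring size, the arrival and drain rates, and the window lengths are all hypothetical values chosen to make the effect visible. Frames keep arriving on every tick, but the ring is drained only on ticks where the receive path is not blocked, so a long enough blocked window fills the ring and later frames are dropped.

```python
def simulate_rx(ticks, ring_size, arrivals_per_tick, drain_per_tick, blocked):
    """Toy model of a fixed-size RX ring; returns (received, dropped).

    `blocked` is a set of tick indices during which the receive path is
    held off (standing in for a window where netpoll is polling the driver).
    """
    queued = 0
    received = 0
    dropped = 0
    for tick in range(ticks):
        # Frames arrive from the wire regardless of what the host is doing.
        for _ in range(arrivals_per_tick):
            if queued < ring_size:
                queued += 1
            else:
                dropped += 1  # ring is full, so the frame is lost
        # The receive path drains the ring only when it is not blocked.
        if tick not in blocked:
            drained = min(queued, drain_per_tick)
            queued -= drained
            received += drained
    return received, dropped


# Hypothetical numbers: 100 ticks, an 8-slot ring, 1 frame in per tick,
# up to 2 frames drained per unblocked tick.
no_poll = simulate_rx(ticks=100, ring_size=8, arrivals_per_tick=1,
                      drain_per_tick=2, blocked=set())

# Repeated blocked windows: 15 blocked ticks out of every 20.
windows = {t for t in range(100) if t % 20 < 15}
with_poll = simulate_rx(ticks=100, ring_size=8, arrivals_per_tick=1,
                        drain_per_tick=2, blocked=windows)

print("never blocked:   received=%d dropped=%d" % no_poll)
print("blocked windows: received=%d dropped=%d" % with_poll)
```

With no blocked windows the ring never fills and nothing is dropped. With repeated windows longer than the ring can absorb, drops occur in every window, which matches the behaviour seen when sysrq-t is triggered in a tight loop.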