Bug 483445 - Packets Loss with Netdump
Packets Loss with Netdump
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel (Show other bugs)
4.8
All Linux
low Severity medium
: rc
: ---
Assigned To: Neil Horman
Martin Jenner
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2009-02-01 07:28 EST by CAI Qian
Modified: 2009-02-01 14:09 EST (History)
1 user (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-02-01 14:09:23 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description CAI Qian 2009-02-01 07:28:30 EST
Description of problem:
This is to track the additional issue with the fix,

Bug 477945 - Kernel Panic with Bnx2 - Badness in local_bh_enable at kernel/softirq.c:141

I have seen consistently packets loss while running "echo t >/proc/sysrq-trigger" in a loop.

From the affected machine's serial console,
# while :; do echo t >/proc/sysrq-trigger; done

From another host,
$ ping hp-dl785g5-01.rhts.bos.redhat.com
...

I have seen lots of packets loss here.

It likely happens on machines using bnx2 driver.

hp-dl785g5-01.rhts.bos.redhat.com
dell-pe1950-01.rhts.bos.redhat.com
dell-pe1950-01.rhts.englab.brq.redhat.com

Version-Release number of selected component (if applicable):
kernel-2.6.9-78.23.EL + patch from,

https://bugzilla.redhat.com/show_bug.cgi?id=477945#c11

How reproducible:
always

Steps to Reproduce:
1. reserve one of the affected machines.
2. while :; do echo t >/proc/sysrq-trigger; done
3. From another host,
$ ping <the affected machine>
  
Actual results:
packets loss

Expected results:
no packet loss
Comment 1 Neil Horman 2009-02-01 14:09:23 EST
This isn't a bug, you're exercizing the pessimal case of netpoll.  In the prior bug that you mention, we found a problem wherein there was access to shared data from multiple contexts causing a panic.  The fix for that was to enforce the needed mutual exclusion between those contexts.  Since one of the contexts was the nominal receive fast path (net_rx_action), netpoll now (correctly) blocks receive operations while calling the poll_controller/poll methods of a driver.  doing this puts us at risk for frame loss.  By sending multiple sysrq-t's, you effectively create multiple windows of time where we can't rx frames, leading to overflow and frame drops.  This is working as it should.

Note You need to log in before you can comment on or make changes to this bug.