483445 – Packets Loss with Netdump

Bug 483445 - Packets Loss with Netdump

Summary: Packets Loss with Netdump

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	4.8
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Neil Horman
QA Contact:	Martin Jenner
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2009-02-01 12:28 UTC by Qian Cai
Modified:	2009-02-01 19:09 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-02-01 19:09:23 UTC
Target Upstream Version:
Embargoed:

Description Qian Cai 2009-02-01 12:28:30 UTC

Description of problem:
This bug tracks an additional issue found with the fix for:

Bug 477945 - Kernel Panic with Bnx2 - Badness in local_bh_enable at kernel/softirq.c:141

I consistently see packet loss while running "echo t >/proc/sysrq-trigger" in a loop.

On the affected machine's serial console:
# while :; do echo t >/proc/sysrq-trigger; done

From another host:
$ ping hp-dl785g5-01.rhts.bos.redhat.com
...

The ping output shows heavy packet loss.

It likely happens on machines using the bnx2 driver:

hp-dl785g5-01.rhts.bos.redhat.com
dell-pe1950-01.rhts.bos.redhat.com
dell-pe1950-01.rhts.englab.brq.redhat.com

Version-Release number of selected component (if applicable):
kernel-2.6.9-78.23.EL plus the patch from:

https://bugzilla.redhat.com/show_bug.cgi?id=477945#c11

How reproducible:
always

Steps to Reproduce:
1. Reserve one of the affected machines.
2. On that machine, run:
# while :; do echo t >/proc/sysrq-trigger; done
3. From another host, run:
$ ping <the affected machine>
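
To put a number on the loss seen in step 3, a small helper along these lines could be used on the second host. This is only a sketch: `loss_percent` is a hypothetical name, and it assumes the usual iputils ping summary line of the form "N packets transmitted, M received, X% packet loss".

```shell
#!/bin/sh
# Hypothetical helper (not part of the original report): read ping output
# on stdin and print the loss percentage from the summary line.
# Assumes a summary like "..., 25% packet loss, time 99123ms".
loss_percent() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Usage, with the hostname left as a placeholder as in step 3:
#   ping -c 100 <the affected machine> | loss_percent
```

Running it once with the sysrq loop active and once without gives two figures that can be compared directly.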
  
Actual results:
packet loss

Expected results:
no packet loss

Comment 1 Neil Horman 2009-02-01 19:09:23 UTC

This isn't a bug; you're exercising the pessimal case of netpoll.  In the prior bug that you mention, we found a problem in which shared data was accessed from multiple contexts, causing a panic.  The fix for that was to enforce the needed mutual exclusion between those contexts.  Since one of the contexts was the nominal receive fast path (net_rx_action), netpoll now (correctly) blocks receive operations while calling the poll_controller/poll methods of a driver.  Doing this puts us at risk of frame loss.  By sending multiple sysrq-t's, you effectively create multiple windows of time in which we can't receive frames, leading to overflow and frame drops.  This is working as it should.
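
The overflow effect described in this comment can be illustrated with a toy model. This is a sketch under stated assumptions, not kernel, netpoll, or bnx2 code: one frame arrives per tick into a ring of fixed size, and the ring is drained every tick except during periodic "blocked" windows. The function name `simulate_drops` and all numbers are made up for illustration.

```shell
#!/bin/sh
# Toy model only (illustrative assumptions, not how netpoll is implemented).
# One frame arrives per tick; the ring is emptied each tick unless receive
# processing is blocked. Prints the number of dropped frames.
simulate_drops() {
    ring=$1      # ring capacity, in frames
    blocked=$2   # length of each blocked window, in ticks
    period=$3    # a blocked window starts every $period ticks
    ticks=$4     # total ticks to simulate
    queued=0; dropped=0; t=0
    while [ "$t" -lt "$ticks" ]; do
        # A frame arrives; it is dropped if the ring is already full.
        if [ "$queued" -lt "$ring" ]; then
            queued=$((queued + 1))
        else
            dropped=$((dropped + 1))
        fi
        # The ring is drained only outside a blocked window.
        if [ $((t % period)) -ge "$blocked" ]; then
            queued=0
        fi
        t=$((t + 1))
    done
    echo "$dropped"
}

simulate_drops 4 10 20 100    # blocked windows longer than the ring can absorb
simulate_drops 16 10 20 100   # same windows, larger ring
```

In this model the first call reports 35 dropped frames and the second reports 0: drops appear only when a blocked window outlasts the ring's capacity, and every additional window adds more of them.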
