85706 – FreeBSD + i82559= 25% packet loss

Summary: FreeBSD + i82559= 25% packet loss

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	eCos
Classification:	Retired
Component:	Ethernet drivers
Sub Component:
Version:	CVS
Hardware:	strongarm
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Gary Thomas
QA Contact:	Jonathan Larmour
Docs Contact:
URL:	http://sources.redhat.com/ml/ecos-dis...
Whiteboard:
Depends On:
Blocks:

Reported:	2003-03-06 10:15 UTC by Andrew Lunn
Modified:	2007-04-18 16:51 UTC (History)
CC List:	0 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2003-03-15 13:46:25 UTC
Embargoed:

Description Andrew Lunn 2003-03-06 10:15:47 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021130

Description of problem:
Hardware: AFE1
Software: anonCVS, except that the HAL for the AFE1 is from 1.5.2, since the
AFE1 is not in anonCVS.

When testing the SNMP code on the FreeBSD stack I found that there appears to
be around 25% packet loss on TX.

The problem occurs when loading and running the image using gdb over
Ethernet. If I load and run the same image using RedBoot's load command,
it works correctly. So it's something to do with RedBoot's stack
interacting with FreeBSD down in the Ethernet driver.

I've also run the application with CYGDBG_IO_ETH_DRIVER_DEBUG enabled
and RedBoot force_console set. With this setup I get the IO ETH
driver's debug output on the serial port. This shows that TX packets are
reaching this layer of the stack, but are not making it all the way
to the wire.

Next I turned on INFRA_DEBUG. I should have done this earlier! There is
an assert on the first RX! In ResetRxRing(), the assertion

          CYG_ASSERT( HAL_LE32TOC(link) == VIRT_TO_BUS(p_rfd), 
                      "rfd linked list broken" );

fires. It's the 7th descriptor in the chain that is wrong.
Using __builtin_return_address I found out that ResetRxRing() is being
called from PacketRxReady(). For some reason the receive unit has
stopped.
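
To illustrate what that assertion is checking, here is a minimal sketch in C. It is not the i82559 driver code: the `struct rfd` layout, `RING_SIZE`, `BUS_BASE`, and the helpers `virt_to_bus()`, `init_ring()` and `find_broken_link()` are all hypothetical stand-ins, and endianness conversion (the HAL_LE32TOC step) is ignored. The idea is only that each descriptor's link field must hold the bus address of its successor, and that walking the ring shows which descriptor is wrong.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8          /* hypothetical ring length */
#define BUS_BASE  0x1000u    /* hypothetical bus address of ring[0] */

/* Hypothetical, much-simplified receive frame descriptor. */
struct rfd {
    uint32_t status;
    uint32_t link;           /* bus address of the next descriptor */
};

/* Stand-in for VIRT_TO_BUS(): maps a descriptor to a fake bus address. */
uint32_t virt_to_bus(const struct rfd *ring, const struct rfd *p)
{
    return BUS_BASE + (uint32_t)((size_t)(p - ring) * sizeof *p);
}

/* Link every descriptor to its successor; the last one wraps to the first. */
void init_ring(struct rfd *ring, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        ring[i].status = 0;
        ring[i].link = virt_to_bus(ring, &ring[(i + 1) % n]);
    }
}

/* Return the index of the first descriptor whose link does not point at
 * its successor, or -1 if the ring is intact. */
int find_broken_link(const struct rfd *ring, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ring[i].link != virt_to_bus(ring, &ring[(i + 1) % n]))
            return (int)i;
    }
    return -1;
}
```

In this model, overwriting `ring[6].link` after `init_ring()` makes `find_broken_link()` return 6, which corresponds to the 7th descriptor in the chain.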

At the moment I don't understand why I'm seeing problems on the receive side
when it's TX packets that are missing. It's probably some general corruption of
the queues.

I also don't see why it's FreeBSD specific. I'm guessing there is a generic bug,
but somehow only the FreeBSD stack triggers it.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Build snmpping
2. Use gdb over Ethernet to run the image

Actual Results:  Around 1 in 4 pings to the server time out.

Expected Results:  All the pings to the server get responses.

Comment 1 Andrew Lunn 2003-03-14 09:20:10 UTC

Index: current/ChangeLog
===================================================================
RCS file: /cvs/ecos/ecos-opt/net/net/bsd_tcpip/current/ChangeLog,v
retrieving revision 1.17
diff -u -r1.17 ChangeLog
--- current/ChangeLog   24 Feb 2003 14:29:17 -0000      1.17
+++ current/ChangeLog   14 Mar 2003 09:12:02 -0000
@@ -1,3 +1,10 @@
+2003-03-14  Andrew Lunn  <andrew.lunn>
+
+       * src/sys/net/if.c (if_attach): Removed printf which causes the
+       ethernet device to become corrupt. At this point the app driver
+       has started but not completed taking over from the redboot
+       driver. It is unsafe for redboot to use the ethernet device.
+
 2003-02-24  Jonathan Larmour  <jifl>
 
        * cdl/freebsd_net.cdl: Fix doc link.
Index: current/src/sys/net/if.c
===================================================================
RCS file: /cvs/ecos/ecos-opt/net/net/bsd_tcpip/current/src/sys/net/if.c,v
retrieving revision 1.2
diff -u -r1.2 if.c
--- current/src/sys/net/if.c    4 Nov 2002 20:23:25 -0000       1.2
+++ current/src/sys/net/if.c    14 Mar 2003 09:12:02 -0000
@@ -194,8 +194,6 @@
        }
 
         if (ifp->if_snd.ifq_maxlen == 0) {
-            printf("%s%d XXX: driver didn't set ifq_maxlen\n",
-                   ifp->if_name, ifp->if_unit);
             ifp->if_snd.ifq_maxlen = ifqmaxlen;
         }
 

This fixes the problem.

At the point this printf is made, the app's instance of the driver has set up the
i82559 with its buffers. The RedBoot i82559 driver no longer has control over
the device. But RedBoot has not been told this yet: the VV call has still to be
made. Thus this printf invokes the RedBoot instance of the i82559 driver. It
uses one of its own buffers and so corrupts the app's ring of buffers.

This is a generic problem and not limited to just the i82559. Any ethernet
device with 'complex' buffer management is likely to be corrupted.
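
The ordering problem can be modelled with a small sketch in C. This is not eCos or RedBoot code, and it is not the committed fix (which simply removes the printf): `struct nic`, `app_driver_init()`, `complete_handover()`, `diag_print_unsafe()` and `diag_print_guarded()` are hypothetical names used only to show why output sent through the monitor's driver is unsafe between the app driver's initialisation and the handover call, and how a guard could suppress it in that window.

```c
#include <stdbool.h>

/* Hypothetical model: one NIC shared by a monitor driver and an app driver. */
enum owner { OWNER_MONITOR, OWNER_APP };

struct nic {
    enum owner ring_owner;   /* whose buffers the hardware currently uses */
    bool handover_done;      /* true once the monitor was told to let go */
    unsigned corruptions;    /* times the monitor driver disturbed the app's ring */
};

/* The app driver programs the device with its own buffers. */
void app_driver_init(struct nic *n)
{
    n->ring_owner = OWNER_APP;
}

/* Models the later call that tells the monitor to stop using the device. */
void complete_handover(struct nic *n)
{
    n->handover_done = true;
}

/* Diagnostic output that is still routed through the monitor's driver
 * until the handover has completed. */
void diag_print_unsafe(struct nic *n, const char *msg)
{
    (void)msg;
    if (n->ring_owner == OWNER_APP && !n->handover_done)
        n->corruptions++;    /* monitor driver queues its own buffer */
}

/* Guarded variant: drop the message while the device is in the unsafe
 * window. Returns true if the message was emitted. */
bool diag_print_guarded(struct nic *n, const char *msg)
{
    if (n->ring_owner == OWNER_APP && !n->handover_done)
        return false;
    diag_print_unsafe(n, msg);
    return true;
}
```

In this model, calling `diag_print_unsafe()` after `app_driver_init()` but before `complete_handover()` increments `corruptions`, while `diag_print_guarded()` returns false and leaves the counter untouched.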
