Bug 85706

Summary: FreeBSD + i82559= 25% packet loss
Product: [Retired] eCos
Reporter: Andrew Lunn <andrew.lunn>
Component: Ethernet drivers
Assignee: Gary Thomas <gary>
Status: CLOSED CURRENTRELEASE
QA Contact: Jonathan Larmour <jifl-bugzilla>
Severity: medium
Priority: medium    
Version: CVS   
Target Milestone: ---   
Target Release: ---   
Hardware: strongarm   
OS: Linux   
URL: http://sources.redhat.com/ml/ecos-discuss/2003-03/msg00018.html
Doc Type: Bug Fix
Last Closed: 2003-03-15 13:46:25 UTC

Description Andrew Lunn 2003-03-06 10:15:47 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021130

Description of problem:
Hardware: AFE1
Software: AnonCVS, except the HAL for the AFE1, which is from 1.5.2 since the
AFE1 is not in AnonCVS.

When testing the SNMP code on the FreeBSD stack, I found what appears to be
around 25% packet loss on TX.

The problem occurs when loading and running the image using gdb over
Ethernet. If I load and run the same image using RedBoot's load command,
it works correctly. So it's something to do with RedBoot's stack
interacting with FreeBSD down in the Ethernet driver.

I've also run the application with CYGDBG_IO_ETH_DRIVER_DEBUG enabled
and RedBoot force_console set. With this setup I get the IO ETH
driver's debug output on the serial port. This shows that TX packets
are reaching this layer of the stack, but are not making it all the way
to the wire.

Next I turned on INFRA_DEBUG. I should have done this earlier! There is
an assert on the first RX! In ResetRxRing(), the assertion

          CYG_ASSERT( HAL_LE32TOC(link) == VIRT_TO_BUS(p_rfd), 
                      "rfd linked list broken" );

fires. It's the 7th descriptor in the chain that is wrong.
Using __builtin_return_address I found out that ResetRxRing() is being
called from PacketRxReady(). For some reason the receive unit has
stopped.

At the moment I don't understand why I'm seeing problems on the receive side
when it's TX packets that are missing. It's probably some general corruption of
the queues.

I also don't see why it's FreeBSD specific. I'm guessing there is a generic bug,
but somehow only the FreeBSD stack causes it to happen.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Build snmpping
2. Use gdb over Ethernet to run the image

Actual Results:  Around 1 in 4 pings to the server time out.

Expected Results:  All the pings to the server get responses.

Comment 1 Andrew Lunn 2003-03-14 09:20:10 UTC
Index: current/ChangeLog
===================================================================
RCS file: /cvs/ecos/ecos-opt/net/net/bsd_tcpip/current/ChangeLog,v
retrieving revision 1.17
diff -u -r1.17 ChangeLog
--- current/ChangeLog   24 Feb 2003 14:29:17 -0000      1.17
+++ current/ChangeLog   14 Mar 2003 09:12:02 -0000
@@ -1,3 +1,10 @@
+2003-03-14  Andrew Lunn  <andrew.lunn>
+
+       * src/sys/net/if.c (if_attach): Removed printf which causes the
+       ethernet device to become corrupt. At this point the app driver
+       has started but not completed taking over from the redboot
+       driver. It is unsafe for redboot to use the ethernet device.
+
 2003-02-24  Jonathan Larmour  <jifl>
 
        * cdl/freebsd_net.cdl: Fix doc link.
Index: current/src/sys/net/if.c
===================================================================
RCS file: /cvs/ecos/ecos-opt/net/net/bsd_tcpip/current/src/sys/net/if.c,v
retrieving revision 1.2
diff -u -r1.2 if.c
--- current/src/sys/net/if.c    4 Nov 2002 20:23:25 -0000       1.2
+++ current/src/sys/net/if.c    14 Mar 2003 09:12:02 -0000
@@ -194,8 +194,6 @@
        }
 
         if (ifp->if_snd.ifq_maxlen == 0) {
-            printf("%s%d XXX: driver didn't set ifq_maxlen\n",
-                   ifp->if_name, ifp->if_unit);
             ifp->if_snd.ifq_maxlen = ifqmaxlen;
         }
 

This fixes the problem.

At the point this printf is made, the app's instance of the driver has set up
the i82559 with its buffers. The RedBoot i82559 driver no longer has control
over the device, but RedBoot has not been told this yet: the VV call has still
to be made. Thus this printf invokes the RedBoot instance of the i82559 driver,
which uses one of its own buffers and so corrupts the app's ring of buffers.

This is a generic problem and not limited to the i82559. Any Ethernet
device with 'complex' buffer management is likely to be corrupted in the same
way.