From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021130

Description of problem:

Hardware: AFE1
Software: AnonCVS, except that the HAL for the AFE1 is from 1.5.2, since the AFE1 is not in AnonCVS.

When testing the SNMP code on the FreeBSD stack I found that there appears to be around 25% packet loss on TX. The problem occurs when loading and running the image using gdb over Ethernet. If I load and run the same image using RedBoot's load command, it works correctly. So it is something to do with RedBoot's stack interacting with FreeBSD down in the ethernet driver.

I've also run the application with CYGDBG_IO_ETH_DRIVER_DEBUG enabled and RedBoot force_console set. With this setup I get the IO ETH driver's debug output on the serial port. This shows that TXs are getting to this layer in the stack, but are not making it all the way to the wire.

Next I turned on INFRA_DEBUG. I should have done this earlier! There is an assert on the first RX! In ResetRxRing(), the assertion

CYG_ASSERT( HAL_LE32TOC(link) == VIRT_TO_BUS(p_rfd), "rfd linked list broken" );

is firing. It's the 7th descriptor in the chain that is wrong.

Using __builtin_return_address I found out that ResetRxRing() is being called from PacketRxReady(). For some reason the receive unit has stopped.

At the moment I don't understand why I'm seeing problems on the receive side when it's TX packets that are missing. It's probably some general corruption of the queues. I also don't see why it's FreeBSD specific. I'm guessing there is a generic bug, but somehow only the FreeBSD stack causes it to happen.

Version-Release number of selected component (if applicable):

How reproducible: Always

Steps to Reproduce:
1. Build snmpping
2. Use gdb over ethernet to run the image
3.

Actual Results: Around 1 in 4 pings to the server time out.

Expected Results: All the pings to the server get responses.
Index: current/ChangeLog
===================================================================
RCS file: /cvs/ecos/ecos-opt/net/net/bsd_tcpip/current/ChangeLog,v
retrieving revision 1.17
diff -u -r1.17 ChangeLog
--- current/ChangeLog	24 Feb 2003 14:29:17 -0000	1.17
+++ current/ChangeLog	14 Mar 2003 09:12:02 -0000
@@ -1,3 +1,10 @@
+2003-03-14  Andrew Lunn  <andrew.lunn>
+
+	* src/sys/net/if.c (if_attach): Removed printf which causes the
+	ethernet device to become corrupt. At this point the app driver
+	has started but not completed taking over from the redboot
+	driver. It is unsafe for redboot to use the ethernet device.
+
 2003-02-24  Jonathan Larmour  <jifl>
 
 	* cdl/freebsd_net.cdl: Fix doc link.
Index: current/src/sys/net/if.c
===================================================================
RCS file: /cvs/ecos/ecos-opt/net/net/bsd_tcpip/current/src/sys/net/if.c,v
retrieving revision 1.2
diff -u -r1.2 if.c
--- current/src/sys/net/if.c	4 Nov 2002 20:23:25 -0000	1.2
+++ current/src/sys/net/if.c	14 Mar 2003 09:12:02 -0000
@@ -194,8 +194,6 @@
 	}
 
 	if (ifp->if_snd.ifq_maxlen == 0) {
-		printf("%s%d XXX: driver didn't set ifq_maxlen\n",
-		    ifp->if_name, ifp->if_unit);
 		ifp->if_snd.ifq_maxlen = ifqmaxlen;
 	}
 

This fixes the problem.

At the point this printf is made, the app's instance of the driver has set up the i82559 with its own buffers. The RedBoot i82559 driver no longer has control over the device, but RedBoot has not been told this yet: the VV call has still to be made. Since the image is being run using gdb over Ethernet, the printf output goes through the RedBoot instance of the i82559 driver. That instance uses one of its own buffers and so corrupts the app's ring of buffers.

This is a generic problem and not limited to just the i82559. Any ethernet device with 'complex' buffer management is likely to be corrupted.
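To make the failure mode concrete, here is a minimal sketch in C of how a write from a stale driver instance breaks a descriptor ring, and how a link check of the kind asserted in ResetRxRing() would notice it. This is not the i82559 driver code. The struct rfd layout, RING_SIZE, LINK_FOREIGN, the owner tags and all function names are hypothetical, and array indices stand in for the bus addresses the real driver compares.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE    8            /* hypothetical ring length */
#define LINK_FOREIGN UINT32_MAX   /* hypothetical: "points into someone else's buffers" */

enum { OWNER_APP = 1, OWNER_STALE = 2 };  /* hypothetical owner tags */

/* Hypothetical, simplified receive descriptor; only the link matters here. */
struct rfd {
    uint32_t link;    /* index of the next descriptor (stands in for a bus address) */
    uint32_t owner;   /* which driver instance last wrote this descriptor */
};

/* Link every descriptor to its successor, closing the ring at the end. */
void ring_init(struct rfd *ring, size_t n, uint32_t owner)
{
    for (size_t i = 0; i < n; i++) {
        ring[i].link = (uint32_t)((i + 1) % n);
        ring[i].owner = owner;
    }
}

/* Return the index of the first descriptor whose link is wrong, or -1 if
 * the ring is intact. This mirrors the idea of the "rfd linked list
 * broken" check, not its actual implementation. */
int ring_first_broken(const struct rfd *ring, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ring[i].link != (uint32_t)((i + 1) % n))
            return (int)i;
    }
    return -1;
}

/* Model a stale driver instance that still believes it owns the device
 * and rewrites one descriptor to point at its own buffers. */
void stale_driver_write(struct rfd *ring, size_t n, size_t slot)
{
    assert(slot < n);
    ring[slot].link = LINK_FOREIGN;
    ring[slot].owner = OWNER_STALE;
}
```

Calling ring_init() on an array of RING_SIZE descriptors, then stale_driver_write() on slot 6, makes ring_first_broken() report index 6. The slot number is chosen only to echo the 7th descriptor mentioned in the report. Which descriptor the real device touches depends on driver and hardware state.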