Description of problem:
SMB connections from a Windows 2000 Server box to a Samba server running the phoebe3 beta hang roughly every 10 minutes. Playing around with ethereal, I found that NBNS (NetBIOS Name Service) UDP packets get lost on their way to the nmbd process. Reconfiguring the Ethernet interface with '/sbin/ifconfig eth0 down; /sbin/ifconfig eth0 up' fixes the problem with the disappearing UDP packets, and after a few seconds the SMB mount starts working again.

Version-Release number of selected component (if applicable):
kernel-2.4.20-2.54 (from rawhide, i686)
samba-2.2.7a-5

[root@ulysses noa]# grep Ethernet /proc/pci
  Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 48).

The client: Windows 2000 Server SP3 with all hotfixes applied.

How reproducible:
I haven't seen it on any other network, but here it is completely reproducible. I'm willing to put in the time needed to try out test setups to nail this one.

Steps to Reproduce:
1. The simplest test case is an explicit name table lookup for the Samba server using the Windows command-line utility nbtstat. "nbtstat -a ulysses" should return the server's NetBIOS name table (each registered name with its type and status). Watching the lookup with ethereal on the server gives the following pattern:

     6.803884 213.114.26.12 -> 213.114.26.96 NBNS Name query NBSTAT ULYSSES<00>
     6.804272 213.114.26.96 -> 213.114.26.12 NBNS Name query response NBSTAT

   Running "strace -p `/sbin/pidof nmbd` -eselect" shows that select(2) returns when the UDP packet arrives.
2. Mount a share from the Samba server that holds some music and play it. When the music abruptly stops, the server is in "fail mode".
3. Running "nbtstat -a ulysses" again now produces three UDP packets in ethereal on the Samba server, but no answers:

     1.551568 213.114.26.12 -> 213.114.26.96 NBNS Name query NBSTAT ULYSSES<00>
     3.050753 213.114.26.12 -> 213.114.26.96 NBNS Name query NBSTAT ULYSSES<00>
     4.550720 213.114.26.12 -> 213.114.26.96 NBNS Name query NBSTAT ULYSSES<00>

   The really interesting part, though, is the strace output from step 1: the select(2) call does not return when these packets reach the machine.
4. The easiest way to restore UDP delivery to the nmbd process, without waiting about 10 minutes, is to run '/sbin/ifconfig eth0 down; /sbin/ifconfig eth0 up'.

Additional info:
- No firewall is configured; /sbin/iptables-save returns without printing anything.
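The step 1 lookup can also be issued from a Linux box when the Windows client is not at hand. Below is a minimal Python sketch, assuming the NBNS packet layout from RFC 1001/1002: a 12-byte header, the query name in "first-level" encoding (each of the 16 name bytes split into two nibbles, each nibble written as 'A' plus its value), then type NBSTAT (0x0021) and class IN (0x0001). The function names and the default transaction ID are hypothetical, not taken from Samba or nbtstat.

```python
import struct

NBSTAT = 0x0021    # NBNS resource record type for a node status request
CLASS_IN = 0x0001  # Internet class


def encode_netbios_name(name: str, suffix: int = 0x00) -> bytes:
    """First-level encode a NetBIOS name (RFC 1001).

    The name is upper-cased, space-padded to 15 bytes and followed by the
    suffix byte (0x00 as in ULYSSES<00>). Each of the 16 bytes is split
    into two nibbles, and each nibble becomes chr(ord('A') + nibble).
    """
    raw = name.upper().encode("ascii")[:15].ljust(15, b" ") + bytes([suffix])
    out = bytearray()
    for byte in raw:
        out.append(ord("A") + (byte >> 4))
        out.append(ord("A") + (byte & 0x0F))
    return bytes(out)


def decode_netbios_name(encoded: bytes) -> tuple[str, int]:
    """Reverse encode_netbios_name: return (name, suffix)."""
    raw = bytes(
        ((encoded[i] - ord("A")) << 4) | (encoded[i + 1] - ord("A"))
        for i in range(0, 32, 2)
    )
    return raw[:15].decode("ascii").rstrip(" "), raw[15]


def build_nbstat_query(name: str, suffix: int = 0x00, txid: int = 0x1234) -> bytes:
    """Build an NBNS node status (NBSTAT) request for name<suffix>."""
    # Header: transaction ID, flags (0 = plain unicast query),
    # QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack(">HHHHHH", txid, 0x0000, 1, 0, 0, 0)
    # Question name: length byte 32, the encoded name, then the root label 0.
    qname = bytes([32]) + encode_netbios_name(name, suffix) + b"\x00"
    return header + qname + struct.pack(">HH", NBSTAT, CLASS_IN)
```

To send it, something like `socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(build_nbstat_query("ULYSSES"), ("213.114.26.96", 137))` should show up in ethereal as a "Name query NBSTAT ULYSSES<00>" line; decode_netbios_name turns the 32-letter name in a capture back into "ULYSSES" and its suffix.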
I think you were a little too quick to classify this as a Samba problem, as I can trivially reproduce it with netcat too. When the server is in failure mode, if I shut down Samba and instead run 'nc -u -l -p 137', I don't get any traffic from the network. Restarting the interface makes the UDP packets reach netcat again (displayed as chunks of strange ASCII). If this is indeed not a kernel bug, I would be most interested in finding out what valid reasons the kernel has for not delivering incoming UDP packets to userspace, or at least in being pointed in a direction where I can RTFM a bit :)
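A rough Python stand-in for the netcat test, which also mirrors the select(2) call seen in the strace output: if select() never reports the socket as readable, the kernel has not queued the datagram on that socket. This sketch assumes nmbd has been stopped first, as in the netcat test, so the port is free, and that it runs as root, since ports below 1024 are privileged on Linux; the helper names are hypothetical. The hex dump also explains the "strange ASCII": NBNS names travel in first-level encoding, so a query for ULYSSES shows up as a run of capital letters starting "FFEMFJ".

```python
import select
import socket


def wait_for_datagram(sock: socket.socket, timeout: float):
    """Block in select(2) until sock is readable or timeout seconds pass.

    Returns (payload, sender) for the first datagram, or None on timeout.
    Like nmbd, this only sees a packet once the kernel has queued it on
    this socket; a capture tool seeing it is not enough.
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None
    return sock.recvfrom(65535)


def hexdump(data: bytes, width: int = 16) -> str:
    """Render bytes as offset / hex / printable-ASCII lines."""
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{off:08x}  {hex_part:<{width * 3}} {text}")
    return "\n".join(lines)


def listen(port: int = 137, timeout: float = 30.0) -> None:
    """Print every datagram arriving on a UDP port and report silent periods."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("0.0.0.0", port))  # ports < 1024 need root on Linux
        while True:
            result = wait_for_datagram(sock, timeout)
            if result is None:
                print(f"no datagram in {timeout:.0f}s")
                continue
            payload, sender = result
            print(f"{len(payload)} bytes from {sender[0]}:{sender[1]}")
            print(hexdump(payload))
```

Running `listen()` on the server while repeating "nbtstat -a ulysses" on the client should print a dump for each query in the working state and only "no datagram" lines in the failure mode.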
Is the box particularly loaded when packets are being dropped? One of the other developers here would like to know what "cat /proc/net/snmp" shows while it's dropping packets; if the kernel is dropping them, the counters there will show why. I'd also suggest trying a different kind of NIC in the machine. If it only fails with the 3c905B, it'll be easier for me to say "this is a kernel bug".
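One way to act on the /proc/net/snmp suggestion is to snapshot the counters before and after a failing "nbtstat -a" and compare the Udp line. A minimal Python sketch follows; it assumes the usual layout of that file (pairs of lines: a header naming the counters, then a value line with the same prefix), and the function names are hypothetical. Roughly, InDatagrams counts datagrams delivered to a socket, NoPorts counts datagrams for a port with no listener, and InErrors counts other receive failures such as a full socket buffer. If none of them move while ethereal shows the query arriving, the packet was most likely discarded before it reached UDP at all.

```python
def parse_proc_net_snmp(text: str) -> dict[str, dict[str, int]]:
    """Parse /proc/net/snmp into {protocol: {counter: value}}.

    The file is made of line pairs, e.g.
      "Udp: InDatagrams NoPorts InErrors OutDatagrams"
      "Udp: <n> <n> <n> <n>"
    """
    tables: dict[str, dict[str, int]] = {}
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    for names, values in zip(lines[0::2], lines[1::2]):
        proto = names[0].rstrip(":")
        # Values may be negative (e.g. Tcp MaxConn is -1), so use int().
        tables[proto] = dict(zip(names[1:], (int(v) for v in values[1:])))
    return tables


def udp_delta(before: dict, after: dict) -> dict[str, int]:
    """Return how much each Udp counter grew between two snapshots."""
    return {k: after["Udp"][k] - before["Udp"][k] for k in after["Udp"]}
```

Usage on the server: `before = parse_proc_net_snmp(open("/proc/net/snmp").read())`, reproduce the failing lookup, take `after` the same way, then print `udp_delta(before, after)`.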
After some more hours of debugging I found the problem. It turns out that my ISP has a habit of spamming my network with ARP responses. This creates a race condition between the Samba server and the routers. When the routers win, they get the opportunity to block certain UDP packets. This was hard to track down because the routers only drop certain classes of packets. A piece of advice to anyone tracking down similar problems in the future: have a quick look at the Ethernet headers of the packets that seem to disappear.
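The Ethernet-header advice can be made concrete. Ethereal by default puts the interface into promiscuous mode, so it also shows frames addressed to other MAC addresses, while the kernel's IPv4 input path discards unicast frames whose destination MAC is not the interface's own; a frame can therefore appear in the capture yet never reach a socket. Below is a minimal Python sketch that classifies raw frames along those lines and decodes ARP packets. It assumes plain Ethernet II framing (no 802.1Q tag, no 802.3 length field), and the function names are hypothetical.

```python
import struct

ETH_P_ARP = 0x0806


def mac_str(raw: bytes) -> str:
    """Format 6 raw bytes as aa:bb:cc:dd:ee:ff."""
    return ":".join(f"{b:02x}" for b in raw)


def parse_ethernet(frame: bytes):
    """Return (dst_mac, src_mac, ethertype, payload) for an Ethernet II frame."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return mac_str(dst), mac_str(src), ethertype, frame[14:]


def parse_arp(payload: bytes) -> dict:
    """Decode an Ethernet/IPv4 ARP packet (op 1 = request, 2 = reply)."""
    _htype, _ptype, _hlen, _plen, oper = struct.unpack("!HHBBH", payload[:8])
    sha, spa, tha, tpa = struct.unpack("!6s4s6s4s", payload[8:28])
    return {
        "op": oper,
        "sender_mac": mac_str(sha),
        "sender_ip": ".".join(str(b) for b in spa),
        "target_mac": mac_str(tha),
        "target_ip": ".".join(str(b) for b in tpa),
    }


def check_frame(frame: bytes, my_mac: str) -> str:
    """Classify a captured frame by its Ethernet header."""
    dst, src, ethertype, payload = parse_ethernet(frame)
    if ethertype == ETH_P_ARP:
        arp = parse_arp(payload)
        if arp["op"] == 2:
            return f"ARP reply: {arp['sender_ip']} is-at {arp['sender_mac']}"
        return f"ARP request: who-has {arp['target_ip']} tell {arp['sender_ip']}"
    # Lowest bit of the first octet set means a multicast/broadcast address.
    group_addr = int(dst[:2], 16) & 1
    if dst != my_mac.lower() and not group_addr:
        # Seen only because of promiscuous mode; IP input drops such frames.
        return f"not for us: dst {dst} from {src}"
    return f"accepted: ethertype 0x{ethertype:04x} from {src}"
```

The raw frames could come, for example, from an AF_PACKET raw socket (Linux, root only) or from a saved capture; reading those sources is left out to keep the sketch short. A burst of "ARP reply" lines for the router's IP with changing MAC addresses, or "not for us" lines for the NBNS queries, would point in the direction described above.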