Description of problem:
SMB connections from a Windows 2000 Server box to a samba server on phoebe3 beta
hang about every 10 minutes or so. Playing around with ethereal I found that
NBNS (NetBIOS Name Service) UDP packets get lost on their way to the nmbd process.
Reconfiguring the ethernet interface with '/sbin/ifconfig eth0 down;
/sbin/ifconfig eth0 up' fixes the problem with the disappearing UDP packets and
after a few seconds the SMB mount starts working again.
Version-Release number of selected component (if applicable):
kernel-2.4.20-2.54 (from rawhide, i686)
[root@ulysses noa]# grep Ethernet /proc/pci
Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 48).
The client: Windows 2000 Server SP3 with all Hotfixes applied
Haven't seen it on any other networks, but here it is totally reproducible. I'm
willing to put the needed time in trying out test setups and stuff to nail this
Steps to Reproduce:
1. The simplest test case is doing an explicit name table lookup for the samba
server using the Windows command-line util nbtstat. "nbtstat -a ulysses" should
give a table of names that have different properties. Looking at the lookup from
ethereal on the server gives the following pattern:
6.803884 220.127.116.11 -> 18.104.22.168 NBNS Name query NBSTAT ULYSSES<00>
6.804272 22.214.171.124 -> 126.96.36.199 NBNS Name query response NBSTAT
Doing "strace -p `/sbin/pidof nmbd` -eselect" shows that the select(2) returns
when the UDP packet arrives.
2. Mount a volume from the samba server serving some music, and listen to it.
When the music abruptly stops the server is in "fail mode"
3. Trying "nbtstat -a ulysses" again produces 3 UDP packets in ethereal on the
samba server but no answers, like this:
1.551568 188.8.131.52 -> 184.108.40.206 NBNS Name query NBSTAT ULYSSES<00>
3.050753 220.127.116.11 -> 18.104.22.168 NBNS Name query NBSTAT ULYSSES<00>
4.550720 22.214.171.124 -> 126.96.36.199 NBNS Name query NBSTAT ULYSSES<00>
The really interesting part though is the strace line from above. The select(2)
call doesn't return when the packets hit the machine.
4. To restore UDP delivery to the nmbd process without waiting for about
10 minutes, the easiest way is to run '/sbin/ifconfig eth0 down;
/sbin/ifconfig eth0 up'
- No firewall is configured, /sbin/iptables-save returns without printing anything.
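To repeat the lookup from steps 1 and 3 without a Windows client, the node status query that nbtstat sends can be built by hand. This is a rough sketch based on the NBNS packet layout in RFC 1002, not a capture of what Windows 2000 actually emits; the transaction ID is arbitrary and the helper names are made up for illustration.

```python
import struct

NBSTAT_TYPE = 0x0021  # node status request, shown as "NBSTAT" by ethereal
IN_CLASS = 0x0001     # Internet class


def encode_netbios_name(name, suffix=0x00):
    """First-level encode a NetBIOS name: pad to 15 bytes with spaces,
    append the suffix byte, then turn every nibble into 'A' + nibble."""
    padded = name.upper().encode("ascii")[:15].ljust(15, b" ") + bytes([suffix])
    encoded = bytearray()
    for byte in padded:
        encoded.append(ord("A") + (byte >> 4))
        encoded.append(ord("A") + (byte & 0x0F))
    return bytes(encoded)


def build_nbstat_query(name, transaction_id=0x1234):
    """Build a node status query for NAME<00> (transaction ID is arbitrary)."""
    # Header: id, flags (plain query), 1 question, no other records.
    header = struct.pack("!HHHHHH", transaction_id, 0x0000, 1, 0, 0, 0)
    # Question name: length byte 0x20, 32 encoded bytes, terminating zero.
    question = b"\x20" + encode_netbios_name(name) + b"\x00"
    return header + question + struct.pack("!HH", NBSTAT_TYPE, IN_CLASS)
```

The resulting bytes can be sent to UDP port 137 on the samba server with an ordinary datagram socket; in fail mode no response should come back, matching the three unanswered queries above.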
I think you were a little too quick classing this as a samba problem, as I can
trivially reproduce it with netcat also.
When the server is in failure mode, if I shut down samba and instead run 'nc -u
-l -p 137' I don't get any traffic from the network. Restarting the interface
makes the UDP packets get sent to netcat (and displayed as chunks of strange ASCII).
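For anyone without netcat at hand, the same check can be done with a few lines of Python. This is only a sketch of what 'nc -u -l -p 137' does for this purpose (bind a UDP socket and wait); the function names are invented here, and binding to port 137 needs root and requires nmbd to be stopped first.

```python
import socket


def open_udp_listener(port, host="0.0.0.0"):
    """Bind a UDP socket on the given port, roughly like 'nc -u -l -p PORT'."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def wait_for_datagram(sock, timeout=5.0):
    """Return (payload, sender) for the next datagram, or None on timeout."""
    sock.settimeout(timeout)
    try:
        return sock.recvfrom(4096)
    except socket.timeout:
        return None
```

Calling wait_for_datagram(open_udp_listener(137)) while running "nbtstat -a ulysses" on the client should return the query when things work, and None when the server is in failure mode.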
If this is indeed not a kernel bug I would be most interested in finding out
what valid reasons the kernel has for not forwarding incoming udp packets to
userspace, or at least point me in a direction where I can RTFM a bit :)
Is the box particularly loaded when packets are being dropped? One of the other
developers here would like to know what "cat /proc/net/snmp" shows when it's
dropping packets. If the kernel is dropping packets, that'll show why.
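To make the /proc/net/snmp output easier to compare between a working and a failing moment, something like the following could be used. It is a sketch that assumes the usual layout of that file (a line of counter names followed by a line of values, both prefixed with the protocol name); the function names are made up for illustration.

```python
def parse_snmp(text):
    """Parse /proc/net/snmp style text into {protocol: {counter: value}}."""
    counters = {}
    lines = [line for line in text.splitlines() if line.strip()]
    # Lines come in pairs: "Udp: name name ..." then "Udp: value value ...".
    for header, values in zip(lines[0::2], lines[1::2]):
        proto, names = header.split(":", 1)
        value_proto, numbers = values.split(":", 1)
        if proto != value_proto:
            continue  # not a matching pair, skip it
        counters[proto] = dict(
            zip(names.split(), (int(n) for n in numbers.split()))
        )
    return counters


def udp_delta(before, after):
    """Difference of every Udp counter between two parsed snapshots."""
    return {
        name: after["Udp"][name] - before["Udp"][name]
        for name in after["Udp"]
    }
```

Taking one snapshot before and one after a failing "nbtstat -a ulysses" and looking at udp_delta() shows whether InDatagrams, NoPorts or InErrors moved at all. If none of them change, the packets never reached the UDP layer.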
I'd also suggest trying a different (kind of) nic in the machine. If it only
fails when you're using the 3c905B, it'll be easier for me to say "this is a
After some more hours of debugging I found the problem. It turns out that my ISP
has the habit of spamming my network with ARP responses. This creates a race
condition between the samba server and the routers. When the routers win they
get the opportunity to block certain UDP packets.
This was hard to track down because the routers only drop certain classes of
A piece of advice to anyone tracking down similar problems in the future is to
have a quick look at the Ethernet headers for the packets that seem to disappear.
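As a rough illustration of that check, the first 14 bytes of a captured frame hold the destination MAC, the source MAC and the EtherType. The sketch below decodes them and compares the destination with the MAC of the server's own NIC. It assumes the raw frame bytes were obtained elsewhere (for example from a capture file), and the helper names are invented here.

```python
import struct


def format_mac(raw):
    """Render 6 raw bytes as a lowercase colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def parse_ethernet_header(frame):
    """Decode destination MAC, source MAC and EtherType from a raw frame."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return {"dst": format_mac(dst), "src": format_mac(src), "ethertype": ethertype}


def is_addressed_to(frame, expected_mac):
    """True if the frame targets expected_mac or the broadcast address."""
    dst = parse_ethernet_header(frame)["dst"]
    return dst in (expected_mac.lower(), "ff:ff:ff:ff:ff:ff")
```

A query that shows up in the capture but fails is_addressed_to() for the server's MAC was sent to some other station on the wire, which is the kind of thing a look at the IP addresses alone will not reveal.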