We have a machine running Red Hat Linux 6.1 whose primary purpose is DNS. It has been running great for the last 3 months. The hardware is a P2-200 with 64 MB of RAM and a 3Com 3c509 10BaseT card. Last night something happened and I was unable to access the network at all. The first thing we noticed was that DNS was down. I was able to ping the DNS machine, but I couldn't telnet or HTTP into it. I logged in at the console as root and tried to run "arp", and the command just hung. I pressed Control-C and tried "ifdown eth0" followed by "ifup eth0", but that didn't seem to fix the problem. If that were all, it would not be a big deal, but I was also unable to run "arp" on the Linux installation on my workstation, while "arp" on our NT systems ran just fine. I rebooted the DNS machine and everything was fine after that; once it was back up, my workstation was able to run "arp" again. What I don't understand is why a failure of "arp" on the DNS machine would cause "arp" to fail on other Linux machines. The fact that we couldn't run "arp" could mean something more serious, such as a failure in the network protocol stack that was sending out bad packets. I'm not sure. Could it be a bad NIC? We also have a WatchGuard firewall box, but that has been running for some time now. Could there be a bug in the kernel that has already been fixed? I would really like to know if other people are having this problem. I searched all over the web and couldn't find it. Thank you.
By the way, I looked at /var/log/messages and /var/log/security and was not able to find anything useful. The only entries that might be worth mentioning are:
    Jan 16 23:31:51 dns telnetd[3655]: ttloop: peer died: Invalid or incomplete multibyte or wide character
    Jan 17 09:09:46 dns telnetd[3886]: ttloop: peer died: Invalid or incomplete multibyte or wide character
These two lines are next to each other in /var/log/messages.
The ARP protocol (as opposed to the arp command) must be functioning, or you would not be able to ping your DNS box. The arp command relies on the ability to dynamically load modules AFAIR, so you might want to check which modules are loaded with lsmod if your problem recurs. Sanity check: if your DNS server was unavailable, many client commands and services would take a long time, because host resolution time-outs are painfully long. If that's the problem, you should think about setting up a secondary name server and configuring your clients to use it. FWIW, I usually find that a 2nd name server is not worth the effort when it is unreachable as well, since the time-out on a single server is already unacceptably slow, and adding a 2nd server just doubles the time before a command or service on a client machine gets around to printing an error message. Far easier to reboot the primary ... As for the telnetd messages, you might want to try the latest telnet*0.16* RPMs (there are two packages now, since the server and client have been separated). The message appears (I can't find the exact string) to be due to a read with a return code <= 0, which shouldn't be a problem (and is probably only indirectly related to the other problem). Just guessing ...
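To put rough numbers on the time-out remark above, here is a minimal sketch of the arithmetic. It assumes a deliberately simplified resolver model in which every configured name server is tried once per attempt and each try waits the full time-out; real resolvers are more elaborate. The function name worst_case_wait is hypothetical, and the defaults of 5 seconds and 2 attempts are assumptions taken from the values documented for current glibc in resolv.conf(5), which may not match what Red Hat 6.1 shipped.

```python
def worst_case_wait(servers: int, timeout_s: float = 5.0, attempts: int = 2) -> float:
    """Upper bound on the wait before a lookup fails when no server answers.

    Simplified model (an assumption, not the real resolver algorithm):
    each of `servers` name servers is tried once per attempt, and every
    try blocks for the full `timeout_s` seconds.
    """
    if servers < 1 or attempts < 1 or timeout_s < 0:
        raise ValueError("need at least one server, one attempt, and a non-negative timeout")
    return servers * timeout_s * attempts


if __name__ == "__main__":
    # Compare one unreachable server with two unreachable servers.
    for n in (1, 2):
        print(f"{n} server(s): up to {worst_case_wait(n):.0f} s before an error")
```

Under this model a second server that is also unreachable doubles the wait, which matches the remark above. A second server that is reachable would cost only the time-out spent on the dead primary before the answer arrives.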
This problem appears to be resolved. Please reopen if I'm wrong.