Bug 77775
Summary: | (NET)Neighbour Table and Lost Packets | ||
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | CJeness <cj> |
Component: | kernel | Assignee: | Arjan van de Ven <arjanv> |
Status: | CLOSED NOTABUG | QA Contact: | Brian Brock <bbrock> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 7.3 | CC: | davem |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i586 | ||
OS: | Linux | ||
URL: | www.sforest.org | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2003-06-09 05:33:13 UTC | Type: | --- |
Description
CJeness
2002-11-13 13:34:14 UTC
So which kernel are you running? I assume 2.4.18-17.7.x. What network card/driver are you using?

The "uname -a" command returns the following: 2.4.18-17.7.x. I upgrade at least weekly. We have two NICs in the computer. Our modules.conf file shows the following:

    alias eth1 eepro100
    alias eth0 3c59x

eth0 is connected to a BellSouth ADSL modem. This is the original ADSL, which used Ethernet and DHCP. eth1 connects to our internal LAN. I am using IPCHAINS for masquerading. In an attempt to resolve the slowdown issues, I have now turned off all of my IPCHAINS rules except FORWARD MASQ.

Can you try using the e100 module instead? (Replace eepro100 with e100 in modules.conf.)

I have changed my driver from eepro100 to e100 as requested. Apparently I missed the notification about this request; otherwise, I would have made the change sooner. I will provide an updated status after the system has run for a day.

Changing the driver from eepro100 to e100 has not resolved the problem. We continue to see the neighbour table overflow error. More importantly, this error seems to coincide with a high level of packet loss, which makes the computer unusable for Internet activities. Please keep in mind that this computer has been operating successfully under a version of RedHat using the eepro100 driver since 5.2. It was the install of 7.3 that triggered the problems. The previous version was 7.1. Therefore, there is something different in the kernel or some component of the 7.3 distribution which is triggering the problem. This problem is very serious, since the computer has to be rebooted about every 12 hours.

This message shows up when either of two things has happened:
1) The loopback device is misconfigured
2) The netmask on one of your interfaces is wrong

I am extremely confident in this statement, so please go and double, no, in fact triple check the loopback interface configuration and that of your interfaces.
2.2.x kernels used to be very lenient on misconfiguration in this area; 2.4.x is not, and you absolutely must get this right. That would explain why 7.1 did not show the behavior and 7.3 does.

Below are the results from "ifconfig". eth0 is set up by DHCP and I assume that the netmask is correct. The netmask for eth1 is correct. I am not sure how the loopback might be misconfigured, since I have never done anything special to configure it. I assumed that it would be set up appropriately as part of the installation process. Can you clarify what I should check on the loopback?

    eth0      Link encap:Ethernet  HWaddr 00:10:4B:25:5D:97
              inet addr:66.20.72.252  Bcast:66.20.75.255  Mask:255.255.252.0
              UP BROADCAST NOTRAILERS RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:58754 errors:0 dropped:0 overruns:0 frame:0
              TX packets:32899 errors:0 dropped:0 overruns:0 carrier:0
              collisions:2 txqueuelen:100
              RX bytes:58712841 (55.9 Mb)  TX bytes:5738838 (5.4 Mb)
              Interrupt:10 Base address:0x7c40

    eth1      Link encap:Ethernet  HWaddr 00:04:AC:1D:FF:13
              inet addr:192.168.1.14  Bcast:192.168.1.255  Mask:255.255.255.0
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:29034 errors:0 dropped:0 overruns:0 frame:0
              TX packets:22556 errors:0 dropped:0 overruns:0 carrier:0
              collisions:37 txqueuelen:100
              RX bytes:4542571 (4.3 Mb)  TX bytes:11461679 (10.9 Mb)
              Interrupt:11 Base address:0x7c20 Memory:f3eff000-f3eff038

    lo        Link encap:Local Loopback
              inet addr:127.0.0.1  Mask:255.0.0.0
              UP LOOPBACK RUNNING  MTU:16436  Metric:1
              RX packets:462 errors:0 dropped:0 overruns:0 frame:0
              TX packets:462 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:35052 (34.2 Kb)  TX bytes:35052 (34.2 Kb)

It has now been more than one week since I asked for clarification as to how the loopback interface could be misconfigured and what should be checked. Any help would be greatly appreciated.
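As a rough aid for the netmask check requested above, the following Python sketch derives the network and broadcast address from each interface's address and mask and compares the result with the broadcast that ifconfig reported. The function name `check_interface` is made up for this illustration; the addresses are copied from the ifconfig output in this report. It only shows that the values are internally consistent. It cannot tell whether the mask handed out by DHCP matches what the other hosts on that segment use.

```python
import ipaddress


def check_interface(addr: str, mask: str, bcast: str) -> dict:
    """Derive network/broadcast from addr and mask, compare with the reported broadcast."""
    iface = ipaddress.ip_interface(f"{addr}/{mask}")
    net = iface.network
    return {
        "network": str(net.network_address),
        "prefixlen": net.prefixlen,
        "expected_broadcast": str(net.broadcast_address),
        "broadcast_matches": str(net.broadcast_address) == bcast,
    }


# Values copied from the ifconfig output in this bug report.
eth0 = check_interface("66.20.72.252", "255.255.252.0", "66.20.75.255")
eth1 = check_interface("192.168.1.14", "255.255.255.0", "192.168.1.255")

print("eth0:", eth0)
print("eth1:", eth1)
```

For both interfaces the reported broadcast agrees with the one derived from the mask, so any mismatch would have to be between this host and its neighbours rather than within its own settings.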
So are you
1) 100% sure your netmask and broadcast address of eth0 are correct
2) 100% sure you didn't firewall off the loopback device?

Show us your routing table as well.

Also, do you think the behavior occurs when DHCP renegotiates eth0's address? That might be a clue.

Also, when I say "netmask is correct", I mean: does it match what other systems on that subnet are using? Having this not match is what causes neighbour table overflow messages. You say eth1 is correct, fine, but go and make sure eth0 is getting something legitimate.

Probably, when these messages are being printed, /proc/net/arp is full of bogus ARP entries because the netmask is incorrect. Next time it triggers, capture /proc/net/arp and attach it to this bug report. Thanks.

With regard to netmask, we have reviewed all of the computers that participate in the network and have verified that they all use a netmask of 255.255.255.0. Here are the contents of /proc/net/arp at the time that we received the many neighbour table overflow messages yesterday:

    IP address       HW type     Flags       HW address            Mask     Device
    192.168.1.25     0x1         0x2         00:20:E0:65:EA:4A     *        eth1
    66.20.72.1       0x1         0x2         00:02:3B:01:6B:94     *        eth0

This is obviously not what you expected. Also, here is the result of netstat -r:

    Kernel IP routing table
    Destination     Gateway          Genmask          Flags  MSS  Window  irtt  Iface
    192.168.1.0     *                255.255.255.0    U      40   0       0     eth1
    66.20.72.0      *                255.255.252.0    U      40   0       0     eth0
    127.0.0.0       *                255.0.0.0        U      40   0       0     lo
    default         adsl-20-72-1.as  0.0.0.0          UG     40   0       0     eth0

In terms of eth0 through DHCP, the results are always consistent.
In particular, I always see:

    eth0      Link encap:Ethernet  HWaddr 00:10:4B:25:5D:97
              inet addr:66.20.72.252  Bcast:66.20.75.255  Mask:255.255.252.0
              UP BROADCAST NOTRAILERS RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:56263 errors:0 dropped:0 overruns:0 frame:0
              TX packets:41404 errors:0 dropped:0 overruns:0 carrier:0
              collisions:1 txqueuelen:100
              RX bytes:33599901 (32.0 Mb)  TX bytes:3403690 (3.2 Mb)
              Interrupt:10 Base address:0x7c40

This is really baffling to me. I have 4 other computers that I upgraded from RedHat 7.1 to 7.3. All of these other computers are working fine. One of the 4 serves the web site for the Atlanta Java Users Group. The only differences between the problem computer and the other 3 include the following:

1. Problem computer uses the newer file system ext3. I had the installation upgrade my existing ext2 for /usr/local and /home.
2. Problem computer uses IPCHAINS with only the following statements, so it uses IP masquerading:

    :input ACCEPT
    :forward ACCEPT
    :output ACCEPT
    -P forward MASQ

3. Problem computer interacts with BellSouth DSL.
4. Hardware is different. The computer is an IBM 300 PL.

However, this problem computer has been running RedHat Linux and performing the same functions for about 4 years. In the past, its primary problem related to its S3Trio3D video card. This now seems to work OK.

What else can I look at? Should I try upgrading to RedHat 8, which I have already purchased?

Upgrading to 8.0 isn't likely to help much, as the errata kernels are nearly identical. I won't be able to help more with this until the new year. You could try taking masquerading out of the equation, if such an experiment is possible.

The primary reason that we have this computer is to do IP masquerading. Disabling masquerading would shut off our Internet access. We have been using the same IPCHAINS command for as long as we have had BellSouth DSL, or about 4 years.
When I first installed RedHat 7.3, I accepted your firewall settings to try to make our environment more secure. However, when we started having problems, I eliminated all the security and went back to just the single masquerading command.

Good news for all. This bug has been resolved by the following action: we disabled the network interface on the motherboard (eepro100) and installed a Netgear PCI LAN card. I don't know whether this was a software incompatibility (i.e. RedHat 7.3 and the eepro100 driver) or just a hardware failure which arose around the time that we upgraded to 7.3. We also upgraded the memory from 128 to 256 MB. Therefore, you may close this bug with whatever resolution code you deem appropriate. Thanks for the help and suggestions.

Just to clarify, you were using the eepro100 and e100 drivers from the Red Hat kernel rpms, right? Or were you using a vendor-supplied kernel module image? It'd be nice if this was indeed a convenient hardware failure of some sort, but I'm not convinced of that just yet :)

Yes, we tried both the e100 and the eepro100 drivers which are part of the RedHat distribution.

There is another issue that we see at work with this card. At work, we have a 100 Mb Ethernet switch running in full-duplex mode. When I installed RedHat 7.3 on one of the IBM 300 PLs at work, everything operated correctly except the network interactions, which were extremely slow. We thought that perhaps the NIC was not going into full-duplex mode. However, when I researched the driver parameters on the RedHat site, I concluded that there was no way to force full-duplex; the card was supposed to sense this. Some searches on Google seemed to confirm this. So we have seen other issues with this on-board NIC.

This full-duplex issue is actually a big problem for getting Linux adopted at work. The only desktop computers we have are IBM 300 PLs. I just have not had time to pursue it further.
You can control the duplex setting using the "ethtool" utility. Your original problem is gone so I'm closing this.
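For anyone repeating the /proc/net/arp capture requested earlier in this report, here is a minimal Python sketch that counts entries per device and lists any entry whose address lies outside the subnet of the interface it was learned on, which is the kind of bogus entry a wrong netmask was expected to produce. The helper names `parse_arp` and `summarize` are invented for this example, and the subnets are assumed from the ifconfig output in this report; the sample text is the capture that was posted.

```python
import ipaddress
from collections import Counter

# Capture posted in this report (column spacing normalised).
SAMPLE_ARP = """\
IP address       HW type     Flags       HW address            Mask     Device
192.168.1.25     0x1         0x2         00:20:E0:65:EA:4A     *        eth1
66.20.72.1       0x1         0x2         00:02:3B:01:6B:94     *        eth0
"""

# Assumed from the ifconfig output in this report.
SUBNETS = {
    "eth0": ipaddress.ip_network("66.20.72.0/22"),
    "eth1": ipaddress.ip_network("192.168.1.0/24"),
}


def parse_arp(text):
    """Parse /proc/net/arp style text into a list of dicts, skipping the header."""
    entries = []
    for line in text.splitlines()[1:]:
        fields = line.split()
        if len(fields) != 6:
            continue  # ignore blank or malformed lines
        ip, _hw_type, flags, mac, _mask, dev = fields
        entries.append({
            "ip": ipaddress.ip_address(ip),
            "flags": int(flags, 16),
            "mac": mac,
            "device": dev,
        })
    return entries


def summarize(entries, subnets):
    """Return per-device entry counts and the entries outside their device's subnet."""
    per_device = Counter(e["device"] for e in entries)
    off_subnet = [
        e for e in entries
        if e["device"] in subnets and e["ip"] not in subnets[e["device"]]
    ]
    return per_device, off_subnet


entries = parse_arp(SAMPLE_ARP)
per_device, off_subnet = summarize(entries, SUBNETS)
print("entries per device:", dict(per_device))
print("off-subnet entries:", [str(e["ip"]) for e in off_subnet])
```

On a live system the text could be read with `open("/proc/net/arp").read()` instead of the sample. For the capture posted here the sketch finds one entry per device and none off-subnet, consistent with the reporter's remark that the table did not look the way the netmask theory predicted.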