Bug 68072
Summary: | PPP over ADSL intermittent network timeouts | |
---|---|---|---
Product: | [Retired] Red Hat Linux | Reporter: | Bill Woodward <wpwood>
Component: | kernel | Assignee: | Arjan van de Ven <arjanv>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Brian Brock <bbrock>
Severity: | medium | Priority: | medium
Version: | 7.3 | CC: | boultonj
Hardware: | i686 | OS: | Linux
Doc Type: | Bug Fix | Last Closed: | 2004-09-30 15:39:44 UTC
Description
Bill Woodward
2002-07-06 02:07:50 UTC
Thank you for this very useful report. The fact that 2.4.9-XX works and 2.4.18 doesn't is valuable. The changes to the pppoe code between those versions are rather small, and I'll just revert to the 2.4.9 version.

I've seen something similar to this problem. It's not exactly the same, but close enough that I think it's related. In my case, certain web pages are consistently unviewable: download progress stops partway through these pages. I spent some time trying to track things down. I've attached a couple of tcpdump logs showing that certain packets have consistently incorrect checksums (the checksum is wrong on all retransmits). The tcpdump logs are from the firewall machine, which does the pppoe. torch.int-domain is the main machine, where the web browser is running. Behavior is the same if I run the web browser on the firewall.

The problem started when I switched to a new firewall machine, which runs RH 7.3. The old firewall ran 7.0, so both hardware and software changed, making it hard to narrow down the problem. I can investigate further if someone wants to point me in a particular direction.

Logs generated using the following command:

# tcpdump -l -s0 -v -i eth0 host 162.129.49.63

Created attachment 64146
tcpdump log of web connection stall condition

The problem that boultonj reported may not be the same. I saw a problem with Mozilla on Linux where it *consistently* hung on certain web pages. I solved this problem by going into the Mozilla HTTP Networking preferences and turning off HTTP 1.1, and unchecking 'Enable Pipelining' and 'Enable Persistent Connections'. I assumed that this was a Mozilla problem. The problem I'm reporting in this bug report is not consistent. The same page may work fine, hang, or get a name resolution error each time it is reloaded. Since it's dependent on the kernel version, I'm pretty sure it's a Linux bug.

More info: After checking the pppoe forum over at roaring penguin, I decided to check my syslog when the problem occurred. Lo and behold, I get the following set of messages when the Mozilla connection hangs:

--- snip /var/log/messages ---
Jul 9 00:13:46 localhost pppoe[16910]: Inactivity timeout... something wicked happened
Jul 9 00:13:46 localhost pppoe[16910]: Sent PADT
Jul 9 00:13:46 localhost pppd[16909]: Modem hangup
Jul 9 00:13:46 localhost pppd[16909]: Connection terminated.
Jul 9 00:13:46 localhost pppd[16909]: Connect time 8.4 minutes.
Jul 9 00:13:46 localhost pppd[16909]: Sent 654179 bytes, received 2061398 bytes.
Jul 9 00:13:46 localhost /etc/hotplug/net.agent: NET unregister event not supported
Jul 9 00:13:46 localhost pppd[16909]: Exit.
Jul 9 00:13:46 localhost adsl-connect: ADSL connection lost; attempting re-connection.
Jul 9 00:13:51 localhost pppd[17029]: pppd 2.4.1 started by root, uid 0
Jul 9 00:13:51 localhost pppd[17029]: Using interface ppp0
Jul 9 00:13:51 localhost pppd[17029]: Connect: ppp0 <--> /dev/pts/2
Jul 9 00:13:51 localhost /etc/hotplug/net.agent: assuming ppp0 is already up
Jul 9 00:13:51 localhost pppoe[17030]: PPP session is 10512
Jul 9 00:13:52 localhost pppd[17029]: local IP address 64.123.14.128
Jul 9 00:13:52 localhost pppd[17029]: remote IP address 64.123.15.254
Jul 9 00:13:52 localhost pppd[17029]: primary DNS address 151.164.20.201
Jul 9 00:13:52 localhost pppd[17029]: secondary DNS address 151.164.11.201
Jul 9 00:14:31 localhost pppoe[17030]: Inactivity timeout... something wicked happened
Jul 9 00:14:31 localhost pppoe[17030]: Sent PADT
Jul 9 00:14:31 localhost pppd[17029]: Modem hangup
Jul 9 00:14:31 localhost pppd[17029]: Connection terminated.
Jul 9 00:14:31 localhost pppd[17029]: Connect time 0.7 minutes.
Jul 9 00:14:31 localhost pppd[17029]: Sent 1474 bytes, received 14473 bytes.
Jul 9 00:14:31 localhost /etc/hotplug/net.agent: NET unregister event not supported
Jul 9 00:14:31 localhost pppd[17029]: Exit.
Jul 9 00:14:31 localhost adsl-connect: ADSL connection lost; attempting re-connection.
Jul 9 00:14:36 localhost pppd[17107]: pppd 2.4.1 started by root, uid 0
Jul 9 00:14:36 localhost pppd[17107]: Using interface ppp0
Jul 9 00:14:36 localhost pppd[17107]: Connect: ppp0 <--> /dev/pts/2
Jul 9 00:14:37 localhost /etc/hotplug/net.agent: assuming ppp0 is already up
Jul 9 00:14:37 localhost pppoe[17108]: PPP session is 10523
Jul 9 00:14:38 localhost pppd[17107]: local IP address 64.217.72.252
Jul 9 00:14:38 localhost pppd[17107]: remote IP address 64.217.73.254
Jul 9 00:14:38 localhost pppd[17107]: primary DNS address 151.164.20.201
Jul 9 00:14:38 localhost pppd[17107]: secondary DNS address 151.164.11.201
--- end snip ---

The log posted by wpwood on 7-9 is definitely a different problem from the one I'm having.
I don't believe that my problem is a Mozilla issue, as suggested in the message from 7-8, although when I switch firewall machines again (soon, I hope) I'm going to try changing the Mozilla settings as suggested to see if it makes life tolerable. Anyway, since it looks like my problem is different, I'll probably open up a new bug for it.

More info: I downloaded and built the 2.4.19-rc1 kernel from kernel.org. After this, the problem went away. I have no problem with running a custom version of the kernel, but I assume that there are other PPPoX users out there seeing the same problem. So, with the 2.4.19-rc1 kernel, no more hangs, and no "ADSL connection lost" errors in the syslog.

From the /var/log/messages snippet above, I see that it was the same problem I was experiencing. After some digging, I found that /etc/sysconfig/network-scripts/ifcfg-ppp0 contained PPPOE_TIMEOUT=20 and LCP_INTERVAL=20. I've read somewhere that the timeout should be about 4 times the LCP interval, so I changed it to 80 and the problem went away.

This seems to be fixed in Red Hat 8.0.

Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/
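
For anyone who wants to check their own ifcfg-ppp0 against the rule of thumb quoted in the comments (PPPOE_TIMEOUT about four times LCP_INTERVAL), here is a minimal Python sketch. The function names parse_ifcfg and timeout_looks_too_short are made up for illustration, and the factor of 4 is only what the commenter recalled reading, not a documented requirement. The script assumes the file consists of simple KEY=VALUE lines.

```python
# Sketch: compare PPPOE_TIMEOUT with LCP_INTERVAL in ifcfg-ppp0 style text.
# Helper names are hypothetical; the factor of 4 is the rule of thumb
# mentioned in this bug report, not an official value.


def parse_ifcfg(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def timeout_looks_too_short(settings, factor=4):
    """Return True when PPPOE_TIMEOUT is below factor * LCP_INTERVAL."""
    timeout = int(settings["PPPOE_TIMEOUT"])
    interval = int(settings["LCP_INTERVAL"])
    return timeout < factor * interval


# Values reported in this thread: the original settings and the changed ones.
original = "PPPOE_TIMEOUT=20\nLCP_INTERVAL=20\n"
changed = "PPPOE_TIMEOUT=80\nLCP_INTERVAL=20\n"

print(timeout_looks_too_short(parse_ifcfg(original)))  # True
print(timeout_looks_too_short(parse_ifcfg(changed)))   # False
```

To check a real system, read /etc/sysconfig/network-scripts/ifcfg-ppp0 and pass its contents to parse_ifcfg in place of the inline strings.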