Bug 68072

Summary: PPP over ADSL intermittent network timeouts
Product: [Retired] Red Hat Linux
Reporter: Bill Woodward <wpwood>
Component: kernel
Assignee: Arjan van de Ven <arjanv>
Status: CLOSED CURRENTRELEASE
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 7.3
CC: boultonj
Hardware: i686   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2004-09-30 15:39:44 UTC
Attachments:
tcpdump log of web connection stall condition (flags: none)

Description Bill Woodward 2002-07-06 02:07:50 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.0) Gecko/20020607

Description of problem:
Under Red Hat 7.3, with the latest kernel (2.4.18-5) and also with the original
(2.4.18-3?), running a PPPoE DSL connection, I see frequent browser timeouts
that clear up quickly, hung browser connections, and general network
instability.  Usually it shows up as hanging page loads in Mozilla, or name
resolution failures that resolve after a few seconds.  However, it happens
often (about once a minute) and sometimes forces me to kill and restart
Mozilla, so I consider it a serious problem.

Version-Release number of selected component (if applicable):


How reproducible:
Couldn't Reproduce

Steps to Reproduce:
I cannot reproduce this on demand, but it happens every time I use the network.
As noted below, I also saw it on Red Hat 7.2, and it depends on the kernel
version.  I will be happy to provide more info if you have suggested steps to
diagnose it.

Additional info:

I first noticed this problem running Red Hat 7.2 with a self-built 2.4.18 kernel
(not sure which minor version).  Since this wasn't an official kernel, I dropped
back to whichever 2.4.9 kernel was the latest available for 7.2, and the
problem went away.  Because downgrading is not an option on Red Hat 7.3, I feel
it needs to be resolved.

Comment 1 Arjan van de Ven 2002-07-06 19:17:15 UTC
Thank you for this very useful report. The fact that 2.4.9-XX works and 2.4.18
doesn't is valuable. The changes to the pppoe code between those versions are
rather small, and I'll just revert to the 2.4.9 version.

Comment 2 boultonj 2002-07-07 20:28:51 UTC
I've seen something similar to this problem.  It's not exactly the
same, but close enough that I think it's related.  In my case, certain
web pages are consistently unviewable - download progress stops part
of the way through these pages.  I spent some time trying to track
things down.  I've attached a couple of tcpdump logs showing that certain
packets have consistently incorrect checksums (the checksum is wrong on
every retransmission).  The tcpdump logs are from the firewall machine,
which does the pppoe.  torch.int-domain is the main machine, where the
web browser is running.  Behavior is the same if I run the web browser
on the firewall.  The problem started when I switched to a new
firewall machine, which runs RH 7.3.  The old firewall ran 7.0, so
both hardware and software changed, making it hard to narrow down the
problem.  I can investigate further if someone wants to point me in a
particular direction.

Logs generated using the following command:

# tcpdump -l -s0 -v -i eth0 host 162.129.49.63

Comment 3 boultonj 2002-07-07 20:30:37 UTC
Created attachment 64146 [details]
tcpdump log of web connection stall condition

Comment 4 Bill Woodward 2002-07-08 15:07:50 UTC
The problem that boultonj reported may not be the same.  I saw a problem with
Mozilla on Linux where it *consistently* hung on certain web pages.  I solved
this problem by going into the Mozilla HTTP Networking preferences and turning
off HTTP 1.1, and unchecking 'Enable Pipelining' and 'Enable Persistent
Connections'.  I assumed that this was a Mozilla problem.

The problem I'm reporting in this bug report is not consistent.  The same page
may work fine, hang, or get a name resolution error each time it is reloaded. 
Since it's dependent on the kernel version, I'm pretty sure it's a Linux bug.

Comment 5 Bill Woodward 2002-07-09 05:19:36 UTC
More info:  After checking the pppoe forum over at Roaring Penguin, I decided to
check my syslog when the problem occurred.  Lo and behold, I get the following
set of messages whenever the Mozilla connection hangs:


--- snip /var/log/messages ---
Jul  9 00:13:46 localhost pppoe[16910]: Inactivity timeout... something wicked happened
Jul  9 00:13:46 localhost pppoe[16910]: Sent PADT
Jul  9 00:13:46 localhost pppd[16909]: Modem hangup
Jul  9 00:13:46 localhost pppd[16909]: Connection terminated.
Jul  9 00:13:46 localhost pppd[16909]: Connect time 8.4 minutes.
Jul  9 00:13:46 localhost pppd[16909]: Sent 654179 bytes, received 2061398 bytes.
Jul  9 00:13:46 localhost /etc/hotplug/net.agent: NET unregister event not supported
Jul  9 00:13:46 localhost pppd[16909]: Exit.
Jul  9 00:13:46 localhost adsl-connect: ADSL connection lost; attempting re-connection.
Jul  9 00:13:51 localhost pppd[17029]: pppd 2.4.1 started by root, uid 0
Jul  9 00:13:51 localhost pppd[17029]: Using interface ppp0
Jul  9 00:13:51 localhost pppd[17029]: Connect: ppp0 <--> /dev/pts/2
Jul  9 00:13:51 localhost /etc/hotplug/net.agent: assuming ppp0 is already up
Jul  9 00:13:51 localhost pppoe[17030]: PPP session is 10512
Jul  9 00:13:52 localhost pppd[17029]: local  IP address 64.123.14.128
Jul  9 00:13:52 localhost pppd[17029]: remote IP address 64.123.15.254
Jul  9 00:13:52 localhost pppd[17029]: primary   DNS address 151.164.20.201
Jul  9 00:13:52 localhost pppd[17029]: secondary DNS address 151.164.11.201
Jul  9 00:14:31 localhost pppoe[17030]: Inactivity timeout... something wicked happened
Jul  9 00:14:31 localhost pppoe[17030]: Sent PADT
Jul  9 00:14:31 localhost pppd[17029]: Modem hangup
Jul  9 00:14:31 localhost pppd[17029]: Connection terminated.
Jul  9 00:14:31 localhost pppd[17029]: Connect time 0.7 minutes.
Jul  9 00:14:31 localhost pppd[17029]: Sent 1474 bytes, received 14473 bytes.
Jul  9 00:14:31 localhost /etc/hotplug/net.agent: NET unregister event not supported
Jul  9 00:14:31 localhost pppd[17029]: Exit.
Jul  9 00:14:31 localhost adsl-connect: ADSL connection lost; attempting re-connection.
Jul  9 00:14:36 localhost pppd[17107]: pppd 2.4.1 started by root, uid 0
Jul  9 00:14:36 localhost pppd[17107]: Using interface ppp0
Jul  9 00:14:36 localhost pppd[17107]: Connect: ppp0 <--> /dev/pts/2
Jul  9 00:14:37 localhost /etc/hotplug/net.agent: assuming ppp0 is already up
Jul  9 00:14:37 localhost pppoe[17108]: PPP session is 10523
Jul  9 00:14:38 localhost pppd[17107]: local  IP address 64.217.72.252
Jul  9 00:14:38 localhost pppd[17107]: remote IP address 64.217.73.254
Jul  9 00:14:38 localhost pppd[17107]: primary   DNS address 151.164.20.201
Jul  9 00:14:38 localhost pppd[17107]: secondary DNS address 151.164.11.201
--- end snip ---
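The drop pattern in the snippet above (a pppoe "Inactivity timeout" line followed by pppd's "Connect time" line) can be pulled out of a long syslog with a short filter. This is a minimal sketch, assuming the standard syslog layout shown above ("Mon DD HH:MM:SS host tag[pid]: message") with each message on one line; summarise_pppoe_drops is a hypothetical helper, not part of rp-pppoe or the Red Hat initscripts.

```shell
#!/bin/sh
# Hypothetical helper: list PPPoE inactivity drops and session lengths
# from a syslog file. Assumes one message per line in the layout
# "Mon DD HH:MM:SS host tag[pid]: message".
summarise_pppoe_drops() {
    log=${1:-/var/log/messages}
    awk '
        # pppoe gave up on the session: print the timestamp fields.
        /pppoe\[[0-9]+\]: Inactivity timeout/ {
            drops++
            print "drop at " $1 " " $2 " " $3
        }
        # pppd reports the session length, e.g. "Connect time 8.4 minutes."
        /pppd\[[0-9]+\]: Connect time/ {
            print "  session lasted " $(NF-1) " min"
        }
        END { printf "%d inactivity timeout(s)\n", drops }
    ' "$log"
}
```

Run as `summarise_pppoe_drops /var/log/messages`; against the two drops in the snippet it should report 2 inactivity timeouts, with sessions of 8.4 and 0.7 minutes.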

Comment 6 boultonj 2002-07-21 10:22:48 UTC
The log posted by wpwood on 7-9 is definitely a different problem from
the one I'm having.  I don't believe that my problem is a Mozilla
issue, as suggested in the message from 7-8, although when I switch
firewall machines again (soon, I hope) I'm going to try changing the
Mozilla settings as suggested to see if it makes life tolerable.

Anyway, since it looks like my problem is different, I'll probably open
up a new bug for it.

Comment 7 Bill Woodward 2002-07-22 15:24:44 UTC
More info -- I downloaded and built the 2.4.19-rc1 kernel from kernel.org.
After this, the problem went away.  I have no problem running a custom
kernel, but I assume there are other PPPoX users out there seeing the same
problem.

So, with the 2.4.19-rc1 kernel, no more hangs, and no "ADSL connection lost"
errors in the syslog.


Comment 8 Carlos Rodrigues 2002-09-02 00:23:34 UTC
From the /var/log/messages snippet above, I can see it was the same problem I was
experiencing. After some digging I found that
/etc/sysconfig/network-scripts/ifcfg-ppp0 contained PPPOE_TIMEOUT=20 and
LCP_INTERVAL=20. I had read somewhere that the timeout should be about 4
times the LCP interval, so I changed PPPOE_TIMEOUT to 80, and the problem went away.
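The change described above can be checked with a small script. A plausible reading of the failure is that with PPPOE_TIMEOUT equal to LCP_INTERVAL, the only traffic on an idle link is the LCP echo exchange every 20 seconds, so one late echo reply lets pppoe's inactivity timer expire and produces the "Inactivity timeout" messages shown earlier. This is a sketch under stated assumptions: ifcfg-ppp0 uses plain VAR=value shell syntax, the "about 4 times" rule of thumb is the target, and check_pppoe_timeout is a hypothetical helper.

```shell
#!/bin/sh
# Example ifcfg-ppp0 values after the change (only the relevant lines):
#   LCP_INTERVAL=20
#   PPPOE_TIMEOUT=80

# Hypothetical helper: warn when PPPOE_TIMEOUT is below 4 * LCP_INTERVAL.
# Exit status: 0 = ok, 1 = timeout too short, 2 = file or values missing.
check_pppoe_timeout() {
    cfg=${1:-/etc/sysconfig/network-scripts/ifcfg-ppp0}
    # Source the file in a subshell so its variables do not leak out.
    (
        unset LCP_INTERVAL PPPOE_TIMEOUT
        . "$cfg" || exit 2
        if [ -z "$LCP_INTERVAL" ] || [ -z "$PPPOE_TIMEOUT" ]; then
            echo "LCP_INTERVAL or PPPOE_TIMEOUT not set in $cfg"
            exit 2
        fi
        want=$((LCP_INTERVAL * 4))
        if [ "$PPPOE_TIMEOUT" -lt "$want" ]; then
            echo "PPPOE_TIMEOUT=$PPPOE_TIMEOUT is below 4*LCP_INTERVAL ($want)"
            exit 1
        fi
        echo "PPPOE_TIMEOUT=$PPPOE_TIMEOUT is at least 4*LCP_INTERVAL ($want)"
    )
}
```

With the original values (both 20) the check fails and reports a target of 80; after the edit it passes.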

Comment 9 Carlos Rodrigues 2002-10-17 10:08:16 UTC
This seems to be fixed in Red Hat 8.0

Comment 10 Bugzilla owner 2004-09-30 15:39:44 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/