Red Hat Bugzilla – Bug 197440
ppp occasionally dies
Last modified: 2007-11-30 17:11:36 EST
Description of problem:
I use ppp in dial on demand mode and it occasionally locks up and then dies.
Typically I will see the modem still indicating outbound packets, but no inbound
packets and then a little while later the ppp interface will disappear. I can
use ifdown and ifup to get things working again. This happens sporadically.
Sometimes I will see it happen a couple of times in an hour and other times I
can be connected for over 8 hours without having the problem.
I first noticed this behavior after upgrading to the 2.6.16 kernel, however some
other things changed around that time as well. My dialup provider changed who
they buy dialup service from. So it might have been a preexisting bug that
wasn't triggered by the previous service provider. I also started running the
zaptel kernel module (for asterisk) around that time and there could potentially
be a problem with it.
I probably should have tried going back to a 2.6.15 kernel for a while, but
these seem to have been removed from updates now so I can't try this easily.
(I think the last 2.6.15 kernel update for FC5 might have some security issues,
but they would probably be low risk for me and I could try this if it would help
and I can get a copy.)
I don't know if this is related or not, but I also have been getting the
following log message, though I don't know that it correlates with when I have
the described problem:
PPP: VJ uncompressed error
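One way to check whether this message lines up with the hangs is to pull the timestamp of each occurrence out of the system log and compare those against the times the interface went away. The following is only a rough Python sketch: it assumes the traditional syslog line layout ("Mon DD HH:MM:SS host ...") and the function name `vj_error_times` is made up for illustration.

```python
import re

# Assumed traditional syslog prefix, e.g. "Jul  3 14:02:11 host kernel: ..."
SYSLOG_LINE = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+(.*)$")
MESSAGE = "PPP: VJ uncompressed error"


def vj_error_times(lines):
    """Return the timestamp strings of log lines containing the VJ error."""
    times = []
    for line in lines:
        match = SYSLOG_LINE.match(line)
        if match and MESSAGE in match.group(2):
            times.append(match.group(1))
    return times
```

Passing it an open log file (for example /var/log/messages, if that is where kernel messages end up on the system) yields the list of times to compare with the observed failures.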
Version-Release number of selected component (if applicable):
2.6.16 and 2.6.17 smp kernels
Steps to Reproduce:
1. Make a dial up connection using ppp
ppp occasionally will lock up and then usually die shortly thereafter.
Here are some questions:
Is pppd dying at this moment?
Is it reproducible without that additional kernel module?
Please use a debugger or strace on pppd if it is happening again.
I saw at least one instance over last weekend.
I am still just playing with Asterisk and can uninstall the zaptel modules for a
while to see if that stops the crashes.
The next time I see a hang I will see if I can attach a debugger before the ppp
interface disappears. That will be a learning process for me, so I might botch an
attempt or a few, before I get it right.
It isn't clear to me that the hangs are a Fedora problem, but once they occur I
should be able to do a sighup to get ppp to hang up the connection without
losing the ppp interface.
Since it has been a few days I thought I should give an update.
I upgraded to the 2.6.17-2145 kernel and uninstalled the zaptel kernel module.
I haven't seen a recurrence of the problem yet, but only have spent about 4
hours connected using ppp, so it is a bit early to say that zaptel was the problem.
I should rack up some more hours over the weekend. If it doesn't occur for over
something like 16 hours, then I will try putting zaptel back and see if the
failures start up again.
If it does turn out to be related to zaptel, it may not be the kernel module
itself. I believe that the device (TDM400) it runs generates 8000 interrupts per
second and the problem could also be related to that. I'll probably be over my
head trying to debug zaptel if things point that way, but I might ask you guys
for some general advice on how to approach doing that if needed.
It died again. zaptel wasn't installed and I was running the 2.6.17-2145
smp kernel. I wasn't around when it happened, so I didn't get to try to do
anything between the network hang and the interface disappearing. I'll now
try to work on capturing some information about how and/or where it is dying.
Please check if pppd is dying or if there is a problem with the ppp kernel driver.
Is pppd still running? What state is it in? Please have a look at the ps output
and use a debugger and/or strace on pppd if it is still there.
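For anyone who would rather script the "is pppd still running, and in what state" check than read ps output by hand, here is a minimal Python sketch. It assumes a Linux /proc filesystem, where each /proc/<pid>/status file carries `Name:` and `State:` fields; the helper names `parse_status` and `find_processes` are hypothetical.

```python
import os


def parse_status(text):
    """Extract the Name and State fields from /proc/<pid>/status content."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in ("Name", "State"):
            fields[key] = value.strip()
    return fields


def find_processes(name, proc_root="/proc"):
    """Return (pid, state) for every running process whose name matches."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries
        try:
            with open(os.path.join(proc_root, entry, "status")) as status:
                fields = parse_status(status.read())
        except OSError:
            continue  # process exited while we were scanning
        if fields.get("Name") == name:
            found.append((int(entry), fields.get("State", "?")))
    return found
```

An empty result from `find_processes("pppd")` would suggest the daemon itself has exited. A process that is present but stuck in state "D" (uninterruptible sleep) would point more toward the kernel side.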
Created attachment 132671 [details]
strace of pppd
I finally got an strace of pppd. I X'd out a prefix for my dialup password in
one of the writes, but otherwise the file is intact.
I am still seeing the ppp daemon crash. Is there anything else that would be
helpful for isolating what is happening?
P.S. I am not sure I checked the "I am providing the requested information for
this bug." last time, but am now as the status stayed 'NEEDINFO' after I
uploaded the strace data.
Apparently I had something else checked or that check box didn't do what I
thought. The ticket should be still open as the problem continues to happen.
So far I haven't been seeing this in Fedora 7. Occasionally the connection hangs,
but the pppd process seems to stay usable if I hang up and reconnect. I haven't
been running F7 very long, so it is possible the problem may just be happening
less often, but based on what I am seeing, I think you can close this when FC5
is EOL'd. (I never ended up testing dial up in FC6, so I don't know if there is
a problem there or not.)
I think I have seen this once now in that I found the pppd process no longer
running. I wasn't actively using the machine when it happened. I don't think
there is any reason it should have stopped running.
Things are still a lot better than with FC5.
Please verify this with a newer version of Red Hat Enterprise Linux or
Fedora Core and reopen it against the new version if there is still a problem.
Closing as "WONTFIX" for now.
I am pretty sure the problem I was having here is fixed in F7. I have only seen
the one possible recurrence and I am not even sure about that one. The common
triggering problem that occurred previously is still occurring. My ISP will stop
sending packets to my modem sometimes. But now when that happens ppp stays up,
whereas before it almost always died. The communication problem is not going to
be isolatable as to whose fault it is, so there is no point in opening a ticket
versus Fedora for it. (My suspicion is that the connection isn't resynching
sometimes after retrains.)