Bug 465399

Summary: all 2.6.26 kernels break PPPoE
Product: [Fedora] Fedora Reporter: James Ralston <ralston>
Component: kernelAssignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED NOTABUG QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: high    
Version: 9CC: alan, kernel-maint
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2008-11-26 11:09:07 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description James Ralston 2008-10-03 05:16:23 UTC
I have been unable to get PPPoE working under any 2.6.26 kernels.

When booting a 2.6.26 kernel, these messages repeatedly show up in /var/log/messages, and the PPPoE interface is non-functional:

Oct  3 00:45:01 example pppd[3303]: pppd 2.4.4 started by root, uid 0
Oct  3 00:45:01 example pppd[3303]: Using interface ppp0
Oct  3 00:45:01 example pppd[3303]: Connect: ppp0 <--> /dev/pts/0
Oct  3 00:45:32 example pppd[3303]: LCP: timeout sending Config-Requests
Oct  3 00:45:32 example pppd[3303]: Connection terminated.
Oct  3 00:45:32 example pppd[3303]: Modem hangup
Oct  3 00:45:36 example pppoe[3304]: Timeout waiting for PADO packets
Oct  3 00:45:36 example pppd[3303]: Exit.
Oct  3 00:45:36 example pppoe-connect: PPPoE connection lost; attempting re-connection.

However, if I boot any 2.6.25 kernel (on the same machine without changing any installed packages), my PPPoE interface comes up perfectly.

Versions:

0:kernel-2.6.25.14-108.fc9.x86_64
0:kernel-2.6.26.3-29.fc9.x86_64
0:kernel-2.6.26.5-45.fc9.x86_64
0:ppp-2.4.4-7.fc9.x86_64
0:rp-pppoe-3.8-3.fc9.x86_64

Comment 1 Alan Cox 2008-10-03 19:21:37 UTC
At a root console "modprobe ppp_async' as root, then retry. Let me know if that helps.

I don't however understand why we aren't using the pppoe kernel module instead but thats another story as this stuff shouldnt have broken

Comment 2 James Ralston 2008-10-05 07:25:42 UTC
Ensuring that the ppp_async module is loaded has no effect, as far as I can tell; running pppoe under 2.6.26 kernels still fails, in exactly the same way...

Comment 3 Chuck Ebbert 2008-10-06 11:51:39 UTC
So it never works, not even once?

I can make a ppp connection with a USB EVDO modem but it hangs when trying to bring it up a second time. I get this message in the system log when taking down the connection:

localhost pppd[26606]: ioctl(TIOCSETD, N_TTY): Interrupted system call (line 562)

The only way I can "fix" it is to disconnect and reconnect the modem.

Comment 4 Alan Cox 2008-10-06 12:59:03 UTC
That last one would be a pppd bug  I think. Can you strace it Chuck. It would then fail next time as the ldisc
would still be set to N_PPP. If you were to force it back to N_TTY with ioctl(.. TIOCSETD ..) you'd I think find it then works.

What modem is being used by James and what supports it driver wise ?

Comment 5 Chuck Ebbert 2008-10-06 17:50:17 UTC
(In reply to comment #4)
> That last one would be a pppd bug  I think. Can you strace it Chuck. It would
> then fail next time as the ldisc
> would still be set to N_PPP. If you were to force it back to N_TTY with
> ioctl(.. TIOCSETD ..) you'd I think find it then works.

I never got that error until I upgraded from 2.6.25 to 2.6.26. The ppp package is unchanged.

Comment 6 Alan Cox 2008-10-06 19:36:36 UTC
So is that part of the kernel code.

And it has always been the case that TIOCSETD can be interrupted by a signal

Comment 7 James Ralston 2008-10-06 20:04:50 UTC
Chuck: correct; with 2.6.26, it never works, not even once.  (With 2.6.25, it works perfectly every time.)

Alan: I'm not sure what you mean by "what modem", as I'm using PPPoE.  The only physical modem involved is my ADSL modem (which functions as an Ethernet bridge).

Comment 8 Chuck Ebbert 2008-10-06 22:19:27 UTC
(In reply to comment #4)
> That last one would be a pppd bug  I think. Can you strace it Chuck.

If I strace it the timing changes and the problem doesn't happen reliably (I got it to happen once -- pppd cloned a thread and it exited immediately, delivering SIGCHLD during the TIOCSETD ioctl call.)

If I try to follow forks there is no output from the child but strace does this:

# strace -f -o /var/log/ppp.trace -p 16879
Process 16879 attached - interrupt to quit
Process 17019 attached
PANIC: attached pid 17019 exited
Process 16879 detached

Comment 9 Alan Cox 2008-10-06 22:31:37 UTC
Ok so thats what I suspected - its timing and its probably the 2.6.26 scheduler changes. PPPD needs to either retry the TIOCSETD or mask signals properly.

Comment 10 James Ralston 2008-10-07 02:17:33 UTC
I'm wondering if I'm seeing the same problem Chuck is seeing, or a different problem, as I haven't seen the "Interrupted system call" message.

Here are the ppp-related message I see with 2.6.26 on system boot:

Oct  5 03:17:29 example kernel: PPP generic driver version 2.4.2
Oct  5 03:18:43 example pppd[2488]: No response to 3 echo-requests
Oct  5 03:18:43 example pppd[2488]: Serial link appears to be disconnected.
Oct  5 03:18:43 example pppd[2488]: Connect time 1.4 minutes.
Oct  5 03:18:43 example pppd[2488]: Sent 72125 bytes, received 64460 bytes.
Oct  5 03:18:49 example pppd[2488]: Connection terminated.
Oct  5 03:18:49 example pppd[2488]: Modem hangup
Oct  5 03:18:49 example pppoe[2489]: read (asyncReadFromPPP): Session 33966: Input/output error
Oct  5 03:18:49 example pppoe[2489]: Sent PADT
Oct  5 03:18:49 example pppd[2488]: Exit.
Oct  5 03:18:49 example pppoe-connect: PPPoE connection lost; attempting re-connection.
Oct  5 03:18:54 example pppd[3216]: pppd 2.4.4 started by root, uid 0
Oct  5 03:18:54 example pppd[3216]: Using interface ppp0
Oct  5 03:18:54 example pppd[3216]: Connect: ppp0 <--> /dev/pts/0

With 2.6.25, I only see the "PPP generic driver version 2.4.2" message, and everything works.

Chuck, did you have to enable debugging in pppd to get the "Interrupted system call" message?

Comment 11 Chuck Ebbert 2008-10-08 01:44:16 UTC
(In reply to comment #10)
> Chuck, did you have to enable debugging in pppd to get the "Interrupted system
> call" message?

No.

You can strace pppd by editing /etc/sysconfig/network-scripts/ifup-ppp

Look for a line that says this:

 $exec pppd $opts ${MODEMPORT} ${LINESPEED} \

Change it to:

 $exec strace -o /var/log/ppp.trace pppd $opts ${MODEMPORT} ${LINESPEED} \

You can also try:

 $exec strace -f -o /var/log/ppp.trace pppd $opts ${MODEMPORT} ${LINESPEED} \

Then compare the ppp.trace files from 2.6.25 and 2.6.26 (be sure to copy the file somewhere after each attempt as it will get overwritten.)

Be careful what you upload to bugzilla from those logs since it might contain sensitive data.

Comment 12 James Ralston 2008-11-26 11:09:07 UTC
I finally tracked this down. It has nothing to do with pppd/pppoe.

I use the kernel's traffic control capabilities to shape the eth0 interface, which faces my DSL modem.  The shaper package I wrote does all of the calculations itself, but in essence, I was doing (in part):

$ tc class add dev eth0 parent 1: classid 1:1 htb rate 704kbit burst 6k ceil 704kbit cburst 6k
$ tc class add dev eth0 parent 1:1 classid 1:10 htb rate 620kbit burst 6k ceil 704 cburst 6k
$ tc class add dev eth0 parent 1:1 classid 1:11 htb rate 56kbit burst 6k ceil 704 cburst 6k
$ tc class add dev eth0 parent 1:1 classid 1:12 htb rate 28kbit burst 6k ceil 704 cburst 6k

The problem was that when creating the child classes, I was omitting the "kbit" unit specification for the ceiling rate. Through 2.6.25 kernels, if the units were unspecified, "kbit" was pretty clearly being assumed. Starting with 2.6.26, however, rates with unspecified units are interpreted as "bps" (bytes per second) instead.

So, in essence, pppoe was failing because its upstream bandwidth was being throttled down to "tin cans and string" rates.  :(

Closing with OOPSMYBAD. (Although Chuck, you should probably file a new bug against pppd based on Alan's observations in comment 9...)