Bug 465399
Summary: | all 2.6.26 kernels break PPPoE | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | James Ralston <ralston> |
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
Status: | CLOSED NOTABUG | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 9 | CC: | alan, kernel-maint |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2008-11-26 11:09:07 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
James Ralston
2008-10-03 05:16:23 UTC
At a root console "modprobe ppp_async' as root, then retry. Let me know if that helps. I don't however understand why we aren't using the pppoe kernel module instead but thats another story as this stuff shouldnt have broken Ensuring that the ppp_async module is loaded has no effect, as far as I can tell; running pppoe under 2.6.26 kernels still fails, in exactly the same way... So it never works, not even once? I can make a ppp connection with a USB EVDO modem but it hangs when trying to bring it up a second time. I get this message in the system log when taking down the connection: localhost pppd[26606]: ioctl(TIOCSETD, N_TTY): Interrupted system call (line 562) The only way I can "fix" it is to disconnect and reconnect the modem. That last one would be a pppd bug I think. Can you strace it Chuck. It would then fail next time as the ldisc would still be set to N_PPP. If you were to force it back to N_TTY with ioctl(.. TIOCSETD ..) you'd I think find it then works. What modem is being used by James and what supports it driver wise ? (In reply to comment #4) > That last one would be a pppd bug I think. Can you strace it Chuck. It would > then fail next time as the ldisc > would still be set to N_PPP. If you were to force it back to N_TTY with > ioctl(.. TIOCSETD ..) you'd I think find it then works. I never got that error until I upgraded from 2.6.25 to 2.6.26. The ppp package is unchanged. So is that part of the kernel code. And it has always been the case that TIOCSETD can be interrupted by a signal Chuck: correct; with 2.6.26, it never works, not even once. (With 2.6.25, it works perfectly every time.) Alan: I'm not sure what you mean by "what modem", as I'm using PPPoE. The only physical modem involved is my ADSL modem (which functions as an Ethernet bridge). (In reply to comment #4) > That last one would be a pppd bug I think. Can you strace it Chuck. If I strace it the timing changes and the problem doesn't happen reliably (I got it to happen once -- pppd cloned a thread and it exited immediately, delivering SIGCHLD during the TIOCSETD ioctl call.) If I try to follow forks there is no output from the child but strace does this: # strace -f -o /var/log/ppp.trace -p 16879 Process 16879 attached - interrupt to quit Process 17019 attached PANIC: attached pid 17019 exited Process 16879 detached Ok so thats what I suspected - its timing and its probably the 2.6.26 scheduler changes. PPPD needs to either retry the TIOCSETD or mask signals properly. I'm wondering if I'm seeing the same problem Chuck is seeing, or a different problem, as I haven't seen the "Interrupted system call" message. Here are the ppp-related message I see with 2.6.26 on system boot: Oct 5 03:17:29 example kernel: PPP generic driver version 2.4.2 Oct 5 03:18:43 example pppd[2488]: No response to 3 echo-requests Oct 5 03:18:43 example pppd[2488]: Serial link appears to be disconnected. Oct 5 03:18:43 example pppd[2488]: Connect time 1.4 minutes. Oct 5 03:18:43 example pppd[2488]: Sent 72125 bytes, received 64460 bytes. Oct 5 03:18:49 example pppd[2488]: Connection terminated. Oct 5 03:18:49 example pppd[2488]: Modem hangup Oct 5 03:18:49 example pppoe[2489]: read (asyncReadFromPPP): Session 33966: Input/output error Oct 5 03:18:49 example pppoe[2489]: Sent PADT Oct 5 03:18:49 example pppd[2488]: Exit. Oct 5 03:18:49 example pppoe-connect: PPPoE connection lost; attempting re-connection. Oct 5 03:18:54 example pppd[3216]: pppd 2.4.4 started by root, uid 0 Oct 5 03:18:54 example pppd[3216]: Using interface ppp0 Oct 5 03:18:54 example pppd[3216]: Connect: ppp0 <--> /dev/pts/0 With 2.6.25, I only see the "PPP generic driver version 2.4.2" message, and everything works. Chuck, did you have to enable debugging in pppd to get the "Interrupted system call" message? (In reply to comment #10) > Chuck, did you have to enable debugging in pppd to get the "Interrupted system > call" message? No. You can strace pppd by editing /etc/sysconfig/network-scripts/ifup-ppp Look for a line that says this: $exec pppd $opts ${MODEMPORT} ${LINESPEED} \ Change it to: $exec strace -o /var/log/ppp.trace pppd $opts ${MODEMPORT} ${LINESPEED} \ You can also try: $exec strace -f -o /var/log/ppp.trace pppd $opts ${MODEMPORT} ${LINESPEED} \ Then compare the ppp.trace files from 2.6.25 and 2.6.26 (be sure to copy the file somewhere after each attempt as it will get overwritten.) Be careful what you upload to bugzilla from those logs since it might contain sensitive data. I finally tracked this down. It has nothing to do with pppd/pppoe. I use the kernel's traffic control capabilities to shape the eth0 interface, which faces my DSL modem. The shaper package I wrote does all of the calculations itself, but in essence, I was doing (in part): $ tc class add dev eth0 parent 1: classid 1:1 htb rate 704kbit burst 6k ceil 704kbit cburst 6k $ tc class add dev eth0 parent 1:1 classid 1:10 htb rate 620kbit burst 6k ceil 704 cburst 6k $ tc class add dev eth0 parent 1:1 classid 1:11 htb rate 56kbit burst 6k ceil 704 cburst 6k $ tc class add dev eth0 parent 1:1 classid 1:12 htb rate 28kbit burst 6k ceil 704 cburst 6k The problem was that when creating the child classes, I was omitting the "kbit" unit specification for the ceiling rate. Through 2.6.25 kernels, if the units were unspecified, "kbit" was pretty clearly being assumed. Starting with 2.6.26, however, rates with unspecified units are interpreted as "bps" (bytes per second) instead. So, in essence, pppoe was failing because its upstream bandwidth was being throttled down to "tin cans and string" rates. :( Closing with OOPSMYBAD. (Although Chuck, you should probably file a new bug against pppd based on Alan's observations in comment 9...) |