Bug 146459

Summary: ppp with channel bonding wrong interface assigned to clients
Product: [Fedora] Fedora
Reporter: John Horne <john.horne>
Component: ppp
Assignee: Thomas Woerner <twoerner>
Status: CLOSED NOTABUG
Severity: high
Priority: medium
Version: 3
CC: mattdm
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2006-07-11 10:00:02 UTC

Description John Horne 2005-01-28 15:19:24 UTC
Description of problem:
On a system with channel bonding, ppp selects the wrong proxyarp
interface for ppp clients. It should select the bonded interface but
does not: it checks each interface that is up and uses the last
suitable one seen (when the netmasks are the same). As such the bond0
interface is seen but then discarded in favour of eth0, and clients
assigned to eth0 have no Internet connection. Deleting the client's
eth0 entry from the arp table and adding a bond0 entry then allows
Internet traffic.

The problem is that with an incorrect arp table entry the upstream
router sends arp queries for the client IP/MAC address but the FC3
server never sends an arp reply (tcpdump shows this). As such traffic
from the client can get out but never gets back to the client. After
making the change above to the arp table entry, tcpdump shows arp
replies being sent out.

Doing something on the FC3 server like 'ping -I eth0 141.163.1.250'
causes an error (Destination Host Unreachable), whereas using bond0 it
works.

Note that on an FC2 server this worked fine, bond0 was always used.
Unfortunately the FC2 server is live so cannot test with it.

Version-Release number of selected component (if applicable):
kernel-2.6.10-1.741_FC3
ppp-2.4.2-6.4.FC3

How reproducible:
Configure system for channel bonding.
Connect to system using ppp.
Log shows that eth0 interface has been selected for proxyarp; arp
table shows that ppp client has an entry with eth0 interface. Client
has no Internet, or local, connectivity.

Actual results:
PPP client has no network connectivity.

Expected results:
Client should see the Internet.

Additional info:
Done some testing with the PPP source RPM - note setting SYSDEBUG (or
was it DEBUGSYS) causes the pppd daemon to die (signal 11). Obviously
a bug in the debugging code!?

Modified pppd to send out more info about the interfaces looked at.
Log shows:

=========================================================
Jan 28 14:49:01 fred pppd[1913]: proxy arp: scanning 7 interfaces for IP 141.163.108.30
Jan 28 14:49:01 fred pppd[1913]: proxy arp: examining interface lo
Jan 28 14:49:02 fred pppd[1913]: proxy arp: examining interface bond0
Jan 28 14:49:02 fred pppd[1913]: proxy arp: interface addr 141.163.109.250 maskffff
Jan 28 14:49:02 fred pppd[1913]: proxy arp: examining interface eth0
Jan 28 14:49:02 fred pppd[1913]: proxy arp: interface addr 141.163.109.250 maskffff
Jan 28 14:49:02 fred pppd[1913]: proxy arp: examining interface eth1
Jan 28 14:49:02 fred pppd[1913]: proxy arp: examining interface eth2
Jan 28 14:49:02 fred pppd[1913]: proxy arp: examining interface eth3
Jan 28 14:49:02 fred pppd[1913]: proxy arp: examining interface ppp0
Jan 28 14:49:02 fred pppd[1913]: found interface eth0 for proxy arp
Jan 28 14:49:02 fred pppd[1913]: local  IP address 192.168.108.20
Jan 28 14:49:02 fred pppd[1913]: remote IP address 141.163.108.30
============================================================

As can be seen bond0 is determined as suitable, but the code continues
looking. It then sees eth0 and since that is the last one selected it
uses that one.
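
The behaviour described above can be sketched in a few lines of C. This is not the pppd source: `struct candidate` and `pick_last_best` are hypothetical names, and the `>=` tie-break is an assumption chosen to reproduce what the log shows (a later interface with an equal netmask replaces an earlier one). Addresses are in host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one candidate interface; not a pppd type. */
struct candidate {
    const char *name;
    uint32_t    addr;   /* IPv4 address, host byte order */
    uint32_t    mask;   /* netmask, host byte order */
};

/*
 * Return the name of the chosen interface, or NULL if none matches.
 * Assumption: ties on netmask go to the later entry (>=), so the
 * result depends on the order in which interfaces are enumerated.
 */
const char *pick_last_best(const struct candidate *c, size_t n,
                           uint32_t client)
{
    const char *best = NULL;
    uint32_t bestmask = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (((client ^ c[i].addr) & c[i].mask) != 0)
            continue;                   /* client is not on this subnet */
        if (c[i].mask >= bestmask) {    /* equal mask: later entry wins */
            best = c[i].name;
            bestmask = c[i].mask;
        }
    }
    return best;
}
```

With bond0 listed before eth0, both at 141.163.109.250 with netmask 255.255.254.0, and a client at 141.163.108.30, this returns eth0. With the two entries swapped it returns bond0.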

'ifconfig' shows:

==============================================
bond0     Link encap:Ethernet  HWaddr 00:08:02:E6:57:1B
          inet addr:141.163.109.250  Bcast:141.163.109.255  Mask:255.255.254.0
          inet6 addr: fe80::208:2ff:fee6:571b/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:66629 errors:0 dropped:0 overruns:0 frame:0
          TX packets:31094 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:5146109 (4.9 MiB)  TX bytes:4948431 (4.7 MiB)

eth0      Link encap:Ethernet  HWaddr 00:08:02:E6:57:1B
          inet addr:141.163.109.250  Bcast:141.163.109.255  Mask:255.255.254.0
          inet6 addr: fe80::208:2ff:fee6:571b/64 Scope:Link
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:51545 errors:0 dropped:0 overruns:0 frame:0
          TX packets:31087 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:4240959 (4.0 MiB)  TX bytes:4948509 (4.7 MiB)
===================================================

As can be seen the netmask is the same, hence both interfaces are
deemed by pppd to be 'useable'.
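
The subnet test that makes both interfaces look suitable can be checked by hand. A minimal sketch, assuming the usual XOR-and-mask comparison; `on_same_subnet` is a hypothetical helper, not a pppd function:

```c
#include <arpa/inet.h>
#include <sys/socket.h>

/*
 * Return 1 if 'client' lies on the subnet of 'addr'/'mask', 0 if it
 * does not, and -1 if any of the dotted-quad strings fails to parse.
 * All three values stay in network byte order; XOR and AND are
 * bitwise, so the byte order does not affect the result.
 */
int on_same_subnet(const char *client, const char *addr, const char *mask)
{
    struct in_addr c, a, m;

    if (inet_pton(AF_INET, client, &c) != 1 ||
        inet_pton(AF_INET, addr, &a) != 1 ||
        inet_pton(AF_INET, mask, &m) != 1)
        return -1;

    return ((c.s_addr ^ a.s_addr) & m.s_addr) == 0;
}
```

For the values in this report, `on_same_subnet("141.163.108.30", "141.163.109.250", "255.255.254.0")` returns 1: the third octets 108 and 109 differ only in their lowest bit, which the 254 in the netmask masks off. Since bond0 and eth0 carry the same address and netmask, the test cannot tell them apart.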

Comment 1 John Horne 2005-01-28 15:48:17 UTC
As far as I can see, line 229 of pppd/sys-linux.c should include
IFF_SLAVE. That is:

  #define FLAGS_MASK (IFF_UP          | IFF_BROADCAST | \
                      IFF_POINTOPOINT | IFF_LOOPBACK  | IFF_NOARP | \
                      IFF_SLAVE)

This then causes pppd to skip 'slave' interfaces and use the master
interface. After testing this, pppd correctly used the bond0 interface
and the client had network connectivity.
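
To see why adding IFF_SLAVE to the mask skips enslaved interfaces, here is a minimal sketch of a flag test. The `FLAGS_MASK` value is the one proposed above. `FLAGS_GOOD` and the XOR-and-mask test are assumptions about how the mask is applied (this report does not quote that part of sys-linux.c), and `usable_for_proxy_arp` is a hypothetical helper. It is Linux-specific.

```c
#define _DEFAULT_SOURCE         /* expose the IFF_* flags in glibc's <net/if.h> */
#include <net/if.h>

/* Assumption: flags that must be set on a usable interface. */
#define FLAGS_GOOD (IFF_UP | IFF_BROADCAST)

/* Flags that are examined; IFF_SLAVE is the proposed addition. */
#define FLAGS_MASK (IFF_UP          | IFF_BROADCAST | \
                    IFF_POINTOPOINT | IFF_LOOPBACK  | IFF_NOARP | \
                    IFF_SLAVE)

/*
 * Return 1 if the examined flags are exactly FLAGS_GOOD, else 0.
 * Any flag in FLAGS_MASK that is not in FLAGS_GOOD must be clear,
 * so an interface with IFF_SLAVE set is rejected.
 */
int usable_for_proxy_arp(unsigned int flags)
{
    return ((flags ^ FLAGS_GOOD) & FLAGS_MASK) == 0;
}
```

With the flags from the ifconfig output above, bond0 (UP BROADCAST RUNNING MASTER MULTICAST) passes the test, while eth0 (UP BROADCAST RUNNING SLAVE MULTICAST) fails it because IFF_SLAVE is now part of the mask. Flags outside the mask, such as IFF_MASTER and IFF_MULTICAST, are ignored.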

Comment 2 John Horne 2005-01-28 16:11:28 UTC
A quick test with an FC2 server whilst no-one was using it, showed:

=======================================================
Jan 28 16:06:32 barney pppd[8248]: proxy arp: scanning 7 interfaces for IP 141.163.106.1
Jan 28 16:06:32 barney pppd[8248]: proxy arp: examining interface lo
Jan 28 16:06:32 barney pppd[8248]: proxy arp: examining interface eth0
Jan 28 16:06:32 barney pppd[8248]: proxy arp: examining interface eth1
Jan 28 16:06:32 barney pppd[8248]: proxy arp: examining interface eth2
Jan 28 16:06:32 barney pppd[8248]: proxy arp: interface addr 141.163.107.250 mask ffff
Jan 28 16:06:32 barney pppd[8248]: found interface to be used eth2
Jan 28 16:06:32 barney pppd[8248]: proxy arp: examining interface eth3
Jan 28 16:06:32 barney pppd[8248]: proxy arp: examining interface bond0
Jan 28 16:06:32 barney pppd[8248]: proxy arp: interface addr 141.163.107.250 mask ffff
Jan 28 16:06:32 barney pppd[8248]: found interface to be used bond0
Jan 28 16:06:32 barney pppd[8248]: proxy arp: examining interface ppp0
Jan 28 16:06:32 barney pppd[8248]: found interface bond0 for proxy arp
====================================================

As can be seen the bond0 interface is not looked at until after the
eth ones. As such bond0 is selected and the clients are happy.
Could it be that the ordering (?) of the interfaces has changed in the
kernel somewhere/somehow between FC2 and FC3?
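
One way to compare the ordering on two machines is to print the interfaces in the order the kernel returns them. The sketch below assumes pppd obtains its list with the SIOCGIFCONF ioctl (this report does not say so); `list_interfaces_in_kernel_order` and `MAX_IFS` are hypothetical names. It is Linux-specific, and SIOCGIFCONF only reports interfaces that have an IPv4 address.

```c
#define _DEFAULT_SOURCE         /* expose struct ifconf and IFF_* in glibc */
#include <net/if.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_IFS 64              /* hypothetical upper bound for this sketch */

/*
 * Print each interface, in the order SIOCGIFCONF returns it, and mark
 * bonding masters and slaves. Returns the number of entries, or -1 if
 * the socket or the ioctl fails.
 */
int list_interfaces_in_kernel_order(FILE *out)
{
    struct ifreq ifs[MAX_IFS];
    struct ifconf ifc;
    int s, n, i;

    s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0)
        return -1;

    memset(&ifc, 0, sizeof(ifc));
    ifc.ifc_len = sizeof(ifs);
    ifc.ifc_req = ifs;
    if (ioctl(s, SIOCGIFCONF, &ifc) < 0) {
        close(s);
        return -1;
    }

    n = ifc.ifc_len / (int)sizeof(struct ifreq);
    for (i = 0; i < n; i++) {
        struct ifreq fl;
        short flags = 0;

        /* Look up the flags for this name; leave them at 0 on failure. */
        memset(&fl, 0, sizeof(fl));
        memcpy(fl.ifr_name, ifs[i].ifr_name, sizeof(fl.ifr_name));
        if (ioctl(s, SIOCGIFFLAGS, &fl) == 0)
            flags = fl.ifr_flags;

        fprintf(out, "%d: %s%s%s\n", i, ifs[i].ifr_name,
                (flags & IFF_MASTER) ? " MASTER" : "",
                (flags & IFF_SLAVE)  ? " SLAVE"  : "");
    }

    close(s);
    return n;
}
```

Calling `list_interfaces_in_kernel_order(stdout)` on the FC2 and FC3 servers and comparing the output would show whether bond0 comes before or after its slave interfaces on each.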

FC2 server has rpms kernel-2.6.10-1.9_FC2 and ppp-2.4.2-3.FC2.1

John.


Comment 3 John Horne 2006-06-19 11:46:22 UTC
I have upgraded one of our VPN/PPTP servers from FC3 to FC5. This problem seems
to have gone away now. I can see that the relevant part of the pppd code is
still the same, but the order of interfaces returned by the kernel may well have
changed. The ppp daemon is reporting that it is using the bonded interface
(bond0) for the clients.

Close this bugzilla report if you wish to.


John.

Comment 4 Matthew Miller 2006-07-10 22:49:56 UTC
Fedora Core 3 is now maintained by the Fedora Legacy project for security
updates only. If this problem is a security issue, please reopen and
reassign to the Fedora Legacy product. If it is not a security issue and
hasn't been resolved in the current FC5 updates or in the FC6 test
release, reopen and change the version to match.

Thank you!


Comment 5 John Horne 2006-07-11 10:00:02 UTC
This seems to be fixed in FC5 (see comment 3). I'll close the bug report.