Bug 1153315 - Multiple bugs in NetworkManager-l2tp
Summary: Multiple bugs in NetworkManager-l2tp
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: NetworkManager-l2tp
Version: 20
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Ivan Romanov
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-10-15 18:29 UTC by Alan Stern
Modified: 2015-06-29 22:54 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2015-06-29 22:54:58 UTC
Type: Bug
Embargoed:


Attachments
Kernel stack dump preceding deadlock (4.75 KB, text/plain)
2014-10-15 18:29 UTC, Alan Stern
Log messages from VPN connection attempt (26.30 KB, text/plain)
2014-10-15 18:38 UTC, Alan Stern
Make "ipsec setup restart" work if the service isn't already running (1.27 KB, patch)
2014-10-18 22:12 UTC, Alan Stern
Add a delay after the ipsec service is started (854 bytes, patch)
2014-10-18 22:15 UTC, Alan Stern
Make the ipsec.secrets file not world-readable (1.33 KB, patch)
2014-10-18 22:16 UTC, Alan Stern
Remove an unnecessary command (790 bytes, patch)
2014-10-18 22:18 UTC, Alan Stern
Keep track of the IPsec connections state (1.61 KB, patch)
2014-10-18 22:19 UTC, Alan Stern
Fix the parameters passed to the ipsec daemon (2.09 KB, patch)
2014-10-18 22:20 UTC, Alan Stern

Description Alan Stern 2014-10-15 18:29:16 UTC
Created attachment 947288 [details]
Kernel stack dump preceding deadlock

In brief: I'm trying to set up a VPN connection, using PPP over L2TP over IPSEC to a Cisco ASA server.  The server works fine with Windows and OS-X clients, but not with NetworkManager-l2tp.

I have tracked down multiple reasons for these failures.  Some I have worked around but others are baffling.  I will need help to fix them.

Some information to start:

[stern@saphir ~]$ uname -a
Linux saphir.localdomain 3.16.4-200.fc20.i686 #1 SMP Mon Oct 6 13:22:51 UTC 2014 i686 i686 i386 GNU/Linux
[stern@saphir ~]$ rpm -q NetworkManager-l2tp xl2tpd NetworkManager-openswan libreswan ppp
NetworkManager-l2tp-0.9.8.7-1.fc20.i686
xl2tpd-1.3.6-1.fc20.i686
package NetworkManager-openswan is not installed
libreswan-3.10-3.fc20.i686
ppp-2.4.5-34.fc20.i686

First problem: selinux violations.  This has been reported before in bug #887674 and supposedly was fixed.  Not on my system.  For now I work around this by running "setenforce 0" before starting the VPN connection.  There doesn't seem to be any point in trying to fix this issue until everything else is working.

Second problem: The /usr/libexec/nm-l2tp-service program doesn't start the pluto daemon properly.  Probably because it's not integrated with systemd, but I didn't try to investigate the reason for the failure.  I edited the program's source code and in nm_l2tp_start_ipsec() changed

       "[ \"x$defaultrouteaddr\" = \"x\" ] && ipsec setup restart");
to
       "[ \"x$defaultrouteaddr\" = \"x\" ] && /usr/bin/systemctl restart ipsec.service");
       sleep(2);

The sleep() appears to be necessary because the daemon takes some time to get started.  That worked (and so did starting the ipsec service by hand before turning on the VPN connection).

Third problem: The following line, where the program does

       sys += system("PATH=/usr/local/sbin:/usr/sbin:/sbin ipsec whack"
                       " --listen");

always returns a nonzero value, which messes up the error handling.  I don't know why the return value is nonzero, but I changed the program to ignore the return value instead of adding it into sys.
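
One detail that may be relevant here: system() returns a wait status in the format used by waitpid(), not the command's exit code, so summing several results into one variable can give a misleading total. The following is only a sketch of decoding that status; the helper name run_command is hypothetical and does not come from nm-l2tp-service.c.

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper (not part of nm-l2tp-service.c): run a shell command
 * and return its exit code, or -1 if it could not be run or was killed
 * by a signal. */
static int run_command(const char *cmd)
{
    int status = system(cmd);

    if (status == -1)
        return -1;              /* the child could not be created or reaped */
    if (!WIFEXITED(status))
        return -1;              /* the command was terminated by a signal */
    return WEXITSTATUS(status); /* the command's own exit code, 0..255 */
}
```

With a helper like this, each command's exit code can be logged and checked on its own, which would at least show what "ipsec whack --listen" actually exits with.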

Fourth problem: When the program creates a temporary /etc/ipsec.secrets file to store the PSK, the file it creates is world-readable!  Even though the file is meant to persist only for a short time, this is obviously a security breach -- especially when something goes wrong and the temporary file is not erased (which happened to me repeatedly).
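
For illustration, a file can be given owner-only permissions at the moment it is created, so there is no window in which other users can read it. This is a minimal sketch and not the attached patch; the helper name write_secret_file is hypothetical.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create `path` with mode 0600 and write `secret`
 * into it. Returns 0 on success, -1 on error. */
static int write_secret_file(const char *path, const char *secret)
{
    /* The mode is applied by open() itself, so the file is never
     * group- or world-readable. O_EXCL makes the call fail if the
     * file already exists instead of reusing it. */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
    if (fd < 0)
        return -1;

    size_t len = strlen(secret);
    ssize_t written = write(fd, secret, len);
    int rc = (written == (ssize_t)len) ? 0 : -1;

    if (close(fd) != 0)
        rc = -1;
    if (rc != 0)
        unlink(path);   /* do not leave a partial secrets file behind */
    return rc;
}
```

O_EXCL is a design choice in this sketch. If the file is expected to exist already, it would have to be dropped and the permissions tightened with fchmod() before any secret is written.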

Fifth problem: When the program creates the libreswan config file, it does not specify a rightprotoport parameter.  The pluto daemon ends up using 17/0 by default, and the VPN server doesn't like this.  Without a port number, the server doesn't realize that this connection will use L2TP.  I added

       write_config_option (ipsec_fd, "  rightprotoport=17/1701\n");

to the appropriate place in the program source, which resolved this issue.
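
For reference, 17 is the IP protocol number for UDP and 1701 is the registered L2TP port, so 17/1701 restricts the IPsec connection to L2TP traffic. The sketch below shows where the option sits in a "conn" section; the helper name format_l2tp_conn is hypothetical, and a real section needs more options than the two shown.

```c
#include <stdio.h>

/* Hypothetical helper: format a deliberately incomplete "conn" section
 * into `buf`. Returns 0 on success, -1 if the buffer is too small. */
static int format_l2tp_conn(char *buf, size_t size,
                            const char *name, const char *gateway)
{
    int n = snprintf(buf, size,
                     "conn %s\n"
                     "  right=%s\n"
                     "  rightprotoport=17/1701\n", /* UDP, L2TP port */
                     name, gateway);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```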

Sixth problem: After all the previous fixes, the connection is established.  xl2tpd and pppd are started.  But they don't work right.  For example, /var/log/secure contains this entry:

Oct 13 17:11:27 saphir pluto[2182]: ERROR: "nm-ipsec-l2tpd-2573" #1: sendto on wlan0 to 140.247.233.37:4500 failed in NAT-T Keep Alive. Errno 105: No buffer space available

That error 105 comes straight from the kernel, but I don't know the reason.  A possibly related series of errors appears in /var/log/messages, multiple repetitions of:

Oct 13 17:10:28 saphir NetworkManager: xl2tpd[2616]: network_thread: select timeout

followed by:

Oct 13 17:11:32 saphir NetworkManager: xl2tpd[2616]: Maximum retries exceeded for tunnel 33716.  Closing.

I'm at a loss to tell what the reason is for those failures.

Also, as far as I can tell, no data ever gets transmitted through the PPP tunnel.  On numerous attempts, the computer deadlocked.  I was able to capture a stack dump from one of those occasions; it is in the first attachment.  As near as I can interpret it, the routine ppp_channel_push() (near the end of the dump) acquires the pch->downl spinlock and then calls pch->chan->ops->start_xmit(), which indirectly calls down to ppp_push(), which tries to acquire the same spinlock.  This seems to indicate that the kernel tried to route a PPP-encapsulated packet through the PPP tunnel itself, which is clearly wrong.  I don't know what could cause this -- some sort of routing misconfiguration?

I've got detailed logs (probably *too* detailed) showing the setup of the IPSEC tunnel, which seems to work okay.  (And indeed, the client does receive an IP address that's in the VPN server's DHCP range, so some information definitely is getting sent.)  But even though I have enabled debugging for xl2tpd and pppd, the logs don't contain much to indicate what's going wrong.  I'll also attach the relevant portion of /var/log/messages.

Comment 1 Alan Stern 2014-10-15 18:38:22 UTC
Created attachment 947289 [details]
Log messages from VPN connection attempt

This is what appeared in /var/log/messages (with PSK and password redacted) during a recent VPN connection attempt.

Comment 2 Sergey 2014-10-15 20:58:03 UTC
Hi, Alan.

Thank you for the great bug report. Unfortunately, as the upstream developer, I don't use the IPsec part of the plugin. The IPsec support was contributed before I joined this project, so I really have no expertise in it.

But if you can make a patch, I'll definitely take a look at it and will try to merge it.

Comment 3 Alan Stern 2014-10-18 22:09:19 UTC
I've got a series of six (!) patches fixing various aspects of this thing.  With them in place, NetworkManager succeeds in setting up and tearing down the VPN tunnel.  But the tunnel still doesn't work; I will need help to debug the underlying problem.

I will attach the patches to this bug report.  The first one changes a file in the libreswan package, but the others affect only nm-l2tp-service.c.

Comment 4 Alan Stern 2014-10-18 22:12:57 UTC
Created attachment 948191 [details]
Make "ipsec setup restart" work if the service isn't already running

Patch 1: change the "ipsec setup" script.  It's not entirely clear that this is the right thing to do; however there needs to be some way to (1) start the service if it isn't running and (2) restart it if it is.  Currently the "start" command does (1) and the "restart" command does (2), but nothing does both.

Comment 5 Alan Stern 2014-10-18 22:15:00 UTC
Created attachment 948193 [details]
Add a delay after the ipsec service is started

Without this change, nm-l2tp-service tries to connect to the ipsec service before the service's daemon is fully initialized.  I don't know how to determine a good length for the delay, but one second works on my system.
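
One possible alternative to a fixed delay is to poll for some sign that the daemon is up and to give up after a timeout. The sketch below waits for a filesystem path to appear. Using pluto's control socket as that path (for example /run/pluto/pluto.ctl) is an assumption that would need to be checked against the installed libreswan, and the helper name wait_for_path is hypothetical.

```c
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: wait until `path` exists, checking every 100 ms
 * for roughly `timeout_ms` milliseconds. Returns 0 once the path exists,
 * -1 on timeout. */
static int wait_for_path(const char *path, int timeout_ms)
{
    const struct timespec step = { 0, 100 * 1000 * 1000 }; /* 100 ms */
    int waited_ms = 0;
    struct stat st;

    for (;;) {
        if (stat(path, &st) == 0)
            return 0;           /* the path has appeared */
        if (waited_ms >= timeout_ms)
            return -1;          /* gave up waiting */
        nanosleep(&step, NULL);
        waited_ms += 100;
    }
}
```

The existence of a socket file does not prove that the daemon is already accepting commands, so this only narrows the race; it does not remove it.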

Comment 6 Alan Stern 2014-10-18 22:16:12 UTC
Created attachment 948194 [details]
Make the ipsec.secrets file not world-readable

Patch 3: This one is a no-brainer.

Comment 7 Alan Stern 2014-10-18 22:18:28 UTC
Created attachment 948195 [details]
Remove an unnecessary command

Patch 4: It appears from the name that the person who added cmd11 in the first place knew that it did pretty much the same thing as cmd1.  There's no explanation in the code for why both are present, and it works just as well without cmd11.

Comment 8 Alan Stern 2014-10-18 22:19:24 UTC
Created attachment 948196 [details]
Keep track of the IPsec connections state

Patch 5: another no-brainer.

Comment 9 Alan Stern 2014-10-18 22:20:43 UTC
Created attachment 948197 [details]
Fix the parameters passed to the ipsec daemon

Patch 6: Without the rightprotoport parameter in particular, my server will never accept the VPN connection request.

Comment 10 Sergey 2014-10-28 15:09:58 UTC
(In reply to Alan Stern from comment #3)
> I've got a series of six (!) patches fixing various aspects of this thing. 
> With them in place, NetworkManager succeeds in setting up and tearing down
> the VPN tunnel.  But the tunnel still doesn't work; I will need help to
> debug the underlying problem.
> 
> I will attach the patches to this bug report.  The first one changes a file
> in the libreswan package, but the others affect only nm-l2tp-service.c.

Hi. I've scheduled a task to review and apply your patches in the project's bug tracker here: https://github.com/seriyps/NetworkManager-l2tp/issues/25

Comment 11 Alan Stern 2014-10-28 17:47:23 UTC
Okay.  I don't know how the upstream developers for libreswan would react to proposed patch #1 (which isn't in the right format for a source-package change anyway).  They might think the way things work currently is preferable, in which case nm-l2tp-service would have to work around the problem (for example, by stopping ipsec and then starting it instead of simply doing a restart).
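
The stop-then-start workaround mentioned above could look roughly like the following. This is a sketch only: the helper name stop_then_start is hypothetical, and the commands are passed in as parameters because the exact invocations (for example "ipsec setup stop" and "ipsec setup start") are assumptions here.

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: run `stop_cmd`, ignore its result, then run
 * `start_cmd`. Returns the exit code of `start_cmd`, or -1 if it could
 * not be run or was killed by a signal. */
static int stop_then_start(const char *stop_cmd, const char *start_cmd)
{
    /* Stopping a service that is not running may fail; that is the
     * expected case this workaround exists for, so the result is
     * deliberately ignored. */
    int ignored = system(stop_cmd);
    (void) ignored;

    int status = system(start_cmd);
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```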

I figured out the reason why my VPN connection didn't work even after all these changes.  It turned out to be a configuration problem in the VPN server, which I was able to work around -- it has nothing to do with NetworkManager.  Which is good, because it means that issue is irrelevant to this bug report.

On the other hand, I haven't yet tried to investigate the selinux violations.  I'll add more information about that when I have a chance.

Comment 12 Fedora End Of Life 2015-05-29 13:05:35 UTC
This message is a reminder that Fedora 20 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 20. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '20'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 20 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 13 Fedora End Of Life 2015-06-29 22:54:58 UTC
Fedora 20 changed to end-of-life (EOL) status on 2015-06-23. Fedora 20 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

