Bug 1478704 - NetworkManager DSL connection - not always establishing HTTPS connections when packets routed
Status: NEW
Product: Fedora
Classification: Fedora
Component: NetworkManager
Version: 26
Hardware: x86_64 Linux
Priority: unspecified, Severity: medium
Assigned To: Lubomir Rintel
QA Contact: Fedora Extras Quality Assurance
Reported: 2017-08-06 05:58 EDT by Corey
Modified: 2017-08-23 01:31 EDT
Type: Bug
Attachments: None

Description Corey 2017-08-06 05:58:29 EDT
Description of problem:

I have a new PC with Fedora 26 Workstation.  It has 2 NICs installed: one for the LAN (via Wifi), the other for the WAN (internet).  The ADSL2+ connection to the ISP on the WAN is configured using a NetworkManager DSL connection (created with nm-connection-editor).  This seems to connect fine to my ISP and creates a ppp0 interface.

Using the new PC's console, I can use the browser and connect to any website via HTTPS.  All good.

However, from a laptop connected to that new PC over Wifi, most HTTPS websites seem to work, but some *fail* to establish an HTTPS connection and can't be reached.  From using tcpdump and wireshark, the HTTPS handshake seems to stall at the 'Client Hello' and never completes, resulting in a timeout:

No.     Time           Source                Destination           Protocol Length Info
      1 0.000000       192.168.15.2          210.55.180.35         TCP      78     52207 → 443 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=32 TSval=1154684031 TSecr=0 SACK_PERM=1
      2 0.046218       210.55.180.35         192.168.15.2          TCP      60     443 → 52207 [SYN, ACK] Seq=0 Ack=1 Win=4380 Len=0 MSS=1460
      3 0.046315       192.168.15.2          210.55.180.35         TCP      54     52207 → 443 [ACK] Seq=1 Ack=1 Win=65535 Len=0
      4 0.046785       192.168.15.2          210.55.180.35         TLSv1    154    Client Hello
      6 0.095550       192.168.15.2          210.55.180.35         TCP      154    [TCP Retransmission] 52207 → 443 [PSH, ACK] Seq=1 Ack=1 Win=65535 Len=100
      7 0.143410       210.55.180.35         192.168.15.2          SSL      292    [TCP Previous segment not captured] , Continuation Data
      8 0.143474       192.168.15.2          210.55.180.35         TCP      54     [TCP Dup ACK 3#1] 52207 → 443 [ACK] Seq=101 Ack=1 Win=65535 Len=0
      9 10.203717      210.55.180.35         192.168.15.2          TCP      60     443 → 52207 [RST, ACK] Seq=4619 Ack=101 Win=4480 Len=0
     10 452.268757     192.168.15.2          210.55.180.35         TCP      54     52207 → 443 [FIN, ACK] Seq=101 Ack=1 Win=65535 Len=0
     11 452.599721     192.168.15.2          210.55.180.35         TCP      54     [TCP Retransmission] 52207 → 443 [FIN, ACK] Seq=101 Ack=1 Win=65535 Len=0
     12 453.058927     192.168.15.2          210.55.180.35         TCP      54     [TCP Retransmission] 52207 → 443 [FIN, ACK] Seq=101 Ack=1 Win=65535 Len=0
     13 453.776764     192.168.15.2          210.55.180.35         TCP      54     [TCP Retransmission] 52207 → 443 [FIN, ACK] Seq=101 Ack=1 Win=65535 Len=0
     14 455.010946     192.168.15.2          210.55.180.35         TCP      54     [TCP Retransmission] 52207 → 443 [FIN, ACK] Seq=101 Ack=1 Win=65535 Len=0

Which sites fail is not 'random': the websites that can't be connected to via HTTPS consistently don't work (examples below).  Others, like https://www.google.com, can be connected to via HTTPS and work fine.

It's not clear whether it's a routing issue or perhaps an MTU problem (the ppp0 interface reports MTU=1492 via nmcli, netstat -i, and ifconfig), but neither seems to make sense given that some websites work and some don't.
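
For reference, a rough sketch of the arithmetic behind the MTU suspicion. It assumes IPv4 with a 20-byte IP header and a 20-byte TCP header without options; the helper names mss_for_mtu and packet_for_mss are made up for this illustration. Both the SYN and the SYN, ACK in the capture above advertise MSS=1460, which corresponds to 1500-byte packets, while ppp0 reports an MTU of 1492. If path MTU discovery is not working somewhere along the way, full-sized segments from the server would not fit through the PPPoE link, whereas short replies would. That is only one possible reading of the capture, not a confirmed diagnosis.

```shell
#!/bin/sh
# Largest TCP payload (MSS) that fits in one packet of the given MTU.
# Assumes IPv4 (20-byte header) and TCP without options (20-byte header).
mss_for_mtu() {
    echo $(( $1 - 40 ))
}

# Size of the IP packet produced by a full-sized segment for the given MSS,
# under the same header-size assumptions.
packet_for_mss() {
    echo $(( $1 + 40 ))
}

mss_for_mtu 1492      # ppp0 MTU from the report; prints 1452
packet_for_mss 1460   # MSS advertised in the capture; prints 1500
```

Under these assumptions, a peer that honours MSS=1460 may send packets 8 bytes larger than ppp0 can carry.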


Example websites that can't be connected to:

https://www.quora.com
https://www.lucidchart.com
https://www.asb.co.nz

There doesn't seem to be anything obvious in common among the ones that can't be connected to.


Temporary Solution:

Through experimentation, I found a fix by:

1.  Turning off the DSL connection to my ISP from NetworkManager
2.  Installing Roaring Penguin PPPoE (rp-pppoe-3.12-8.fc26.x86_64.rpm) and configuring it
3.  Enabling the interface with: ifup ppp0
4.  The problem HTTPS websites above started working via Wifi.

This is what led me to conclude that there's possibly a problem with NetworkManager's DSL connection / configuration.



Version-Release number of selected component (if applicable):

NetworkManager-1.8.2-1.fc26.x86_64
NetworkManager-ppp-1.8.2-1.fc26.x86_64


How reproducible:

Steps to Reproduce:

1. PC set up with 2 NICs - one to a Wifi Network; one to the WAN (internet)
2. NetworkManager DSL connection to WAN configured via nm-connection-editor, then establish connection.
3. Ensure masquerading on WAN interface / IPv4 packet forwarding set up
4. Connect to Wifi network and browse to one of the example websites above.
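
Step 3 can be sketched roughly as follows. This is an assumed minimal setup for illustration, not the reporter's actual ruleset (which is not included in this report); ppp0 is the WAN interface named above, and the commands need to be run as root on the gateway PC.

```shell
# Enable IPv4 packet forwarding on the gateway.
sysctl -w net.ipv4.ip_forward=1

# Masquerade traffic leaving via the PPPoE interface.
iptables -t nat -A POSTROUTING -o ppp0 -j MASQUERADE

# Allow forwarded traffic out to the WAN, and replies back in.
iptables -A FORWARD -o ppp0 -j ACCEPT
iptables -A FORWARD -i ppp0 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
```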


Actual results:

Browsing to some websites over HTTPS results in an err_timeout after 20-30s.


Expected results:

HTTPS connection would be established.


Additional info:

- iptables is configured for masquerading, and IPv4 packet forwarding is enabled.  Was originally using firewalld but switched to pure iptables to try and eliminate that as a problem. 
- Everything seems to be routing correctly and no obvious issues with any unexpected packets being dropped (i.e., all dropped packets are being logged and I'm not seeing anything dropped for establishing the connection).  
- Happy to provide any other info required to assist.
Comment 1 Beniamino Galvani 2017-08-08 08:36:51 EDT
What's the MTU of the PPP interface with NM and when using pppoe directly?
Comment 2 Corey 2017-08-09 06:53:06 EDT
The MTU shows as 1492 on the ppp0 interface regardless of whether I'm using RP PPPoE or a DSL connection with NM.

I've checked 'ifconfig ppp0', 'netstat --interfaces=ppp0', and (when using NM) 'nmcli conn show {DSL interface}' and they all show 1492.
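
The interface MTU only describes the local link, so a probe with the Don't Fragment bit set can complement the checks above. The following is a sketch to run on the gateway PC while ppp0 is up; it assumes the iputils version of ping (for the -M do option) and uses the server address from the capture in the description, which may or may not answer ICMP echo requests.

```shell
# MTU as the kernel sees it.
cat /sys/class/net/ppp0/mtu
ip -o link show ppp0

# 1464 bytes of ICMP payload + 8 (ICMP header) + 20 (IPv4 header) = 1492,
# so this is the largest ping that should fit through ppp0 unfragmented.
ping -M do -c 3 -s 1464 210.55.180.35

# One byte more exceeds the 1492-byte MTU and should be refused.
ping -M do -c 3 -s 1465 210.55.180.35
```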
Comment 3 Corey 2017-08-23 01:31:32 EDT
This might be a red herring (it's getting beyond the limits of my knowledge about this stuff), but anyway ... I noticed that the roaring-penguin pppoe process runs with a '-m 1412' parameter setting the MSS (Maximum Segment Size).  From the pppoe man page:

-m MSS  Causes pppoe to clamp the TCP maximum segment size at the specified value.  Because of PPPoE overhead, the maximum segment size for PPPoE is smaller than for normal Ethernet encapsulation.  This could cause problems for machines on a LAN behind a gateway using PPPoE.  If you have a LAN behind a gateway, and the gateway connects to the Internet using PPPoE, you are strongly recommended to use a -m 1412 option.  This avoids having to set the MTU on all the hosts on the LAN.

This seems to describe what I have here - connecting over a Wifi interface to a Linux server that is masquerading the external ppp0 connection out to the internet.

Is it possible that the MSS when using NetworkManager's DSL connection is bigger than it should be?  Is it possible to tell what value is being used?
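
One way to see the MSS actually in use is to look at the TCP options of SYN packets as they cross ppp0. The sketch below also shows the kernel-side counterpart of the pppoe -m option, the iptables TCPMSS target. It is offered as a possible workaround to test the theory, not as a confirmed fix for this bug, and it assumes root access on the gateway PC and the ppp0 interface name from this report.

```shell
# Print four HTTPS SYN / SYN-ACK packets seen on ppp0; the MSS appears in the
# TCP options, e.g. "options [mss 1460,...]".
tcpdump -ni ppp0 -c 4 'tcp port 443 and tcp[tcpflags] & tcp-syn != 0'

# Possible workaround: rewrite the MSS option in forwarded SYN packets so it
# never exceeds what the path MTU allows.
iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN \
         -j TCPMSS --clamp-mss-to-pmtu
```

If the problem sites start working from the laptop with the clamp rule in place, that would point to the MSS rather than routing.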
