Bug 1634286

Summary: spontaneous ARP poisoning of other hosts when connecting multiple interfaces on the same network
Product: Fedora
Component: NetworkManager
Version: 29
Hardware: All
OS: Linux
Status: CLOSED EOL
Severity: medium
Priority: unspecified
Keywords: Reopened
Reporter: Joel Stienlet <j.stienlet>
Assignee: Lubomir Rintel <lkundrak>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: alexl, bgalvani, dcbw, fgiudici, frederickkaempfer, john.j5live, j.stienlet, laine, lkundrak, mclasen, rhughes, rstrode, sandmann, thaller
Last Closed: 2019-11-27 19:58:39 UTC

Description Joel Stienlet 2018-09-29 15:04:32 UTC
Hello,

I recently ran into a strange bug, and I think the problem may come from NetworkManager. I have now set up my network manually and everything is fine for me, but other people may still experience the problem, hence this bug report.

First of all, the description of my problem: 
I use ssh with public key auth (client: 192.168.1.19  server: 192.168.1.16). 
After some inactivity, ssh is slow to establish a connection (~5 sec or more on my gigabit ethernet LAN). Any subsequent connection made within 5 minutes takes the normal delay (~0.5 sec). I would expect ssh to always take ~0.5 sec to establish its connection, never 5 sec.

The bug is fairly reproducible (>80% of the tests).

Steps I took to find out where the bug comes from:
-> on client (.19): "ssh -vvv" was halting at "debug1: Connecting to 192.168.1.16 [192.168.1.16] port 22" => this was not that informative, apparently something was wrong in the TCP/IP/eth chain...
-> I plugged a raspberry pi with a second ethernet port (usb<->eth adapter + pi's native eth connector) configured as a bridge between the ssh client (.19) and the ssh server (.16), and ran wireshark. It showed some strange packets:
[TCP Retransmission] 22->43310 
[TCP Spurious Retransmission] 43310-> 22
etc... (7 such packets highlighted by wireshark (default settings))
When the connection establishes quickly (~0.5 seconds instead of 5 seconds), no packet is highlighted.
-> I plugged a USB<->eth adapter into the ssh client (.19) to rule out a kernel bug in the r8169 driver. Since ssh was still slow after some inactivity, a bug in that driver is excluded.
-> I also tested ssh from a VM (arch kernel 4.18.5 installed on 192.168.1.19, sharing its network via NAT): ssh to 192.168.1.16 was slow too.
-> ssh to 192.168.1.16 is fast from any other machine.
-> I tried to disable networkmanager and configure the network settings manually:

$sudo systemctl disable NetworkManager.service 
$sudo shutdown -r now
$sudo ifconfig enp5s0 up 192.168.1.19 netmask 255.255.255.0 
$route add default gw 192.168.1.1
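
(For reference, the same manual configuration expressed with the iproute2 tools would look roughly like this; these are equivalent commands, not the ones actually run:)

$ sudo ip link set dev enp5s0 up
$ sudo ip addr add 192.168.1.19/24 dev enp5s0
$ sudo ip route add default via 192.168.1.1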

now every ssh connection is reliably fast.

=> My conclusion is that this bug comes from NetworkManager, but I haven't investigated it any further.

Comment 1 Thomas Haller 2018-10-04 07:26:18 UTC
Sounds like sshd tries to resolve the DNS name of the client, which takes a while before it times out. Either your DNS is configured wrongly on the server, or you should configure sshd not to do that.

Comment 2 Joel Stienlet 2018-10-04 10:21:25 UTC
Hello Thomas,
Thank you for your reply.

=> Indeed this was one hypothesis when debugging this, so I already disabled reverse DNS lookup on the server (at least I think so; I copied my sshd_config and /etc/sysconfig/sshd below for double-checking if needed).

=> I did an additional test this morning: with NetworkManager still disabled, I ran dhclient manually on the ssh client (.1.19) after reboot (to always start from a well-known initial state):
  $dhclient enp5s0
The result is: my ssh is fast, just as fast as it was when I simply did the manual ifconfig:
  $ifconfig enp5s0 up 192.168.1.19 netmask 255.255.255.0
  $route add default gw 192.168.1.1
Note that the DHCP server has always been configured to give 192.168.1.19 as IP to this machine, so the IP of the ssh client doesn't change between manual dhclient, manual ifconfig, or NetworkManager's DHCP.
This test should make everything equal on the DHCP server side (the DHCP server now knows about 192.168.1.19, in case the query is somehow memorized(?)). So any reverse DNS query from the ssh server (.1.16) would now get the same answer from the DHCP server (just as if it had been NetworkManager on the ssh client .1.19 that had made the request).

=> The question is also: why would a bad configuration on the ssh server side have made a difference when I upgraded the ssh client from F28 to F29? (the server version and configuration didn't change)

=> When I connect from other machines on the LAN (which are not running F29), the connection is also fast. If the problem were on the ssh-server side, I would expect them to be slow too.

=> A failed reverse DNS lookup from the ssh server would not (at least should not), IMHO, explain these spurious TCP packets on port 22 evidenced by wireshark.

=> What I will do now: enable NetworkManager again, and keep an eye on the logs to see if *something* happens after some idle time.

Thanks for your efforts and advice in solving this bug.
Best regards.

------------------------------------------------------------

$cat /etc/sysconfig/sshd

# Configuration file for the sshd service.

# The server keys are automatically generated if they are missing.
# To change the automatic creation, adjust sshd.service options for
# example using  systemctl enable sshd-keygen  to allow creation
# of DSA key or  systemctl mask sshd-keygen  to disable RSA key
# creation.

# Do not change this option unless you have hardware random
# generator and you REALLY know what you are doing

SSH_USE_STRONG_RNG=0
# SSH_USE_STRONG_RNG=1

# System-wide crypto policy:
# To opt-out, uncomment the following line
# CRYPTO_POLICY=
OPTIONS=-u0

------------------------------------------------------------

$cat /etc/ssh/sshd_config

#	$OpenBSD: sshd_config,v 1.101 2017/03/14 07:19:07 djm Exp $

# This is the sshd server system-wide configuration file.  See
# sshd_config(5) for more information.

# This sshd was compiled with PATH=/usr/local/bin:/usr/bin

# The strategy used for options in the default sshd_config shipped with
# OpenSSH is to specify options with their default value where
# possible, but leave them commented.  Uncommented options override the
# default value.

# If you want to change the port on a SELinux system, you have to tell
# SELinux about this change.
# semanage port -a -t ssh_port_t -p tcp #PORTNUMBER
#
#Port 22
#AddressFamily any
#ListenAddress 0.0.0.0
#ListenAddress ::

#HostKey /etc/ssh/ssh_host_rsa_key
##HostKey /etc/ssh/ssh_host_dsa_key
#HostKey /etc/ssh/ssh_host_ecdsa_key
HostKey /etc/ssh/ssh_host_ed25519_key

# Ciphers and keying
#RekeyLimit default none

# System-wide Crypto policy:
# If this system is following system-wide crypto policy, the changes to
# Ciphers, MACs, KexAlgoritms and GSSAPIKexAlgorithsm will not have any
# effect here. They will be overridden by command-line options passed on
# the server start up.
# To opt out, uncomment a line with redefinition of  CRYPTO_POLICY=
# variable in  /etc/sysconfig/sshd  to overwrite the policy.
# For more information, see manual page for update-crypto-policies(8).

# Logging
#SyslogFacility AUTH
SyslogFacility AUTHPRIV
#LogLevel INFO

# Authentication:

#LoginGraceTime 2m
PermitRootLogin yes
#StrictModes yes
#MaxAuthTries 6
#MaxSessions 10

#PubkeyAuthentication yes

# The default is to check both .ssh/authorized_keys and .ssh/authorized_keys2
# but this is overridden so installations will only check .ssh/authorized_keys
AuthorizedKeysFile	.ssh/authorized_keys

#AuthorizedPrincipalsFile none

#AuthorizedKeysCommand none
#AuthorizedKeysCommandUser nobody

# For this to work you will also need host keys in /etc/ssh/ssh_known_hosts
HostbasedAuthentication no
# Change to yes if you don't trust ~/.ssh/known_hosts for
# HostbasedAuthentication
#IgnoreUserKnownHosts no
# Don't read the user's ~/.rhosts and ~/.shosts files
IgnoreRhosts yes

# To disable tunneled clear text passwords, change to no here!
#PasswordAuthentication yes
#PermitEmptyPasswords no
PasswordAuthentication no

# Change to no to disable s/key passwords
#ChallengeResponseAuthentication yes
ChallengeResponseAuthentication no

# Kerberos options
KerberosAuthentication no
#KerberosOrLocalPasswd yes
#KerberosTicketCleanup yes
#KerberosGetAFSToken no
#KerberosUseKuserok yes

# GSSAPI options
GSSAPIAuthentication no
#GSSAPICleanupCredentials no
#GSSAPIStrictAcceptorCheck yes
#GSSAPIKeyExchange no
#GSSAPIEnablek5users no

# Set this to 'yes' to enable PAM authentication, account processing,
# and session processing. If this is enabled, PAM authentication will
# be allowed through the ChallengeResponseAuthentication and
# PasswordAuthentication.  Depending on your PAM configuration,
# PAM authentication via ChallengeResponseAuthentication may bypass
# the setting of "PermitRootLogin without-password".
# If you just want the PAM account and session checks to run without
# PAM authentication, then enable this but set PasswordAuthentication
# and ChallengeResponseAuthentication to 'no'.
# WARNING: 'UsePAM no' is not supported in Fedora and may cause several
# problems.
UsePAM no

#AllowAgentForwarding yes
#AllowTcpForwarding yes
#GatewayPorts no
X11Forwarding yes
#X11DisplayOffset 10
#X11UseLocalhost yes
#PermitTTY yes
#PrintMotd yes
#PrintLastLog yes
#TCPKeepAlive yes
#UseLogin no
#PermitUserEnvironment no
#Compression delayed
#ClientAliveInterval 0
#ClientAliveCountMax 3
#ShowPatchLevel no
UseDNS no
#PidFile /var/run/sshd.pid
#MaxStartups 10:30:100
#PermitTunnel no
#ChrootDirectory none
#VersionAddendum none

# no default banner path
#Banner none

# Accept locale-related environment variables
AcceptEnv LANG LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE LC_MONETARY LC_MESSAGES
AcceptEnv LC_PAPER LC_NAME LC_ADDRESS LC_TELEPHONE LC_MEASUREMENT
AcceptEnv LC_IDENTIFICATION LC_ALL LANGUAGE
AcceptEnv XMODIFIERS

# override default of no subsystems
Subsystem	sftp	/usr/libexec/openssh/sftp-server

# Example of overriding settings on a per-user basis
#Match User anoncvs
#	X11Forwarding no
#	AllowTcpForwarding no
#	PermitTTY no
#	ForceCommand cvs server

Comment 3 Joel Stienlet 2018-10-04 11:49:23 UTC
I've tried to look at the logs of NetworkManager but they didn't make me any wiser. Anyway, I'll share the results:

I edited /etc/NetworkManager/NetworkManager.conf to set:
[logging]
level=TRACE

what I did:
reboot
$ssh 192.168.1.16 echo 1234    ---> was fast
$journalctl -b -u NetworkManager.service --no-pager > before_waiting
wait some time... (~ 10 minutes)
$journalctl -b -u NetworkManager.service --no-pager > after_waiting
$ssh 192.168.1.16 echo 1234  ---> slow, wireshark showing bad TCP packets

$diff after_waiting before_waiting | grep enp5s0

This is the result (as I said, didn't see anything useful here...):

< Oct 04 13:24:23 work_pc NetworkManager[2044]: <trace> [1538652263.9692] device[0x55dea3b51310] (enp5s0): connectivity: periodic-check: re-scheduled in 299996 milliseconds (300 seconds interval)
< Oct 04 13:24:23 work_pc NetworkManager[2044]: <trace> [1538652263.9692] device[0x55dea3b51310] (enp5s0): connectivity: start check (seq:32, periodic-check)
< Oct 04 13:24:23 work_pc NetworkManager[2044]: <debug> [1538652263.9693] connectivity: (enp5s0) start request to 'http://fedoraproject.org/static/hotspot.txt'
< Oct 04 13:24:24 work_pc NetworkManager[2044]: <debug> [1538652264.3435] connectivity: (enp5s0) check completed: FULL; expected response
< Oct 04 13:24:24 work_pc NetworkManager[2044]: <trace> [1538652264.3435] device[0x55dea3b51310] (enp5s0): connectivity: complete check (seq:32, state:FULL)
< Oct 04 13:24:24 work_pc NetworkManager[2044]: <trace> [1538652264.3435] device[0x55dea3b51310] (enp5s0): connectivity: periodic-check: re-scheduled in 299621 milliseconds (300 seconds interval)
< Oct 04 13:29:23 work_pc NetworkManager[2044]: <trace> [1538652563.9692] device[0x55dea3b51310] (enp5s0): connectivity: periodic-check: re-scheduled in 299996 milliseconds (300 seconds interval)
< Oct 04 13:29:23 work_pc NetworkManager[2044]: <trace> [1538652563.9693] device[0x55dea3b51310] (enp5s0): connectivity: start check (seq:36, periodic-check)
< Oct 04 13:29:23 work_pc NetworkManager[2044]: <debug> [1538652563.9693] connectivity: (enp5s0) start request to 'http://fedoraproject.org/static/hotspot.txt'
< Oct 04 13:29:24 work_pc NetworkManager[2044]: <debug> [1538652564.3936] connectivity: (enp5s0) check completed: FULL; expected response
< Oct 04 13:29:24 work_pc NetworkManager[2044]: <trace> [1538652564.3937] device[0x55dea3b51310] (enp5s0): connectivity: complete check (seq:36, state:FULL)
< Oct 04 13:29:24 work_pc NetworkManager[2044]: <trace> [1538652564.3937] device[0x55dea3b51310] (enp5s0): connectivity: periodic-check: re-scheduled in 299571 milliseconds (300 seconds interval)

Comment 4 Thomas Haller 2018-10-04 14:19:33 UTC
When it comes to connectivity issues like this one, NetworkManager is only the component that configures your networking.

Meaning, for example, that if DNS does not resolve, then NetworkManager (possibly on user request) configured DNS wrongly. How applications resolve names via DNS is not relevant to NetworkManager itself, only to the configuration which NetworkManager produced.


Since you have a problem using SSH, it would be helpful to find out what is wrongly configured (and whether NetworkManager did something wrong or it was a misconfiguration).


If you look at NM logs, you only see what NM is doing (in a verbose, confusing way). But there are much more direct ways to see the current configuration, like `ip route`, `ip addr`, /etc/resolv.conf, dnsmasq/systemd-resolved, sshd_config, etc.

If you find which misconfiguration causes the issue, you can look at how to avoid it.

Comment 5 Joel Stienlet 2018-10-05 14:08:51 UTC
Hello Thomas,

I've investigated this further, and now I have a much clearer idea of where the problem comes from!

=> First of all, I've written a small (~10 lines) test program in C to perform a simple connect() call on 192.168.1.16:22 (source code copied below). This test program is sufficient to reproduce the bug (connect() hangs for more than 5 seconds). 
It tells us that the ssh protocol obviously has nothing to do with the bug, and that all attention should be focused on the TCP/IP/Ethernet layers.

=> Now some more information about my configuration:
I have two ethernet interfaces, enp4s0 and enp5s0.
- enp4s0 is reserved for the exclusive use of a virtual machine which, when started, is given full control of this physical ethernet port (VM managed with "Virtual Machine Manager" 1.6.0 using libvirt). This VM with direct access to enp4s0 remained down during all the tests I made since I reported this bug.
- enp5s0 is the ethernet port that gives network access to F29 (192.168.1.19).
Both interfaces are physically connected to my LAN on the same switch.

=> When I unplug the physical cable from enp4s0, everything works fine (ssh always connects in < 600ms when NetworkManager is configuring my network).

=> Now, with the cable plugged in, the gnome graphical interface of NetworkManager says "Ethernet (enp4s0) Off", and when I click on "Wired Settings" it is indeed configured as "Off".
but:
cat /sys/class/net/enp4s0/operstate 
up

I conclude from this that the interface is up from the kernel ethernet layer's point of view, and $ifconfig shows that no IP is assigned to it:

enp4s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether c4:6e:1f:04:51:6b  txqueuelen 1000  (Ethernet)
        RX packets 1837  bytes 138022 (134.7 KiB)
        RX errors 0  dropped 1207  overruns 0  frame 0
        TX packets 195  bytes 8190 (7.9 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Now one problem: it seems that, although I ask (through the GUI) to keep enp4s0 down, it is still brought up somehow. The question is: why?
Do I have to edit /etc/sysconfig/network-scripts/ifcfg-enp4s0 and set the interface as not managed by NetworkManager? It would be easier if NetworkManager had an option to completely disable an interface.

(Obviously when I disable NetworkManager and only enable enp5s0 manually using ifconfig, enp4s0 remains down.)
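
(For completeness, two standard ways to make NetworkManager stop managing enp4s0 entirely; this is only a sketch, and I have not verified whether either of them also keeps the link down:)

$ sudo nmcli device set enp4s0 managed no

or, persistently, in /etc/NetworkManager/NetworkManager.conf:

[keyfile]
unmanaged-devices=interface-name:enp4s0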

=> Now something I don't understand:

$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.1.1     0.0.0.0         UG    100    0        0 enp5s0
192.168.1.0     0.0.0.0         255.255.255.0   U     100    0        0 enp5s0
192.168.122.0   0.0.0.0         255.255.255.0   U     0      0        0 virbr0

The routing table seems pretty clear: all network traffic on my LAN 192.168.1.0/24 should go through enp5s0, correct?

---> So: even if enp4s0 is up
(which I think shouldn't be the case in the first place, but let's assume it is brought up without being assigned an IP, which is what happens when NM is enabled),
why does any traffic go through enp4s0 when performing a TCP/IP connect() to 192.168.1.0/24? (same problem confirmed when connecting to other machines on the same LAN)
(I haven't put my wireshark bridge between that physical port and the switch to actually prove that data are transmitted through enp4s0 -> TODO! A simpler capture is sketched below.)
Note that ifconfig shows that enp4s0 is NOT given an IPv4 address (nor an IPv6 address), so in theory enp4s0 can only mess with the ethernet layer and should not interfere with a connect() call at the TCP layer.
Anyway, it looks like the kernel is disturbed by the simultaneous presence of both enp4s0 and enp5s0 on the same LAN...
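
(A simpler check than the wireshark bridge, just a suggestion using the interface names from above: capture directly on the suspect port while running the connect() test. If TCP packets for the .1.16 connection show up here, data really is leaving or arriving through enp4s0.)

$ sudo tcpdump -n -e -i enp4s0 host 192.168.1.16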

Sorry for the long posts! I've tried to be as precise as possible.
Best regards

--------------------------------------------------------------

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define PORT 22
#define IP   "192.168.1.16"

int main(void)
{
    struct sockaddr_in addr;
    int fd;

    /* Open a plain TCP socket. */
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1) {
        perror("socket");
        return 1;
    }

    /* Destination: 192.168.1.16, TCP port 22. */
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(PORT);
    inet_pton(AF_INET, IP, &addr.sin_addr);

    /* This is the call that hangs for several seconds when the bug triggers. */
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        close(fd);
        return 1;
    }

    printf("Successfully connected to port %d\n", PORT);
    close(fd);
    return 0;
}
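
(To build and time it, assuming the file is saved as connect_test.c -- the file name is just an example:)

$ gcc -Wall -o connect_test connect_test.c
$ time ./connect_test

When the bug triggers, the "real" time reported jumps from well under a second to several seconds.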

Comment 6 Thomas Haller 2018-10-05 15:09:32 UTC
> but:
> cat /sys/class/net/enp4s0/operstate 
> up

Disconnecting a device with NetworkManager does not mean that it goes "operstate" down. Instead, it is "logically" disconnected (as reported by `nmcli device`), which in particular means it has no IP configuration. This is expected.

> why do some data go through enp4s0 when performing a TCP/IP connect() to 
> 192.168.1.0/24? (same problem confirmed when connecting to other machines on 
> the same LAN)

Why do you think that's the case? Did you run tcpdump to capture packets on enp4s0?

> Anyway, it looks like the kernel is disturbed by the simultaneous presence of
> both enp4s0 and enp5s0 on the same LAN...

Maybe. But that would be rather unexpected, and if that were the case, many people would have issues. What leads you to conclude that? Is it because unplugging the cable avoids the issue?
Did you bridge/bond these interfaces in any way? Did you assign the same MAC addresses to both?

Comment 7 Joel Stienlet 2018-10-05 16:52:50 UTC
Hello again Thomas,

Thank you for the explanation of the expected behaviour of NetworkManager. At this point it seems clear that neither NetworkManager nor ssh is related to the issue (so the title of the report is completely wrong; I guess it should be corrected later).

> Why do you think that's the case? Did you run tcpdump to capture packets on enp4s0?

Sorry for this confusing sentence and the lack of tests and precision here. I've now run wireshark to capture everything on enp4s0 (the title says "Capturing from enp4s0", so I trust it on this).

I see a continuous stream of incoming ethernet broadcast packets (ARP, DHCP, ...), which is expected for an interface in the "up" state connected to a switch.

-> To write this reply, I did once:

$  ping -c 1 192.168.1.16

I got in wireshark, listening on enp4s0:

N°    Time            Source         Destination    Protocol Length  Info
618	1035.896651831	192.168.1.19	192.168.1.16	ICMP	 98      Echo (ping) request  id=0x188c, seq=1/256, ttl=64 (reply in 619)
619	1035.897464485	192.168.1.16	192.168.1.19	ICMP	 98      Echo (ping) reply    id=0x188c, seq=1/256, ttl=64 (request in 618)

These ICMP packets should IMHO not exist when the interface enp4s0 has no IP configured (that interface could even be on the wrong LAN!). Correct?

Another thing I've seen after some minutes of observation (packets highlighted this time):

715	1208.302865399	172.217.19.229	192.168.1.19	TCP	66	443 → 48856 [FIN, ACK] Seq=1 Ack=1 Win=265 Len=0 TSval=2560461381 TSecr=3169558157
716	1208.536734716	172.217.19.229	192.168.1.19	TCP	66	[TCP Retransmission] 443 → 48856 [FIN, ACK] Seq=1 Ack=2 Win=265 Len=0 TSval=2560461615 TSecr=3169558158
etc.. (other packets highlighted)

Seeing these TCP packets on enp4s0 is weird.

> Did you bridge/bond these interfaces in any way? 
- from the software point of view on .1.19: not that I'm aware of:

[stienlet@work_pc ~]$ brctl show
bridge name	bridge id		STP enabled	interfaces
virbr0		8000.52540097e96d	yes		virbr0-nic

- from a hardware point of view: both have the other end of their RJ45 cable plugged into the same switch (as intended, to give both .1.19 and the VM access to the LAN; the VM has been turned off since the beginning).

>Did you assign the same MAC addresses to both?
That would have been a terrible mistake indeed! But this is not the case (the "ether" addresses differ):

[stienlet@work_pc ~]$ ifconfig
enp4s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether c4:6e:1f:04:51:6b  txqueuelen 1000  (Ethernet)
        RX packets 4672  bytes 376052 (367.2 KiB)
        RX errors 0  dropped 2580  overruns 0  frame 0
        TX packets 416  bytes 17472 (17.0 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp5s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.19  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::e269:95ff:fec6:da4d  prefixlen 64  scopeid 0x20<link>
        ether e0:69:95:c6:da:4d  txqueuelen 1000  (Ethernet)
        RX packets 52642  bytes 40813715 (38.9 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 40420  bytes 6068906 (5.7 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

> What lets you conclude that? Because unplugging the cable avoids the issue? 
- Plugging the cable into enp4s0 and unplugging it are both legitimate states of my network, and I expect the network stack to behave well in both cases. So yes, this is one argument; the other 3 arguments being:
arg 1: I don't see anything wrong with my routes:
[stienlet@work_pc ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.1.1     0.0.0.0         UG    100    0        0 enp5s0
192.168.1.0     0.0.0.0         255.255.255.0   U     100    0        0 enp5s0
192.168.122.0   0.0.0.0         255.255.255.0   U     0      0        0 virbr0

arg 2: arp tables also look fine to me (although containing so many entries?!):
$ arp -n
Address                  HWtype  HWaddress           Flags Mask            Iface
152.19.134.142                   (incomplete)                              enp4s0
209.132.181.15                   (incomplete)                              virbr0
8.43.85.67                       (incomplete)                              enp4s0
209.132.181.16                   (incomplete)                              virbr0
192.168.1.1              ether   b8:26:6c:0d:69:fc   C                     enp5s0
85.236.55.6                      (incomplete)                              virbr0
192.168.1.16             ether   00:07:32:42:b6:11   C                     enp5s0
152.19.134.142                   (incomplete)                              virbr0
8.43.85.67                       (incomplete)                              virbr0
209.132.190.2                    (incomplete)                              enp4s0
209.132.181.15                   (incomplete)                              enp4s0
152.19.134.198                   (incomplete)                              enp4s0
152.19.134.198                   (incomplete)                              virbr0
209.132.181.16                   (incomplete)                              enp4s0
67.219.144.68                    (incomplete)                              virbr0
85.236.55.6                      (incomplete)                              enp4s0
67.219.144.68                    (incomplete)                              enp4s0
185.141.165.254                  (incomplete)                              virbr0
140.211.169.206                  (incomplete)                              virbr0
185.141.165.254                  (incomplete)                              enp4s0
209.132.190.2                    (incomplete)                              virbr0
140.211.169.206                  (incomplete)                              enp4s0

I'm not aware of anything else that would be required for the IP stack to work normally.
If everything is well configured, it should work in principle.

arg 3: My C code showing that connect() fails to establish a TCP socket (by "fail" I mean: is abnormally slow). Apart from glibc, I don't think many software layers remain between my C code and the kernel, and this connect() fails, so you can understand why I become more suspicious of the kernel here.

I thank you again for the time spent on this, and I sincerely hope that this is a real bug, not just a stupid mistake or misconception on my part.
Have a nice week-end,
Best regards.

Comment 8 Joel Stienlet 2018-10-06 12:11:32 UTC
Some good news: the bug is perfectly reproducible, because it also affects the live CD:
Fedora-Workstation-Live-x86_64-29_Beta-1.5.iso
sha256sum: f2418303844c0d129e80b5527bd7e81bf519466fc7c5ef8236c7ebe75f7871e8  

I've tried both the live CD of Fedora 28 and Fedora 29 on an Upboard² (which has 2 native ethernet ports), see screenshots in my Google Drive (link below).

=> The Fedora 28 live CD did NOT have this bug: connection times were consistently low (the ~5 s is the time to type my password; I did my best to always type it at the same speed).
This is consistent with the fact that I never experienced this bug on .1.19 before the upgrade to F29.

=> The Fedora 29 live CD does have this bug: as you can see, the connection time jumps from ~5 s to ~10 s (always including the ~5 s to type my password).

Screenshots in my google drive:
https://drive.google.com/open?id=1Ev0T2eFnFZ85Sv-MLO716U9tIsBZB6IU

Comment 9 Joel Stienlet 2018-10-07 17:58:36 UTC
I've added a small capture from wireshark in my google-drive, capturing simultaneously from enp4s0 and enp5s0 during a simple connect() (test in C, source above).
The capture shows that at the very beginning of the TCP handshake, .1.19 sends the same SYN packet twice, once from enp4s0 and once from enp5s0.
at   0.000000000 from enp4s0
and -0.000043091 from enp5s0

The expected behaviour is that only one TCP packet is sent, either from enp4s0 or from enp5s0 (but not both), and then the kernel should wait for the corresponding ACK.

Comment 10 Joel Stienlet 2018-10-10 09:15:32 UTC
Some quick update:
I've inserted some printk() in the kernel.
When the ACK from .1.16 is received, ip_rcv() in ip_input.c gets called, and this line gets executed:
return NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING,
		       net, NULL, skb, dev, NULL,
		       ip_rcv_finish);
BUT:
tcp_v4_do_rcv() in tcp_ipv4.c is never called. Looks like the ACK packet is discarded after the IP layer, and never reaches the TCP layer.
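
(A way to see this kind of drop without patching the kernel -- my assumption that it applies here, not something from the tests above -- is to enable martian-packet logging and watch the kernel log while reproducing:)

$ sudo sysctl -w net.ipv4.conf.all.log_martians=1
$ sudo dmesg -w | grep -i martian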

Comment 11 Joel Stienlet 2018-10-10 21:54:00 UTC
Hello Thomas, 
I think I've finally understood where the bug comes from.

As previously, both interfaces are configured with networkmanager:
enp4s0 c4:6e:1f:04:51:6b is "up", but not assigned any IP (and no VM using it).
enp5s0 e0:69:95:c6:da:4d is 192.168.1.19

Many ARP requests originate from enp4s0, sent with .1.19 as the sender IP, as you can see in this screenshot taken from another computer on the LAN (192.168.1.106):

https://drive.google.com/open?id=1_afgKrWohIncOM1GQYc9emZZVkgvADO-

When these ARP requests are received, they poison the ARP caches of all PCs on the LAN; here is the example from 192.168.1.106:

Before poisoning, the .1.19 entry has the correct eth address:
[stienlet@localhost ~]$ arp -n
Address                  HWtype  HWaddress           Flags Mask            Iface
192.168.1.82             ether   00:50:b6:08:42:14   C                     veth_prt_1
192.168.1.51             ether   b0:d5:cc:43:69:b3   C                     veth_prt_1
192.168.1.83             ether   00:07:32:4e:c2:f9   C                     veth_prt_1
192.168.1.106            ether   90:fb:a6:a2:66:55   C                     veth_prt_1
192.168.1.19             ether   e0:69:95:c6:da:4d   C                     veth_prt_1
192.168.1.1              ether   b8:26:6c:0d:69:fc   C                     veth_prt_1
192.168.1.116            ether   c4:6e:1f:04:51:6b   C                     veth_prt_1
192.168.1.109            ether   54:e4:3a:32:80:f9   C                     veth_prt_1
192.168.1.80             ether   00:07:32:4e:c2:f9   C                     veth_prt_1
After some time, the arp cache gets poisoned:
[stienlet@localhost ~]$ arp -n
Address                  HWtype  HWaddress           Flags Mask            Iface
192.168.1.82             ether   00:50:b6:08:42:14   C                     veth_prt_1
192.168.1.51             ether   b0:d5:cc:43:69:b3   C                     veth_prt_1
192.168.1.83             ether   00:07:32:4e:c2:f9   C                     veth_prt_1
192.168.1.106            ether   90:fb:a6:a2:66:55   C                     veth_prt_1
192.168.1.19             ether   c4:6e:1f:04:51:6b   C                     veth_prt_1
192.168.1.1              ether   b8:26:6c:0d:69:fc   C                     veth_prt_1
192.168.1.116            ether   c4:6e:1f:04:51:6b   C                     veth_prt_1
192.168.1.109            ether   54:e4:3a:32:80:f9   C                     veth_prt_1
192.168.1.80             ether   00:07:32:4e:c2:f9   C                     veth_prt_1

Then, when a TCP connection is initiated from .1.19, the ACK from the remote host (e.g. .1.16) is sent back to the wrong ethernet port (enp4s0) because the ARP cache was poisoned, and the IPv4 stack considers it a bad "martian" packet:
__fib_validate_source() in fib_frontend.c : return -EXDEV;

Now when I run:
$ ip link set dev enp4s0 arp off
on .1.19, the connect() call is always fast, which is consistent with this ARP poisoning scenario.
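
(To watch the poisoning happen from another host -- a sketch, reusing the interface name and MAC shown above for 192.168.1.106 and enp4s0 -- one can filter on ARP frames coming from enp4s0's MAC:)

$ sudo tcpdump -n -e -i veth_prt_1 arp and ether src c4:6e:1f:04:51:6b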

I think the ARP system in the kernel should be modified to send these ARP requests only from the interface that has the sender's IP assigned to it.
What do you think about it?

Comment 12 Joel Stienlet 2018-10-13 14:43:31 UTC
Final solution to the problem:
-> Edit /etc/sysctl.conf and add:
net.ipv4.neigh.default.mcast_solicit=0
-> in /etc/NetworkManager/dispatcher.d/ create a file "99_script_mcast.sh":
#!/bin/bash
# $1 = interface name, $2 = NetworkManager event
if [ "$2" = "up" ]; then
    sysctl -w "net.ipv4.neigh.$1.mcast_solicit=3"
fi
(to restore the default value when the interface goes up)
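
(NetworkManager dispatcher scripts must be owned by root and executable, and the /etc/sysctl.conf change can be applied without a reboot; a sketch:)

$ sudo chown root:root /etc/NetworkManager/dispatcher.d/99_script_mcast.sh
$ sudo chmod 755 /etc/NetworkManager/dispatcher.d/99_script_mcast.sh
$ sudo sysctl -p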

Note that changing:
 net.ipv4.conf.all.arp_filter=1
 net.ipv4.conf.all.arp_announce=2
 net.ipv4.conf.default.arp_filter=1
 net.ipv4.conf.default.arp_announce=2
did not work; only mcast_solicit=0 had any effect on enp4s0 in the "Off" state.

Best regards

Comment 13 Thomas Haller 2018-10-13 15:30:42 UTC
Thank you for persisting and identifying the cause of the issue.

It is a problem (of NetworkManager), and probably something needs to improve here significantly; it's just unclear how exactly. I'm reopening the issue...

Comment 14 Laine Stump 2018-11-08 18:15:03 UTC
I am experiencing what I believe is the same problem.

I notice from the screenshot you referenced in Comment 11 that the ARP requests were not only for IPs not directly connected to the interface in question, but for IPs that aren't on *any* locally connected network, and instead are for addresses that should end up being routed, *not* ARPed. This is exactly what I'm seeing. These spurious ARPs come along every 292 seconds, and end up poisoning the ARP cache on the machines on one subnet, making IPv4 connectivity cease (due to incorrect ARP cache entries) for 42 seconds.

The details of my setup:

I have a machine with 2 NICs and 3 physical ethernet connections:

1) an integrated NIC, p14p1, on 10.xx.0.10/24.

2) an Intel 82576 SRIOV-capable dual port NIC that has 7 VFs on each of the 2 PFs:

  A) p4p1 is connected to a separate ethernet segment and statically
     assigned 10.xx.3.10/24

  B) p4p2 is connected to the same segment as p14p1 (the 10.xx.0.0/24 subnet),
     but is statically configured with the IP 10.xx.24.1/24 (i.e. it's set
     for a different subnet). (I originally had it configured with *no* IP
     address, but gave it an IP to try to deduce what was causing my problem.)

As I said, each of the PFs has 7 VFs, and those VFs have no IP addresses configured. The VFs are used by virtual machines, either via vfio device assignment or macvtap passthrough mode; during the testing, there were no virtual machines running, so the VFs were sitting idle.

Description of problem:

After upgrading from F27 to F29, I began experiencing a 42 second outage of IPv4 on one subnet every 292 seconds. After a lot of wrong turns and frustration (all part of the excitement of pursuit, right?) I found the same thing that Joel found - periodically I see a flurry of outgoing ARP packets on *every interface that has no assigned IP address*, with the "tell" IP being that of the interface on 10.xx.0.0/24 that *does* have an IP, e.g.:

32:3b:5d:ba:4c:79 > Broadcast
    ARP Request who-has 140.211.169.196 tell 10.xx.0.10

(32:3b:.... is the MAC address of one of the VFs).

A couple important points:

* The IP whose info is being requested is not on any locally connected subnet - all of the IPs I checked ended up being part of the fedoraproject.org infrastructure. According to the routing table, there should be no ARP going out *at all* (unless the IP of the default route has coincidentally disappeared from the ARP cache); instead, the packet should just be sent to the MAC of the default route IP.


* All of the ARP requests have 10.xx.0.10 as the "tell" address, even though the host has multiple IPs, and even for the VFs that are physically connected to the 10.xx.3.0/24 segment (it makes sense that the latter would make no difference, since it's only me that knows which interface is on which physical segment).

So, since the VFs associated with p4p2 are physically connected to the same ethernet segment as p14p1 (the interface used to communicate on the 10.xx.0.0/24 subnet), this results in the ARP cache on other machines on 10.xx.0.0/24 being poisoned with the MAC address of one of the VFs, so all communication with the test machine on that subnet ceases (until some other traffic or a timeout resets the ARP cache to the correct MAC) because the packets are being sent to the VF's MAC, and the VF has no IP address.

I've verified that it is the lack of an IP on an interface that makes it a candidate for these ARPs - as I said before, I originally had no IP set for p4p2 itself (the PF), and an ARP request was being sent out with p4p2's MAC (in addition to 7 others on that segment, one for each VF, as well as 7 ARPs being sent out on p4p1, again with a "tell" IP of 10.xx.0.10, which in this case caused no problem since no machines are listening for 10.xx.0.0/24 traffic on that segment). As soon as I assigned a static IP to p4p2, I stopped seeing ARPs from p4p2's MAC, but continued seeing them from the VFs.

I've also verified that the spurious ARPs are somehow caused by NM, not just the kernel - as soon as I run "systemctl stop NetworkManager.service" the spurious ARPs stop; I've kept it in this state for several *hours* and not seen a single bad ARP.

Another thing I was asked on IRC when describing the problem: after stopping NetworkManager.service, the VFs *are* still IFF_UP - nothing about their state changes, but the ARPs stop.

Additionally, I set up three shell windows constantly running "ip l", "ip r", and "ip n" in a loop, saving the output to a temporary file and running a diff against the previous output of that same command, and there is *no change* in the output of any of those commands leading up to the spurious ARPs. (A rough sketch of such a loop is below.)
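
(The "ip n" variant of that watch loop, with hypothetical temp file names:)

$ touch /tmp/ip_n.old
$ while true; do ip n > /tmp/ip_n.new; diff /tmp/ip_n.old /tmp/ip_n.new; mv /tmp/ip_n.new /tmp/ip_n.old; sleep 1; done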

This is my Fedora development/testing machine (along with several other tasks), so I can run any additional test anyone would like to see. If you want a more immediate response, or interactive troubleshooting, you can contact me as "laine" on freenode or oftc.

I can say that having interfaces that are up yet have no IP address is *not* an uncommon configuration - it's used all the time when you want to use host NICs in a virtual machine with vfio device assignment or macvtap connections to emulated NICs (and of course there is the fact that NetworkManager *insists* that all NICs *always* be IFF_UP regardless of what you say :-/). So I do think this is an important issue.

Comment 15 Laine Stump 2018-11-08 19:01:53 UTC
BTW, when testing this out on another similar machine I realized that by default NetworkManager will run a DHCP client on *every* interface, so some people may not run into this unless they have done a bit of tweaking (as I have). Since my SRIOV VFs are never used by the host itself, and I would therefore never want to run DHCP on any of them, I added "no-auto-default=*" to my NetworkManager.conf. Anybody who hadn't done this, and had no LAN segments lacking a DHCP server, would not see this bug.
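
(For reference, a sketch of that tweak; no-auto-default is a [main] key documented in NetworkManager.conf(5):)

# /etc/NetworkManager/NetworkManager.conf
[main]
no-auto-default=*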

Comment 16 Frederick Kay 2018-11-18 21:34:30 UTC
I'm having the same issue of strange arp entries in my bridge connections.

Could this be related: https://www.spinics.net/linux/fedora/fedora-users/msg484888.html

If yes, removing NetworkManager-config-connectivity-fedora could fix the problem.
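
(To test that hypothesis, a sketch: check whether the package is installed and remove it, then see whether the spurious ARPs stop:)

$ rpm -q NetworkManager-config-connectivity-fedora
$ sudo dnf remove NetworkManager-config-connectivity-fedora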

Comment 17 Laine Stump 2018-11-19 17:28:39 UTC
(In reply to Frederick Kay from comment #16)
> If yes, removing NetworkManager-config-connectivity-fedora could fix the
> problem.

I'm less interested in solving the symptoms on my own machine (which I've done temporarily by disabling NetworkManager completely) than I am with fixing the root cause, because of the problems (and ensuing bug reports) that it will cause for the many other users who have SRIOV cards with VFs that they use for virtual machines, and who also want to run NetworkManager on their hosts.

As it also says in that email thread, there is simply no valid reason to be sending out an ARP for an IP that isn't on a directly connected network (especially not on an interface that has no IP at all, and so obviously couldn't be used for communication with the ARPed address, even if there *was* a response).

Comment 18 Joel Stienlet 2018-11-19 18:37:49 UTC
I found this link when I was looking for more information regarding the behaviour of the kernel:
https://blogs.oracle.com/stw/why-are-packets-going-out-the-wrong-interface-preserving-for-historical-reasons
That link says this behaviour is described in RFC 1122 section 3.3.1.
But it is important that the packet is sent with the source MAC of the correct interface, or you corrupt the ARP caches of all other hosts. And the interface must indeed be on the same subnet.

Comment 19 Frederick Kay 2018-11-20 08:21:12 UTC
(In reply to Laine Stump from comment #17)

> As it also says in that email thread, there is simply no valid reason to be
> sending out an ARP for an IP that isn't on a directly connected network
> (especially not on an interface that has no IP at all, and so obviously
> couldn't be used for communication with the ARPed address, even if there
> *was* a response).

That is true; however, my guess is that the connectivity check module of NetworkManager specifically asks the kernel to send the IP packets out over all interfaces (to detect whether there is a connection or not). Then of course, if there is no gateway, it tries to resolve the IP on the local network via ARP. That would explain why only the IP addresses used by the connectivity check module end up in the ARP cache as FAILED.

So is this a bug? I think you could improve the connectivity check module to not send any packets if the interface has no gateway or is a bridge interface (?).

I have not checked the NetworkManager source to see whether my assumptions are correct, but to me this is the most logical explanation.

I uninstalled the module, and for me this is no longer happening, but I would be glad to see this improved at the NM level, because it's annoying and looks like a security problem (that's what made me investigate it).
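
(An alternative to uninstalling the package, if I read NetworkManager.conf(5) correctly, would be to disable the periodic check with a drop-in; a sketch, not something verified against this bug:)

# /etc/NetworkManager/conf.d/99-disable-connectivity.conf
[connectivity]
interval=0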

Comment 20 Ben Cotton 2019-10-31 20:28:54 UTC
This message is a reminder that Fedora 29 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 29 on 2019-11-26.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '29'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 29 reached end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 21 Ben Cotton 2019-11-27 19:58:39 UTC
Fedora 29 changed to end-of-life (EOL) status on 2019-11-26. Fedora 29 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.