Bug 1211287

Summary: 30 second total network blackout after activating second interface
Product: Red Hat Enterprise Linux 7
Reporter: Marius Vollmer <mvollmer>
Component: NetworkManager
Assignee: Lubomir Rintel <lrintel>
Status: CLOSED CURRENTRELEASE
QA Contact: Desktop QE <desktop-qa-list>
Severity: high
Docs Contact: Mark Flitter <mflitter>
Priority: high
Version: 7.1
CC: dcbw, dperpeet, lrintel, rkhan, stefw, thaller
Hardware: Unspecified
OS: Unspecified
Fixed In Version: NetworkManager-1.0.4-1
Doc Type: Release Note
Doc Text: Fix for network blackout with multihomed connections. NetworkManager now avoids a network blackout when activating the second device in a multihomed connection.
Clones: 1220344
Last Closed: 2016-01-18 18:15:15 UTC
Type: Bug
Bug Blocks: 1143927, 1187481, 1220344, 1301628

Attachments:

Description	Flags
Console showing nmcli commands.	none
Suggested fix	none

Description Marius Vollmer 2015-04-13 13:59:28 UTC

Created attachment 1013979 [details]
Console showing nmcli commands.

Description of problem:

After making certain (maybe any) changes to a connection, including activation and deactivation, all network traffic to the machine (such as ping) is blocked for a certain time, even if the machine still has other working connections.

[ This could be caused by anything, including the virtual network I use with my virtual machines, of course.  So this is more a call for debugging help than a bug report. ]

Version-Release number of selected component (if applicable):

NetworkManager-1.0.0-14.git20150121.b4ea599c.el7.x86_64

How reproducible:

Always

Steps to Reproduce:
1. Install rhel-server-7.1-x86_64-dvd.iso into a new virtual machine by clicking Next in the virtual-manager UI until done.
2. Add a second network interface to it.  (I used default settings.)
3. Make sure that the primary network interface (probably eth0) is connected.
4. Make sure that the second interface (ens9 in my case) is disconnected.
5. Start a ping to the machine from the outside to the IP address of the primary interface.
6. Connect the second interface.

Actual results:

Ping stops for about 30 seconds.  All other connections, such as ssh, also freeze for the same time.

Expected results:

Machine stays connected normally.

Additional info:

A screenshot of the VM console is attached, and this is the output of ping.  Note the gap from 8 to 46.

$ ping 192.168.100.242
PING 192.168.100.242 (192.168.100.242) 56(84) bytes of data.
64 bytes from 192.168.100.242: icmp_seq=1 ttl=64 time=0.169 ms
64 bytes from 192.168.100.242: icmp_seq=2 ttl=64 time=0.216 ms
64 bytes from 192.168.100.242: icmp_seq=3 ttl=64 time=0.218 ms
64 bytes from 192.168.100.242: icmp_seq=4 ttl=64 time=0.197 ms
64 bytes from 192.168.100.242: icmp_seq=5 ttl=64 time=0.241 ms
64 bytes from 192.168.100.242: icmp_seq=6 ttl=64 time=0.231 ms
64 bytes from 192.168.100.242: icmp_seq=7 ttl=64 time=0.222 ms
64 bytes from 192.168.100.242: icmp_seq=8 ttl=64 time=0.159 ms
64 bytes from 192.168.100.242: icmp_seq=46 ttl=64 time=0.854 ms
64 bytes from 192.168.100.242: icmp_seq=47 ttl=64 time=0.311 ms
64 bytes from 192.168.100.242: icmp_seq=48 ttl=64 time=0.412 ms
64 bytes from 192.168.100.242: icmp_seq=49 ttl=64 time=0.544 ms
64 bytes from 192.168.100.242: icmp_seq=50 ttl=64 time=0.554 ms
64 bytes from 192.168.100.242: icmp_seq=51 ttl=64 time=0.281 ms
64 bytes from 192.168.100.242: icmp_seq=52 ttl=64 time=0.489 ms
64 bytes from 192.168.100.242: icmp_seq=53 ttl=64 time=0.724 ms
64 bytes from 192.168.100.242: icmp_seq=54 ttl=64 time=0.426 ms
64 bytes from 192.168.100.242: icmp_seq=55 ttl=64 time=0.248 ms
64 bytes from 192.168.100.242: icmp_seq=56 ttl=64 time=0.636 ms
64 bytes from 192.168.100.242: icmp_seq=57 ttl=64 time=0.261 ms
64 bytes from 192.168.100.242: icmp_seq=58 ttl=64 time=0.655 ms
64 bytes from 192.168.100.242: icmp_seq=59 ttl=64 time=0.327 ms
64 bytes from 192.168.100.242: icmp_seq=60 ttl=64 time=0.401 ms
64 bytes from 192.168.100.242: icmp_seq=61 ttl=64 time=0.279 ms
64 bytes from 192.168.100.242: icmp_seq=62 ttl=64 time=0.486 ms
^C
--- 192.168.100.242 ping statistics ---
62 packets transmitted, 25 received, 59% packet loss, time 61000ms
rtt min/avg/max/mdev = 0.159/0.381/0.854/0.188 ms
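
The gap can also be found mechanically. The following is a minimal Python sketch, assuming ping output in the format shown above; the function name `find_seq_gaps` and the shortened sample are made up for illustration.

```python
import re

def find_seq_gaps(ping_output, min_missing=1):
    """Return (last_seen, next_seen, missing_count) for each run of lost replies."""
    # Collect the icmp_seq numbers in the order the replies were printed.
    seqs = [int(m.group(1)) for m in re.finditer(r"icmp_seq=(\d+)", ping_output)]
    gaps = []
    for prev, cur in zip(seqs, seqs[1:]):
        missing = cur - prev - 1
        if missing >= min_missing:
            gaps.append((prev, cur, missing))
    return gaps

# Shortened excerpt of the output above (assumption: same line format).
sample = """\
64 bytes from 192.168.100.242: icmp_seq=7 ttl=64 time=0.222 ms
64 bytes from 192.168.100.242: icmp_seq=8 ttl=64 time=0.159 ms
64 bytes from 192.168.100.242: icmp_seq=46 ttl=64 time=0.854 ms
64 bytes from 192.168.100.242: icmp_seq=47 ttl=64 time=0.311 ms
"""
print(find_seq_gaps(sample))  # [(8, 46, 37)]
```

With ping's default interval of one second, 37 missing replies correspond to roughly 37 seconds without connectivity, which matches the 62 transmitted / 25 received summary.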

Comment 1 Marius Vollmer 2015-04-13 14:01:28 UTC

NM state for eth0 and ens9

# nmcli c
NAME  UUID                                  TYPE            DEVICE 
ens9  04afb9b7-3ff7-4843-ae2d-448f3a77723b  802-3-ethernet  ens9   
eth0  aea527e8-5880-4790-8da5-f15d5f7c5dee  802-3-ethernet  eth0   
[root@localhost ~]# nmcli c show eth0
connection.id:                          eth0
connection.uuid:                        aea527e8-5880-4790-8da5-f15d5f7c5dee
connection.interface-name:              eth0
connection.type:                        802-3-ethernet
connection.autoconnect:                 yes
connection.autoconnect-priority:        0
connection.timestamp:                   1428933572
connection.read-only:                   no
connection.permissions:                 
connection.zone:                        --
connection.master:                      --
connection.slave-type:                  --
connection.secondaries:                 
connection.gateway-ping-timeout:        0
802-3-ethernet.port:                    --
802-3-ethernet.speed:                   0
802-3-ethernet.duplex:                  --
802-3-ethernet.auto-negotiate:          yes
802-3-ethernet.mac-address:             --
802-3-ethernet.cloned-mac-address:      --
802-3-ethernet.mac-address-blacklist:   
802-3-ethernet.mtu:                     auto
802-3-ethernet.s390-subchannels:        
802-3-ethernet.s390-nettype:            --
802-3-ethernet.s390-options:            
ipv4.method:                            auto
ipv4.dns:                               
ipv4.dns-search:                        
ipv4.addresses:                         
ipv4.gateway:                           --
ipv4.routes:                            
ipv4.route-metric:                      -1
ipv4.ignore-auto-routes:                no
ipv4.ignore-auto-dns:                   no
ipv4.dhcp-client-id:                    --
ipv4.dhcp-send-hostname:                yes
ipv4.dhcp-hostname:                     --
ipv4.never-default:                     no
ipv4.may-fail:                          yes
ipv6.method:                            auto
ipv6.dns:                               
ipv6.dns-search:                        
ipv6.addresses:                         
ipv6.gateway:                           --
ipv6.routes:                            
ipv6.route-metric:                      -1
ipv6.ignore-auto-routes:                no
ipv6.ignore-auto-dns:                   no
ipv6.never-default:                     no
ipv6.may-fail:                          yes
ipv6.ip6-privacy:                       -1 (unknown)
ipv6.dhcp-send-hostname:                yes
ipv6.dhcp-hostname:                     --
GENERAL.NAME:                           eth0
GENERAL.UUID:                           aea527e8-5880-4790-8da5-f15d5f7c5dee
GENERAL.DEVICES:                        eth0
GENERAL.STATE:                          activated
GENERAL.DEFAULT:                        yes
GENERAL.DEFAULT6:                       no
GENERAL.VPN:                            no
GENERAL.ZONE:                           --
GENERAL.DBUS-PATH:                      /org/freedesktop/NetworkManager/ActiveConnection/0
GENERAL.CON-PATH:                       /org/freedesktop/NetworkManager/Settings/0
GENERAL.SPEC-OBJECT:                    /
GENERAL.MASTER-PATH:                    --
IP4.ADDRESS[1]:                         192.168.100.242/24
IP4.GATEWAY:                            192.168.100.1
IP4.DNS[1]:                             192.168.100.1
IP4.DOMAIN[1]:                          mvo.lan
DHCP4.OPTION[1]:                        requested_classless_static_routes = 1
DHCP4.OPTION[2]:                        requested_rfc3442_classless_static_routes = 1
DHCP4.OPTION[3]:                        subnet_mask = 255.255.255.0
DHCP4.OPTION[4]:                        requested_subnet_mask = 1
DHCP4.OPTION[5]:                        domain_name_servers = 192.168.100.1
DHCP4.OPTION[6]:                        ip_address = 192.168.100.242
DHCP4.OPTION[7]:                        requested_static_routes = 1
DHCP4.OPTION[8]:                        dhcp_server_identifier = 192.168.100.1
DHCP4.OPTION[9]:                        requested_nis_servers = 1
DHCP4.OPTION[10]:                       requested_time_offset = 1
DHCP4.OPTION[11]:                       broadcast_address = 192.168.100.255
DHCP4.OPTION[12]:                       requested_interface_mtu = 1
DHCP4.OPTION[13]:                       dhcp_rebinding_time = 3150
DHCP4.OPTION[14]:                       requested_domain_name_servers = 1
DHCP4.OPTION[15]:                       dhcp_message_type = 5
DHCP4.OPTION[16]:                       requested_broadcast_address = 1
DHCP4.OPTION[17]:                       routers = 192.168.100.1
DHCP4.OPTION[18]:                       dhcp_renewal_time = 1800
DHCP4.OPTION[19]:                       requested_domain_name = 1
DHCP4.OPTION[20]:                       domain_name = mvo.lan
DHCP4.OPTION[21]:                       requested_routers = 1
DHCP4.OPTION[22]:                       expiry = 1428935072
DHCP4.OPTION[23]:                       requested_wpad = 1
DHCP4.OPTION[24]:                       requested_nis_domain = 1
DHCP4.OPTION[25]:                       requested_ms_classless_static_routes = 1
DHCP4.OPTION[26]:                       network_number = 192.168.100.0
DHCP4.OPTION[27]:                       requested_domain_search = 1
DHCP4.OPTION[28]:                       next_server = 192.168.100.1
DHCP4.OPTION[29]:                       requested_ntp_servers = 1
DHCP4.OPTION[30]:                       requested_host_name = 1
DHCP4.OPTION[31]:                       dhcp_lease_time = 3600
IP6.ADDRESS[1]:                         fe80::5054:ff:fed5:5bc7/64
IP6.GATEWAY:                            
[root@localhost ~]# nmcli c show ens9
connection.id:                          ens9
connection.uuid:                        04afb9b7-3ff7-4843-ae2d-448f3a77723b
connection.interface-name:              ens9
connection.type:                        802-3-ethernet
connection.autoconnect:                 no
connection.autoconnect-priority:        0
connection.timestamp:                   1428933572
connection.read-only:                   no
connection.permissions:                 
connection.zone:                        --
connection.master:                      --
connection.slave-type:                  --
connection.secondaries:                 
connection.gateway-ping-timeout:        0
802-3-ethernet.port:                    --
802-3-ethernet.speed:                   0
802-3-ethernet.duplex:                  --
802-3-ethernet.auto-negotiate:          yes
802-3-ethernet.mac-address:             52:54:00:BC:91:FE
802-3-ethernet.cloned-mac-address:      --
802-3-ethernet.mac-address-blacklist:   
802-3-ethernet.mtu:                     auto
802-3-ethernet.s390-subchannels:        
802-3-ethernet.s390-nettype:            --
802-3-ethernet.s390-options:            
ipv4.method:                            auto
ipv4.dns:                               
ipv4.dns-search:                        
ipv4.addresses:                         
ipv4.gateway:                           --
ipv4.routes:                            
ipv4.route-metric:                      -1
ipv4.ignore-auto-routes:                no
ipv4.ignore-auto-dns:                   no
ipv4.dhcp-client-id:                    --
ipv4.dhcp-send-hostname:                yes
ipv4.dhcp-hostname:                     --
ipv4.never-default:                     no
ipv4.may-fail:                          yes
ipv6.method:                            auto
ipv6.dns:                               
ipv6.dns-search:                        
ipv6.addresses:                         
ipv6.gateway:                           --
ipv6.routes:                            
ipv6.route-metric:                      -1
ipv6.ignore-auto-routes:                no
ipv6.ignore-auto-dns:                   no
ipv6.never-default:                     no
ipv6.may-fail:                          yes
ipv6.ip6-privacy:                       -1 (unknown)
ipv6.dhcp-send-hostname:                yes
ipv6.dhcp-hostname:                     --
GENERAL.NAME:                           ens9
GENERAL.UUID:                           04afb9b7-3ff7-4843-ae2d-448f3a77723b
GENERAL.DEVICES:                        ens9
GENERAL.STATE:                          activated
GENERAL.DEFAULT:                        no
GENERAL.DEFAULT6:                       no
GENERAL.VPN:                            no
GENERAL.ZONE:                           --
GENERAL.DBUS-PATH:                      /org/freedesktop/NetworkManager/ActiveConnection/1
GENERAL.CON-PATH:                       /org/freedesktop/NetworkManager/Settings/1
GENERAL.SPEC-OBJECT:                    /
GENERAL.MASTER-PATH:                    --
IP4.ADDRESS[1]:                         192.168.100.148/24
IP4.GATEWAY:                            192.168.100.1
IP4.DNS[1]:                             192.168.100.1
IP4.DOMAIN[1]:                          mvo.lan
DHCP4.OPTION[1]:                        requested_classless_static_routes = 1
DHCP4.OPTION[2]:                        requested_rfc3442_classless_static_routes = 1
DHCP4.OPTION[3]:                        subnet_mask = 255.255.255.0
DHCP4.OPTION[4]:                        requested_subnet_mask = 1
DHCP4.OPTION[5]:                        domain_name_servers = 192.168.100.1
DHCP4.OPTION[6]:                        ip_address = 192.168.100.148
DHCP4.OPTION[7]:                        requested_static_routes = 1
DHCP4.OPTION[8]:                        dhcp_server_identifier = 192.168.100.1
DHCP4.OPTION[9]:                        requested_nis_servers = 1
DHCP4.OPTION[10]:                       requested_time_offset = 1
DHCP4.OPTION[11]:                       broadcast_address = 192.168.100.255
DHCP4.OPTION[12]:                       requested_interface_mtu = 1
DHCP4.OPTION[13]:                       dhcp_rebinding_time = 3023
DHCP4.OPTION[14]:                       requested_domain_name_servers = 1
DHCP4.OPTION[15]:                       dhcp_message_type = 5
DHCP4.OPTION[16]:                       requested_broadcast_address = 1
DHCP4.OPTION[17]:                       routers = 192.168.100.1
DHCP4.OPTION[18]:                       dhcp_renewal_time = 1673
DHCP4.OPTION[19]:                       requested_domain_name = 1
DHCP4.OPTION[20]:                       domain_name = mvo.lan
DHCP4.OPTION[21]:                       requested_routers = 1
DHCP4.OPTION[22]:                       expiry = 1428936657
DHCP4.OPTION[23]:                       requested_wpad = 1
DHCP4.OPTION[24]:                       requested_nis_domain = 1
DHCP4.OPTION[25]:                       requested_ms_classless_static_routes = 1
DHCP4.OPTION[26]:                       network_number = 192.168.100.0
DHCP4.OPTION[27]:                       requested_domain_search = 1
DHCP4.OPTION[28]:                       next_server = 192.168.100.1
DHCP4.OPTION[29]:                       requested_ntp_servers = 1
DHCP4.OPTION[30]:                       requested_host_name = 1
DHCP4.OPTION[31]:                       dhcp_lease_time = 3600
IP6.ADDRESS[1]:                         fe80::5054:ff:febc:91fe/64
IP6.GATEWAY:

Comment 2 Marius Vollmer 2015-04-13 14:02:52 UTC

Virtual network XML:

$ virsh net-dumpxml default
<network connections='4'>
  <name>default</name>
  <uuid>60a2a9c4-ccae-1d11-0aff-6fc9f74e3847</uuid>
  <forward mode='nat'>
    <nat>
      <port start='1024' end='65535'/>
    </nat>
  </forward>
  <bridge name='virbr0' stp='on' delay='0'/>
  <mac address='52:54:00:16:4f:a7'/>
  <domain name='mvo.lan'/>
  <dns>
    <host ip='192.168.100.3'>
      <hostname>vm-checkmachine2</hostname>
    </host>
  </dns>
  <ip address='192.168.100.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.100.128' end='192.168.100.254'/>
      <host mac='52:54:00:d0:03:00' name='f20.mvo.lan' ip='192.168.100.42'/>
      <host mac='52:54:00:13:e2:98' name='f21.mvo.lan' ip='192.168.100.21'/>
      <host mac='52:54:00:b2:c5:77' name='f22.mvo.lan' ip='192.168.100.22'/>
      <host mac='52:54:00:45:7e:db' name='ipa.mvo.lan' ip='192.168.100.2'/>
      <host mac='52:54:00:95:84:8c' name='collide.mvo.lan' ip='192.168.100.99'/>
      <host mac='52:54:00:09:33:93' name='vm-checkmachine2' ip='192.168.100.3'/>
    </dhcp>
  </ip>
</network>

Comment 3 Marius Vollmer 2015-04-13 14:14:38 UTC

I have set the virtual network bridge to stp='off', but that didn't help.

Comment 4 Marius Vollmer 2015-04-13 14:16:58 UTC

Doing the same with a Fedora 22 guest leads to the expected behaviour.

Comment 6 Dominik Perpeet 2015-04-15 12:48:32 UTC

I can reproduce the error reliably on a RHEL 7.1 minimal install (fully updated as of today) running as a guest on RHEL 7 CSB.

I added a second network interface and left it down.
Same commands as Marius:
cat /etc/system-release
Red Hat Enterprise Linux Server release 7.1 (Maipo)
[root@localhost ~]# yum info NetworkManager
Loaded plugins: product-id, subscription-manager
Installed Packages
Name        : NetworkManager
Arch        : x86_64
Epoch       : 1
Version     : 1.0.0
Release     : 14.git20150121.b4ea599c.el7
Size        : 8.8 M
Repo        : installed
From repo   : anaconda
Summary     : Network connection manager and user applications
URL         : http://www.gnome.org/projects/NetworkManager/
License     : GPLv2+
Description : NetworkManager is a system service that manages network interfaces and
            : connections based on user or automatic configuration. It supports
            : Ethernet, Bridge, Bond, VLAN, Team, InfiniBand, Wi-Fi, mobile broadband
            : (WWAN), PPPoE and other devices, and supports a variety of different VPN
            : services.

[root@localhost ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:12:58:1c brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.129/24 brd 192.168.122.255 scope global dynamic eth0
       valid_lft 2819sec preferred_lft 2819sec
    inet6 fe80::5054:ff:fe12:581c/64 scope link 
       valid_lft forever preferred_lft forever
3: ens9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:45:ad:d0 brd ff:ff:ff:ff:ff:ff
[root@localhost ~]# nmcli d c ens9
Device 'ens9' successfully activated with 'de9e03ca-2688-4e34-90cd-5d6c6905f86d'.




Ping while adding connection (note the gap between seq #48 and #97):
Wed Apr 15 14:35:32 2015 64 bytes from 192.168.122.129: icmp_seq=30 ttl=64 time=0.568 ms
Wed Apr 15 14:35:33 2015 64 bytes from 192.168.122.129: icmp_seq=31 ttl=64 time=0.191 ms
Wed Apr 15 14:35:34 2015 64 bytes from 192.168.122.129: icmp_seq=32 ttl=64 time=0.223 ms
Wed Apr 15 14:35:35 2015 64 bytes from 192.168.122.129: icmp_seq=33 ttl=64 time=0.234 ms
Wed Apr 15 14:35:36 2015 64 bytes from 192.168.122.129: icmp_seq=34 ttl=64 time=0.113 ms
Wed Apr 15 14:35:37 2015 64 bytes from 192.168.122.129: icmp_seq=35 ttl=64 time=0.241 ms
Wed Apr 15 14:35:38 2015 64 bytes from 192.168.122.129: icmp_seq=36 ttl=64 time=0.344 ms
Wed Apr 15 14:35:39 2015 64 bytes from 192.168.122.129: icmp_seq=37 ttl=64 time=0.325 ms
Wed Apr 15 14:35:40 2015 64 bytes from 192.168.122.129: icmp_seq=38 ttl=64 time=0.206 ms
Wed Apr 15 14:35:41 2015 64 bytes from 192.168.122.129: icmp_seq=39 ttl=64 time=0.141 ms
Wed Apr 15 14:35:42 2015 64 bytes from 192.168.122.129: icmp_seq=40 ttl=64 time=0.112 ms
Wed Apr 15 14:35:43 2015 64 bytes from 192.168.122.129: icmp_seq=41 ttl=64 time=0.203 ms
Wed Apr 15 14:35:44 2015 64 bytes from 192.168.122.129: icmp_seq=42 ttl=64 time=0.142 ms
Wed Apr 15 14:35:45 2015 64 bytes from 192.168.122.129: icmp_seq=43 ttl=64 time=0.175 ms
Wed Apr 15 14:35:46 2015 64 bytes from 192.168.122.129: icmp_seq=44 ttl=64 time=0.302 ms
Wed Apr 15 14:35:47 2015 64 bytes from 192.168.122.129: icmp_seq=45 ttl=64 time=0.192 ms
Wed Apr 15 14:35:48 2015 64 bytes from 192.168.122.129: icmp_seq=46 ttl=64 time=0.122 ms
Wed Apr 15 14:35:49 2015 64 bytes from 192.168.122.129: icmp_seq=47 ttl=64 time=1.25 ms
Wed Apr 15 14:35:50 2015 64 bytes from 192.168.122.129: icmp_seq=48 ttl=64 time=0.306 ms
Wed Apr 15 14:36:39 2015 64 bytes from 192.168.122.129: icmp_seq=97 ttl=64 time=0.351 ms
Wed Apr 15 14:36:40 2015 64 bytes from 192.168.122.129: icmp_seq=98 ttl=64 time=0.240 ms
Wed Apr 15 14:36:41 2015 64 bytes from 192.168.122.129: icmp_seq=99 ttl=64 time=0.244 ms
Wed Apr 15 14:36:42 2015 64 bytes from 192.168.122.129: icmp_seq=100 ttl=64 time=0.260 ms

Comment 7 Marius Vollmer 2015-05-11 08:45:19 UTC

I am now seeing similar behavior with NetworkManager 1.0.2 in Fedora 22, but only in our test images, not in my development VM.

Comment 8 Stef Walter 2015-05-11 09:35:38 UTC

This is really bad on servers where the only access to the server is remote.

Comment 9 Stef Walter 2015-05-11 09:52:44 UTC

Nasty workaround in Cockpit, which breaks cases where an actual outage has occurred: https://github.com/cockpit-project/cockpit/pull/2268

Comment 10 Dan Williams 2015-05-11 15:44:21 UTC

When the blackout happens, could you grab:

ip route
ip -6 route

and then the same after the blackout?

Comment 11 Marius Vollmer 2015-05-12 07:27:43 UTC

FYI, I have cloned this for Fedora 22: https://bugzilla.redhat.com/show_bug.cgi?id=1220344

Comment 12 Marius Vollmer 2015-05-12 07:35:28 UTC

(In reply to Dan Williams from comment #10)
> could you grab:

Before the blackout, with ens9 disconnected:

default via 192.168.100.1 dev eth0  proto static  metric 100 
192.168.100.0/24 dev eth0  proto kernel  scope link  src 192.168.100.242  metric 100 

unreachable ::/96 dev lo  metric 1024  error -101
unreachable ::ffff:0.0.0.0/96 dev lo  metric 1024  error -101
unreachable 2002:a00::/24 dev lo  metric 1024  error -101
unreachable 2002:7f00::/24 dev lo  metric 1024  error -101
unreachable 2002:a9fe::/32 dev lo  metric 1024  error -101
unreachable 2002:ac10::/28 dev lo  metric 1024  error -101
unreachable 2002:c0a8::/32 dev lo  metric 1024  error -101
unreachable 2002:e000::/19 dev lo  metric 1024  error -101
unreachable 3ffe:ffff::/32 dev lo  metric 1024  error -101
fe80::/64 dev eth0  proto kernel  metric 256 

During the blackout, right after connecting ens9:

default via 192.168.100.1 dev eth0  proto static  metric 100 
default via 192.168.100.1 dev ens9  proto static  metric 101 
192.168.100.0/24 dev ens9  proto kernel  scope link  src 192.168.100.148 
192.168.100.0/24 dev eth0  proto kernel  scope link  src 192.168.100.242  metric 100 

unreachable ::/96 dev lo  metric 1024  error -101
unreachable ::ffff:0.0.0.0/96 dev lo  metric 1024  error -101
unreachable 2002:a00::/24 dev lo  metric 1024  error -101
unreachable 2002:7f00::/24 dev lo  metric 1024  error -101
unreachable 2002:a9fe::/32 dev lo  metric 1024  error -101
unreachable 2002:ac10::/28 dev lo  metric 1024  error -101
unreachable 2002:c0a8::/32 dev lo  metric 1024  error -101
unreachable 2002:e000::/19 dev lo  metric 1024  error -101
unreachable 3ffe:ffff::/32 dev lo  metric 1024  error -101
fe80::/64 dev eth0  proto kernel  metric 256 
fe80::/64 dev ens9  proto kernel  metric 256 

After the blackout, when the pings are flowing again:

default via 192.168.100.1 dev eth0  proto static  metric 100 
default via 192.168.100.1 dev ens9  proto static  metric 101 
192.168.100.0/24 dev ens9  proto kernel  scope link  src 192.168.100.148 
192.168.100.0/24 dev eth0  proto kernel  scope link  src 192.168.100.242  metric 100 

unreachable ::/96 dev lo  metric 1024  error -101
unreachable ::ffff:0.0.0.0/96 dev lo  metric 1024  error -101
unreachable 2002:a00::/24 dev lo  metric 1024  error -101
unreachable 2002:7f00::/24 dev lo  metric 1024  error -101
unreachable 2002:a9fe::/32 dev lo  metric 1024  error -101
unreachable 2002:ac10::/28 dev lo  metric 1024  error -101
unreachable 2002:c0a8::/32 dev lo  metric 1024  error -101
unreachable 2002:e000::/19 dev lo  metric 1024  error -101
unreachable 3ffe:ffff::/32 dev lo  metric 1024  error -101
fe80::/64 dev eth0  proto kernel  metric 256 
fe80::/64 dev ens9  proto kernel  metric 256

Comment 13 Marius Vollmer 2015-05-12 07:46:05 UTC

(In reply to Dan Williams from comment #10)
> When the blackout happens, could you grab:
> 
> ip route
> ip -6 route
> 
> and then the same after the blackout?

Could you try to reproduce the bug?  It shows itself easily with a freshly installed minimal RHEL or F22 VM, but might not happen with older installations that have been upgraded incrementally.  So please try a fresh VM.

Comment 14 Lubomir Rintel 2015-05-12 07:51:28 UTC

I am able to reproduce the issue.

Comment 15 Lubomir Rintel 2015-05-13 14:01:11 UTC

Created attachment 1025065 [details]
Suggested fix

The issue only happens in multihomed setups. Fedora and RHEL ship with the rp_filter sysctls for all interfaces set to 1, which turns on strict reverse path filtering [1]. That means incoming packets whose source address would not be routed back via the ingress interface are discarded.

[1] https://tools.ietf.org/html/rfc3704#section-2.2

1.) When the first interface is activated, a route for the subnet with metric=100 is added to it
2.) When the second interface comes up, a route with metric=0 is added to it, causing the traffic to be routed via the second interface
3.) The traffic coming in from the first interface is now discarded by rp_filter, since the routing decision for the subnet in question would favour the second one
4.) I think the connection resumes because the rp_filter also blocks ARP traffic, and when the entry for the first interface's address expires, the kernel decides to reach the node via the second interface instead
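
Steps 1 to 3 can be illustrated with a simplified Python model. This is only a sketch, not the kernel's implementation: route selection is reduced to "longest prefix, then lowest metric", and the route tables are transcribed from comment 12 (the route printed without a metric is taken to be metric 0). The `with_fix` table is an assumption about what the tables look like once the second device's subnet route gets metric 101 instead of 0.

```python
import ipaddress

def best_route(routes, address):
    """Pick the preferred route: longest matching prefix, then lowest metric."""
    addr = ipaddress.ip_address(address)
    candidates = [r for r in routes if addr in ipaddress.ip_network(r["dst"])]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda r: (-ipaddress.ip_network(r["dst"]).prefixlen, r["metric"]),
    )

def strict_rp_filter_accepts(routes, source, ingress_dev):
    """Strict mode: accept only if a reply to `source` would leave via ingress_dev."""
    route = best_route(routes, source)
    return route is not None and route["dev"] == ingress_dev

# Routing table during the blackout (from comment 12).
during_blackout = [
    {"dst": "0.0.0.0/0", "dev": "eth0", "metric": 100},
    {"dst": "0.0.0.0/0", "dev": "ens9", "metric": 101},
    {"dst": "192.168.100.0/24", "dev": "ens9", "metric": 0},
    {"dst": "192.168.100.0/24", "dev": "eth0", "metric": 100},
]

# Assumed table after the fix: ens9's subnet route no longer uses metric 0.
with_fix = [
    {"dst": "0.0.0.0/0", "dev": "eth0", "metric": 100},
    {"dst": "0.0.0.0/0", "dev": "ens9", "metric": 101},
    {"dst": "192.168.100.0/24", "dev": "ens9", "metric": 101},
    {"dst": "192.168.100.0/24", "dev": "eth0", "metric": 100},
]

# A ping from 192.168.100.1 arrives on eth0.
print(strict_rp_filter_accepts(during_blackout, "192.168.100.1", "eth0"))  # False
print(strict_rp_filter_accepts(with_fix, "192.168.100.1", "eth0"))         # True
```

In the same model, traffic arriving on ens9 is rejected once eth0 holds the better route, so strict filtering always favours one of the two interfaces on a shared subnet.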

I don't think this is a regression from 1.0.0. I've been able to reproduce the "blackout" on 1.0.0 too.

The problematic part here is the metric=0 route. This would also route the traffic through an undesired interface in the unlikely case that someone has two wireless and one wired connection to the same network (one wireless device would get a metric=0 route).

I'm addressing this by replacing the use of a constant metric with a search for the lowest unused metric.
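
The idea of searching for the lowest unused metric can be sketched in a few lines of Python. This is not the attached patch, only an illustration; the function name and the default starting value of 100 (the metric seen on eth0 in this report) are assumptions.

```python
def lowest_unused_metric(existing_metrics, start=100):
    """Return the smallest metric >= start not already used for the same subnet."""
    used = set(existing_metrics)
    metric = start
    while metric in used:
        metric += 1
    return metric

print(lowest_unused_metric([]))          # 100
print(lowest_unused_metric([100]))       # 101
print(lowest_unused_metric([100, 102]))  # 101
```

In the scenario from this report, eth0 already holds metric 100 for 192.168.100.0/24, so the second interface would get 101 and no longer take over the subnet.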

Note that in some cases the blackouts are inevitable because you're reliant on which interface the clients use. In a multihomed scenario you should not use strict reverse path filtering.
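
To check which mode is in effect on a given interface, the sysctl values can be read from /proc. The Python sketch below assumes the usual Linux layout under /proc/sys/net/ipv4/conf; the function name is made up for illustration. The kernel uses the higher of the `all` and per-interface values when validating the source on an interface (0 = off, 1 = strict, 2 = loose).

```python
from pathlib import Path

RP_FILTER_MODES = {0: "off", 1: "strict", 2: "loose"}

def effective_rp_filter(iface, conf_dir="/proc/sys/net/ipv4/conf"):
    """Return the rp_filter value in effect for iface: max(conf/all, conf/<iface>)."""
    base = Path(conf_dir)
    value_all = int((base / "all" / "rp_filter").read_text())
    value_iface = int((base / iface / "rp_filter").read_text())
    return max(value_all, value_iface)

def describe_rp_filter(iface, conf_dir="/proc/sys/net/ipv4/conf"):
    """Human-readable mode name, e.g. 'strict'."""
    return RP_FILTER_MODES.get(effective_rp_filter(iface, conf_dir), "unknown")
```

On the guest from this report, `describe_rp_filter("eth0")` would be expected to return "strict", given the defaults described above.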

This fixes your particular scenario though.

Attaching the fix.

Comment 16 Marius Vollmer 2015-05-13 15:13:57 UTC

(In reply to Lubomir Rintel from comment #15)

> Note that some cases the blackouts are inevitable because you're reliant on
> what interface do the clients use. In multihomed scenario you should not use
> the strict reverse part filtering.

Hmm, could NetworkManager shield me from this 'expert domain knowledge'?  If not, does it have some knobs to switch rp_filter on and off?

Comment 17 Marius Vollmer 2015-05-13 15:14:39 UTC

> Attaching the fix.

And thanks a lot for the effort, of course!

Comment 18 Thomas Haller 2015-05-13 16:15:13 UTC

>> linux-platform: bump device route metric if another device's route would clash
    
    devices. We add a route with a device specific metric to cope with this.
    It causes the other route to disappear.

Adding the route with our desired metric doesn't make the other route disappear. We explicitly delete the metric 0 route.



I think the patch is not correct if there is another route with metric 0 present.

(I took the patch from comment 15, and pushed it to lr/device-route-multihomed-rh1211287 branch)

How about my two fixup commits there?

Comment 19 Thomas Haller 2015-05-13 16:37:55 UTC

(In reply to Lubomir Rintel from comment #15)
> Created attachment 1025065 [details]
> Suggested fix
> 
> The issue only happens in a multihomed setups. Fedora and RHEL ship with
> rp_filter sysctls for all interfaces set to 1, which turns strict reverse
> path filtering [1]. That means incoming packets with source subnet that
> would not be routed to the ingress interface are discarded.
> 
> [1] https://tools.ietf.org/html/rfc3704#section-2.2
> 
> 1.) When the first interface is activated it a route for the subnet with
> metric=100 is added to it
> 2.) When the second interface comes up, a route with metric=0 is added to
> it, causing the traffic to be routed via the second interface
> 3.) The traffic coming in from the first interface is now discarded by
> rp_filter, since the routing decision for the subnet in question would
> favour the second one
> 4.) I think the connection resumes because the rp_filter also blocks ARP
> traffic and when the entry for first interface's address expires the kernel
> decides to reach the node via the second interface instead
> 

You have the same issue if you do the following:

0) activate eth0. We remove the metric-0 route and add metric-100.
1) activate eth1. We remove its metric-0 route, and add metric-101.
2) deactivate eth0.
3) activate eth2. We remove its metric-0 route, and add metric-100.

Seems like a better (long-term) real solution would be to have NMRouteManager keep track of the added routes. Then it would do:

0) activate eth0. We remove the metric-0 route, and add metric-100
1) activate eth1. We remove the metric-0 route, but don't add a metric-100 route.
2) deactivate eth0. Now route-manager restores the metric-100 route for eth1.
3) activate eth2. Same as 1).
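
The bookkeeping in steps 0 to 3 can be modelled with a toy Python class. This is a sketch only: `RouteTracker` and its methods are hypothetical names, not NMRouteManager's API, and the model records only which device currently owns the subnet route.

```python
class RouteTracker:
    """Toy model: one installed device route per subnet, the others kept pending."""

    def __init__(self):
        # subnet -> devices that want a route, in activation order
        self._wanted = {}

    def activate(self, dev, subnet):
        """Register dev for subnet; it owns the route only if nobody else does."""
        self._wanted.setdefault(subnet, []).append(dev)

    def deactivate(self, dev, subnet):
        """Drop dev; the next pending device (if any) takes over the route."""
        devs = self._wanted.get(subnet, [])
        if dev in devs:
            devs.remove(dev)

    def installed(self, subnet):
        """Device that currently owns the subnet route, or None."""
        devs = self._wanted.get(subnet, [])
        return devs[0] if devs else None

subnet = "192.168.100.0/24"
tracker = RouteTracker()
tracker.activate("eth0", subnet)
tracker.activate("eth1", subnet)
print(tracker.installed(subnet))  # eth0
tracker.deactivate("eth0", subnet)
print(tracker.installed(subnet))  # eth1
tracker.activate("eth2", subnet)
print(tracker.installed(subnet))  # eth1
```

The point of the design is the last line: activating eth2 does not displace eth1, so traffic already flowing through eth1 is not interrupted.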

Comment 20 Thomas Haller 2015-05-13 18:42:15 UTC

(In reply to Thomas Haller from comment #19)
> (In reply to Lubomir Rintel from comment #15)
> > Created attachment 1025065 [details]
> > Suggested fix
> > 
> > The issue only happens in a multihomed setups. Fedora and RHEL ship with
> > rp_filter sysctls for all interfaces set to 1, which turns strict reverse
> > path filtering [1]. That means incoming packets with source subnet that
> > would not be routed to the ingress interface are discarded.
> > 
> > [1] https://tools.ietf.org/html/rfc3704#section-2.2
> > 
> > 1.) When the first interface is activated it a route for the subnet with
> > metric=100 is added to it
> > 2.) When the second interface comes up, a route with metric=0 is added to
> > it, causing the traffic to be routed via the second interface
> > 3.) The traffic coming in from the first interface is now discarded by
> > rp_filter, since the routing decision for the subnet in question would
> > favour the second one
> > 4.) I think the connection resumes because the rp_filter also blocks ARP
> > traffic and when the entry for first interface's address expires the kernel
> > decides to reach the node via the second interface instead
> > 
> 
> You have the same issue if you do the following:
> 
> 0) activate eth0. We remove the metric-0 route and add metric-100.
> 1) activate eth1. We remove it's metric-0 route, and add metric-101.
> 2) deactivate eth0.
> 3) activate eth2. We remove it's metric-0 route, and add metric-100.

or

0) activate eth0. We remove the metric-0 route and add metric-100.
1) activate eth1. We remove its metric-0 route, and add metric-101.
2) deactivate eth0.
3) after a while, reactivate eth0. We remove its metric-0 route, and add metric-100.

Now the same issue happens for eth1.

Maybe an additional idea would be to assign each device a different default priority, so eth0 would get metric 100, eth1 101, and so on...

Comment 21 Thomas Haller 2015-06-24 12:14:23 UTC

Pushed a branch upstream for review: th/device-route-bgo751264

Comment 22 Lubomir Rintel 2015-08-14 14:26:11 UTC

This went into 1.0.4

Comment 23 Dan Williams 2016-01-18 18:15:15 UTC

Already fixed in RHEL 7.2.