Bug 1211287
| Summary: | 30 second total network blackout after activating second interface | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Marius Vollmer <mvollmer> | ||||||
| Component: | NetworkManager | Assignee: | Lubomir Rintel <lrintel> | ||||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Desktop QE <desktop-qa-list> | ||||||
| Severity: | high | Docs Contact: | Mark Flitter <mflitter> | ||||||
| Priority: | high | ||||||||
| Version: | 7.1 | CC: | dcbw, dperpeet, lrintel, rkhan, stefw, thaller | ||||||
| Target Milestone: | rc | ||||||||
| Target Release: | --- | ||||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | NetworkManager-1.0.4-1 | Doc Type: | Release Note | ||||||
| Doc Text: |
Fix for network blackout with multihomed connections
NetworkManager now avoids a network blackout when activating the second device in a multihomed connection.
|
Story Points: | --- | ||||||
| Clone Of: | |||||||||
| : | 1220344 (view as bug list) | Environment: | |||||||
| Last Closed: | 2016-01-18 18:15:15 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Bug Depends On: | |||||||||
| Bug Blocks: | 1143927, 1187481, 1220344, 1301628 | ||||||||
| Attachments: |
|
||||||||
NM state for eth0 and ens9

# nmcli c
NAME  UUID                                  TYPE            DEVICE
ens9  04afb9b7-3ff7-4843-ae2d-448f3a77723b  802-3-ethernet  ens9
eth0  aea527e8-5880-4790-8da5-f15d5f7c5dee  802-3-ethernet  eth0

[root@localhost ~]# nmcli c show eth0
connection.id: eth0
connection.uuid: aea527e8-5880-4790-8da5-f15d5f7c5dee
connection.interface-name: eth0
connection.type: 802-3-ethernet
connection.autoconnect: yes
connection.autoconnect-priority: 0
connection.timestamp: 1428933572
connection.read-only: no
connection.permissions:
connection.zone: --
connection.master: --
connection.slave-type: --
connection.secondaries:
connection.gateway-ping-timeout: 0
802-3-ethernet.port: --
802-3-ethernet.speed: 0
802-3-ethernet.duplex: --
802-3-ethernet.auto-negotiate: yes
802-3-ethernet.mac-address: --
802-3-ethernet.cloned-mac-address: --
802-3-ethernet.mac-address-blacklist:
802-3-ethernet.mtu: auto
802-3-ethernet.s390-subchannels:
802-3-ethernet.s390-nettype: --
802-3-ethernet.s390-options:
ipv4.method: auto
ipv4.dns:
ipv4.dns-search:
ipv4.addresses:
ipv4.gateway: --
ipv4.routes:
ipv4.route-metric: -1
ipv4.ignore-auto-routes: no
ipv4.ignore-auto-dns: no
ipv4.dhcp-client-id: --
ipv4.dhcp-send-hostname: yes
ipv4.dhcp-hostname: --
ipv4.never-default: no
ipv4.may-fail: yes
ipv6.method: auto
ipv6.dns:
ipv6.dns-search:
ipv6.addresses:
ipv6.gateway: --
ipv6.routes:
ipv6.route-metric: -1
ipv6.ignore-auto-routes: no
ipv6.ignore-auto-dns: no
ipv6.never-default: no
ipv6.may-fail: yes
ipv6.ip6-privacy: -1 (unknown)
ipv6.dhcp-send-hostname: yes
ipv6.dhcp-hostname: --
GENERAL.NAME: eth0
GENERAL.UUID: aea527e8-5880-4790-8da5-f15d5f7c5dee
GENERAL.DEVICES: eth0
GENERAL.STATE: activated
GENERAL.DEFAULT: yes
GENERAL.DEFAULT6: no
GENERAL.VPN: no
GENERAL.ZONE: --
GENERAL.DBUS-PATH: /org/freedesktop/NetworkManager/ActiveConnection/0
GENERAL.CON-PATH: /org/freedesktop/NetworkManager/Settings/0
GENERAL.SPEC-OBJECT: /
GENERAL.MASTER-PATH: --
IP4.ADDRESS[1]: 192.168.100.242/24
IP4.GATEWAY: 192.168.100.1
IP4.DNS[1]: 192.168.100.1
IP4.DOMAIN[1]: mvo.lan
DHCP4.OPTION[1]: requested_classless_static_routes = 1
DHCP4.OPTION[2]: requested_rfc3442_classless_static_routes = 1
DHCP4.OPTION[3]: subnet_mask = 255.255.255.0
DHCP4.OPTION[4]: requested_subnet_mask = 1
DHCP4.OPTION[5]: domain_name_servers = 192.168.100.1
DHCP4.OPTION[6]: ip_address = 192.168.100.242
DHCP4.OPTION[7]: requested_static_routes = 1
DHCP4.OPTION[8]: dhcp_server_identifier = 192.168.100.1
DHCP4.OPTION[9]: requested_nis_servers = 1
DHCP4.OPTION[10]: requested_time_offset = 1
DHCP4.OPTION[11]: broadcast_address = 192.168.100.255
DHCP4.OPTION[12]: requested_interface_mtu = 1
DHCP4.OPTION[13]: dhcp_rebinding_time = 3150
DHCP4.OPTION[14]: requested_domain_name_servers = 1
DHCP4.OPTION[15]: dhcp_message_type = 5
DHCP4.OPTION[16]: requested_broadcast_address = 1
DHCP4.OPTION[17]: routers = 192.168.100.1
DHCP4.OPTION[18]: dhcp_renewal_time = 1800
DHCP4.OPTION[19]: requested_domain_name = 1
DHCP4.OPTION[20]: domain_name = mvo.lan
DHCP4.OPTION[21]: requested_routers = 1
DHCP4.OPTION[22]: expiry = 1428935072
DHCP4.OPTION[23]: requested_wpad = 1
DHCP4.OPTION[24]: requested_nis_domain = 1
DHCP4.OPTION[25]: requested_ms_classless_static_routes = 1
DHCP4.OPTION[26]: network_number = 192.168.100.0
DHCP4.OPTION[27]: requested_domain_search = 1
DHCP4.OPTION[28]: next_server = 192.168.100.1
DHCP4.OPTION[29]: requested_ntp_servers = 1
DHCP4.OPTION[30]: requested_host_name = 1
DHCP4.OPTION[31]: dhcp_lease_time = 3600
IP6.ADDRESS[1]: fe80::5054:ff:fed5:5bc7/64
IP6.GATEWAY:

[root@localhost ~]# nmcli c show ens9
connection.id: ens9
connection.uuid: 04afb9b7-3ff7-4843-ae2d-448f3a77723b
connection.interface-name: ens9
connection.type: 802-3-ethernet
connection.autoconnect: no
connection.autoconnect-priority: 0
connection.timestamp: 1428933572
connection.read-only: no
connection.permissions:
connection.zone: --
connection.master: --
connection.slave-type: --
connection.secondaries:
connection.gateway-ping-timeout: 0
802-3-ethernet.port: --
802-3-ethernet.speed: 0
802-3-ethernet.duplex: --
802-3-ethernet.auto-negotiate: yes
802-3-ethernet.mac-address: 52:54:00:BC:91:FE
802-3-ethernet.cloned-mac-address: --
802-3-ethernet.mac-address-blacklist:
802-3-ethernet.mtu: auto
802-3-ethernet.s390-subchannels:
802-3-ethernet.s390-nettype: --
802-3-ethernet.s390-options:
ipv4.method: auto
ipv4.dns:
ipv4.dns-search:
ipv4.addresses:
ipv4.gateway: --
ipv4.routes:
ipv4.route-metric: -1
ipv4.ignore-auto-routes: no
ipv4.ignore-auto-dns: no
ipv4.dhcp-client-id: --
ipv4.dhcp-send-hostname: yes
ipv4.dhcp-hostname: --
ipv4.never-default: no
ipv4.may-fail: yes
ipv6.method: auto
ipv6.dns:
ipv6.dns-search:
ipv6.addresses:
ipv6.gateway: --
ipv6.routes:
ipv6.route-metric: -1
ipv6.ignore-auto-routes: no
ipv6.ignore-auto-dns: no
ipv6.never-default: no
ipv6.may-fail: yes
ipv6.ip6-privacy: -1 (unknown)
ipv6.dhcp-send-hostname: yes
ipv6.dhcp-hostname: --
GENERAL.NAME: ens9
GENERAL.UUID: 04afb9b7-3ff7-4843-ae2d-448f3a77723b
GENERAL.DEVICES: ens9
GENERAL.STATE: activated
GENERAL.DEFAULT: no
GENERAL.DEFAULT6: no
GENERAL.VPN: no
GENERAL.ZONE: --
GENERAL.DBUS-PATH: /org/freedesktop/NetworkManager/ActiveConnection/1
GENERAL.CON-PATH: /org/freedesktop/NetworkManager/Settings/1
GENERAL.SPEC-OBJECT: /
GENERAL.MASTER-PATH: --
IP4.ADDRESS[1]: 192.168.100.148/24
IP4.GATEWAY: 192.168.100.1
IP4.DNS[1]: 192.168.100.1
IP4.DOMAIN[1]: mvo.lan
DHCP4.OPTION[1]: requested_classless_static_routes = 1
DHCP4.OPTION[2]: requested_rfc3442_classless_static_routes = 1
DHCP4.OPTION[3]: subnet_mask = 255.255.255.0
DHCP4.OPTION[4]: requested_subnet_mask = 1
DHCP4.OPTION[5]: domain_name_servers = 192.168.100.1
DHCP4.OPTION[6]: ip_address = 192.168.100.148
DHCP4.OPTION[7]: requested_static_routes = 1
DHCP4.OPTION[8]: dhcp_server_identifier = 192.168.100.1
DHCP4.OPTION[9]: requested_nis_servers = 1
DHCP4.OPTION[10]: requested_time_offset = 1
DHCP4.OPTION[11]: broadcast_address = 192.168.100.255
DHCP4.OPTION[12]: requested_interface_mtu = 1
DHCP4.OPTION[13]: dhcp_rebinding_time = 3023
DHCP4.OPTION[14]: requested_domain_name_servers = 1
DHCP4.OPTION[15]: dhcp_message_type = 5
DHCP4.OPTION[16]: requested_broadcast_address = 1
DHCP4.OPTION[17]: routers = 192.168.100.1
DHCP4.OPTION[18]: dhcp_renewal_time = 1673
DHCP4.OPTION[19]: requested_domain_name = 1
DHCP4.OPTION[20]: domain_name = mvo.lan
DHCP4.OPTION[21]: requested_routers = 1
DHCP4.OPTION[22]: expiry = 1428936657
DHCP4.OPTION[23]: requested_wpad = 1
DHCP4.OPTION[24]: requested_nis_domain = 1
DHCP4.OPTION[25]: requested_ms_classless_static_routes = 1
DHCP4.OPTION[26]: network_number = 192.168.100.0
DHCP4.OPTION[27]: requested_domain_search = 1
DHCP4.OPTION[28]: next_server = 192.168.100.1
DHCP4.OPTION[29]: requested_ntp_servers = 1
DHCP4.OPTION[30]: requested_host_name = 1
DHCP4.OPTION[31]: dhcp_lease_time = 3600
IP6.ADDRESS[1]: fe80::5054:ff:febc:91fe/64
IP6.GATEWAY:

Virtual network XML:
$ virsh net-dumpxml default
<network connections='4'>
<name>default</name>
<uuid>60a2a9c4-ccae-1d11-0aff-6fc9f74e3847</uuid>
<forward mode='nat'>
<nat>
<port start='1024' end='65535'/>
</nat>
</forward>
<bridge name='virbr0' stp='on' delay='0'/>
<mac address='52:54:00:16:4f:a7'/>
<domain name='mvo.lan'/>
<dns>
<host ip='192.168.100.3'>
<hostname>vm-checkmachine2</hostname>
</host>
</dns>
<ip address='192.168.100.1' netmask='255.255.255.0'>
<dhcp>
<range start='192.168.100.128' end='192.168.100.254'/>
<host mac='52:54:00:d0:03:00' name='f20.mvo.lan' ip='192.168.100.42'/>
<host mac='52:54:00:13:e2:98' name='f21.mvo.lan' ip='192.168.100.21'/>
<host mac='52:54:00:b2:c5:77' name='f22.mvo.lan' ip='192.168.100.22'/>
<host mac='52:54:00:45:7e:db' name='ipa.mvo.lan' ip='192.168.100.2'/>
<host mac='52:54:00:95:84:8c' name='collide.mvo.lan' ip='192.168.100.99'/>
<host mac='52:54:00:09:33:93' name='vm-checkmachine2' ip='192.168.100.3'/>
</dhcp>
</ip>
</network>
I have set the virtual network bridge to stp='off', but that didn't help.

Doing the same with a Fedora 22 guest leads to the expected behaviour.

I can reproduce the error reliably on a rhel 7.1 guest minimal install (fully updated as of today) as a guest on rhel7 csb. I added a second network interface, down. Same commands as Marius:

cat /etc/system-release
Red Hat Enterprise Linux Server release 7.1 (Maipo)

[root@localhost ~]# yum info NetworkManager
Loaded plugins: product-id, subscription-manager
Installed Packages
Name        : NetworkManager
Arch        : x86_64
Epoch       : 1
Version     : 1.0.0
Release     : 14.git20150121.b4ea599c.el7
Size        : 8.8 M
Repo        : installed
From repo   : anaconda
Summary     : Network connection manager and user applications
URL         : http://www.gnome.org/projects/NetworkManager/
License     : GPLv2+
Description : NetworkManager is a system service that manages network interfaces and
            : connections based on user or automatic configuration. It supports
            : Ethernet, Bridge, Bond, VLAN, Team, InfiniBand, Wi-Fi, mobile broadband
            : (WWAN), PPPoE and other devices, and supports a variety of different VPN
            : services.

[root@localhost ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:12:58:1c brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.129/24 brd 192.168.122.255 scope global dynamic eth0
       valid_lft 2819sec preferred_lft 2819sec
    inet6 fe80::5054:ff:fe12:581c/64 scope link
       valid_lft forever preferred_lft forever
3: ens9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:45:ad:d0 brd ff:ff:ff:ff:ff:ff

[root@localhost ~]# nmcli d c ens9
Device 'ens9' successfully activated with 'de9e03ca-2688-4e34-90cd-5d6c6905f86d'.

Ping while adding connection (note the gap between seq #48 and #97):

Wed Apr 15 14:35:32 2015 64 bytes from 192.168.122.129: icmp_seq=30 ttl=64 time=0.568 ms
Wed Apr 15 14:35:33 2015 64 bytes from 192.168.122.129: icmp_seq=31 ttl=64 time=0.191 ms
Wed Apr 15 14:35:34 2015 64 bytes from 192.168.122.129: icmp_seq=32 ttl=64 time=0.223 ms
Wed Apr 15 14:35:35 2015 64 bytes from 192.168.122.129: icmp_seq=33 ttl=64 time=0.234 ms
Wed Apr 15 14:35:36 2015 64 bytes from 192.168.122.129: icmp_seq=34 ttl=64 time=0.113 ms
Wed Apr 15 14:35:37 2015 64 bytes from 192.168.122.129: icmp_seq=35 ttl=64 time=0.241 ms
Wed Apr 15 14:35:38 2015 64 bytes from 192.168.122.129: icmp_seq=36 ttl=64 time=0.344 ms
Wed Apr 15 14:35:39 2015 64 bytes from 192.168.122.129: icmp_seq=37 ttl=64 time=0.325 ms
Wed Apr 15 14:35:40 2015 64 bytes from 192.168.122.129: icmp_seq=38 ttl=64 time=0.206 ms
Wed Apr 15 14:35:41 2015 64 bytes from 192.168.122.129: icmp_seq=39 ttl=64 time=0.141 ms
Wed Apr 15 14:35:42 2015 64 bytes from 192.168.122.129: icmp_seq=40 ttl=64 time=0.112 ms
Wed Apr 15 14:35:43 2015 64 bytes from 192.168.122.129: icmp_seq=41 ttl=64 time=0.203 ms
Wed Apr 15 14:35:44 2015 64 bytes from 192.168.122.129: icmp_seq=42 ttl=64 time=0.142 ms
Wed Apr 15 14:35:45 2015 64 bytes from 192.168.122.129: icmp_seq=43 ttl=64 time=0.175 ms
Wed Apr 15 14:35:46 2015 64 bytes from 192.168.122.129: icmp_seq=44 ttl=64 time=0.302 ms
Wed Apr 15 14:35:47 2015 64 bytes from 192.168.122.129: icmp_seq=45 ttl=64 time=0.192 ms
Wed Apr 15 14:35:48 2015 64 bytes from 192.168.122.129: icmp_seq=46 ttl=64 time=0.122 ms
Wed Apr 15 14:35:49 2015 64 bytes from 192.168.122.129: icmp_seq=47 ttl=64 time=1.25 ms
Wed Apr 15 14:35:50 2015 64 bytes from 192.168.122.129: icmp_seq=48 ttl=64 time=0.306 ms
Wed Apr 15 14:36:39 2015 64 bytes from 192.168.122.129: icmp_seq=97 ttl=64 time=0.351 ms
Wed Apr 15 14:36:40 2015 64 bytes from 192.168.122.129: icmp_seq=98 ttl=64 time=0.240 ms
Wed Apr 15 14:36:41 2015 64 bytes from 192.168.122.129: icmp_seq=99 ttl=64 time=0.244 ms
Wed Apr 15 14:36:42 2015 64 bytes from 192.168.122.129: icmp_seq=100 ttl=64 time=0.260 ms

I am now seeing similar behavior with NetworkManager 1.0.2 in Fedora 22, but only in our test images, not in my development VM.

This is really bad on servers where the only access to the server is remote. Nasty workaround in Cockpit, which involves breaking cases where an actual outage has occurred: https://github.com/cockpit-project/cockpit/pull/2268

When the blackout happens, could you grab:

ip route
ip -6 route

and then the same after the blackout?

FYI, I have cloned this for Fedora 22: https://bugzilla.redhat.com/show_bug.cgi?id=1220344

(In reply to Dan Williams from comment #10)
> could you grab:

Before the blackout, with ens9 disconnected:

default via 192.168.100.1 dev eth0 proto static metric 100
192.168.100.0/24 dev eth0 proto kernel scope link src 192.168.100.242 metric 100

unreachable ::/96 dev lo metric 1024 error -101
unreachable ::ffff:0.0.0.0/96 dev lo metric 1024 error -101
unreachable 2002:a00::/24 dev lo metric 1024 error -101
unreachable 2002:7f00::/24 dev lo metric 1024 error -101
unreachable 2002:a9fe::/32 dev lo metric 1024 error -101
unreachable 2002:ac10::/28 dev lo metric 1024 error -101
unreachable 2002:c0a8::/32 dev lo metric 1024 error -101
unreachable 2002:e000::/19 dev lo metric 1024 error -101
unreachable 3ffe:ffff::/32 dev lo metric 1024 error -101
fe80::/64 dev eth0 proto kernel metric 256

During the blackout, right after connecting ens9:

default via 192.168.100.1 dev eth0 proto static metric 100
default via 192.168.100.1 dev ens9 proto static metric 101
192.168.100.0/24 dev ens9 proto kernel scope link src 192.168.100.148
192.168.100.0/24 dev eth0 proto kernel scope link src 192.168.100.242 metric 100

unreachable ::/96 dev lo metric 1024 error -101
unreachable ::ffff:0.0.0.0/96 dev lo metric 1024 error -101
unreachable 2002:a00::/24 dev lo metric 1024 error -101
unreachable 2002:7f00::/24 dev lo metric 1024 error -101
unreachable 2002:a9fe::/32 dev lo metric 1024 error -101
unreachable 2002:ac10::/28 dev lo metric 1024 error -101
unreachable 2002:c0a8::/32 dev lo metric 1024 error -101
unreachable 2002:e000::/19 dev lo metric 1024 error -101
unreachable 3ffe:ffff::/32 dev lo metric 1024 error -101
fe80::/64 dev eth0 proto kernel metric 256
fe80::/64 dev ens9 proto kernel metric 256

After the blackout, when the pings are flowing again:

default via 192.168.100.1 dev eth0 proto static metric 100
default via 192.168.100.1 dev ens9 proto static metric 101
192.168.100.0/24 dev ens9 proto kernel scope link src 192.168.100.148
192.168.100.0/24 dev eth0 proto kernel scope link src 192.168.100.242 metric 100

unreachable ::/96 dev lo metric 1024 error -101
unreachable ::ffff:0.0.0.0/96 dev lo metric 1024 error -101
unreachable 2002:a00::/24 dev lo metric 1024 error -101
unreachable 2002:7f00::/24 dev lo metric 1024 error -101
unreachable 2002:a9fe::/32 dev lo metric 1024 error -101
unreachable 2002:ac10::/28 dev lo metric 1024 error -101
unreachable 2002:c0a8::/32 dev lo metric 1024 error -101
unreachable 2002:e000::/19 dev lo metric 1024 error -101
unreachable 3ffe:ffff::/32 dev lo metric 1024 error -101
fe80::/64 dev eth0 proto kernel metric 256
fe80::/64 dev ens9 proto kernel metric 256

(In reply to Dan Williams from comment #10)
> When the blackout happens, could you grab:
>
> ip route
> ip -6 route
>
> and then the same after the blackout?

Could you try to reproduce the bug? It shows itself easily with a freshly installed minimal RHEL or F22 VM, but might not happen with older installations that have been upgraded incrementally. So please try a fresh VM.

I am able to reproduce the issue.

Created attachment 1025065 [details]
Suggested fix

The issue only happens in multihomed setups. Fedora and RHEL ship with rp_filter sysctls for all interfaces set to 1, which turns on strict reverse path filtering [1].
That means an incoming packet is discarded when the route back to its source subnet would not go out through the interface the packet arrived on.

[1] https://tools.ietf.org/html/rfc3704#section-2.2

1.) When the first interface is activated, a route for the subnet with metric=100 is added to it
2.) When the second interface comes up, a route with metric=0 is added to it, causing the traffic to be routed via the second interface
3.) The traffic coming in from the first interface is now discarded by rp_filter, since the routing decision for the subnet in question would favour the second one
4.) I think the connection resumes because the rp_filter also blocks ARP traffic and when the entry for the first interface's address expires the kernel decides to reach the node via the second interface instead

I don't think this is a regression from 1.0.0. I've been able to reproduce the "blackout" on 1.0.0 too.

The problematic part here is the metric=0 route. This would also route the traffic through an undesired interface in the unlikely case that someone has two wireless and one wired connection to the same network (one wireless device would get the metric=0 route). I'm addressing this by replacing the use of a constant metric with a search for the lowest unused metric.

Note that in some cases the blackouts are inevitable because you're reliant on which interface the clients use. In a multihomed scenario you should not use strict reverse path filtering. This fixes your particular scenario though.

Attaching the fix.

(In reply to Lubomir Rintel from comment #15)
> Note that in some cases the blackouts are inevitable because you're reliant on
> which interface the clients use. In a multihomed scenario you should not use
> strict reverse path filtering.

Hmm, could NetworkManager shield me from this 'expert domain knowledge'? If not, does it have some knobs to switch rp_filter on and off?

> Attaching the fix.
And thanks a lot for the effort, of course!
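
For readers following along, the mechanism from comment 15 can be modelled in a few lines of Python. This is only a simplified sketch under stated assumptions: `Route`, `best_route`, `strict_rp_filter_accepts` and `lowest_unused_metric` are hypothetical names made up for illustration, the kernel's real route lookup is reduced to "lowest metric wins", and none of this is NetworkManager's actual code.

```python
from typing import Iterable, List, NamedTuple


class Route(NamedTuple):
    dev: str     # outgoing interface of the subnet route
    metric: int  # lower metric is preferred


def best_route(routes: List[Route]) -> Route:
    """Simplified lookup: the route with the lowest metric wins."""
    return min(routes, key=lambda r: r.metric)


def strict_rp_filter_accepts(ingress_dev: str, routes_to_source: List[Route]) -> bool:
    """Strict reverse path check: accept a packet only if the best route
    back to its source leaves through the interface it arrived on."""
    return best_route(routes_to_source).dev == ingress_dev


def lowest_unused_metric(base: int, used: Iterable[int]) -> int:
    """Start at the desired metric and bump it until it is not taken."""
    taken = set(used)
    metric = base
    while metric in taken:
        metric += 1
    return metric


# Before the fix: the second device keeps a metric 0 subnet route, so
# packets arriving on eth0 fail the strict check.
before = [Route("eth0", 100), Route("ens9", 0)]
print(strict_rp_filter_accepts("eth0", before))  # False

# With the lowest-unused-metric search: ens9 gets 101, eth0 stays preferred.
ens9_metric = lowest_unused_metric(100, [r.metric for r in before if r.dev == "eth0"])
after = [Route("eth0", 100), Route("ens9", ens9_metric)]
print(strict_rp_filter_accepts("eth0", after))  # True
```

In this model, traffic arriving on ens9 would still fail the strict check after the fix, which matches the remark that some blackouts are inevitable with strict filtering in a multihomed setup.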
>> linux-platform: bump device route metric if another device's route would clash devices. We add a route with a device specific metric to cope with this. It causes the other route to disappear. Adding the route with our desired metric doesn't make the other route disappear. We explicitly delete the metric 0 route.

I think the patch is not correct if there is another route with metric 0 present.

(I took the patch from comment 15, and pushed it to lr/device-route-multihomed-rh1211287 branch)

How about my two fixup commits there?

(In reply to Lubomir Rintel from comment #15)
> Created attachment 1025065 [details]
> Suggested fix
>
> The issue only happens in multihomed setups. Fedora and RHEL ship with
> rp_filter sysctls for all interfaces set to 1, which turns on strict reverse
> path filtering [1]. That means incoming packets with source subnet that
> would not be routed to the ingress interface are discarded.
>
> [1] https://tools.ietf.org/html/rfc3704#section-2.2
>
> 1.) When the first interface is activated, a route for the subnet with
> metric=100 is added to it
> 2.) When the second interface comes up, a route with metric=0 is added to
> it, causing the traffic to be routed via the second interface
> 3.) The traffic coming in from the first interface is now discarded by
> rp_filter, since the routing decision for the subnet in question would
> favour the second one
> 4.) I think the connection resumes because the rp_filter also blocks ARP
> traffic and when the entry for first interface's address expires the kernel
> decides to reach the node via the second interface instead
>

You have the same issue if you do the following:

0) activate eth0. We remove the metric-0 route and add metric-100.
1) activate eth1. We remove its metric-0 route, and add metric-101.
2) deactivate eth0.
3) activate eth2. We remove its metric-0 route, and add metric-100.

Seems like a better (long-term) real solution would be to have NMRouteManager keep track of the added routes. Then it would do:

0) activate eth0. We remove the metric-0 route, and add metric-100
1) activate eth1. We remove the metric-0 route, but don't add a metric-100 route.
2) deactivate eth0. Now route-manager restores the metric-100 route for eth1.
3) activate eth2. Same as 1).

(In reply to Thomas Haller from comment #19)
> (In reply to Lubomir Rintel from comment #15)
> > Created attachment 1025065 [details]
> > Suggested fix
> >
> > The issue only happens in multihomed setups. Fedora and RHEL ship with
> > rp_filter sysctls for all interfaces set to 1, which turns on strict reverse
> > path filtering [1]. That means incoming packets with source subnet that
> > would not be routed to the ingress interface are discarded.
> >
> > [1] https://tools.ietf.org/html/rfc3704#section-2.2
> >
> > 1.) When the first interface is activated, a route for the subnet with
> > metric=100 is added to it
> > 2.) When the second interface comes up, a route with metric=0 is added to
> > it, causing the traffic to be routed via the second interface
> > 3.) The traffic coming in from the first interface is now discarded by
> > rp_filter, since the routing decision for the subnet in question would
> > favour the second one
> > 4.) I think the connection resumes because the rp_filter also blocks ARP
> > traffic and when the entry for first interface's address expires the kernel
> > decides to reach the node via the second interface instead
> >
>
> You have the same issue if you do the following:
>
> 0) activate eth0. We remove the metric-0 route and add metric-100.
> 1) activate eth1. We remove its metric-0 route, and add metric-101.
> 2) deactivate eth0.
> 3) activate eth2. We remove its metric-0 route, and add metric-100.

or

0) activate eth0. We remove the metric-0 route and add metric-100.
1) activate eth1. We remove its metric-0 route, and add metric-101.
2) deactivate eth0.
3) after a while, reactivate eth0. We remove its metric-0 route, and add metric-100.

Now the same issue happens for eth1.

Maybe an additional idea would be to assign different devices a different default priority. So eth0 would get metric 100, eth1 101, and so on...

Pushed a branch upstream for review: th/device-route-bgo751264

This went into 1.0.4

Already fixed in RHEL 7.2.
|
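
The route-tracking idea from comment 19 can be sketched roughly as follows. This is a toy model written for illustration, not the design of NMRouteManager or of the branch mentioned above; `RouteTracker` and its methods are hypothetical names, and the model assumes one subnet route per device with a fixed metric of 100.

```python
from typing import Dict, List, Optional, Tuple


class RouteTracker:
    """Remember which devices want a route to a subnet, but keep only one
    of them installed: the device that was activated earliest."""

    def __init__(self, metric: int = 100) -> None:
        self.metric = metric
        # subnet -> devices that want the route, in activation order
        self._wanted: Dict[str, List[str]] = {}

    def activate(self, dev: str, subnet: str) -> None:
        devices = self._wanted.setdefault(subnet, [])
        if dev not in devices:
            devices.append(dev)

    def deactivate(self, dev: str, subnet: str) -> None:
        devices = self._wanted.get(subnet, [])
        if dev in devices:
            devices.remove(dev)

    def installed(self, subnet: str) -> Optional[Tuple[str, int]]:
        """Return (device, metric) of the route that should be installed,
        or None when no active device is on that subnet."""
        devices = self._wanted.get(subnet, [])
        return (devices[0], self.metric) if devices else None


# Replaying steps 0) to 3) from comment 19:
tracker = RouteTracker()
net = "192.168.100.0/24"
tracker.activate("eth0", net)    # 0) eth0 gets the metric-100 route
tracker.activate("eth1", net)    # 1) eth1 is tracked, no route added
print(tracker.installed(net))    # ('eth0', 100)
tracker.deactivate("eth0", net)  # 2) the route is restored for eth1
print(tracker.installed(net))    # ('eth1', 100)
tracker.activate("eth2", net)    # 3) same as 1), eth1 keeps the route
print(tracker.installed(net))    # ('eth1', 100)
```

Because the installed route only moves when its current owner goes away, activating or reactivating another device on the same subnet does not change which interface the subnet is routed through.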
Created attachment 1013979 [details]
Console showing nmcli commands.

Description of problem:

After making certain (maybe any) changes to a connection, including activation and deactivation, all network traffic to the machine (such as ping) is blocked for a certain time, even if the machine still has other working connections.

[ This could be caused by anything, including the virtual network I use with my virtual machines, of course. So this is more a call for debugging help than a bug report. ]

Version-Release number of selected component (if applicable):

NetworkManager-1.0.0-14.git20150121.b4ea599c.el7.x86_64

How reproducible:

Always

Steps to Reproduce:
1. Install rhel-server-7.1-x86_64-dvd.iso into a new virtual machine by clicking Next in the virtual-manager UI until done.
2. Add a second network interface to it. (I used default settings.)
3. Make sure that the primary network interface (probably eth0) is connected.
4. Make sure that the second interface (ens9 in my case) is disconnected.
5. Start a ping to the machine from the outside to the IP address of the primary interface.
6. Connect the second interface.

Actual results:

Ping stops for about 30 seconds. All other connections, such as ssh, also freeze for the same time.

Expected results:

Machine stays connected normally.

Additional info:

A screenshot of the VM console is attached, and this is the output of ping. Note the gap from 8 to 46.

$ ping 192.168.100.242
PING 192.168.100.242 (192.168.100.242) 56(84) bytes of data.
64 bytes from 192.168.100.242: icmp_seq=1 ttl=64 time=0.169 ms
64 bytes from 192.168.100.242: icmp_seq=2 ttl=64 time=0.216 ms
64 bytes from 192.168.100.242: icmp_seq=3 ttl=64 time=0.218 ms
64 bytes from 192.168.100.242: icmp_seq=4 ttl=64 time=0.197 ms
64 bytes from 192.168.100.242: icmp_seq=5 ttl=64 time=0.241 ms
64 bytes from 192.168.100.242: icmp_seq=6 ttl=64 time=0.231 ms
64 bytes from 192.168.100.242: icmp_seq=7 ttl=64 time=0.222 ms
64 bytes from 192.168.100.242: icmp_seq=8 ttl=64 time=0.159 ms
64 bytes from 192.168.100.242: icmp_seq=46 ttl=64 time=0.854 ms
64 bytes from 192.168.100.242: icmp_seq=47 ttl=64 time=0.311 ms
64 bytes from 192.168.100.242: icmp_seq=48 ttl=64 time=0.412 ms
64 bytes from 192.168.100.242: icmp_seq=49 ttl=64 time=0.544 ms
64 bytes from 192.168.100.242: icmp_seq=50 ttl=64 time=0.554 ms
64 bytes from 192.168.100.242: icmp_seq=51 ttl=64 time=0.281 ms
64 bytes from 192.168.100.242: icmp_seq=52 ttl=64 time=0.489 ms
64 bytes from 192.168.100.242: icmp_seq=53 ttl=64 time=0.724 ms
64 bytes from 192.168.100.242: icmp_seq=54 ttl=64 time=0.426 ms
64 bytes from 192.168.100.242: icmp_seq=55 ttl=64 time=0.248 ms
64 bytes from 192.168.100.242: icmp_seq=56 ttl=64 time=0.636 ms
64 bytes from 192.168.100.242: icmp_seq=57 ttl=64 time=0.261 ms
64 bytes from 192.168.100.242: icmp_seq=58 ttl=64 time=0.655 ms
64 bytes from 192.168.100.242: icmp_seq=59 ttl=64 time=0.327 ms
64 bytes from 192.168.100.242: icmp_seq=60 ttl=64 time=0.401 ms
64 bytes from 192.168.100.242: icmp_seq=61 ttl=64 time=0.279 ms
64 bytes from 192.168.100.242: icmp_seq=62 ttl=64 time=0.486 ms
^C
--- 192.168.100.242 ping statistics ---
62 packets transmitted, 25 received, 59% packet loss, time 61000ms
rtt min/avg/max/mdev = 0.159/0.381/0.854/0.188 ms
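
When reproducing this, the gap can be picked out of a saved ping log automatically instead of by eye. The helper below is a small sketch, not something from the original report; `find_gaps` is a hypothetical name and it only assumes the usual `icmp_seq=N` field in each reply line.

```python
import re
from typing import List, Tuple


def find_gaps(ping_output: str) -> List[Tuple[int, int]]:
    """Return (last_seen, next_seen) pairs wherever icmp_seq skips ahead."""
    seqs = [int(s) for s in re.findall(r"icmp_seq=(\d+)", ping_output)]
    return [(a, b) for a, b in zip(seqs, seqs[1:]) if b - a > 1]


sample = """\
64 bytes from 192.168.100.242: icmp_seq=7 ttl=64 time=0.222 ms
64 bytes from 192.168.100.242: icmp_seq=8 ttl=64 time=0.159 ms
64 bytes from 192.168.100.242: icmp_seq=46 ttl=64 time=0.854 ms
64 bytes from 192.168.100.242: icmp_seq=47 ttl=64 time=0.311 ms
"""

for last_seen, next_seen in find_gaps(sample):
    lost = next_seen - last_seen - 1
    print(f"gap after seq {last_seen}: {lost} replies missing")
# prints "gap after seq 8: 37 replies missing"
```

With ping's default interval of one request per second, the number of missing replies is roughly the length of the blackout in seconds.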