Bug 1428334 - Network Manager: Network Flaps when static route is set
Summary: Network Manager: Network Flaps when static route is set
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: NetworkManager
Version: 7.3
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Thomas Haller
QA Contact: Desktop QE
URL:
Whiteboard:
Depends On:
Blocks: 1470965
Reported: 2017-03-02 10:33 UTC by Martin W
Modified: 2021-06-10 11:59 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-10 13:22:08 UTC
Target Upstream Version:
Embargoed:


Attachments
Network manager trace logs (46.86 KB, text/plain)
2017-03-02 12:13 UTC, Martin W


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2018:0778 | 0 | None | None | None | 2018-04-10 13:23:39 UTC

Description Martin W 2017-03-02 10:33:33 UTC
Description of problem:

When a static route is set and NetworkManager is used to manage the interface, the interface flaps or doesn't come up when restarted.

Version-Release number of selected component (if applicable):
NetworkManager.x86_64  1:1.4.0-14.el7_3

How reproducible:
This happens every time when a static route is set and the network service is restarted / the server is rebooted.

Steps to Reproduce:
1. Add a static route to the interface in /etc/sysconfig/network-scripts/route-ifname
My example looked like this: 10.200.200.2/31 via 172.16.0.254

2. Reboot the server and / or restart the network service

Actual results:

After a reboot we found that the interface was flapping. When we attempted to take it down and bring it back up, it failed:

We see messages like this in the log:

Mar  2 10:13:11 log NetworkManager[712]: <info>  [1488449591.8840] device (eno16777984): Activation: starting connection 'eno16777984' (1217c72d-305c-4513-aebd-3cd7d9cdaa4f)
Mar  2 10:13:11 log NetworkManager[712]: <info>  [1488449591.8843] audit: op="connection-activate" uuid="1217c72d-305c-4513-aebd-3cd7d9cdaa4f" name="eno16777984" pid=37617 uid=0 result="success"
Mar  2 10:13:11 log NetworkManager[712]: <info>  [1488449591.8845] device (eno16777984): state change: disconnected -> prepare (reason 'none') [30 40 0]
Mar  2 10:13:11 log NetworkManager[712]: <info>  [1488449591.8847] manager: NetworkManager state is now CONNECTING
Mar  2 10:13:11 log NetworkManager[712]: <info>  [1488449591.8856] device (eno16777984): state change: prepare -> config (reason 'none') [40 50 0]
Mar  2 10:13:11 log NetworkManager[712]: <info>  [1488449591.8878] device (eno16777984): state change: config -> ip-config (reason 'none') [50 70 0]
Mar  2 10:13:11 log NetworkManager[712]: <error> [1488449591.8917] platform-linux: do-add-ip4-route[2: 10.200.200.2/31 100]: failure 101 (Network is unreachable)
Mar  2 10:13:43 log NetworkManager[712]: <info>  [1488449623.2936] device (eno16777984): state change: ip-config -> failed (reason 'ip-config-unavailable') [70 120 5]
Mar  2 10:13:43 log NetworkManager[712]: <info>  [1488449623.2939] manager: NetworkManager state is now DISCONNECTED
Mar  2 10:13:43 log NetworkManager[712]: <info>  [1488449623.2940] policy: disabling autoconnect for connection 'eno16777984'.
Mar  2 10:13:43 log NetworkManager[712]: <warn>  [1488449623.2942] device (eno16777984): Activation: failed for connection 'eno16777984'
Mar  2 10:13:43 log NetworkManager[712]: <info>  [1488449623.2961] device (eno16777984): state change: failed -> disconnected (reason 'none') [120 30 0]
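
Editorial note: the state transitions in the excerpt above can be pulled out mechanically, which makes the failure sequence easier to follow. The following is a minimal Python sketch, not part of the original report. The regular expression and the helper name parse_state_change are illustrative assumptions based only on the line format visible in this log; other NetworkManager versions may log differently.

```python
import re

# Pattern for "state change" lines as they appear in the excerpt above.
# This is an assumption derived from this report, not a documented format.
STATE_CHANGE = re.compile(
    r"\[(?P<ts>\d+\.\d+)\] device \((?P<dev>[^)]+)\): state change: "
    r"(?P<old>[\w-]+) -> (?P<new>[\w-]+) \(reason '(?P<reason>[^']+)'\)"
)


def parse_state_change(line):
    """Return (timestamp, device, old_state, new_state, reason), or None."""
    m = STATE_CHANGE.search(line)
    if m is None:
        return None
    return (float(m["ts"]), m["dev"], m["old"], m["new"], m["reason"])


# One line copied from the log above.
line = (
    "Mar  2 10:13:43 log NetworkManager[712]: <info>  [1488449623.2936] "
    "device (eno16777984): state change: ip-config -> failed "
    "(reason 'ip-config-unavailable') [70 120 5]"
)
print(parse_state_change(line))
```

Filtering a full log through this helper shows the device going from disconnected through prepare, config and ip-config to failed, with the route error logged right after the ip-config stage is entered.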


Expected results:

The interface should come up with its static IP

Additional info:
If 'NM_CONTROLLED=no' is added to  /etc/sysconfig/network-scripts/ifcfg-eno16777984 the problem stops

Comment 2 Martin W 2017-03-02 11:02:19 UTC
It might be worth mentioning that the IP on that interface is 192.168.28.51 and therefore on a different subnet to the gateway defined in the static route (172.16.0.254) but this wasn't a problem until we upgraded to the specified version of centos + networkmanager

Comment 3 Thomas Haller 2017-03-02 11:05:16 UTC
Sidenote: "Restart network service" or reboot is usually not the right way to reapply configuration with NetworkManager (it should work however).

Suggested approach is:

  1) edit the ifcfg-rh file
  2) nmcli connection reload
  3) nmcli connection up "$NAME"



Anyway, it would be helpful to attach full debug-logging of NetworkManager.
You do that by editing /etc/NetworkManager/NetworkManager.conf to have 

  [logging]
  level=TRACE

and restart NM. For details see https://cgit.freedesktop.org/NetworkManager/NetworkManager/plain/contrib/fedora/rpm/NetworkManager.conf?id=d63b67b0e0254c0a1d39b5ed8b7b15ce4f9ad259

also, attaching the entire ifcfg file would be helpful. Otherwise it can only be guessed which configuration you try to apply.



Anyway(2), it seems the failure is due to 172.16.0.254 not being reachable. When you configure a route that goes via a gateway (172.16.0.254), there must also be a direct route to that gateway. Otherwise the kernel does not allow such a route to be added. Initscripts won't be able to do that either.
Whether a route to 172.16.0.254 exists depends on the rest of your networking configuration. For example, if you use DHCP to get the address, there may be no route to the gateway until you get an address. NM wouldn't know that such a route will appear later, and fails before DHCP completes.
Initscripts may succeed because they complete dhcp4 first and add static routes afterwards.

The proper thing to do is to also configure an explicit route to the gateway:

  10.200.200.2/31 via 172.16.0.254
  172.16.0.254/32 via 0.0.0.0

Comment 4 Thomas Haller 2017-03-02 11:10:36 UTC
(In reply to Martin W from comment #2)
> wasn't a problem until we upgraded to the
> specified version of centos + networkmanager

additional useful information:

upgraded from where and which version?
upgraded to where and which version?
(the bug is filed against rhel-7.3, not centos)

Comment 5 Martin W 2017-03-02 12:11:56 UTC
Hi Thomas 

Thanks for the really quick reply, that was helpful.

We didn't actually apply the new route config by rebooting the machine and initially the new config appeared to be working, it was after a reboot that happened later on that we noticed the problem.

Here is the full ifcfg file:

TYPE="Ethernet"
BOOTPROTO="none"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
NAME="eno16777984"
UUID="1217c72d-305c-4513-aebd-3cd7d9cdaa4f"
DEVICE="eno16777984"
ONBOOT="yes"
IPADDR="192.168.28.51"
PREFIX="24"
GATEWAY="192.168.28.254"
DNS1="192.168.28.2"
DNS2="172.16.0.16"
DNS3="172.16.0.34"
IPV6_PEERDNS="yes"
IPV6_PEERROUTES="yes"
IPV6_PRIVACY="no"


The previous version of NetworkManager that we had installed was NetworkManager-1.0.6-31.el7_2.x86_64
The previous CentOS version was 7.2 but I can't find the more specific version in my logs, sorry

Right now my server is back in its original state without its new route configured and I can reach the new gateway

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         gateway         0.0.0.0         UG    100    0        0 eno16777984
192.168.28.0    0.0.0.0         255.255.255.0   U     100    0        0 eno16777984

traceroute to 172.16.0.254 (172.16.0.254), 30 hops max, 60 byte packets
 1  gateway (192.168.28.254)  2.325 ms  7.175 ms  8.576 ms
 2  172.16.0.254 (172.16.0.254)  4.138 ms  4.529 ms  4.461 ms

Now I'll try setting the debug level for network manager and try to reproduce the problem again:

Editing the file: /etc/sysconfig/network-scripts/route-eno16777984
now contains just this one line: 10.200.200.2/31 via 172.16.0.254

nmcli connection reload
nmcli connection up eno16777984
  Error: Connection activation failed.

After about a minute the server became unpingable
Another minute or 2 later and I was able to access it again

This pattern just continues until I jump on the vmware console and put the settings back (delete the route config and run the nmcli commands again)

I have attached the logs.

Thanks
Martin

Comment 6 Martin W 2017-03-02 12:13:08 UTC
Created attachment 1259131 [details]
Network manager trace logs

Comment 7 Thomas Haller 2017-03-02 12:32:16 UTC
(as you see, the logfile doesn't contain debugging information, but doesn't matter).


Yes, you have a static configuration, but no direct route to 172.16.0.254.
The kernel does not allow you to add a route via the gateway 172.16.0.254 (which is not directly reachable). Initscripts won't be able to do that either.

Probably older versions of NM would just continue and ignore the error, but the route would not be added either.


Maybe NM should automatically add direct routes when needed. But maybe not... that's not clear to me.
Currently NM doesn't do that, so you need to configure those routes properly.
That just means to add an additional direct route to the gateway:

  172.16.0.254/32 via 0.0.0.0


does that help with your problem?
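
Editorial note: the rule described in this comment (a gateway must be covered by a directly connected prefix before a route via it can be added) can be illustrated with a small Python sketch. This is a simplified model for illustration only, not the kernel's actual route validation. The function name gateway_is_on_link is hypothetical, and the addresses are the ones quoted in this report.

```python
import ipaddress


def gateway_is_on_link(gateway, on_link_prefixes):
    """Return True if the gateway falls inside any directly connected prefix."""
    gw = ipaddress.ip_address(gateway)
    return any(
        gw in ipaddress.ip_network(prefix, strict=False)
        for prefix in on_link_prefixes
    )


# Values from this report: the interface has 192.168.28.51/24 and the
# static route uses gateway 172.16.0.254.
prefixes = ["192.168.28.51/24"]
before = gateway_is_on_link("172.16.0.254", prefixes)

# Adding the direct route suggested above puts the gateway on-link.
prefixes.append("172.16.0.254/32")
after = gateway_is_on_link("172.16.0.254", prefixes)

print(before, after)  # False True
```

In this model the original configuration fails the check because 172.16.0.254 lies outside 192.168.28.0/24, and the extra /32 route is what makes it pass.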

Comment 8 Thomas Haller 2017-03-02 12:34:11 UTC
nmcli connection modify eno16777984 +ipv4.routes '172.16.0.254/32 0.0.0.0'
nmcli connection up eno16777984

Comment 9 Martin W 2017-03-02 13:25:03 UTC
That Helps, thanks

I understand that it was my mistake that caused the problem, but I was wondering whether it could be considered a bug that Network Manager didn't warn me during the connection reload. I'm also not sure that it explains the behaviour where the link was flapping.

Cheers
Martin

Comment 10 Thomas Haller 2017-03-02 14:05:43 UTC
(In reply to Martin W from comment #9)
> That Helps, thanks
> 
> I understand that it was my mistake that caused the problem, but I was
> wondering whether it could be considered a bug that Network Manager didn't
> warn me during the connection reload. I'm also not sure that it explains the
> behaviour where the link was flapping.

The flapping probably happens because activation fails (due to the error configuring the route), then NM disconnects the interface, then after a while it tries to autoconnect, fails again, and so on. (It's not clear what exactly you mean by flapping and what it looks like. What's the time interval there? Does it match the activation attempts you see in the logfile?)

In case of DHCP, it could be that DHCP assigns you an address like 172.16.0.17/16, and then the manual route would be valid. In that case, NM couldn't know whether the route will work.
If you have a purely static configuration, it might be reasonable to somehow complain about that.

But maybe it's better to add the direct route automatically and making the configuration just work.

That seems a useful RFE

Comment 11 Martin W 2017-03-02 14:34:08 UTC
By flapping I simply mean that I was unable to ping the server for a while; there was 100% packet loss.

As you can tell from my previous update we only configure a static IP on this server and it only has the one interface, DHCP isn't configured at all.

Looking back through these logs and the "flapping" timings I can see that I was unable to ping the server at this point:

Mar  2 11:53:41 log NetworkManager[40265]: <info>  [1488455621.2907] manager: NetworkManager state is now CONNECTING
Mar  2 11:53:41 log NetworkManager[40265]: <info>  [1488455621.2916] device (eno16777984): state change: prepare -> config (reason 'none') [40 50 0]
Mar  2 11:53:41 log NetworkManager[40265]: <info>  [1488455621.2937] device (eno16777984): state change: config -> ip-config (reason 'none') [50 70 0]
Mar  2 11:53:41 log NetworkManager[40265]: <error> [1488455621.2984] platform-linux: do-add-ip4-route[2: 10.200.200.2/31 100]: failure 101 (Network is unreachable)
Mar  2 11:54:13 log NetworkManager[40265]: <info>  [1488455653.2904] device (eno16777984): state change: ip-config -> failed (reason 'ip-config-unavailable') [70 120 5]
Mar  2 11:54:13 log NetworkManager[40265]: <info>  [1488455653.2908] manager: NetworkManager state is now DISCONNECTED
Mar  2 11:54:13 log NetworkManager[40265]: <info>  [1488455653.2911] policy: disabling autoconnect for connection 'eno16777984'.
Mar  2 11:54:13 log NetworkManager[40265]: <warn>  [1488455653.2915] device (eno16777984): Activation: failed for connection 'eno16777984'
Mar  2 11:54:13 log NetworkManager[40265]: <info>  [1488455653.2924] device (eno16777984): state change: failed -> disconnected (reason 'none') [120 30 0]
Mar  2 11:54:21 log collectd[953]: network plugin: sendto failed: Network is unreachable. Closing sending socket.



It was reachable again at this point even though I had not changed anything:
Mar  2 11:59:11 log collectd[953]: network plugin: sendto failed: Network is unreachable. Closing sending socket.
Mar  2 11:59:13 log NetworkManager[40265]: <info>  [1488455953.3521] policy: auto-activating connection 'eno16777984'
Mar  2 11:59:13 log NetworkManager[40265]: <info>  [1488455953.3542] device (eno16777984): Activation: starting connection 'eno16777984' (1217c72d-305c-4513-aebd-3cd7d9cdaa4f)
Mar  2 11:59:13 log NetworkManager[40265]: <info>  [1488455953.3545] device (eno16777984): state change: disconnected -> prepare (reason 'none') [30 40 0]
Mar  2 11:59:13 log NetworkManager[40265]: <info>  [1488455953.3548] manager: NetworkManager state is now CONNECTING
Mar  2 11:59:13 log NetworkManager[40265]: <info>  [1488455953.3557] device (eno16777984): state change: prepare -> config (reason 'none') [40 50 0]
Mar  2 11:59:13 log NetworkManager[40265]: <info>  [1488455953.3577] device (eno16777984): state change: config -> ip-config (reason 'none') [50 70 0]
Mar  2 11:59:13 log NetworkManager[40265]: <error> [1488455953.3631] platform-linux: do-add-ip4-route[2: 10.200.200.2/31 100]: failure 101 (Network is unreachable)
Mar  2 11:59:45 log NetworkManager[40265]: <info>  [1488455985.2846] device (eno16777984): state change: ip-config -> failed (reason 'ip-config-unavailable') [70 120 5]
Mar  2 11:59:45 log NetworkManager[40265]: <info>  [1488455985.2850] manager: NetworkManager state is now DISCONNECTED


This pattern repeated itself one more time before I stopped it by reverting my configuration.
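
Editorial note: the timing of the cycle can be worked out from the bracketed epoch timestamps in the two excerpts above. The Python sketch below only does that arithmetic on values copied from the log. The variable names are illustrative, and no claim is made here about which NetworkManager timer produces the pause.

```python
# Epoch timestamps copied from the bracketed fields in the excerpts above.
connecting_1 = 1488455621.2907     # first attempt: "state is now CONNECTING"
failed_1 = 1488455653.2904         # first attempt: "ip-config -> failed"
auto_activate_2 = 1488455953.3521  # "policy: auto-activating connection"
failed_2 = 1488455985.2846         # second attempt: "ip-config -> failed"

attempt_1 = failed_1 - connecting_1    # duration of the first attempt
pause = auto_activate_2 - failed_1     # idle time before the retry
attempt_2 = failed_2 - auto_activate_2  # duration of the second attempt

print(f"attempt 1: {attempt_1:.1f} s")
print(f"pause:     {pause:.1f} s")
print(f"attempt 2: {attempt_2:.1f} s")
```

Each activation attempt lasts roughly 32 seconds before it fails, and about five minutes pass before the next automatic attempt, which is consistent with the retry cycle described in comment 10.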

I agree that to make NM just automatically define the extra route would be useful. I will raise an RFE.

I will leave the fate of this ticket up to you, if you think it should have warned me in more detail and not committed the config then I guess we can keep it as a bug, if not then please go ahead and close it.

Thanks again for all your help!!!

Comment 12 Thomas Haller 2017-09-13 15:40:11 UTC
After discussion, we decided not to automatically add such a gateway-route.

The scenario is relatively unusual, because usually you are in the same IP subnet as the gateway, and have a device-route already.

In the unusual case, the user is expected to get the configuration right.



We should improve logging (so that it's clearer what failed).


Another aspect is, that we will support the RTNH_F_ONLINK "onlink" flag. Then the user can configure gateway-routes without requiring a direct route to the gateway. Still, whether the user adds the onlink flag or an explicit manual route, the user needs to explicitly configure this.

Comment 13 Thomas Haller 2017-11-08 10:26:00 UTC
The logging has already been improved to give a better explanation of why adding the route fails:

<warn> ... platform: route-sync: failure to add IPv4 route: 5.6.7.0/24 via 4.5.6.7 dev 3 metric 600 mss 0 rt-src user: Network is unreachable (101); is the gateway directly reachable?


What is still missing is to better expose such failures/warnings to the client/GUI. That is complicated, and something that we should improve in general. I won't do that as part of this BZ.



But I added support for onlink flag for IPv4. Please review th/platform-routes-onlink-rh1428334.

We are not going to configure manual routes as "onlink". Kernel wants to treat such configurations as invalid, and NM probably should follow that. Also, "onlink" does not work for IPv6, so it would not be the full solution anyway. The user is required to get it right, by configuring a device-route, or setting the IPv4 route as "onlink".
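
Editorial note: for readers who want to try the "onlink" alternative, here is a hedged Python sketch that builds the corresponding nmcli invocation for the route from this report. It assumes a NetworkManager version that includes the onlink support discussed here, and it assumes the route attribute is written as onlink=true in the ipv4.routes value, as described in the nmcli settings documentation of later releases. The helper name onlink_route_command is hypothetical, and the command is only printed, not executed.

```python
import shlex


def onlink_route_command(connection, destination, gateway):
    """Build (but do not run) an nmcli command adding an on-link IPv4 route."""
    # Assumed value format: "<dest>/<prefix> <next-hop> onlink=true".
    route = f"{destination} {gateway} onlink=true"
    return ["nmcli", "connection", "modify", connection, "+ipv4.routes", route]


# Connection name, destination and gateway are taken from this report.
cmd = onlink_route_command("eno16777984", "10.200.200.2/31", "172.16.0.254")
print(shlex.join(cmd))
```

After applying such a change, the connection still has to be reactivated (for example with nmcli connection up, as in comment 8). As stated above, this only works for IPv4.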

Comment 14 Beniamino Galvani 2017-11-10 10:37:13 UTC
> platform: consider RTNH_F_ONLINK onlink flag for IPv4 routes

+ .rtm_flags = obj->ip_route.r_rtm_flags & (is_v4
+                                           ? (unsigned) (RTNH_F_ONLINK)
...
+		                     obj->r_rtm_flags & (RTNH_F_ONLINK),

Nit: remove parens around RTNH_F_ONLINK

Can you add the new attribute to an existing test in test-route.c?

Comment 15 Beniamino Galvani 2017-11-13 10:46:02 UTC
LGTM now.

Comment 16 Thomas Haller 2017-11-13 11:02:28 UTC
merged th/platform-routes-onlink-rh1428334 to master:

https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=d44a3eb80caeeccc3f34c2c2a1b69828f0ef75aa

Comment 17 Thomas Haller 2017-11-14 10:57:13 UTC
(In reply to Thomas Haller from comment #16)
> merged th/platform-routes-onlink-rh1428334 to master:
> 
> https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/
> ?id=d44a3eb80caeeccc3f34c2c2a1b69828f0ef75aa

also backported to nm-1-10 branch as https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?10&id=c2c1031a20c8045b04fee7fdd4b2f781c9a38499


I think this is all we want to do about this issue. It wouldn't avoid the configuration error of the original report, but logging got improved (helping to identify the issue), and the onlink route property can be used as an alternative solution that the user can configure.

Comment 21 errata-xmlrpc 2018-04-10 13:22:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0778

