Bug 1276399

Summary: possible to have a VM with 2 default gateways
Product: Red Hat Enterprise Linux 7 Reporter: Micah Abbott <miabbott>
Component: cloud-init Assignee: Lars Kellogg-Stedman <lars>
Status: CLOSED INSUFFICIENT_DATA QA Contact:
Severity: medium Docs Contact:
Priority: unspecified    
Version: 7.2 CC: walters
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Last Closed: 2017-01-16 18:29:49 UTC Type: Bug

Description Micah Abbott 2015-10-29 15:26:32 UTC
In my OpenStack tenant, there are two networks defined.  It appears that when I boot a VM, by default, two NICs are provisioned, one attached to each network.

When booting a VM using RHEL Atomic Host as the OS, the VM ends up with two default gateways and cannot be reached via the floating IP.  See the output from cloud-init:


[  OK  ] Started Initial cloud-init job (pre-networking).
         Starting Initial cloud-init job (metadata service crawler)...
[    8.367978] cloud-init[1462]: Cloud-init v. 0.7.6 running 'init' at Thu, 29 Oct 2015 15:10:52 +0000. Up 8.33 seconds.
[    8.386696] cloud-init[1462]: ci-info: +++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++
[    8.389028] cloud-init[1462]: ci-info: +--------+------+---------------+---------------+-------------------+
[    8.390794] cloud-init[1462]: ci-info: | Device |  Up  |    Address    |      Mask     |     Hw-Address    |
[    8.392219] cloud-init[1462]: ci-info: +--------+------+---------------+---------------+-------------------+
[    8.394476] cloud-init[1462]: ci-info: |  lo:   | True |   127.0.0.1   |   255.0.0.0   |         .         |
[    8.395986] cloud-init[1462]: ci-info: | eth1:  | True | 172.16.135.11 | 255.255.255.0 | fa:16:3e:8c:75:20 |
[    8.397915] cloud-init[1462]: ci-info: | eth0:  | True |  172.16.69.48 | 255.255.255.0 | fa:16:3e:0b:25:93 |
[    8.399777] cloud-init[1462]: ci-info: +--------+------+---------------+---------------+-------------------+
[    8.401604] cloud-init[1462]: ci-info: ++++++++++++++++++++++++++++++++Route info+++++++++++++++++++++++++++++++++
[    8.403227] cloud-init[1462]: ci-info: +-------+--------------+--------------+---------------+-----------+-------+
[    8.404929] cloud-init[1462]: ci-info: | Route | Destination  |   Gateway    |    Genmask    | Interface | Flags |
[    8.407463] cloud-init[1462]: ci-info: +-------+--------------+--------------+---------------+-----------+-------+
[    8.409562] cloud-init[1462]: ci-info: |   0   |   0.0.0.0    | 172.16.135.1 |    0.0.0.0    |    eth1   |   UG  |
[    8.411223] cloud-init[1462]: ci-info: |   1   |   0.0.0.0    | 172.16.69.1  |    0.0.0.0    |    eth0   |   UG  |
[    8.412785] cloud-init[1462]: ci-info: |   2   | 172.16.69.0  |   0.0.0.0    | 255.255.255.0 |    eth0   |   U   |
[    8.414532] cloud-init[1462]: ci-info: |   3   | 172.16.135.0 |   0.0.0.0    | 255.255.255.0 |    eth1   |   U   |
[    8.416078] cloud-init[1462]: ci-info: +-------+--------------+--------------+---------------+-----------+-------+


However, when booting RHEL Server as the OS, the VM ends up with a single default gateway and has no problems being accessed via the floating IP.  See the corresponding output from cloud-init:


[  OK  ] Started Initial cloud-init job (pre-networking).
[  OK  ] Started Network Manager Wait Online.
         Starting LSB: Bring up/down networking...

Red Hat Enterprise Linux Server 7.2 (Maipo)
Kernel 3.10.0-326.el7.x86_64 on an x86_64

micah-qeos-test-ansible-rhel7 login: [   12.660738] cloud-init[813]: Cloud-init v. 0.7.6 running 'init' at Thu, 29 Oct 2015 15:21:08 +0000. Up 12.61 seconds.
[   12.730332] cloud-init[813]: ci-info: ++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++
[   12.730792] cloud-init[813]: ci-info: +--------+------+-------------+---------------+-------------------+
[   12.731917] cloud-init[813]: ci-info: | Device |  Up  |   Address   |      Mask     |     Hw-Address    |
[   12.732760] cloud-init[813]: ci-info: +--------+------+-------------+---------------+-------------------+
[   12.733529] cloud-init[813]: ci-info: |  lo:   | True |  127.0.0.1  |   255.0.0.0   |         .         |
[   12.734515] cloud-init[813]: ci-info: | eth1:  | True |      .      |       .       | fa:16:3e:f0:ac:f5 |
[   12.735285] cloud-init[813]: ci-info: | eth0:  | True | 172.16.69.4 | 255.255.255.0 | fa:16:3e:bd:ba:c2 |
[   12.736030] cloud-init[813]: ci-info: +--------+------+-------------+---------------+-------------------+
[   12.736958] cloud-init[813]: ci-info: +++++++++++++++++++++++++++++++Route info++++++++++++++++++++++++++++++++
[   12.737644] cloud-init[813]: ci-info: +-------+-------------+-------------+---------------+-----------+-------+
[   12.738531] cloud-init[813]: ci-info: | Route | Destination |   Gateway   |    Genmask    | Interface | Flags |
[   12.753759] cloud-init[813]: ci-info: +-------+-------------+-------------+---------------+-----------+-------+
[   12.754338] cloud-init[813]: ci-info: |   0   |   0.0.0.0   | 172.16.69.1 |    0.0.0.0    |    eth0   |   UG  |
[   12.755139] cloud-init[813]: ci-info: |   1   | 172.16.69.0 |   0.0.0.0   | 255.255.255.0 |    eth0   |   U   |
[   12.755884] cloud-init[813]: ci-info: +-------+-------------+-------------+---------------+-----------+-------+
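
As a rough illustration, the difference between the two boots can be checked mechanically by counting the default routes in the ci-info route table. The sketch below is not part of cloud-init; the function name `default_gateways` and the constant `SAMPLE_LOG` are hypothetical, and the parser assumes the six-column table layout shown in the logs above.

```python
# Sample rows copied from the Atomic Host console log above.
SAMPLE_LOG = """\
[    8.404929] cloud-init[1462]: ci-info: | Route | Destination  |   Gateway    |    Genmask    | Interface | Flags |
[    8.407463] cloud-init[1462]: ci-info: +-------+--------------+--------------+---------------+-----------+-------+
[    8.409562] cloud-init[1462]: ci-info: |   0   |   0.0.0.0    | 172.16.135.1 |    0.0.0.0    |    eth1   |   UG  |
[    8.411223] cloud-init[1462]: ci-info: |   1   |   0.0.0.0    | 172.16.69.1  |    0.0.0.0    |    eth0   |   UG  |
[    8.412785] cloud-init[1462]: ci-info: |   2   | 172.16.69.0  |   0.0.0.0    | 255.255.255.0 |    eth0   |   U   |
[    8.414532] cloud-init[1462]: ci-info: |   3   | 172.16.135.0 |   0.0.0.0    | 255.255.255.0 |    eth1   |   U   |
"""


def default_gateways(console_log):
    """Return (gateway, interface) pairs for each default route in a ci-info route table.

    Assumes the six-column layout (Route, Destination, Gateway, Genmask,
    Interface, Flags) printed by the cloud-init version in this report.
    """
    gateways = []
    for line in console_log.splitlines():
        if "ci-info:" not in line:
            continue
        # Keep only the table part of the line and split it into cells.
        table_part = line.split("ci-info:", 1)[1]
        cells = [cell.strip() for cell in table_part.split("|")]
        cells = [cell for cell in cells if cell]
        # Route rows have six cells and start with a numeric index;
        # this skips borders, headers, and the net device table.
        if len(cells) != 6 or not cells[0].isdigit():
            continue
        _, destination, gateway, genmask, interface, flags = cells
        # A default route has destination and mask 0.0.0.0 and the G (gateway) flag.
        if destination == "0.0.0.0" and genmask == "0.0.0.0" and "G" in flags:
            gateways.append((gateway, interface))
    return gateways
```

Calling `default_gateways(SAMPLE_LOG)` on the Atomic Host rows yields two entries (one per NIC), whereas the same call on the RHEL Server route table would yield one.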



I discussed this with Colin Walters and he suspected an issue between NetworkManager and cloud-init:

<walters> i suspect this a network-manager-vs-initscripts issue

<walters> if anything could be improved here it'd be some intersection between NetworkManager and cloud-init


It is possible to work around this problem by specifying a single network to be used when booting the VM.

Comment 2 Lars Kellogg-Stedman 2015-10-29 23:07:36 UTC
I agree that the difference is probably NetworkManager vs. legacy init scripts.  It looks as if NetworkManager is configured such that it will attempt to activate any available interfaces using DHCP.

There are a couple of ways of looking at this issue.

It is not unreasonable that a system -- especially one optimized for cloud environments where all networks are attached explicitly -- will try to activate any available networks automatically.  One way of addressing this problem is ensuring that you do not set a default route on subnets that will not be used for outbound access:

  neutron subnet-create --no-gateway ...

As long as only one of the networks has a gateway defined, there won't be a problem.

In the case that you do need to attach two networks that both define default routes, there are a few options.  In my test environment, this does not appear to cause a problem; I end up with a routing table that looks like:

  default via 10.0.0.1 dev eth0  proto static  metric 100 
  default via 10.1.0.1 dev eth1  proto static  metric 101 
  10.0.0.0/24 dev eth0  proto kernel  scope link  src 10.0.0.59  metric 100 
  10.1.0.0/24 dev eth1  proto kernel  scope link  src 10.1.0.12  metric 100 
  172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.42.1 

Note that the two default routes have different metrics, so the first will be used preferentially, and since this is the network associated with the instance's floating ip, everything works as expected.  So, one option is simply to ensure that the first network attached to the instance is the one with a connection to a floating-ip network.
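
To make the metric comparison concrete, here is a minimal sketch that picks the preferred default route from `ip route` style output. The function name `preferred_default_route` and the constant `SAMPLE_ROUTES` are hypothetical, and the parser assumes every token after `default` comes in keyword/value pairs, which holds for the lines above but not for routes carrying bare flags such as `onlink`. A route with no `metric` keyword is treated as metric 0.

```python
# Routing table copied from the test environment described above.
SAMPLE_ROUTES = """\
default via 10.0.0.1 dev eth0  proto static  metric 100
default via 10.1.0.1 dev eth1  proto static  metric 101
10.0.0.0/24 dev eth0  proto kernel  scope link  src 10.0.0.59  metric 100
10.1.0.0/24 dev eth1  proto kernel  scope link  src 10.1.0.12  metric 100
172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.42.1
"""


def preferred_default_route(ip_route_output):
    """Return the default route with the lowest metric, or None if there is none."""
    defaults = []
    for line in ip_route_output.splitlines():
        fields = line.split()
        if not fields or fields[0] != "default":
            continue
        # Pair up "keyword value" tokens that follow the word "default".
        attrs = dict(zip(fields[1::2], fields[2::2]))
        defaults.append({
            "via": attrs.get("via"),
            "dev": attrs.get("dev"),
            # Assumption: a missing metric means 0, the highest priority.
            "metric": int(attrs.get("metric", 0)),
        })
    if not defaults:
        return None
    # The lowest metric wins among otherwise equal default routes.
    return min(defaults, key=lambda route: route["metric"])
```

With `SAMPLE_ROUTES`, the function selects the route via 10.0.0.1 on eth0, which matches the observation that the first attached network carries the outbound traffic.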

I am running a recent download of the RHEL Atomic Host 7.1 image, which suggests you should be seeing similar behavior.

Alternate solutions include fixing the network configuration via a cloud-init script, which may require providing configuration metadata via the config-drive option rather than over the network (if the routing table is preventing access to the metadata service). I do not recommend this option; I think either of the above options is preferable:

- Do not define default gateways on networks where they are not required, or
- Always make sure the first network attached is the one that will be used by connections to the floating-ip.

Please let me know if this helps out.