Bug 1290568 - local_interface=bond0.120 in undercloud.conf create broken network configuration
Summary: local_interface=bond0.120 in undercloud.conf create broken network configuration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: os-net-config
Version: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: unspecified
Target Milestone: ga
Target Release: 8.0 (Liberty)
Assignee: RHOS Maint
QA Contact: Ofer Blaut
URL:
Whiteboard:
Depends On: 1283812
Blocks: 1261979
Reported: 2015-12-10 20:06 UTC by Gonéri Le Bouder
Modified: 2016-04-07 21:43 UTC (History)

Fixed In Version: os-net-config-0.1.6-1.el7ost
Doc Type: Bug Fix
Doc Text:
Clone Of: 1283812
Environment:
Last Closed: 2016-04-07 21:43:39 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 248246 | 0 | None | None | None | 2016-02-25 18:50:28 UTC
Red Hat Product Errata | RHEA-2016:0604 | 0 | normal | SHIPPED_LIVE | Red Hat OpenStack Platform 8 director Enhancement Advisory | 2016-04-08 01:03:56 UTC

Description Gonéri Le Bouder 2015-12-10 20:06:22 UTC
+++ This bug was initially created as a clone of Bug #1283812 +++

Description of problem:

My bond0.120 interface has a VLAN=120 line in the /etc/sysconfig/network-scripts/ifcfg-bond0.120 file. I can ifup/down it manually.

If I set local_interface=bond0.120 in my undercloud.conf, instack-undercloud uses the config.json.template ( https://github.com/openstack/instack-undercloud/blob/master/elements/undercloud-stack-config/config.json.template ) to generate /etc/os-net-config/config.json as if bond0.120 were a standard interface.
os-net-config then runs to set up br-ctlplane. To do so, it regenerates the /etc/sysconfig/network-scripts/ifcfg-bond0.120 file, but omits the VLAN=120 line.

I'm not sure this can easily be fixed, but a notice or an error message would be really useful.
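
For illustration, the kind of notice asked for above could be sketched as follows. This is a minimal sketch, not code from instack-undercloud or os-net-config: the helper names parse_vlan_name and check_local_interface are hypothetical, and it assumes VLAN sub-interfaces follow the usual parent.vlan_id naming (as bond0.120 does).

```python
import re

# Hypothetical helpers for illustration only; they do not exist in
# instack-undercloud or os-net-config.
_VLAN_NAME = re.compile(r"^(?P<parent>[^.]+)\.(?P<vlan_id>\d+)$")


def parse_vlan_name(name):
    """Return (parent, vlan_id) if name looks like 'parent.vlan_id', else None."""
    match = _VLAN_NAME.match(name)
    if match is None:
        return None
    vlan_id = int(match.group("vlan_id"))
    # 802.1Q VLAN IDs 0 and 4095 are reserved, so only 1..4094 are usable.
    if not 1 <= vlan_id <= 4094:
        return None
    return match.group("parent"), vlan_id


def check_local_interface(name):
    """Return a warning string for VLAN-style names, or None otherwise."""
    parsed = parse_vlan_name(name)
    if parsed is None:
        return None
    parent, vlan_id = parsed
    return (
        "local_interface=%s looks like VLAN %d on %s; the regenerated "
        "ifcfg file may lose its VLAN setting" % (name, vlan_id, parent)
    )


if __name__ == "__main__":
    print(check_local_interface("bond0.120"))
```

A check like this only inspects the name, so it would not catch a VLAN interface that uses a different naming scheme.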


This is my undercloud.conf:
[DEFAULT]
local_ip = 192.168.120.1/24
local_interface=bond0.120
masquerade_network = 192.168.120.0/24
dhcp_start = 192.168.120.5
dhcp_end = 192.168.120.24
network_cidr = 192.168.120.0/24
network_gateway = 192.168.120.1
inspection_iprange = 192.168.120.100,192.168.120.120
[auth]

instack-undercloud will generate:
[root@fv2sah network-scripts]# cat /etc/os-net-config/config.json 
{"network_config": [{"type": "ovs_bridge", "ovs_extra": ["br-set-external-id br-ctlplane bridge-id br-ctlplane"], "name": "br-ctlplane", "members": [{"type": "interface", "name": "bond0.120", "primary": "true"}], "addresses": [{"ip_netmask": "192.168.120.1/24"}]}]}


The bridge member entry is wrong. bond0.120 is a vlan.
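
To make the problem easier to spot, here is a rough sketch that scans a config.json for bridge members declared as plain interfaces whose names look like VLAN sub-interfaces. The function name vlan_named_interface_members is hypothetical, and the dotted-name heuristic is an assumption; the sample text is the generated configuration shown above.

```python
import json

# The configuration generated by instack-undercloud, as shown in this report.
CONFIG = (
    '{"network_config": [{"type": "ovs_bridge", "ovs_extra": '
    '["br-set-external-id br-ctlplane bridge-id br-ctlplane"], '
    '"name": "br-ctlplane", "members": [{"type": "interface", '
    '"name": "bond0.120", "primary": "true"}], '
    '"addresses": [{"ip_netmask": "192.168.120.1/24"}]}]}'
)


def vlan_named_interface_members(config_text):
    """List OVS bridge members typed 'interface' whose name ends in '.<digits>'.

    Hypothetical helper for illustration; not part of os-net-config.
    """
    config = json.loads(config_text)
    suspects = []
    for entry in config.get("network_config", []):
        if entry.get("type") != "ovs_bridge":
            continue
        for member in entry.get("members", []):
            name = member.get("name", "")
            parent, dot, suffix = name.rpartition(".")
            if member.get("type") == "interface" and dot and parent and suffix.isdigit():
                suspects.append(name)
    return suspects


if __name__ == "__main__":
    print(vlan_named_interface_members(CONFIG))
```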


The regenerated /etc/sysconfig/network-scripts/ifcfg-bond0.120:
# This file is autogenerated by os-net-config
DEVICE=bond0.120
ONBOOT=yes
HOTPLUG=no
NM_CONTROLLED=no
DEVICETYPE=ovs
TYPE=OVSPort
OVS_BRIDGE=br-ctlplane
BOOTPROTO=none

As a result, OVS will generate a bogus br-ctlplane:

[root@fv2sah stack]# ovs-vsctl show
1532e349-c0e3-47b9-839e-d84470094823
    Bridge br-ctlplane
        Port "bond0.120"
            Interface "bond0.120"
                error: "could not open network device bond0.120 (No such device)"
        Port br-ctlplane
            Interface br-ctlplane
                type: internal
    ovs_version: "2.4.0"
[root@fv2sah stack]# ifup bond0.120
ERROR    : [/etc/sysconfig/network-scripts/ifup-eth] Device bond0.120 does not seem to be present, delaying initialization.
ovs-vsctl: Error detected while setting up 'bond0.120'.  See ovs-vswitchd log for details.


If I add the missing VLAN definition to the /etc/sysconfig/network-scripts/ifcfg-bond0.120 file:
VLAN=yes
I can now run ifup bond0.120, and OVS now sees the interface:

[root@fv2sah network-scripts]# ovs-vsctl show
1532e349-c0e3-47b9-839e-d84470094823
    Bridge br-ctlplane
        Port "bond0.120"
            Interface "bond0.120"
        Port br-ctlplane
            Interface br-ctlplane
                type: internal
    ovs_version: "2.4.0"
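
The manual workaround above amounts to emitting VLAN=yes whenever the port name looks like a VLAN sub-interface. A minimal sketch of that idea, assuming the parent.vlan_id naming convention, is below. It is not the upstream patch and does not use os-net-config's real API; render_ovs_port_ifcfg is a hypothetical function that only mirrors the ifcfg keys shown in this report.

```python
def render_ovs_port_ifcfg(name, bridge):
    """Render ifcfg text for an OVS port, adding VLAN=yes for dotted names.

    Hypothetical illustration; not how os-net-config is implemented.
    """
    lines = [
        "# Illustrative output only; not generated by os-net-config",
        "DEVICE=%s" % name,
        "ONBOOT=yes",
        "HOTPLUG=no",
        "NM_CONTROLLED=no",
    ]
    # Assumption: a name such as bond0.120 denotes a VLAN on bond0.
    parent, dot, suffix = name.rpartition(".")
    if dot and parent and suffix.isdigit():
        lines.append("VLAN=yes")
    lines += [
        "DEVICETYPE=ovs",
        "TYPE=OVSPort",
        "OVS_BRIDGE=%s" % bridge,
        "BOOTPROTO=none",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_ovs_port_ifcfg("bond0.120", "br-ctlplane"))
```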

--- Additional comment from Gonéri Le Bouder on 2015-11-19 21:13:14 EST ---

This is the correct os-net-config configuration:
{"network_config": [{"type": "ovs_bridge", "ovs_extra": ["br-set-external-id br-ctlplane bridge-id br-ctlplane"], "name": "br-ctlplane", "members": [{"type": "vlan", "name": "bond0", "vlan_id": 120}], "addresses": [{"ip_netmask": "192.168.120.1/24"}]}]}

--- Additional comment from Gonéri Le Bouder on 2015-11-20 13:58:42 EST ---

OK, it's more complicated than expected. The syntax above creates an internal OVS VLAN, which is not what we want. We just need VLAN=true in /etc/sysconfig/network-scripts/ifcfg-bond0.120, and that's something os-net-config cannot do (yet?).

--- Additional comment from Gonéri Le Bouder on 2015-11-20 16:01:53 EST ---

I've just submitted https://review.openstack.org/248246 upstream to fix the issue.

Comment 2 Gonéri Le Bouder 2016-01-18 16:28:17 UTC
Hello,

It's really common to see 802.1q VLANs in use, and this bug makes the situation really painful. Users can spend days trying to understand why they get disconnected in the middle of a bunch of scripts. The problem is very subtle and there is absolutely no error message. Worse, if you don't have console access to the server, there is no way to get the connection back.

Comment 3 arkady kanevsky 2016-01-18 18:01:09 UTC
Does this happen with OVS bonding and with Linux bridging?

Comment 4 Gonéri Le Bouder 2016-01-18 18:29:50 UTC
Hello Arkady,

Yes, if the VLAN is configured on a bond, the issue remains.

This is not exactly a blocker for JetStream because you deploy the undercloud in a VM plugged directly into the correct VLAN.

Comment 6 Hugh Brock 2016-02-05 12:23:34 UTC
Dan Prince, just assigned this to you. There is an upstream fix merged, can you just make sure it gets backported to stable so we can ship it? Thanks.

Comment 8 Gonéri Le Bouder 2016-03-21 22:09:49 UTC
I've just pushed a documentation update for this: https://review.openstack.org/295542

Comment 10 Ofer Blaut 2016-04-04 14:08:43 UTC
Tested code is included in os-net-config-0.2.2-1.el7ost.noarch

Comment 12 errata-xmlrpc 2016-04-07 21:43:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0604.html

