Bug 1316730 - os-net-config fails to bring up VLANs on a Linux Bond without a bridge present
Summary: os-net-config fails to bring up VLANs on a Linux Bond without a bridge present
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: os-net-config
Version: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ga
Target Release: 8.0 (Liberty)
Assignee: Dan Sneddon
QA Contact: Ofer Blaut
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-03-10 22:46 UTC by Dan Sneddon
Modified: 2016-04-07 21:49 UTC (History)

Fixed In Version: os-net-config-0.2.2-1.el7ost
Doc Type: Bug Fix
Doc Text:
In previous releases, when VLAN interfaces were placed directly on a Linux kernel bond with no bridge, it was possible for the VLANs to start before the bond. When this occurred, the VLANs failed to start. With this release, the os-net-config utility now starts the physical network (namely, bridges first, then bonds and interfaces) before VLANs. This ensures that the VLANs have the interfaces necessary to start properly.
Clone Of:
Environment:
Last Closed: 2016-04-07 21:49:19 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 291420 | 0 | None | MERGED | Fix order-of-operations bug in os-net-config restart_interfaces | 2020-09-25 11:15:18 UTC
Red Hat Product Errata | RHEA-2016:0604 | 0 | normal | SHIPPED_LIVE | Red Hat OpenStack Platform 8 director Enhancement Advisory | 2016-04-08 01:03:56 UTC

Description Dan Sneddon 2016-03-10 22:46:56 UTC
Description of problem:
os-net-config ensures that bridges (and their member interfaces) are brought up before VLANs are enabled. If there is no bridge, the ordering can be incorrect, and os-net-config might try to bring up a VLAN before the bond that the VLAN sits on is up.
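
For illustration only, the intended ordering can be sketched as a stable sort by device type. This is a hypothetical sketch, not the os-net-config source or the patch under review: the START_RANK table, the start_order function, and the "vlan<ID>" labels are names assumed for this example.

```python
# Hypothetical sketch of the intended start order; NOT the os-net-config code.
# Lower rank starts earlier: bridges, then bonds and plain interfaces, then VLANs.
START_RANK = {
    "ovs_bridge": 0,
    "linux_bridge": 0,
    "linux_bond": 1,
    "interface": 1,
    "vlan": 2,
}


def start_order(network_config):
    """Return device labels in an order where VLANs always come last."""
    def label(obj):
        # VLAN entries in the config carry no name; "vlan<ID>" is an
        # assumed label used only for this example.
        if obj["type"] == "vlan":
            return "vlan%d" % obj["vlan_id"]
        return obj["name"]

    # sorted() is stable, so objects of equal rank keep their config order.
    ordered = sorted(network_config, key=lambda obj: START_RANK[obj["type"]])
    return [label(obj) for obj in ordered]


config = [
    {"type": "vlan", "device": "bond1", "vlan_id": 10},
    {"type": "interface", "name": "nic1"},
    {"type": "linux_bond", "name": "bond1"},
    {"type": "vlan", "device": "bond1", "vlan_id": 20},
]
print(start_order(config))  # ['nic1', 'bond1', 'vlan10', 'vlan20']
```

Even though a VLAN is listed first in this sample config, it is started only after bond1, which is the behavior the bug report asks for when no bridge is present.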

Version-Release number of selected component (if applicable):
os-net-config-0.1.6-1.el7ost

How reproducible:
Often, but less than 100%

Steps to Reproduce:
1. Create NIC configs with a Linux bond with member VLANs but no bridge
2. Deploy

Actual results:
One or more of the VLANs may be brought up before the bond. When that happens the VLAN can fail to be enabled, and the operator must manually enable it after deployment.

Expected results:
The VLANs should be enabled by os-net-config

Additional info:
I have a patch for this up for review here: https://review.openstack.org/291420

Comment 7 Dan Sneddon 2016-04-06 20:56:50 UTC
Here is a controller.yaml that would trigger the bug:

### Begin controller.yaml #######
heat_template_version: 2015-04-30

description: >
  Software Config to drive os-net-config with 2 bonded nics on a Linux bond
  (no bridge) with VLANs attached for the controller role.

parameters:
  ControlPlaneIp:
    default: ''
    description: IP address/subnet on the ctlplane network
    type: string
  ExternalIpSubnet:
    default: ''
    description: IP address/subnet on the external network
    type: string
  InternalApiIpSubnet:
    default: ''
    description: IP address/subnet on the internal API network
    type: string
  StorageIpSubnet:
    default: ''
    description: IP address/subnet on the storage network
    type: string
  StorageMgmtIpSubnet:
    default: ''
    description: IP address/subnet on the storage mgmt network
    type: string
  TenantIpSubnet:
    default: ''
    description: IP address/subnet on the tenant network
    type: string
  ManagementIpSubnet: # Only populated when including environments/network-management.yaml
    default: ''
    description: IP address/subnet on the management network
    type: string
  BondInterfaceOvsOptions:
    default: 'bond_mode=active-backup'
    description: The ovs_options string for the bond interface. Set things like
                 lacp=active and/or bond_mode=balance-slb using this option.
    type: string
  ExternalNetworkVlanID:
    default: 10
    description: Vlan ID for the external network traffic.
    type: number
  InternalApiNetworkVlanID:
    default: 20
    description: Vlan ID for the internal_api network traffic.
    type: number
  StorageNetworkVlanID:
    default: 30
    description: Vlan ID for the storage network traffic.
    type: number
  StorageMgmtNetworkVlanID:
    default: 40
    description: Vlan ID for the storage mgmt network traffic.
    type: number
  TenantNetworkVlanID:
    default: 50
    description: Vlan ID for the tenant network traffic.
    type: number
  ManagementNetworkVlanID:
    default: 60
    description: Vlan ID for the management network traffic.
    type: number
  ExternalInterfaceDefaultRoute:
    default: '10.0.0.1'
    description: default route for the external network
    type: string
  ControlPlaneSubnetCidr: # Override this via parameter_defaults
    default: '24'
    description: The subnet CIDR of the control plane network.
    type: string
  DnsServers: # Override this via parameter_defaults
    default: []
    description: A list of DNS servers (2 max for some implementations) that will be added to resolv.conf.
    type: comma_delimited_list
  EC2MetadataIp: # Override this via parameter_defaults
    description: The IP address of the EC2 metadata server.
    type: string

resources:
  OsNetConfigImpl:
    type: OS::Heat::StructuredConfig
    properties:
      group: os-apply-config
      config:
        os_net_config:
          network_config:
            -
              type: interface
              name: nic1
              use_dhcp: false
              addresses:
                -
                  ip_netmask:
                    list_join:
                      - '/'
                      - - {get_param: ControlPlaneIp}
                        - {get_param: ControlPlaneSubnetCidr}
              routes:
                -
                  ip_netmask: 169.254.169.254/32
                  next_hop: {get_param: EC2MetadataIp}
            -
              type: linux_bond
              name: bond1
              bonding_options: {get_param: BondInterfaceOvsOptions}
              dns_servers: {get_param: DnsServers}
              members:
                -
                  type: interface
                  name: nic2
                  primary: true
                -
                  type: interface
                  name: nic3
            -
              type: vlan
              device: bond1
              vlan_id: {get_param: ExternalNetworkVlanID}
              addresses:
                -
                  ip_netmask: {get_param: ExternalIpSubnet}
              routes:
                -
                  default: true
                  next_hop: {get_param: ExternalInterfaceDefaultRoute}
            -
              type: vlan
              device: bond1
              vlan_id: {get_param: InternalApiNetworkVlanID}
              addresses:
                -
                  ip_netmask: {get_param: InternalApiIpSubnet}
            -
              type: vlan
              device: bond1
              vlan_id: {get_param: StorageNetworkVlanID}
              addresses:
                -
                  ip_netmask: {get_param: StorageIpSubnet}
            -
              type: vlan
              device: bond1
              vlan_id: {get_param: StorageMgmtNetworkVlanID}
              addresses:
                -
                  ip_netmask: {get_param: StorageMgmtIpSubnet}
            -
              type: vlan
              device: bond1
              vlan_id: {get_param: TenantNetworkVlanID}
              addresses:
                -
                  ip_netmask: {get_param: TenantIpSubnet}
            # Uncomment when including environments/network-management.yaml
            #-
            #  type: vlan
            #  device: bond1
            #  vlan_id: {get_param: ManagementNetworkVlanID}
            #  addresses:
            #    -
            #      ip_netmask: {get_param: ManagementIpSubnet}

outputs:
  OS::stack_id:
    description: The OsNetConfigImpl resource.
    value: {get_resource: OsNetConfigImpl}
##### End controller.yaml #####

Also, here is a config.yaml that can be fed to os-net-config to test. It is a YAML translation of the /etc/os-net-config/config.json from a Controller node with this configuration. When the bug was triggered, one of the VLANs would be brought up before the bond and its interfaces. To test, copy this file to an overcloud node, run "sudo os-net-config --noop --debug -c config.yaml", and look at the order in which the interfaces would be brought up (it's just a dry run, nothing will be changed). If the bond and the bond slave interfaces are brought up before any of the VLANs, then the bug is fixed. Note that this must be tested on a controller with at least 3 NICs, otherwise the nic1, nic2, nic3 abstractions will fail.

#### Begin config.yaml #####
network_config: 
  - 
    routes: 
      - 
        ip_netmask: "169.254.169.254/32"
        next_hop: "192.0.2.1"
    use_dhcp: false
    type: "interface"
    name: "nic1"
    addresses: 
      - 
        ip_netmask: "192.0.2.10/24"
  - 
    type: "linux_bond"
    name: "bond1"
    members: 
      - 
        type: "interface"
        name: "nic2"
        primary: true
      - 
        type: "interface"
        name: "nic3"
    bonding_options: "mode=active-backup"
  - 
    device: "bond1"
    routes: 
      - 
        ip_netmask: "0.0.0.0/0"
        next_hop: "10.0.0.1"
    type: "vlan"
    addresses: 
      - 
        ip_netmask: "10.0.0.5/24"
    vlan_id: 10
  - 
    device: "bond1"
    type: "vlan"
    addresses: 
      - 
        ip_netmask: "172.16.2.7/24"
    vlan_id: 20
  - 
    device: "bond1"
    type: "vlan"
    addresses: 
      - 
        ip_netmask: "172.16.1.7/24"
    vlan_id: 30
  - 
    device: "bond1"
    type: "vlan"
    addresses: 
      - 
        ip_netmask: "172.16.3.5/24"
    vlan_id: 40
  - 
    device: "bond1"
    type: "vlan"
    addresses: 
      - 
        ip_netmask: "172.16.0.4/24"
    vlan_id: 50

#### End config.yaml #####
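
When checking the dry-run order by eye, a small helper along these lines might be useful. It is a rough sketch under stated assumptions: it does not parse os-net-config output (the debug format is not reproduced here), so observed_order is a list of device names copied by hand from the dry run and mapped back to the names used in the config. The function name started_too_early and the "vlan<ID>" labels are hypothetical.

```python
def started_too_early(network_config, observed_order):
    """Return VLAN labels that appear before their parent device or its members.

    observed_order is a hand-copied list of device names in the order the
    dry run reported them (an assumption; nothing here parses real output).
    """
    position = {name: i for i, name in enumerate(observed_order)}
    # Map each non-VLAN device to the names of its member interfaces.
    members = {
        obj["name"]: [m["name"] for m in obj.get("members", [])]
        for obj in network_config
        if obj["type"] != "vlan"
    }
    too_early = []
    for obj in network_config:
        if obj["type"] != "vlan":
            continue
        vlan = "vlan%d" % obj["vlan_id"]  # assumed label for this example
        if vlan not in position:
            continue  # VLAN not listed in the observed order; nothing to check
        parents = [obj["device"]] + members.get(obj["device"], [])
        if any(position[vlan] < position[p] for p in parents if p in position):
            too_early.append(vlan)
    return too_early


config = [
    {"type": "linux_bond", "name": "bond1",
     "members": [{"type": "interface", "name": "nic2"},
                 {"type": "interface", "name": "nic3"}]},
    {"type": "vlan", "device": "bond1", "vlan_id": 10},
    {"type": "vlan", "device": "bond1", "vlan_id": 20},
]
# Buggy ordering: vlan20 is listed before the bond and its members.
print(started_too_early(config, ["vlan20", "nic2", "nic3", "bond1", "vlan10"]))
# Fixed ordering: the bond and its members precede every VLAN.
print(started_too_early(config, ["nic2", "nic3", "bond1", "vlan10", "vlan20"]))
```

An empty result corresponds to the "bug is fixed" condition described above; any VLAN returned was listed ahead of bond1 or one of its member interfaces.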

Comment 9 errata-xmlrpc 2016-04-07 21:49:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0604.html

