Description of problem:
os-net-config will ensure that bridges (and the member interfaces) are brought up before the VLANs are enabled. If there is no bridge, then the ordering can be incorrect, and os-net-config might try to bring up a member VLAN before the bond that the VLAN is on is up.

Version-Release number of selected component (if applicable):
os-net-config-0.1.6-1.el7ost

How reproducible:
Often, but less than 100%

Steps to Reproduce:
1. Create NIC configs with a Linux bond with member VLANs but no bridge
2. Deploy

Actual results:
It is possible that one or more of the VLANs may be brought up before the bond. This can cause the VLAN to fail to be enabled, and the operator must manually enable the VLAN after deployment.

Expected results:
The VLANs should be enabled by os-net-config

Additional info:
I have a patch for this up for review here: https://review.openstack.org/291420
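To make the expected ordering concrete, here is a minimal Python sketch. It is not the code from the patch under review and not os-net-config's actual implementation. The function names are hypothetical, the input is assumed to be the parsed network_config list, and VLAN devices are assumed to be named vlan<ID>. It only illustrates the rule the report asks for: bond members first, then the bond, then the VLANs on top of it, regardless of the order in which the entries are listed.

```python
def vlan_name(entry):
    """Assumed naming convention for a VLAN entry: vlan<ID>."""
    return "vlan%d" % entry["vlan_id"]


def bring_up_order(network_config):
    """Return device names so that every parent precedes its children.

    Three phases: member interfaces, then top-level devices (plain
    interfaces and bonds), then VLANs.
    """
    members, parents, vlans = [], [], []
    for entry in network_config:
        if entry["type"] == "vlan":
            vlans.append(vlan_name(entry))
            continue
        # Bond members must be up before the bond itself.
        for member in entry.get("members", []):
            members.append(member["name"])
        parents.append(entry["name"])
    return members + parents + vlans


if __name__ == "__main__":
    # The VLAN is deliberately listed before its bond.
    config = [
        {"type": "vlan", "device": "bond1", "vlan_id": 10},
        {"type": "linux_bond", "name": "bond1",
         "members": [{"type": "interface", "name": "nic2"},
                     {"type": "interface", "name": "nic3"}]},
    ]
    print(bring_up_order(config))
```

A phase-based ordering like this is the simplest option for a flat config. Nested layouts (for example VLANs on a bond inside a bridge) would need a real dependency walk instead.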
Here is a controller.yaml that would trigger the bug:

### Begin controller.yaml #######
heat_template_version: 2015-04-30

description: >
  Software Config to drive os-net-config with 2 bonded nics on a bridge
  with VLANs attached for the controller role.

parameters:
  ControlPlaneIp:
    default: ''
    description: IP address/subnet on the ctlplane network
    type: string
  ExternalIpSubnet:
    default: ''
    description: IP address/subnet on the external network
    type: string
  InternalApiIpSubnet:
    default: ''
    description: IP address/subnet on the internal API network
    type: string
  StorageIpSubnet:
    default: ''
    description: IP address/subnet on the storage network
    type: string
  StorageMgmtIpSubnet:
    default: ''
    description: IP address/subnet on the storage mgmt network
    type: string
  TenantIpSubnet:
    default: ''
    description: IP address/subnet on the tenant network
    type: string
  ManagementIpSubnet: # Only populated when including environments/network-management.yaml
    default: ''
    description: IP address/subnet on the management network
    type: string
  BondInterfaceOvsOptions:
    default: 'bond_mode=active-backup'
    description: The ovs_options string for the bond interface. Set things like
                 lacp=active and/or bond_mode=balance-slb using this option.
    type: string
  ExternalNetworkVlanID:
    default: 10
    description: Vlan ID for the external network traffic.
    type: number
  InternalApiNetworkVlanID:
    default: 20
    description: Vlan ID for the internal_api network traffic.
    type: number
  StorageNetworkVlanID:
    default: 30
    description: Vlan ID for the storage network traffic.
    type: number
  StorageMgmtNetworkVlanID:
    default: 40
    description: Vlan ID for the storage mgmt network traffic.
    type: number
  TenantNetworkVlanID:
    default: 50
    description: Vlan ID for the tenant network traffic.
    type: number
  ManagementNetworkVlanID:
    default: 60
    description: Vlan ID for the management network traffic.
  ExternalInterfaceDefaultRoute:
    default: '10.0.0.1'
    description: default route for the external network
    type: string
  ControlPlaneSubnetCidr: # Override this via parameter_defaults
    default: '24'
    description: The subnet CIDR of the control plane network.
    type: string
  DnsServers: # Override this via parameter_defaults
    default: []
    description: A list of DNS servers (2 max for some implementations) that will be added to resolv.conf.
    type: comma_delimited_list
  EC2MetadataIp: # Override this via parameter_defaults
    description: The IP address of the EC2 metadata server.
    type: string

resources:
  OsNetConfigImpl:
    type: OS::Heat::StructuredConfig
    properties:
      group: os-apply-config
      config:
        os_net_config:
          network_config:
            - type: interface
              name: nic1
              use_dhcp: false
              addresses:
                - ip_netmask:
                    list_join:
                      - '/'
                      - - {get_param: ControlPlaneIp}
                        - {get_param: ControlPlaneSubnetCidr}
              routes:
                - ip_netmask: 169.254.169.254/32
                  next_hop: {get_param: EC2MetadataIp}
            - type: linux_bond
              name: bond1
              bonding_options: {get_param: BondInterfaceOvsOptions}
              dns_servers: {get_param: DnsServers}
              members:
                - type: interface
                  name: nic2
                  primary: true
                - type: interface
                  name: nic3
            - type: vlan
              device: bond1
              vlan_id: {get_param: ExternalNetworkVlanID}
              addresses:
                - ip_netmask: {get_param: ExternalIpSubnet}
              routes:
                - default: true
                  next_hop: {get_param: ExternalInterfaceDefaultRoute}
            - type: vlan
              device: bond1
              vlan_id: {get_param: InternalApiNetworkVlanID}
              addresses:
                - ip_netmask: {get_param: InternalApiIpSubnet}
            - type: vlan
              device: bond1
              vlan_id: {get_param: StorageNetworkVlanID}
              addresses:
                - ip_netmask: {get_param: StorageIpSubnet}
            - type: vlan
              device: bond1
              vlan_id: {get_param: StorageMgmtNetworkVlanID}
              addresses:
                - ip_netmask: {get_param: StorageMgmtIpSubnet}
            - type: vlan
              device: bond1
              vlan_id: {get_param: TenantNetworkVlanID}
              addresses:
                - ip_netmask: {get_param: TenantIpSubnet}
            # Uncomment when including environments/network-management.yaml
            #-
            #  type: vlan
            #  device: bond1
            #  vlan_id: {get_param: ManagementNetworkVlanID}
            #  addresses:
            #    -
            #      ip_netmask: {get_param: ManagementIpSubnet}

outputs:
  OS::stack_id:
    description: The OsNetConfigImpl resource.
    value: {get_resource: OsNetConfigImpl}
##### End controller.yaml #####

Also, here is a config.yaml that can be fed to os-net-config to test. This is a YAML translation of the /etc/os-net-config/config.json from a Controller node with this configuration. When the bug was triggered, one of the VLANs would be brought up before the bond and the interfaces.

This can be tested by copying this file to an overcloud node and running "sudo os-net-config --noop --debug -c config.yaml" and looking at the order in which the interfaces would be brought up (it's just a dry run, nothing will be changed). If the bond and the bond slave interfaces are brought up first, then the bug is fixed. Note that this must be tested on a controller with at least 3 NICs, otherwise the nic1, nic2, nic3 abstractions will fail.

#### Begin config.yaml #####
network_config:
  - routes:
      - ip_netmask: "169.254.169.254/32"
        next_hop: "192.0.2.1"
    use_dhcp: false
    type: "interface"
    name: "nic1"
    addresses:
      - ip_netmask: "192.0.2.10/24"
  - type: "linux_bond"
    name: "bond1"
    members:
      - type: "interface"
        name: "nic2"
        primary: true
      - type: "interface"
        name: "nic3"
    bonding_options: "mode=active-backup"
  - device: "bond1"
    routes:
      - ip_netmask: "0.0.0.0/0"
        next_hop: "10.0.0.1"
    type: "vlan"
    addresses:
      - ip_netmask: "10.0.0.5/24"
    vlan_id: 10
  - device: "bond1"
    type: "vlan"
    addresses:
      - ip_netmask: "172.16.2.7/24"
    vlan_id: 20
  - device: "bond1"
    type: "vlan"
    addresses:
      - ip_netmask: "172.16.1.7/24"
    vlan_id: 30
  - device: "bond1"
    type: "vlan"
    addresses:
      - ip_netmask: "172.16.3.5/24"
    vlan_id: 40
  - device: "bond1"
    type: "vlan"
    addresses:
      - ip_netmask: "172.16.0.4/24"
    vlan_id: 50
#### End config.yaml #####
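When reading the dry-run output by eye, it can help to check the observed order mechanically. The Python sketch below makes no assumption about the format of the os-net-config debug log: the bring-up order has to be transcribed by hand into a list of device names. The function name is hypothetical, and VLAN devices are assumed to be named vlan<ID>. Given that list and the parsed network_config, it reports every VLAN that would come up before its parent device.

```python
def vlans_before_parent(order, network_config):
    """Return the VLAN names that appear in `order` before their parent.

    `order` is a list of device names in the sequence they would be
    brought up, transcribed manually from the dry-run output.
    """
    position = {name: index for index, name in enumerate(order)}
    early = []
    for entry in network_config:
        if entry.get("type") != "vlan":
            continue
        name = "vlan%d" % entry["vlan_id"]  # assumed naming convention
        parent = entry["device"]
        if name in position and parent in position:
            if position[name] < position[parent]:
                early.append(name)
    return early


if __name__ == "__main__":
    config = [
        {"type": "linux_bond", "name": "bond1"},
        {"type": "vlan", "device": "bond1", "vlan_id": 10},
        {"type": "vlan", "device": "bond1", "vlan_id": 20},
    ]
    # Hypothetical observed order showing the bug: vlan10 precedes bond1.
    observed = ["nic1", "vlan10", "nic2", "nic3", "bond1", "vlan20"]
    print(vlans_before_parent(observed, config))
```

An empty result means that every VLAN in the list comes after its bond, which is the fixed behavior described above.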
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2016-0604.html