>> Description of problem:
RHEL OSP requires a minimum of 2 NICs: "..., One NIC for the Provisioning network on the native VLAN and the other NIC for tagged VLANs that use subnets for the different Overcloud network types."

Most customers do not want to use single NICs; they want to use bonds to secure the production environment. So the minimum is 3 NICs when the customer uses 1 bond:
- NIC1 for Provisioning network
- NIC2 and NIC3 for tagged VLANs

For certain types of hardware (chassis) it is very expensive to add a third NIC, or a customer who has 4 NICs wants to use 2 bonds to split management data and customer data.

>> RFE
Multiple customers asked to use only 1 bond to deploy OpenStack.

>> Gap analysis for RHEL OSP-d:
The scenario could be:
1. PXE boot the server from the first NIC.
2. Install the OS via PXE.
3. os-net-config reconfigures the networking to move the NIC that was used to PXE boot into a bond.
4. Install the services on top of the OS.

For an LACP bond to work in this scenario, the switch has to support falling back to single link(s) if no LACP is negotiated. That allows PXE boot to happen over one link, and the bond can be established later. It also might be possible to do a software bond that doesn't require switch support, using a mode of OVS bonding that plays some tricks with ARP to support load balancing across two independent links. I hope to test this beginning today, but again there are no guarantees.

>> Network hardware constraints
For one customer, GPS used an Arista-specific feature, LACP fallback, to use only one bond in a RHEL OSP 6 based deployment:
https://eos.arista.com/configuring-port-channel-lacp-fallback-on-arista-switches/

"The LACP Fallback mode in Arista switches allows an active LACP interface to establish a port-channel (LAG) before it receives LACP PDUs from its peer. This feature is useful in environments where customers have Preboot Execution Environment (PXE) Servers connected with a LACP Port Channel to the switch."
The solution has to work not only with Arista, so I tried to prepare Cisco and Juniper setups that behave like the Arista feature. Here is a possible configuration, tested on various Cisco equipment (from the old 2950 to the 4500-X) and on Juniper.

The main idea is to set LACP passive mode on the Cisco side, so link aggregation occurs only if negotiation is successful. We also set an ip helper option on the first interface, which is used as the fallback:

    interface GigabitEthernet1/1
     no ip address
     switchport
     channel-group 1 mode passive
     ip helper-address 10.10.10.8

    interface GigabitEthernet2/1
     no ip address
     switchport
     channel-group 1 mode passive

Doing the same test on a Juniper MX960 (JUNOS 11.4R7.5):

    aggregated-ether-options {
        lacp {
            passive;
        }
    }
    helpers {
        bootp {
            server 10.10.10.8;
            interface {
                vlan.20;
            }
        }
    }
Requesting feedback from Dan Sneddon, as he's the OSPd networking expert.
I just tested it with active-backup as the bonding mode and it works fine.

My understanding is that if the switches support something like the Arista LACP fallback this should work fine.

This is the template I tested with for the controller NICs:

    resources:
      OsNetConfigImpl:
        type: OS::Heat::StructuredConfig
        properties:
          group: os-apply-config
          config:
            os_net_config:
              network_config:
                - type: ovs_bridge
                  name: br-provisioning
                  use_dhcp: false
                  dns_servers: {get_param: DnsServers}
                  addresses:
                    - ip_netmask:
                        list_join:
                          - '/'
                          - - {get_param: ControlPlaneIp}
                            - {get_param: ControlPlaneSubnetCidr}
                  routes:
                    - ip_netmask: 169.254.169.254/32
                      next_hop: {get_param: EC2MetadataIp}
                  members:
                    - type: ovs_bond
                      name: bond-prov
                      ovs_options: bond_mode=active-backup
                      members:
                        - type: interface
                          name: nic1
                          primary: true
                        - type: interface
                          name: nic2
                    - type: vlan
                      vlan_id: {get_param: ExternalNetworkVlanID}
                      addresses:
                        - ip_netmask: {get_param: ExternalIpSubnet}
                      routes:
                        - defroute: true
                          next_hop: {get_param: ExternalInterfaceDefaultRoute}
                    - type: vlan
                      vlan_id: {get_param: InternalApiNetworkVlanID}
                      addresses:
                        - ip_netmask: {get_param: InternalApiIpSubnet}
                    - type: vlan
                      vlan_id: {get_param: StorageNetworkVlanID}
                      addresses:
                        - ip_netmask: {get_param: StorageIpSubnet}
                    - type: vlan
                      vlan_id: {get_param: TenantNetworkVlanID}
                      addresses:
                        - ip_netmask: {get_param: TenantIpSubnet}
                - type: interface
                  name: nic3
                  use_dhcp: false
                  defroute: false

As you can see, all the services are in the same bond, which has nic1 as one of its two bond slaves. This test worked out of the box.
(In reply to Ramon Acedo from comment #7)
> I just tested it with active-backup as the bonding mode and it works fine.
>
> My understanding is that if the switches support something like the Arista
> LACP fallback this should work fine.

Ramon is correct. Active/backup links are confirmed to be working. LACP links with LACP fallback should also work fine, and we have reports that they do, but we haven't done that testing in house yet.

Note that LACP fallback is only supported on a limited number of switch makes/models, and the configuration can be a little tricky. Testing should be done on site to ensure that the configuration allows for introspection and deployment PXE boot.
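For anyone who wants to try the LACP variant, here is a rough sketch of how the bond entry from comment #7 might change. It is untested here and only an assumption about a workable setting: the bond name and NIC names are copied from the template in comment #7, while the `ovs_options` string uses the Open vSwitch bond settings `bond_mode=balance-tcp`, `lacp=active` and `other-config:lacp-fallback-ab=true`, none of which come from this thread. The switch ports still need to be configured for LACP with a fallback mechanism.

```yaml
# Sketch only: the ovs_bond member from comment #7, switched from
# active-backup to LACP. Everything else in the template stays unchanged.
- type: ovs_bond
  name: bond-prov
  # Assumption: lacp-fallback-ab asks OVS to fall back to active-backup
  # while no LACP partner is detected on the links.
  ovs_options: "bond_mode=balance-tcp lacp=active other-config:lacp-fallback-ab=true"
  members:
    - type: interface
      name: nic1
      primary: true
    - type: interface
      name: nic2
```

As with the active-backup case, introspection and deployment PXE boot should be verified on site against the actual switch model.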
Dan, based on the information you wrote above, is there something else left to be implemented or can we move it to QA only? Thanks
(In reply to Jaromir Coufal from comment #9)
> Dan, based on the information you wrote above, is there something else left
> to be implemented or can we move it to QA only? Thanks

This works. It is fully implemented. We have customers using this configuration in production. If you want to send it to QE, go ahead. It will pass, and if they are having trouble then I can help them.

Part of the problem may be that this bug applies when using this configuration (and the associated workarounds are required):
https://bugzilla.redhat.com/show_bug.cgi?id=1234601