Description of problem:

Using OFI I am attempting to deploy neutron using separate NICs/networks for:
- management/APIs
- neutron tenant traffic
- neutron L3 traffic
- storage

I was able to do this using VLANs. With VLANs the tenant interface does not need an assigned IP. Next I attempted to use VxLAN. For this the tenant interface does need an IP. Before letting puppet run, the NIC was assigned a static IP. After puppet, on 4 of the 5 systems the bridge (br-tenant) that was created using the original NIC as a slave was set to use DHCP. Tenant traffic did not flow with this config. I redeployed again, this time manually changing the ifcfg-tenant files to have a static IP before testing, and did not see any issues.

I asked some engineers about this and here is some of the email exchange:

> >If I could ask for some ideas/assistance when you have time/cycles
> >related to deploying Neutron/vxlan using OFI. Looking at the puppet, I
> >believe the tunnel bridge device should come from the ip_lookup of
> >ovs_{network,iface}. I have network set to '' and iface was the NIC
>
> Steve, is this the config setting for local_ip for the ovs agent to
> use? If so, looking at the puppet in that area, we look it up with:
>
>     $local_ip = find_ip("$ovs_tunnel_network",
>                         ["$ovs_tunnel_iface","$external_network_bridge"],
>                         "")
>
> So, on the first run, it will resolve to ovs_tunnel_iface, and once
> the IP is moved, it should use external_network_bridge. This all
> relies on system information provided by facter, so it might be worth
> verifying on the problem nodes that 'facter -p' returns what you
> expect to see.

I am confused as to why this would fall back to the external_network_bridge. Since the OVS tunnel is for the tenant traffic, you would want to keep that private. My external bridge is br-ex. I understand some small/VM configs may use a single bridge for tenant and L3, but I wonder how realistic that is. I checked my facts and there is no address for br-ex on any system, including the one that was configured correctly. Also, I manually added the IP from the NIC used for tenant traffic to the bridge used for internal traffic and my tests passed.

spr

> >that had the IP before running puppet (eventually becomes the bridge slave),
> >however for my 5 nodes only one got the static IP and the 4 others
> >resorted to DHCP (which is not configured). I confirmed that if I
> >manually set them all statically, neutron worked as expected.
> >
> >On the node that used the existing IP, I did find one slight
> >difference in the puppet output as compared to the rest.
> >
> >Debug: Executing '/usr/bin/ovs-vsctl add-port br-tennant p3p1'
> >Debug: Executing '/usr/sbin/ip addr show p3p1'
> >Debug: Executing '/usr/sbin/ifdown br-tennant'
>
> The code for this is in
> /usr/share/openstack-puppet-modules/modules/vswitch/lib/puppet/provider/vs_bridge/ovs.rb
> and
> /usr/share/openstack-puppet-modules/modules/vswitch/lib/puppet/provider/vs_port/ovs*
> (there are 3 different impls based on your system, you likely have
> ovs_redhat.rb). Look at the create method in each. My guess is that
> the bridge doesn't get created, and thus the call to create the port
> never executes that addr show. Not sure if that helps at all.
>
> -j
>
> >On the other systems the "ip addr show" line is missing. I did a
> >recursive search under /usr/share for that string and didn't find it. I
> >am now grepping from / but it has been going for a while.
> >
> >spr

Version-Release number of selected component (if applicable):

[root@ospha-inst ml2]# yum list installed | grep -e foreman -e puppet
foreman.noarch                               1.6.0.49-6.el7ost        @RH7-RHOS-6.0-Installer
foreman-installer.noarch                     1:1.6.0-0.2.RC1.el7ost
foreman-postgresql.noarch                    1.6.0.49-6.el7ost        @RH7-RHOS-6.0-Installer
foreman-proxy.noarch                         1.6.0.30-5.el7ost        @RH7-RHOS-6.0-Installer
foreman-selinux.noarch                       1.6.0.14-1.el7sat        @RH7-RHOS-6.0-Installer
openstack-foreman-installer.noarch           3.0.13-1.el7ost          @RH7-RHOS-6.0-Installer
openstack-puppet-modules.noarch              2014.2.8-2.el7ost        @RH7-RHOS-6.0-Installer
puppet.noarch                                3.6.2-2.el7              @RH7-RHOS-6.0-Installer
puppet-server.noarch                         3.6.2-2.el7              @RH7-RHOS-6.0-Installer
ruby193-rubygem-foreman_openstack_simplify.noarch
rubygem-foreman_api.noarch                   0.1.11-6.el7sat          @RH7-RHOS-6.0-Installer
rubygem-hammer_cli_foreman.noarch            0.1.1-16.el7sat          @RH7-RHOS-6.0-Installer
rubygem-hammer_cli_foreman-doc.noarch
[root@ospha-inst ml2]#

How reproducible:
Each deployment I attempted so far.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
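As a rough sketch of the verification suggested in the description (the interface and bridge names p3p1, br-tenant, and br-ex are taken from this report and will differ per deployment), checking a problem node could look like:

  # Facts that find_ip() relies on; confirm the tenant NIC still reports
  # the expected static address and note what (if anything) br-ex reports.
  facter -p 2>/dev/null | grep -E '^(interfaces|ipaddress)'
  facter -p ipaddress_p3p1

  # Compare against what is actually configured on the wire.
  ip addr show p3p1
  ip addr show br-tenant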
I wanted to add to this bug that, in a VLAN tenant network based deployment, I have seen that not all of the controllers' br-ex interfaces were set up correctly. In my deployment I have 5 controllers, and 2 of the 5 were missing:

  OVSDHCPINTERFACES=<external network interface>
  OVS_EXTRA="set bridge br-ex other-config:hwaddr=<above interface's mac>"

e.g.

  OVSDHCPINTERFACES=eno1
  OVS_EXTRA="set bridge br-ex other-config:hwaddr=b8:ca:3a:61:42:d0"

Furthermore, the same settings (OVSDHCPINTERFACES and OVS_EXTRA) were missing from the br-eno2 interface definition (eno2 is the physical interface associated with tenant traffic in my deployment). Without the OVS options in /etc/sysconfig/network-scripts/ifcfg-br-eno2, the br-eno2 interface does not come up with an address. Even with the VLAN tenant network type, I was unable to ping guests once they were launched from a host on the tenant network VLAN (outside of OSP). Once the options were added and the controllers rebooted, I was able to reach the tenant network properly.

Note that my deployment is OSP5.
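For reference, a minimal sketch (values are illustrative, not taken from the deployment) of what a complete ifcfg pair for the tenant bridge could look like with these OVS options in place, assuming DHCP on the bridge and eno2 as the physical port:

  # /etc/sysconfig/network-scripts/ifcfg-br-eno2
  DEVICE=br-eno2
  DEVICETYPE=ovs
  TYPE=OVSBridge
  ONBOOT=yes
  BOOTPROTO=none
  OVSBOOTPROTO=dhcp
  OVSDHCPINTERFACES=eno2
  OVS_EXTRA="set bridge br-eno2 other-config:hwaddr=<eno2's mac>"

  # /etc/sysconfig/network-scripts/ifcfg-eno2
  DEVICE=eno2
  DEVICETYPE=ovs
  TYPE=OVSPort
  OVS_BRIDGE=br-eno2
  ONBOOT=yes
  BOOTPROTO=none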
This looks to me like something the puppet-vswitch provider should be handling. Gilles, you have worked in that area before; any thoughts, or am I off base here?
(In reply to Kambiz Aghaiepour from comment #5)
> Note that my deployment is OSP5.

The vswitch providers have changed between OSP5 and OSP6, and depending on the OSP5 version as well, fixing earlier bugs.

(In reply to Jason Guiditta from comment #6)
> This looks to me like something that puppet-vswitch provider should be
> handling.

I'm not sure such a scenario, as described in comment #0, is covered by OFI. The vswitch providers vs_bridge/vs_port, when defined, will create an OVS bridge and attach a port (interface) to it, making it persistent by writing the corresponding ifcfg files. This normally happens by default on a neutron network/L3 agent node. The rest is beyond the vswitch scope.
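As a rough sketch only, the manual equivalent of what the vs_bridge/vs_port providers do is roughly the sequence below (the commands mirror the debug output in comment #0; the bridge/port names are illustrative and the ifcfg files are normally generated by the provider, not written by hand):

  # Create the bridge and attach the physical interface to it.
  ovs-vsctl add-br br-tenant
  ovs-vsctl add-port br-tenant p3p1

  # The provider then inspects the port's current addressing ...
  ip addr show p3p1

  # ... and persists the result in ifcfg-br-tenant / ifcfg-p3p1
  # (the bridge inherits the port's IP config) before cycling the devices:
  ifdown br-tenant && ifup br-tenant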
We are seeing it multiple times on HA controller nodes on node reboot. Need to bump its priority to be fixed in A2.
From what I understand of the initial problem description, there is no issue here, unless other behaviour is expected from either puppet-vswitch or OFI, in which case I'd suggest creating an RFE accordingly.

The actual default behaviour of puppet-vswitch is:
- If the physical interface to be attached to the bridge exists but has no link (interface is down), the bridge is configured with DHCP, because there is no IP address to transfer over from the physical interface.
- If, on the contrary, the link is up, the existing physical interface's configuration is carried over to the bridge configuration, whether it's static or dynamic.
- In all cases, at the end of the process no IP address (neither static nor dynamic) remains on the physical interface.

This behaviour might be confusing, especially when no IP address is desired on the physical interface.

(In reply to arkady kanevsky from comment #8)
> We are seeing it multiple times on HA controller nodes on node reboot.
> Need to bump its priority to be fixed in A2.

Is it that, after reboot, a bridge interface ends up defined as DHCP but was expected to be static? If yes, then assign an IP to the physical interface beforehand. If no, could you please provide more information and describe in detail what you're seeing and what's expected?
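A hedged illustration of the "assign an IP to the physical interface beforehand" workaround; the interface name and address are placeholders, and the address should be the one intended for the tenant/tunnel network:

  # /etc/sysconfig/network-scripts/ifcfg-p3p1, before the puppet run
  DEVICE=p3p1
  ONBOOT=yes
  BOOTPROTO=none
  IPADDR=192.0.2.10
  NETMASK=255.255.255.0

  # Make sure the link is up so the provider can see and transfer the address:
  ifup p3p1
  ip addr show p3p1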
We are seeing the following: after the install completes, the interface's IP information is removed and the interface is not set to DHCP. We discovered this on reboot: the interface did not come up and connectivity was lost. We do not know when it happens.
Addressing comments 7 & 9. I have been using OFI in support of the Dell solution for several releases.

#7: Can you explain "I'm not sure such scenario, as described in comment #0, is covered by OFI", since I've been doing this for a while? If the concern is having separate NICs for tenant and L3, I can show it to you.

#9: The br-ex is not starting. That is a problem. This is the discussion Jay and I had on it. As Jay said in comment 0:

> So, on the first run, it will resolve to ovs_tunnel_iface, and once
> the IP is moved, it should use external_network_bridge.

I question why the tunnel interface would fall back to the external bridge; I would not expect tunnel traffic on the external bridge. As I said, my config keeps these on separate NICs, as we have been doing since OSP3.
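As an illustration of the manual workaround described in the report (moving the tenant NIC's address onto the tenant bridge so the agent's local_ip lookup can find it); the device names and the address are placeholders:

  # One-off, non-persistent version of what the provider should have done:
  ip addr del 192.0.2.10/24 dev p3p1
  ip addr add 192.0.2.10/24 dev br-tenant
  ip link set br-tenant up

  # To make it persistent, give ifcfg-br-tenant a static IPADDR/NETMASK
  # instead of leaving it on DHCP.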
Could you please provide:
- the configuration of the network interfaces prior to and after installation
- the OpenStack configuration used for installation
Verified on A2. I deployed HA neutron (3 controllers, 1 compute) with separate subnets for tenant/external/public API, admin, and management. The tenant subnet was configured with ipam=none/boot-mod=dhcp. The deployment didn't have any problem related to puppet waiting for an IP for the tenant. After the deployment finished, I ran some instances and rebooted the controllers; the systems booted up and the bond interface kept its IP.

rhel-osp-installer-client-0.5.7-1.el7ost.noarch
foreman-installer-1.6.0-0.3.RC1.el7ost.noarch
openstack-foreman-installer-3.0.17-1.el7ost.noarch
rhel-osp-installer-0.5.7-1.el7ost.noarch
puppet-3.6.2-2.el7.noarch
puppet-server-3.6.2-2.el7.noarch
openstack-puppet-modules-2014.2.8-2.el7ost.noarch
Can you also make sure you can ssh into a deployed instance on its public IP address, and ssh between 2 instances in the same project on their private IP addresses?
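Something along these lines would cover it (a sketch only; the image user, key name, and addresses are placeholders):

  # From outside the cloud, ssh to the first instance's public/floating IP:
  ssh -i tenant_key.pem cirros@<public-ip-of-instance-1>

  # From inside instance 1, ssh to the second instance's private/fixed IP
  # to confirm east-west traffic within the project:
  ssh cirros@<private-ip-of-instance-2>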
I ran the Rally boot-run-command scenario on it: Rally creates a VM, then SSHes into it using paramiko and runs a script. The tests completed successfully.
Asaf, Can you confirm if this was validated on bare metal or a virtualized setup?
The setup is built of bare-metal hosts (controllers and compute) connected with a 2x10G bond using a trunk. The external/public/tenant/admin API networks run on these bonds using different VLANs. The tenant network (VXLAN) uses external DHCP over the bond's native VLAN. The host provisioning network uses a different 1G interface.

Ofer
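For context, a sketch of what such a bond-plus-VLAN layout could look like in ifcfg terms; the device names, VLAN ID, bonding options, and address are assumptions, not taken from this setup:

  # /etc/sysconfig/network-scripts/ifcfg-bond0  (native VLAN, carries the VXLAN tenant traffic via DHCP)
  DEVICE=bond0
  TYPE=Bond
  BONDING_MASTER=yes
  BONDING_OPTS="mode=802.3ad miimon=100"
  ONBOOT=yes
  BOOTPROTO=dhcp

  # /etc/sysconfig/network-scripts/ifcfg-bond0.101  (one of the tagged networks, e.g. admin API)
  DEVICE=bond0.101
  VLAN=yes
  ONBOOT=yes
  BOOTPROTO=none
  IPADDR=192.0.2.20
  NETMASK=255.255.255.0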
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-0791.html