Bug 1190185 - OFI not reliably setting IP for tenant bridge when using tunnels
Summary: OFI not reliably setting IP for tenant bridge when using tunnels
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-foreman-installer
Version: 6.0 (Juno)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z2
Sub Component: Installer
Assignee: Jason Guiditta
QA Contact: Asaf Hirshberg
URL:
Whiteboard:
Depends On:
Blocks: 1171850
 
Reported: 2015-02-06 15:13 UTC by Steve Reichard
Modified: 2023-02-22 23:02 UTC
CC: 19 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-04-07 15:08:16 UTC
Target Upstream Version:
Embargoed:




Links:
Red Hat Product Errata RHSA-2015:0791 (SHIPPED_LIVE): Important: Red Hat Enterprise Linux OpenStack Platform Installer update. Last Updated: 2015-04-07 19:07:29 UTC

Description Steve Reichard 2015-02-06 15:13:32 UTC
Description of problem:

Using OFI, I am attempting to deploy Neutron using separate NICs/networks for:
- management/APIs
- neutron tenant traffic
- neutron L3 traffic
- storage

I was able to do this using VLANs.  With VLANs the tenant interface does not need an assigned IP.  Next I attempted to use VXLAN, for which the tenant interface does need an IP.  Before letting puppet run, the NIC was assigned a static IP.  After puppet ran, on 4 of the 5 systems the bridge (br-tenant) that was created with the original NIC as a slave was configured to use DHCP.

Tenant traffic did not flow with this config.  

I redeployed; this time I manually changed the ifcfg-tenant files to have a static IP before testing, and did not see any issues.
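
(For illustration, here is a minimal sketch of the kind of static ifcfg file referred to above; the bridge name matches the description, but the address, netmask, and other values are placeholders rather than values from this deployment.)

    # /etc/sysconfig/network-scripts/ifcfg-br-tenant (sketch; values are placeholders)
    DEVICE=br-tenant
    DEVICETYPE=ovs
    TYPE=OVSBridge
    # static addressing instead of the DHCP that puppet left behind
    BOOTPROTO=static
    IPADDR=192.0.2.11
    NETMASK=255.255.255.0
    ONBOOT=yes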


I asked some engineers about this and here is some of the email exchange:



> >If I could ask for some ideas/assistance when you have time/cycles
> >related to deploying Neutron/vxlan using OFI.  Looking at the puppet, I
> >believe the tunnel bridge device should come from the ip_lookup of
> >ovs_{network,iface}.  I have network set to '' and iface was the NIC
> Steve, is this the config setting for local_ip for the ovs agent to
> use?  If so, looking at the puppet in that area, we look it up with:
> 
>  $local_ip = find_ip("$ovs_tunnel_network",
>                        ["$ovs_tunnel_iface","$external_network_bridge"],
>                         "")
> So, on the first run, it will resolve to ovs_tunnel_iface, and once
> the IP is moved, it should use external_network_bridge.  This all
> relies on system information provided by facter, so it might be worth
> verifying on the problem nodes that 'facter -p' returns what you
> expect to see.
> 
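
(For reference, a minimal sketch of the facter check suggested above; the interface name p3p1 is taken from the debug output later in this report, and the exact fact names depend on how facter flattens interface names.)

    # list the per-interface address facts puppet will see on a problem node
    facter -p | grep -i ipaddress
    # or query a single interface directly (interface name is an example)
    facter -p ipaddress_p3p1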

I am confused as to why this would be the external_network_bridge.  Since
the OVS tunnel is for the tenant traffic, you would want to keep that
private.  My external bridge is br-ex.  I understand some small/VM
configs may use a single bridge for tenant and L3 traffic, but I wonder
how common that is.

I checked my facts and there is no address for br-ex on any system,
including the one that was configured correctly.

Also, I manually added the IP from the NIC used for tenant traffic to
the bridge used for internal (tenant) traffic, and my tests passed.

spr
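
(For reference, a hedged sketch of the manual workaround described above, i.e. moving the tenant NIC's address onto the bridge by hand; the interface name, bridge name, and address are placeholders.)

    # drop the static IP from the tenant NIC and re-add it on the tenant bridge
    ip addr del 192.0.2.11/24 dev p3p1
    ip addr add 192.0.2.11/24 dev br-tenant
    ip link set br-tenant up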


> >that had the IP before running puppet (eventually becoming a bridge slave);
> >however, of my 5 nodes only one got the static IP and the other 4
> >resorted to DHCP (which is not configured).  I confirmed that if I
> >manually set them all statically, neutron worked as expected.
> >
> >On the node that used the existing IP, I did find one slight
> >difference in the puppet output compared to the rest.
> >
> >Debug: Executing '/usr/bin/ovs-vsctl add-port br-tennant p3p1'
> >Debug: Executing '/usr/sbin/ip addr show p3p1'
> >Debug: Executing '/usr/sbin/ifdown br-tennant'
> 
> The code for this is in
> /usr/share/openstack-puppet-modules/modules/vswitch/lib/puppet/provider/vs_bridge/ovs.rb
> and
> /usr/share/openstack-puppet-modules/modules/vswitch/lib/puppet/provider/vs_port/ovs*
> (there are 3 different implementations based on your system; you likely have
> ovs_redhat.rb).  Look at the create method in each.  My guess is that
> the bridge doesn't get created, and thus the call to create the port
> never executes that addr show.  Not sure if that helps at all.
> 
> -j
> >
> >
> >On the other systems the "ip addr show" line is missing.  I did a
> >recursive search under /usr/share for that string and didn't find it.  I
> >am now grepping from / but it has been going for a while.
> >
> >spr
> >
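
(For orientation, the sequence below reconstructs, from the debug output quoted above, roughly what the vs_bridge/vs_port providers appear to execute on the node that worked; it is an inference from the log lines, not the provider code itself, and the bridge/port names are taken from that output.)

    # create the bridge and attach the physical NIC as a port
    ovs-vsctl add-br br-tennant
    ovs-vsctl add-port br-tennant p3p1
    # read the NIC's current address so it can be carried over to the bridge
    ip addr show p3p1
    # cycle the bridge so it picks up the transferred configuration
    # (the ifdown appears in the log; the subsequent ifup is assumed)
    ifdown br-tennant
    ifup br-tennant

On the nodes where the "ip addr show" step never ran, the bridge configuration presumably fell back to DHCP, which matches the behaviour described in this report.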



Version-Release number of selected component (if applicable):

[root@ospha-inst ml2]# yum list installed | grep -e foreman -e puppet
foreman.noarch                       1.6.0.49-6.el7ost   @RH7-RHOS-6.0-Installer
foreman-installer.noarch             1:1.6.0-0.2.RC1.el7ost
foreman-postgresql.noarch            1.6.0.49-6.el7ost   @RH7-RHOS-6.0-Installer
foreman-proxy.noarch                 1.6.0.30-5.el7ost   @RH7-RHOS-6.0-Installer
foreman-selinux.noarch               1.6.0.14-1.el7sat   @RH7-RHOS-6.0-Installer
openstack-foreman-installer.noarch   3.0.13-1.el7ost     @RH7-RHOS-6.0-Installer
openstack-puppet-modules.noarch      2014.2.8-2.el7ost   @RH7-RHOS-6.0-Installer
puppet.noarch                        3.6.2-2.el7         @RH7-RHOS-6.0-Installer
puppet-server.noarch                 3.6.2-2.el7         @RH7-RHOS-6.0-Installer
ruby193-rubygem-foreman_openstack_simplify.noarch
rubygem-foreman_api.noarch           0.1.11-6.el7sat     @RH7-RHOS-6.0-Installer
rubygem-hammer_cli_foreman.noarch    0.1.1-16.el7sat     @RH7-RHOS-6.0-Installer
rubygem-hammer_cli_foreman-doc.noarch
[root@ospha-inst ml2]# 




How reproducible:

each deployment I attempted so far

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 5 Kambiz Aghaiepour 2015-02-13 16:25:46 UTC
I wanted to add to this bug that, in a VLAN tenant-network-based deployment, I have seen that not all of the controllers' interfaces for br-ex were set up correctly.  In my deployment I have 5 controllers, and 2 of the 5 were missing:

OVSDHCPINTERFACES=<external network interface>
OVS_EXTRA="set bridge br-ex other-config:hwaddr=<above interface's mac>"

e.g.

OVSDHCPINTERFACES=eno1
OVS_EXTRA="set bridge br-ex other-config:hwaddr=b8:ca:3a:61:42:d0"

Furthermore, the same settings (OVSDHCPINTERFACES and OVS_EXTRA) were missing from the br-eno2 interface definition (eno2 is the physical interface associated with tenant traffic in my deployment).

Without the OVS options in /etc/sysconfig/network-scripts/ifcfg-br-eno2, the br-eno2 interface does not come up with an address.  Even with the VLAN tenant network type, I was unable to ping guests, once they were launched, from a host on the tenant network VLAN (outside of OSP).  Once the options were added and the controllers rebooted, tenant networking worked properly.
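
(For illustration, a minimal sketch of what a complete ifcfg-br-eno2 with the missing options might look like; OVSDHCPINTERFACES and OVS_EXTRA are the settings quoted above, while the remaining keys are assumptions for a DHCP-configured OVS bridge.)

    # /etc/sysconfig/network-scripts/ifcfg-br-eno2 (sketch)
    DEVICE=br-eno2
    DEVICETYPE=ovs
    TYPE=OVSBridge
    # request DHCP over the slave interface, presenting its MAC address
    OVSBOOTPROTO=dhcp
    OVSDHCPINTERFACES=eno2
    OVS_EXTRA="set bridge br-eno2 other-config:hwaddr=<eno2 MAC address>"
    ONBOOT=yes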

Note that my deployment is OSP5.

Comment 6 Jason Guiditta 2015-02-16 22:11:57 UTC
This looks to me like something the puppet-vswitch provider should be handling.  Gilles, you have worked in that area before; any thoughts, or am I off base here?

Comment 7 Gilles Dubreuil 2015-02-17 04:30:24 UTC
(In reply to Kambiz Aghaiepour from comment #5)

> Note that my deployment is OSP5.

The vswitch providers have changed between OSP5 and OSP6 (and depending on the OSP5 version as well), fixing earlier bugs.

(In reply to Jason Guiditta from comment #6)
> This looks to me like something that puppet-vswitch provider should be
> handling.  

I'm not sure such a scenario, as described in comment #0, is covered by OFI.
The vswitch providers vs_bridge/vs_port, when defined, create an OVS bridge and attach a port (interface) to it, making the result persistent by writing the corresponding ifcfg files.  This normally happens by default on a Neutron network/L3 agent node.
The rest is beyond vswitch's scope.

Comment 8 arkady kanevsky 2015-03-05 22:45:43 UTC
We are seeing it multiple times on HA controller nodes on node reboot.
We need to bump its priority so it is fixed in A2.

Comment 9 Gilles Dubreuil 2015-03-06 01:58:02 UTC
From what I understand of the initial problem description, there is no issue here, unless other behaviour is expected from either puppet-vswitch or OFI, in which case I'd suggest creating an RFE accordingly.

The actual default behaviour of puppet-vswitch is:

If the physical interface to be attached to the bridge exists but has no link (the interface is down), then the bridge is configured with DHCP, because there is no IP address to transfer over from the physical interface.

Conversely, if the link is up, the existing physical interface's configuration is transferred to the bridge configuration, whether it is static or dynamic.

In all cases, at the end of the process no IP address (static or dynamic) remains on the physical interface.

This behaviour can be confusing, especially when no IP address is desired on the physical interface.
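
(For reference, a short sketch of how one might confirm, before the puppet run, that the physical tenant interface is up and carries its static address, so that this configuration is transferred to the bridge rather than falling back to DHCP; the interface name is a placeholder.)

    # expect "state UP"; with no link, the bridge falls back to DHCP
    ip link show p3p1
    # confirm the static address is present before puppet moves it to the bridge
    ip addr show p3p1
    # bring the interface up first if it is down
    ifup p3p1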


(In reply to arkady kanevsky from comment #8)
> We are seeing it multiple times on HA controller nodes on node reboot.
> Need to bump its priority to be fixed in A2.

Is it that, after reboot, a bridge interface ends up configured for DHCP when it was expected to be static?

If yes, then assign an IP to the physical interface beforehand.

If not, could you please provide more information and describe in detail what you're seeing and what you expected.

Comment 10 Randy Perryman 2015-03-11 21:37:00 UTC
We are seeing the following: after the install is completed, the interface's IP information is removed and not set to DHCP.

We discovered this on reboot: the interface did not come up and connectivity was lost.  We do not know exactly when it happens.

Comment 11 Steve Reichard 2015-03-11 21:55:20 UTC
Addressing comments 7 & 9.

I have been using OFI in support of the Dell solution for several releases.

#7
Can you explain "I'm not sure such a scenario, as described in comment #0, is covered by OFI", since I've been doing this for a while?  If the concern is having a separate NIC for tenant and L3 traffic, I can show it to you.


#9
The br-ex is not starting.  That is a problem.  This is the discussion Jay and I
had about it.  As Jay said in comment 0:

> So, on the first run, it will resolve to ovs_tunnel_iface, and once
> the IP is moved, it should use external_network_bridge. 

I question why the lookup would expect tunnel traffic on the external bridge.  As I said, my config keeps these on separate NICs, as we have been doing since OSP3.

Comment 12 Gilles Dubreuil 2015-03-13 01:09:02 UTC
Could you please provide:

The configuration of the network interfaces before and after installation.

The OpenStack configuration used for the installation.

Comment 15 Asaf Hirshberg 2015-03-16 13:48:40 UTC
Verified on A2.

I deployed HA Neutron (3 controllers, 1 compute) with separate subnets
for tenant/external/public API, admin, and management.  The tenant subnet was configured with ipam=none/boot-mod=dhcp.  The deployment didn't have any problem related to puppet waiting for an IP for the tenant interface.
After the deployment finished, I tried to run some instances and reboot the controllers; the systems booted up and the bond interface kept its IP.


rhel-osp-installer-client-0.5.7-1.el7ost.noarch
foreman-installer-1.6.0-0.3.RC1.el7ost.noarch
openstack-foreman-installer-3.0.17-1.el7ost.noarch
rhel-osp-installer-0.5.7-1.el7ost.noarch
puppet-3.6.2-2.el7.noarch
puppet-server-3.6.2-2.el7.noarch
openstack-puppet-modules-2014.2.8-2.el7ost.noarch

Comment 16 arkady kanevsky 2015-03-16 13:52:55 UTC
Can you also make sure you can ssh into a deployed instance on its public IP address?
And ssh between 2 instances in the same project on their private IP addresses?

Comment 17 Asaf Hirshberg 2015-03-16 13:58:38 UTC
I ran the Rally boot-run-command scenario on it; Rally creates a VM, then SSHes into it using paramiko and runs a script.  The tests completed successfully.

Comment 18 Mike Orazi 2015-03-16 14:50:04 UTC
Asaf,

Can you confirm whether this was validated on bare metal or in a virtualized setup?

Comment 20 Ofer Blaut 2015-03-16 14:59:59 UTC
The setup is built of bare-metal hosts connected with a bond of 2x10G interfaces using a trunk (controllers and compute).

The external/public/tenant/admin API networks run on these bonds using different VLANs.

The tenant network (VXLAN) uses external DHCP over the bond's native VLAN.
The host provisioning network uses a separate 1G interface.

Ofer
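
(For context only, a hedged sketch of the bond-plus-VLAN layout described above in ifcfg form; the bond name, VLAN ID, and bonding options are assumptions, not values from this setup.)

    # /etc/sysconfig/network-scripts/ifcfg-bond0 (2x10G bond; native VLAN carries the VXLAN tenant traffic via external DHCP)
    DEVICE=bond0
    TYPE=Bond
    BONDING_OPTS="mode=802.3ad miimon=100"
    BOOTPROTO=dhcp
    ONBOOT=yes

    # /etc/sysconfig/network-scripts/ifcfg-bond0.100 (example tagged VLAN for one of the other networks)
    DEVICE=bond0.100
    VLAN=yes
    BOOTPROTO=static
    ONBOOT=yes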

Comment 22 errata-xmlrpc 2015-04-07 15:08:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0791.html

