Bug 1198032 - VRRP_Instance are on MASTER STATE on all controllers.
Summary: VRRP_Instance are on MASTER STATE on all controllers.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhel-osp-installer
Version: 6.0 (Juno)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: z2
Target Release: Installer
Assignee: Jason Guiditta
QA Contact: Asaf Hirshberg
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-03-03 09:31 UTC by Asaf Hirshberg
Modified: 2023-02-22 23:02 UTC (History)

Fixed In Version: openstack-foreman-installer-3.0.17-1.el7ost
Doc Type: Known Issue
Doc Text:
When using the Red Hat Enterprise Linux OpenStack Platform installer to deploy Layer 3 High Availability, a known issue exists where Puppet overwrites the host value in neutron.conf with 'neutron-n-0'. As a result, all HA routers are configured with the 'master' router state. As a workaround, after the installation and before any virtual routers are created, manually run the following commands on each of the Controller nodes:
# systemctl stop puppet
# systemctl disable puppet
# pcs resource disable neutron-scale; sleep 20; pcs resource enable neutron-scale
Clone Of:
Environment:
Last Closed: 2015-04-07 15:08:48 UTC
Target Upstream Version:
Embargoed:


Attachments
logs from one host (245.18 KB, text/plain)
2015-03-03 17:03 UTC, Ofer Blaut


Links
System ID: Red Hat Product Errata RHSA-2015:0791
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux OpenStack Platform Installer update
Last Updated: 2015-04-07 19:07:29 UTC

Description Asaf Hirshberg 2015-03-03 09:31:18 UTC
Description of problem:
I deployed HA Neutron on bare metal with 802.3ad bonding using the latest staypuft puddle. When I checked the router state (VRRP), I saw that the state is master on all controllers.

[root@mac441ea173366b ~]# cat /var/lib/neutron/ha_confs/ed9f9ebf-ca42-4f61-9ea6-2369ef69c268/state 
master[root@mac441ea173366b ~]#

[root@mac441ea1733991 ~]# cat /var/lib/neutron/ha_confs/ed9f9ebf-ca42-4f61-9ea6-2369ef69c268/state 
master[root@mac441ea1733991 ~]# 

[root@mac441ea1733d43 ~]# cat /var/lib/neutron/ha_confs/ed9f9ebf-ca42-4f61-9ea6-2369ef69c268/state 
master[root@mac441ea1733d43 ~]#

How reproducible:
2/2


Expected results:
Only one router should be in the master state; the others should be in backup mode.
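
The expected result can be expressed as a small check. This is a minimal sketch, not part of the installer or of Neutron: it assumes the contents of each controller's state file have already been collected into a dictionary, and that the file holds either "master" or "backup" (only "master" appears in this report). The function name check_vrrp_states is hypothetical.

```python
def check_vrrp_states(states):
    """Return (ok, masters) for a mapping of host name -> state file contents.

    ok is True only when exactly one host reports "master".
    The state files in this report have no trailing newline, but strip()
    tolerates one in case it is present.
    """
    masters = sorted(host for host, state in states.items()
                     if state.strip() == "master")
    return len(masters) == 1, masters


# State file contents as shown in the description above.
reported = {
    "mac441ea173366b": "master",
    "mac441ea1733991": "master",
    "mac441ea1733d43": "master",
}
ok, masters = check_vrrp_states(reported)
print("exactly one master:", ok)
print("hosts reporting master:", masters)
```

With the values from this report, all three controllers are listed as master, which is the failure described here.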

Comment 5 Leonid Natapov 2015-03-03 11:31:37 UTC
Happens also on setup without bonding.

Comment 6 Assaf Muller 2015-03-03 14:54:28 UTC
I poked around in the setup and this is what I found. HA routers send VRRP traffic over what we call 'HA ports'. The HA routers are supposed to be able to ping each other over these interfaces but in this case they can't. Thus, no VRRP traffic and everyone is master.

It looks like binding has failed for these HA ports on 2 of the 3 machines involved ("ovs-vsctl show" reports VLAN 4095 for them). The Neutron server logs warnings about binding failures for these ports, as would be expected. One cause of binding failures is a mismatch between the 'host' values of the different agents and the Neutron server within a single machine.

I checked, and the host values are not configured properly.

Host 1:
http://pastebin.com/fR6yUHAE

Host 2:
http://pastebin.com/PSENgvkb

Host 3:
http://pastebin.com/4BhJxX0h
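
A mismatch like the one described in this comment could be spotted with a short script. This is a rough sketch under stated assumptions: the configuration files are read as INI text with the host option in the [DEFAULT] section, the function names host_values and mismatched are hypothetical, and the sample values below are illustrative rather than copied from the pastebins.

```python
import configparser


def host_values(config_texts):
    """Map each config file name to its [DEFAULT] host value (None if unset)."""
    result = {}
    for name, text in config_texts.items():
        # interpolation=None keeps '%' characters literal;
        # strict=False tolerates duplicate options.
        parser = configparser.ConfigParser(interpolation=None, strict=False)
        parser.read_string(text)
        result[name] = parser.defaults().get("host")
    return result


def mismatched(config_texts):
    """Return all host values if the files on one machine disagree, else {}."""
    values = host_values(config_texts)
    distinct = {value for value in values.values() if value is not None}
    return values if len(distinct) > 1 else {}


# Illustrative contents for one machine (assumed, not taken from the pastebins).
sample = {
    "neutron.conf": "[DEFAULT]\nhost = neutron-n-0\n",
    "l3_agent.ini": "[DEFAULT]\nhost = neutron-n-2\n",
}
print(mismatched(sample))
```

On a real controller the texts would come from the files under /etc/neutron; an empty result means every file that sets host agrees.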

Comment 7 Miguel Angel Ajo 2015-03-03 14:57:06 UTC
neutron-scale configures /etc/neutron/neutron.conf the same way it does for the other agents, so my guess is that some puppet module is also altering the host value in neutron.conf (setting "host = neutron-n-0" instead of "host = neutron-n-$i").

Can we get the puppet agent log from the host to see what's happening?

Comment 8 Assaf Muller 2015-03-03 15:01:28 UTC
find /var/log -name "*puppet*"
/var/log/puppet

ll /var/log/puppet shows it's empty.

Comment 9 Miguel Angel Ajo 2015-03-03 15:05:33 UTC
OK, let's see whether running "pcs resource disable neutron-scale; sleep 20; pcs resource enable neutron-scale" rewrites /etc/neutron/neutron.conf with the proper value.

I'm trying to understand whether something runs afterwards and breaks the value.

Comment 10 Assaf Muller 2015-03-03 15:16:42 UTC
Disabling and enabling neutron-scale via pcs wrote the proper host values to all neutron confs. At this point creating a new HA router succeeds. The question remains how this setup ended up with the wrong host value written to neutron.conf.

Comment 11 Ofer Blaut 2015-03-03 17:03:35 UTC
Created attachment 997592
logs from one host

Comment 12 Assaf Muller 2015-03-03 17:16:29 UTC
This line appears a couple of times in the puppet logs Ofer attached:

Mar  3 09:02:20 mac441ea173366b puppet-agent[23903]: (/Stage[main]/Quickstack::Neutron::All/Neutron_config[DEFAULT/host]/value) value changed 'neutron-n-2' to 'neutron-n-0'

That shouldn't be happening.

Comment 13 Ofer Blaut 2015-03-03 17:36:14 UTC
As you said, something else in puppet changes the host after it has been set by NeutronScale:

Mar  3 12:03:35 mac441ea1733991 NeutronScale(neutron-scale:1)[6954]: INFO: neutron-scale: host neutron-n-1 set for /etc/neutron/dhcp_agent.ini
Mar  3 12:03:36 mac441ea1733991 NeutronScale(neutron-scale:1)[6954]: INFO: neutron-scale: host neutron-n-1 set for /etc/neutron/fwaas_driver.ini
Mar  3 12:03:36 mac441ea1733991 NeutronScale(neutron-scale:1)[6954]: INFO: neutron-scale: host neutron-n-1 set for /etc/neutron/l3_agent.ini
Mar  3 12:03:36 mac441ea1733991 NeutronScale(neutron-scale:1)[6954]: INFO: neutron-scale: host neutron-n-1 set for /etc/neutron/lbaas_agent.ini
Mar  3 12:03:36 mac441ea1733991 NeutronScale(neutron-scale:1)[6954]: INFO: neutron-scale: host neutron-n-1 set for /etc/neutron/metadata_agent.ini
Mar  3 12:03:36 mac441ea1733991 NeutronScale(neutron-scale:1)[6954]: INFO: neutron-scale: host neutron-n-1 set for /etc/neutron/neutron.conf
Mar  3 12:03:37 mac441ea1733991 NeutronScale(neutron-scale:1)[6954]: INFO: neutron-scale: host neutron-n-1 set for /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini
Mar  3 12:03:47 mac441ea1733991 crmd[3013]: notice: process_lrm_event: Operation neutron-netns-cleanup_start_0: ok (node=pcmk-mac441ea1733991, call=350, rc=0, cib-update=118, confirmed=true)
Mar  3 12:03:47 mac441ea1733991 crmd[3013]: notice: process_lrm_event: Operation neutron-netns-cleanup_monitor_10000: ok (node=pcmk-mac441ea1733991, call=353, rc=0, cib-update=119, confirmed=false)
Mar  3 12:05:37 mac441ea1733991 puppet-agent[3073]: (/Stage[main]/Quickstack::Neutron::All/Neutron_config[DEFAULT/host]/value) value changed 'neutron-n-1' to 'neutron-n-0'
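
The sequence in this log (neutron-scale sets the host, then puppet changes it) can be detected mechanically. The following is a minimal sketch that assumes the two message formats shown above stay as they are; the function name find_overwrites and the regular expressions are illustrative, not part of any tool mentioned in this report.

```python
import re

# Matches the NeutronScale resource agent message seen in the log above.
SCALE_SET = re.compile(r"neutron-scale: host (\S+) set for (\S+)")
# Matches the puppet-agent message for the DEFAULT/host setting.
PUPPET_CHANGE = re.compile(
    r"Neutron_config\[DEFAULT/host\]/value\) value changed '([^']+)' to '([^']+)'"
)


def find_overwrites(log_lines):
    """Return (old, new) pairs where puppet replaced the host value that
    neutron-scale had most recently written to neutron.conf."""
    scale_host = None
    overwrites = []
    for line in log_lines:
        match = SCALE_SET.search(line)
        if match and match.group(2).endswith("/neutron.conf"):
            scale_host = match.group(1)
            continue
        match = PUPPET_CHANGE.search(line)
        if match and match.group(1) == scale_host and match.group(2) != scale_host:
            overwrites.append((match.group(1), match.group(2)))
    return overwrites


# Two lines copied from the log excerpt above.
excerpt = [
    "Mar  3 12:03:36 mac441ea1733991 NeutronScale(neutron-scale:1)[6954]: "
    "INFO: neutron-scale: host neutron-n-1 set for /etc/neutron/neutron.conf",
    "Mar  3 12:05:37 mac441ea1733991 puppet-agent[3073]: "
    "(/Stage[main]/Quickstack::Neutron::All/Neutron_config[DEFAULT/host]/value) "
    "value changed 'neutron-n-1' to 'neutron-n-0'",
]
print(find_overwrites(excerpt))
```

Run against a full /var/log/messages, a non-empty result would indicate that a puppet run is still undoing the value written by neutron-scale.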

Comment 15 Jason Guiditta 2015-03-03 17:52:00 UTC
(In reply to Assaf Muller from comment #12)
> This line appears a couple of times in the puppet logs Ofer attached:
> 
> Mar  3 09:02:20 mac441ea173366b puppet-agent[23903]:
> (/Stage[main]/Quickstack::Neutron::All/Neutron_config[DEFAULT/host]/value)
> value changed 'neutron-n-2' to 'neutron-n-0'
> 
> That shouldn't be happening.

This is left over from OSP 5, when we did not have neutron scale to set the host value. I guess I missed removing it when we moved to OSP 6. I am moving this to A2 though, since A1 is done. It should be as simple as removing 3 lines from the manifest you reference here; I just need to test it to make sure things still get set up without them.

Comment 16 Nir Yechiel 2015-03-03 18:36:51 UTC
We won't delay the A1 release; we will handle this as a single post-release update for A1.

Comment 20 Jason Guiditta 2015-03-04 18:19:11 UTC
Patch posted:
https://github.com/redhat-openstack/astapor/pull/484

Comment 21 Jason Guiditta 2015-03-05 13:49:27 UTC
Merged

Comment 25 Asaf Hirshberg 2015-03-15 18:23:35 UTC
Verified on A2. Not reproduced. Used the same deployment (3 controllers, 1 compute).

rhel-osp-installer-client-0.5.7-1.el7ost.noarch
foreman-installer-1.6.0-0.3.RC1.el7ost.noarch
openstack-foreman-installer-3.0.17-1.el7ost.noarch
rhel-osp-installer-0.5.7-1.el7ost.noarch

Comment 27 errata-xmlrpc 2015-04-07 15:08:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0791.html

