Bug 1198032
Summary: | VRRP_Instance are on MASTER STATE on all controllers. | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Asaf Hirshberg <ahirshbe> | ||||
Component: | rhel-osp-installer | Assignee: | Jason Guiditta <jguiditt> | ||||
Status: | CLOSED ERRATA | QA Contact: | Asaf Hirshberg <ahirshbe> | ||||
Severity: | urgent | Docs Contact: | |||||
Priority: | urgent | ||||||
Version: | 6.0 (Juno) | CC: | aberezin, adahms, ahirshbe, amuller, dmacpher, jguiditt, lnatapov, majopela, mangelajo, mburns, mlopes, nyechiel, oblaut, rhos-maint, yeylon | ||||
Target Milestone: | z2 | Keywords: | ZStream | ||||
Target Release: | Installer | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | openstack-foreman-installer-3.0.17-1.el7ost | Doc Type: | Known Issue | ||||
Doc Text: |
When using the Red Hat Enterprise Linux OpenStack Platform installer to deploy Layer 3 High Availability, a known issue currently exists where Puppet will overwrite the host value in neutron.conf with 'neutron-n-0'. As a result, all HA routers are configured with the 'master' router state.
As a workaround, after the installation and before any virtual routers are created, manually run the following commands on each of the Controller nodes:
# systemctl stop puppet
# systemctl disable puppet
# pcs resource disable neutron-scale; sleep 20; pcs resource enable neutron-scale
|
Story Points: | --- | ||||
Clone Of: | Environment: | ||||||
Last Closed: | 2015-04-07 15:08:48 UTC | Type: | Bug | ||||
Description
Asaf Hirshberg
2015-03-03 09:31:18 UTC
Happens also on a setup without bonding.

I poked around in the setup and this is what I found. HA routers send VRRP traffic over what we call 'HA ports'. The HA routers are supposed to be able to ping each other over these interfaces, but in this case they can't. Thus, no VRRP traffic, and everyone is master. It looks like binding has failed on 2 out of the 3 machines involved for these HA ports ('ovs-vsctl show' shows VLAN 4095). The Neutron server has warnings about binding failures for these ports, as would be expected.

One reason you'd get binding failures is a mismatch between the 'host' values in the different agents and the Neutron server within a single machine. I checked, and the host values are not configured properly.

Host 1: http://pastebin.com/fR6yUHAE
Host 2: http://pastebin.com/PSENgvkb
Host 3: http://pastebin.com/4BhJxX0h

neutron scale configures /etc/neutron/neutron.conf the same way it does for the other agents, so my guess here is that some puppet module is altering the host id for neutron.conf too (and setting "host = neutron-n-0" instead of "host = neutron-n-$i")? Can we get the puppet agent log on the host to see what's happening?

find /var/log -name "*puppet*" finds /var/log/puppet, but ll /var/log/puppet shows it's empty.

OK, let's see if a "pcs resource disable neutron-scale; sleep 20; pcs resource enable neutron-scale" rewrites /etc/neutron/neutron.conf to the proper value. I'm trying to understand if something comes after and breaks the value.

Disabling and enabling neutron-scale via pcs wrote the proper host values to all neutron confs. At this point creating a new HA router succeeds. The question remains how this setup ended up with the wrong host value written to neutron.conf.

Created attachment 997592
logs from one host
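The host mismatch described above can be checked without opening each file by hand. The following is a minimal sketch, not part of the installer: the function names `get_host_value` and `check_host_consistency` are hypothetical, and it assumes the `host` option sits in the `[DEFAULT]` section of every file you pass in (the Puppet resource `Neutron_config[DEFAULT/host]` confirms this only for neutron.conf).

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not from the installer.

# Print the [DEFAULT] "host" value of an INI-style file.
# Prints nothing if the option is absent or commented out.
get_host_value() {
    # $1: path to the config file
    awk -F '=' '
        /^\[/ { in_default = ($0 ~ /^\[DEFAULT\]/) }
        in_default && $1 ~ /^[ \t]*host[ \t]*$/ {
            val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", val)   # trim surrounding whitespace
            print val
            exit
        }
    ' "$1"
}

# Print the host value of every given file.
# Returns 0 if all files share the same non-empty value, 1 otherwise.
check_host_consistency() {
    first=""
    status=0
    for f in "$@"; do
        value=$(get_host_value "$f")
        printf '%s: host=%s\n' "$f" "${value:-<unset>}"
        if [ -z "$value" ]; then
            status=1
        elif [ -z "$first" ]; then
            first=$value
        elif [ "$value" != "$first" ]; then
            status=1
        fi
    done
    return "$status"
}
```

On a controller this could be called as `check_host_consistency /etc/neutron/neutron.conf /etc/neutron/l3_agent.ini /etc/neutron/dhcp_agent.ini`. A non-zero exit status means at least one file disagrees with the others or has no `host` value, which is the condition that led to the binding failures in this report.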
This line appears a couple of times in the puppet logs Ofer attached:

Mar 3 09:02:20 mac441ea173366b puppet-agent[23903]: (/Stage[main]/Quickstack::Neutron::All/Neutron_config[DEFAULT/host]/value) value changed 'neutron-n-2' to 'neutron-n-0'

That shouldn't be happening.

As you said, some other puppet change alters the host after it is set by NeutronScale:

Mar 3 12:03:35 mac441ea1733991 NeutronScale(neutron-scale:1)[6954]: INFO: neutron-scale: host neutron-n-1 set for /etc/neutron/dhcp_agent.ini
Mar 3 12:03:36 mac441ea1733991 NeutronScale(neutron-scale:1)[6954]: INFO: neutron-scale: host neutron-n-1 set for /etc/neutron/fwaas_driver.ini
Mar 3 12:03:36 mac441ea1733991 NeutronScale(neutron-scale:1)[6954]: INFO: neutron-scale: host neutron-n-1 set for /etc/neutron/l3_agent.ini
Mar 3 12:03:36 mac441ea1733991 NeutronScale(neutron-scale:1)[6954]: INFO: neutron-scale: host neutron-n-1 set for /etc/neutron/lbaas_agent.ini
Mar 3 12:03:36 mac441ea1733991 NeutronScale(neutron-scale:1)[6954]: INFO: neutron-scale: host neutron-n-1 set for /etc/neutron/metadata_agent.ini
Mar 3 12:03:36 mac441ea1733991 NeutronScale(neutron-scale:1)[6954]: INFO: neutron-scale: host neutron-n-1 set for /etc/neutron/neutron.conf
Mar 3 12:03:37 mac441ea1733991 NeutronScale(neutron-scale:1)[6954]: INFO: neutron-scale: host neutron-n-1 set for /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini
Mar 3 12:03:47 mac441ea1733991 crmd[3013]: notice: process_lrm_event: Operation neutron-netns-cleanup_start_0: ok (node=pcmk-mac441ea1733991, call=350, rc=0, cib-update=118, confirmed=true)
Mar 3 12:03:47 mac441ea1733991 crmd[3013]: notice: process_lrm_event: Operation neutron-netns-cleanup_monitor_10000: ok (node=pcmk-mac441ea1733991, call=353, rc=0, cib-update=119, confirmed=false)
Mar 3 12:05:37 mac441ea1733991 puppet-agent[3073]: (/Stage[main]/Quickstack::Neutron::All/Neutron_config[DEFAULT/host]/value) value changed 'neutron-n-1' to 'neutron-n-0'

(In reply to Assaf Muller from comment #12)
> This line appears a couple of times in the puppet logs Ofer attached:
>
> Mar 3 09:02:20 mac441ea173366b puppet-agent[23903]:
> (/Stage[main]/Quickstack::Neutron::All/Neutron_config[DEFAULT/host]/value)
> value changed 'neutron-n-2' to 'neutron-n-0'
>
> That shouldn't be happening.

This is left over from OSP 5, when we did not have neutron scale to set the host value. I guess I missed removing it when we moved to OSP 6. I am moving this to A2 though; A1 is done. It should be as simple as removing 3 lines from the manifest you reference here; we just need to test it to make sure things still get set up without it.

We won't delay the A1 release, but will handle this as a single post-update for A1.

Patch posted: https://github.com/redhat-openstack/astapor/pull/484

Merged

Verified on A2; not reproduced. Used the same deployment (3 controllers, 1 compute).

rhel-osp-installer-client-0.5.7-1.el7ost.noarch
foreman-installer-1.6.0-0.3.RC1.el7ost.noarch
openstack-foreman-installer-3.0.17-1.el7ost.noarch
rhel-osp-installer-0.5.7-1.el7ost.noarch

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0791.html
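To confirm whether a controller is affected, the Puppet overwrite messages quoted in the comments above can be filtered out of a log file. This is a rough sketch under assumptions: the function names `find_host_overwrites` and `list_host_changes` are hypothetical, and the log path is whatever file your syslog writes puppet-agent messages to (commonly /var/log/messages on RHEL 7, but that is not stated in this report).

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not from the installer.

# Print every log line where Puppet changed the DEFAULT/host value
# managed by the Neutron_config resource.
find_host_overwrites() {
    # $1: path to a syslog-style log containing puppet-agent messages
    grep -F 'Neutron_config[DEFAULT/host]/value) value changed' "$1"
}

# Reduce each matching line to "old -> new" so the overwrite is easy to spot.
list_host_changes() {
    # $1: same log file as above
    find_host_overwrites "$1" |
        sed -n "s/.*value changed '\([^']*\)' to '\([^']*\)'.*/\1 -> \2/p"
}
```

For example, `list_host_changes /var/log/messages` would print one line per overwrite. Any output where the right-hand side is 'neutron-n-0' on a node that neutron-scale assigned a different value indicates the behavior described in this bug.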