Bug 1051047
Summary: neutron server doesn't reschedule routers when a neutron-l3-agent goes down
Product: Red Hat OpenStack
Reporter: Miguel Angel Ajo <majopela>
Component: openstack-neutron
Assignee: Miguel Angel Ajo <mangelajo>
Status: CLOSED ERRATA
QA Contact: yfried
Severity: high
Priority: high
Version: 4.0
CC: amoralej, chrisw, dnavale, fdinitto, javier.pena, lpeer, mangelajo, twilson, yeylon
Target Milestone: z2
Keywords: OtherQA, ZStream
Target Release: 4.0
Flags: majopela: needinfo-
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-neutron-2013.2.2-1.el6ost
Doc Type: Known Issue
Doc Text:
When you shut down a neutron-l3-agent (or it dies) and start another neutron-l3-agent on a different node, OpenStack Networking does not reschedule the virtual routers onto the second agent. Routing and metadata remain tied to the original L3 agent ID. As a result, you cannot build an HA environment (either Active/Active or Active/Passive) out of several nodes running L3 agents with different IDs.

Workaround:
Set the 'host=' field in the agent configuration file of both L3 agents to the same value, so that they present the same logical ID to neutron-server.
Two hosts must never run neutron-l3-agent at the same time with the same 'host=' parameter. When one L3 agent is brought down (service stop), run the 'neutron-netns-cleanup --force' script to clean up any namespaces and running state left behind by the neutron-l3-agent.
With this workaround, virtual routers can be rescheduled to a different neutron-l3-agent, as long as both agents share the same 'host=' logical ID. In the output of 'neutron agent-list', the host field of the neutron-l3-agent matches the 'host=' value from the configuration, regardless of the agent's actual hostname.
Last Closed: 2014-03-04 20:13:52 UTC
Type: Bug
Bug Depends On: 1061578, 1072381
Bug Blocks: 1080561
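The 'host=' workaround described in the Doc Text amounts to a one-line configuration change on each network node. A minimal sketch follows; the logical name 'l3-agent-name' is illustrative, and the file path is the usual RHOS location for the L3 agent configuration:

```ini
; /etc/neutron/l3_agent.ini on BOTH network nodes
; (only one node may actually run neutron-l3-agent at any given time)
[DEFAULT]
; Logical agent ID reported to neutron-server; 'l3-agent-name' is an
; illustrative placeholder, not a required value.
host = l3-agent-name
```

Because both agents report the same logical ID, neutron-server sees a single L3 agent and keeps the routers scheduled to it across a failover.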
Description
Miguel Angel Ajo
2014-01-09 15:55:28 UTC
Javier, this made it work, although we have found situations like this during testing (it's just a bug/race condition) while switching the ACTIVE node: https://bugzilla.redhat.com/show_bug.cgi?id=1051615
I still have to confirm upstream that this setting is intended for what we're doing, and that we will not hit any side effects. But for what I have tested, it does effectively work. Thank you very much, Miguel Ángel

We have a workaround per comment#3, but the general scheduling problem is not solved upstream and would be addressed in Icehouse: https://bugzilla.redhat.com/show_bug.cgi?id=1042396

Upstream confirmation on the intended usage of the host= parameter:
http://lists.openstack.org/pipermail/openstack-dev/2014-January/026020.html

I've been testing this configuration with ML2 + OVS + VXLAN and I've found that adding the host parameter to the l3_agent configuration causes problems.
When using ML2, if a router is assigned to an L3 agent with a host value different from the hostname, the internal port of the router (the one connected to the br-int OVS bridge) is assigned VLAN 4095, and a flow is created to drop all packets from this port. I've seen this is done by the port_dead method in the openvswitch agent.

Another possible workaround is running something like this from a cron job, making sure that it evacuates virtual routers from down L3 agents to live ones:
https://github.com/stackforge/cookbook-openstack-network/blob/master/files/default/neutron-ha-tool.py

(In reply to Alfredo Moralejo from comment #6)
> I've been testing this configuration with ML2 + OVS + vxlan and I've found
> that adding host parameter in l3_agent configuration causes problems.
>
> When using ML2, when a router is assigned to an L3 agent with a host value
> different from the hostname, the internal port of the router (the one
> connected to br-int OVS) is assigned VLAN 4095 and a flow is created to
> drop all packets from this port. I've seen this is done by the port_dead
> method in the openvswitch agent.

The ml2 plugin uses the value of the binding:host_id port attribute when binding a port. The binding:host_id of the l3-agent's ports is set with the host value from the l3-agent config. If this does not match the name the openvswitch-agent uses for the host, a binding cannot be created. See BZ 1061578.
A solution for this particular use case may be to override host with the same value in the openvswitch-agent and l3-agent config files.

Not enough baremetal resources ATM. Miguel has volunteered to verify.

I can confirm that it works:
1) Set up two network nodes and a controller.
2) Set host=l3-agent-name (or the desired logical name) in l3_agent.ini on both network nodes.
3) Start the L3 agent on network node A.
4) Ping from a VM to the external network: OK
-- failover --
5) Power off A (or /etc/init.d/neutron-l3-agent stop + neutron-netns-forced-cleanup from bz#1051036).
6) Start the L3 agent on network node B.
7) Ping from the same VM to the external network: OK
-- failback --
8) Power on A.
9) Power off B (or stop the L3 agent + use the cleanup script).
10) Start the L3 agent on network node A.
11) Ping from the same VM to the external network: OK

Checked with 2013.2.2-1 on RHEL 6.5 with the 2014-02-17.1 build.
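The cron-driven rescheduler mentioned above (neutron-ha-tool.py) essentially detects dead L3 agents and moves their routers to live ones. A minimal sketch of that logic follows; the real tool talks to the neutron API, whereas the agent and router records here are hypothetical stand-ins for those API responses:

```python
# Sketch of the evacuation step a tool like neutron-ha-tool.py performs.
# 'agents' and 'router_bindings' stand in for neutron API responses.

def evacuate_routers(agents, router_bindings):
    """Move routers hosted by dead L3 agents onto alive ones.

    agents: list of dicts like {"id": ..., "alive": bool}
    router_bindings: dict mapping router_id -> hosting agent_id
    Returns an updated copy of router_bindings.
    """
    alive = [a["id"] for a in agents if a["alive"]]
    dead = {a["id"] for a in agents if not a["alive"]}
    if not alive:
        # Nowhere to reschedule to; leave bindings untouched.
        return router_bindings
    moved = dict(router_bindings)
    target = 0
    for router_id, agent_id in router_bindings.items():
        if agent_id in dead:
            # Round-robin the orphaned routers across the live agents.
            moved[router_id] = alive[target % len(alive)]
            target += 1
    return moved


agents = [{"id": "agent-A", "alive": False}, {"id": "agent-B", "alive": True}]
bindings = {"router-1": "agent-A", "router-2": "agent-B"}
print(evacuate_routers(agents, bindings))
# → {'router-1': 'agent-B', 'router-2': 'agent-B'}
```

Run periodically from cron, this keeps routers off agents that have stopped reporting as alive; the real tool additionally removes the router from the dead agent via the neutron scheduler API before re-adding it.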
node A:
[root@rhos4-neutron-n1 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.5 (Santiago)
[root@rhos4-neutron-n1 ~]# rpm -qa | grep neutron
python-neutron-2013.2.2-1.el6ost.noarch
openstack-neutron-2013.2.2-1.el6ost.noarch
python-neutronclient-2.3.1-3.el6ost.noarch
openstack-neutron-openvswitch-2013.2.2-1.el6ost.noarch

node B:
[root@rhos4-neutron-n2 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.5 (Santiago)
[root@rhos4-neutron-n2 ~]# rpm -qa | grep neutron
python-neutron-2013.2.2-1.el6ost.noarch
openstack-neutron-2013.2.2-1.el6ost.noarch
python-neutronclient-2.3.1-3.el6ost.noarch
openstack-neutron-openvswitch-2013.2.2-1.el6ost.noarch

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHBA-2014-0213.html