Bug 1457358
Summary: | neutronovsagent container on compute node have forever restarting state after deployment of overcloud | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Artem Hrechanychenko <ahrechan> |
Component: | openstack-tripleo-heat-templates | Assignee: | Brent Eagles <beagles> |
Status: | CLOSED ERRATA | QA Contact: | Toni Freger <tfreger> |
Severity: | urgent | Docs Contact: | Andrew Burden <aburden> |
Priority: | urgent | ||
Version: | 12.0 (Pike) | CC: | afazekas, ahrechan, amuller, beagles, dsariel, jlibosva, jschluet, m.andre, mburns, mcornea, ohochman, rhallise, rhel-osp-director-maint, sasha, tfreger, tvignaud |
Target Milestone: | ga | Keywords: | AutomationBlocker, Reopened, TechPreview, Triaged |
Target Release: | 12.0 (Pike) | ||
Hardware: | Unspecified | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | openstack-tripleo-heat-templates-7.0.0-0.20170901051303.0rc1.el7ost | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2017-12-13 21:29:42 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1433535 | ||
Bug Blocks: |
Description
Artem Hrechanychenko
2017-05-31 14:48:28 UTC
This bugzilla has been removed from the release and needs to be reviewed and triaged for another Target Release.

Checking the OVS agent logs in /var/log/containers/neutron/neutron-openvswitch-agent.log gives more info:

2017-05-31 10:22:44.302 24231 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-460c7f8f-1b03-4546-a873-ce8843df941d - - - - -] Mapping physical network datacentre to bridge br-ex
2017-05-31 10:22:44.302 24231 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-460c7f8f-1b03-4546-a873-ce8843df941d - - - - -] Bridge br-ex for physical network datacentre does not exist. Agent terminated!
2017-05-31 10:22:44.303 24231 ERROR ryu.lib.hub [req-460c7f8f-1b03-4546-a873-ce8843df941d - - - - -] hub: uncaught exception: Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 54, in _launch
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_ryuapp.py", line 40, in agent_main_wrapper
    ovs_agent.main(bridge_classes)
  File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 2167, in main
    agent = OVSNeutronAgent(bridge_classes, cfg.CONF)
  File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 183, in __init__
    self.setup_physical_bridges(self.bridge_mappings)
  File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 153, in wrapper
    return f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 1096, in setup_physical_bridges
    sys.exit(1)
SystemExit: 1

Reproduced.

The bridge is normally created by os-net-config, although I think the puppet-vswitch module can also create it if os-net-config has not created it first.
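The failure in the log above can be pictured as a startup check over the configured bridge mappings. The following is a simplified sketch of that idea, not neutron's actual code: `parse_bridge_mappings` and `find_missing_bridges` are hypothetical helper names, the mapping string is an assumed example, and the list of existing bridges is passed in rather than queried from Open vSwitch.

```python
def parse_bridge_mappings(value):
    """Parse 'physnet:bridge,physnet2:bridge2' into a dict."""
    mappings = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty entries and trailing commas
        physnet, sep, bridge = item.partition(":")
        if not sep or not physnet.strip() or not bridge.strip():
            raise ValueError("invalid bridge mapping: %r" % item)
        mappings[physnet.strip()] = bridge.strip()
    return mappings


def find_missing_bridges(mappings, existing_bridges):
    """Return the {physnet: bridge} entries whose bridge is not present."""
    existing = set(existing_bridges)
    return {p: b for p, b in mappings.items() if b not in existing}


if __name__ == "__main__":
    # Assumed example values; a compute node without br-ex.
    mappings = parse_bridge_mappings("datacentre:br-ex,tenant:br-isolated")
    missing = find_missing_bridges(mappings, ["br-int", "br-isolated"])
    for physnet, bridge in sorted(missing.items()):
        print("Bridge %s for physical network %s does not exist"
              % (bridge, physnet))
```

A check like this run before starting the agent would flag the mismatch between the mapping and the node's bridges without waiting for the container to exit.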
You can look in /var/lib/heat-config/heat-config-script/ to find the os-net-config heat script that would have been used to configure the bridge during provisioning. What does this script say?

Another thing masking the issue here is that /etc/os-net-config/config.json seems to get overwritten by the old element. See here: https://bugs.launchpad.net/tripleo/+bug/1695091 It is not directly related to this bug, but I think it could be confusing the issue of how things are wired up.

I had a VXLAN tenant network, non-DVR setup. I am not supposed to have br-ex on the compute node, so it should not be in the bridge mapping. When I just remove datacentre:br-ex, leaving

etc/neutron/plugins/ml2/openvswitch_agent.ini:bridge_mappings =tenant:br-isolated

it continues to the next bug, https://bugzilla.redhat.com/show_bug.cgi?id=1459592 .

We should re-test with the latest version. Can you check whether it still reproduces?

The issue is still there:

openstack-neutron-ml2-11.0.0-0.20170611190934.01cc269.el7ost.noarch
openstack-neutron-openvswitch-11.0.0-0.20170611190934.01cc269.el7ost.noarch
python-neutron-lib-1.7.0-0.20170529134801.0ee4f4a.el7ost.noarch
python-neutron-lbaas-11.0.0-0.20170607184515.55e6c6f.el7ost.noarch
openstack-neutron-11.0.0-0.20170611190934.01cc269.el7ost.noarch
openstack-neutron-l2gw-agent-10.1.0-0.20170611031418.9d2a82f.el7ost.noarch
openstack-neutron-metering-agent-11.0.0-0.20170611190934.01cc269.el7ost.noarch
puppet-neutron-11.2.0-0.20170609110344.b4fd4aa.el7ost.noarch
openstack-neutron-common-11.0.0-0.20170611190934.01cc269.el7ost.noarch
openstack-neutron-linuxbridge-11.0.0-0.20170611190934.01cc269.el7ost.noarch
openstack-neutron-sriov-nic-agent-11.0.0-0.20170611190934.01cc269.el7ost.noarch
python-neutron-11.0.0-0.20170611190934.01cc269.el7ost.noarch
python-neutronclient-6.3.0-0.20170601203754.ba535c6.el7ost.noarch
openstack-neutron-lbaas-11.0.0-0.20170607184515.55e6c6f.el7ost.noarch
openstack-neutron-openvswitch-agent-docker 2017-06-15.2

I suspect that this is actually caused by br-ex being part of the OVS agent's configuration while the bridge isn't configured on the compute node. I noticed this in my environment a few days ago, but haven't had a chance to get a fix up. A quick workaround if the overcloud is already deployed: log in to the compute node(s) and manually create the bridge, e.g.

ssh heat-admin@<compute-ip>
sudo ovs-vsctl add-br br-ex

The agent will come up on the next restart of the container.

The instructions in docker/README-containers.md suggest including the "environments/docker-network.yaml" environment file in the deployment command line. This environment file appears to set the compute's network configuration to be the same as the controller's.

Brent, the content of docker/README-containers.md is terribly outdated. I wouldn't trust it if I were you. More seriously, I'll update the file to redirect to https://docs.openstack.org/tripleo-docs/latest/install/containers_deployment/index.html which should provide much more accurate information.

Note that the core issue is that br-ex wasn't being created by default on compute nodes. If you use a non-default network configuration (network isolation, multiple NICs, etc.), the network environment files being used need to take care of creating the br-ex bridge on the compute nodes.

We need to understand the relevance of this bug, since neutron and OVS moved back to BM.
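The manual workaround given earlier in this thread (creating br-ex with ovs-vsctl on the compute node) could be scripted so that it is safe to run more than once. This is only a sketch under assumptions: `ensure_bridge` is a hypothetical helper, it relies on `ovs-vsctl br-exists` exiting 0 when the bridge exists and 2 when it does not, and it must run with sufficient privileges on the compute node.

```python
import subprocess


def ensure_bridge(bridge, run=subprocess.call):
    """Create `bridge` if it is missing; return True if it was created.

    `run` takes an argv list and returns the exit code. It is injectable
    so the logic can be exercised without Open vSwitch installed.
    """
    # br-exists: exit code 0 means present, 2 means absent.
    rc = run(["ovs-vsctl", "br-exists", bridge])
    if rc == 0:
        return False
    if rc != 2:
        raise RuntimeError("ovs-vsctl br-exists failed with exit code %d" % rc)

    # Same command as the manual workaround: ovs-vsctl add-br <bridge>
    rc = run(["ovs-vsctl", "add-br", bridge])
    if rc != 0:
        raise RuntimeError("ovs-vsctl add-br failed with exit code %d" % rc)
    return True


if __name__ == "__main__":
    created = ensure_bridge("br-ex")
    print("br-ex created" if created else "br-ex already present")
```

As with the manual command, this only masks the problem on an already deployed node; the network environment files still need to create br-ex at deployment time.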
[root@overcloud-compute-0 ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
99e4009a0ed4 192.168.24.1:8787/rhosp12/openstack-nova-compute-docker:2017-07-26.10 "kolla_start" 46 minutes ago Up 46 minutes nova_compute
c4eed184f57a 192.168.24.1:8787/rhosp12/openstack-iscsid-docker:2017-07-26.10 "kolla_start" 50 minutes ago Up 50 minutes iscsid
e63cadbd5884 192.168.24.1:8787/rhosp12/openstack-nova-libvirt-docker:2017-07-26.10 "kolla_start" 50 minutes ago Up 50 minutes nova_libvirt
[root@overcloud-compute-0 ~]# systemctl|grep openv
neutron-openvswitch-agent.service loaded active running OpenStack Neutron Open vSwitch Agent
openvswitch.service loaded active exited Open vSwitch

The openvswitch service is running on BM in OSP 12, therefore it's not a bug.

Re-opening. Containerized Neutron will still be available as TP for OSP 12 and is intended for full support in 13, so the bug is still relevant.

*** Bug 1470682 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:3462
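To check a node for the symptom in this bug's summary, a container stuck restarting, the container status column can be filtered programmatically. The sketch below assumes the input comes from `docker ps -a --format '{{.Names}}\t{{.Status}}'` (name and status separated by a tab); `restarting_containers` is a hypothetical helper and the container name in the sample is illustrative only.

```python
def restarting_containers(ps_output):
    """Return names of containers whose status starts with 'Restarting'.

    Expects one container per line, with the name and the status
    separated by a single tab character.
    """
    names = []
    for line in ps_output.splitlines():
        if not line.strip():
            continue
        name, _, status = line.partition("\t")
        if status.strip().startswith("Restarting"):
            names.append(name.strip())
    return names


if __name__ == "__main__":
    # Illustrative sample, not output captured from this bug.
    sample = (
        "nova_compute\tUp 46 minutes\n"
        "neutron_ovs_agent\tRestarting (1) 5 seconds ago\n"
        "iscsid\tUp 50 minutes\n"
    )
    for name in restarting_containers(sample):
        print("%s is restarting" % name)
```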