Description of problem:
The neutronovsagent container on the compute node is stuck in a restart loop after deployment of the overcloud:

[heat-admin@overcloud-compute-0 ~]$ sudo docker ps
CONTAINER ID  IMAGE                                                                              COMMAND        CREATED         STATUS                        PORTS  NAMES
0e6121bbe0c9  192.168.24.1:8787/rhosp12/openstack-neutron-openvswitch-agent-docker:2017-05-16.6  "kolla_start"  25 minutes ago  Restarting (0) 2 minutes ago         neutronovsagent
92d0d3dcb950  192.168.24.1:8787/rhosp12/openstack-nova-compute-docker:2017-05-16.6               "kolla_start"  25 minutes ago  Up 25 minutes                        novacompute
01a185801b7e  192.168.24.1:8787/rhosp12/openstack-nova-libvirt-docker:2017-05-16.6               "kolla_start"  33 minutes ago  Up 33 minutes                        nova_libvirt

Version-Release number of selected component (if applicable):
OSP12

How reproducible:

Steps to Reproduce:
1. Create the VM infrastructure via infrared using RHEL 7.4 (see http://etherpad.corp.redhat.com/testing-osp12-containers):
   --image-url http://download-node-02.eng.bos.redhat.com/brewroot/packages/rhel-guest-image/7.4/135/images/rhel-guest-image-7.4-135.x86_64.qcow2
2. Before deploying the overcloud, apply the workarounds for:
   1) https://bugzilla.redhat.com/show_bug.cgi?id=1448482
   2) https://bugzilla.redhat.com/show_bug.cgi?id=1450370
   3) https://bugzilla.redhat.com/show_bug.cgi?id=1452082
   4) https://bugzilla.redhat.com/show_bug.cgi?id=1455348
3. Deploy the overcloud:
   source /home/stack/stackrc && openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates --libvirt-type kvm -e /home/stack/nodes_data.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/docker-osp12.yaml --log-file overcloud_deployment_0.log

Actual results:
0e6121bbe0c9  192.168.24.1:8787/rhosp12/openstack-neutron-openvswitch-agent-docker:2017-05-16.6  "kolla_start"  25 minutes ago  Restarting (0) 2 minutes ago  neutronovsagent

Expected results:
The state of the neutronovsagent container is "Up".

Additional info:
http://pastebin.test.redhat.com/489524

From docker logs of the container:

INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Writing out command to execute
INFO:__main__:Setting permission for /var/log/neutron
INFO:__main__:Setting permission for /var/log/neutron/neutron-openvswitch-agent.log
Running command: '/usr/bin/neutron-openvswitch-agent --config-file /usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/openvswitch_agent.ini --config-file /etc/neutron/plugins/ml2/ml2_conf.ini'
Guru meditation now registers SIGUSR1 and SIGUSR2 by default for backward compatibility. SIGUSR1 will no longer be registered in a future release, so please use SIGUSR2 to generate reports.
Option "notification_driver" from group "DEFAULT" is deprecated. Use option "driver" from group "oslo_messaging_notifications".
Could not load neutron.openstack.common.notifier.rpc_notifier
This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.
Checking the OVS agent logs in /var/log/containers/neutron/neutron-openvswitch-agent.log gives more info:

2017-05-31 10:22:44.302 24231 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-460c7f8f-1b03-4546-a873-ce8843df941d - - - - -] Mapping physical network datacentre to bridge br-ex
2017-05-31 10:22:44.302 24231 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-460c7f8f-1b03-4546-a873-ce8843df941d - - - - -] Bridge br-ex for physical network datacentre does not exist. Agent terminated!
2017-05-31 10:22:44.303 24231 ERROR ryu.lib.hub [req-460c7f8f-1b03-4546-a873-ce8843df941d - - - - -] hub: uncaught exception:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 54, in _launch
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_ryuapp.py", line 40, in agent_main_wrapper
    ovs_agent.main(bridge_classes)
  File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 2167, in main
    agent = OVSNeutronAgent(bridge_classes, cfg.CONF)
  File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 183, in __init__
    self.setup_physical_bridges(self.bridge_mappings)
  File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 153, in wrapper
    return f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 1096, in setup_physical_bridges
    sys.exit(1)
SystemExit: 1
reproduced.
The bridge is normally created by os-net-config, although I think it can also be created by the puppet-vswitch module if os-net-config doesn't create it first. You can look in /var/lib/heat-config/heat-config-script/ and find the os-net-config heat script that would have been used to configure the bridge during provisioning. What does this script say?
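For context, this is roughly what an os-net-config input that creates br-ex looks like. This is a hedged sketch: the JSON follows os-net-config's network_config schema, but the interface name nic1 and the DHCP setting are placeholders, and the file is written to a temp path here rather than taken from a real node.

```shell
# Illustrative os-net-config input that would create br-ex on a node.
# Written to a temp file so it can be inspected safely; on a real
# overcloud node the generated script lives under
# /var/lib/heat-config/heat-config-script/.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
{
  "network_config": [
    {
      "type": "ovs_bridge",
      "name": "br-ex",
      "use_dhcp": true,
      "members": [
        {"type": "interface", "name": "nic1"}
      ]
    }
  ]
}
EOF
# Confirm the bridge definition is present in the config.
grep '"name": "br-ex"' "$cfg"
```

If no ovs_bridge entry for br-ex shows up in the script found on the node, that would explain why the agent aborts at startup.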
One more thing masking the issue here: /etc/os-net-config/config.json seems to get overwritten by the old element. See here: https://bugs.launchpad.net/tripleo/+bug/1695091. Not directly related to this bug, but I think it could be confusing the picture of how things are wired up.
I had a VXLAN tenant-network, non-DVR setup. I am not supposed to have br-ex on the compute node, so it should not be in the bridge mappings. When I just remove the datacentre:br-ex mapping, leaving /etc/neutron/plugins/ml2/openvswitch_agent.ini with bridge_mappings = tenant:br-isolated, it continues to the next bug: https://bugzilla.redhat.com/show_bug.cgi?id=1459592.
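A minimal sketch of the edit described above, run against a throwaway copy of the ini so nothing real is touched. The file path and values are taken from the comment; using sed is just one way to make the change (crudini or a config-management tool would work too).

```shell
# Throwaway copy of the relevant section of openvswitch_agent.ini.
ini=$(mktemp)
cat > "$ini" <<'EOF'
[ovs]
bridge_mappings = datacentre:br-ex,tenant:br-isolated
EOF
# Drop the datacentre:br-ex mapping, keeping only tenant:br-isolated.
sed -i 's/^bridge_mappings = .*/bridge_mappings = tenant:br-isolated/' "$ini"
grep '^bridge_mappings' "$ini"
```

On a containerized compute the change would also need to survive the kolla config copy, so editing the file in place inside the container is only a temporary measure.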
We should re-test with the latest version; can you check whether it still reproduces?
The issue is still there:

openstack-neutron-ml2-11.0.0-0.20170611190934.01cc269.el7ost.noarch
openstack-neutron-openvswitch-11.0.0-0.20170611190934.01cc269.el7ost.noarch
python-neutron-lib-1.7.0-0.20170529134801.0ee4f4a.el7ost.noarch
python-neutron-lbaas-11.0.0-0.20170607184515.55e6c6f.el7ost.noarch
openstack-neutron-11.0.0-0.20170611190934.01cc269.el7ost.noarch
openstack-neutron-l2gw-agent-10.1.0-0.20170611031418.9d2a82f.el7ost.noarch
openstack-neutron-metering-agent-11.0.0-0.20170611190934.01cc269.el7ost.noarch
puppet-neutron-11.2.0-0.20170609110344.b4fd4aa.el7ost.noarch
openstack-neutron-common-11.0.0-0.20170611190934.01cc269.el7ost.noarch
openstack-neutron-linuxbridge-11.0.0-0.20170611190934.01cc269.el7ost.noarch
openstack-neutron-sriov-nic-agent-11.0.0-0.20170611190934.01cc269.el7ost.noarch
python-neutron-11.0.0-0.20170611190934.01cc269.el7ost.noarch
python-neutronclient-6.3.0-0.20170601203754.ba535c6.el7ost.noarch
openstack-neutron-lbaas-11.0.0-0.20170607184515.55e6c6f.el7ost.noarch
openstack-neutron-openvswitch-agent-docker 2017-06-15.2
I suspect that this is actually caused by br-ex being part of the OVS agent's configuration while the bridge itself isn't created on the compute node. I noticed this in my environment a few days ago, but haven't had a chance to get a fix up.
A quick workaround if the overcloud is already deployed: log in to the compute node(s) and manually create the bridge, e.g.

ssh heat-admin@<compute-ip>
sudo ovs-vsctl add-br br-ex

The agent will come up on the next restart of the container.
The instructions in docker/README-containers.md suggest including the "environments/docker-network.yaml" environment file in the deployment command line. This environment file appears to set the compute's network configuration to be the same as the controller's.
Brent, the content of the file docker/README-containers.md is terribly outdated. I wouldn't trust it if I were you. More seriously, I'll update the file to redirect to https://docs.openstack.org/tripleo-docs/latest/install/containers_deployment/index.html which should provide much more accurate information.
Note that the core issue is that br-ex wasn't being created by default on compute nodes. If you use a non-default network configuration (network isolation, multiple NICs, etc.), the network environment files in use need to take care of creating the br-ex bridge on the compute nodes.
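As a sketch, this is the kind of ovs_bridge entry a compute role's nic-config would need in order to create br-ex during provisioning. The YAML layout mirrors the os-net-config network_config format, but the file location, nic1, and the addressing choices here are illustrative placeholders, not the exact tripleo-heat-templates content.

```shell
# Hypothetical fragment of a compute role nic-config (os-net-config YAML)
# that creates br-ex; written to a temp file purely for illustration.
snippet=$(mktemp)
cat > "$snippet" <<'EOF'
network_config:
  - type: ovs_bridge
    name: br-ex
    use_dhcp: false
    members:
      - type: interface
        name: nic1
        primary: true
EOF
# The bridge name should appear exactly once in the fragment.
grep -c 'br-ex' "$snippet"
```

Whatever the exact template, the point is that a custom network environment must define br-ex for the compute role if the agent's bridge_mappings reference it.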
Need to understand the relevance of this bug, since neutron/OVS moved back to bare metal:

[root@overcloud-compute-0 ~]# docker ps
CONTAINER ID  IMAGE                                                                  COMMAND        CREATED         STATUS         PORTS  NAMES
99e4009a0ed4  192.168.24.1:8787/rhosp12/openstack-nova-compute-docker:2017-07-26.10  "kolla_start"  46 minutes ago  Up 46 minutes         nova_compute
c4eed184f57a  192.168.24.1:8787/rhosp12/openstack-iscsid-docker:2017-07-26.10        "kolla_start"  50 minutes ago  Up 50 minutes         iscsid
e63cadbd5884  192.168.24.1:8787/rhosp12/openstack-nova-libvirt-docker:2017-07-26.10  "kolla_start"  50 minutes ago  Up 50 minutes         nova_libvirt

[root@overcloud-compute-0 ~]# systemctl | grep openv
neutron-openvswitch-agent.service  loaded active running  OpenStack Neutron Open vSwitch Agent
openvswitch.service                loaded active exited   Open vSwitch
The openvswitch service runs on bare metal in OSP 12, so this is not a bug.
Re-opening. Containerized Neutron will still be available as TP for OSP 12 and is intended for full support in 13, so the bug is still relevant.
*** Bug 1470682 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:3462