Bug 1457358

Summary: neutronovsagent container on compute node is stuck in a restarting state after overcloud deployment
Product: Red Hat OpenStack
Reporter: Artem Hrechanychenko <ahrechan>
Component: openstack-tripleo-heat-templates
Assignee: Brent Eagles <beagles>
Status: CLOSED ERRATA
QA Contact: Toni Freger <tfreger>
Severity: urgent
Docs Contact: Andrew Burden <aburden>
Priority: urgent
Version: 12.0 (Pike)
CC: afazekas, ahrechan, amuller, beagles, dsariel, jlibosva, jschluet, m.andre, mburns, mcornea, ohochman, rhallise, rhel-osp-director-maint, sasha, tfreger, tvignaud
Target Milestone: ga
Keywords: AutomationBlocker, Reopened, TechPreview, Triaged
Target Release: 12.0 (Pike)
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-7.0.0-0.20170901051303.0rc1.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-12-13 21:29:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1433535
Bug Blocks:

Description Artem Hrechanychenko 2017-05-31 14:48:28 UTC
Description of problem:
The neutronovsagent container on the compute node is stuck in a restarting state after the overcloud is deployed.

[heat-admin@overcloud-compute-0 ~]$ sudo docker ps
CONTAINER ID        IMAGE                                                                               COMMAND             CREATED             STATUS                         PORTS               NAMES
0e6121bbe0c9        192.168.24.1:8787/rhosp12/openstack-neutron-openvswitch-agent-docker:2017-05-16.6   "kolla_start"       25 minutes ago      Restarting (0) 2 minutes ago                       neutronovsagent
92d0d3dcb950        192.168.24.1:8787/rhosp12/openstack-nova-compute-docker:2017-05-16.6                "kolla_start"       25 minutes ago      Up 25 minutes                                      novacompute
01a185801b7e        192.168.24.1:8787/rhosp12/openstack-nova-libvirt-docker:2017-05-16.6                "kolla_start"       33 minutes ago      Up 33 minutes                                      nova_libvirt


Version-Release number of selected component (if applicable):
OSP12

How reproducible:


Steps to Reproduce:
1. Follow http://etherpad.corp.redhat.com/testing-osp12-containers, using RHEL 7.4 to create the VM infrastructure via infrared: --image-url http://download-node-02.eng.bos.redhat.com/brewroot/packages/rhel-guest-image/7.4/135/images/rhel-guest-image-7.4-135.x86_64.qcow2

2. Before deploying the overcloud, apply the workarounds for:
  1) https://bugzilla.redhat.com/show_bug.cgi?id=1448482
  2) https://bugzilla.redhat.com/show_bug.cgi?id=1450370
  3) https://bugzilla.redhat.com/show_bug.cgi?id=1452082
  4) https://bugzilla.redhat.com/show_bug.cgi?id=1455348

3. Deploy the overcloud:
source /home/stack/stackrc && openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates --libvirt-type kvm  -e /home/stack/nodes_data.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/docker-osp12.yaml --log-file overcloud_deployment_0.log

Actual results:
0e6121bbe0c9        192.168.24.1:8787/rhosp12/openstack-neutron-openvswitch-agent-docker:2017-05-16.6   "kolla_start"       25 minutes ago      Restarting (0) 2 minutes ago                       neutronovsagent


Expected results:
The neutronovsagent container is in the "Up" state.

Additional info:
http://pastebin.test.redhat.com/489524
From the docker logs of the container:

INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Writing out command to execute
INFO:__main__:Setting permission for /var/log/neutron
INFO:__main__:Setting permission for /var/log/neutron/neutron-openvswitch-agent.log
Running command: '/usr/bin/neutron-openvswitch-agent --config-file /usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/openvswitch_agent.ini --config-file /etc/neutron/plugins/ml2/ml2_conf.ini'
Guru meditation now registers SIGUSR1 and SIGUSR2 by default for backward compatibility. SIGUSR1 will no longer be registered in a future release, so please use SIGUSR2 to generate reports.
Option "notification_driver" from group "DEFAULT" is deprecated. Use option "driver" from group "oslo_messaging_notifications".
Could not load neutron.openstack.common.notifier.rpc_notifier

Comment 1 Red Hat Bugzilla Rules Engine 2017-05-31 14:48:35 UTC
This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.

Comment 2 Martin André 2017-05-31 15:22:15 UTC
Checking the ovs agent logs in /var/log/containers/neutron/neutron-openvswitch-agent.log gives more info:

2017-05-31 10:22:44.302 24231 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-460c7f8f-1b03-4546-a873-ce8843df941d - - - - -] Mapping physical network datacentre to bridge br-ex
2017-05-31 10:22:44.302 24231 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-460c7f8f-1b03-4546-a873-ce8843df941d - - - - -] Bridge br-ex for physical network datacentre does not exist. Agent terminated!
2017-05-31 10:22:44.303 24231 ERROR ryu.lib.hub [req-460c7f8f-1b03-4546-a873-ce8843df941d - - - - -] hub: uncaught exception: Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 54, in _launch
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_ryuapp.py", line 40, in agent_main_wrapper
    ovs_agent.main(bridge_classes)
  File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 2167, in main
    agent = OVSNeutronAgent(bridge_classes, cfg.CONF)
  File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 183, in __init__
    self.setup_physical_bridges(self.bridge_mappings)
  File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 153, in wrapper
    return f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 1096, in setup_physical_bridges
    sys.exit(1)
SystemExit: 1
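
The log and traceback above show the agent exiting because a bridge_mappings entry names a bridge that is not present on the node. As a rough illustration only, the condition can be sketched in Python. This is not neutron's code: parse_bridge_mappings and missing_bridges are hypothetical helpers, and the list of existing bridges is assumed to come from something like the output of "ovs-vsctl list-br".

```python
def parse_bridge_mappings(value):
    """Parse 'physnet:bridge,physnet2:bridge2' into a dict."""
    mappings = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty values and trailing commas
        physnet, sep, bridge = entry.partition(":")
        physnet, bridge = physnet.strip(), bridge.strip()
        if not sep or not physnet or not bridge:
            raise ValueError("invalid mapping: %r" % entry)
        mappings[physnet] = bridge
    return mappings


def missing_bridges(bridge_mappings, existing_bridges):
    """Return sorted (physnet, bridge) pairs whose bridge is absent."""
    existing = set(existing_bridges)
    parsed = parse_bridge_mappings(bridge_mappings)
    return [(p, b) for p, b in sorted(parsed.items()) if b not in existing]


# With only br-int present, the datacentre mapping is reported as missing.
print(missing_bridges("datacentre:br-ex", ["br-int"]))
```

Any non-empty result corresponds to the "Bridge ... does not exist" condition that terminates the agent.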

Comment 3 Alexander Chuzhoy 2017-05-31 17:52:15 UTC
reproduced.

Comment 4 Dan Prince 2017-06-01 20:56:11 UTC
The bridge is normally created by os-net-config, although I think the puppet-vswitch module can also create it if os-net-config has not already done so.

You can look in /var/lib/heat-config/heat-config-script/ and find the os-net-config heat script that would have been used to configure the bridge during provisioning. What does this script say?

Comment 5 Dan Prince 2017-06-01 21:10:23 UTC
One more thing masking the issue here is that /etc/os-net-config/config.json seems to get overwritten by the old element. See here:

https://bugs.launchpad.net/tripleo/+bug/1695091

It is not directly related to this bug, but I think it could be confusing the question of how things are wired up.

Comment 6 Attila Fazekas 2017-06-13 12:50:14 UTC
I had a VXLAN tenant network, non-DVR setup. I am not supposed to have br-ex on the compute node, so it should not be in the bridge mapping.

When I just remove the datacentre:br-ex mapping, leaving
etc/neutron/plugins/ml2/openvswitch_agent.ini:bridge_mappings =tenant:br-isolated

the agent continues to the next bug: https://bugzilla.redhat.com/show_bug.cgi?id=1459592
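
For illustration, dropping one physical network from a bridge_mappings value can be sketched as below. drop_physnet is a hypothetical helper, not part of neutron or TripleO, and the input string is assumed to be the mapping before the edit described above.

```python
def drop_physnet(bridge_mappings, physnet):
    """Return the mapping string without entries for the given physnet."""
    kept = []
    for entry in bridge_mappings.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Each entry has the form '<physnet>:<bridge>'.
        name = entry.partition(":")[0].strip()
        if name != physnet:
            kept.append(entry)
    return ",".join(kept)


# Assumed starting value; the result matches the line quoted in the comment.
print(drop_physnet("datacentre:br-ex,tenant:br-isolated", "datacentre"))
```

The resulting value would still have to be written back to openvswitch_agent.ini and the agent restarted for it to take effect.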

Comment 7 Omri Hochman 2017-06-19 15:18:39 UTC
We should re-test with the latest version. Can you check whether it still reproduces?

Comment 8 Alexander Chuzhoy 2017-06-19 20:23:06 UTC
The issue is still there:
openstack-neutron-ml2-11.0.0-0.20170611190934.01cc269.el7ost.noarch
openstack-neutron-openvswitch-11.0.0-0.20170611190934.01cc269.el7ost.noarch
python-neutron-lib-1.7.0-0.20170529134801.0ee4f4a.el7ost.noarch
python-neutron-lbaas-11.0.0-0.20170607184515.55e6c6f.el7ost.noarch
openstack-neutron-11.0.0-0.20170611190934.01cc269.el7ost.noarch
openstack-neutron-l2gw-agent-10.1.0-0.20170611031418.9d2a82f.el7ost.noarch
openstack-neutron-metering-agent-11.0.0-0.20170611190934.01cc269.el7ost.noarch
puppet-neutron-11.2.0-0.20170609110344.b4fd4aa.el7ost.noarch
openstack-neutron-common-11.0.0-0.20170611190934.01cc269.el7ost.noarch
openstack-neutron-linuxbridge-11.0.0-0.20170611190934.01cc269.el7ost.noarch
openstack-neutron-sriov-nic-agent-11.0.0-0.20170611190934.01cc269.el7ost.noarch
python-neutron-11.0.0-0.20170611190934.01cc269.el7ost.noarch
python-neutronclient-6.3.0-0.20170601203754.ba535c6.el7ost.noarch
openstack-neutron-lbaas-11.0.0-0.20170607184515.55e6c6f.el7ost.noarch


openstack-neutron-openvswitch-agent-docker   2017-06-15.2

Comment 10 Brent Eagles 2017-06-20 16:35:40 UTC
I suspect that this is actually caused by br-ex being part of the OVS agent's configuration while the bridge isn't configured on the compute node. I noticed this in my environment a few days ago, but haven't had a chance to put up a fix.

Comment 12 Brent Eagles 2017-06-21 18:50:42 UTC
A quick workaround if the overcloud is already deployed: log in to the compute node(s) and manually create the bridge, e.g.
   ssh heat-admin@<compute-ip>
   sudo ovs-vsctl add-br br-ex

The agent will come up on the next restart of the container.
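
When there are several compute nodes, the same workaround could be scripted. The sketch below is an assumption-laden illustration, not a supported procedure: add_bridge_command and ensure_bridge are hypothetical helpers, and the script is assumed to run as root on the compute node. It uses the --may-exist option of ovs-vsctl so that re-running it is harmless when the bridge is already there.

```python
import subprocess


def add_bridge_command(bridge):
    """Build the ovs-vsctl invocation that creates the bridge if absent."""
    # --may-exist turns add-br into a no-op when the bridge already exists.
    return ["ovs-vsctl", "--may-exist", "add-br", bridge]


def ensure_bridge(bridge, run=subprocess.check_call):
    """Create the bridge; 'run' is injectable so the call can be dry-run."""
    run(add_bridge_command(bridge))


if __name__ == "__main__":
    # Dry run: print the command instead of executing it.
    ensure_bridge("br-ex", run=print)
```

Calling ensure_bridge("br-ex") without the run argument executes the command and raises subprocess.CalledProcessError if ovs-vsctl fails.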

Comment 13 Brent Eagles 2017-06-21 19:28:34 UTC
The instructions in docker/README-containers.md suggest including the "environments/docker-network.yaml" environment file in the deployment command line. This environment file appears to set the compute node's network configuration to be the same as the controller's.

Comment 14 Martin André 2017-07-10 14:23:39 UTC
Brent, the content of file docker/README-containers.md is terribly outdated. I wouldn't trust it if I were you.

More seriously, I'll update the file to redirect to https://docs.openstack.org/tripleo-docs/latest/install/containers_deployment/index.html which should provide much more accurate information.

Comment 15 Brent Eagles 2017-07-13 16:55:05 UTC
Note that the core issue is that br-ex wasn't being created by default on compute nodes. If you use a non-default network configuration (network isolation, multiple nics, etc. etc.) the network environment files being used need to take care of creating the br-ex bridge on the compute nodes.
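
As a rough sketch of what "creating the br-ex bridge" means at the os-net-config level, the snippet below builds a minimal network_config document with an OVS bridge. The layout follows the os-net-config format as I understand it; the interface name nic1, the use_dhcp setting, and the compute_net_config helper are all assumptions for illustration and would need to match the real NIC layout and the compute role's NIC config template.

```python
import json


def compute_net_config(bridge="br-ex", nic="nic1"):
    """Return a minimal os-net-config style document with one OVS bridge."""
    return {
        "network_config": [
            {
                "type": "ovs_bridge",
                "name": bridge,
                "use_dhcp": True,  # assumption: address obtained via DHCP
                "members": [
                    # assumption: the first NIC is attached to the bridge
                    {"type": "interface", "name": nic, "primary": True},
                ],
            }
        ]
    }


print(json.dumps(compute_net_config(), indent=2))
```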

Comment 16 Alexander Chuzhoy 2017-07-27 16:19:26 UTC
We need to understand whether this bug is still relevant, since neutron and OVS moved back to bare metal (BM).


[root@overcloud-compute-0 ~]# docker ps
CONTAINER ID        IMAGE                                                                   COMMAND             CREATED             STATUS              PORTS               NAMES
99e4009a0ed4        192.168.24.1:8787/rhosp12/openstack-nova-compute-docker:2017-07-26.10   "kolla_start"       46 minutes ago      Up 46 minutes                           nova_compute
c4eed184f57a        192.168.24.1:8787/rhosp12/openstack-iscsid-docker:2017-07-26.10         "kolla_start"       50 minutes ago      Up 50 minutes                           iscsid
e63cadbd5884        192.168.24.1:8787/rhosp12/openstack-nova-libvirt-docker:2017-07-26.10   "kolla_start"       50 minutes ago      Up 50 minutes                           nova_libvirt
[root@overcloud-compute-0 ~]# systemctl|grep openv
  neutron-openvswitch-agent.service                                             loaded active running   OpenStack Neutron Open vSwitch Agent
  openvswitch.service                                                           loaded active exited    Open vSwitch

Comment 17 Omri Hochman 2017-08-04 14:55:11 UTC
The openvswitch service runs on BM in OSP12, therefore it's not a bug.

Comment 18 Assaf Muller 2017-08-04 16:14:00 UTC
Re-opening. Containerized Neutron will still be available as TP for OSP 12 and is intended for full support in 13, so the bug is still relevant.

Comment 19 Assaf Muller 2017-09-07 22:13:55 UTC
*** Bug 1470682 has been marked as a duplicate of this bug. ***

Comment 31 errata-xmlrpc 2017-12-13 21:29:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462