Bug 1642588

Summary: Director does not start openvswitch on overcloud nodes
Product: Red Hat OpenStack
Component: openstack-tripleo-heat-templates
Reporter: Lars Kellogg-Stedman <lars>
Assignee: Brent Eagles <beagles>
QA Contact: Candido Campos <ccamposr>
CC: aguetta, amuller, bcafarel, beagles, bhaley, dbecker, dwojewod, ekuris, emacchi, mburns, morazi, pkundal
Status: CLOSED ERRATA
Type: Bug
Severity: high
Priority: high
Keywords: Triaged, ZStream
Version: 13.0 (Queens)
Target Release: 13.0 (Queens)
Target Milestone: z8
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-tripleo-heat-templates-8.3.1-76.el7ost
Last Closed: 2019-09-03 16:55:32 UTC

Description Lars Kellogg-Stedman 2018-10-24 17:57:19 UTC
Description of problem:

When performing a split stack install, Director does not start the openvswitch service on overcloud nodes. This bug is masked on Director-provisioned installs because the service is enabled by default on our overcloud images.

Version-Release number of selected component (if applicable):

$ rpm -qa | grep tripleo
openstack-tripleo-image-elements-8.0.1-1.el7ost.noarch
puppet-tripleo-8.3.4-5.el7ost.noarch
openstack-tripleo-common-containers-8.6.3-13.el7ost.noarch
openstack-tripleo-ui-8.3.2-1.el7ost.noarch
openstack-tripleo-puppet-elements-8.0.1-1.el7ost.noarch
openstack-tripleo-common-8.6.3-13.el7ost.noarch
openstack-tripleo-heat-templates-8.0.4-20.el7ost.noarch
ansible-tripleo-ipsec-8.1.1-0.20180308133440.8f5369a.el7ost.noarch
openstack-tripleo-validations-8.4.2-1.el7ost.noarch
python-tripleoclient-9.2.3-4.el7ost.noarch


Additional info:

See also #1642587, in which Director reports a successful deployment even though it failed to start openvswitch.

Comment 1 Lars Kellogg-Stedman 2018-10-24 23:06:17 UTC
Manually installing openvswitch and starting the service prior to the deploy allowed a deploy to complete successfully and configure openvswitch on the overcloud hosts.
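
For anyone who needs the same workaround on pre-provisioned nodes, here is a minimal sketch of it in Python. It assumes the package and the systemd unit are both named openvswitch (as in this report) and that yum is the package manager on the node; the function names are made up for illustration. It defaults to a dry run that only prints the commands.

```python
import subprocess

# Assumption: package and service are both named "openvswitch", as in this report.
PACKAGE = "openvswitch"
SERVICE = "openvswitch"


def workaround_commands(package=PACKAGE, service=SERVICE):
    """Return the commands for the manual workaround, in order."""
    return [
        ["yum", "install", "-y", package],   # assumes a yum-based node (el7)
        ["systemctl", "enable", service],
        ["systemctl", "start", service],
    ]


def apply_workaround(dry_run=True):
    """Print each command; also run it when dry_run is False."""
    for cmd in workaround_commands():
        print(" ".join(cmd))
        if not dry_run:
            # Raises subprocess.CalledProcessError if the command fails.
            subprocess.check_call(cmd)


if __name__ == "__main__":
    apply_workaround(dry_run=True)
```

Run it as root on each overcloud node before the deploy, with dry_run=False once the printed commands look right.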

Comment 4 Brian Haley 2018-12-13 16:34:45 UTC
*** Bug 1645554 has been marked as a duplicate of this bug. ***

Comment 6 Aviv Guetta 2019-06-02 08:02:40 UTC
Can we have a status update on this bug?

Comment 9 Brent Eagles 2019-06-07 16:13:50 UTC
With https://review.opendev.org/#/c/663989/ the deployment will take care of enabling and starting openvswitch on nodes where the neutron ovs agent is deployed.

Comment 23 errata-xmlrpc 2019-09-03 16:55:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2624

Comment 24 Lars Kellogg-Stedman 2020-02-21 19:43:11 UTC
It looks like we're still hitting this with openstack-tripleo-heat-templates-8.4.1-16.el7ost.noarch. Our OSP 13 deploy just completed, but on newly added compute nodes we see:

[root@neu-17-2-stackcomp ~]# docker ps -f name=neutron
CONTAINER ID        IMAGE                                                                  COMMAND                  CREATED             STATUS                         PORTS               NAMES
82f4612e35b1        172.16.0.5:8787/rhosp13/openstack-neutron-openvswitch-agent:13.0-105   "dumb-init --singl..."   31 minutes ago      Restarting (1) 3 minutes ago                       neutron_ovs_agent

And looking at the logs:

[root@neu-17-2-stackcomp ~]# docker logs neutron_ovs_agent 2>&1 | tail -10
    raise self.last_attempt.result()
  File "/usr/lib/python2.7/site-packages/concurrent/futures/_base.py", line 422, in result
    return self.__get_result()
  File "/usr/lib/python2.7/site-packages/tenacity/__init__.py", line 298, in call
    result = fn(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/neutron/agent/ovsdb/native/connection.py", line 67, in do_get_schema_helper
    return idlutils.get_schema_helper(conn, schema_name)
  File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/idlutils.py", line 128, in get_schema_helper
    'err': os.strerror(err)})
Exception: Could not retrieve schema from tcp:127.0.0.1:6640: Connection refused

And on the host:

[root@neu-17-2-stackcomp ~]# systemctl is-active openvswitch
inactive
[root@neu-17-2-stackcomp ~]# systemctl is-enabled openvswitch
disabled

It looks like this hasn't been resolved (or there was some sort of regression)?
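
The checks above can be gathered into one small script. This is only a sketch: the address 127.0.0.1:6640 is taken from the agent's error message, the unit name openvswitch from the systemctl output, and the helper names and verdict strings are hypothetical.

```python
import socket
import subprocess

OVSDB_ADDR = ("127.0.0.1", 6640)  # from the agent's "Connection refused" error


def systemctl_query(verb, unit="openvswitch"):
    """Return the one-word answer of `systemctl <verb> <unit>`."""
    try:
        proc = subprocess.Popen(["systemctl", verb, unit],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
        out, _ = proc.communicate()
    except OSError:
        return "unknown"  # systemctl is not available on this machine
    return out.decode().strip() or "unknown"


def port_open(addr=OVSDB_ADDR, timeout=2.0):
    """Return True if a TCP connection to addr succeeds."""
    try:
        conn = socket.create_connection(addr, timeout)
    except (socket.error, socket.timeout):
        return False
    conn.close()
    return True


def diagnose(active, enabled, listening):
    """Map the three observations to a short verdict string."""
    if active != "active":
        return ("openvswitch is not running (is-active: %s, is-enabled: %s)"
                % (active, enabled))
    if not listening:
        return "openvswitch is running but nothing is listening on 127.0.0.1:6640"
    if enabled != "enabled":
        return "openvswitch is running but will not start at boot"
    return "ok"


if __name__ == "__main__":
    print(diagnose(systemctl_query("is-active"),
                   systemctl_query("is-enabled"),
                   port_open()))
```

On the compute node shown here, the first branch would fire, since the service is both inactive and disabled.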

Comment 25 Bernard Cafarelli 2020-02-25 16:20:04 UTC
openstack-tripleo-heat-templates-8.4.1 is based on final upstream queens release and includes the fix mentioned here:
https://review.opendev.org/#/c/663989/

You can confirm this by checking that docker/services/neutron-ovs-agent.yaml in THT contains this section:
      host_prep_tasks:
        list_concat:
          - {get_attr: [NeutronLogging, host_prep_tasks]}
          -
            - name: ensure openvswitch service is enabled
              service:
                name: openvswitch
                state: started
                enabled: yes

If I read this correctly, it should cover all cases here, so this is probably a different issue, but I will defer to Brent's opinion.
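
A rough way to check for that section without opening the file by hand is sketched below. The template path is an assumption (the usual THT install location on the undercloud) and the function name is hypothetical. It is a plain text match, not a YAML parse, so it only recognizes the task when the three keys appear in the order shown above.

```python
import re

# Assumed THT install location on the undercloud; adjust if yours differs.
TEMPLATE = ("/usr/share/openstack-tripleo-heat-templates/"
            "docker/services/neutron-ovs-agent.yaml")

# Matches a service task with name/state/enabled in the order quoted above.
OVS_TASK = re.compile(
    r"service:\s*\n"
    r"\s+name:\s*openvswitch\s*\n"
    r"\s+state:\s*started\s*\n"
    r"\s+enabled:\s*(yes|true)\b",
    re.IGNORECASE)


def has_ovs_prep_task(text):
    """Return True if text has a task that starts and enables openvswitch."""
    return OVS_TASK.search(text) is not None


if __name__ == "__main__":
    try:
        with open(TEMPLATE) as handle:
            found = has_ovs_prep_task(handle.read())
        print("fix present" if found else "fix NOT found")
    except IOError:
        print("template not found at %s" % TEMPLATE)
```

If the template on disk has the task but nodes still come up with openvswitch disabled, the cause is more likely in how host_prep_tasks run on those nodes than in the template itself.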

Comment 26 Brent Eagles 2020-10-09 14:00:19 UTC
I think the patches should have this covered. Is this still happening, Lars?

Comment 27 Lars Kellogg-Stedman 2020-10-09 16:13:39 UTC
Brent,

Thanks for checking in. Since this only crops up during initial installs, it's hard to tell. I'm not going to have the chance to try this out myself for a while, so if folks want to go ahead and close this ticket that's fine with me.

Comment 28 Lars Kellogg-Stedman 2020-10-09 16:15:22 UTC
Well, I guess it's already closed :)