Description of problem:

After an undercloud upgrade from OSP 13 to OSP 14, some network agents no longer work.

How reproducible:

Installed RHOS-13 with InfraRed and manually upgraded to RHOS-14.

Actual results:

(undercloud) [stack@undercloud-0 ~]$ openstack network agent list
+--------------------------------------+--------------------+--------------------------------------+-------------------+-------+-------+---------------------------+
| ID                                   | Agent Type         | Host                                 | Availability Zone | Alive | State | Binary                    |
+--------------------------------------+--------------------+--------------------------------------+-------------------+-------+-------+---------------------------+
| 0aaadf6f-f5bb-4908-b939-9830cfb314b3 | Baremetal Node     | 6315133e-5e19-4bc8-945e-20bc9d6502da | None              | :-)   | UP    | ironic-neutron-agent      |
| 3fe4ab2c-e5ae-492e-b7a1-0a992ac93a0e | Baremetal Node     | 68f4346e-bdcd-4305-9eaf-ed8a4d0c243a | None              | :-)   | UP    | ironic-neutron-agent      |
| 46b44435-8e7b-4f21-be1f-57fe870b3b7b | DHCP agent         | undercloud-0.localdomain             | nova              | :-)   | UP    | neutron-dhcp-agent        |
| 51ba547e-6fc4-4126-ac02-0fd46c6f0a15 | Baremetal Node     | ccdd67ed-eb11-4afe-960a-bb4da922fe81 | None              | :-)   | UP    | ironic-neutron-agent      |
| 56085a0c-7c83-40c5-8b1c-cdf6d407857f | Baremetal Node     | d5e8dab9-4e3d-43b8-8be4-123eccff381d | None              | :-)   | UP    | ironic-neutron-agent      |
| 87dd64f2-7ae2-4da7-bdee-c0bbee2bef7d | DHCP agent         | undercloud-0.redhat.local            | nova              | XXX   | UP    | neutron-dhcp-agent        |
| 9218da59-b00e-4ef7-8817-c25d41502209 | L3 agent           | undercloud-0.localdomain             | nova              | :-)   | UP    | neutron-l3-agent          |
| c917d357-e11f-417d-aa45-40359b4ea8ce | Open vSwitch agent | undercloud-0.redhat.local            | None              | XXX   | UP    | neutron-openvswitch-agent |
+--------------------------------------+--------------------+--------------------------------------+-------------------+-------+-------+---------------------------+

The Open vSwitch agent container keeps restarting:

(undercloud) [stack@undercloud-0 ~]$ docker ps -a | grep openstack-neutron-openvswitch
a5cca059e5f5   rhos-qe-mirror-qeos.usersys.redhat.com:5000/rhosp14/openstack-neutron-openvswitch-agent:latest   "kolla_start"   16 minutes ago   Restarting (1) 2 minutes ago   neutron_ovs_agent

There is no openvswitch service running on the undercloud:

(undercloud) [stack@undercloud-0 ~]$ ps -aef | grep openvswitch
stack     95420  90347  0 06:52 pts/0    00:00:00 grep --color=auto openvswitch

There is no active systemd unit for openvswitch:

(undercloud) [stack@undercloud-0 ~]$ systemctl | grep openvs
(undercloud) [stack@undercloud-0 ~]$

(undercloud) [stack@undercloud-0 ~]$ ovs-vsctl -V
ovs-vsctl (Open vSwitch) 2.10.0
DB Schema 7.16.1

Additional info:

(undercloud) [stack@undercloud-0 ~]$ yum list installed | grep openvswitch
openstack-neutron-openvswitch.noarch
openvswitch-selinux-extra-policy.noarch
openvswitch2.10.x86_64                 2.10.0-28.el7fdp.2   @rhelosp-14.0-puddle
python-openvswitch2.10.x86_64          2.10.0-28.el7fdp.2   @rhelosp-14.0-puddle
python-rhosp-openvswitch.noarch        2.10-0.1.el7ost      @rhelosp-14.0-puddle
rhosp-openvswitch.noarch               2.10-0.1.el7ost      @rhelosp-14.0-puddle

Workaround:

(undercloud) [stack@undercloud-0 ~]$ sudo /usr/share/openvswitch/scripts/ovs-ctl start
Backing up database to /etc/openvswitch/conf.db.backup7.15.[...]2033 [ OK ]
Compacting database                                                  [ OK ]
Converting database schema                                           [ OK ]
Starting ovsdb-server                                                [ OK ]
system ID not configured, please use --system-id ... failed!
Configuring Open vSwitch system IDs                                  [ OK ]
Starting ovs-vswitchd                                                [ OK ]
Enabling remote OVSDB managers                                       [ OK ]

(undercloud) [stack@undercloud-0 ~]$ sudo docker restart a5cca059e5f5  # the ID of the openstack-neutron-openvswitch-agent:latest container

(undercloud) [stack@undercloud-0 ~]$ openstack network agent list
+--------------------------------------+--------------------+--------------------------------------+-------------------+-------+-------+---------------------------+
| ID                                   | Agent Type         | Host                                 | Availability Zone | Alive | State | Binary                    |
+--------------------------------------+--------------------+--------------------------------------+-------------------+-------+-------+---------------------------+
| 0aaadf6f-f5bb-4908-b939-9830cfb314b3 | Baremetal Node     | 6315133e-5e19-4bc8-945e-20bc9d6502da | None              | :-)   | UP    | ironic-neutron-agent      |
| 1db18831-6b39-4ab5-8cca-b53addbb49f6 | DHCP agent         | undercloud-0.localdomain             | nova              | :-)   | UP    | neutron-dhcp-agent        |
| 3fe4ab2c-e5ae-492e-b7a1-0a992ac93a0e | Baremetal Node     | 68f4346e-bdcd-4305-9eaf-ed8a4d0c243a | None              | :-)   | UP    | ironic-neutron-agent      |
| 51ba547e-6fc4-4126-ac02-0fd46c6f0a15 | Baremetal Node     | ccdd67ed-eb11-4afe-960a-bb4da922fe81 | None              | :-)   | UP    | ironic-neutron-agent      |
| 56085a0c-7c83-40c5-8b1c-cdf6d407857f | Baremetal Node     | d5e8dab9-4e3d-43b8-8be4-123eccff381d | None              | :-)   | UP    | ironic-neutron-agent      |
| 838a177b-e21b-43a8-a2a5-545103c457ed | L3 agent           | undercloud-0.localdomain             | nova              | :-)   | UP    | neutron-l3-agent          |
| 87dd64f2-7ae2-4da7-bdee-c0bbee2bef7d | DHCP agent         | undercloud-0.redhat.local            | nova              | XXX   | UP    | neutron-dhcp-agent        |
| c917d357-e11f-417d-aa45-40359b4ea8ce | Open vSwitch agent | undercloud-0.redhat.local            | None              | XXX   | UP    | neutron-openvswitch-agent |
| daa3f5ba-c385-4c85-9436-1b581109419f | Open vSwitch agent | undercloud-0.localdomain             | None              | :-)   | UP    | neutron-openvswitch-agent |
+--------------------------------------+--------------------+--------------------------------------+-------------------+-------+-------+---------------------------+
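For anyone hitting this, the workaround boils down to two commands. A minimal sketch, assuming the agent container is named neutron_ovs_agent as in the docker ps output above (--system-id=random avoids the "system ID not configured" complaint seen on the first run):

    sudo /usr/share/openvswitch/scripts/ovs-ctl start --system-id=random
    sudo docker restart neutron_ovs_agent
    # wait for the next agent report interval, then confirm the agent shows :-) / UP
    openstack network agent list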
Please attach an sosreport, or at minimum the logs of the OVS agent.
Also, can you please add the exact steps you used to upgrade the cloud? Preferably the exact input from 'history'.
I used this script [1] to set up the environment on OSP 13. To do the undercloud upgrade I used this Ansible playbook [2].

[1] https://gitlab.cee.redhat.com/jbadiapa/backup-and-restore-ffwd/blob/B_R_osp13/Controller-Backup-env.sh
[2] https://gitlab.cee.redhat.com/jbadiapa/backup-and-restore-ffwd/blob/B_R_osp13/undercloud-upgrade.yaml

These lines were in /var/log/neutron/server.log:

2019-04-25 06:17:47.927 976 DEBUG neutron_lib.callbacks.manager [req-26431881-263a-4383-9bd4-b1ea33a1b56a - - - - -] Notify callbacks ['neutron.plugins.ml2.plugin.Ml2Plugin._retry_binding_revived_agents-1695273', 'neutron.services.segments.db._update_segment_host_mapping_for_agent--9223363244693201984'] for agent, after_update _notify_loop /usr/lib/python2.7/site-packages/neutron_lib/callbacks/manager.py:167
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engines [req-6b1eb7dc-e747-4fbc-ba1d-c4d721ba9ea2 - - - - -] Database connection was found disconnected; reconnecting: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') [SQL: u'SELECT 1'] (Background on this error at: http://sqlalche.me/e/e3q8)
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engines Traceback (most recent call last):
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engines   File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/engines.py", line 73, in _connect_ping_listener
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engines     connection.scalar(select([1]))
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engines   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 880, in scalar
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engines     return self.execute(object, *multiparams, **params).scalar()
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engines   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 948, in execute
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engines     return meth(self, multiparams, params)
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engines   File "/usr/lib64/python2.7/site-packages/sqlalchemy/sql/elements.py", line 269, in _execute_on_connection
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engines     return connection._execute_clauseelement(self, multiparams, params)
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engines   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1060, in _execute_clauseelement
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engines     compiled_sql, distilled_params
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engines   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engines     context)
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engines   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1409, in _handle_dbapi_exception
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engines     util.raise_from_cause(newraise, exc_info)
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engines   File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engines     reraise(type(exception), exception, tb=exc_tb, cause=cause)
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engines   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engines     context)
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engines   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 507, in do_execute
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engines     cursor.execute(statement, parameters)
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engines   File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 166, in execute
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engines     result = self._query(query)
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engines   File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 322, in _query
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engines     conn.query(q)
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engines   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 856, in query
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engines     self._affected_rows = self._read_query_result(unbuffered=unbuffered)
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engines   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1057, in _read_query_result
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engines     result.read()
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engines   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1340, in read
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engines     first_packet = self.connection._read_packet()
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engines   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 998, in _read_packet
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engines     "Lost connection to MySQL server during query")
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engines DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') [SQL: u'SELECT 1'] (Background on this error at: http://sqlalche.me/e/e3q8)
2019-04-25 06:17:52.948 980 ERROR oslo_db.sqlalchemy.engine
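These disconnects may simply coincide with the database being restarted during the upgrade rather than being a separate fault. A quick hedged check (the log path is the one quoted above; the container-name match on "mysql" is an assumption):

    # count the disconnect errors, then see whether the database container was recently restarted
    grep -c 'Lost connection to MySQL server' /var/log/neutron/server.log
    sudo docker ps -a --format '{{.Names}}\t{{.Status}}' | grep -i mysql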
Hi, I've also found the very same issue. The steps to reproduce it are:

1. Configure rhos-release -P 14 -p passed_phase2 on the undercloud
2. Update python-tripleoclient
3. Generate containers-prepare-parameters.yaml and configure it with the right parameters
4. Update undercloud.conf
5. Run openstack undercloud upgrade

What I found was that none of the OC nodes were reachable after the UC upgrade, even though the undercloud itself had upgraded successfully. When checking the docker containers, we could see that neutron_ovs_agent was constantly restarting:

375437b3e3fa   192.168.24.1:8787/rhosp14/openstack-neutron-openvswitch-agent:14.0-102   "kolla_start"   25 minutes ago   Restarting (1) 10 minutes ago   neutron_ovs_agent

And the logs from the container display:

++ cat /run_command
+ CMD=/neutron_ovs_agent_launcher.sh
+ ARGS=
+ [[ ! -n '' ]]
+ . kolla_extend_start
++ [[ ! -d /var/log/kolla/neutron ]]
+++ stat -c %a /var/log/kolla/neutron
++ [[ 2755 != \7\5\5 ]]
++ chmod 755 /var/log/kolla/neutron
++ . /usr/local/bin/kolla_neutron_extend_start
Running command: '/neutron_ovs_agent_launcher.sh'
+ echo 'Running command: '\''/neutron_ovs_agent_launcher.sh'\'''
+ exec /neutron_ovs_agent_launcher.sh
+ /usr/bin/python -m neutron.cmd.destroy_patch_ports --config-file /usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/openvswitch_agent.ini --config-dir /etc/neutron/conf.d/common --config-dir /etc/neutron/conf.d/neutron-openvswitch-agent
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/site-packages/neutron/cmd/destroy_patch_ports.py", line 83, in <module>
    main()
  File "/usr/lib/python2.7/site-packages/neutron/cmd/destroy_patch_ports.py", line 78, in main
    port_cleaner = PatchPortCleaner(cfg.CONF)
  File "/usr/lib/python2.7/site-packages/neutron/cmd/destroy_patch_ports.py", line 44, in __init__
    for bridge in mappings.values()]
  File "/usr/lib/python2.7/site-packages/neutron/agent/common/ovs_lib.py", line 218, in __init__
    super(OVSBridge, self).__init__()
  File "/usr/lib/python2.7/site-packages/neutron/agent/common/ovs_lib.py", line 117, in __init__
    self.ovsdb = ovsdb_api.from_config(self)
  File "/usr/lib/python2.7/site-packages/neutron/agent/ovsdb/api.py", line 31, in from_config
    return iface.api_factory(context)
  File "/usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py", line 49, in api_factory
    idl=n_connection.idl_factory(),
  File "/usr/lib/python2.7/site-packages/neutron/agent/ovsdb/native/connection.py", line 69, in idl_factory
    helper = do_get_schema_helper()
  File "/usr/lib/python2.7/site-packages/tenacity/__init__.py", line 241, in wrapped_f
    return self.call(f, *args, **kw)
  File "/usr/lib/python2.7/site-packages/tenacity/__init__.py", line 330, in call
    start_time=start_time)
  File "/usr/lib/python2.7/site-packages/tenacity/__init__.py", line 297, in iter
    raise retry_exc.reraise()
  File "/usr/lib/python2.7/site-packages/tenacity/__init__.py", line 136, in reraise
    raise self.last_attempt.result()
  File "/usr/lib/python2.7/site-packages/concurrent/futures/_base.py", line 422, in result
    return self.__get_result()
  File "/usr/lib/python2.7/site-packages/tenacity/__init__.py", line 333, in call
    result = fn(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/neutron/agent/ovsdb/native/connection.py", line 67, in do_get_schema_helper
    return idlutils.get_schema_helper(conn, schema_name)
  File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/idlutils.py", line 128, in get_schema_helper
    'err': os.strerror(err)})
Exception: Could not retrieve schema from tcp:127.0.0.1:6640: Connection refused

And indeed, the openvswitch service was stopped on the undercloud:

(undercloud) [stack@undercloud-0 ~]$ sudo systemctl status openvswitch
● openvswitch.service - Open vSwitch
   Loaded: loaded (/usr/lib/systemd/system/openvswitch.service; disabled; vendor preset: disabled)
   Active: inactive (dead) since Mon 2019-04-29 05:57:43 EDT; 48min ago
 Main PID: 1703 (code=exited, status=0/SUCCESS)

Apr 28 10:29:30 undercloud-0.redhat.local systemd[1]: Starting Open vSwitch...
Apr 28 10:29:30 undercloud-0.redhat.local systemd[1]: Started Open vSwitch.
Apr 29 05:57:43 undercloud-0.redhat.local systemd[1]: Stopping Open vSwitch...
Apr 29 05:57:43 undercloud-0.redhat.local systemd[1]: Stopped Open vSwitch.

When trying to bring it up manually via systemd, it failed with the following trace:

Apr 29 06:55:36 undercloud-0.redhat.local systemd[1]: Starting Open vSwitch Database Unit...
-- Subject: Unit ovsdb-server.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ovsdb-server.service has begun starting up.
Apr 29 06:55:37 undercloud-0.redhat.local ovs-ctl[157011]: Backing up database to /etc/openvswitch/conf.db.backup7.15.1-3682332033 [ OK ]
Apr 29 06:55:37 undercloud-0.redhat.local ovs-ctl[157011]: Compacting database [ OK ]
Apr 29 06:55:37 undercloud-0.redhat.local ovs-ctl[157011]: Converting database schema [ OK ]
Apr 29 06:55:37 undercloud-0.redhat.local dockerd-current[13210]: /usr/lib/python2.7/site-packages/webob/acceptparse.py:1297: DeprecationWarning: The behavior of .best_match
Apr 29 06:55:37 undercloud-0.redhat.local dockerd-current[13210]: DeprecationWarning,
Apr 29 06:55:37 undercloud-0.redhat.local ovsdb-server[157094]: ovs|00002|daemon_unix|EMER|/var/run/openvswitch/ovsdb-server.pid.tmp: create failed (Permission denied)
Apr 29 06:55:37 undercloud-0.redhat.local ovs-ctl[157011]: Starting ovsdb-server ovsdb-server: /var/run/openvswitch/ovsdb-server.pid.tmp: create failed (Permission denied)
Apr 29 06:55:37 undercloud-0.redhat.local ovs-ctl[157011]: [FAILED]
Apr 29 06:55:37 undercloud-0.redhat.local systemd[1]: ovsdb-server.service: control process exited, code=exited status=1
Apr 29 06:55:37 undercloud-0.redhat.local systemd[1]: Failed to start Open vSwitch Database Unit.

However, running sudo /usr/share/openvswitch/scripts/ovs-ctl start did bring OVS up:

(undercloud) [stack@undercloud-0 ~]$ sudo /usr/share/openvswitch/scripts/ovs-ctl start
ovsdb-server is already running.
Starting ovs-vswitchd [ OK ]
Enabling remote OVSDB managers [ OK ]

And now the container is healthy again and the OC nodes are reachable.

Should we include a post_upgrade task that executes such a script (ovs-ctl start) in the openvswitch service template or in tripleo-packages? A sketch of what such a task might run follows below.
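Whatever form the eventual TripleO change takes, the task would need to be idempotent; ovs-ctl start already is ("ovsdb-server is already running." above). A minimal sketch of the commands such a post-upgrade step might run, under the assumption that it is wired in as a task (the guard condition and the container name are illustrative, taken from the output in this report):

    # only start OVS if ovsdb is not answering on the local socket
    if ! sudo ovs-vsctl show >/dev/null 2>&1; then
        sudo /usr/share/openvswitch/scripts/ovs-ctl start --system-id=random
    fi
    # restart the agent container so it can reconnect to tcp:127.0.0.1:6640
    sudo docker restart neutron_ovs_agent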
Created attachment 1574398 [details]
preverification fix proposal log upgrade
It would be great to get this bug included in the next Z3 release, as it impacts the whole undercloud's connectivity once the undercloud upgrade is run.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1672