Bug 1798669

Summary: [OSP13] deployment fails during TASK [Discovering nova hosts] DBDuplicateEntry Duplicate entry uniq_host_mappings0host
Product: Red Hat OpenStack
Reporter: ggrimaux
Component: openstack-tripleo-common
Assignee: Martin Schuppert <mschuppe>
Status: CLOSED ERRATA
QA Contact: Paras Babbar <pbabbar>
Severity: high
Docs Contact:
Priority: high
Version: 13.0 (Queens)
CC: eglynn, ipetrova, mburns, mschuppe, pbabbar, slinaber, stephenfin
Target Milestone: z11
Keywords: Regression, Triaged, ZStream
Target Release: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-tripleo-common-8.7.1-10.el7ost
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-03-10 11:24:20 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description ggrimaux 2020-02-05 18:48:36 UTC
Description of problem:
The client is hitting the following intermittent failure (so it might be a race condition) while deploying its overcloud:

(The full stack trace will be put in the next private comment because it contains PII.)

TASK [Discovering nova hosts] **************************************************
Tuesday 04 February 2020  12:50:12 +0000 (0:00:00.473)       0:00:04.655 ******
fatal: [IP.33 -> IP.21]: FAILED! => {"changed": false, "cmd": ["docker", "exec", "nova_compute", "nova-manage", "cell_v2", "discover_hosts", "--by-service"], "delta": "0:00:03.084736", "end": "2020-02-04 12:50:15.893532", "msg": "non-zero return code", "rc": 1, "start": "2020-02-04 12:50:12.808796", "stderr": "", "stderr_lines": [], "stdout": "An error has occurred:\
Traceback (most recent call last):\
  File \\"/usr/lib/python2.7/site-packages/nova/cmd/manage.py\\", line 1657, in main\
    ret = fn(*fn_args, **fn_kwargs)\
  File \\"/usr/lib/python2.7/site-packages/nova/cmd/manage.py\\", line 1323, in discover_hosts\
    by_service)\
  File \\"/usr/lib/python2.7/site-packages/nova/objects/host_mapping.py\\", line 265, in discover_hosts\
    by_service)\
  File \\"/usr/lib/python2.7/site-packages/nova/objects/host_mapping.py\\", line 224, in _check_and_create_host_mappings\
    status_fn)\
  File \\"/usr/lib/python2.7/site-packages/nova/objects/host_mapping.py\\", line 211, in _check_and_create_service_host_mappings\
    host_mapping.create()\
  File \\"/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py\\", line 226, in wrapper\
    return fn(self, *args, **kwargs)\
  File \\"/usr/lib/python2.7/site-packages/nova/objects/host_mapping.py\\", line 114, in create\
    db_mapping = self._create_in_db(self._context, changes)\
  File \\"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py\\", line 988, in wrapper\
    return fn(*args, **kwargs)\
  File \\"/usr/lib/python2.7/site-packages/nova/objects/host_mapping.py\\", line 107, in _create_in_db\
    return _apply_updates(context, db_mapping, updates)\
  File \\"/usr/lib/python2.7/site-packages/nova/objects/host_mapping.py\\", line 33, in _apply_updates\
    db_mapping.save(context.session)\
  File \\"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/models.py\\", line 50, in save\
    session.flush()\
  File \\"/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py\\", line 2243, in flush\
    self._flush(objects)\
  File \\"/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py\\", line 2369, in _flush\
    transaction.rollback(_capture_exception=True)\
  File \\"/usr/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py\\", line 66, in __exit__\
    compat.reraise(exc_type, exc_value, exc_tb)\
  File \\"/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py\\", line 2333, in _flush\
    flush_context.execute()\
  File \\"/usr/lib64/python2.7/site-packages/sqlalchemy/orm/unitofwork.py\\", line 391, in execute\
    rec.execute(self)\
  File \\"/usr/lib64/python2.7/site-packages/sqlalchemy/orm/unitofwork.py\\", line 556, in execute\
    uow\
  File \\"/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py\\", line 181, in save_obj\
    mapper, table, insert)\
  File \\"/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py\\", line 866, in _emit_insert_statements\
    execute(statement, params)\
  File \\"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py\\", line 948, in execute\
    return meth(self, multiparams, params)\
  File \\"/usr/lib64/python2.7/site-packages/sqlalchemy/sql/elements.py\\", line 269, in _execute_on_connection\
    return connection._execute_clauseelement(self, multiparams, params)\
  File \\"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py\\", line 1060, in _execute_clauseelement\
    compiled_sql, distilled_params\
  File \\"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py\\", line 1200, in _execute_context\
    context)\
  File \\"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py\\", line 1409, in _handle_dbapi_exception\
    util.raise_from_cause(newraise, exc_info)\
  File \\"/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py\\", line 203, in raise_from_cause\
    reraise(type(exception), exception, tb=exc_tb, cause=cause)\
  File \\"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py\\", line 1193, in _execute_context\
    context)\
  File \\"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py\\", line 507, in do_execute\
    cursor.execute(statement, parameters)\
  File \\"/usr/lib/python2.7/site-packages/pymysql/cursors.py\\", line 166, in execute\
    result = self._query(query)\
  File \\"/usr/lib/python2.7/site-packages/pymysql/cursors.py\\", line 322, in _query\
    conn.query(q)\
  File \\"/usr/lib/python2.7/site-packages/pymysql/connections.py\\", line 856, in query\
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)\
  File \\"/usr/lib/python2.7/site-packages/pymysql/connections.py\\", line 1057, in _read_query_result\
    result.read()\
  File \\"/usr/lib/python2.7/site-packages/pymysql/connections.py\\", line 1340, in read\
    first_packet = self.connection._read_packet()\
  File \\"/usr/lib/python2.7/site-packages/pymysql/connections.py\\", line 1014, in _read_packet\
    packet.check_error()\
  File \\"/usr/lib/python2.7/site-packages/pymysql/connections.py\\", line 393, in check_error\
    err.raise_mysql_exception(self._data)\
  File \\"/usr/lib/python2.7/site-packages/pymysql/err.py\\", line 107, in raise_mysql_exception\
    raise errorclass(errno, errval)\
DBDuplicateEntry: (pymysql.err.IntegrityError) (1062, u\\"Duplicate entry \'compute01.domain.tld\' for key \'uniq_host_mappings0host\'\\") 
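
The traceback ends in a violation of the unique key uniq_host_mappings0host while a host mapping row is being inserted. The following is only a minimal sketch of how two overlapping discovery runs could produce that error. It assumes each run first lists the unmapped hosts and then inserts them, which is a simplification and not nova's actual code. The table layout and the helper names (unmapped, create_mappings) are hypothetical, and sqlite3 stands in for MySQL.

```python
import sqlite3

# Hypothetical stand-in for nova's host_mappings table; only the unique
# constraint on the host column matters for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE host_mappings ("
    "id INTEGER PRIMARY KEY, host TEXT, "
    "CONSTRAINT uniq_host_mappings0host UNIQUE (host))"
)


def unmapped(conn, discovered):
    """Return the discovered hosts that have no mapping row yet."""
    mapped = {row[0] for row in conn.execute("SELECT host FROM host_mappings")}
    return [host for host in discovered if host not in mapped]


def create_mappings(conn, hosts):
    """Insert one mapping row per host."""
    for host in hosts:
        conn.execute("INSERT INTO host_mappings (host) VALUES (?)", (host,))


discovered = ["compute01.domain.tld"]

# Two runs both perform their check before either one inserts.
todo_a = unmapped(conn, discovered)
todo_b = unmapped(conn, discovered)

create_mappings(conn, todo_a)  # the first run succeeds
try:
    create_mappings(conn, todo_b)  # the second run violates the unique key
    second_run_error = None
except sqlite3.IntegrityError as exc:
    second_run_error = str(exc)

print(second_run_error)
```

In this sketch the second run fails even though its own check reported the host as unmapped, which would match a failure that only shows up intermittently.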


Version-Release number of selected component (if applicable):
openstack-tripleo-common-containers-8.7.1-5.el7ost.noarch
openstack-tripleo-common-8.7.1-5.el7ost.noarch
openstack-tripleo-heat-templates-8.4.1-23.el7ost.noarch

How reproducible:
Random(?)

Steps to Reproduce:
1. Deploy the overcloud.

Actual results:
overcloud deployment fails

Expected results:
overcloud deployment succeeds

Additional info:
A sosreport from the Director node is available.
If anything else is needed, please just ask.

Comment 4 Martin Schuppert 2020-02-07 09:24:20 UTC
The discovery task [1] is correctly delegated to a single node,
but every compute delegates its own run of the task to that host, so the task executes once per compute:


TASK [Discovering nova hosts] **************************************************
Thursday 06 February 2020  06:24:10 -0500 (0:00:00.694)       0:00:05.342 ***** 
ok: [192.168.24.11 -> 192.168.24.11]
ok: [192.168.24.10 -> 192.168.24.11]
ok: [192.168.24.17 -> 192.168.24.11]
ok: [192.168.24.14 -> 192.168.24.11]
ok: [192.168.24.7 -> 192.168.24.11]
ok: [192.168.24.6 -> 192.168.24.11]

We should just run this task once.

[1] https://github.com/openstack/tripleo-common/blob/stable/queens/playbooks/nova_cellv2_host_discover.yaml#L15
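
To make the delegation pattern in the task output above easier to see, here is a small sketch that counts how many executions land on each delegate host. The regular expression and the function name count_delegated_runs are illustrative assumptions, not part of tripleo-common or Ansible; the input is the task output copied from this comment.

```python
import re
from collections import Counter

# Task output copied from the comment above.
TASK_OUTPUT = """\
ok: [192.168.24.11 -> 192.168.24.11]
ok: [192.168.24.10 -> 192.168.24.11]
ok: [192.168.24.17 -> 192.168.24.11]
ok: [192.168.24.14 -> 192.168.24.11]
ok: [192.168.24.7 -> 192.168.24.11]
ok: [192.168.24.6 -> 192.168.24.11]
"""

# Matches "<status>: [<inventory host> -> <delegate host>]".
LINE_RE = re.compile(r"^(\w+): \[(\S+) -> (\S+)\]")


def count_delegated_runs(output):
    """Count how many task executions were delegated to each host."""
    runs = Counter()
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            runs[match.group(3)] += 1
    return runs


runs = count_delegated_runs(TASK_OUTPUT)
for delegate, count in runs.items():
    print(f"{delegate}: {count} executions")
```

For the output above this reports six executions on 192.168.24.11, one per compute, rather than the single run the task needs.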

Comment 17 errata-xmlrpc 2020-03-10 11:24:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0760