Description of problem:
A deployment with 3 controllers and 60 compute nodes failed with a duplicate entry error. The compute nodes are spread across 10 leaf networks.

Version-Release number of selected component (if applicable):
OSP 13 2019-04-23.1

How reproducible:
70% of the time

Steps to Reproduce:
1. Deploy the undercloud (UC) with 10 leaf networks
2. Prepare the overcloud (OC) with 3 controllers and 6 computes on each leaf network, 10 leaf networks in total
3. Deploy the OC

Actual results:
failed:

2019-05-14 20:42:24Z [overcloud.AllNodesDeploySteps.Compute9Deployment_Step5.5]: SIGNAL_IN_PROGRESS Signal: deployment 637b6b6c-6dae-4e68-b84d-77ace81ff340 succeeded

Stack overcloud CREATE_FAILED

overcloud.AllNodesDeploySteps.Compute5Deployment_Step5.2:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: cf4b6185-4bfc-4338-9b84-b8b91c7ed605
  status: CREATE_FAILED
  status_reason: |
    Error: resources[2]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
    " raise errorclass(errno, errval)",
    "DBDuplicateEntry: (pymysql.err.IntegrityError) (1062, u\"Duplicate entry 'overcloud-compute6-4.localdomain' for key 'uniq_host_mappings0host'\") [SQL: u'INSERT INTO host_mappings (created_at, updated_at, cell_id, host) VALUES (%(created_at)s, %(updated_at)s, %(cell_id)s, %(host)s)'] [parameters: {'host': u'overcloud-compute6-4.localdomain', 'cell_id': 5, 'created_at': datetime.datetime(2019, 5, 14, 20, 42, 12, 365188), 'updated_at': None}] (Background on this error at: http://sqlalche.me/e/gkpj)",
    "stderr: "
    ]
    }
    to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/1a7b531c-e254-4971-a56d-6e64fd5a69d5_playbook.retry

    PLAY RECAP *********************************************************************

Expected results:
pass

Additional info:
Workaround: do a smaller deployment and then scale up the environment.
Brad - can you add a sosreport or a full set of logs? It's not clear where this error is coming from.
For reference, here is the code where we are hitting a constraint issue:

Master: https://opendev.org/openstack/nova/src/branch/master/nova/db/sqlalchemy/api_models.py#L156-L166
Queens: https://opendev.org/openstack/nova/src/branch/stable/queens/nova/db/sqlalchemy/api_models.py#L146-L156

We are placing "overcloud-compute6-4.localdomain" into the Cell HostMapping, but that host already exists.

This may just be as expected, with the index starting at 0 for one and 1 for the other, but we have ``overcloud.AllNodesDeploySteps.Compute5Deployment_Step5.2`` and "overcloud-compute6-4.localdomain". "Compute5" and "compute6"? Could there be a template mistake? (I think not, since this is reproducible 70% of the time. A template error should fail 100% of the time ...)

@Brad, can you also share the templates and the deploy command? Access to an environment where this was reproduced would also be great.
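To illustrate the constraint in question, here is a minimal sketch of how a unique key on the host column rejects a second row for the same host. It uses Python's built-in sqlite3 instead of the MySQL/SQLAlchemy stack from the report, and the table definition is a simplified assumption that models only the columns visible in the failing INSERT. The helper name add_host_mapping is hypothetical and not part of nova.

```python
import sqlite3

# Simplified stand-in for the host_mappings table (assumed schema, not the
# real nova API database model). Only the unique key on "host" matters here.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE host_mappings (
        id INTEGER PRIMARY KEY,
        created_at TEXT,
        updated_at TEXT,
        cell_id INTEGER NOT NULL,
        host TEXT NOT NULL,
        CONSTRAINT uniq_host_mappings0host UNIQUE (host)
    )
    """
)


def add_host_mapping(cell_id, host):
    """Hypothetical helper: insert one row, return False on a duplicate host."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO host_mappings (cell_id, host) VALUES (?, ?)",
                (cell_id, host),
            )
        return True
    except sqlite3.IntegrityError:
        # sqlite3's counterpart of the pymysql IntegrityError (1062) in the report
        return False


first = add_host_mapping(5, "overcloud-compute6-4.localdomain")
second = add_host_mapping(5, "overcloud-compute6-4.localdomain")
row_count = conn.execute("SELECT COUNT(*) FROM host_mappings").fetchone()[0]
print(first, second, row_count)
```

The second insert is rejected whatever its cell_id is, because the key covers the host name alone.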
Created attachment 1572675 [details] templates used
Hey Herald,

I uploaded a copy of my templates. Also, this does not seem to happen all the time. I'll work on getting my environment back up and reproducing this issue.

$ cat overcloud_deploy.sh
#!/bin/bash

openstack overcloud deploy \
--timeout 240 \
--templates /usr/share/openstack-tripleo-heat-templates \
-n /home/stack/virt/network_data_spine_leaf.yaml \
-r /home/stack/virt/roles_data_spine_leaf.yaml \
--libvirt-type kvm \
--ntp-server 192.168.220.1 \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network-environment.yaml \
-e /home/stack/virt/nodes_data.yaml \
-e /home/stack/docker-images.yaml \
--log-file overcloud_deployment_$(date +%m_%d_%y__%H_%M_%S).log
(In reply to bjacot from comment #4)
> Hey Herald,
>
> I uploaded a copy of my templates. Also, this does not seem to happen all
> the time. I'll work on getting my environment back up and this issue
> reproducible.
>
> $ cat overcloud_deploy.sh
> #!/bin/bash
>
> openstack overcloud deploy \
> --timeout 240 \
> --templates /usr/share/openstack-tripleo-heat-templates \
> -n /home/stack/virt/network_data_spine_leaf.yaml \
> -r /home/stack/virt/roles_data_spine_leaf.yaml \
> --libvirt-type kvm \
> --ntp-server 192.168.220.1 \
> -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
> -e /home/stack/virt/network-environment.yaml \
> -e /home/stack/virt/nodes_data.yaml \
> -e /home/stack/docker-images.yaml \
> --log-file overcloud_deployment_$(date +%m_%d_%y__%H_%M_%S).log

The templates and deploy command look good.

I wonder if this could be a race, or if it could be related to teardown/re-scheduling on error. (For example, https://bugs.launchpad.net/nova/+bug/1815799 ?)

We need the sosreports Bob asked for (the logs), and if we don't see anything there we may want to involve DFG:Compute.
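To make the race hypothesis concrete, the sketch below shows one way a check-then-insert sequence could produce this error if two discovery runs overlap. This is an assumption about the failure mode, not a confirmed root cause. It uses sqlite3 and a single connection to replay the interleaving deterministically, and all function names are hypothetical.

```python
import sqlite3


def make_db():
    # Simplified stand-in table (assumed schema) with a unique host column.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE host_mappings (cell_id INTEGER, host TEXT UNIQUE)")
    return conn


def host_is_mapped(conn, host):
    row = conn.execute(
        "SELECT 1 FROM host_mappings WHERE host = ?", (host,)
    ).fetchone()
    return row is not None


def insert_mapping(conn, cell_id, host):
    with conn:
        conn.execute(
            "INSERT INTO host_mappings (cell_id, host) VALUES (?, ?)",
            (cell_id, host),
        )


def racy_interleaving(conn, host):
    """Replay two overlapping runs: both check first, then both insert."""
    a_sees = host_is_mapped(conn, host)  # run A: host not mapped yet
    b_sees = host_is_mapped(conn, host)  # run B: host still not mapped
    outcomes = []
    for name, seen in (("run A", a_sees), ("run B", b_sees)):
        if seen:
            outcomes.append((name, "skipped"))
            continue
        try:
            insert_mapping(conn, 5, host)
            outcomes.append((name, "created"))
        except sqlite3.IntegrityError:
            # The unique key rejects the insert from the slower run.
            outcomes.append((name, "duplicate entry"))
    return outcomes


outcomes = racy_interleaving(make_db(), "overcloud-compute1-0.localdomain")
print(outcomes)
```

If something like this is happening, the failure would depend on timing between nodes, which would fit a reproduction rate below 100%.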
Full traceback from 'openstack stack failures list overcloud --long':

"DEBUG:novaclient.v2.client:GET call to compute for http://172.25.0.14:8774/v2.1/os-services?binary=nova-compute used request id req-f04c0768-58ff-48fa-ac09-3b4b7fb96627",
"INFO:nova_cell_v2_discover_host:(cellv2) Service registered, running discovery",
"Found 2 cell mappings.",
"Skipping cell0 since it does not contain hosts.",
"Getting computes from cell 'default': b4a8af65-cad8-4db4-a02a-220416aeee74",
"Creating host mapping for service overcloud-compute1-0.localdomain",
"An error has occurred:",
"Traceback (most recent call last):",
" File \"/usr/lib/python2.7/site-packages/nova/cmd/manage.py\", line 1654, in main",
" ret = fn(*fn_args, **fn_kwargs)",
" File \"/usr/lib/python2.7/site-packages/nova/cmd/manage.py\", line 1323, in discover_hosts",
" by_service)",
" File \"/usr/lib/python2.7/site-packages/nova/objects/host_mapping.py\", line 265, in discover_hosts",
" File \"/usr/lib/python2.7/site-packages/nova/objects/host_mapping.py\", line 224, in _check_and_create_host_mappings",
" status_fn)",
" File \"/usr/lib/python2.7/site-packages/nova/objects/host_mapping.py\", line 211, in _check_and_create_service_host_mappings",
" host_mapping.create()",
" File \"/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py\", line 226, in wrapper",
" return fn(self, *args, **kwargs)",
" File \"/usr/lib/python2.7/site-packages/nova/objects/host_mapping.py\", line 114, in create",
" db_mapping = self._create_in_db(self._context, changes)",
" File \"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py\", line 988, in wrapper",
" return fn(*args, **kwargs)",
" File \"/usr/lib/python2.7/site-packages/nova/objects/host_mapping.py\", line 107, in _create_in_db",
" return _apply_updates(context, db_mapping, updates)",
" File \"/usr/lib/python2.7/site-packages/nova/objects/host_mapping.py\", line 33, in _apply_updates",
" db_mapping.save(context.session)",
" File \"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/models.py\", line 50, in save",
" session.flush()",
" File \"/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py\", line 2243, in flush",
" self._flush(objects)",
" File \"/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py\", line 2369, in _flush",
" transaction.rollback(_capture_exception=True)",
" File \"/usr/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py\", line 66, in __exit__",
" compat.reraise(exc_type, exc_value, exc_tb)",
" File \"/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py\", line 2333, in _flush",
" flush_context.execute()",
" File \"/usr/lib64/python2.7/site-packages/sqlalchemy/orm/unitofwork.py\", line 391, in execute",
" rec.execute(self)",
" File \"/usr/lib64/python2.7/site-packages/sqlalchemy/orm/unitofwork.py\", line 556, in execute",
" uow",
" File \"/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py\", line 181, in save_obj",
" mapper, table, insert)",
" File \"/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py\", line 866, in _emit_insert_statements",
" execute(statement, params)",
" File \"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py\", line 948, in execute",
" return meth(self, multiparams, params)",
" File \"/usr/lib64/python2.7/site-packages/sqlalchemy/sql/elements.py\", line 269, in _execute_on_connection",
" return connection._execute_clauseelement(self, multiparams, params)",
" File \"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py\", line 1060, in _execute_clauseelement",
" compiled_sql, distilled_params",
" File \"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py\", line 1200, in _execute_context",
" context)",
" File \"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py\", line 1409, in _handle_dbapi_exception",
" util.raise_from_cause(newraise, exc_info)",
" File \"/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py\", line 203, in raise_from_cause",
" reraise(type(exception), exception, tb=exc_tb, cause=cause)",
" File \"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py\", line 1193, in _execute_context",
" File \"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py\", line 507, in do_execute",
" cursor.execute(statement, parameters)",
" File \"/usr/lib/python2.7/site-packages/pymysql/cursors.py\", line 166, in execute",
" result = self._query(query)",
" File \"/usr/lib/python2.7/site-packages/pymysql/cursors.py\", line 322, in _query",
" conn.query(q)",
" File \"/usr/lib/python2.7/site-packages/pymysql/connections.py\", line 856, in query",
" self._affected_rows = self._read_query_result(unbuffered=unbuffered)",
" File \"/usr/lib/python2.7/site-packages/pymysql/connections.py\", line 1057, in _read_query_result",
" result.read()",
" File \"/usr/lib/python2.7/site-packages/pymysql/connections.py\", line 1340, in read",
" first_packet = self.connection._read_packet()",
" File \"/usr/lib/python2.7/site-packages/pymysql/connections.py\", line 1014, in _read_packet",
" packet.check_error()",
" File \"/usr/lib/python2.7/site-packages/pymysql/connections.py\", line 393, in check_error",
" err.raise_mysql_exception(self._data)",
" File \"/usr/lib/python2.7/site-packages/pymysql/err.py\", line 107, in raise_mysql_exception",
" raise errorclass(errno, errval)",
"DBDuplicateEntry: (pymysql.err.IntegrityError) (1062, u\"Duplicate entry 'overcloud-compute1-0.localdomain' for key 'uniq_host_mappings0host'\") [SQL: u'INSERT INTO host_mappings (created_at, updated_at, cell_id, host) VALUES (%(created_at)s, %(updated_at)s, %(cell_id)s, %(host)s)'] [parameters: {'host': u'overcloud-compute1-0.localdomain', 'cell_id': 5, 'created_at': datetime.datetime(2019, 5, 31, 20, 26, 44, 449336), 'updated_at': None}] (Background on this error at: http://sqlalche.me/e/gkpj)",
"stderr: "
$ openstack server list
+--------------------------------------+-------------------------+--------+-------------------------+----------------+-----------+
| ID                                   | Name                    | Status | Networks                | Image          | Flavor    |
+--------------------------------------+-------------------------+--------+-------------------------+----------------+-----------+
| 051772dc-6ba1-47a1-bc63-f2394b9ee65c | overcloud-controller0-0 | ACTIVE | ctlplane=192.168.220.31 | overcloud-full | control0  |
| 0eb99cf7-905d-4718-858e-bc79d2d8ed0d | overcloud-controller0-1 | ACTIVE | ctlplane=192.168.220.21 | overcloud-full | control0  |
| 3c8e5b53-468a-4c48-94aa-6b3b6b278a38 | overcloud-compute4-0    | ACTIVE | ctlplane=192.168.224.10 | overcloud-full | compute4  |
| 3e0af48c-cc87-4f00-9596-324006c25728 | overcloud-compute10-0   | ACTIVE | ctlplane=192.168.230.32 | overcloud-full | compute10 |
| 46e8149c-1b12-4213-b8bd-00d94e2ae655 | overcloud-compute7-0    | ACTIVE | ctlplane=192.168.227.18 | overcloud-full | compute7  |
| 4b3e2ecb-25dd-4200-bd65-aab7715b081a | overcloud-controller0-2 | ACTIVE | ctlplane=192.168.220.11 | overcloud-full | control0  |
| a363f5d3-fd7f-4a98-aa37-97e582e7b97b | overcloud-compute3-0    | ACTIVE | ctlplane=192.168.223.26 | overcloud-full | compute3  |
| c458e8a4-0723-4c64-8120-c4482cb84255 | overcloud-compute1-0    | ACTIVE | ctlplane=192.168.221.10 | overcloud-full | compute1  |
| c19e2a2a-03d7-4054-b04a-bcaf2417dc72 | overcloud-compute8-0    | ACTIVE | ctlplane=192.168.228.12 | overcloud-full | compute8  |
| e19303af-7c78-42f7-acc9-f5c4fbad9fd0 | overcloud-compute6-0    | ACTIVE | ctlplane=192.168.226.10 | overcloud-full | compute6  |
| 802be941-e1f6-451f-9603-de40db03f935 | overcloud-compute9-0    | ACTIVE | ctlplane=192.168.229.19 | overcloud-full | compute9  |
| 5fb3fea4-7a25-4ad6-8064-ceeb38d1214e | overcloud-compute5-0    | ACTIVE | ctlplane=192.168.225.13 | overcloud-full | compute5  |
| a74a7022-1e3f-44c2-801a-e629e19f998c | overcloud-compute2-0    | ACTIVE | ctlplane=192.168.222.16 | overcloud-full | compute2  |
| d4399790-563e-45db-93cb-049def7023da | overcloud-compute11-0   | ACTIVE | ctlplane=192.168.231.23 | overcloud-full | compute11 |
+--------------------------------------+-------------------------+--------+-------------------------+----------------+-----------+

$ openstack server show overcloud-compute1-0
+-------------------------------------+----------------------------------------------------------+
| Field                               | Value                                                    |
+-------------------------------------+----------------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                                   |
| OS-EXT-AZ:availability_zone         | nova                                                     |
| OS-EXT-SRV-ATTR:host                | core-undercloud-0.redhat.local                           |
| OS-EXT-SRV-ATTR:hypervisor_hostname | dcc268d3-f1d2-4f4e-a37d-32a685306773                     |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000007                                        |
| OS-EXT-STS:power_state              | Running                                                  |
| OS-EXT-STS:task_state               | None                                                     |
| OS-EXT-STS:vm_state                 | active                                                   |
| OS-SRV-USG:launched_at              | 2019-05-31T19:43:17.000000                               |
| OS-SRV-USG:terminated_at            | None                                                     |
| accessIPv4                          |                                                          |
| accessIPv6                          |                                                          |
| addresses                           | ctlplane=192.168.221.10                                  |
| config_drive                        | True                                                     |
| created                             | 2019-05-31T19:36:07Z                                     |
| flavor                              | compute1 (632d1aef-932c-4eb6-9a6d-720729b6bc66)          |
| hostId                              | d478fae80ef194fad70bacdb3080d9e0d2f15caddadf33dd60b44995 |
| id                                  | c458e8a4-0723-4c64-8120-c4482cb84255                     |
| image                               | overcloud-full (8a62c585-e0fe-4a28-abd1-14a90e301297)    |
| key_name                            | default                                                  |
| name                                | overcloud-compute1-0                                     |
| progress                            | 0                                                        |
| project_id                          | 523216891f884db599dac64fc58acad1                         |
| properties                          |                                                          |
| security_groups                     | name='default'                                           |
| status                              | ACTIVE                                                   |
| updated                             | 2019-05-31T19:43:17Z                                     |
| user_id                             | f46eed435d474bde85f3ebe3f09c42ae                         |
| volumes_attached                    |                                                          |
+-------------------------------------+----------------------------------------------------------+
Created attachment 1576661 [details] sos report for overcloud-compute1-0
Brad - if the setup is still available, can we also get the sosreport from the controller, to try to understand when the entries were created?

Including Compute DFG to help understand what can cause the DBDuplicateEntry in the nova DB.
Hey Bob and Compute DFG.

I saw this issue again today. I grabbed SOS reports for the 3 controllers and overcloud-compute8-5.localdomain. I uploaded the files here: http://rhos-release.virt.bos.redhat.com/log/bz1710118/.

Today's error:

overcloud.AllNodesDeploySteps.Compute4Deployment_Step5.5:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: 0a4a89ab-2d16-423d-aa61-8d0c48b3ec7c
  status: CREATE_FAILED
  status_reason: |
    Error: resources[5]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
    " raise errorclass(errno, errval)",
    "DBDuplicateEntry: (pymysql.err.IntegrityError) (1062, u\"Duplicate entry 'overcloud-compute8-5.localdomain' for key 'uniq_host_mappings0host'\") [SQL: u'INSERT INTO host_mappings (created_at, updated_at, cell_id, host) VALUES (%(created_at)s, %(updated_at)s, %(cell_id)s, %(host)s)'] [parameters: {'created_at': datetime.datetime(2019, 6, 4, 21, 8, 15, 453649), 'cell_id': 6, 'host': u'overcloud-compute8-5.localdomain', 'updated_at': None}] (Background on this error at: http://sqlalche.me/e/gkpj)",
    "stderr: "
    ]
    }
    to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/0d5fe8c3-b9e3-4c81-930c-b8c851a129a8_playbook.retry
As discussed on IRC, this is likely a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1699393 and fixed in openstack-tripleo-heat-templates-8.3.1-16.el7ost and later.
Thanks, Martin! Closing as duplicate.

*** This bug has been marked as a duplicate of bug 1699393 ***