Bug 1702737 - Overcloud deploy fails due to an error in nova api database
Summary: Overcloud deploy fails due to an error in nova api database
Keywords:
Status: CLOSED DUPLICATE of bug 1700876
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: OSP DFG:Compute
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-04-24 15:47 UTC by Candido Campos
Modified: 2023-03-21 19:15 UTC
10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-04-25 07:01:43 UTC
Target Upstream Version:
Embargoed:


Attachments
deploy logs (7.91 MB, text/plain)
2019-04-24 15:47 UTC, Candido Campos

Description Candido Campos 2019-04-24 15:47:30 UTC
Created attachment 1558293 [details]
deploy logs

Description of problem:
Overcloud deploy fails due to an error in the nova API database while creating host mapping entries:

        "DEBUG:novaclient.v2.client:RESP BODY: {\"services\": [{\"status\": \"enabled\", \"binary\": \"nova-compute\", \"host\": \"compute-2.localdomain\", \"zone\": \"nova\", \"state\": \"up\", \"forced_down\": false, \"disabled_reason\": null, \"id\": 62, \"updated_at\": \"2019-04-23T10:50:32.000000\"}, {\"status\": \"enabled\", \"binary\": \"nova-compute\", \"host\": \"compute-0.localdomain\", \"zone\": \"nova\", \"state\": \"up\", \"forced_down\": false, \"disabled_reason\": null, \"id\": 71, \"updated_at\": \"2019-04-23T10:50:30.000000\"}, {\"status\": \"enabled\", \"binary\": \"nova-compute\", \"host\": \"compute-1.localdomain\", \"zone\": \"nova\", \"state\": \"up\", \"forced_down\": false, \"disabled_reason\": null, \"id\": 74, \"updated_at\": \"2019-04-23T10:50:29.000000\"}]}",
        "DEBUG:novaclient.v2.client:GET call to compute for http://172.17.1.10:8774/v2.1/os-services?binary=nova-compute used request id req-a496061b-fbb7-4e22-8634-460e5e1d404b",
        "INFO:nova_cell_v2_discover_host:(cellv2) Service registered, running discovery",
        "Found 2 cell mappings.",
        "Skipping cell0 since it does not contain hosts.",
        "Getting computes from cell 'default': 53257668-f8fb-4767-9ee5-c06802a3deb3",
        "Creating host mapping for service compute-2.localdomain",
        "An error has occurred:",
        "Traceback (most recent call last):",
        "  File \"/usr/lib/python2.7/site-packages/nova/cmd/manage.py\", line 2338, in main",
        "    ret = fn(*fn_args, **fn_kwargs)",
        "  File \"/usr/lib/python2.7/site-packages/nova/cmd/manage.py\", line 1455, in discover_hosts",

...

        "  File \"/usr/lib/python2.7/site-packages/pymysql/connections.py\", line 393, in check_error",
        "    err.raise_mysql_exception(self._data)",
        "  File \"/usr/lib/python2.7/site-packages/pymysql/err.py\", line 107, in raise_mysql_exception",
        "    raise errorclass(errno, errval)",
        "DBDuplicateEntry: (pymysql.err.IntegrityError) (1062, u\"Duplicate entry 'compute-2.localdomain' for key 'uniq_host_mappings0host'\") [SQL: u'INSERT INTO host_mappings (created_at, updated_at, cell_id, host) VALUES (%(created_at)s, %(updated_at)s, %(cell_id)s, %(host)s)'] [parameters: {'host': u'compute-2.localdomain', 'cell_id': 5, 'created_at': datetime.datetime(2019, 4, 23, 10, 50, 43, 384733), 'updated_at': None}] (Background on this error at: http://sqlalche.me/e/gkpj)",
        "stderr: "
    ]
}

Overcloud configuration failed.



The bug appears to be a race condition in the overcloud deploy, because the command:

nova-manage cell_v2 discover_hosts --by-service --verbose

It is executed on all 3 computes, and the first execution creates the host mapping entries in the nova API database for all 3 computes:

...
        "DEBUG:novaclient.v2.client:RESP BODY: {\"services\": [{\"status\": \"enabled\", \"binary\": \"nova-compute\", \"host\": \"compute-2.localdomain\", \"zone\": \"nova\", \"state\": \"up\", \"forced_down\": false, \"disabled_reason\": null, \"id\": 62, \"updated_at\": \"2019-04-23T10:50:32.000000\"}, {\"status\": \"enabled\", \"binary\": \"nova-compute\", \"host\": \"compute-0.localdomain\", \"zone\": \"nova\", \"state\": \"up\", \"forced_down\": false, \"disabled_reason\": null, \"id\": 71, \"updated_at\": \"2019-04-23T10:50:30.000000\"}, {\"status\": \"enabled\", \"binary\": \"nova-compute\", \"host\": \"compute-1.localdomain\", \"zone\": \"nova\", \"state\": \"up\", \"forced_down\": false, \"disabled_reason\": null, \"id\": 74, \"updated_at\": \"2019-04-23T10:50:29.000000\"}]}",
        "DEBUG:novaclient.v2.client:GET call to compute for http://172.17.1.10:8774/v2.1/os-services?binary=nova-compute used request id req-217a9d58-8c0d-44a4-a25a-585bc2e9ecd9",
        "INFO:nova_cell_v2_discover_host:(cellv2) Service registered, running discovery",
        "Found 2 cell mappings.",
        "Skipping cell0 since it does not contain hosts.",
        "Getting computes from cell 'default': 53257668-f8fb-4767-9ee5-c06802a3deb3",
        "Creating host mapping for service compute-2.localdomain",
        "Creating host mapping for service compute-0.localdomain",
        "Creating host mapping for service compute-1.localdomain",
        "Found 3 unmapped computes in cell: 53257668-f8fb-4767-9ee5-c06802a3deb3",
        "",
        "stderr: "
    ]
}

...

In the subsequent executions it appears to do nothing:

...

        "DEBUG:novaclient.v2.client:RESP BODY: {\"services\": [{\"status\": \"enabled\", \"binary\": \"nova-compute\", \"host\": \"compute-2.localdomain\", \"zone\": \"nova\", \"state\": \"up\", \"forced_down\": false, \"disabled_reason\": null, \"id\": 62, \"updated_at\": \"2019-04-23T10:50:32.000000\"}, {\"status\": \"enabled\", \"binary\": \"nova-compute\", \"host\": \"compute-0.localdomain\", \"zone\": \"nova\", \"state\": \"up\", \"forced_down\": false, \"disabled_reason\": null, \"id\": 71, \"updated_at\": \"2019-04-23T10:50:30.000000\"}, {\"status\": \"enabled\", \"binary\": \"nova-compute\", \"host\": \"compute-1.localdomain\", \"zone\": \"nova\", \"state\": \"up\", \"forced_down\": false, \"disabled_reason\": null, \"id\": 74, \"updated_at\": \"2019-04-23T10:50:29.000000\"}]}",
        "DEBUG:novaclient.v2.client:GET call to compute for http://172.17.1.10:8774/v2.1/os-services?binary=nova-compute used request id req-3110ee7d-e26b-40c1-bf3d-a7ce5ad4e9fa",
        "INFO:nova_cell_v2_discover_host:(cellv2) Service registered, running discovery",
        "Found 2 cell mappings.",
        "Skipping cell0 since it does not contain hosts.",
        "Getting computes from cell 'default': 53257668-f8fb-4767-9ee5-c06802a3deb3",
        "Found 0 unmapped computes in cell: 53257668-f8fb-4767-9ee5-c06802a3deb3",
        "",
        "stderr: "
    ]
...

But when the failure occurs, it tries to re-create the entry:

...

        "DEBUG:novaclient.v2.client:GET call to compute for http://172.17.1.10:8774/v2.1/os-services?binary=nova-compute used request id req-a496061b-fbb7-4e22-8634-460e5e1d404b",
        "INFO:nova_cell_v2_discover_host:(cellv2) Service registered, running discovery",
        "Found 2 cell mappings.",
        "Skipping cell0 since it does not contain hosts.",
        "Getting computes from cell 'default': 53257668-f8fb-4767-9ee5-c06802a3deb3",
        "Creating host mapping for service compute-2.localdomain",
        "An error has occurred:",
        "Traceback (most recent call last):",

....


        "  File \"/usr/lib/python2.7/site-packages/pymysql/connections.py\", line 393, in check_error",
        "    err.raise_mysql_exception(self._data)",
        "  File \"/usr/lib/python2.7/site-packages/pymysql/err.py\", line 107, in raise_mysql_exception",
        "    raise errorclass(errno, errval)",
        "DBDuplicateEntry: (pymysql.err.IntegrityError) (1062, u\"Duplicate entry 'compute-2.localdomain' for key 'uniq_host_mappings0host'\") [SQL: u'INSERT INTO host_mappings (created_at, updated_at, cell_id, host) VALUES (%(created_at)s, %(updated_at)s, %(cell_id)s, %(host)s)'] [parameters: {'host': u'compute-2.localdomain', 'cell_id': 5, 'created_at': datetime.datetime(2019, 4, 23, 10, 50, 43, 384733), 'updated_at': None}] (Background on this error at: http://sqlalche.me/e/gkpj)",
        "stderr: "
    ]
}

Overcloud configuration failed.
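
The suspected race can be illustrated with a minimal, self-contained sketch. It is not nova's code: it assumes discovery works as "list unmapped hosts, then insert one row per host", and uses an in-memory SQLite table with a unique constraint on host as a stand-in for host_mappings in the nova API database. The helper names (make_db, unmapped_hosts, create_mappings, simulate) are hypothetical. Two runs take their snapshot of unmapped hosts before either has written, so the second run hits the unique constraint, much like the DBDuplicateEntry above.

```python
import sqlite3

# Host names taken from the os-services response in the logs above.
SERVICES = ["compute-2.localdomain", "compute-0.localdomain", "compute-1.localdomain"]


def make_db():
    """Create a stand-in for host_mappings with a unique constraint on host."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE host_mappings ("
        "id INTEGER PRIMARY KEY, cell_id INTEGER NOT NULL, host TEXT NOT NULL UNIQUE)"
    )
    return conn


def unmapped_hosts(conn, services):
    """Return the services that have no host mapping yet (the 'check' step)."""
    mapped = {row[0] for row in conn.execute("SELECT host FROM host_mappings")}
    return [host for host in services if host not in mapped]


def create_mappings(conn, hosts, cell_id, tolerate_duplicates=False):
    """Insert one mapping per host (the 'act' step); return the hosts created."""
    created = []
    for host in hosts:
        try:
            with conn:  # commit on success, roll back on error
                conn.execute(
                    "INSERT INTO host_mappings (cell_id, host) VALUES (?, ?)",
                    (cell_id, host),
                )
            created.append(host)
        except sqlite3.IntegrityError:
            # Another run mapped this host between our check and our insert.
            if not tolerate_duplicates:
                raise
    return created


def simulate(tolerate_duplicates):
    """Interleave two discovery runs: both check first, then both insert."""
    conn = make_db()
    seen_by_a = unmapped_hosts(conn, SERVICES)
    seen_by_b = unmapped_hosts(conn, SERVICES)  # stale once run A has inserted
    create_mappings(conn, seen_by_a, cell_id=5)
    try:
        create_mappings(conn, seen_by_b, cell_id=5,
                        tolerate_duplicates=tolerate_duplicates)
        outcome = "ok"
    except sqlite3.IntegrityError:
        outcome = "failed"
    count = conn.execute("SELECT COUNT(*) FROM host_mappings").fetchone()[0]
    return outcome, count


if __name__ == "__main__":
    print(simulate(tolerate_duplicates=False))
    print(simulate(tolerate_duplicates=True))
```

In both cases the table ends up with all three mappings; the only difference is whether the losing run treats the duplicate as fatal. Tolerating the duplicate is just one possible mitigation and is not necessarily how the duplicate bug was resolved.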


.....


Playbook:

(undercloud) [stack@undercloud-0 ~]$ vi /var/lib/mistral/overcloud/deploy_steps_playbook.yaml



[root@compute-2 heat-admin]# vi /var/lib/docker-container-startup-configs.json
[root@compute-2 heat-admin]# vi /var/lib/docker-config-scripts/nova_cell_v2_discover_host.py




Version-Release number of selected component (if applicable):

(undercloud) [stack@undercloud-0 ~]$ cat core_puddle_version
2019-04-12.1
(undercloud) [stack@undercloud-0 ~]$ cat /etc/rhosp-release
Red Hat OpenStack Platform release 14.0.2 RC (Rocky)



How reproducible:


Steps to Reproduce:
1. Deploy OpenStack using TripleO with several computes (3 in my test).
I have seen the error only once; after I deleted the stack, I could redeploy without error.


Actual results:
Sometimes the deploy fails

Expected results:
No failure

Additional info:
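
The three outcomes shown in the description (first run maps the computes, later runs find nothing to map, failing run hits the duplicate entry) can be told apart in a deploy log with a small script. This is only a sketch: the regular expressions are derived from the log lines quoted above, and the function name classify_run and the returned labels are hypothetical.

```python
import re

# Patterns based on the discover_hosts output quoted in this report.
UNMAPPED_RE = re.compile(r"Found (\d+) unmapped computes in cell")
DUPLICATE_RE = re.compile(r"DBDuplicateEntry: .*?Duplicate entry '([^']+)'")


def classify_run(lines):
    """Classify the output lines of one discover_hosts run."""
    # A duplicate-entry error takes priority over any other message.
    for line in lines:
        duplicate = DUPLICATE_RE.search(line)
        if duplicate:
            return "duplicate-entry failure for " + duplicate.group(1)
    for line in lines:
        found = UNMAPPED_RE.search(line)
        if found:
            count = int(found.group(1))
            return "mapped %d computes" % count if count else "nothing to map"
    return "unknown"


if __name__ == "__main__":
    first_run = [
        "Creating host mapping for service compute-2.localdomain",
        "Found 3 unmapped computes in cell: 53257668-f8fb-4767-9ee5-c06802a3deb3",
    ]
    failed_run = [
        "Creating host mapping for service compute-2.localdomain",
        "An error has occurred:",
        "DBDuplicateEntry: (pymysql.err.IntegrityError) (1062, "
        "u\"Duplicate entry 'compute-2.localdomain' for key 'uniq_host_mappings0host'\")",
    ]
    print(classify_run(first_run))
    print(classify_run(failed_run))
```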

Comment 1 Martin Schuppert 2019-04-25 07:01:43 UTC
This is a duplicate of BZ1700876

*** This bug has been marked as a duplicate of bug 1700876 ***

