Bug 1643147
| Summary: | Unable to spawn an instance on a new env due to volume retrieve failure | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Raoul Scarazzini <rscarazz> |
| Component: | openstack-nova | Assignee: | OSP DFG:Compute <osp-dfg-compute> |
| Status: | CLOSED DUPLICATE | QA Contact: | OSP DFG:Compute <osp-dfg-compute> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 14.0 (Rocky) | CC: | abishop, berrange, cschwede, dasmith, eglynn, fpercoco, jhakimra, kchamart, mbooth, mschuppe, rscarazz, sbauza, sferdjao, sgordon, vromanso |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-11-01 15:40:32 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Raoul Scarazzini
2018-10-25 15:17:44 UTC
The sosreports don't contain any cinder or nova logs. Not sure what happened there, but could you capture them manually? Meanwhile, I examined the logs on the overcloud. Cinder's logs show the volume 0bec4d7d-4f5b-47a5-8e0e-5bf45d139fc5 is successfully created, but is then deleted a few seconds later.
I believe this is triggered by nova cleaning things up after the instance could not be spawned. I then found this in the nova-scheduler.log on controller-0.
2018-10-25 14:21:59.994 26 ERROR nova.scheduler.client.report [req-7d24a7ff-c533-41f3-8756-5efd2742b4b0 f66a66945cc6441c8906f5fcefb080ee 61e8865cb21f4e229cf8278713a7e39a - default default] Failed to retrieve allocation candidates from placement API for filters: RequestGroup(use_same_provider=False, resources={MEMORY_MB:2048, VCPU:2}, traits=[], aggregates=[])
Got 500: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>500 Internal Server Error</title>
</head><body>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error or
misconfiguration and was unable to complete
your request.</p>
<p>Please contact the server administrator at
[no address given] to inform them of the time this error occurred,
and the actions you performed just before this error.</p>
<p>More information about this error may be available
in the server error log.</p>
</body></html>
.
2018-10-25 14:21:59.995 26 INFO nova.scheduler.manager [req-7d24a7ff-c533-41f3-8756-5efd2742b4b0 f66a66945cc6441c8906f5fcefb080ee 61e8865cb21f4e229cf8278
This is during the time period between when cinder created and later deleted the volume.
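If it helps to confirm, the 500 can be reproduced outside the scheduler by calling the placement allocation_candidates endpoint directly with the same resource string. A minimal sketch; the endpoint URL and token below are placeholders, not values taken from this environment:

```python
import requests

# Placeholders -- substitute the real internal placement endpoint and a valid
# keystone token (e.g. from `openstack token issue`) for this environment.
PLACEMENT_URL = "http://<placement-vip>:8778/placement"
TOKEN = "<keystone-token>"

# Same resource request the scheduler logged: MEMORY_MB:2048, VCPU:2
resp = requests.get(
    PLACEMENT_URL + "/allocation_candidates",
    params={"resources": "MEMORY_MB:2048,VCPU:2"},
    headers={
        "X-Auth-Token": TOKEN,
        # allocation_candidates requires placement microversion >= 1.10
        "OpenStack-API-Version": "placement 1.10",
        "Accept": "application/json",
    },
)
print(resp.status_code)   # a 500 here matches the scheduler failure above
print(resp.text[:300])
```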
Moving over to Compute as this does not appear to be a cinder issue.
Martin, could this be another manifestation of the placement db_sync bug?

(In reply to Matthew Booth from comment #4)
> Martin, could this be another manifestation of the placement db_sync bug?

Yes, this is another issue due to the deployment race, which we fix with BZ1636399 to make sure nova-placement gets started on ctrl1+2 after the nova api db_sync finished on ctrl0:

* haproxy nova_placement proxy with no backend servers

Oct 25 10:02:33 overcloud-controller-0 haproxy[11]: proxy nova_placement has no server available!

* placement started on controller-1+2

Oct 25 10:04:27 overcloud-controller-0 haproxy[11]: Server nova_placement/overcloud-controller-2.internalapi.localdomain is UP, reason: Layer7 check passed, code: 200, info: "OK", check duration: 1ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Oct 25 10:04:31 overcloud-controller-0 haproxy[11]: Server nova_placement/overcloud-controller-1.internalapi.localdomain is UP, reason: Layer7 check passed, code: 200, info: "OK", check duration: 0ms. 2 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.

* on controller-0, where we run the api db_sync, placement gets started

Oct 25 10:09:33 overcloud-controller-0 haproxy[11]: Server nova_placement/overcloud-controller-0.internalapi.localdomain is UP, reason: Layer7 check passed, code: 200, info: "OK", check duration: 0ms. 3 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.

From /var/log/containers/httpd/nova-placement/placement_wsgi_error.log:

* controller-1:

[Tue Oct 30 00:01:25.983130 2018] [:error] [pid 15] [remote 172.17.0.11:120] mod_wsgi (pid=15): Target WSGI script '/var/www/cgi-bin/nova/nova-placement-api' cannot be loaded as Python module.
[Tue Oct 30 00:01:25.983164 2018] [:error] [pid 15] [remote 172.17.0.11:120] mod_wsgi (pid=15): Exception occurred processing WSGI script '/var/www/cgi-bin/nova/nova-placement-api'.
[Tue Oct 30 00:01:25.983191 2018] [:error] [pid 15] [remote 172.17.0.11:120] Traceback (most recent call last):
[Tue Oct 30 00:01:25.983204 2018] [:error] [pid 15] [remote 172.17.0.11:120]   File "/var/www/cgi-bin/nova/nova-placement-api", line 54, in <module>
[Tue Oct 30 00:01:25.983228 2018] [:error] [pid 15] [remote 172.17.0.11:120]     application = init_application()
[Tue Oct 30 00:01:25.983235 2018] [:error] [pid 15] [remote 172.17.0.11:120]   File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/wsgi.py", line 108, in init_application
[Tue Oct 30 00:01:25.983254 2018] [:error] [pid 15] [remote 172.17.0.11:120]     db_api.configure(conf.CONF)
[Tue Oct 30 00:01:25.983259 2018] [:error] [pid 15] [remote 172.17.0.11:120]   File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/db_api.py", line 35, in configure
[Tue Oct 30 00:01:25.983272 2018] [:error] [pid 15] [remote 172.17.0.11:120]     **_get_db_conf(conf.placement_database))
[Tue Oct 30 00:01:25.983278 2018] [:error] [pid 15] [remote 172.17.0.11:120]   File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 788, in configure
[Tue Oct 30 00:01:25.983309 2018] [:error] [pid 15] [remote 172.17.0.11:120]     self._factory.configure(**kw)
[Tue Oct 30 00:01:25.983319 2018] [:error] [pid 15] [remote 172.17.0.11:120]   File "/usr/lib/python2.7/site-packages/debtcollector/renames.py", line 43, in decorator
[Tue Oct 30 00:01:25.983333 2018] [:error] [pid 15] [remote 172.17.0.11:120]     return wrapped(*args, **kwargs)
[Tue Oct 30 00:01:25.983339 2018] [:error] [pid 15] [remote 172.17.0.11:120]   File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 312, in configure
[Tue Oct 30 00:01:25.983350 2018] [:error] [pid 15] [remote 172.17.0.11:120]     self._configure(False, kw)
[Tue Oct 30 00:01:25.983355 2018] [:error] [pid 15] [remote 172.17.0.11:120]   File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 317, in _configure
[Tue Oct 30 00:01:25.983365 2018] [:error] [pid 15] [remote 172.17.0.11:120]     raise TypeError("this TransactionFactory is already started")
[Tue Oct 30 00:01:25.983378 2018] [:error] [pid 15] [remote 172.17.0.11:120] TypeError: this TransactionFactory is already started
[Tue Oct 30 00:01:37.259985 2018] [:error] [pid 15] [remote 172.17.0.14:120] mod_wsgi (pid=15): Target WSGI script '/var/www/cgi-bin/nova/nova-placement-api' cannot be loaded as Python module.

* controller-2:

[Tue Oct 30 00:00:42.921489 2018] [:error] [pid 15] [remote 172.17.0.11:72] mod_wsgi (pid=15): Target WSGI script '/var/www/cgi-bin/nova/nova-placement-api' cannot be loaded as Python module.
[Tue Oct 30 00:00:42.921517 2018] [:error] [pid 15] [remote 172.17.0.11:72] mod_wsgi (pid=15): Exception occurred processing WSGI script '/var/www/cgi-bin/nova/nova-placement-api'.
[Tue Oct 30 00:00:42.921541 2018] [:error] [pid 15] [remote 172.17.0.11:72] Traceback (most recent call last):
[Tue Oct 30 00:00:42.921559 2018] [:error] [pid 15] [remote 172.17.0.11:72]   File "/var/www/cgi-bin/nova/nova-placement-api", line 54, in <module>
[Tue Oct 30 00:00:42.921592 2018] [:error] [pid 15] [remote 172.17.0.11:72]     application = init_application()
[Tue Oct 30 00:00:42.921603 2018] [:error] [pid 15] [remote 172.17.0.11:72]   File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/wsgi.py", line 108, in init_application
[Tue Oct 30 00:00:42.921632 2018] [:error] [pid 15] [remote 172.17.0.11:72]     db_api.configure(conf.CONF)
[Tue Oct 30 00:00:42.921642 2018] [:error] [pid 15] [remote 172.17.0.11:72]   File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/db_api.py", line 35, in configure
[Tue Oct 30 00:00:42.921663 2018] [:error] [pid 15] [remote 172.17.0.11:72]     **_get_db_conf(conf.placement_database))
[Tue Oct 30 00:00:42.921673 2018] [:error] [pid 15] [remote 172.17.0.11:72]   File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 788, in configure
[Tue Oct 30 00:00:42.921695 2018] [:error] [pid 15] [remote 172.17.0.11:72]     self._factory.configure(**kw)
[Tue Oct 30 00:00:42.921704 2018] [:error] [pid 15] [remote 172.17.0.11:72]   File "/usr/lib/python2.7/site-packages/debtcollector/renames.py", line 43, in decorator
[Tue Oct 30 00:00:42.921725 2018] [:error] [pid 15] [remote 172.17.0.11:72]     return wrapped(*args, **kwargs)
[Tue Oct 30 00:00:42.921735 2018] [:error] [pid 15] [remote 172.17.0.11:72]   File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 312, in configure
[Tue Oct 30 00:00:42.921755 2018] [:error] [pid 15] [remote 172.17.0.11:72]     self._configure(False, kw)
[Tue Oct 30 00:00:42.921764 2018] [:error] [pid 15] [remote 172.17.0.11:72]   File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 317, in _configure
[Tue Oct 30 00:00:42.921784 2018] [:error] [pid 15] [remote 172.17.0.11:72]     raise TypeError("this TransactionFactory is already started")
[Tue Oct 30 00:00:42.921799 2018] [:error] [pid 15] [remote 172.17.0.11:72] TypeError: this TransactionFactory is already started

Only controller-0 shows no errors, since nova-placement there was started correctly after the api db_sync. After restarting the nova_placement container on ctrl1+2 we can start instances on the overcloud.

This is fixed with BZ1636399 / openstack-tripleo-heat-templates-9.0.1-0.20181013060866.ffbe879.el7ost, where we make sure nova-placement gets started on ctrl1+2 only after the nova api db_sync has finished on ctrl0.

*** This bug has been marked as a duplicate of bug 1636399 ***
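For reference, the TypeError above is just oslo.db refusing a second configure() once the enginefacade transaction context has already been started by its first use (e.g. by an earlier, failed init_application() attempt in the same wsgi process). A minimal standalone sketch of that behaviour, using plain oslo.db rather than the nova placement code itself:

```python
from oslo_db.sqlalchemy import enginefacade

# A module-global transaction context, similar to the one placement's db_api keeps.
context_manager = enginefacade.transaction_context()
context_manager.configure(connection="sqlite://")  # first configure(): fine

# First real use lazily starts the underlying TransactionFactory.
engine = context_manager.writer.get_engine()

# A later configure() on the already-started factory is rejected, which is
# the error mod_wsgi logged on ctrl1+2.
try:
    context_manager.configure(connection="sqlite://")
except TypeError as exc:
    print(exc)  # "this TransactionFactory is already started"
```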