Bug 1641877
Summary: | [UPGRADES][14] Failed to spawn vm after controllers upgrade | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Yurii Prokulevych <yprokule> |
Component: | openstack-nova | Assignee: | OSP DFG:Compute <osp-dfg-compute> |
Status: | CLOSED DUPLICATE | QA Contact: | OSP DFG:Compute <osp-dfg-compute> |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | ||
Version: | 14.0 (Rocky) | CC: | augol, berrange, ccamacho, dasmith, eglynn, jhakimra, kchamart, lyarwood, mbooth, sbauza, sferdjao, sgolovat, sgordon, vromanso |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2018-10-25 12:41:12 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Yurii Prokulevych
2018-10-23 05:39:18 UTC
Restarting nova containers on controllers didn't help. Was able to spawn vm after computes got upgraded to 14 In the placement log for controllers 1 and 2, there are a ton of database-related errors, which appear to be causing every query to result in a 500 (as reported). Here are two of the types of errors I see, quoted from controller 1's log: [Mon Oct 22 12:02:54.381200 2018] [:error] [pid 15] [remote 192.168.172.12:92] mod_wsgi (pid=15): Exception occurred processing WSGI script '/var/www/cgi-bin/nova/nova-placement-api '. [Mon Oct 22 12:02:54.381234 2018] [:error] [pid 15] [remote 192.168.172.12:92] Traceback (most recent call last): [Mon Oct 22 12:02:54.381257 2018] [:error] [pid 15] [remote 192.168.172.12:92] File "/var/www/cgi-bin/nova/nova-placement-api", line 54, in <module> [Mon Oct 22 12:02:54.381298 2018] [:error] [pid 15] [remote 192.168.172.12:92] application = init_application() [Mon Oct 22 12:02:54.381313 2018] [:error] [pid 15] [remote 192.168.172.12:92] File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/wsgi.py", line 120, in init_appl ication [Mon Oct 22 12:02:54.381345 2018] [:error] [pid 15] [remote 192.168.172.12:92] return deploy.loadapp(conf.CONF) [Mon Oct 22 12:02:54.381358 2018] [:error] [pid 15] [remote 192.168.172.12:92] File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/deploy.py", line 119, in loadapp [Mon Oct 22 12:02:54.381485 2018] [:error] [pid 15] [remote 192.168.172.12:92] update_database() [Mon Oct 22 12:02:54.381502 2018] [:error] [pid 15] [remote 192.168.172.12:92] File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/deploy.py", line 99, in update_d atabase [Mon Oct 22 12:02:54.381531 2018] [:error] [pid 15] [remote 192.168.172.12:92] resource_provider.ensure_trait_sync(ctx) [Mon Oct 22 12:02:54.381546 2018] [:error] [pid 15] [remote 192.168.172.12:92] File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/objects/resource_provider.py", l ine 146, in ensure_trait_sync [Mon Oct 22 12:02:54.382796 2018] [:error] [pid 15] [remote 192.168.172.12:92] _trait_sync(ctx) [Mon Oct 22 12:02:54.382827 2018] [:error] [pid 15] [remote 192.168.172.12:92] File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 154, in wrapper [Mon Oct 22 12:02:54.382987 2018] [:error] [pid 15] [remote 192.168.172.12:92] ectxt.value = e.inner_exc [Mon Oct 22 12:02:54.383006 2018] [:error] [pid 15] [remote 192.168.172.12:92] File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__ [Mon Oct 22 12:02:54.383172 2018] [:error] [pid 15] [remote 192.168.172.12:92] self.force_reraise() [Mon Oct 22 12:02:54.383190 2018] [:error] [pid 15] [remote 192.168.172.12:92] File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise [Mon Oct 22 12:02:54.383218 2018] [:error] [pid 15] [remote 192.168.172.12:92] six.reraise(self.type_, self.value, self.tb) [Mon Oct 22 12:02:54.383231 2018] [:error] [pid 15] [remote 192.168.172.12:92] File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 142, in wrapper [Mon Oct 22 12:02:54.383254 2018] [:error] [pid 15] [remote 192.168.172.12:92] return f(*args, **kwargs) [Mon Oct 22 12:02:54.383267 2018] [:error] [pid 15] [remote 192.168.172.12:92] File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 993, in wrapper [Mon Oct 22 12:02:54.383674 2018] [:error] [pid 15] [remote 192.168.172.12:92] return fn(*args, **kwargs) [Mon Oct 22 12:02:54.383699 2018] [:error] [pid 15] [remote 192.168.172.12:92] File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/objects/resource_provider.py", l ine 102, in _trait_sync [Mon Oct 22 12:02:54.383732 2018] [:error] [pid 15] [remote 192.168.172.12:92] res = ctx.session.execute(sel).fetchall() [Mon Oct 22 12:02:54.383749 2018] [:error] [pid 15] [remote 192.168.172.12:92] File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 1176, in execute [Mon Oct 22 12:02:54.384573 2018] [:error] [pid 15] [remote 192.168.172.12:92] bind, close_with_result=True).execute(clause, params or {}) [Mon Oct 22 12:02:54.384593 2018] [:error] [pid 15] [remote 192.168.172.12:92] File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 948, in execute [Mon Oct 22 12:02:54.385211 2018] [:error] [pid 15] [remote 192.168.172.12:92] return meth(self, multiparams, params) [Mon Oct 22 12:02:54.385236 2018] [:error] [pid 15] [remote 192.168.172.12:92] File "/usr/lib64/python2.7/site-packages/sqlalchemy/sql/elements.py", line 269, in _execute_on_conne ction [Mon Oct 22 12:02:54.386328 2018] [:error] [pid 15] [remote 192.168.172.12:92] return connection._execute_clauseelement(self, multiparams, params) [Mon Oct 22 12:02:54.386355 2018] [:error] [pid 15] [remote 192.168.172.12:92] File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1060, in _execute_clauseel ement [Mon Oct 22 12:02:54.386388 2018] [:error] [pid 15] [remote 192.168.172.12:92] compiled_sql, distilled_params [Mon Oct 22 12:02:54.386402 2018] [:error] [pid 15] [remote 192.168.172.12:92] File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context [Mon Oct 22 12:02:54.386425 2018] [:error] [pid 15] [remote 192.168.172.12:92] context) [Mon Oct 22 12:02:54.386438 2018] [:error] [pid 15] [remote 192.168.172.12:92] File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1409, in _handle_dbapi_exc eption [Mon Oct 22 12:02:54.386461 2018] [:error] [pid 15] [remote 192.168.172.12:92] util.raise_from_cause(newraise, exc_info) [Mon Oct 22 12:02:54.386474 2018] [:error] [pid 15] [remote 192.168.172.12:92] File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause [Mon Oct 22 12:02:54.386600 2018] [:error] [pid 15] [remote 192.168.172.12:92] reraise(type(exception), exception, tb=exc_tb, cause=cause) [Mon Oct 22 12:02:54.386617 2018] [:error] [pid 15] [remote 192.168.172.12:92] File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context [Mon Oct 22 12:02:54.386644 2018] [:error] [pid 15] [remote 192.168.172.12:92] context) [Mon Oct 22 12:02:54.386681 2018] [:error] [pid 15] [remote 192.168.172.12:92] File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 507, in do_execute [Mon Oct 22 12:02:54.387083 2018] [:error] [pid 15] [remote 192.168.172.12:92] cursor.execute(statement, parameters) [Mon Oct 22 12:02:54.387103 2018] [:error] [pid 15] [remote 192.168.172.12:92] File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 166, in execute [Mon Oct 22 12:02:54.387307 2018] [:error] [pid 15] [remote 192.168.172.12:92] result = self._query(query) [Mon Oct 22 12:02:54.387325 2018] [:error] [pid 15] [remote 192.168.172.12:92] File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 322, in _query [Mon Oct 22 12:02:54.387352 2018] [:error] [pid 15] [remote 192.168.172.12:92] conn.query(q) [Mon Oct 22 12:02:54.387366 2018] [:error] [pid 15] [remote 192.168.172.12:92] File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 856, in query [Mon Oct 22 12:02:54.387827 2018] [:error] [pid 15] [remote 192.168.172.12:92] self._affected_rows = self._read_query_result(unbuffered=unbuffered) [Mon Oct 22 12:02:54.387850 2018] [:error] [pid 15] [remote 192.168.172.12:92] File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1057, in _read_query_result [Mon Oct 22 12:02:54.387879 2018] [:error] [pid 15] [remote 192.168.172.12:92] result.read() [Mon Oct 22 12:02:54.387897 2018] [:error] [pid 15] [remote 192.168.172.12:92] File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1340, in read [Mon Oct 22 12:02:54.387922 2018] [:error] [pid 15] [remote 192.168.172.12:92] first_packet = self.connection._read_packet() [Mon Oct 22 12:02:54.387935 2018] [:error] [pid 15] [remote 192.168.172.12:92] File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1014, in _read_packet [Mon Oct 22 12:02:54.387958 2018] [:error] [pid 15] [remote 192.168.172.12:92] packet.check_error() [Mon Oct 22 12:02:54.387971 2018] [:error] [pid 15] [remote 192.168.172.12:92] File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 393, in check_error [Mon Oct 22 12:02:54.388007 2018] [:error] [pid 15] [remote 192.168.172.12:92] err.raise_mysql_exception(self._data) [Mon Oct 22 12:02:54.388022 2018] [:error] [pid 15] [remote 192.168.172.12:92] File "/usr/lib/python2.7/site-packages/pymysql/err.py", line 107, in raise_mysql_exception [Mon Oct 22 12:02:54.388122 2018] [:error] [pid 15] [remote 192.168.172.12:92] raise errorclass(errno, errval) [Mon Oct 22 12:02:54.388213 2018] [:error] [pid 15] [remote 192.168.172.12:92] ProgrammingError: (pymysql.err.ProgrammingError) (1146, u"Table 'nova_placement.traits' doesn't exist" ) [SQL: u'SELECT traits.name \\nFROM traits'] (Background on this error at: http://sqlalche.me/e/f405) [Mon Oct 22 14:04:06.902733 2018] [:error] [pid 16] [remote 192.168.172.18:84] mod_wsgi (pid=16): Exception occurred processing WSGI script '/var/www/cgi-bin/nova/nova-placement-api'. [Mon Oct 22 14:04:06.902761 2018] [:error] [pid 16] [remote 192.168.172.18:84] Traceback (most recent call last): [Mon Oct 22 14:04:06.902784 2018] [:error] [pid 16] [remote 192.168.172.18:84] File "/var/www/cgi-bin/nova/nova-placement-api", line 54, in <module> [Mon Oct 22 14:04:06.902827 2018] [:error] [pid 16] [remote 192.168.172.18:84] application = init_application() [Mon Oct 22 14:04:06.902841 2018] [:error] [pid 16] [remote 192.168.172.18:84] File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/wsgi.py", line 108, in init_application [Mon Oct 22 14:04:06.902875 2018] [:error] [pid 16] [remote 192.168.172.18:84] db_api.configure(conf.CONF) [Mon Oct 22 14:04:06.902889 2018] [:error] [pid 16] [remote 192.168.172.18:84] File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/db_api.py", line 35, in configure [Mon Oct 22 14:04:06.902915 2018] [:error] [pid 16] [remote 192.168.172.18:84] **_get_db_conf(conf.placement_database)) [Mon Oct 22 14:04:06.902928 2018] [:error] [pid 16] [remote 192.168.172.18:84] File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 788, in configure [Mon Oct 22 14:04:06.902955 2018] [:error] [pid 16] [remote 192.168.172.18:84] self._factory.configure(**kw) [Mon Oct 22 14:04:06.902968 2018] [:error] [pid 16] [remote 192.168.172.18:84] File "/usr/lib/python2.7/site-packages/debtcollector/renames.py", line 43, in decorator [Mon Oct 22 14:04:06.903001 2018] [:error] [pid 16] [remote 192.168.172.18:84] return wrapped(*args, **kwargs) [Mon Oct 22 14:04:06.903014 2018] [:error] [pid 16] [remote 192.168.172.18:84] File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 312, in configure [Mon Oct 22 14:04:06.903039 2018] [:error] [pid 16] [remote 192.168.172.18:84] self._configure(False, kw) [Mon Oct 22 14:04:06.903051 2018] [:error] [pid 16] [remote 192.168.172.18:84] File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 317, in _configure [Mon Oct 22 14:04:06.903082 2018] [:error] [pid 16] [remote 192.168.172.18:84] raise TypeError("this TransactionFactory is already started") [Mon Oct 22 14:04:06.903105 2018] [:error] [pid 16] [remote 192.168.172.18:84] TypeError: this TransactionFactory is already started Controller 0 seems to be fine. I see no indication that this would be related to the computes being un-upgraded. Did you try restarting the httpd container on all controllers? Is it possible that you didn't, but that happened as a result of doing the compute upgrade? Compute upgrade doesn't touches controller nodes. I cannot find httpd container on controller nodes, could U please elaborate which one should I try restarting next time? (tried nova_metadata nova_api nova_conductor nova_vnc_proxy nova_consoleauth nova_api_cron nova_scheduler nova_placement) DB Errors most likely are related to controller upgrades(nodes are stopped and upgraded one-by-one) I'll try to get in touch with U next time I have env handy. Thanks! These logs are from sosreport-failed-placemant-boot-upg-controller-1-20181022135145/var/log/containers/httpd/nova-placement/placement_wsgi_error.log So I would expect that whatever container you need to restart to bump those services would be relevant. The errors in those logs are definitely related to why scheduler gets 500s from placement, and seem to indicate that placement on those controllers either can't talk to the database, or is talking to the wrong one. Looking at the placement access log on those controllers, many requests were made, all of which failed with 500. So I don't think these errors were just transient during the upgrade -- there is no indication that those placement services worked at all. (In reply to Dan Smith from comment #4) > "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/db_api.py", > line 35, in configure > [Mon Oct 22 14:04:06.902915 2018] [:error] [pid 16] [remote > 192.168.172.18:84] **_get_db_conf(conf.placement_database)) > [Mon Oct 22 14:04:06.902928 2018] [:error] [pid 16] [remote > 192.168.172.18:84] File > "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line > 788, in configure > [Mon Oct 22 14:04:06.902955 2018] [:error] [pid 16] [remote > 192.168.172.18:84] self._factory.configure(**kw) > [Mon Oct 22 14:04:06.902968 2018] [:error] [pid 16] [remote > 192.168.172.18:84] File > "/usr/lib/python2.7/site-packages/debtcollector/renames.py", line 43, in > decorator > [Mon Oct 22 14:04:06.903001 2018] [:error] [pid 16] [remote > 192.168.172.18:84] return wrapped(*args, **kwargs) > [Mon Oct 22 14:04:06.903014 2018] [:error] [pid 16] [remote > 192.168.172.18:84] File > "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line > 312, in configure > [Mon Oct 22 14:04:06.903039 2018] [:error] [pid 16] [remote > 192.168.172.18:84] self._configure(False, kw) > [Mon Oct 22 14:04:06.903051 2018] [:error] [pid 16] [remote > 192.168.172.18:84] File > "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line > 317, in _configure > [Mon Oct 22 14:04:06.903082 2018] [:error] [pid 16] [remote > 192.168.172.18:84] raise TypeError("this TransactionFactory is already > started") > [Mon Oct 22 14:04:06.903105 2018] [:error] [pid 16] [remote > 192.168.172.18:84] TypeError: this TransactionFactory is already started This looks like a dup of bug 1630069 to me, which is caused by starting placement before db_sync completes. There are 2 different patches in progress for that. *** This bug has been marked as a duplicate of bug 1630069 *** |