Bug 1307125
Summary: | Heat engine crashes during osp-d update from 7.1 and from 7.0 to 7.3 | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Toure Dunnon <tdunnon> | ||||||||
Component: | openstack-heat | Assignee: | Steve Baker <sbaker> | ||||||||
Status: | CLOSED ERRATA | QA Contact: | Amit Ugol <augol> | ||||||||
Severity: | high | Docs Contact: | |||||||||
Priority: | high | ||||||||||
Version: | 7.0 (Kilo) | CC: | adahms, athomas, jcoufal, mburns, ohochman, rhel-osp-director-maint, sasha, sbaker, sclewis, shardy, yeylon, zbitter | ||||||||
Target Milestone: | async | Keywords: | Reopened, ZStream | ||||||||
Target Release: | 7.0 (Kilo) | ||||||||||
Hardware: | x86_64 | ||||||||||
OS: | Linux | ||||||||||
Whiteboard: | |||||||||||
Fixed In Version: | openstack-heat-2015.1.2-10.el7ost | Doc Type: | Known Issue | ||||||||
Doc Text: |
If the database is not available when heat-engine starts, heat-engine fails to start, preventing heat-engine from running under certain circumstances. This occurs after you update a machine on which both the database and heat-engine are running using yum alone such as when updating the undercloud. As a workaround, heat-engine must be explicitly started by running 'systemctl start openstack-heat-engine.service'. You can confirm whether heat-engine is running by running 'systemctl status openstack-heat-engine.service'.
|
Story Points: | --- | ||||||||
Clone Of: | |||||||||||
: | 1308576 (view as bug list) | Environment: | |||||||||
Last Closed: | 2016-03-10 17:07:10 UTC | Type: | Bug | ||||||||
Regression: | --- | Mount Type: | --- | ||||||||
Documentation: | --- | CRM: | |||||||||
Verified Versions: | Category: | --- | |||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||
Embargoed: | |||||||||||
Bug Depends On: | |||||||||||
Bug Blocks: | 1308576 | ||||||||||
Attachments: |
|
Description
Toure Dunnon
2016-02-12 18:45:00 UTC
Created attachment 1123619 [details]
heat engine log backtrace
*** Bug 1307124 has been marked as a duplicate of this bug. *** Reproduced on 7.0 and 7.1: 7.0: Feb 12 12:01:49 Installed: openstack-heat-engine-2015.1.0-4.el7ost.noarch Feb 12 13:27:40 Updated: openstack-heat-engine-2015.1.2-9.el7ost.noarch 7.1: Feb 12 11:45:20 Installed: openstack-heat-engine-2015.1.1-5.el7ost.noarch Feb 12 13:16:10 Updated: openstack-heat-engine-2015.1.2-9.el7ost.noarch During the yum update the openstack-heat-engine service fails. The W/A is to start it. Created attachment 1123623 [details]
7.1 heat logs
Created attachment 1123624 [details]
7.0 heat logs
2016-02-12 13:18:46.487 13936 CRITICAL heat-engine [req-33db6d70-974c-4210-8311-b85ba49fb031 - -] DBConnectionError: (_mysql_exceptions.OperationalError) (2006, 'MySQL server has gone away') [SQL: u'SELECT watch _rule.created_at AS watch_rule_created_at, watch_rule.updated_at AS watch_rule_updated_at, watch_rule.id AS watch_rule_id, watch_rule.name AS watch_rule_name, watch_rule.rule AS watch_rule_rule, watch_rule.state AS watch_rule_state, watch_rule.last_evaluated AS watch_rule_last_evaluated, watch_rule.stack_id AS watch_rule_stack_id \nFROM watch_rule \nWHERE watch_rule.stack_id = %s'] [parameters: ('fda3526e-7dbe-4f23-bb3 d-a1b8b3bd0fa8',)] 2016-02-12 13:18:46.487 13936 TRACE heat-engine Traceback (most recent call last): 2016-02-12 13:18:46.487 13936 TRACE heat-engine File "/usr/bin/heat-engine", line 83, in <module> 2016-02-12 13:18:46.487 13936 TRACE heat-engine srv.create_periodic_tasks() 2016-02-12 13:18:46.487 13936 TRACE heat-engine File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 105, in wrapper 2016-02-12 13:18:46.487 13936 TRACE heat-engine return f(*args, **kwargs) 2016-02-12 13:18:46.487 13936 TRACE heat-engine File "/usr/lib/python2.7/site-packages/heat/engine/service.py", line 332, in create_periodic_tasks 2016-02-12 13:18:46.487 13936 TRACE heat-engine self.stack_watch.start_watch_task(s.id, admin_context) 2016-02-12 13:18:46.487 13936 TRACE heat-engine File "/usr/lib/python2.7/site-packages/heat/engine/service_stack_watch.py", line 57, in start_watch_task 2016-02-12 13:18:46.487 13936 TRACE heat-engine if stack_has_a_watchrule(stack_id): 2016-02-12 13:18:46.487 13936 TRACE heat-engine File "/usr/lib/python2.7/site-packages/heat/engine/service_stack_watch.py", line 52, in stack_has_a_watchrule 2016-02-12 13:18:46.487 13936 TRACE heat-engine if stack_has_a_watchrule(child.id): 2016-02-12 13:18:46.487 13936 TRACE heat-engine File "/usr/lib/python2.7/site-packages/heat/engine/service_stack_watch.py", line 52, in stack_has_a_watchrule 2016-02-12 13:18:46.487 13936 TRACE heat-engine if stack_has_a_watchrule(child.id): 2016-02-12 13:18:46.487 13936 TRACE heat-engine File "/usr/lib/python2.7/site-packages/heat/engine/service_stack_watch.py", line 52, in stack_has_a_watchrule 2016-02-12 13:18:46.487 13936 TRACE heat-engine if stack_has_a_watchrule(child.id): 2016-02-12 13:18:46.487 13936 TRACE heat-engine File "/usr/lib/python2.7/site-packages/heat/engine/service_stack_watch.py", line 36, in stack_has_a_watchrule 2016-02-12 13:18:46.487 13936 TRACE heat-engine wrs = watch_rule_object.WatchRule.get_all_by_stack(cnxt, sid) 2016-02-12 13:18:46.487 13936 TRACE heat-engine File "/usr/lib/python2.7/site-packages/heat/objects/watch_rule.py", line 77, in get_all_by_stack 2016-02-12 13:18:46.487 13936 TRACE heat-engine stack_id)] 2016-02-12 13:18:46.487 13936 TRACE heat-engine File "/usr/lib/python2.7/site-packages/heat/db/api.py", line 248, in watch_rule_get_all_by_stack 2016-02-12 13:18:46.487 13936 TRACE heat-engine return IMPL.watch_rule_get_all_by_stack(context, stack_id) 2016-02-12 13:18:46.487 13936 TRACE heat-engine File "/usr/lib/python2.7/site-packages/heat/db/sqlalchemy/api.py", line 718, in watch_rule_get_all_by_stack 2016-02-12 13:18:46.487 13936 TRACE heat-engine context, models.WatchRule).filter_by(stack_id=stack_id).all() 2016-02-12 13:18:46.487 13936 TRACE heat-engine File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 2399, in all 2016-02-12 13:18:46.487 13936 TRACE heat-engine return list(self) 2016-02-12 13:18:46.487 13936 TRACE heat-engine File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 2516, in __iter__ 2016-02-12 13:18:46.487 13936 TRACE heat-engine return self._execute_and_instances(context) 2016-02-12 13:18:46.487 13936 TRACE heat-engine File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 2531, in _execute_and_instances 2016-02-12 13:18:46.487 13936 TRACE heat-engine result = conn.execute(querycontext.statement, self._params) 2016-02-12 13:18:46.487 13936 TRACE heat-engine File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 914, in execute 2016-02-12 13:18:46.487 13936 TRACE heat-engine return meth(self, multiparams, params) 2016-02-12 13:18:46.487 13936 TRACE heat-engine File "/usr/lib64/python2.7/site-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection 2016-02-12 13:18:46.487 13936 TRACE heat-engine return connection._execute_clauseelement(self, multiparams, params) 2016-02-12 13:18:46.487 13936 TRACE heat-engine File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement 2016-02-12 13:18:46.487 13936 TRACE heat-engine compiled_sql, distilled_params 2016-02-12 13:18:46.487 13936 TRACE heat-engine File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1146, in _execute_context 2016-02-12 13:18:46.487 13936 TRACE heat-engine context) 2016-02-12 13:18:46.487 13936 TRACE heat-engine File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1337, in _handle_dbapi_exception 2016-02-12 13:18:46.487 13936 TRACE heat-engine util.raise_from_cause(newraise, exc_info) 2016-02-12 13:18:46.487 13936 TRACE heat-engine File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause 2016-02-12 13:18:46.487 13936 TRACE heat-engine reraise(type(exception), exception, tb=exc_tb) 2016-02-12 13:18:46.487 13936 TRACE heat-engine File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context 2016-02-12 13:18:46.487 13936 TRACE heat-engine context) 2016-02-12 13:18:46.487 13936 TRACE heat-engine File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute 2016-02-12 13:18:46.487 13936 TRACE heat-engine cursor.execute(statement, parameters) 2016-02-12 13:18:46.487 13936 TRACE heat-engine File "/usr/lib64/python2.7/site-packages/MySQLdb/cursors.py", line 174, in execute 2016-02-12 13:18:46.487 13936 TRACE heat-engine self.errorhandler(self, exc, value) 2016-02-12 13:18:46.487 13936 TRACE heat-engine File "/usr/lib64/python2.7/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler 2016-02-12 13:18:46.487 13936 TRACE heat-engine raise errorclass, errorvalue 2016-02-12 13:18:46.487 13936 TRACE heat-engine DBConnectionError: (_mysql_exceptions.OperationalError) (2006, 'MySQL server has gone away') [SQL: u'SELECT watch_rule.created_at AS watch_rule_created_at, watch_rule.updated_at AS watch_rule_updated_at, watch_rule.id AS watch_rule_id, watch_rule.name AS watch_rule_name, watch_rule.rule AS watch_rule_rule, watch_rule.state AS watch_rule_state, watch_rule.last_evaluated AS watch_rule_last_evaluated, watch_rule.stack_id AS watch_rule_stack_id \nFROM watch_rule \nWHERE watch_rule.stack_id = %s'] [parameters: ('fda3526e-7dbe-4f23-bb3d-a1b8b3bd0fa8',)] 2016-02-12 13:18:46.487 13936 TRACE heat-engine 2016-02-12 13:18:46.611 13995 INFO heat.openstack.common.service [-] Parent process has died unexpectedly, exiting 2016-02-12 13:18:46.611 13993 INFO heat.openstack.common.service [-] Parent process has died unexpectedly, exiting 2016-02-12 13:18:46.612 13995 DEBUG heat.engine.service [-] Attempting to stop engine service... _stop_rpc_server /usr/lib/python2.7/site-packages/heat/engine/service.py:373 2016-02-12 13:18:46.612 13993 DEBUG heat.engine.service [-] Attempting to stop engine service... _stop_rpc_server /usr/lib/python2.7/site-packages/heat/engine/service.py:373 2016-02-12 13:18:46.613 13992 INFO heat.openstack.common.service [-] Parent process has died unexpectedly, exiting Could be just bad timing with the database going down just as Heat is starting up maybe? Doctext supplied for the 7.3 release. I would like to explain how does it looks from the user side - and why IMO it can cause a very bad user experience . [stack@instack ~]$ date Mon Feb 15 12:31:57 EST 2016 [stack@instack ~]$ yes "" | openstack overcloud update stack overcloud -i --templates -e /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry-puppet.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e network-environment.yaml ... ...hangs ...hangs ...hangs ... ...10 minutes later fails on heat timeout .. That is the output :(non informative message ): ------------------- "ERROR: openstack ERROR: Timed out waiting for a reply to message ID aeeb27a4937d4825be6de2e5a6c57dbe looking at the heat log you don't see anything meaningful, even worse you see heat problem to connect to DB (you might think it's connectivity issue) : 2016-02-12 13:18:46.487 13936 CRITICAL heat-engine [req-33db6d70-974c-4210-8311-b85ba49fb031 - -] DBConnectionError: (_mysql_exceptions.OperationalError) (2006, 'MySQL server has gone away') [SQL: u'SELECT watch Only after digging enough into /var/log/messages - if you're lucky you'll notice the heat-engine service failed during the yum update (in messages it fails without a trace-back - so it hard to spot it) [root@instack ~]# openstack-status | grep heat openstack-heat-api: active openstack-heat-api-cfn: active openstack-heat-api-cloudwatch: active openstack-heat-engine: failed Warning keystonerc not sourced then apply workaround.. I'll look into fixing this by moving create_periodic_tasks to its own thread where it can retry until the db is available. This will allow systemd.notify to be called even when the db is down. Here is a proposed upstream fix, will backport now and test heat's behaviour without a database. This downstream change allowed heat-engine to start with a stopped mariadb, and then recover once mariadb was started 90 minutes later. Workaround available, not a blocker for z4. Though, I asked Lon if we can add the backported patch to the latest build for z4 if all possible. -- Jarda Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0435.html |