Hide Forgot
+++ This bug was initially created as a clone of Bug #1373105 +++ Description of problem: Upstream bug https://bugs.launchpad.net/heat/+bug/1499669 Heat stucks in Delete_IN_PROGRESS in case of specific creation/deletion sequence of a stack. Version-Release number of selected component (if applicable): RHOSP 10 - Newton How reproducible: In specific Steps to Reproduce: 1.Create the stack: rg.yaml: heat_template_version: 2013-05-23 resources: rg: type: OS::Heat::ResourceGroup properties: count: 125 resource_def: type: rand_str.yaml rand_str.yaml: heat_template_version: 2013-05-23 resources: (yep, without any resource:) ) 2. Run the stack: heat stack-create abc --template-file rg.yaml wait about 20-30 sec and run: heat stack-delete abc (before stack will be in CREATE_COMPLETE) Actual results: 2015-09-25 12:59:15.450 ERROR heat.engine.resource [-] DB error Not found or 2015-09-25 13:04:52.109 ERROR heat.engine.resource [-] DB error This result object does not return rows. It has been closed automatically. 2015-09-25 13:04:52.110 ERROR sqlalchemy.pool.QueuePool [-] Exception during reset or similar 2015-09-25 13:04:52.110 TRACE sqlalchemy.pool.QueuePool Traceback (most recent call last): 2015-09-25 13:04:52.110 TRACE sqlalchemy.pool.QueuePool File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 636, in _finalize_fairy 2015-09-25 13:04:52.110 TRACE sqlalchemy.pool.QueuePool fairy._reset(pool) 2015-09-25 13:04:52.110 TRACE sqlalchemy.pool.QueuePool File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 776, in _reset 2015-09-25 13:04:52.110 TRACE sqlalchemy.pool.QueuePool pool._dialect.do_rollback(self) 2015-09-25 13:04:52.110 TRACE sqlalchemy.pool.QueuePool File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/dialects/mysql/base.py", line 2519, in do_rollback 2015-09-25 13:04:52.110 TRACE sqlalchemy.pool.QueuePool dbapi_connection.rollback() 2015-09-25 13:04:52.110 TRACE sqlalchemy.pool.QueuePool File "/usr/local/lib/python2.7/dist-packages/pymysql/connections.py", line 711, in rollback 2015-09-25 13:04:52.110 TRACE sqlalchemy.pool.QueuePool self._read_ok_packet() 2015-09-25 13:04:52.110 TRACE sqlalchemy.pool.QueuePool File "/usr/local/lib/python2.7/dist-packages/pymysql/connections.py", line 687, in _read_ok_packet 2015-09-25 13:04:52.110 TRACE sqlalchemy.pool.QueuePool raise err.OperationalError(2014, "Command Out of Sync") 2015-09-25 13:04:52.110 TRACE sqlalchemy.pool.QueuePool OperationalError: (2014, 'Command Out of Sync') And: 2016-06-04 02:51:54.091 3205 INFO heat.engine.stack [req-d3d7848b-ea6e-4557-8759-37626e7f5f63 demo demo - default default] Stopped due to GreenletExit() in create Traceback (most recent call last): File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 457, in fire_timers timer() File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/timer.py", line 58, in __call__ cb(*args, **kw) File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 217, in main self._resolve_links() File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 232, in _resolve_links f(self, *ca, **ckw) File "/opt/stack/new/heat/heat/engine/service.py", line 187, in release stack.persist_state_and_release_lock(lock.engine_id) File "/opt/stack/new/heat/heat/engine/stack.py", line 913, in persist_state_and_release_lock stack = stack_object.Stack.get_by_id(self.context, self.id) File "/opt/stack/new/heat/heat/objects/stack.py", line 85, in get_by_id db_stack = db_api.stack_get(context, stack_id, **kwargs) File "/opt/stack/new/heat/heat/db/api.py", line 146, in stack_get eager_load=eager_load) File "/opt/stack/new/heat/heat/db/sqlalchemy/api.py", line 401, in stack_get result = query.get(stack_id) File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 831, in get return self._get_impl(ident, loading.load_on_ident) File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 864, in _get_impl return fallback_fn(self, key) File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/loading.py", line 219, in load_on_ident return q.one() File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2718, in one ret = list(self) File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2761, in __iter__ return self._execute_and_instances(context) File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2774, in _execute_and_instances close_with_result=True) File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2765, in _connection_from_session **kw) File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 893, in connection execution_options=execution_options) File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 900, in _connection_for_bind conn = engine.contextual_connect(**kw) File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 2041, in contextual_connect **kwargs) File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 92, in __init__ self.dispatch.engine_connect(self, self.__branch) File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/event/attr.py", line 256, in __call__ fn(*args, **kw) File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/engines.py", line 72, in _connect_ping_listener connection.scalar(select([1])) File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 844, in scalar return self.execute(object, *multiparams, **params).scalar() File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/result.py", line 1066, in scalar row = self.first() File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/result.py", line 1040, in first return self._non_result(None) File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/result.py", line 922, in _non_result "This result object does not return rows. " ResourceClosedError: This result object does not return rows. It has been closed automatically. 2016-06-04 02:51:54.129 3205 ERROR sqlalchemy.pool.QueuePool [-] Exception during reset or similar 2016-06-04 02:51:54.129 3205 ERROR sqlalchemy.pool.QueuePool Traceback (most recent call last): 2016-06-04 02:51:54.129 3205 ERROR sqlalchemy.pool.QueuePool File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 636, in _finalize_fairy 2016-06-04 02:51:54.129 3205 ERROR sqlalchemy.pool.QueuePool fairy._reset(pool) 2016-06-04 02:51:54.129 3205 ERROR sqlalchemy.pool.QueuePool File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 776, in _reset 2016-06-04 02:51:54.129 3205 ERROR sqlalchemy.pool.QueuePool pool._dialect.do_rollback(self) 2016-06-04 02:51:54.129 3205 ERROR sqlalchemy.pool.QueuePool File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/dialects/mysql/base.py", line 2542, in do_rollback 2016-06-04 02:51:54.129 3205 ERROR sqlalchemy.pool.QueuePool dbapi_connection.rollback() 2016-06-04 02:51:54.129 3205 ERROR sqlalchemy.pool.QueuePool File "/usr/local/lib/python2.7/dist-packages/pymysql/connections.py", line 772, in rollback 2016-06-04 02:51:54.129 3205 ERROR sqlalchemy.pool.QueuePool self._read_ok_packet() 2016-06-04 02:51:54.129 3205 ERROR sqlalchemy.pool.QueuePool File "/usr/local/lib/python2.7/dist-packages/pymysql/connections.py", line 748, in _read_ok_packet 2016-06-04 02:51:54.129 3205 ERROR sqlalchemy.pool.QueuePool raise err.OperationalError(2014, "Command Out of Sync") 2016-06-04 02:51:54.129 3205 ERROR sqlalchemy.pool.QueuePool OperationalError: (2014, 'Command Out of Sync') 2016-06-04 02:51:54.129 3205 ERROR sqlalchemy.pool.QueuePool Exception AssertionError: AssertionError('do not call blocking functions from the mainloop',) in <function <lambda> at 0x7f2b1241b6e0> ignored Expected results: Heat deletes the stack. No exceptions. Additional info: Reproduced also in RH-OSP8. ... --- Additional comment from Zane Bitter on 2016-09-21 21:19:09 EDT --- OK, there are 3 things going on here: 1) Database errors due to exit exceptions not being handled by sqlalchemy/PyMySQL. Mike is fixing that in sqlalchemy here: https://bitbucket.org/zzzeek/sqlalchemy/issues/3803/dbapi-connections-go-invalid-on and there's a good chance the fix will be in sqlalchemy 1.1. It's unlikely anybody is going to want to go back and use that with Liberty OpenStack though. Fortunately this isn't the biggest problem here - losing writes to the database sucks, but Heat almost always handles it pretty gracefully. 2) When we fail to start deleting the ResourceGroup nested stack, the parent stack should immediately transition to DELETE_FAILED, rather than remain DELETE_IN_PROGRESS until it times out. I've proposed a fix at https://review.openstack.org/374442 - it's trivial and will be easy to backport. 3) The *real* problem. When we cancel the in-progress creation of the ResourceGroup nested stack, there is a super-short timeout (2s by default) during which all of the tasks have to be cancelled, and if they're not cancelled by then Heat gives up and won't start the delete. On a stack this size, that just isn't long enough. You can work around this by setting the config option "engine_life_check_timeout" to a larger value (e.g. 30s). I've proposed a patch to not use that option for this particular timeout, and to use 30s instead: https://review.openstack.org/374443 - it's also very simple and will be easy to backport.
*** Bug 1347011 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-2986.html