Description of problem: cinder-volume hangs after all controller nodes reset on Openstack deployment which consists of three controller nodes which host all the controller services including haproxy/DB/rabbitmq/cinder-volume. There is a traceback in cinder-volume log right before the hang related to DB connection lost (see below), more logs will be attached with sosreport. I tried to reproduce by stopping all galera resources and restarting cinder-volume but I did not mange to reproduce. The problem I can see constantly on OSP12 and OSP13 disaster recovery jobs. 2018-04-12 11:20:54.551 76 ERROR oslo_service.service [req-649ca7af-76e4-4fbf-8eb3-09c5dbc5eea5 - - - - -] Error starting thread.: DBConnectionError: (pymysql .err.OperationalError) (2013, 'Lost connection to MySQL server during query') [SQL: u'UPDATE services SET updated_at=%(updated_at)s, replication_status=%(repl ication_status)s WHERE services.deleted = false AND services.id = %(id_1)s'] [parameters: {u'id_1': 5, 'replication_status': u'disabled', 'updated_at': dateti me.datetime(2018, 4, 12, 11, 20, 53, 708739)}] (Background on this error at: http://sqlalche.me/e/e3q8) 2018-04-12 11:20:54.551 76 ERROR oslo_service.service Traceback (most recent call last): 2018-04-12 11:20:54.551 76 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 729, in run_service 2018-04-12 11:20:54.551 76 ERROR oslo_service.service service.start() 2018-04-12 11:20:54.551 76 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/cinder/service.py", line 267, in start 2018-04-12 11:20:54.551 76 ERROR oslo_service.service self.manager.init_host_with_rpc() 2018-04-12 11:20:54.551 76 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/cinder/volume/manager.py", line 556, in init_host_with_rpc 2018-04-12 11:20:54.551 76 ERROR oslo_service.service service.save() 2018-04-12 11:20:54.551 76 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/cinder/objects/service.py", line 181, in save 2018-04-12 11:20:54.551 76 ERROR oslo_service.service db.service_update(self._context, self.id, updates) 2018-04-12 11:20:54.551 76 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/cinder/db/api.py", line 154, in service_update 2018-04-12 11:20:54.551 76 ERROR oslo_service.service return IMPL.service_update(context, service_id, values) 2018-04-12 11:20:54.551 76 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/cinder/db/sqlalchemy/api.py", line 164, in wrapper 2018-04-12 11:20:54.551 76 ERROR oslo_service.service return f(*args, **kwargs) 2018-04-12 11:20:54.551 76 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 147, in wrapper 2018-04-12 11:20:54.551 76 ERROR oslo_service.service ectxt.value = e.inner_exc 2018-04-12 11:20:54.551 76 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__ 2018-04-12 11:20:54.551 76 ERROR oslo_service.service self.force_reraise() 2018-04-12 11:20:54.551 76 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise 2018-04-12 11:20:54.551 76 ERROR oslo_service.service six.reraise(self.type_, self.value, self.tb) 2018-04-12 11:20:54.551 76 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 135, in wrapper 2018-04-12 11:20:54.551 76 ERROR oslo_service.service return f(*args, **kwargs) 2018-04-12 11:20:54.551 76 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/cinder/db/sqlalchemy/api.py", line 560, in service_update 2018-04-12 11:20:54.551 76 ERROR oslo_service.service result = query.update(values) 2018-04-12 11:20:54.551 76 ERROR oslo_service.service File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 3369, in update 2018-04-12 11:20:54.551 76 ERROR oslo_service.service update_op.exec_() 2018-04-12 11:20:54.551 76 ERROR oslo_service.service File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 1325, in exec_ 2018-04-12 11:20:54.551 76 ERROR oslo_service.service self._do_exec() 2018-04-12 11:20:54.551 76 ERROR oslo_service.service File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 1492, in _do_exec 2018-04-12 11:20:54.551 76 ERROR oslo_service.service self._execute_stmt(update_stmt) 2018-04-12 11:20:54.551 76 ERROR oslo_service.service File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 1332, in _execute_stmt 2018-04-12 11:20:54.551 76 ERROR oslo_service.service mapper=self.mapper) 2018-04-12 11:20:54.551 76 ERROR oslo_service.service File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 1176, in execute 2018-04-12 11:20:54.551 76 ERROR oslo_service.service bind, close_with_result=True).execute(clause, params or {}) 2018-04-12 11:20:54.551 76 ERROR oslo_service.service File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 948, in execute 2018-04-12 11:20:54.551 76 ERROR oslo_service.service return meth(self, multiparams, params) 2018-04-12 11:20:54.551 76 ERROR oslo_service.service File "/usr/lib64/python2.7/site-packages/sqlalchemy/sql/elements.py", line 269, in _execute_on_connection 2018-04-12 11:20:54.551 76 ERROR oslo_service.service return connection._execute_clauseelement(self, multiparams, params) 2018-04-12 11:20:54.551 76 ERROR oslo_service.service File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1060, in _execute_clauseelement 2018-04-12 11:20:54.551 76 ERROR oslo_service.service compiled_sql, distilled_params 2018-04-12 11:20:54.551 76 ERROR oslo_service.service File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context 2018-04-12 11:20:54.551 76 ERROR oslo_service.service context) 2018-04-12 11:20:54.551 76 ERROR oslo_service.service File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1409, in _handle_dbapi_exception 2018-04-12 11:20:54.551 76 ERROR oslo_service.service util.raise_from_cause(newraise, exc_info) 2018-04-12 11:20:54.551 76 ERROR oslo_service.service File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause 2018-04-12 11:20:54.551 76 ERROR oslo_service.service reraise(type(exception), exception, tb=exc_tb, cause=cause) 2018-04-12 11:20:54.551 76 ERROR oslo_service.service File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context 2018-04-12 11:20:54.551 76 ERROR oslo_service.service context) 2018-04-12 11:20:54.551 76 ERROR oslo_service.service File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 507, in do_execute 2018-04-12 11:20:54.551 76 ERROR oslo_service.service cursor.execute(statement, parameters) 2018-04-12 11:20:54.551 76 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 166, in execute 2018-04-12 11:20:54.551 76 ERROR oslo_service.service result = self._query(query) 2018-04-12 11:20:54.551 76 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 322, in _query 2018-04-12 11:20:54.551 76 ERROR oslo_service.service conn.query(q) 2018-04-12 11:20:54.551 76 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 856, in query 2018-04-12 11:20:54.551 76 ERROR oslo_service.service self._affected_rows = self._read_query_result(unbuffered=unbuffered) 2018-04-12 11:20:54.551 76 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1057, in _read_query_result 2018-04-12 11:20:54.551 76 ERROR oslo_service.service result.read() 2018-04-12 11:20:54.551 76 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1340, in read 2018-04-12 11:20:54.551 76 ERROR oslo_service.service first_packet = self.connection._read_packet() 2018-04-12 11:20:54.551 76 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 987, in _read_packet 2018-04-12 11:20:54.551 76 ERROR oslo_service.service packet_header = self._read_bytes(4) 2018-04-12 11:20:54.551 76 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1033, in _read_bytes 2018-04-12 11:20:54.551 76 ERROR oslo_service.service CR.CR_SERVER_LOST, "Lost connection to MySQL server during query") 2018-04-12 11:20:54.551 76 ERROR oslo_service.service DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') [SQL: u'UPDATE services SET updated_at=%(updated_at)s, replication_status=%(replication_status)s WHERE servi Version-Release number of selected component (if applicable): openstack-cinder-12.0.1-0.20180326201852.46c4ec1.el7ost.noarch How reproducible: Often Steps to Reproduce: 1. Deploy OSP with 3 controllers hosting openstack services including cinder-volume and DB/Messaging 2. Reset ungracefully all the controller nodes at once Actual results: cinder-volumes hangs at start up 2018-04-12 11:20:54.582 76 DEBUG oslo_concurrency.lockutils [req-649ca7af-76e4-4fbf-8eb3-09c5dbc5eea5 - - - - -] Acquired semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:212 2018-04-12 11:20:54.582 76 DEBUG oslo_concurrency.lockutils [req-649ca7af-76e4-4fbf-8eb3-09c5dbc5eea5 - - - - -] Releasing semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:228 2018-04-12 11:21:24.583 76 WARNING oslo_messaging.server [req-649ca7af-76e4-4fbf-8eb3-09c5dbc5eea5 - - - - -] Possible hang: wait is waiting for stop to complete Additional info: The hangs happened on controller-0 in attached logs.
Still under investigation, updating flags to target 13z.
I wonder if this BZ is a duplicate based on bug #1640388 comment #19. A fix for that issue was released in OSP-13z7 (bug #1693615).
I'm confident of the assessment I stated in comment #14. If the problem is observed again then this BZ can be reopened.