Bug 1549598
| Summary: | [RFE] Configure DB Lock Wait Timeout via Heat Templates | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Benjamin Schmaus <bschmaus> |
| Component: | openstack-cinder | Assignee: | Cinder Bugs List <cinder-bugs> |
| Status: | CLOSED DUPLICATE | QA Contact: | Avi Avraham <aavraham> |
| Severity: | medium | Docs Contact: | Kim Nylander <knylande> |
| Priority: | unspecified | ||
| Version: | 10.0 (Newton) | CC: | aschultz, mbayer, mburns, rhel-osp-director-maint, srevivo |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2018-03-12 00:46:51 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1464146 | ||
| Bug Blocks: | |||
Hi Benjamin - the issue you are having is fixed and is not a matter of increasing a timeout, it is due to Cinder's architecture. This is a dupe of bz#1464146 note the stack trace shown here is identical to the one shown in bz#1464146, so the MySQL variable in question is the server side "connect_timeout" but that doesn't really fix the underlying problem. also note there is no way that innodb_lock_wait_timeout could resolve the stack trace shown here. lock_wait_timeout defaults to 50 seconds, not 10, whereas the connect_timeout is a 10 second by default timeout. connect_timeout applies to the initial server handshake which the trace on this issue shows is where the drop is occurring. the innodb_lock_wait_timeout setting has to do with SQL queries and the error would be "ERROR 1205 (HY000): Lock wait timeout exceeded;". *** This bug has been marked as a duplicate of bug 1464146 *** |
Description of problem: Once we backup 18+ cinder volumes via cinder-backup using the NFS driver, some backups go into error state with "Lost connection to MySQL server during query" This is the stack trace we get from cinder-backup: 2018-02-14 09:43:20.850 700318 ERROR cinder.service [req-0c45d535-92fa-4305-87c6-7f7a03ea38a3 923b1c6a759a4077957f1421287bf2c2 0e33b539580847aca4b88f9fd7391515 - default default] model server went away 2018-02-14 09:43:20.850 700318 ERROR cinder.service Traceback (most recent call last): 2018-02-14 09:43:20.850 700318 ERROR cinder.service File "/usr/lib/python2.7/site-packages/cinder/service.py", line 456, in report_state 2018-02-14 09:43:20.850 700318 ERROR cinder.service service_ref.save() 2018-02-14 09:43:20.850 700318 ERROR cinder.service File "/usr/lib/python2.7/site-packages/cinder/objects/service.py", line 150, in save 2018-02-14 09:43:20.850 700318 ERROR cinder.service db.service_update(self._context, self.id, updates) 2018-02-14 09:43:20.850 700318 ERROR cinder.service File "/usr/lib/python2.7/site-packages/cinder/db/api.py", line 140, in service_update 2018-02-14 09:43:20.850 700318 ERROR cinder.service return IMPL.service_update(context, service_id, values) 2018-02-14 09:43:20.850 700318 ERROR cinder.service File "/usr/lib/python2.7/site-packages/cinder/db/sqlalchemy/api.py", line 178, in wrapper 2018-02-14 09:43:20.850 700318 ERROR cinder.service return f(*args, **kwargs) 2018-02-14 09:43:20.850 700318 ERROR cinder.service File "/usr/lib/python2.7/site-packages/cinder/db/sqlalchemy/api.py", line 236, in wrapped 2018-02-14 09:43:20.850 700318 ERROR cinder.service return f(*args, **kwargs) 2018-02-14 09:43:20.850 700318 ERROR cinder.service File "/usr/lib/python2.7/site-packages/cinder/db/sqlalchemy/api.py", line 529, in service_update 2018-02-14 09:43:20.850 700318 ERROR cinder.service result = query.update(values) 2018-02-14 09:43:20.850 700318 ERROR cinder.service File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 3176, in update 2018-02-14 09:43:20.850 700318 ERROR cinder.service update_op.exec_() 2018-02-14 09:43:20.850 700318 ERROR cinder.service File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 1127, in exec_ 2018-02-14 09:43:20.850 700318 ERROR cinder.service self._do_exec() 2018-02-14 09:43:20.850 700318 ERROR cinder.service File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 1282, in _do_exec 2018-02-14 09:43:20.850 700318 ERROR cinder.service mapper=self.mapper) 2018-02-14 09:43:20.850 700318 ERROR cinder.service File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 1046, in execute 2018-02-14 09:43:20.850 700318 ERROR cinder.service bind, close_with_result=True).execute(clause, params or {}) 2018-02-14 09:43:20.850 700318 ERROR cinder.service File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 912, in _connection_for_bind 2018-02-14 09:43:20.850 700318 ERROR cinder.service conn = engine.contextual_connect(**kw) 2018-02-14 09:43:20.850 700318 ERROR cinder.service File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2039, in contextual_connect 2018-02-14 09:43:20.850 700318 ERROR cinder.service self._wrap_pool_connect(self.pool.connect, None), 2018-02-14 09:43:20.850 700318 ERROR cinder.service File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2078, in _wrap_pool_connect 2018-02-14 09:43:20.850 700318 ERROR cinder.service e, dialect, self) 2018-02-14 09:43:20.850 700318 ERROR cinder.service File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1401, in _handle_dbapi_exception_noconnection 2018-02-14 09:43:20.850 700318 ERROR cinder.service util.raise_from_cause(newraise, exc_info) 2018-02-14 09:43:20.850 700318 ERROR cinder.service File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 200, in raise_from_cause 2018-02-14 09:43:20.850 700318 ERROR cinder.service reraise(type(exception), exception, tb=exc_tb) 2018-02-14 09:43:20.850 700318 ERROR cinder.service File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2074, in _wrap_pool_connect 2018-02-14 09:43:20.850 700318 ERROR cinder.service return fn() 2018-02-14 09:43:20.850 700318 ERROR cinder.service File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 376, in connect 2018-02-14 09:43:20.850 700318 ERROR cinder.service return _ConnectionFairy._checkout(self) 2018-02-14 09:43:20.850 700318 ERROR cinder.service File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 713, in _checkout 2018-02-14 09:43:20.850 700318 ERROR cinder.service fairy = _ConnectionRecord.checkout(pool) 2018-02-14 09:43:20.850 700318 ERROR cinder.service File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 480, in checkout 2018-02-14 09:43:20.850 700318 ERROR cinder.service rec = pool._do_get() 2018-02-14 09:43:20.850 700318 ERROR cinder.service File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 1060, in _do_get 2018-02-14 09:43:20.850 700318 ERROR cinder.service self._dec_overflow() 2018-02-14 09:43:20.850 700318 ERROR cinder.service File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__ 2018-02-14 09:43:20.850 700318 ERROR cinder.service compat.reraise(exc_type, exc_value, exc_tb) 2018-02-14 09:43:20.850 700318 ERROR cinder.service File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 1057, in _do_get 2018-02-14 09:43:20.850 700318 ERROR cinder.service return self._create_connection() 2018-02-14 09:43:20.850 700318 ERROR cinder.service File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 323, in _create_connection 2018-02-14 09:43:20.850 700318 ERROR cinder.service return _ConnectionRecord(self) 2018-02-14 09:43:20.850 700318 ERROR cinder.service File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 449, in __init__ 2018-02-14 09:43:20.850 700318 ERROR cinder.service self.connection = self.__connect() 2018-02-14 09:43:20.850 700318 ERROR cinder.service File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 607, in __connect 2018-02-14 09:43:20.850 700318 ERROR cinder.service connection = self.__pool._invoke_creator(self) 2018-02-14 09:43:20.850 700318 ERROR cinder.service File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 97, in connect 2018-02-14 09:43:20.850 700318 ERROR cinder.service return dialect.connect(*cargs, **cparams) 2018-02-14 09:43:20.850 700318 ERROR cinder.service File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 385, in connect 2018-02-14 09:43:20.850 700318 ERROR cinder.service return self.dbapi.connect(*cargs, **cparams) 2018-02-14 09:43:20.850 700318 ERROR cinder.service File "/usr/lib/python2.7/site-packages/pymysql/__init__.py", line 90, in Connect 2018-02-14 09:43:20.850 700318 ERROR cinder.service return Connection(*args, **kwargs) 2018-02-14 09:43:20.850 700318 ERROR cinder.service File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 694, in __init__ 2018-02-14 09:43:20.850 700318 ERROR cinder.service self.connect() 2018-02-14 09:43:20.850 700318 ERROR cinder.service File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 916, in connect 2018-02-14 09:43:20.850 700318 ERROR cinder.service self._request_authentication() 2018-02-14 09:43:20.850 700318 ERROR cinder.service File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1124, in _request_authentication 2018-02-14 09:43:20.850 700318 ERROR cinder.service auth_packet = self._read_packet() 2018-02-14 09:43:20.850 700318 ERROR cinder.service File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 971, in _read_packet 2018-02-14 09:43:20.850 700318 ERROR cinder.service packet_header = self._read_bytes(4) 2018-02-14 09:43:20.850 700318 ERROR cinder.service File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1008, in _read_bytes 2018-02-14 09:43:20.850 700318 ERROR cinder.service 2013, "Lost connection to MySQL server during query") The resolution was to increase the lock wait timeout in Mariadb. However there does not seem to be a way to set this value via heat templates and/or hiera. The value is not set on a deployment and thus defaults to 10 seconds as reported by customer. However in large scaling events like the cinder backups above or cinder volume creates 10 seconds is not enough given there will be a number of processes in parallel all waiting to get a lock on a given table. We need a way to configure the lock wait timeout on deployment and update. Version-Release number of selected component (if applicable): OSP10 How reproducible: See attached script Steps to Reproduce: 1.Run attached script 2. 3. Actual results: DB connections go away Expected results: Connections should not timeout Additional info: