Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1549598

Summary: [RFE] Configure DB Lock Wait Timeout via Heat Templates
Product: Red Hat OpenStack Reporter: Benjamin Schmaus <bschmaus>
Component: openstack-cinderAssignee: Cinder Bugs List <cinder-bugs>
Status: CLOSED DUPLICATE QA Contact: Avi Avraham <aavraham>
Severity: medium Docs Contact: Kim Nylander <knylande>
Priority: unspecified    
Version: 10.0 (Newton)CC: aschultz, mbayer, mburns, rhel-osp-director-maint, srevivo
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-03-12 00:46:51 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1464146    
Bug Blocks:    

Description Benjamin Schmaus 2018-02-27 13:31:06 UTC
Description of problem:

Once we backup 18+ cinder volumes via cinder-backup using the NFS driver, some backups go into error state with "Lost connection to MySQL server during query"

This is the stack trace we get from cinder-backup:

2018-02-14 09:43:20.850 700318 ERROR cinder.service [req-0c45d535-92fa-4305-87c6-7f7a03ea38a3 923b1c6a759a4077957f1421287bf2c2 0e33b539580847aca4b88f9fd7391515 - default default] model server went away
2018-02-14 09:43:20.850 700318 ERROR cinder.service Traceback (most recent call last):
2018-02-14 09:43:20.850 700318 ERROR cinder.service   File "/usr/lib/python2.7/site-packages/cinder/service.py", line 456, in report_state
2018-02-14 09:43:20.850 700318 ERROR cinder.service     service_ref.save()
2018-02-14 09:43:20.850 700318 ERROR cinder.service   File "/usr/lib/python2.7/site-packages/cinder/objects/service.py", line 150, in save
2018-02-14 09:43:20.850 700318 ERROR cinder.service     db.service_update(self._context, self.id, updates)
2018-02-14 09:43:20.850 700318 ERROR cinder.service   File "/usr/lib/python2.7/site-packages/cinder/db/api.py", line 140, in service_update
2018-02-14 09:43:20.850 700318 ERROR cinder.service     return IMPL.service_update(context, service_id, values)
2018-02-14 09:43:20.850 700318 ERROR cinder.service   File "/usr/lib/python2.7/site-packages/cinder/db/sqlalchemy/api.py", line 178, in wrapper
2018-02-14 09:43:20.850 700318 ERROR cinder.service     return f(*args, **kwargs)
2018-02-14 09:43:20.850 700318 ERROR cinder.service   File "/usr/lib/python2.7/site-packages/cinder/db/sqlalchemy/api.py", line 236, in wrapped
2018-02-14 09:43:20.850 700318 ERROR cinder.service     return f(*args, **kwargs)
2018-02-14 09:43:20.850 700318 ERROR cinder.service   File "/usr/lib/python2.7/site-packages/cinder/db/sqlalchemy/api.py", line 529, in service_update
2018-02-14 09:43:20.850 700318 ERROR cinder.service     result = query.update(values)
2018-02-14 09:43:20.850 700318 ERROR cinder.service   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 3176, in update
2018-02-14 09:43:20.850 700318 ERROR cinder.service     update_op.exec_()
2018-02-14 09:43:20.850 700318 ERROR cinder.service   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 1127, in exec_
2018-02-14 09:43:20.850 700318 ERROR cinder.service     self._do_exec()
2018-02-14 09:43:20.850 700318 ERROR cinder.service   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 1282, in _do_exec
2018-02-14 09:43:20.850 700318 ERROR cinder.service     mapper=self.mapper)
2018-02-14 09:43:20.850 700318 ERROR cinder.service   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 1046, in execute
2018-02-14 09:43:20.850 700318 ERROR cinder.service     bind, close_with_result=True).execute(clause, params or {})
2018-02-14 09:43:20.850 700318 ERROR cinder.service   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 912, in _connection_for_bind
2018-02-14 09:43:20.850 700318 ERROR cinder.service     conn = engine.contextual_connect(**kw)
2018-02-14 09:43:20.850 700318 ERROR cinder.service   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2039, in contextual_connect
2018-02-14 09:43:20.850 700318 ERROR cinder.service     self._wrap_pool_connect(self.pool.connect, None),
2018-02-14 09:43:20.850 700318 ERROR cinder.service   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2078, in _wrap_pool_connect
2018-02-14 09:43:20.850 700318 ERROR cinder.service     e, dialect, self)
2018-02-14 09:43:20.850 700318 ERROR cinder.service   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1401, in _handle_dbapi_exception_noconnection
2018-02-14 09:43:20.850 700318 ERROR cinder.service     util.raise_from_cause(newraise, exc_info)
2018-02-14 09:43:20.850 700318 ERROR cinder.service   File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 200, in raise_from_cause
2018-02-14 09:43:20.850 700318 ERROR cinder.service     reraise(type(exception), exception, tb=exc_tb)
2018-02-14 09:43:20.850 700318 ERROR cinder.service   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2074, in _wrap_pool_connect
2018-02-14 09:43:20.850 700318 ERROR cinder.service     return fn()
2018-02-14 09:43:20.850 700318 ERROR cinder.service   File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 376, in connect
2018-02-14 09:43:20.850 700318 ERROR cinder.service     return _ConnectionFairy._checkout(self)
2018-02-14 09:43:20.850 700318 ERROR cinder.service   File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 713, in _checkout
2018-02-14 09:43:20.850 700318 ERROR cinder.service     fairy = _ConnectionRecord.checkout(pool)
2018-02-14 09:43:20.850 700318 ERROR cinder.service   File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 480, in checkout
2018-02-14 09:43:20.850 700318 ERROR cinder.service     rec = pool._do_get()
2018-02-14 09:43:20.850 700318 ERROR cinder.service   File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 1060, in _do_get
2018-02-14 09:43:20.850 700318 ERROR cinder.service     self._dec_overflow()
2018-02-14 09:43:20.850 700318 ERROR cinder.service   File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
2018-02-14 09:43:20.850 700318 ERROR cinder.service     compat.reraise(exc_type, exc_value, exc_tb)
2018-02-14 09:43:20.850 700318 ERROR cinder.service   File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 1057, in _do_get
2018-02-14 09:43:20.850 700318 ERROR cinder.service     return self._create_connection()
2018-02-14 09:43:20.850 700318 ERROR cinder.service   File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 323, in _create_connection
2018-02-14 09:43:20.850 700318 ERROR cinder.service     return _ConnectionRecord(self)
2018-02-14 09:43:20.850 700318 ERROR cinder.service   File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 449, in __init__
2018-02-14 09:43:20.850 700318 ERROR cinder.service     self.connection = self.__connect()
2018-02-14 09:43:20.850 700318 ERROR cinder.service   File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 607, in __connect
2018-02-14 09:43:20.850 700318 ERROR cinder.service     connection = self.__pool._invoke_creator(self)
2018-02-14 09:43:20.850 700318 ERROR cinder.service   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 97, in connect
2018-02-14 09:43:20.850 700318 ERROR cinder.service     return dialect.connect(*cargs, **cparams)
2018-02-14 09:43:20.850 700318 ERROR cinder.service   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 385, in connect
2018-02-14 09:43:20.850 700318 ERROR cinder.service     return self.dbapi.connect(*cargs, **cparams)
2018-02-14 09:43:20.850 700318 ERROR cinder.service   File "/usr/lib/python2.7/site-packages/pymysql/__init__.py", line 90, in Connect
2018-02-14 09:43:20.850 700318 ERROR cinder.service     return Connection(*args, **kwargs)
2018-02-14 09:43:20.850 700318 ERROR cinder.service   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 694, in __init__
2018-02-14 09:43:20.850 700318 ERROR cinder.service     self.connect()
2018-02-14 09:43:20.850 700318 ERROR cinder.service   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 916, in connect
2018-02-14 09:43:20.850 700318 ERROR cinder.service     self._request_authentication()
2018-02-14 09:43:20.850 700318 ERROR cinder.service   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1124, in _request_authentication
2018-02-14 09:43:20.850 700318 ERROR cinder.service     auth_packet = self._read_packet()
2018-02-14 09:43:20.850 700318 ERROR cinder.service   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 971, in _read_packet
2018-02-14 09:43:20.850 700318 ERROR cinder.service     packet_header = self._read_bytes(4)
2018-02-14 09:43:20.850 700318 ERROR cinder.service   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1008, in _read_bytes
2018-02-14 09:43:20.850 700318 ERROR cinder.service     2013, "Lost connection to MySQL server during query")

The resolution was to increase the lock wait timeout in Mariadb.  However there does not seem to be a way to set this value via heat templates and/or hiera.  

The value is not set on a deployment and thus defaults to 10 seconds as reported by customer.  However in large scaling events like the cinder backups above or cinder volume creates 10 seconds is not enough given there will be a number of processes in parallel all waiting to get a lock on a given table.

We need a way to configure the lock wait timeout on deployment and update.


Version-Release number of selected component (if applicable):
OSP10

How reproducible:
See attached script

Steps to Reproduce:
1.Run attached script
2.
3.

Actual results:
DB connections go away

Expected results:
Connections should not timeout

Additional info:

Comment 5 Michael Bayer 2018-03-10 15:22:58 UTC
Hi Benjamin -

the issue you are having is fixed and is not a matter of increasing a timeout, it is due to Cinder's architecture.   This is a dupe of bz#1464146

Comment 6 Michael Bayer 2018-03-10 15:25:33 UTC
note the stack trace shown here is identical to the one shown in bz#1464146, so the MySQL variable in question is the server side "connect_timeout" but that doesn't really fix the underlying problem.

Comment 7 Michael Bayer 2018-03-10 15:30:19 UTC
also note there is no way that innodb_lock_wait_timeout could resolve the stack trace shown here.   lock_wait_timeout defaults to 50 seconds, not 10, whereas the connect_timeout is a 10 second by default timeout.   connect_timeout applies to the initial server handshake which the trace on this issue shows is where the drop is occurring.   the innodb_lock_wait_timeout setting has to do with SQL queries and the error would be "ERROR 1205 (HY000): Lock wait timeout exceeded;".

Comment 11 Benjamin Schmaus 2018-03-12 00:46:51 UTC

*** This bug has been marked as a duplicate of bug 1464146 ***