Bug 1766133

Summary: DB connection errors on three separate deployments
Product: Red Hat OpenStack
Reporter: Tzach Shefi <tshefi>
Component: puppet-mysql
Assignee: RHOS Maint <rhos-maint>
Status: CLOSED NOTABUG
QA Contact: nlevinki <nlevinki>
Severity: urgent
Priority: unspecified
Version: 13.0 (Queens)
CC: acanan, jjoyce, jschluet, lmiccini, slinaber, tvignaud
Target Milestone: ---
Keywords: Regression, ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-10-28 13:39:24 UTC
Type: Bug
Bug Blocks: 1714805    
Attachments:
  Description: Example Nova and Cinder logs showing SQL connection/route problems.
  Flags: none

Description Tzach Shefi 2019-10-28 11:48:27 UTC
Created attachment 1629746
Example Nova and Cinder logs showing sql connection/route problems.

Description of problem: On three deployments I keep hitting database connection errors. For example, booting an instance from an Ansible script fails with:


fatal: [localhost]: FAILED! => {"changed": false, "extra_data": null, "msg": "Error fetching flavor list: Server Error for url: https://10.0.0.101:13774/v2.1/flavors/detail?is_public=None, Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.\n<class 'oslo_db.exception.DBConnectionError'>"}


Version-Release number of selected component (if applicable):
collectd-mysql-5.8.1-4.el7ost.x86_64
puppet-mysql-5.2.2-0.20180216012141.a5497b2.el7ost.noarch
python2-sqlalchemy-1.2.2-2.el7ost.x86_64
python-sqlalchemy-utils-0.31.3-2.el7ost.noarch
python-sqlparse-0.1.18-5.el7ost.noarch
sqlite-3.7.17-8.el7.x86_64
postgresql-libs-9.2.24-1.el7_5.x86_64


How reproducible:
I've hit SQL connection problems on three separate deployments of the same puddle (13 -p 2019-10-23.1).

Steps to Reproduce:
1. The system is sluggish, and commands such as Nova boot (and, before that, volume attach) fail.
2. Rebooting the whole host (virtual deployment) doesn't help.

Actual results:
Nova boot failed. Example logs:

2019-10-28 11:34:05.756 16 ERROR nova.api.openstack     self._get_server_information()
2019-10-28 11:34:05.756 16 ERROR nova.api.openstack   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1245, in _get_server_information
2019-10-28 11:34:05.756 16 ERROR nova.api.openstack     packet = self._read_packet()
2019-10-28 11:34:05.756 16 ERROR nova.api.openstack   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 987, in _read_packet
2019-10-28 11:34:05.756 16 ERROR nova.api.openstack     packet_header = self._read_bytes(4)
2019-10-28 11:34:05.756 16 ERROR nova.api.openstack   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1033, in _read_bytes
2019-10-28 11:34:05.756 16 ERROR nova.api.openstack     CR.CR_SERVER_LOST, "Lost connection to MySQL server during query")
2019-10-28 11:34:05.756 16 ERROR nova.api.openstack DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2019-10-28 11:34:05.756 16 ERROR nova.api.openstack 

2019-10-28 11:42:14.879 15 ERROR oslo_db.sqlalchemy.engines   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1044, in _write_bytes
2019-10-28 11:42:14.879 15 ERROR oslo_db.sqlalchemy.engines     "MySQL server has gone away (%r)" % (e,))
2019-10-28 11:42:14.879 15 ERROR oslo_db.sqlalchemy.engines DBConnectionError: (pymysql.err.OperationalError) (2006, "MySQL server has gone away (error(110, 'Connection timed out'))") [SQL: u'SELECT 1'] (Background on this error at: http://sqlalche.me/e/e3q8)
2019-10-28 11:42:14.879 15 ERROR oslo_db.sqlalchemy.engines 
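The Nova tracebacks above carry MySQL client error codes inside the `OperationalError` tuple: 2013 (`CR_SERVER_LOST`, the connection dropped mid-query) and 2006 (`CR_SERVER_GONE_ERROR`, here caused by a TCP timeout, `error(110, 'Connection timed out')`, during the `SELECT 1` ping). When triaging several deployments it can help to tally these codes across service logs. A minimal sketch, assuming log lines in the format shown in this report; the helper names `classify` and `summarize` are hypothetical:

```python
import re
from collections import Counter

# MySQL client error codes (the CR_* constants in PyMySQL's pymysql.constants.CR)
# that can appear inside oslo_db DBConnectionError messages.
CLIENT_ERRORS = {
    2003: "CR_CONN_HOST_ERROR: could not connect to the server",
    2006: "CR_SERVER_GONE_ERROR: server has gone away",
    2013: "CR_SERVER_LOST: lost connection during query",
}

# Matches the "(pymysql.err.OperationalError) (NNNN," fragment of a log line.
_CODE_RE = re.compile(r"OperationalError\) \((\d{4}),")


def classify(line):
    """Return (code, description) for a DBConnectionError line, else None."""
    match = _CODE_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, CLIENT_ERRORS.get(code, "unrecognized client error")


def summarize(lines):
    """Count occurrences of each client error code across log lines."""
    counts = Counter()
    for line in lines:
        result = classify(line)
        if result is not None:
            counts[result[0]] += 1
    return counts
```

Passing an open log file (for example a nova-api log) to `summarize` gives a per-code count; a mix of 2003 "No route to host" and 2006/2013 timeouts points at the network path or an overloaded database host rather than at a single bad query.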

Cinder also reports SQL errors:

/var/log/containers/cinder/cinder-manage.log:130:2019-10-28 00:27:20.051 1044 WARNING oslo_db.sqlalchemy.engines [req-0357d875-f06d-420a-8975-86615998ab01 - - - - -] SQL connection failed. -132 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.17.1.13' ([Errno 113] No route to host)") (Background on this error at: http://sqlalche.me/e/e3q8)
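The negative count in the Cinder warning ("-132 attempts left") is consistent with oslo.db retrying the initial connection indefinitely. As far as I understand oslo.db, `max_retries = -1` means "retry forever" and the logged figure is computed as `max_retries - attempt`, which goes negative once retries are unbounded. The sketch below only models that arithmetic under this assumption; it is not oslo.db code, and `remaining_attempts` is a hypothetical name:

```python
import itertools


def remaining_attempts(max_retries, limit):
    """Yield the 'attempts left' figure for the first `limit` failed attempts.

    Assumption: mirrors how oslo.db appears to compute the logged value,
    i.e. max_retries - attempt, with max_retries == -1 meaning unlimited.
    """
    if max_retries == -1:
        attempts = itertools.count()          # unbounded: 0, 1, 2, ...
    else:
        attempts = iter(range(max_retries))   # bounded: 0 .. max_retries - 1
    for attempt in itertools.islice(attempts, limit):
        yield max_retries - attempt
```

Under this reading, "-132 attempts left" marks the 132nd consecutive failed attempt, so cinder-manage had been unable to reach 172.17.1.13 for an extended period rather than hitting a single transient failure.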


Expected results:
The instance boots successfully.

Additional info:

Comment 4 Tzach Shefi 2019-10-28 13:39:24 UTC
Sorry for the fire drill. As was suggested, I had neglected to check the controller, which had limited resources.
Once I added RAM/CPU, the SQL problems vanished.

Sorry folks.
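Since the root cause was an under-provisioned controller in a virtual deployment, a quick memory/CPU pre-check on the controller could flag this before chasing SQL errors. A minimal sketch, assuming a Linux host that exposes `/proc/meminfo`; the function names and the 2 GiB / 4 vCPU thresholds are hypothetical placeholders, not Red Hat sizing requirements:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}, values in kB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields:
            # First token is the number; an optional "kB" unit follows.
            info[key.strip()] = int(fields[0])
    return info


def controller_looks_starved(meminfo_text, cpu_count,
                             min_avail_mib=2048, min_cpus=4):
    """Hypothetical pre-flight check; thresholds are illustrative only."""
    info = parse_meminfo(meminfo_text)
    # Prefer MemAvailable (newer kernels); fall back to MemFree.
    avail_kib = info.get("MemAvailable", info.get("MemFree", 0))
    return avail_kib // 1024 < min_avail_mib or cpu_count < min_cpus
```

On the controller itself, this would be called with the contents of `/proc/meminfo` and `os.cpu_count()`; a `True` result suggests adding RAM/CPU before investigating database errors further.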