Description of problem:

The problem was detected while running REST API automated tests on a virt environment (1 undercloud, 3 controllers, 2 computes). It is hard to call these "performance" tests, because only 2 concurrent threads were loaded. It is a regression compared to RHOS 10, which I loaded with 5 concurrent threads and the test passed successfully. (More details here - http://yobshans.rdu.openstack.engineering.redhat.com/rhos-jmeter/result/2017-07-05-virt-rhos-10-restapi-perf-test-5x50/)

Errors started to be raised after 10-15 min of test running. As a result, services such as neutron-server, cinder-scheduler, nova-scheduler, nova-conductor and others lost their connection to the MySQL server and REST API calls failed. OpenStack is not functional. Only a restart of the controller nodes brings the application back to life. See the attached message files for details.

The errors look like:

Jul 6 22:05:48 controller-1 neutron-server: ERROR [sqlalchemy.pool.QueuePool] [12842] Exception during reset or similar
Jul 6 22:05:48 controller-1 neutron-server: Traceback (most recent call last):
Jul 6 22:05:48 controller-1 neutron-server:   File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 636, in _finalize_fairy
Jul 6 22:05:48 controller-1 neutron-server:     fairy._reset(pool)
Jul 6 22:05:48 controller-1 neutron-server:   File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 776, in _reset
Jul 6 22:05:48 controller-1 neutron-server:     pool._dialect.do_rollback(self)
Jul 6 22:05:48 controller-1 neutron-server:   File "/usr/lib64/python2.7/site-packages/sqlalchemy/dialects/mysql/base.py", line 2526, in do_rollback
Jul 6 22:05:48 controller-1 neutron-server:     dbapi_connection.rollback()
Jul 6 22:05:48 controller-1 neutron-server:   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 778, in rollback
Jul 6 22:05:48 controller-1 neutron-server:     self._read_ok_packet()
Jul 6 22:05:48 controller-1 neutron-server:   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 752, in _read_ok_packet
Jul 6 22:05:48 controller-1 neutron-server:     pkt = self._read_packet()
Jul 6 22:05:48 controller-1 neutron-server:   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 991, in _read_packet
Jul 6 22:05:48 controller-1 neutron-server:     packet.check_error()
Jul 6 22:05:48 controller-1 neutron-server:   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 393, in check_error
Jul 6 22:05:48 controller-1 neutron-server:     err.raise_mysql_exception(self._data)
Jul 6 22:05:48 controller-1 neutron-server:   File "/usr/lib/python2.7/site-packages/pymysql/err.py", line 107, in raise_mysql_exception
Jul 6 22:05:48 controller-1 neutron-server:     raise errorclass(errno, errval)
Jul 6 22:05:48 controller-1 neutron-server: InternalError: (1047, u'WSREP has not yet prepared node for application use')
Jul 6 22:05:48 controller-1 neutron-server: ERROR [oslo_db.sqlalchemy.exc_filters] [12842] DBAPIError exception wrapped from (pymysql.err.InternalError) (1047, u'WSREP has not yet prepared node for application use') [SQL: u'SELECT 1']
Jul 6 22:05:48 controller-1 neutron-server: Traceback (most recent call last):
Jul 6 22:05:48 controller-1 neutron-server:   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
Jul 6 22:05:48 controller-1 neutron-server:     context)
Jul 6 22:05:48 controller-1 neutron-server:   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
Jul 6 22:05:48 controller-1 neutron-server:     cursor.execute(statement, parameters)
Jul 6 22:05:48 controller-1 neutron-server:   File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 166, in execute
Jul 6 22:05:48 controller-1 neutron-server:     result = self._query(query)
Jul 6 22:05:48 controller-1 neutron-server:   File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 322, in _query
Jul 6 22:05:48 controller-1 neutron-server:     conn.query(q)
Jul 6 22:05:48 controller-1 neutron-server:   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 841, in query
Jul 6 22:05:48 controller-1 neutron-server:     self._affected_rows = self._read_query_result(unbuffered=unbuffered)
Jul 6 22:05:48 controller-1 neutron-server:   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1029, in _read_query_result
Jul 6 22:05:48 controller-1 neutron-server:     result.read()
Jul 6 22:05:48 controller-1 neutron-server:   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1312, in read
Jul 6 22:05:48 controller-1 neutron-server:     first_packet = self.connection._read_packet()
Jul 6 22:05:48 controller-1 neutron-server:   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 991, in _read_packet
Jul 6 22:05:48 controller-1 neutron-server:     packet.check_error()
Jul 6 22:05:48 controller-1 neutron-server:   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 393, in check_error
Jul 6 22:05:48 controller-1 neutron-server:     err.raise_mysql_exception(self._data)
Jul 6 22:05:48 controller-1 neutron-server:   File "/usr/lib/python2.7/site-packages/pymysql/err.py", line 107, in raise_mysql_exception
Jul 6 22:05:48 controller-1 neutron-server:     raise errorclass(errno, errval)
Jul 6 22:05:48 controller-1 neutron-server: InternalError: (1047, u'WSREP has not yet prepared node for application use')

Version-Release number of selected component (if applicable):
rhos-release 11 -p 2017-06-30.3
[root@controller-1 tmp]# python -c "import sqlalchemy; print sqlalchemy.__version__"
1.0.11

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
Test result: http://yobshans.rdu.openstack.engineering.redhat.com/rhos-jmeter/result/2017-07-06-virt-rhos-11-restapi-perf-test-2x50/
Pool statistics: http://yobshans.rdu.openstack.engineering.redhat.com/rhos-jmeter/result/2017-07-06-virt-rhos-11-restapi-perf-test-2x50/stat/stat.html

Expected results:

Additional info:
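For background on how this error surfaces: oslo.db wraps DBAPI exceptions and decides which ones mean the pooled connection is no longer usable. A minimal, hypothetical sketch of that classification idea follows; the function name and the exact code set are illustrative, not the actual oslo.db implementation:

```python
# Hypothetical sketch: classify MySQL error numbers the way a
# disconnect filter might, so the pool can invalidate connections to a
# Galera node that is no longer serving queries.
DISCONNECT_CODES = {
    1047,  # WSREP has not yet prepared node for application use
    2006,  # MySQL server has gone away
    2013,  # Lost connection to MySQL server during query
}

def is_disconnect(errno):
    """Return True if the error should invalidate the pooled connection."""
    return errno in DISCONNECT_CODES

print(is_disconnect(1047))  # node not ready: drop and reconnect
print(is_disconnect(1062))  # duplicate key: an ordinary query error
```

The point is that error 1047 is a property of the Galera node's state, not of the query, so retrying on the same connection cannot succeed until the node rejoins the primary component.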
Created attachment 1295358 [details] message log file from controller 0
Created attachment 1295360 [details] message log file from controller 1
Created attachment 1295361 [details] message log file from controller 2
Created attachment 1295362 [details] slow query controller 0
Created attachment 1295363 [details] slow query controller 1
Created attachment 1295364 [details] slow query controller 2
Have you tried the innodb_flush_log_at_trx_commit flag as mentioned? Additionally, if these are VM-based overcloud nodes, that is not appropriate for a stress test. sosreports would be very helpful.
(In reply to Michael Bayer from comment #7)
> Have you tried the innodb_flush_log_at_trx_commit flag as mentioned and

Not yet. I'd like to provide all the required information first. After that, I planned to rerun the test as you suggested.

> additionally if these are VM based overcloud nodes, that is not appropriate
> for a stress test.

It's not a stress test - 2 concurrent threads. Nothing! Why does RHOS 10 work?

> sosreports will be very helpful.

Preparing.

Regards,
Yuri
There is a setting called innodb_file_per_table that increases file handles and write activity by a large margin. This flag is now turned on in OSP11 (would want to confirm you see this move from off to on from 10 to 11). In particular it can be taxing on a qemu VM that does not specify disk cache mode = 'unsafe'.
Created attachment 1295381 [details]
sosreport controller 0 part 1

I split the original sosreport archive using the command:
# split -b 20000000 sosreport-controller-0.localdomain-20170707182823.tar.xz sosreport-controller-0

To join the parts, use:
# cat sosreport-controller-0* > new.tar.xz
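For what it's worth, the split/reassemble round-trip can be sanity-checked on a throwaway file before uploading; the filenames and sizes below are illustrative, not the actual sosreport archive:

```shell
# Sanity-check that split + cat reproduces the original file byte-for-byte.
set -e
dd if=/dev/urandom of=orig.bin bs=1k count=64 2>/dev/null   # throwaway 64K file
split -b 20000 orig.bin orig.bin.part-                      # same scheme, smaller chunks
cat orig.bin.part-* > rejoined.bin                          # glob order = suffix order
cmp orig.bin rejoined.bin && echo "archives match"
```

This works because split's default suffixes (aa, ab, ac, ...) sort lexically, so the shell glob concatenates the parts in the right order.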
Created attachment 1295384 [details] sosreport controller 0 part 2
Created attachment 1295386 [details] sosreport controller 0 part 3
Created attachment 1295388 [details] sosreport controller 1 part 1
Created attachment 1295389 [details] sosreport controller 1 part 2
Created attachment 1295390 [details] sosreport controller 1 part 3
Created attachment 1295391 [details] sosreport controller 2 part 1
Created attachment 1295392 [details] sosreport controller 2 part 2
Created attachment 1295393 [details] sosreport controller 2 part 3
(In reply to Michael Bayer from comment #9)
> There is a setting called innodb_file_per_table that increases file handles
> and write activity by a large margin. This flag is now turned on in OSP11
> (would want to confirm you see this move from off to on from 10 to 11). In
> particular it can be taxing on a qemu VM that does not specify disk cache
> mode = 'unsafe'.

I found that setting in /etc/my.cnf.d/galera.cnf:
innodb_file_per_table = ON

How do you suggest to change it?

Thanks,
Yuri
This can be done in /etc/my.cnf.d/galera.cnf:

1. turn off performance schema on all three nodes:
https://mariadb.com/resources/blog/starting-mysql-low-memory-virtual-machines

2. set innodb_flush_log_at_trx_commit=2

We can see all three controllers are very low on memory; they only have 8G to start with and are all down to around 100M:

cat sosreport-controller-?/free
              total        used        free      shared  buff/cache   available
Mem:        8010516     7653780      145040       40032      211696       38732
Swap:             0           0           0
              total        used        free      shared  buff/cache   available
Mem:        8010516     7566804      196388       55436      247324      101084
Swap:             0           0           0
              total        used        free      shared  buff/cache   available
Mem:        8010516     7570244      105232       39968      335040       61480
Swap:             0           0           0

controller-0 is also churning on load:

[classic@photon2 sos]$ cat sosreport-controller-0/uptime
18:28:39 up 5:44, 1 user, load average: 10.66, 5.29, 3.48
[classic@photon2 sos]$ cat sosreport-controller-1/uptime
18:27:18 up 5:43, 1 user, load average: 2.12, 2.11, 2.12
[classic@photon2 sos]$ cat sosreport-controller-2/uptime
18:28:54 up 5:43, 1 user, load average: 2.18, 2.62, 2.48

Looking at lsof stats, we can see that ceilometer and httpd are the biggest owners of file handles, and gnocchi, which I'm not sure is part of OSP10, is using a lot too.
The mysqld daemon, which will use more handles as a result of the innodb_file_per_table setting, is still not the biggest user of handles:

[classic@photon2 sos]$ awk '{ print $1 }' sosreport-controller-0/lsof | sort | uniq -c | sort -gr | head -10
  37289 ceilomete
  26045 httpd
   6502 swift-obj
   5793 gnocchi-m
   5024 mysqld
   3648 mongod
   2224 libvirtd
   2196 /usr/bin/
   1612 aodh-eval
   1322 nova-api
[classic@photon2 sos]$ awk '{ print $1 }' sosreport-controller-1/lsof | sort | uniq -c | sort -gr | head -10
awk: fatal: cannot open file `sosreport-controller-1/lsof' for reading (No such file or directory)
[classic@photon2 sos]$ awk '{ print $1 }' sosreport-controller-2/lsof | sort | uniq -c | sort -gr | head -10
  36861 ceilomete
  22646 httpd
   6501 swift-obj
   5793 gnocchi-m
   3224 mongod
   2224 libvirtd
   2196 /usr/bin/
   1626 aodh-eval
   1464 nova-api
   1260 heat-api-

I'm starting to look at the nature of the "out of memory" error. It seems like it did not actually kill the server, and it hit two times, each after controller 0 joined the cluster. I want to see what the nature of this join was; that will be in a subsequent comment.

Overall, I think that after stuffing this many extremely file- and memory-hungry applications onto a tiny 8G VM, it should not be surprising when the server runs out of memory. Ideally galera and the others would all be spread out on individual servers.
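For reference, a sketch of what the two changes suggested above might look like in /etc/my.cnf.d/galera.cnf; the option names are standard MariaDB settings, but verify them against your MariaDB version before applying:

```ini
[mysqld]
# Reduce memory footprint on small VMs (per the MariaDB low-memory guide)
performance_schema = OFF

# Flush the redo log to disk once per second instead of at every commit.
# Cuts fsync load significantly, at the cost of up to ~1s of transactions
# on a crash - a reasonable trade-off for a test environment.
innodb_flush_log_at_trx_commit = 2
```

After editing, the galera resource would need to be restarted (via Pacemaker on these controllers) for the settings to take effect.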
(In reply to Yuri Obshansky from comment #19)
> (In reply to Michael Bayer from comment #9)
> > There is a setting called innodb_file_per_table that increases file handles
> > and write activity by a large margin. This flag is now turned on in OSP11
> > (would want to confirm you see this move from off to on from 10 to 11). In
> > particular it can be taxing on a qemu VM that does not specify disk cache
> > mode = 'unsafe'.
>
> I found that setting in /etc/my.cnf.d/galera.cnf:
> innodb_file_per_table = ON
>
> How do you suggest to change it?

Don't change that. I'd look into getting ceilometer / gnocchi (IIUC gnocchi is strictly used by ceilometer) off these controllers, as they seem to be the biggest users of resources (though a "top" display, for some reason not part of the sosreport, would tell a lot more).
The galera node on controller 0 became partitioned due to a network partition at 170707 13:27:31, moved into NON-PRIMARY, was shut down by Pacemaker, restarted, and rejoined the cluster. The out-of-memory error occurred after it received an SST sync on rejoining the cluster, but it became synced nevertheless by 13:27:57. Controller 0 again seemed to fall behind and was stopped at 14:04:09 and again restarted; there was an out-of-memory error at 14:07:16, and it was synced in the cluster again at 14:07:16. I haven't dug too closely into these splits, but all three servers were at 95%+ memory utilization, which I think is the main issue.
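The state transitions described above map onto Galera's wsrep status variables, which is also what decides whether a client sees error 1047. A minimal sketch of that condition; the sample values are fabricated for illustration, as if read via `SHOW STATUS LIKE 'wsrep_%'`:

```python
# Sketch: decide whether a Galera node is usable, from wsrep status
# variables. A node serves application traffic only when it is part of
# the Primary component and Synced; otherwise clients get error 1047
# ("WSREP has not yet prepared node for application use").
def node_ready(status):
    return (status.get("wsrep_ready") == "ON"
            and status.get("wsrep_cluster_status") == "Primary"
            and status.get("wsrep_local_state_comment") == "Synced")

# Illustrative sample: a partitioned node, as controller 0 was at 13:27:31
partitioned = {
    "wsrep_ready": "OFF",
    "wsrep_cluster_status": "non-Primary",
    "wsrep_local_state_comment": "Initialized",
}
print(node_ready(partitioned))  # the node cannot serve queries yet
```

In practice the same check is done on the node itself with `mysql -e "SHOW STATUS LIKE 'wsrep_%'"` while the cluster is recovering.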
Hi Michael,

You did a very detailed analysis of the sosreport. That's awesome, but there is still no answer why this issue didn't happen on RHOS 10. I deployed RHOS 10 and reran the test over the weekend using the same hardware. The test completed successfully with a load of 5 (five) concurrent threads.

Here is the test report:
http://yobshans.rdu.openstack.engineering.redhat.com/rhos-jmeter/result/2017-07-09-virt-rhos-10-restapi-perf-test-5x20/

The default RHOS 10 has the ceilometer and gnocchi services running on the controllers, and they didn't impact the test result.

[root@controller-0 tmp]# ps -ef | grep ceilometer
ceilome+    1892       1  0 Jul09 ?  00:04:01 /usr/bin/python2 /usr/bin/ceilometer-polling --polling-namespaces central --logfile /var/log/ceilometer/central.log
ceilome+    1924       1  0 Jul09 ?  00:03:49 /usr/bin/python2 /usr/bin/ceilometer-collector --logfile /var/log/ceilometer/collector.log
ceilome+    1941       1  0 Jul09 ?  00:03:52 /usr/bin/python2 /usr/bin/ceilometer-agent-notification --logfile /var/log/ceilometer/agent-notification.log
ceilome+    2533    1892  0 Jul09 ?  00:05:40 ceilometer-polling - AgentManager(0)
ceilome+    2550    1924  7 Jul09 ?  01:34:42 ceilometer-collector - CollectorService(0)
ceilome+  791538    1941  3 02:18 ?  00:23:06 ceilometer-agent-notification - NotificationService(0)

[root@controller-0 tmp]# ps -ef | grep gnocchi
gnocchi     1921       1  0 Jul09 ?  00:04:21 /usr/bin/python2 /usr/bin/gnocchi-metricd --logfile /var/log/gnocchi/metricd.log
gnocchi     1944       1  0 Jul09 ?  00:02:51 /usr/bin/python2 /usr/bin/gnocchi-statsd --logfile /var/log/gnocchi/statsd.log
gnocchi     2583    1921  0 Jul09 ?  00:01:19 /usr/bin/python2 /usr/bin/gnocchi-metricd --logfile /var/log/gnocchi/metricd.log
gnocchi     2596    1921  0 Jul09 ?  00:03:17 gnocchi-metricd - scheduler(0)
gnocchi     2597    1921  0 Jul09 ?  00:05:58 gnocchi-metricd - processing(0)
gnocchi     2598    1921  0 Jul09 ?  00:06:10 gnocchi-metricd - processing(1)
gnocchi     2599    1921  0 Jul09 ?  00:01:24 gnocchi-metricd - reporting(0)
gnocchi     2600    1921  0 Jul09 ?  00:01:23 gnocchi-metricd - janitor(0)
gnocchi   107735    4056  0 03:35 ?  00:00:50 /usr/sbin/httpd -DFOREGROUND
gnocchi   107736    4056  0 03:35 ?  00:00:49 /usr/sbin/httpd -DFOREGROUND

I created sosreports for RHOS 10. It would be useful if you could analyze them as you already did for RHOS 11.
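As an aside, the per-process file-handle tally done earlier with awk over the sosreport lsof output can be reproduced in a few lines of Python; the sample lsof lines below are fabricated for illustration:

```python
# Sketch: tally open file handles per command from lsof-style output,
# equivalent to: awk '{ print $1 }' lsof | sort | uniq -c | sort -gr
from collections import Counter

sample_lsof = """\
ceilomete 1892 ceilometer  cwd DIR 253,0 4096 /
ceilomete 1892 ceilometer  4u  REG 253,0 1024 /var/log/ceilometer/central.log
httpd     4056 root        cwd DIR 253,0 4096 /
httpd     4056 root        2u  REG 253,0 2048 /var/log/httpd/error_log
httpd     4057 apache      4u  REG 253,0 2048 /var/log/httpd/access_log
mysqld    2001 mysql       5u  REG 253,0 8192 /var/lib/mysql/ibdata1
"""

counts = Counter(line.split()[0] for line in sample_lsof.splitlines())
for cmd, n in counts.most_common():
    print(n, cmd)
```

Running the same loop over the real `sosreport-controller-*/lsof` files would reproduce the rankings shown earlier in the thread.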
Created attachment 1295835 [details] rhos10 sosreport controller 0 part 1
Created attachment 1295836 [details] rhos10 sosreport controller 0 part 2
Created attachment 1295838 [details] rhos10 sosreport controller 0 part 3
Created attachment 1295839 [details] rhos10 sosreport controller 0 part 4
Created attachment 1295841 [details] rhos10 sosreport controller 0 part 5
Created attachment 1295842 [details] rhos10 sosreport controller 0 part 6
I have an idea. Let's rerun the tests without ceilometer / gnocchi and see what the result is.
Yeah, you just mid-aired me with the same comment. So let's see what happens with the OSP 11 test.
Hi Mike,

You were right. Gnocchi and Ceilometer impacted the performance and functionality of OpenStack 11. There is still an open question: why is their impact so dramatic in RHOS 11 compared to RHOS 10?

I deployed RHOS 10 and 11 with telemetry disabled (no Gnocchi and Ceilometer) and reran the tests. Both tests passed successfully. RHOS 11 even supports a load of 5 concurrent threads, like RHOS 10.

Comparison test results here:
- RHOS 11 vs RHOS 10
http://yobshans.rdu.openstack.engineering.redhat.com/rhos-jmeter/result/2017-07-10-compare_restapi-perf-test-rhos-11-vs-10/
- RHOS 11 vs RHOS 9
http://yobshans.rdu.openstack.engineering.redhat.com/rhos-jmeter/result/2017-07-11-compare_restapi-perf-test-rhos-11-vs-9/
- RHOS 10 vs RHOS 9
http://yobshans.rdu.openstack.engineering.redhat.com/rhos-jmeter/result/2017-07-11-compare_restapi-perf-test-rhos-10-vs-9/

A performance degradation still exists in RHOS 11, but without MySQL crashes and other functionality issues. This bug could probably be closed, because I think bugs related to the Telemetry problems already exist.

Regards,
Yuri
Well, every new version has more bells, whistles, flags, and things, so unless you attend to performance equally on every release, it will otherwise decrease. But overall, the "run 18 services on one machine" model needs to change at this point.
While there's an issue with resource usage for this particular test, SQLAlchemy's connection pool is not part of it; resource starvation on the host was making it impossible to reliably maintain database connections from a network / MySQL server perspective.