1709075 – mysql process consuming 400% CPU

Summary: mysql process consuming 400% CPU

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-tripleo-heat-templates
Sub Component:
Version:	13.0 (Queens)
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	z7
Target Release:	13.0 (Queens)
Assignee:	Martin Magr
QA Contact:	Leonid Natapov
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1721557

Reported:	2019-05-13 01:29 UTC by Brendan Shephard
Modified:	2024-03-25 15:17 UTC
CC List:	17 users
Fixed In Version:	openstack-tripleo-heat-templates-8.3.1-79.el7ost
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Clones:	1721557 1721647
Environment:
Last Closed:	2019-09-19 10:44:11 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Launchpad	1825477	None	None	None	2019-05-14 13:54:58 UTC
OpenStack gerrit	664048	None	MERGED	Add panko_api_cron container	2021-01-14 01:01:02 UTC
Red Hat Knowledge Base (Solution)	5655471	None	None	None	2020-12-17 03:39:50 UTC

Description Brendan Shephard 2019-05-13 01:29:03 UTC

Description of problem:
On all three controllers, the mysql process is consuming around 400% CPU.

Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform 13

How reproducible:
Hard to reproduce outside of customer environment

Steps to Reproduce:
1. Deploy overcloud with or without composable roles
2. Check CPU utilization of mysql process

Actual results:
Very high CPU utilization. Even after restarting galera from pcs, we are still seeing 400% CPU utilization. No iowait observed.

Expected results:
Much lower CPU utilization

Additional info:
We have checked that Ceilometer isn't the cause by stopping Ceilometer and Panko containers.
We also tried increasing the file descriptor limit.

Comment 6 Brendan Shephard 2019-05-13 23:42:46 UTC

It appears that a large Panko query is running for longer than the connection stays open:
  File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1033, in _read_bytes
    CR.CR_SERVER_LOST, "Lost connection to MySQL server during query")

Should we potentially try increasing net_read_timeout from 30 to 60 seconds here?

Comment 28 Brendan Shephard 2019-05-22 03:44:00 UTC

Right, so the problem here is:

There is a cron job for panko-expirer in the panko_api container, under /var/spool/cron/panko that looks like this:

# HEADER: This file was autogenerated at 2019-05-20 03:43:36 +0000 by puppet.
# HEADER: While it can still be managed manually, it is definitely not recommended.
# HEADER: Note particularly that the comments starting with 'Puppet Name' should
# HEADER: not be deleted, as doing so could cause duplicate cron jobs.
# Puppet Name: panko-expirer
PATH=/bin:/usr/bin:/usr/sbin SHELL=/bin/sh
1 0 * * * panko-expirer


But no crond is running:
()[root@overcloud-controller-0 /]# ps -ef | grep cron
root       11598   11552  0 03:39 ?        00:00:00 grep --color=auto cron


Other containers do the same thing, but they all have accompanying cron containers:
[root@overcloud-controller-0 etc]# docker ps --filter name=_cron --format "{{.Names}}"
heat_api_cron
cinder_api_cron
logrotate_crond
nova_api_cron
keystone_cron


So the problem here is that we aren't deploying a panko_api_cron container. As a result, panko-expirer never runs and the Panko events just keep accumulating.
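
A rough way to spot the same gap on another deployment is sketched below. It is only an illustration: the helper name missing_cron_companions is made up here, and the rule that every container named *_api should have a matching *_api_cron companion is an assumption (keystone_cron and logrotate_crond in the list above do not follow that pattern). It reads container names on stdin and prints each *_api name that has no companion.

```shell
#!/bin/sh
# Hypothetical helper, not part of any OpenStack tooling.
# Reads container names (one per line) on stdin and prints every name ending
# in "_api" for which no "<name>_cron" container appears in the same list.
missing_cron_companions() {
    names=$(cat)
    printf '%s\n' "$names" | grep '_api$' | while IFS= read -r name; do
        # -x matches the whole line, so "heat_api_cron" only matches itself.
        if ! printf '%s\n' "$names" | grep -qx "${name}_cron"; then
            printf '%s\n' "$name"
        fi
    done
}
```

On a controller this could be fed with: docker ps --format "{{.Names}}" | missing_cron_companions
A name in the output only means it is worth checking whether that container has a cron file under /var/spool/cron; some API containers may legitimately have no cron job.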

Comment 35 Nataf Sharabi 2019-06-24 09:13:19 UTC

To verify:

Once the fix is applied, log in to a controller and find the panko_api_cron container.

Check that it is healthy and running.

Also check that the cron daemon is running inside it: sudo docker exec -ti panko_api_cron ps -elf
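
The process check can be scripted roughly as follows. This is a sketch under assumptions: the helper name has_crond is made up here, and it assumes the cron daemon shows up in the process list as "crond" (with or without a leading path), which is not confirmed in this bug. It reads ps output on stdin and succeeds only if such a process is listed.

```shell
#!/bin/sh
# Hypothetical helper: exit status 0 if the ps output on stdin lists a
# process whose command is "crond" (bare or with a path), non-zero otherwise.
has_crond() {
    # "crond" must be preceded by start of line, a space, or "/", and be
    # followed by a space or end of line, so "panko_api_cron" does not match.
    grep -Eq '(^|[ /])crond( |$)'
}
```

Possible usage on a controller: sudo docker exec panko_api_cron ps -elf | has_crond && echo "crond is running"
The -ti flags are left out here because the output is piped rather than read on a terminal.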

Comment 38 Lon Hohberger 2019-07-11 10:41:19 UTC

According to our records, this should be resolved by openstack-tripleo-heat-templates-8.3.1-54.el7ost.  This build is available now.

Comment 39 Sasha Smolyak 2019-07-15 08:33:15 UTC

openstack-tripleo-heat-templates-8.3.1-55 is not in the latest puddle of OSP13 (2019-06-28.1), failing QA:
(undercloud) [stack@undercloud-0 ~]$ rpm -qa | grep tripleo-heat
openstack-tripleo-heat-templates-8.3.1-54.el7ost.noarch

Comment 58 Lon Hohberger 2019-09-04 10:43:57 UTC

According to our records, this should be resolved by openstack-tripleo-heat-templates-8.3.1-79.el7ost.  This build is available now.
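
To check whether an undercloud already has a build at or above the fixed one, a comparison along these lines may help. It is a sketch, not rpm's own comparison algorithm: the helper name version_at_least is made up here, and it relies on GNU sort -V ordering, which is assumed to be close enough for version-release strings of this shape.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version-release string $1 is the same as
# or newer than $2, judged by GNU "sort -V" ordering (an approximation of
# rpm's version comparison).
version_at_least() {
    # If $2 sorts first (or both are equal), $1 is at least $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

Possible usage: version_at_least "$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' openstack-tripleo-heat-templates)" 8.3.1-79.el7ost && echo "fixed build installed"
With the 8.3.1-54.el7ost package shown in comment 39 as the first argument, the check fails.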

Comment 61 Leonid Natapov 2019-09-18 10:51:26 UTC

Tested according to description in comment #35.
