Bug 1709075 - mysql process consuming 400% CPU
Summary: mysql process consuming 400% CPU
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Target Milestone: z7
Target Release: 13.0 (Queens)
Assignee: Martin Magr
QA Contact: Leonid Natapov
Depends On:
Blocks: 1721557
Reported: 2019-05-13 01:29 UTC by Brendan Shephard
Modified: 2020-12-17 03:39 UTC (History)
CC List: 17 users

Fixed In Version: openstack-tripleo-heat-templates-8.3.1-79.el7ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1721557 1721647
Last Closed: 2019-09-19 10:44:11 UTC
Target Upstream Version:

System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1825477 | 0 | None | None | None | 2019-05-14 13:54:58 UTC
OpenStack gerrit | 664048 | 0 | None | MERGED | Add panko_api_cron container | 2021-01-14 01:01:02 UTC
Red Hat Knowledge Base (Solution) | 5655471 | 0 | None | None | None | 2020-12-17 03:39:50 UTC

Description Brendan Shephard 2019-05-13 01:29:03 UTC
Description of problem:
On all three controllers, the mysql process is consuming around 400% CPU.

Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform 13

How reproducible:
Hard to reproduce outside of customer environment

Steps to Reproduce:
1. Deploy overcloud with or without composable roles
2. Check CPU utilization of mysql process

Actual results:
Very high CPU utilization. Even after restarting galera from pcs, we still see 400% CPU utilization. No iowait observed.

Expected results:
Much lower CPU utilization

Additional info:
We have checked that Ceilometer isn't the cause by stopping the Ceilometer and Panko containers.
We also tried increasing the file descriptor limit.

Comment 6 Brendan Shephard 2019-05-13 23:42:46 UTC
It appears that there is a large Panko query that runs longer than the connection stays open:
  File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1033, in _read_bytes
    CR.CR_SERVER_LOST, "Lost connection to MySQL server during query")

Should we potentially try increasing net_read_timeout from 30 to 60 seconds here?
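
To try that out, the value can be read and raised from a shell. This is only a sketch: the function names are hypothetical, and it assumes a mysql client that can already reach the Galera node with enough privileges. SET GLOBAL applies to new connections only and is lost when the server restarts, and a longer timeout does not explain why the query is slow in the first place.

```shell
# Print the current global value of net_read_timeout, in seconds.
# -N drops the column header, -B prints tab-separated rows.
get_net_read_timeout() {
    mysql -N -B -e "SHOW GLOBAL VARIABLES LIKE 'net_read_timeout'" \
        | awk '{print $2}'
}

# Set the global net_read_timeout to the given number of seconds.
# Rejects anything that is not a plain non-negative integer.
set_net_read_timeout() {
    case "$1" in
        ''|*[!0-9]*) echo "usage: set_net_read_timeout SECONDS" >&2; return 1 ;;
    esac
    mysql -e "SET GLOBAL net_read_timeout = $1"
}
```

Running get_net_read_timeout, then set_net_read_timeout 60, then get_net_read_timeout again shows whether the change took effect.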

Comment 28 Brendan Shephard 2019-05-22 03:44:00 UTC
Right, so the problem here is:

There is a cron job for panko-expirer in the panko_api container, under /var/spool/cron/panko that looks like this:

# HEADER: This file was autogenerated at 2019-05-20 03:43:36 +0000 by puppet.
# HEADER: While it can still be managed manually, it is definitely not recommended.
# HEADER: Note particularly that the comments starting with 'Puppet Name' should
# HEADER: not be deleted, as doing so could cause duplicate cron jobs.
# Puppet Name: panko-expirer
PATH=/bin:/usr/bin:/usr/sbin SHELL=/bin/sh
1 0 * * * panko-expirer

But no crond running:
()[root@overcloud-controller-0 /]# ps -ef | grep cron
root       11598   11552  0 03:39 ?        00:00:00 grep --color=auto cron

Other containers do the same thing, but they all have accompanying cron containers:
[root@overcloud-controller-0 etc]# docker ps --filter name=_cron --format "{{.Names}}"

The root cause is that we aren't deploying a panko_api_cron container; therefore panko-expirer never runs and the Panko events keep accumulating.
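
The same check (a crontab with entries but no cron daemon to run them) can be scripted roughly as follows. This is a sketch with hypothetical helper names; it assumes ps inside the container supports -o comm= and that the daemon is named crond.

```shell
# Count schedule entries in a crontab read from stdin.
# Blank lines, comment lines, and lines that begin with a VAR=value
# assignment are not counted.
count_cron_jobs() {
    grep -Ev '^[[:space:]]*(#|$)' \
        | grep -Ev '^[[:space:]]*[A-Za-z_][A-Za-z0-9_]*=' \
        | grep -c .
}

# Succeed if a list of command names read from stdin (one per line,
# as printed by "ps -e -o comm=") contains a process named crond.
has_crond() {
    grep -qx 'crond'
}

# Example, run on a controller:
#   docker exec panko_api cat /var/spool/cron/panko | count_cron_jobs
#   docker exec panko_api ps -e -o comm= | has_crond || echo "no crond"
```

A non-zero job count together with "no crond" reproduces the situation described above.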

Comment 35 Nataf Sharabi 2019-06-24 09:13:19 UTC
In order to verify:

After the fix is applied, log in to a controller and find the panko_api_cron container.

Check that it is healthy and running.

Also confirm that the cron process is running inside it: sudo docker exec -ti panko_api_cron ps -elf
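
The "healthy and running" part can be checked non-interactively along these lines. This is a sketch with a hypothetical helper name; it assumes the container defines a health check, since otherwise docker prints no "(healthy)" suffix in the status and the check fails.

```shell
# Succeed if stdin contains a "NAME STATUS" line for the named container
# whose status starts with "Up" and includes "(healthy)".
container_healthy() {
    grep -Eq "^$1 Up .*\(healthy\)"
}

# Example, run on a controller:
#   sudo docker ps --filter name=panko_api_cron \
#       --format '{{.Names}} {{.Status}}' | container_healthy panko_api_cron \
#       && echo "panko_api_cron is up and healthy"
```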

Comment 38 Lon Hohberger 2019-07-11 10:41:19 UTC
According to our records, this should be resolved by openstack-tripleo-heat-templates-8.3.1-54.el7ost.  This build is available now.

Comment 39 Sasha Smolyak 2019-07-15 08:33:15 UTC
openstack-tripleo-heat-templates-8.3.1-55 is not in the latest puddle of OSP13 (2019-06-28.1), failing QA:
(undercloud) [stack@undercloud-0 ~]$ rpm -qa | grep tripleo-heat

Comment 58 Lon Hohberger 2019-09-04 10:43:57 UTC
According to our records, this should be resolved by openstack-tripleo-heat-templates-8.3.1-79.el7ost.  This build is available now.

Comment 61 Leonid Natapov 2019-09-18 10:51:26 UTC
Tested according to description in comment #35.
