Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1709075

Summary: mysql process consuming 400% CPU
Product: Red Hat OpenStack
Reporter: Brendan Shephard <bshephar>
Component: openstack-tripleo-heat-templates
Assignee: Martin Magr <mmagr>
Status: CLOSED CURRENTRELEASE
QA Contact: Leonid Natapov <lnatapov>
Severity: high
Docs Contact:
Priority: high
Version: 13.0 (Queens)
CC: asimonel, dciabrin, emacchi, jschluet, maufart, mbayer, mburns, michele, mmagr, mrunge, mvalsecc, nsharabi, pkilambi, slinaber, sputhenp, ssmolyak, vkapalav
Target Milestone: z7
Keywords: Reopened, TestOnly, Triaged, ZStream
Target Release: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-8.3.1-79.el7ost
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 1721557 1721647
Environment:
Last Closed: 2019-09-19 10:44:11 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1721557

Description Brendan Shephard 2019-05-13 01:29:03 UTC
Description of problem:
On all three controllers, the mysql process is consuming around 400% CPU.

Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform 13

How reproducible:
Hard to reproduce outside of customer environment

Steps to Reproduce:
1. Deploy overcloud with or without composable roles
2. Check CPU utilization of mysql process

Actual results:
Very high CPU utilization. Even after restarting galera from pcs, we still see 400% CPU utilization. No iowait observed.

Expected results:
Much lower CPU utilization

Additional info:
We have checked that Ceilometer isn't the cause by stopping Ceilometer and Panko containers.
We also tried increasing the file descriptor limit.

Comment 6 Brendan Shephard 2019-05-13 23:42:46 UTC
It appears that a large Panko query is running for longer than the connection stays open:
  File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1033, in _read_bytes
    CR.CR_SERVER_LOST, "Lost connection to MySQL server during query")

Should we potentially try increasing net_read_timeout from 30 to 60 seconds here?
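
For reference, a minimal sketch of how the timeout proposed above could be checked and raised at runtime. It assumes a DB-API cursor with tuple rows (for example the default pymysql cursor) connected to the Galera node with enough privilege to run SET GLOBAL; the helper name ensure_net_read_timeout is hypothetical. Whether a longer timeout actually helps with the lost connection is not established here.

```python
def ensure_net_read_timeout(cursor, seconds=60):
    """Raise the global net_read_timeout to at least `seconds`.

    `cursor` is any DB-API cursor that returns rows as tuples.
    Returns the global value in effect after the call.
    """
    cursor.execute("SHOW GLOBAL VARIABLES LIKE 'net_read_timeout'")
    row = cursor.fetchone()  # expected shape: ('net_read_timeout', '30')
    if row is None:
        raise RuntimeError("server did not report net_read_timeout")

    current = int(row[1])
    if current >= seconds:
        # Already at or above the requested value; leave it alone.
        return current

    # int() keeps the interpolated value strictly numeric.
    cursor.execute("SET GLOBAL net_read_timeout = %d" % int(seconds))
    return int(seconds)
```

A value set with SET GLOBAL applies to connections opened afterwards and does not survive a server restart, so a permanent change would still have to go into the server configuration.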

Comment 28 Brendan Shephard 2019-05-22 03:44:00 UTC
Right, so the problem here is:

There is a cron job for panko-expirer in the panko_api container, under /var/spool/cron/panko that looks like this:

# HEADER: This file was autogenerated at 2019-05-20 03:43:36 +0000 by puppet.
# HEADER: While it can still be managed manually, it is definitely not recommended.
# HEADER: Note particularly that the comments starting with 'Puppet Name' should
# HEADER: not be deleted, as doing so could cause duplicate cron jobs.
# Puppet Name: panko-expirer
PATH=/bin:/usr/bin:/usr/sbin SHELL=/bin/sh
1 0 * * * panko-expirer


But no crond is running:
()[root@overcloud-controller-0 /]# ps -ef | grep cron
root       11598   11552  0 03:39 ?        00:00:00 grep --color=auto cron


Other containers do the same thing, but they all have accompanying cron containers:
[root@overcloud-controller-0 etc]# docker ps --filter name=_cron --format "{{.Names}}"
heat_api_cron
cinder_api_cron
logrotate_crond
nova_api_cron
keystone_cron


So the problem is that we aren't deploying a panko_api_cron container. As a result, panko-expirer never runs and the Panko events just keep accumulating.
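
The comparison can be sketched as a small check on the `docker ps` output shown in this comment. The helper name missing_cron_containers and the `expected` list are assumptions for illustration: the list is just the names printed on the controller plus panko_api_cron, not an authoritative list of what a deployment should run.

```python
def missing_cron_containers(docker_ps_output, expected):
    """Return the expected container names absent from `docker ps` output.

    `docker_ps_output` is the text printed by
    docker ps --filter name=_cron --format "{{.Names}}"
    (one container name per line).
    """
    running = {line.strip() for line in docker_ps_output.splitlines() if line.strip()}
    return sorted(name for name in expected if name not in running)


# Names printed on overcloud-controller-0 in this comment.
observed = """heat_api_cron
cinder_api_cron
logrotate_crond
nova_api_cron
keystone_cron
"""

# Assumption: the observed containers plus the one Panko needs.
expected = [
    "heat_api_cron",
    "cinder_api_cron",
    "logrotate_crond",
    "nova_api_cron",
    "keystone_cron",
    "panko_api_cron",
]

print(missing_cron_containers(observed, expected))  # prints ['panko_api_cron']
```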

Comment 35 Nataf Sharabi 2019-06-24 09:13:19 UTC
To verify:

Once the fix is applied, log in to a controller and find the panko_api_cron container.

Check that it is healthy and running.

Also check the cron process: sudo docker exec -ti panko_api_cron ps -elf
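
The `ps` check can be scripted along these lines. This is a sketch, assuming the output starts with the usual `ps` header row ending in CMD and that the daemon's executable is named crond or cron; the helper name crond_running is hypothetical. Matching on the executable name rather than grepping for "cron" avoids counting the grep process itself, as happened in comment 28.

```python
import os


def crond_running(ps_output):
    """Return True if `ps -ef` / `ps -elf` output lists a cron daemon.

    The header row is used to locate the CMD column, so both the short
    and the long listing formats are handled.
    """
    lines = ps_output.strip().splitlines()
    if not lines:
        return False

    header = lines[0].split()
    if "CMD" not in header:
        raise ValueError("ps output has no header row with a CMD column")
    cmd_index = header.index("CMD")

    for line in lines[1:]:
        # Split only up to the CMD column so the command keeps its arguments.
        fields = line.split(None, cmd_index)
        if len(fields) <= cmd_index:
            continue
        executable = fields[cmd_index].split()[0]
        # Assumption: the daemon binary is called crond (or cron).
        if os.path.basename(executable) in ("crond", "cron"):
            return True
    return False
```

The text to pass in would come from the command above, for example captured with subprocess and without the `-t` flag so that no terminal is required.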

Comment 38 Lon Hohberger 2019-07-11 10:41:19 UTC
According to our records, this should be resolved by openstack-tripleo-heat-templates-8.3.1-54.el7ost.  This build is available now.

Comment 39 Sasha Smolyak 2019-07-15 08:33:15 UTC
openstack-tripleo-heat-templates-8.3.1-55 is not in the latest puddle of OSP13 (2019-06-28.1), failing QA:
(undercloud) [stack@undercloud-0 ~]$ rpm -qa | grep tripleo-heat
openstack-tripleo-heat-templates-8.3.1-54.el7ost.noarch

Comment 58 Lon Hohberger 2019-09-04 10:43:57 UTC
According to our records, this should be resolved by openstack-tripleo-heat-templates-8.3.1-79.el7ost.  This build is available now.

Comment 61 Leonid Natapov 2019-09-18 10:51:26 UTC
Tested according to description in comment #35.