Description of problem:
On all three controllers, the mysql process is consuming around 4000% CPU.

Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform 13

How reproducible:
Hard to reproduce outside of customer environment

Steps to Reproduce:
1. Deploy overcloud with or without composable roles
2. Check CPU utilization of mysql process

Actual results:
Very high CPU utilisation. Even after restarting galera from pcs we're seeing 400% CPU utilization. No iowait observed

Expected results:
Much lower CPU utilization

Additional info:
We have checked that Ceilometer isn't the cause by stopping Ceilometer and Panko containers. We tried increasing file descriptors
It appears that there is a large Panko query that is running longer than the connection is staying open for:

  File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1033, in _read_bytes
    CR.CR_SERVER_LOST, "Lost connection to MySQL server during query")

Should we potentially try increasing net_read_timeout from 30 to 60 seconds here?
Right, so the problem here is:

There is a cron job for panko-expirer in the panko_api container, under /var/spool/cron/panko that looks like this:

# HEADER: This file was autogenerated at 2019-05-20 03:43:36 +0000 by puppet.
# HEADER: While it can still be managed manually, it is definitely not recommended.
# HEADER: Note particularly that the comments starting with 'Puppet Name' should
# HEADER: not be deleted, as doing so could cause duplicate cron jobs.
# Puppet Name: panko-expirer
PATH=/bin:/usr/bin:/usr/sbin
SHELL=/bin/sh

1 0 * * * panko-expirer

But no crond running:

()[root@overcloud-controller-0 /]# ps -ef | grep cron
root     11598 11552  0 03:39 ?        00:00:00 grep --color=auto cron

Other containers do the same thing, but they all have accompanying cron containers:

[root@overcloud-controller-0 etc]# docker ps --filter name=_cron --format "{{.Names}}"
heat_api_cron
cinder_api_cron
logrotate_crond
nova_api_cron
keystone_cron

So the problem here is that we aren't deploying a panko_api_cron container, therefore we never run panko-expirer and the panko events just continue to fill up.
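For anyone reading the crontab above: the single job line schedules panko-expirer at minute 1 of hour 0, every day. A minimal Python sketch that pulls the schedule fields out of such a file follows. The function name parse_crontab and the field names are illustrative only, not part of Panko or TripleO, and the parser deliberately handles just the simple five-field form seen here.

```python
# Crontab body copied from the report (HEADER comment lines omitted).
CRONTAB = """\
# Puppet Name: panko-expirer
PATH=/bin:/usr/bin:/usr/sbin
SHELL=/bin/sh

1 0 * * * panko-expirer
"""

# Names for the five schedule columns of a standard crontab entry.
FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def parse_crontab(text):
    """Return one dict per job line.

    Simplification: comments, blank lines and VAR=value settings are
    skipped; special forms such as @daily are not handled.
    """
    jobs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 5)
        # Environment settings (PATH=..., SHELL=...) are not job lines.
        if len(parts) < 6 or "=" in parts[0]:
            continue
        job = dict(zip(FIELDS, parts[:5]))
        job["command"] = parts[5]
        jobs.append(job)
    return jobs


jobs = parse_crontab(CRONTAB)
for job in jobs:
    print(job["command"], "runs at hour", job["hour"], "minute", job["minute"])
```

The entry itself is fine; as noted above, it never fires because nothing in the deployment runs crond against it.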
To verify: once the fix is applied, log in to a controller and find the panko_api_cron container. Check that it is healthy and running, then confirm that cron is running inside it:

sudo docker exec -ti panko_api_cron ps -elf
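When checking process listings by hand it is easy to mistake the grep itself for a cron process. A rough Python sketch of the check follows, assuming the `ps -ef` column layout (UID PID PPID C STIME TTY TIME CMD); `ps -elf` has more columns and would need a different split. The helper name cron_daemon_running and the second sample line are hypothetical; only the first sample line comes from this report.

```python
def cron_daemon_running(ps_output):
    """Return True if any `ps -ef` line has crond or cron as its executable.

    Assumes the eight-column `ps -ef` layout, where the command is
    everything after the seventh whitespace-separated field.
    """
    for line in ps_output.splitlines():
        parts = line.split(None, 7)
        if len(parts) < 8:
            continue  # header fragments or blank lines
        command = parts[7].split()
        if not command:
            continue
        executable = command[0].rsplit("/", 1)[-1]
        if executable in ("crond", "cron"):
            return True
    return False


# Output seen in the panko_api container before the fix (from this report).
BEFORE = "root 11598 11552 0 03:39 ? 00:00:00 grep --color=auto cron"

# Hypothetical line of the kind expected once a cron container is deployed.
AFTER = "root 8 1 0 03:39 ? 00:00:00 /usr/sbin/crond -n"

print(cron_daemon_running(BEFORE))
print(cron_daemon_running(AFTER))
```

Matching on the executable name rather than on the substring "cron" is what keeps the grep line from counting as a hit.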
According to our records, this should be resolved by openstack-tripleo-heat-templates-8.3.1-54.el7ost. This build is available now.
openstack-tripleo-heat-templates-8.3.1-55 is not in the latest puddle of OSP13 (2019-06-28.1), failing QA:

(undercloud) [stack@undercloud-0 ~]$ rpm -qa | grep tripleo-heat
openstack-tripleo-heat-templates-8.3.1-54.el7ost.noarch
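The comparison behind this QA failure can be sketched in Python as below. This is a simplification and not RPM's real version-comparison algorithm: it assumes a purely numeric version and a release that starts with an integer, which holds for the package strings in this report. The helper names parse_nvr and has_fix are made up for illustration.

```python
import re

# name-version-release, where version is dotted integers and the release
# begins with an integer (e.g. "54" in "54.el7ost.noarch").
NVR_RE = re.compile(r"^(?P<name>.+)-(?P<version>\d+(?:\.\d+)*)-(?P<release>\d+)")


def parse_nvr(package):
    """Split a package string into (name, version tuple, release number)."""
    match = NVR_RE.match(package)
    if match is None:
        raise ValueError("unrecognised package string: %r" % package)
    version = tuple(int(part) for part in match.group("version").split("."))
    return match.group("name"), version, int(match.group("release"))


def has_fix(installed, fixed_in):
    """True if `installed` is the same package at or beyond `fixed_in`."""
    name_i, version_i, release_i = parse_nvr(installed)
    name_f, version_f, release_f = parse_nvr(fixed_in)
    return name_i == name_f and (version_i, release_i) >= (version_f, release_f)


# Values taken from the comment above.
installed = "openstack-tripleo-heat-templates-8.3.1-54.el7ost.noarch"
expected = "openstack-tripleo-heat-templates-8.3.1-55"

print(has_fix(installed, expected))
```

On a real system the rpm tooling should be preferred for this comparison, since it also handles epochs and non-numeric version segments.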
According to our records, this should be resolved by openstack-tripleo-heat-templates-8.3.1-79.el7ost. This build is available now.
Tested according to description in comment #35.