
Bug 1686397

Summary: [osp15] mistral-executor is stuck with 100% cpu
Product: Red Hat OpenStack
Reporter: Michele Baldessari <michele>
Component: openstack-mistral
Assignee: Adriano Petrich <apetrich>
Status: CLOSED ERRATA
QA Contact: Amit Ugol <augol>
Severity: urgent
Priority: urgent
Version: 15.0 (Stein)
CC: apetrich, augol, beth.white, emacchi, jjoyce, jschluet, slinaber, tvignaud
Target Milestone: beta
Keywords: Triaged
Target Release: 15.0 (Stein)
Flags: apetrich: needinfo-
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-mistral-8.0.1-0.20190606070407.6ff82c3.el8ost
Doc Type: No Doc Update
Last Closed: 2019-09-21 11:20:32 UTC
Type: Bug

Description Michele Baldessari 2019-03-07 11:26:43 UTC
Problem statement:
On a RHEL 8 host running RHEL 8 containers, I can reliably get mistral-server to lock up at 100% CPU.

This seems to happen as soon as an ansible task finishes (whether it succeeds or fails).

top then shows:
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 319045 42430     20   0 1068584 251084  24316 R 100.0   1.2  18:03.18 mistral-server

It is the mistral_executor container that is stuck:
 [root@undercloud-0 mistral]# for i in mistral_engine mistral_executor mistral_api mistral_event_engine; do echo $i; podman inspect $i --format '{{.State.Pid}}' | grep 319045; done
mistral_engine
mistral_executor
319045
mistral_api
mistral_event_engine

There are no particular errors in executor.log (in fact, it seems to stop logging once it hangs).

sosreport from the undercloud: http://file.rdu.redhat.com/~mbaldess/mistral-bz/sosreport-undercloud-0-2019-03-07-bkgdcaw.tar.xz
mistral logs are here: http://file.rdu.redhat.com/~mbaldess/mistral-bz/mistral-logs.tgz

Versions inside the container:
 [root@undercloud-0 containers]# for i in mistral_executor; do echo $i; podman exec -it -u root $i sh -c 'rpm -qa |grep mistral'; done
mistral_executor
puppet-mistral-14.2.1-0.20190226101400.b76cf93.el8ost.noarch
python3-mistral-lib-1.0.0-0.20190117093849.d1ccfd0.el8ost.noarch
python3-mistral-8.0.0-0.20190228185014.608367f.el8ost.noarch
openstack-mistral-common-8.0.0-0.20190228185014.608367f.el8ost.noarch
python3-mistralclient-3.7.0-0.20190110194247.f0ee48f.el8ost.noarch
openstack-mistral-executor-8.0.0-0.20190228185014.608367f.el8ost.noarch

Comment 1 Michele Baldessari 2019-03-07 11:40:46 UTC
It seems to be stuck in a loop reading from a pipe that is constantly returning zero.
 [root@undercloud-0 containers]# head -n10 /tmp/strace
319045 read(9<pipe:[3258466]>, "", 8192) = 0
319045 read(9<pipe:[3258466]>, "", 8192) = 0
319045 read(9<pipe:[3258466]>, "", 8192) = 0
319045 read(9<pipe:[3258466]>, "", 8192) = 0
319045 read(9<pipe:[3258466]>, "", 8192) = 0
319045 read(9<pipe:[3258466]>, "", 8192) = 0
319045 read(9<pipe:[3258466]>, "", 8192) = 0

I guess the other side died, since only one process still has a descriptor open on the pipe?
 [root@undercloud-0 containers]# lsof -n 2>&1|grep -w 3258466
mistral-s 319045                           42430    9r     FIFO               0,12       0t0    3258466 pipe
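
For reference, the strace pattern above is what a pipe does once every writer has closed its end: read() returns 0 bytes immediately instead of blocking, so any loop that does not treat that as EOF will spin. A minimal standalone sketch of that behaviour (not mistral code; the variable names are only for illustration):

```python
import os

# Create a pipe, write one line, then close the write end,
# which is what happens when the child process exits.
read_fd, write_fd = os.pipe()
os.write(write_fd, b"done\n")
os.close(write_fd)

# The first read drains the buffered data.
first = os.read(read_fd, 8192)

# Every later read returns b"" right away (EOF), it never blocks.
eof_reads = [os.read(read_fd, 8192) for _ in range(3)]
os.close(read_fd)

print(first, eof_reads)
```

A reader has to stop on the first empty read; calling read() again will just return 0 forever, which matches the 100% CPU seen here.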

Comment 2 Michele Baldessari 2019-03-07 12:07:59 UTC
Here is a small reproducer:
 [root@undercloud-0 ~]# more reproducer.py
import subprocess
import time

command = ['ansible', '-m', 'ping', 'localhost']
# that processutils has, do we need to replicate that somehow?
process = subprocess.Popen(command, stdout=subprocess.PIPE,
                           stderr=subprocess.STDOUT,
                           shell=False, bufsize=1,
                           universal_newlines=True)
start = time.time()
stdout = []
lines = []
for line in iter(process.stdout.readline, b''):
    lines.append(line)

print(lines)

(mostly copied from https://github.com/openstack/tripleo-common/blob/master/tripleo_common/actions/ansible.py#L570)
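
A likely explanation for why the reproducer never exits on Python 3: with universal_newlines=True the pipe is opened in text mode, so readline() returns the str '' at EOF, which never compares equal to the bytes sentinel b''. The iter() loop therefore keeps calling readline() after the child has exited. Below is a minimal sketch of a loop that terminates, assuming the sentinel mismatch is the cause; it is not necessarily the change that shipped in the erratum. The helper name run_and_collect is made up for illustration, and a trivial Python child stands in for the ansible command so the sketch runs anywhere.

```python
import subprocess
import sys


def run_and_collect(command):
    """Run command and return (returncode, list of output lines)."""
    process = subprocess.Popen(command, stdout=subprocess.PIPE,
                               stderr=subprocess.STDOUT,
                               shell=False, bufsize=1,
                               universal_newlines=True)
    lines = []
    # Text mode: readline() returns '' (str) at EOF, so the sentinel
    # must be '' and not b'', otherwise the loop never ends.
    for line in iter(process.stdout.readline, ''):
        lines.append(line)
    process.stdout.close()
    returncode = process.wait()
    return returncode, lines


if __name__ == "__main__":
    # Stand-in child process instead of ['ansible', '-m', 'ping', 'localhost'].
    rc, lines = run_and_collect(
        [sys.executable, "-c", "print('ping'); print('pong')"])
    print(rc, lines)
```

Iterating over process.stdout directly (for line in process.stdout) would avoid the sentinel altogether and works the same way in text and binary mode.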

Comment 6 Amit Ugol 2019-07-21 06:20:30 UTC
It's not much better, but it no longer seems to block anything, and mistral only stresses itself during operations.

Comment 10 errata-xmlrpc 2019-09-21 11:20:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2811