Bug 891349

Summary: Multi-process launching issue
Product: Red Hat OpenStack Reporter: Mark McLoughlin <markmc>
Component: openstack-novaAssignee: Nikola Dipanov <ndipanov>
Status: CLOSED ERRATA QA Contact: Kashyap Chamarthy <kchamart>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 2.0 (Folsom)CC: eglynn, kchamart, mmagr
Target Milestone: snapshot5Keywords: TestCaseNeeded, Triaged, VerifiedOnDev
Target Release: 2.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-nova-2012.2.3-5.el6ost Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-04-04 20:21:08 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Mark McLoughlin 2013-01-02 16:50:38 UTC
See:

 https://bugs.launchpad.net/oslo/+bug/1094076
 https://review.openstack.org/18689

We backported the fix which went into 0.10.0 into python-eventlet-0.9.17-2:

 http://pkgs.fedoraproject.org/cgit/python-eventlet.git/commit/?h=el6&id=22aee3ca73f9872b4aa8f5e65bc2c4feba87146d

However, AFAICT, this means that if you kill the parent process it no longer shuts down itself and all the children. I suspect we should just backport the patch above.

This is basically a TODO task to test multi-process launching in more detail.

You need to set e.g. osapi_compute_workers=5, restart nova-api and confirm that:

  - The API server can handle concurrent requests safely
  - There are 5 worker processes
  - Killing a worker causes another one to respawned
  - Killing the parent causes it and the children to shut down
  - There's no extra warnings or debug spew in api.log

Comment 2 Martin Magr 2013-01-10 12:38:24 UTC
IMHO this should be backported to Folsom, but as Mark wrote, it should be tested first.

Comment 8 Kashyap Chamarthy 2013-03-21 10:15:16 UTC
Some investigation..

1] Version info:
#-------------#
$ rpm -q openstack-nova ; uname -r ; arch
openstack-nova-2012.2.3-7.el6ost.noarch
2.6.32-358.el6.x86_64
x86_64
#-------------#


2] Ensure the fix is in:
#-------------#
$ grep os.waitpid /usr/lib/python2.6/site-packages/nova/service.py -A1 -B2
   # Don't block if no child processes have exited
    pid, status = os.waitpid(0, os.WNOHANG)
    if not pid:
#-------------#


3] Verification procedure:

3.1] Ensure to have 5 worker processes; restart nova services:
#-------------#
$ sudo grep -i osapi_compute_workers=5 /etc/nova/nova.conf 
 osapi_compute_workers=5
#-------------#
$ for j in `for i in $(ls -1 /etc/init.d/openstack-nova-*) ; do $i status | grep
 running ; done | awk '{print $1}'` ; do service $j restart ; done
 Stopping openstack-nova-cert:                              [  OK  ]
 Starting openstack-nova-cert:                              [  OK  ]
 Stopping openstack-nova-compute:                           [  OK  ]
 Starting openstack-nova-compute:                           [  OK  ]
 Stopping openstack-nova-console:                           [  OK  ]
 Starting openstack-nova-console:                           [  OK  ]
 Stopping openstack-nova-consoleauth:                       [  OK  ]
 Starting openstack-nova-consoleauth:                       [  OK  ]
 Stopping openstack-nova-network:                           [  OK  ]
 Starting openstack-nova-network:                           [  OK  ]
 Stopping openstack-nova-novncproxy:                        [  OK  ]
 Starting openstack-nova-novncproxy:                        [  OK  ]
#-------------#
$ for j in `for i in $(ls -1 /etc/init.d/openstack-nova-*) ; do $i status | grep
 running ; done | awk '{print $1}'` ; do service $j status ; done
 openstack-nova-cert (pid  75571) is running...
 openstack-nova-compute (pid  75596) is running...
 openstack-nova-console (pid  75637) is running...
 openstack-nova-consoleauth (pid  75671) is running...
 openstack-nova-network (pid  75711) is running...
 openstack-nova-novncproxy (pid  75729) is running...
#-------------#


3.2] Restart doesn't mean, they're running fine, let's also check the status:
#-------------#
# for j in `for i in $(ls -1 /etc/init.d/openstack-nova-*) ; do $i status | grep running ; done | awk '{print $1}'` ; do service $j status ; done
openstack-nova-cert (pid  75571) is running...
openstack-nova-compute (pid  75596) is running...
openstack-nova-console (pid  75637) is running...
openstack-nova-consoleauth (pid  75671) is running...
openstack-nova-network (pid  75711) is running...
openstack-nova-novncproxy (pid  75729) is running...

#-------------#


3.4] Check ps -ef for nova:
#-------------#
[tuser1@interceptor ~(keystone_user1)]$ ps -ef | egrep -i nova-
tuser1   20557 68539  0 15:31 pts/4    00:00:00 egrep -i nova-
nobody   25857     1  0 Feb13 ?        00:01:41 /usr/sbin/dnsmasq --strict-order --bind-interfaces --conf-file= --domain=novalocal --pid-file=/var/lib/nova/networks/nova-demonetbr0.pid --listen-address=10.65.207.49 --except-interface=lo --dhcp-range=set:'net1',xx.yy.ww.zz,static,120s --dhcp-lease-max=16 --dhcp-hostsfile=/var/lib/nova/networks/nova-demonetbr0.conf --dhcp-script=/usr/bin/nova-dhcpbridge --leasefile-ro
root     25858 25857  0 Feb13 ?        00:01:10 /usr/sbin/dnsmasq --strict-order --bind-interfaces --conf-file= --domain=novalocal --pid-file=/var/lib/nova/networks/nova-demonetbr0.pid --listen-address=10.65.207.49 --except-interface=lo --dhcp-range=set:'net1',xx.yy.ww.zz,static,120s --dhcp-lease-max=16 --dhcp-hostsfile=/var/lib/nova/networks/nova-demonetbr0.conf --dhcp-script=/usr/bin/nova-dhcpbridge --leasefile-ro
nova     56853     1  0 Feb28 ?        00:00:01 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logfile /var/log/nova/api.log
nova     56884 56853  0 Feb28 ?        00:00:03 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logfile /var/log/nova/api.log
nova     56890     1  0 Feb28 ?        02:55:14 /usr/bin/python /usr/bin/nova-cert --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logfile /var/log/nova/cert.log
nova     56902 56853  0 Feb28 ?        00:06:30 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logfile /var/log/nova/api.log
nova     56907 56853  0 Feb28 ?        00:01:26 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logfile /var/log/nova/api.log
nova     56975     1  0 Feb28 ?        02:56:25 /usr/bin/python /usr/bin/nova-console --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logfile /var/log/nova/console.log
nova     57009     1  0 Feb28 ?        02:56:06 /usr/bin/python /usr/bin/nova-consoleauth --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logfile /var/log/nova/consoleauth.log
nova     57082     1  0 Feb28 ?        00:00:04 /usr/bin/python /usr/bin/nova-objectstore --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logfile /var/log/nova/objectstore.log
nova     57120     1  1 Feb28 ?        07:21:55 /usr/bin/python /usr/bin/nova-scheduler --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logfile /var/log/nova/scheduler.log
nova     57155     1  0 Feb28 ?        00:00:04 /usr/bin/python /usr/bin/nova-xvpvncproxy --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logfile /var/log/nova/xvpvncproxy.log
nova     75571     1  0 13:17 ?        00:00:50 /usr/bin/python /usr/bin/nova-cert --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logfile /var/log/nova/cert.log
nova     75596     1 14 13:17 ?        00:19:39 /usr/bin/python /usr/bin/nova-compute --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logfile /var/log/nova/compute.log
nova     75637     1  0 13:17 ?        00:00:51 /usr/bin/python /usr/bin/nova-console --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logfile /var/log/nova/console.log
nova     75671     1  0 13:17 ?        00:00:50 /usr/bin/python /usr/bin/nova-consoleauth --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logfile /var/log/nova/consoleauth.log
nova     75711     1  1 13:17 ?        00:01:59 /usr/bin/python /usr/bin/nova-network --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logfile /var/log/nova/network.log
nova     75729     1  0 13:17 ?        00:00:01 python /usr/bin/nova-novncproxy --web /usr/share/novnc/
#-------------#

There appears to be only 3 workers : 56853 is the daemon --  ppid1, and it has 3 corresponding chidren

Comment 9 Kashyap Chamarthy 2013-03-21 11:05:47 UTC
Investigation - round2

Still debugging with ndipanov (thanks!), why there are 7 worker threads now, instead of 5:


Meantime:

The nova API service was dead for some reason.

Get it up & running:
#------------------------#
$ sudo service openstack-nova-api status
openstack-nova-api (pid  62600) is running...
#------------------------#

Search for the process
#------------------------#
]# ps axf | grep nova-api
69054 pts/6    S+     0:00  |                   \_ grep nova-api
62600 ?        S      0:08 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logfile 
62607 ?        S      0:00  \_ /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logf
62634 ?        S      0:00  \_ /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logf
62635 ?        S      0:00  \_ /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logf
62636 ?        S      0:00  \_ /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logf
62638 ?        S      0:00  \_ /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logf
62644 ?        S      0:00  \_ /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logf
65389 ?        S      0:00  \_ /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logf
/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova
#------------------------#

Side note: Re-spawning seems to work, to notice the respawning:
=====
   $ kill -9 $any-child-process; ps -ef | grep nova-api
   $ ps -ef | grep nova-api
=====

Comment 10 Kashyap Chamarthy 2013-03-21 12:35:01 UTC
Finally, round-3 investigation fixed it:

Short summary:
=============
We had to turn off metadata and ec2 services, and enable only osapi_compute, and have the worker threads set to 5


Details:
=======

It's not as trivial as it appeard. Here's how Nikola helped me investigate it further and reach a conclusion.


Additional things we had to try was:

1] Change the enabled_apis to only osapi_compute
#------------------------#
$ egrep -i 'enabled_apis|workers' /etc/nova/nova.conf | grep -v ^$ | grep -v ^#
osapi_compute_workers=5
enabled_apis = osapi_compute

#------------------------#

2] Restart the nova services ; check for the nova api service
#------------------------#
# service openstack-nova-api restart
Stopping openstack-nova-api:                               [  OK  ]
Starting openstack-nova-api:                               [  OK  ]
#------------------------#
$ ps -axf | grep nova-api
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
39940 pts/6    S+     0:00  |                   \_ grep nova-api
39296 ?        S      0:01 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logfile 
39347 ?        S      0:00  \_ /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logf
39348 ?        S      0:00  \_ /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logf
39349 ?        S      0:00  \_ /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logf
39350 ?        S      0:00  \_ /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logf
39351 ?        S      0:00  \_ /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logf
#------------------------# 
$ ps -ef | grep -i nova-api
nova     39296     1  1 17:36 ?        00:00:15 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova
nova     39347 39296  0 17:36 ?        00:00:00 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova
nova     39348 39296  0 17:36 ?        00:00:00 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova
nova     39349 39296  0 17:36 ?        00:00:00 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova
nova     39350 39296  0 17:36 ?        00:00:00 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova
nova     39351 39296  0 17:36 ?        00:00:00 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova
root     52889 54194  2 17:54 pts/6    00:00:00 grep -i nova-api
#------------------------#


3] From the above, let's kill a worker process 39351 (which has its parent ppid as 39256)
#------------------------#
$ kill -9 39351 | grep nova-api
#------------------------#

3.1] Check if another worker thread has re-spawned:
#------------------------#
$ ps -ef | grep -i nova-api
nova     39296     1  1 17:36 ?        00:00:21 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova
nova     39347 39296  0 17:36 ?        00:00:00 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova
nova     39348 39296  0 17:36 ?        00:00:00 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova
nova     39349 39296  0 17:36 ?        00:00:00 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova
nova     39350 39296  0 17:36 ?        00:00:00 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova
nova     57441 39296  0 17:59 ?        00:00:00 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova
root     58291 54194  0 18:01 pts/6    00:00:00 grep -i nova-api

#------------------------#

4] Let's try to kill the parent process, to ensure it & its child process are shut down
#------------------------#
$ kill -9 39296
$ ps -ef | grep -i nova-api
root     59355 54194  0 18:02 pts/6    00:00:00 grep -i nova-api
#------------------------#

The above does demonstrate the desired result, from the bug description.

n-v-r and other related details are in comment #8. VERIFIED.

Comment 12 errata-xmlrpc 2013-04-04 20:21:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0709.html