Bug 891349
Summary: | Multi-process launching issue | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Mark McLoughlin <markmc> |
Component: | openstack-nova | Assignee: | Nikola Dipanov <ndipanov> |
Status: | CLOSED ERRATA | QA Contact: | Kashyap Chamarthy <kchamart> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 2.0 (Folsom) | CC: | eglynn, kchamart, mmagr |
Target Milestone: | snapshot5 | Keywords: | TestCaseNeeded, Triaged, VerifiedOnDev |
Target Release: | 2.1 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | openstack-nova-2012.2.3-5.el6ost | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2013-04-04 20:21:08 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Mark McLoughlin
2013-01-02 16:50:38 UTC
IMHO this should be backported to Folsom, but as Mark wrote, it should be tested first. Some investigation.. 1] Version info: #-------------# $ rpm -q openstack-nova ; uname -r ; arch openstack-nova-2012.2.3-7.el6ost.noarch 2.6.32-358.el6.x86_64 x86_64 #-------------# 2] Ensure the fix is in: #-------------# $ grep os.waitpid /usr/lib/python2.6/site-packages/nova/service.py -A1 -B2 # Don't block if no child processes have exited pid, status = os.waitpid(0, os.WNOHANG) if not pid: #-------------# 3] Verification procedure: 3.1] Ensure to have 5 worker processes; restart nova services: #-------------# $ sudo grep -i osapi_compute_workers=5 /etc/nova/nova.conf osapi_compute_workers=5 #-------------# $ for j in `for i in $(ls -1 /etc/init.d/openstack-nova-*) ; do $i status | grep running ; done | awk '{print $1}'` ; do service $j restart ; done Stopping openstack-nova-cert: [ OK ] Starting openstack-nova-cert: [ OK ] Stopping openstack-nova-compute: [ OK ] Starting openstack-nova-compute: [ OK ] Stopping openstack-nova-console: [ OK ] Starting openstack-nova-console: [ OK ] Stopping openstack-nova-consoleauth: [ OK ] Starting openstack-nova-consoleauth: [ OK ] Stopping openstack-nova-network: [ OK ] Starting openstack-nova-network: [ OK ] Stopping openstack-nova-novncproxy: [ OK ] Starting openstack-nova-novncproxy: [ OK ] #-------------# $ for j in `for i in $(ls -1 /etc/init.d/openstack-nova-*) ; do $i status | grep running ; done | awk '{print $1}'` ; do service $j status ; done openstack-nova-cert (pid 75571) is running... openstack-nova-compute (pid 75596) is running... openstack-nova-console (pid 75637) is running... openstack-nova-consoleauth (pid 75671) is running... openstack-nova-network (pid 75711) is running... openstack-nova-novncproxy (pid 75729) is running... #-------------# 3.2] Restart doesn't mean, they're running fine, let's also check the status: #-------------# # for j in `for i in $(ls -1 /etc/init.d/openstack-nova-*) ; do $i status | grep running ; done | awk '{print $1}'` ; do service $j status ; done openstack-nova-cert (pid 75571) is running... openstack-nova-compute (pid 75596) is running... openstack-nova-console (pid 75637) is running... openstack-nova-consoleauth (pid 75671) is running... openstack-nova-network (pid 75711) is running... openstack-nova-novncproxy (pid 75729) is running... #-------------# 3.4] Check ps -ef for nova: #-------------# [tuser1@interceptor ~(keystone_user1)]$ ps -ef | egrep -i nova- tuser1 20557 68539 0 15:31 pts/4 00:00:00 egrep -i nova- nobody 25857 1 0 Feb13 ? 00:01:41 /usr/sbin/dnsmasq --strict-order --bind-interfaces --conf-file= --domain=novalocal --pid-file=/var/lib/nova/networks/nova-demonetbr0.pid --listen-address=10.65.207.49 --except-interface=lo --dhcp-range=set:'net1',xx.yy.ww.zz,static,120s --dhcp-lease-max=16 --dhcp-hostsfile=/var/lib/nova/networks/nova-demonetbr0.conf --dhcp-script=/usr/bin/nova-dhcpbridge --leasefile-ro root 25858 25857 0 Feb13 ? 00:01:10 /usr/sbin/dnsmasq --strict-order --bind-interfaces --conf-file= --domain=novalocal --pid-file=/var/lib/nova/networks/nova-demonetbr0.pid --listen-address=10.65.207.49 --except-interface=lo --dhcp-range=set:'net1',xx.yy.ww.zz,static,120s --dhcp-lease-max=16 --dhcp-hostsfile=/var/lib/nova/networks/nova-demonetbr0.conf --dhcp-script=/usr/bin/nova-dhcpbridge --leasefile-ro nova 56853 1 0 Feb28 ? 00:00:01 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logfile /var/log/nova/api.log nova 56884 56853 0 Feb28 ? 00:00:03 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logfile /var/log/nova/api.log nova 56890 1 0 Feb28 ? 02:55:14 /usr/bin/python /usr/bin/nova-cert --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logfile /var/log/nova/cert.log nova 56902 56853 0 Feb28 ? 00:06:30 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logfile /var/log/nova/api.log nova 56907 56853 0 Feb28 ? 00:01:26 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logfile /var/log/nova/api.log nova 56975 1 0 Feb28 ? 02:56:25 /usr/bin/python /usr/bin/nova-console --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logfile /var/log/nova/console.log nova 57009 1 0 Feb28 ? 02:56:06 /usr/bin/python /usr/bin/nova-consoleauth --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logfile /var/log/nova/consoleauth.log nova 57082 1 0 Feb28 ? 00:00:04 /usr/bin/python /usr/bin/nova-objectstore --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logfile /var/log/nova/objectstore.log nova 57120 1 1 Feb28 ? 07:21:55 /usr/bin/python /usr/bin/nova-scheduler --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logfile /var/log/nova/scheduler.log nova 57155 1 0 Feb28 ? 00:00:04 /usr/bin/python /usr/bin/nova-xvpvncproxy --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logfile /var/log/nova/xvpvncproxy.log nova 75571 1 0 13:17 ? 00:00:50 /usr/bin/python /usr/bin/nova-cert --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logfile /var/log/nova/cert.log nova 75596 1 14 13:17 ? 00:19:39 /usr/bin/python /usr/bin/nova-compute --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logfile /var/log/nova/compute.log nova 75637 1 0 13:17 ? 00:00:51 /usr/bin/python /usr/bin/nova-console --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logfile /var/log/nova/console.log nova 75671 1 0 13:17 ? 00:00:50 /usr/bin/python /usr/bin/nova-consoleauth --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logfile /var/log/nova/consoleauth.log nova 75711 1 1 13:17 ? 00:01:59 /usr/bin/python /usr/bin/nova-network --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logfile /var/log/nova/network.log nova 75729 1 0 13:17 ? 00:00:01 python /usr/bin/nova-novncproxy --web /usr/share/novnc/ #-------------# There appears to be only 3 workers : 56853 is the daemon -- ppid1, and it has 3 corresponding chidren Investigation - round2 Still debugging with ndipanov (thanks!), why there are 7 worker threads now, instead of 5: Meantime: The nova API service was dead for some reason. Get it up & running: #------------------------# $ sudo service openstack-nova-api status openstack-nova-api (pid 62600) is running... #------------------------# Search for the process #------------------------# ]# ps axf | grep nova-api 69054 pts/6 S+ 0:00 | \_ grep nova-api 62600 ? S 0:08 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logfile 62607 ? S 0:00 \_ /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logf 62634 ? S 0:00 \_ /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logf 62635 ? S 0:00 \_ /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logf 62636 ? S 0:00 \_ /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logf 62638 ? S 0:00 \_ /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logf 62644 ? S 0:00 \_ /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logf 65389 ? S 0:00 \_ /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logf /nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova #------------------------# Side note: Re-spawning seems to work, to notice the respawning: ===== $ kill -9 $any-child-process; ps -ef | grep nova-api $ ps -ef | grep nova-api ===== Finally, round-3 investigation fixed it: Short summary: ============= We had to turn off metadata and ec2 services, and enable only osapi_compute, and have the worker threads set to 5 Details: ======= It's not as trivial as it appeard. Here's how Nikola helped me investigate it further and reach a conclusion. Additional things we had to try was: 1] Change the enabled_apis to only osapi_compute #------------------------# $ egrep -i 'enabled_apis|workers' /etc/nova/nova.conf | grep -v ^$ | grep -v ^# osapi_compute_workers=5 enabled_apis = osapi_compute #------------------------# 2] Restart the nova services ; check for the nova api service #------------------------# # service openstack-nova-api restart Stopping openstack-nova-api: [ OK ] Starting openstack-nova-api: [ OK ] #------------------------# $ ps -axf | grep nova-api Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ 39940 pts/6 S+ 0:00 | \_ grep nova-api 39296 ? S 0:01 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logfile 39347 ? S 0:00 \_ /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logf 39348 ? S 0:00 \_ /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logf 39349 ? S 0:00 \_ /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logf 39350 ? S 0:00 \_ /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logf 39351 ? S 0:00 \_ /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --logf #------------------------# $ ps -ef | grep -i nova-api nova 39296 1 1 17:36 ? 00:00:15 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova nova 39347 39296 0 17:36 ? 00:00:00 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova nova 39348 39296 0 17:36 ? 00:00:00 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova nova 39349 39296 0 17:36 ? 00:00:00 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova nova 39350 39296 0 17:36 ? 00:00:00 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova nova 39351 39296 0 17:36 ? 00:00:00 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova root 52889 54194 2 17:54 pts/6 00:00:00 grep -i nova-api #------------------------# 3] From the above, let's kill a worker process 39351 (which has its parent ppid as 39256) #------------------------# $ kill -9 39351 | grep nova-api #------------------------# 3.1] Check if another worker thread has re-spawned: #------------------------# $ ps -ef | grep -i nova-api nova 39296 1 1 17:36 ? 00:00:21 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova nova 39347 39296 0 17:36 ? 00:00:00 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova nova 39348 39296 0 17:36 ? 00:00:00 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova nova 39349 39296 0 17:36 ? 00:00:00 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova nova 39350 39296 0 17:36 ? 00:00:00 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova nova 57441 39296 0 17:59 ? 00:00:00 /usr/bin/python /usr/bin/nova-api --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova root 58291 54194 0 18:01 pts/6 00:00:00 grep -i nova-api #------------------------# 4] Let's try to kill the parent process, to ensure it & its child process are shut down #------------------------# $ kill -9 39296 $ ps -ef | grep -i nova-api root 59355 54194 0 18:02 pts/6 00:00:00 grep -i nova-api #------------------------# The above does demonstrate the desired result, from the bug description. n-v-r and other related details are in comment #8. VERIFIED. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-0709.html |