Bug 841486

Summary: [vdsm] super vdsm server leave child process defunct after create vm
Product: [oVirt] vdsm
Reporter: Royce Lv <lvroyce>
Component: General
Assignee: Oved Ourfali <oourfali>
Status: CLOSED CURRENTRELEASE
QA Contact: Petr Kubica <pkubica>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.14.0
CC: abaron, adevolder, agkesos, bazulay, bugs, danken, dyasny, gklein, mgoldboi, oourfali, rbalakri, shyu, s.kieske, ybronhei, ykaul, ylavi
Target Milestone: ovirt-3.6.0-rc
Keywords: Reopened
Target Release: 4.17.0
Flags: rule-engine: ovirt-3.6.0+
ylavi: planning_ack+
rule-engine: devel_ack+
rule-engine: testing_ack+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: infra
Fixed In Version: 3.6.0-4
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-01-13 14:40:24 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1180864, 1181624, 1210347
Bug Blocks:

Description Royce Lv 2012-07-19 06:59:56 UTC
Description of problem:
After creating a vm/localfs, I found a process left in the defunct state.
It is a child process of the supervdsm server.

[lvroyce@localhost x86_64]$ ps -ef |grep qemu
qemu     20614 20305  0 14:45 ?        00:00:00 [python] <defunct>

qemu     20886     1  3 14:45 ?        00:00:17 /usr/bin/qemu-kvm -name vm6 -S -M pc-1.0 -cpu qemu64,-svm -enable-kvm -m 1024 -smp 1,sockets=1,cores=1,threads=1 -uuid f7c1b02f-b304-4e68-8565-3659b1214c40 -smbios type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=17-1,serial=0EA0A181-50B2-11CB-BBE1-DFC7A7500304_00:21:cc:62:a6:07,uuid=f7c1b02f-b304-4e68-8565-3659b1214c40 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/vm6.monitor,server,nowait -mon ......


[lvroyce@localhost x86_64]$ ps -ef |grep vdsm
vdsm     20244     1  0 14:44 ?        00:00:00 /bin/bash -e /usr/share/vdsm/respawn --minlifetime 10 --daemon --masterpid /var/run/vdsm/respawn.pid /usr/share/vdsm/vdsm
vdsm     20246 20244  0 14:44 ?        00:00:03 /usr/bin/python /usr/share/vdsm/vdsm
root     20304 20246  0 14:44 ?        00:00:00 /usr/bin/sudo -n /usr/bin/python /usr/share/vdsm/supervdsmServer.py d2b71523-ce00-4495-bfa2-bd214577a32c 20246
root     20305 20304  0 14:44 ?        00:00:00 /usr/bin/python /usr/share/vdsm/supervdsmServer.py d2b71523-ce00-4495-bfa2-bd214577a32c 20246
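
In the listings above, the PPID column of the defunct [python] row (20305) matches the PID of the supervdsmServer.py process, which is how the zombie is attributed to supervdsm. As a small illustrative sketch (the helper name find_defunct is hypothetical and not part of vdsm), the same check can be scripted against `ps -ef` output:

```python
def find_defunct(ps_lines):
    """Return (pid, ppid) for every `ps -ef` row marked <defunct>."""
    found = []
    for line in ps_lines:
        fields = line.split()
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD...
        if len(fields) >= 8 and fields[-1] == "<defunct>":
            found.append((int(fields[1]), int(fields[2])))
    return found


# Two rows copied from the report above.
sample = [
    "qemu     20614 20305  0 14:45 ?        00:00:00 [python] <defunct>",
    "root     20305 20304  0 14:44 ?        00:00:00 /usr/bin/python "
    "/usr/share/vdsm/supervdsmServer.py d2b71523-ce00-4495-bfa2-bd214577a32c 20246",
]
print(find_defunct(sample))  # [(20614, 20305)]
```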


Version-Release number of selected component (if applicable):
[lvroyce@localhost x86_64]$ rpm -q libvirt
libvirt-0.9.13-1.fc17.x86_64
[lvroyce@localhost x86_64]$ rpm -q vdsm
vdsm-4.10.0-0.185.gitb52165e.fc17.lvroyce1342680119.x86_64

How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Itamar Heim 2013-02-03 12:25:15 UTC
Closing old bugs. If this issue is still relevant/important in current version, please re-open the bug.

Comment 2 Sven Kieske 2014-05-09 15:54:22 UTC
Please reopen: I am seeing defunct supervdsm processes again in oVirt 3.3.3.

See discussion on devel:
http://lists.ovirt.org/pipermail/devel/2014-May/007289.html

Comment 3 Dan Kenigsberg 2014-05-14 12:25:44 UTC
Starting to use zombiereaper may solve this issue, but there's another one in supervdsm: it uses multiprocessing, which uses subprocess.Popen, which is known to be buggy on python2.

Please consider monkey-patching

  multiprocessing.process.Process._Popen = CPopen

before use.

Comment 4 Sven Kieske 2014-06-18 08:21:30 UTC
any progress regarding this problem?
I still see supervdsmServer defunct processes popping up.

installed version is atm:
rpm -q vdsm
vdsm-4.13.3-3.el6.x86_64

if you need any additional logs, please tell me.

is this just about replacing subprocess.Popen with CPopen?

It would be nice if someone could explain exactly what prevents this from
getting fixed. If it's just a matter of finding time to crawl through the vdsm
code, I'm happy to assist.

Comment 5 Dan Kenigsberg 2014-06-21 13:58:13 UTC
(In reply to Sven Kieske from comment #4)

> is this just about replacing subprocess.Popen with CPopen?

No. That's an unrelated issue that I've noticed while reading the code. Note that my suggestion for consideration is wrong, as multiprocessing.forking.Popen does not have the same API as CPopen and subprocess.Popen.

I suppose that a properly-placed zombiereaper.autoReapPID(proc.pid) would take care of your zombies.

Comment 6 Dan Kenigsberg 2014-12-02 15:25:57 UTC
We have to revert the patch from the 3.5 branch, as it makes the much more annoying Bug 1168217 more evident.

Comment 7 Yaniv Bronhaim 2015-01-12 11:33:32 UTC
Adding a dependency on Bug 1180864, whose solution allows zombiereaper to be used safely in supervdsmServer. Once the multiprocessing fix that handles SIGCHLD interrupts is backported to Python 2.6, we'll be able to merge http://gerrit.ovirt.org/#/c/28915/ back.

Comment 8 Yaniv Bronhaim 2015-01-13 13:03:25 UTC
Will merge when RHEL 6.7 is out (see Bug 1180864). Moving to 3.6.

Comment 9 Dan Kenigsberg 2015-04-12 08:55:39 UTC
The former patches were reverted when we realized that Python still had the EINTR bug. They must be re-posted.
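
As background on the EINTR problem: once a SIGCHLD handler is installed, a blocking system call in the same process can be interrupted and fail with errno EINTR instead of completing. The usual workaround is to retry the call. The sketch below shows that generic pattern only; it is not the Python or vdsm patch referenced in this bug, and the helper name retry_on_eintr is hypothetical. Since Python 3.5 (PEP 475) the standard library retries most such calls itself, so a wrapper like this matters mainly on older interpreters.

```python
import errno


def retry_on_eintr(func, *args, **kwargs):
    """Call func until it finishes without being interrupted by a signal."""
    while True:
        try:
            return func(*args, **kwargs)
        except (OSError, IOError) as e:
            # Any error other than "interrupted system call" is a real failure.
            if e.errno != errno.EINTR:
                raise
```

A call such as `retry_on_eintr(os.read, fd, 4096)` (with `fd` being some open file descriptor) would then survive a SIGCHLD arriving mid-read.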

Comment 11 Red Hat Bugzilla Rules Engine 2015-10-18 08:34:05 UTC
Bug tickets that are moved to testing must have target release set to make sure tester knows what to test. Please set the correct target release before moving to ON_QA.

Comment 14 Petr Kubica 2016-01-07 11:42:08 UTC
Verified in vdsm-4.17.15-0.el7ev.noarch

Comment 15 Sandro Bonazzola 2016-01-13 14:40:24 UTC
oVirt 3.6.0 has been released, closing current release