Bug 1078590

Summary: use of tls with libvirt.so can leave zombie processes
Product: Red Hat Enterprise Linux 7
Reporter: Eric Blake <eblake>
Component: libvirt
Assignee: Michal Privoznik <mprivozn>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified
Priority: unspecified
Version: 7.0
CC: dyuan, eblake, jdenemar, mzhan, rbalakri, tdosek, vivianzhang, ydu, zhwang
Target Milestone: rc
Keywords: Upstream, ZStream
Hardware: Unspecified
OS: Unspecified
Fixed In Version: libvirt-1.2.7-1.el7
Doc Type: Bug Fix
Doc Text:
A previous update introduced an error in the libvirt client TLS code: after the poll() system call, the original signal mask was restored with SIG_BLOCK instead of SIG_SETMASK. Because SIG_BLOCK adds signals to the current mask rather than replacing it, the signals blocked around poll(), including SIGCHLD, could remain permanently blocked, and child processes of the client were left behind as defunct (zombie) processes. With this update, the original signal mask is correctly restored after poll(), and defunct processes are no longer left behind.
Clone Of: 1078589
Clones: 1112689
Last Closed: 2015-03-05 07:32:59 UTC
Type: Bug
Bug Depends On: 1078589
Bug Blocks: 1112689

Description Eric Blake 2014-03-19 23:08:42 UTC
Cloning to RHEL 7

+++ This bug was initially created as a clone of Bug #1078589 +++

Description of problem:
Libvirt commit 434de30 refactored the client-side TLS code, but accidentally changed a SIG_SETMASK to a SIG_BLOCK when restoring the signal mask after temporarily blocking signals around a poll() call. As a result, the client can end up with SIGCHLD permanently blocked, at which point it leaks zombie processes, because a SIGCHLD handler that would reap its children never runs.

Version-Release number of selected component (if applicable):
libvirt-0.10.2-29.el6_5.5
but present all the way back to RHEL 6.2

How reproducible:
https://www.redhat.com/archives/libvir-list/2014-March/msg00858.html

Steps to Reproduce:
1. See the upstream mail thread

Actual results:
Zombie processes are leaked because SIGCHLD is permanently blocked.

Expected results:
No zombie processes; signals are handled correctly.

Additional info:
Fixed with this patch upstream:
commit 3d4b4f5ac634c123af1981084add29d3a2ca6ab0
Author: Michal Privoznik <mprivozn>
Date:   Wed Mar 19 18:10:34 2014 +0100

    virNetClientSetTLSSession: Restore original signal mask
    
    Currently, we use pthread_sigmask(SIG_BLOCK, ...) prior to calling
    poll(). This is okay, as we don't want poll() to be interrupted.
    However, then - immediately as we fall out from the poll() - we try to
    restore the original sigmask - again using SIG_BLOCK. But as the man
    page says, SIG_BLOCK adds signals to the signal mask:
    
    SIG_BLOCK
          The set of blocked signals is the union of the current set and the set argument.
    
    Therefore, when restoring the original mask, we need to completely
    overwrite the one we set earlier and hence we should be using:
    
    SIG_SETMASK
          The set of blocked signals is set to the argument set.
    
    Signed-off-by: Michal Privoznik <mprivozn>

--- Additional comment from Eric Blake on 2014-03-19 17:07:47 MDT ---

Technically a regression from RHEL 6.1 behavior, but since the bug was introduced so long ago, I'm not sure whether it deserves a z-stream fix to 6.5 or can just wait for 6.6.

Comment 10 vivian zhang 2014-12-08 03:30:54 UTC
Verified this issue with the following builds:
libvirt-1.2.8-9.el7.x86_64
qemu-img-rhev-2.1.2-14.el7.x86_64


1. Prepare the TLS environment on two hosts (one acting as the client, the other as the server), and make sure a remote TLS connection from the client to the server works:
# virsh -c qemu+tls://server/system

2. Install perl-Sys-Virt-1.2.8-3.el7.x86_64 on the libvirt client.

3. On the client, run libvirt-perl.pl from comment 4:

[root@client ~]# perl libvirt-perl.pl
init... pid=12300
while...
fork 1
end... pid=12301
receive chld
fork 2
end... pid=12302
receive chld
connection open
fork 3
end... pid=12303
receive chld
fork 4
end... pid=12304
receive chld
go next...
while...
fork 1
end... pid=12305
receive chld
fork 2
end... pid=12306
receive chld
connection open
fork 3
end... pid=12307
receive chld
fork 4
end... pid=12308
receive chld
go next...
while...
fork 1
end... pid=12309
receive chld
fork 2
end... pid=12310
receive chld
connection open
fork 3
end... pid=12311
receive chld
fork 4
end... pid=12312
receive chld
go next...
while...
fork 1
end... pid=12313
receive chld
fork 2
end... pid=12314
receive chld
connection open
fork 3
end... pid=12315
receive chld
fork 4
end... pid=12316
receive chld
go next...
while...
fork 1
end... pid=12317
receive chld
fork 2
end... pid=12318
receive chld
connection open
fork 3
end... pid=12320
receive chld
fork 4
end... pid=12321
receive chld
go next...
while...
fork 1
end... pid=12322
receive chld
fork 2
end... pid=12323
receive chld
connection open
fork 3
end... pid=12324
receive chld
....
4. Check the processes; there are no zombie processes:

ps -afx |grep perl
    12300 pts/0 S+ 0:00 | \_ perl libvirt-perl.pl
    12382 pts/2 S+ 0:00 \_ grep --color=auto perl

5. Repeat steps 1-4 with a libvirt TCP connection; the result is the same.
Moving to VERIFIED.

Comment 12 errata-xmlrpc 2015-03-05 07:32:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0323.html