Bug 623166

Summary:	libvirtd daemon core dumps not working
Product:	Red Hat Enterprise Linux 6	Reporter:	Moran Goldboim <mgoldboi>
Component:	libvirt	Assignee:	Jiri Denemark <jdenemar>
Status:	CLOSED ERRATA	QA Contact:	Moran Goldboim <mgoldboi>
Severity:	high	Docs Contact:
Priority:	low
Version:	6.1	CC:	abaron, apevec, berrange, dallan, danken, dyuan, eblake, hateya, kxiong, xen-maint, yoyzhang
Target Milestone:	rc	Keywords:	Reopened
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:	libvirt-0.8.7-2.el6	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2011-05-19 13:20:01 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Moran Goldboim 2010-08-11 14:01:21 UTC

Description of problem:
[libvirt] Please add an option to run libvirt/qemu with core dumps enabled.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 2 RHEL Program Management 2010-08-11 14:18:04 UTC

This feature request has been proposed after Feature Freeze and we
are unable to resolve it in time for the current Red Hat Enterprise
Linux release. It has been denied for the current release and
proposed for the next Red Hat Enterprise Linux release.

Comment 3 Dan Kenigsberg 2010-08-11 14:47:05 UTC

Sorry, Moran and others, I can do it without libvirtd involvement.

I'll set
   DAEMON_COREFILE_LIMIT=unlimited

in /etc/sysconfig/libvirtd, and that's that.

It works fine for qemu-kvm processes exec'ed by libvirtd, but not for libvirtd itself: `pkill -SEGV libvirtd` does not generate a core dump. Any clue why?

Comment 4 Daniel Berrangé 2010-10-20 15:16:39 UTC

Removing feature tag, since this should already work & thus it's merely a bug if it doesn't

Comment 5 Dave Allan 2010-11-08 22:17:55 UTC

*** Bug 625334 has been marked as a duplicate of this bug. ***

Comment 6 Jiri Denemark 2010-12-02 14:50:06 UTC

Some more details can be found in the bz that was closed as a dup:

# libvirtd & kill -SEGV $!
[1] 11144
[1]+  Segmentation fault      (core dumped) libvirtd

But if given enough time, libvirtd no longer dumps core:

# libvirtd & sleep 3; kill -SEGV $!
[1] 11567
[1]+  Segmentation fault      libvirtd


In the first case, the signal just kills the forked bash child, since it hasn't had a chance to call exec() yet. Once the libvirtd binary has actually been executed, it no longer generates cores. However, /proc/PID/limits still reports an unlimited core file size.

The reporters also set /proc/sys/kernel/core_pattern to /var/log/core/core.%p.%t.dump to change the location where core dumps are stored.

The machine is RHEL-6.0, libvirt-0.8.1-28.el6.x86_64, kernel-2.6.32-72.el6.x86_64, SELinux was in permissive mode.

Still, I couldn't reproduce it locally.

Comment 7 Daniel Berrangé 2010-12-02 15:41:37 UTC

I can't reproduce this problem on any machine except for those in TLV showing the problem, which are all running VDSM. On any RHEL or Fedora machine of any vintage, libvirtd always generates core dumps as expected when DAEMON_COREFILE_LIMIT=unlimited is set in sysconfig, or with ulimit -c unlimited.

I'd ask the reporter to try reproducing this problem on a machine which has *never* had VDSM installed. Assuming it dumps core correctly, then install VDSM, reboot, and try to reproduce it again, to see if the problem now occurs.

Comment 8 Haim 2010-12-08 21:06:14 UTC

Well, the results are not conclusive (I tried the first part); maybe I'm missing some configuration.

1) clean machine (RHEL6) - install libvirt 
2) set 'ulimit -c unlimited' 
3) service libvirtd start 
4) kill -SEGV `pgrep libvirt` 
5) result: no cores!
6) rm -rf /var/run/libvirtd.pid 
7) /usr/sbin/libvirtd --daemon & (run daemon from command line)
8) kill -SEGV `pgrep libvirt`
9) result: core dump to current dir

Daniel, where should I configure 'DAEMON_COREFILE_LIMIT=unlimited'?
I tried putting it in /etc/sysconfig/init, but libvirtd refused to start.

Please elaborate.

Comment 9 Dan Kenigsberg 2010-12-08 21:28:34 UTC

(In reply to comment #8)
> Daniel, where should I configure 'DAEMON_COREFILE_LIMIT=unlimited'?
> I tried putting it in /etc/sysconfig/init, but libvirtd refused to start.

Odd, but please try putting it in
    /etc/sysconfig/libvirtd

Where are you looking for daemon cores? If /proc/sys/kernel/core_pattern is unset it should be in libvirt's cwd (which is /).

Comment 10 Dave Allan 2010-12-09 21:45:14 UTC

Haim, did Cole's suggestions on IRC yesterday solve it for you?

Comment 12 Jiri Denemark 2010-12-13 09:45:48 UTC

(In reply to comment #11)
> (In reply to comment #9)
> > Where are you looking for daemon cores? If /proc/sys/kernel/core_pattern is
> > unset it should be in libvirt's cwd (which is /).
> 
> Dan - tried to add it to /etc/sysconfig/libvirtd with no luck. 
> no cores at '/'. 

What about the directory where you started libvirtd from? No cores even there?

Comment 17 Alan Pevec 2011-01-06 23:11:10 UTC

libvirtd dumps core with the following line commented out in the VDSM-customized libvirtd.conf:

#unix_sock_group="kvm" # by vdsm

I have no idea why, but I hope libvirt developers will have a clue now.

Comment 18 Alan Pevec 2011-01-06 23:23:53 UTC

Just to confirm: setting unix_sock_group to any random group, e.g. on my laptop:
unix_sock_group = "apevec"

prevents libvirtd from producing core dumps on sigsegv.

Comment 19 Jiri Denemark 2011-01-07 08:33:32 UTC

WTH, you are right, I reproduced it even on my systems. Thanks a lot Alan for chasing that down.

Comment 20 Jiri Denemark 2011-01-07 10:12:25 UTC

As a temporary workaround, you can do

# sysctl fs.suid_dumpable=2

or

# sysctl fs.suid_dumpable=1

The first option is more secure, but it only works if a custom core_pattern is set that avoids overwriting existing files, because with fs.suid_dumpable == 2 the kernel refuses to overwrite an existing core file. The second option is dangerous.

I'm working on a proper fix in the meantime...

Comment 21 Daniel Berrangé 2011-01-07 12:13:09 UTC

libvirtd does

  old = getgid();
  setgid(unix_sock_gid);
  /* ...create the unix socket... */
  setgid(old);


So we should *not*, in fact, be considered a setuid/setgid process. The kernel, however, does track GID changes: it simply looks for any change in UID/GID and thereafter refuses to dump core, even if you change back to your original UID/GID:

        /* dumpability changes */
        if (old->euid != new->euid ||
            old->egid != new->egid ||
            old->fsuid != new->fsuid ||
            old->fsgid != new->fsgid ||
            !cap_issubset(new->cap_permitted, old->cap_permitted)) {
                if (task->mm)
                        set_dumpable(task->mm, suid_dumpable);
                task->pdeath_signal = 0;
                smp_wmb();
        }


So I'd argue this is a kernel bug, but I doubt we'll have any luck getting that behaviour changed. The only option I can think of is to change the group ownership of the socket itself (e.g. chown() on the socket path after bind()) instead of calling setgid() before/after the socket calls.

Comment 22 Jiri Denemark 2011-01-10 10:42:36 UTC

This is fixed upstream by v0.8.7-19-g5e5acbc:

commit 5e5acbc8d67e1ac074320176bbc3682b9ba934c0
Author: Jiri Denemark <jdenemar>
Date:   Fri Jan 7 12:34:12 2011 +0100

    daemon: Fix core dumps if unix_sock_group is set
    
    Setting unix_sock_group to something other than the default "root" in
    /etc/libvirt/libvirtd.conf prevents system libvirtd from dumping core on
    crash. This is because we used setgid(unix_sock_group) before binding to
    /var/run/libvirt/libvirt-sock* and setgid() back to original group.
    However, if a process changes its effective or filesystem group ID, it
    will be forbidden from leaving core dumps unless fs.suid_dumpable sysctl
    is set to something other than 0 (and it is 0 by default).
    
    Changing socket's group ownership after bind works better. And we can do
    so without introducing a race condition since we loosen access rights by
    changing the group from root to something else.

Comment 23 Jiri Denemark 2011-01-13 21:36:43 UTC

Patch sent to rhvirt-patches:
http://post-office.corp.redhat.com/archives/rhvirt-patches/2011-January/msg00657.html

Comment 25 zhanghaiyan 2011-01-18 07:08:06 UTC

Verified this bug as passing with libvirt-0.8.7-2.el6.x86_64:
- libvirt-0.8.7-2.el6.x86_64
- qemu-kvm-0.12.1.2-2.129.el6.x86_64
- kernel-2.6.32-94.el6.x86_64

1. # echo DAEMON_COREFILE_LIMIT=unlimited >> /etc/sysconfig/libvirtd
2. Keep the default unix_sock_group setting in /etc/libvirt/libvirtd.conf:
# This is restricted to 'root' by default.
#unix_sock_group = "libvirt"
3. # cat /proc/sys/kernel/core_pattern 
|/usr/libexec/abrt-hook-ccpp /var/spool/abrt %p %s %u %c
4. # service libvirtd restart
5. # pkill -SEGV libvirtd
6. # ls /core*
/core.22317
7. Change unix_sock_group to 'kvm':
# This is restricted to 'root' by default.
#unix_sock_group = "libvirt"
unix_sock_group = "kvm"
8. # service libvirtd restart
9. # pkill -SEGV libvirtd
# ls /core*
/core.22317  /core.22814

Also reproduced this bug with libvirt-0.8.7-1.el6.x86_64: at step 9, no core dump file was produced.

Comment 28 errata-xmlrpc 2011-05-19 13:20:01 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0596.html