Bug 623166
Summary: | libvirtd daemon core dumps not working | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Moran Goldboim <mgoldboi> |
Component: | libvirt | Assignee: | Jiri Denemark <jdenemar> |
Status: | CLOSED ERRATA | QA Contact: | Moran Goldboim <mgoldboi> |
Severity: | high | Docs Contact: | |
Priority: | low | ||
Version: | 6.1 | CC: | abaron, apevec, berrange, dallan, danken, dyuan, eblake, hateya, kxiong, xen-maint, yoyzhang |
Target Milestone: | rc | Keywords: | Reopened |
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | libvirt-0.8.7-2.el6 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2011-05-19 13:20:01 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Moran Goldboim
2010-08-11 14:01:21 UTC
This feature request has been proposed after Feature Freeze and we are unable to resolve it in time for the current Red Hat Enterprise Linux release. It has been denied for the current release and proposed for the next Red Hat Enterprise Linux release.

Sorry, Moran and others, I can do it without libvirtd involvement. I'll set DAEMON_COREFILE_LIMIT=unlimited in /etc/sysconfig/libvirtd, and that's that. It works fine for qemu-kvm exec'ed by libvirtd, but not for libvirtd itself: `pkill -SEGV libvirtd` does not generate a core dump. Any clue why?

Removing the feature tag, since this should already work, and thus it's merely a bug if it doesn't.

*** Bug 625334 has been marked as a duplicate of this bug. ***

In the bz which was closed as a dup, some more details can be found:

    # libvirtd & kill -SEGV $!
    [1] 11144
    [1]+ Segmentation fault (core dumped) libvirtd

But if given enough time, libvirtd no longer dumps core:

    # libvirtd & sleep 3; kill -SEGV $!
    [1] 11567
    [1]+ Segmentation fault libvirtd

The first case just kills the forked bash child, since it didn't get a chance to exec() libvirtd yet. Once the libvirtd process is actually running, it doesn't generate cores. However, /proc/PID/limits still says that the core file size is unlimited. The reporters also set /proc/sys/kernel/core_pattern to /var/log/core/core.%p.%t.dump to change the location where core dumps are stored. The machine is RHEL-6.0, libvirt-0.8.1-28.el6.x86_64, kernel-2.6.32-72.el6.x86_64, with SELinux in permissive mode. Still, I couldn't reproduce it locally.

I can't reproduce this problem on any machine except for those in TLV showing the problem, which are all running VDSM. On any RHEL or Fedora machine of any vintage, libvirtd always generates core dumps as expected when DAEMON_COREFILE_LIMIT=unlimited is set in sysconfig, or with ulimit -c unlimited. Request that the reporter try reproducing this problem on a machine which has *never* had VDSM installed.
Assuming it dumps core correctly, then install VDSM, reboot, and try to reproduce it again, to see if the problem now occurs.

Well, the results are not conclusive (I tried the first part); maybe I'm missing some configuration.

1) clean machine (RHEL6) - install libvirt
2) set 'ulimit -c unlimited'
3) service libvirtd start
4) kill -SEGV `pgrep libvirt`
5) result: no cores!
6) rm -rf /var/run/libvirtd.pid
7) /usr/sbin/libvirtd --daemon & (run the daemon from the command line)
8) kill -SEGV `pgrep libvirt`
9) result: core dump in the current directory

Daniel - where should I configure 'DAEMON_COREFILE_LIMIT=unlimited'? I tried to put it under /etc/sysconfig/init, but libvirtd refused to start. Please elaborate.

(In reply to comment #8)
> Daniel - where should i configure the 'DAEMON_COREFILE_LIMIT=unlimited' ?
> tried to put it under /etc/sysconfig/init but libvirt refused to go up.

Odd. But please try putting it in /etc/sysconfig/libvirtd.

Where are you looking for daemon cores? If /proc/sys/kernel/core_pattern is unset, it should be in libvirtd's cwd (which is /).

Haim, did Cole's suggestions on IRC yesterday solve it for you?

(In reply to comment #11)
> (In reply to comment #9)
> > Where are you looking for daemon cores? If /proc/sys/kernel/core_pattern is
> > unset it should be in libvirt's cwd (which is /).
>
> Dan - tried to add it to /etc/sysconfig/libvirtd with no luck.
> no cores at '/'.

What about the directory where you started libvirtd from? No cores even there?

libvirtd dumps core with the following line commented out in the VDSM-customized libvirtd.conf:

    #unix_sock_group="kvm" # by vdsm

I have no idea why, but I hope libvirt developers will have a clue now.

Just to confirm: setting unix_sock_group to any random group, e.g. on my laptop:

    unix_sock_group = "apevec"

prevents libvirtd from producing core dumps on SIGSEGV.

WTH, you are right, I reproduced it even on my systems. Thanks a lot Alan for chasing that down.
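The comments above touch on separate conditions for getting a core: the core size limit (DAEMON_COREFILE_LIMIT or `ulimit -c`), where the kernel sends the core (/proc/sys/kernel/core_pattern), and, since /proc/PID/limits showed "unlimited" while no core appeared, something that limit alone does not capture. One such extra condition is the kernel's per-process dumpable flag. The following is a minimal Linux sketch that reports these for the calling process; the function and enum names are hypothetical, and the checks are not exhaustive (for example, the target directory must also be writable).

```c
#include <stdio.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/resource.h>

/* Where a core file would go according to kernel.core_pattern. */
enum core_dest { CORE_PIPE, CORE_ABSOLUTE_PATH, CORE_RELATIVE_TO_CWD };

enum core_dest classify_core_pattern(const char *pattern)
{
    if (pattern[0] == '|')
        return CORE_PIPE;            /* piped to a helper program, e.g. abrt */
    if (pattern[0] == '/')
        return CORE_ABSOLUTE_PATH;   /* e.g. /var/log/core/core.%p.%t.dump */
    return CORE_RELATIVE_TO_CWD;     /* e.g. the default "core" */
}

/* Read the first line of a /proc file into buf, without the newline.
   Returns 0 on success, -1 on error. */
static int read_proc_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Print the settings that decide whether this process can dump core. */
void report_core_settings(void)
{
    struct rlimit rl;
    char line[256];

    if (getrlimit(RLIMIT_CORE, &rl) == 0) {
        if (rl.rlim_cur == RLIM_INFINITY)
            printf("RLIMIT_CORE soft limit: unlimited\n");
        else
            printf("RLIMIT_CORE soft limit: %llu bytes\n",
                   (unsigned long long)rl.rlim_cur);
    }
    /* 0 here means no core dump, whatever RLIMIT_CORE says. */
    printf("dumpable flag: %d\n", prctl(PR_GET_DUMPABLE, 0, 0, 0, 0));
    if (read_proc_line("/proc/sys/kernel/core_pattern", line, sizeof(line)) == 0)
        printf("core_pattern: %s\n", line);
    if (read_proc_line("/proc/sys/fs/suid_dumpable", line, sizeof(line)) == 0)
        printf("fs.suid_dumpable: %s\n", line);
}
```

Called from a small test program, report_core_settings() prints the values for that process. The dumpable flag is not listed in /proc/PID/limits, which only shows resource limits, so that file can report an unlimited core size even when the flag is 0.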
As a temporary workaround, you can do

    # sysctl fs.suid_dumpable=2

or

    # sysctl fs.suid_dumpable=1

The first option is more secure, but it works only if a custom core_pattern that prevents overwriting existing files is set, because with fs.suid_dumpable == 2 the kernel does not overwrite existing files. The second option is dangerous. I'm working on a proper fix in the meantime...

libvirtd does

    old = getgid();
    setgid(unix_sock_gid);
    ...create unix sock...
    setgid(old);

So we should *not* in fact be considered as a setuid/setgid process. The kernel, however, does track the GID changes. It simply looks for any change in UID/GID, and thereafter refuses to dump core, even if you change back to your original UID/GID:

    /* dumpability changes */
    if (old->euid != new->euid ||
        old->egid != new->egid ||
        old->fsuid != new->fsuid ||
        old->fsgid != new->fsgid ||
        !cap_issubset(new->cap_permitted, old->cap_permitted)) {
        if (task->mm)
            set_dumpable(task->mm, suid_dumpable);
        task->pdeath_signal = 0;
        smp_wmb();
    }

So I'd argue this is a kernel bug, but I doubt we'll have any luck getting that behaviour changed. The only option I can think of is to change the group ownership of the socket itself after creating it (e.g. with chown() on the socket path), instead of calling setgid() before and after the socket calls.

This is fixed upstream by v0.8.7-19-g5e5acbc:

    commit 5e5acbc8d67e1ac074320176bbc3682b9ba934c0
    Author: Jiri Denemark <jdenemar>
    Date:   Fri Jan 7 12:34:12 2011 +0100

        daemon: Fix core dumps if unix_sock_group is set

        Setting unix_sock_group to something else than default "root" in
        /etc/libvirt/libvirtd.conf prevents system libvirtd from dumping core on
        crash. This is because we used setgid(unix_sock_group) before binding to
        /var/run/libvirt/libvirt-sock* and setgid() back to original group.
        However, if a process changes its effective or filesystem group ID, it
        will be forbidden from leaving core dumps unless fs.suid_dumpable sysctl
        is set to something else than 0 (and it is 0 by default).

        Changing socket's group ownership after bind works better. And we can do
        so without introducing a race condition since we loosen access rights by
        changing the group from root to something else.

Patch sent to rhvirt-patches: http://post-office.corp.redhat.com/archives/rhvirt-patches/2011-January/msg00657.html

Verified this bug as fixed with libvirt-0.8.7-2.el6.x86_64:
- libvirt-0.8.7-2.el6.x86_64
- qemu-kvm-0.12.1.2-2.129.el6.x86_64
- kernel 2.6.32-94.el6.x86_64

1. # echo DAEMON_COREFILE_LIMIT=unlimited >> /etc/sysconfig/libvirtd
2. Keep the default unix_sock_group setting in /etc/libvirt/libvirtd.conf:
       # This is restricted to 'root' by default.
       #unix_sock_group = "libvirt"
3. # cat /proc/sys/kernel/core_pattern
   |/usr/libexec/abrt-hook-ccpp /var/spool/abrt %p %s %u %c
4. # service libvirtd restart
5. # pkill -SEGV libvirtd
6. # ls /core*
   /core.22317
7. Change the unix sock group to 'kvm':
       # This is restricted to 'root' by default.
       #unix_sock_group = "libvirt"
       unix_sock_group = "kvm"
8. # service libvirtd restart
9. # pkill -SEGV libvirtd
   # ls /core*
   /core.22317 /core.22814

Also reproduced this bug with libvirt-0.8.7-1.el6.x86_64: at step 9, no core dump file is generated.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0596.html
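To make the before/after difference concrete, here is a minimal Linux sketch contrasting the two patterns discussed in this bug: switching the process GID around socket creation, and changing the socket file's group after bind() as the commit message describes. It is not libvirt's actual code; the function names, the backlog of 16, and the permission mode passed in are assumptions for illustration.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/prctl.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Old pattern: switch the process GID around socket creation.
   Needs CAP_SETGID when `gid` differs from the current GID.
   Returns the dumpable flag afterwards, or -1 if a setgid() call failed.
   Even though the GID ends up where it started, the kernel resets the
   flag to fs.suid_dumpable on the first change. */
int dumpable_after_gid_round_trip(gid_t gid)
{
    gid_t old = getgid();

    if (setgid(gid) != 0)
        return -1;
    /* ...the UNIX socket would be created and bound here... */
    if (setgid(old) != 0)
        return -1;
    return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
}

/* Fixed pattern: bind first, then give the socket file to `gid`.
   The process credentials never change, so the dumpable flag is left alone.
   Returns a listening fd, or -1 on error. */
int listen_unix_with_group(const char *path, gid_t gid, mode_t mode)
{
    struct sockaddr_un addr;
    int fd;

    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;
    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);
    unlink(path); /* remove a stale socket left by a previous run */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0 ||
        chown(path, (uid_t)-1, gid) != 0 || /* change group only, keep owner */
        chmod(path, mode) != 0 ||
        listen(fd, 16) != 0) {              /* backlog of 16 is arbitrary */
        close(fd);
        return -1;
    }
    return fd;
}
```

Run as root, dumpable_after_gid_round_trip() would be expected to return the current fs.suid_dumpable value (0 by default) for any gid different from the current one, while listen_unix_with_group() leaves the flag unchanged. As an unprivileged user the round trip typically returns -1, because setgid() to another group is not permitted.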