Bug 843651 - libvirtd crash on startup
Status: CLOSED WORKSFORME
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Martin Kletzander
QA Contact: Virtualization Bugs
Depends On:
Blocks:
Reported: 2012-07-26 18:02 EDT by Charles R. Anderson
Modified: 2013-01-25 06:35 EST

Doc Type: Bug Fix
Last Closed: 2012-08-08 15:18:34 EDT
Type: Bug


Attachments
libvirtd.log (320.81 KB, text/plain)
2012-07-26 18:02 EDT, Charles R. Anderson
gdb backtrace (95.53 KB, text/plain)
2012-07-26 18:07 EDT, Charles R. Anderson
libvirtd.log from working fourth server (15.40 KB, text/plain)
2012-07-30 15:02 EDT, Charles R. Anderson
core dump (29.74 MB, application/octet-stream)
2012-07-30 15:13 EDT, Charles R. Anderson

Description Charles R. Anderson 2012-07-26 18:02:43 EDT
Created attachment 600614
libvirtd.log

Description of problem:


Version-Release number of selected component (if applicable):
libvirt-0.9.10-21.el6.x86_64

How reproducible:
always

Steps to Reproduce:
1. service libvirtd restart
2. service libvirtd status

Actual results:

# service libvirtd restart
Stopping libvirtd daemon:                                  [FAILED]
Starting libvirtd daemon:                                  [  OK  ]
# service libvirtd status
libvirtd dead but pid file exists

Expected results:
No crash.

Additional info:
Comment 1 Charles R. Anderson 2012-07-26 18:07:53 EDT
Created attachment 600619
gdb backtrace

Null dereference of "bitmap" here:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fbcd324c700 (LWP 31284)]
virBitmapSetBit (bitmap=0x0, b=90) at util/bitmap.c:107
107	    if (bitmap->size <= b)


Thread 13 (Thread 0x7fbcd324c700 (LWP 31284)):
#0  virBitmapSetBit (bitmap=0x0, b=90) at util/bitmap.c:107
#1  0x00000000004b260d in qemuMonitorJSONCheckCommands (
    mon=<value optimized out>, qemuCaps=0x0, json_hmp=0x7fbcd324ba6c)
    at qemu/qemu_monitor_json.c:964
#2  0x00000000004a00dc in qemuMonitorSetCapabilities (mon=0x7fbcc4000cf0, 
    qemuCaps=0x0) at qemu/qemu_monitor.c:1030
#3  0x000000000048b1f1 in qemuConnectMonitor (driver=0x7fbccc069730, 
    vm=0x7fbccc011bd0) at qemu/qemu_process.c:1131
#4  0x000000000048fec2 in qemuProcessReconnect (opaque=<value optimized out>)
    at qemu/qemu_process.c:2953
#5  0x0000003a5f458d79 in virThreadHelper (data=<value optimized out>)
    at util/threads-pthread.c:161
#6  0x0000003a4c807851 in start_thread (arg=0x7fbcd324c700)
    at pthread_create.c:301
#7  0x0000003a4c0e76dd in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
Comment 3 Dave Allan 2012-07-26 22:28:00 EDT
I noticed that you set the Version to 6.1 although you have a version of libvirt from what looks like 6.3.  Is just libvirt upgraded, or the whole system?  What version of qemu is on the system?
Comment 4 Charles R. Anderson 2012-07-27 17:20:18 EDT
Sorry, it says SL 6.1 in /etc/issue, so I believe I'm still on 6.1 but maybe not?

I have four systems, three of which are exhibiting this problem.  They all updated themselves to libvirt-0.9.10-21.el6.x86_64 on July 10 around 4am, and the three that have issues crashed about 4 hours after the yum update happened, around 8am.  The fourth's libvirtd is running--I haven't tried to restart it, although it was presumably restarted during the yum update just like the others.

Actually, I know why the fourth might be working--the whole system was rebooted on July 15.  So maybe I just need to do a full reboot on these boxes?

In any case, all four have the same package versions:

libvirt-0.9.10-21.el6.x86_64
gpxe-roms-qemu-0.9.7-6.7.el6.noarch
qemu-kvm-0.12.1.2-2.209.el6_2.4.x86_64
qemu-img-0.12.1.2-2.209.el6_2.4.x86_64

I did try updating one of them to:

libvirt-0.9.10-21.el6.3.x86_64

but it still won't start up.
Comment 5 Martin Kletzander 2012-07-30 01:30:29 EDT
Could it be that there were no machines running on the fourth machine during the update? The process crashed in a part of code that tries to reconnect to the qemu monitor of a running machine. However, I went through the code and I have to say there is no visible path to this point without initialization of the accessed variable that caused the segfault.

I would like to get to the bottom of this, but without reproducing the issue it is hard to find the right cause.

Could you please generate and attach a core dump, if possible? Are there any other differences between the systems you described (configuration, special machines, etc.)?

Thanks, Martin
Comment 7 Charles R. Anderson 2012-07-30 15:02:09 EDT
Created attachment 601331
libvirtd.log from working fourth server

All four had running guests at the time three of them crashed.  It is possible the fourth one crashed as well, but went unnoticed until the whole system was rebooted 5 days later.  Here is the libvirtd.log from the fourth server where libvirtd didn't seem to crash.  It looks like it had been running my local build of 0.9.4-23.el6.1 (which I had originally grabbed from the VirtPreview repo so I could use virt-v2v to convert the guests from Xen to KVM), and then updated itself to 0.9.10-21.el6 on July 10.  There are no mentions of crashes in this log.  I haven't tried to manually restart libvirtd on this server.
Comment 8 Charles R. Anderson 2012-07-30 15:13:49 EDT
Created attachment 601333
core dump

Core dump generated by:

gdb libvirtd
(gdb) run

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f4d82893700 (LWP 24589)]
0x0000003d38c3d490 in virBitmapSetBit () from /usr/lib64/libvirt.so.0
(gdb) generate-core-file
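
For anyone who wants to collect the same data without an interactive session, here is a rough sketch of the steps above as a shell function. It relies on gdb's batch mode (`-batch` with `-ex` commands); the function name `capture_crash`, the `GDB` override variable, the `/usr/sbin/libvirtd` path and the core file location are my own assumptions, not anything from this report.

```shell
# Hypothetical helper: run a daemon under gdb non-interactively. When the
# program stops (for example on SIGSEGV), print a backtrace of every thread
# and write a core file.
# Setting GDB=echo only prints the arguments gdb would receive.
capture_crash() {
    local binary="${1:-/usr/sbin/libvirtd}"   # assumed install path
    local core="${2:-/tmp/libvirtd.core}"     # assumed output location
    "${GDB:-gdb}" -batch \
        -ex run \
        -ex 'thread apply all bt' \
        -ex "generate-core-file $core" \
        "$binary"
}
```

If the daemon does not crash, `run` simply waits until the process exits, so this is only useful on a host where the crash reproduces.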
Comment 9 Charles R. Anderson 2012-07-31 11:26:01 EDT
The difference between the 3 that crashed and the 4th that didn't may be that the 3 that crashed were upgraded by yum from:

Jan 19 18:39:19 Updated: libvirt-client-0.8.7-18.el6_1.1.x86_64
Jan 19 18:39:25 Updated: libvirt-0.8.7-18.el6_1.1.x86_64
Jan 19 18:39:25 Updated: libvirt-python-0.8.7-18.el6_1.1.x86_64

to:

Jul 10 03:52:32 Updated: libvirt-client-0.9.10-21.el6.x86_64
Jul 10 03:52:50 Updated: libvirt-0.9.10-21.el6.x86_64
Jul 10 03:53:13 Updated: libvirt-python-0.9.10-21.el6.x86_64

whereas the 4th one that didn't crash was upgraded from:


Dec 13 14:10:06 Updated: libvirt-client-0.9.4-23.el6.1.x86_64
Dec 13 14:10:06 Updated: libvirt-python-0.9.4-23.el6.1.x86_64
Dec 13 14:10:06 Updated: libvirt-0.9.4-23.el6.1.x86_64

to:

Jul 10 04:05:04 Updated: libvirt-client-0.9.10-21.el6.x86_64
Jul 10 04:05:26 Updated: libvirt-0.9.10-21.el6.x86_64
Jul 10 04:05:50 Updated: libvirt-python-0.9.10-21.el6.x86_64
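
The entries above appear to be in the format yum writes to its log. A minimal sketch for pulling the same history on each host so the upgrade paths can be compared; the function name `libvirt_update_history` is made up, and the default path `/var/log/yum.log` is an assumption about where the log lives.

```shell
# Hypothetical helper: list libvirt package transactions from a yum log so
# the upgrade paths of several hosts can be compared side by side.
libvirt_update_history() {
    local log="${1:-/var/log/yum.log}"   # assumed default log location
    # Matches lines such as
    # "Jul 10 03:52:50 Updated: libvirt-0.9.10-21.el6.x86_64"
    grep -E '(Installed|Updated): libvirt' "$log"
}
```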
Comment 10 yanbing du 2012-08-08 04:26:54 EDT
I tested this bug on a RHEL 6.1 release OS.
1). yum update libvirt 
from:
libvirt-0.8.7-18.el6.x86_64.rpm
libvirt-python-0.8.7-18.el6.x86_64.rpm
libvirt-client-0.8.7-18.el6.x86_64.rpm
to:
libvirt-0.9.10-21.el6.x86_64
libvirt-python-0.9.10-21.el6.x86_64
libvirt-client-0.9.10-21.el6.x86_64

2). manually update libvirt 
from: 
libvirt-0.8.7-18.el6.x86_64.rpm
to:
libvirt-0.8.7-18.el6_1.1.x86_64
then yum update to:
libvirt-0.9.10-21.el6.x86_64

Neither scenario reproduces the problem: after the update libvirtd is still running, and restarting the service works fine.
Comment 11 Martin Kletzander 2012-08-08 15:18:34 EDT
I installed 6.1, created around 6 guests, put each of them into a different state (some with libvirtd running, some with libvirtd stopped), and then upgraded directly to 6.3. Unfortunately, I wasn't able to reproduce this bug. This leads me to the conclusion that it might have something to do with the system being SL. I have to close it as WORKSFORME because we really are unable to reproduce it, but you can of course reopen it if you have new information and it really looks like our bug. Alternatively, you might want to try your luck with the SL people.

I can at least offer you a workaround. There should be some files in /var/run/libvirt/qemu/ for the guests that libvirt thinks are running. If those guests are not running, you can move/rename these files and try starting the service again. If they are running, you can try the same thing and afterwards attach libvirt to the running guests (if you want to be able to migrate them, for example). There is a 'virsh qemu-attach' command for that, but even that has some caveats.
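
A minimal sketch of the "move the files of guests that are not running" part of this workaround. It assumes that each state file is named `<guest>.xml` and that a live guest can be recognised by a qemu-kvm process with `-name <guest>` on its command line. The function names `guest_is_running` and `move_stale_state` and the backup directory are hypothetical.

```shell
# Return 0 if some process command line looks like "qemu-kvm ... -name GUEST".
# Assumption: guests are started with "-name GUEST" as a separate argument.
guest_is_running() {
    local guest="$1" f args
    for f in /proc/[0-9]*/cmdline; do
        # cmdline is NUL-separated; turn the separators into spaces.
        # Processes that vanish while we scan are skipped.
        args=$( { tr '\0' ' ' < "$f"; } 2>/dev/null ) || continue
        case $args in
            *qemu-kvm*" -name $guest "*) return 0 ;;
        esac
    done
    return 1
}

# Move the state files of guests that are NOT running into a backup directory
# (nothing is deleted, so the move can be undone by hand).
move_stale_state() {
    local state_dir="${1:-/var/run/libvirt/qemu}"
    local backup_dir="${2:-/root/libvirt-stale-state}"   # hypothetical location
    local xml guest
    mkdir -p "$backup_dir" || return 1
    for xml in "$state_dir"/*.xml; do
        [ -e "$xml" ] || continue          # glob matched nothing
        guest=$(basename "$xml" .xml)
        if guest_is_running "$guest"; then
            echo "keeping $xml (a qemu-kvm process for '$guest' is running)"
        else
            echo "moving $xml to $backup_dir"
            mv -- "$xml" "$backup_dir"/
        fi
    done
}
```

Called without arguments, `move_stale_state` would act on /var/run/libvirt/qemu and needs root; the service can then be started again. Files of guests that are still running are deliberately left alone, since handling those is the part that involves qemu-attach.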
Comment 12 Charles R. Anderson 2012-08-08 17:34:13 EDT
Ok.  For the record and anyone else who reads this, I had 5 guests on this one host, and it was running since about April 9 or so on an older kernel 2.6.32-220.7.1.el6.  I just rebooted the entire host, and it works fine now.  I still have 2 other hosts that are broken, I'll try the 'virsh qemu-attach' method first before rebooting those.  Thanks.

#uptime
 17:10:27 up 121 days,  4:27,  3 users,  load average: 0.05, 0.04, 0.15

# last reboot
reboot   system boot  2.6.32-279.1.1.e Wed Aug  8 17:21 - 17:28  (00:06)    
reboot   system boot  2.6.32-220.7.1.e Mon Apr  9 12:43 - 17:18 (121+04:35) 
reboot   system boot  2.6.32-220.2.1.e Mon Apr  9 12:39 - 12:39  (00:00)    

>ps auxw|grep qemu-kvm

qemu     10076  8.3 49.2 8728328 6006440 ?     Sl     Apr 09 10-02:26:53 /usr/libexec/qemu-kvm -S -M rhel6.1.0 -enable-kvm -m 8192 -smp 2,sockets=2,cores=1,threads=1 -name mock -uuid 54a46c13-666c-a4e0-e094-8a68c48aca0d -nographic -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/mock.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -drive file=/dev/vmhost1/mock-root,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/dev/vmhost1/mock-swap,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1 -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=24 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:16:3e:59:c7:21,bus=pci.0,addr=0x2 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -device usb-tablet,id=input0 -incoming fd:20 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

qemu     10251 12.0 16.8 2459812 2060292 ?     Sl     Jun 19 6-01:28:09 /usr/libexec/qemu-kvm -S -M rhel6.2.0 -enable-kvm -m 2048 -smp 2,sockets=2,cores=1,threads=1 -name wireless -uuid dca45b57-1779-bb32-fd47-b72dd5feb83f -nographic -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/wireless.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -drive file=/dev/vmhost1/wireless-root,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/dev/vmhost1/wireless-swap,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1 -netdev tap,fd=41,id=hostnet0,vhost=on,vhostfd=45 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:16:3e:59:06:00,bus=pci.0,addr=0x2 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

qemu     12671  0.5  7.3 1264680 891124 ?      Sl     Apr 09 14:44:23 /usr/libexec/qemu-kvm -S -M rhel6.2.0 -enable-kvm -m 1024 -smp 1,sockets=1,cores=1,threads=1 -name sl6test1 -uuid f58e1e9c-47c2-72b1-23fd-d4a0d4d85cd2 -nographic -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/sl6test1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -drive file=/dev/vmhost1/sl6test1-root,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/dev/vmhost1/sl6test1-swap,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1 -netdev tap,fd=20,id=hostnet0,vhost=on,vhostfd=26 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:16:3e:59:82:00,bus=pci.0,addr=0x2 -netdev tap,fd=27,id=hostnet1,vhost=on,vhostfd=28 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:16:3e:59:82:01,bus=pci.0,addr=0x6 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -device usb-tablet,id=input0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

qemu     12770  0.4  6.8 1274924 829760 ?      Sl     Apr 09 13:46:38 /usr/libexec/qemu-kvm -S -M rhel6.2.0 -enable-kvm -m 1024 -smp 1,sockets=1,cores=1,threads=1 -name sl6test2 -uuid 8eb71df2-e38f-31d8-7bf9-f26ec45f5a16 -nographic -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/sl6test2.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -drive file=/dev/vmhost1/sl6test2-root,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/dev/vmhost1/sl6test2-swap,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1 -netdev tap,fd=20,id=hostnet0,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:16:3e:59:83:00,bus=pci.0,addr=0x2 -netdev tap,fd=28,id=hostnet1,vhost=on,vhostfd=29 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:16:3e:59:83:01,bus=pci.0,addr=0x6 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -device usb-tablet,id=input0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

qemu     12869  0.3  5.9 1264680 721172 ?      Sl     Apr 09 09:30:09 /usr/libexec/qemu-kvm -S -M rhel6.2.0 -enable-kvm -m 1024 -smp 1,sockets=1,cores=1,threads=1 -name sl6test3 -uuid 559e2fa4-ac72-ae95-44a6-d9204246fd44 -nographic -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/sl6test3.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -drive file=/dev/vmhost1/sl6test3-root,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/dev/vmhost1/sl6test3-swap,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1 -netdev tap,fd=20,id=hostnet0,vhost=on,vhostfd=28 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:16:3e:59:84:00,bus=pci.0,addr=0x2 -netdev tap,fd=29,id=hostnet1,vhost=on,vhostfd=30 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:16:3e:59:84:01,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -device usb-tablet,id=input0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
Comment 13 Charles R. Anderson 2012-08-08 17:50:44 EDT
On vmhost2:

# cd /var/run/libvirt/qemu
# ls -la
total 16
drwx------. 2 root root 4096 Aug  8 17:42 .
drwxr-xr-x. 5 root root 4096 Jul 26 17:44 ..
-rw-------. 1 root root 4612 Jul 26 17:44 vir.xml
# mv vir.xml  /root

# ls -l /etc/libvirt/qemu/vir.xml /root/vir.xml
-rw-------. 1 root root 2995 Mar  7 17:13 /etc/libvirt/qemu/vir.xml
-rw-------. 1 root root 4612 Jul 26 17:44 /root/vir.xml

# service libvirtd restart
Stopping libvirtd daemon:                                  [FAILED]
Starting libvirtd daemon:                                  [  OK  ]
# service libvirtd status
libvirtd (pid  13960) is running...
# service libvirtd status
libvirtd (pid  13960) is running...
# service libvirtd status
libvirtd (pid  13960) is running...

# virsh qemu-attach 13813
error: internal error missing index/unit/bus parameter in drive 'file=/dev/vmhost2/vir-root,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native'
Comment 14 yanbing du 2012-08-08 22:32:23 EDT
(In reply to comment #13)
Unfortunately, there's already a bug about qemu-attach. See bug 844845.
Comment 15 Martin Kletzander 2012-08-09 02:17:18 EDT
(In reply to comment #12)
You don't have to reboot those. It ought to be enough to stop the guests; libvirt should start after that, and you can then start the guests again. I suggested qemu-attach so that you wouldn't have to shut down the guests (I don't know how big a problem that is in your scenario), but if you can turn them off for a while, that would definitely be better and easier.
