Bug 915653 - libvirt: Crashes after VM is shutdown
Summary: libvirt: Crashes after VM is shutdown
Keywords:
Status: CLOSED DUPLICATE of bug 892079
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Libvirt Maintainers
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-02-26 09:33 UTC by Jakub Libosvar
Modified: 2013-02-26 13:33 UTC
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-02-26 13:33:15 UTC
Target Upstream Version:
Embargoed:


Attachments
logs (362.20 KB, application/x-gzip)
2013-02-26 09:33 UTC, Jakub Libosvar
no flags
libvirt log & backtrace (128.38 KB, application/x-gzip)
2013-02-26 12:10 UTC, Jakub Libosvar
no flags

Description Jakub Libosvar 2013-02-26 09:33:41 UTC
Created attachment 702717 [details]
logs

Description of problem:
In a RHEV-M environment, when a VM is shut down, libvirtd crashes and is not restarted by upstart, which leads to vdsm failures. Vdsm restarts itself in a loop, trying to connect to the libvirt socket, and does not communicate with rhevm, so the host becomes non-responsive. This happens when using NFS storage.

RHEVM calls shutdown VM:
2013-02-25 22:18:00,125 INFO  [org.ovirt.engine.core.bll.StopVmCommand] (ajp-/127.0.0.1:8702-8) [752506c0] Running command: StopVmCommand internal: false. Entities affected :  ID: 4a9044d5-31b7-4aa0-89b9-5f5175a20427 Type: VM
2013-02-25 22:18:00,130 INFO  [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (ajp-/127.0.0.1:8702-8) [752506c0] START, DestroyVmVDSCommand(HostName = 10.35.160.35, HostId = cb8f27ce-7f82-11e2-a67e-001a4a16977b, vmId=4a9044d5-31b7-4aa0-89b9-5f5175a20427, force=false, secondsToWait=0, gracefully=false), log id: 35f15c8a
2013-02-25 22:18:00,141 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ajp-/127.0.0.1:8702-8) [752506c0] START, DestroyVDSCommand(HostName = 10.35.160.35, HostId = cb8f27ce-7f82-11e2-a67e-001a4a16977b, vmId=4a9044d5-31b7-4aa0-89b9-5f5175a20427, force=false, secondsToWait=0, gracefully=false), log id: 52332986

VDSM calls shutdown VM:
Thread-4092::DEBUG::2013-02-25 22:18:00,164::BindingXMLRPC::903::vds::(wrapper) client [10.35.70.47]::call vmDestroy with ('4a9044d5-31b7-4aa0-89b9-5f5175a20427',) {} flowID [752506c0]
Thread-4092::INFO::2013-02-25 22:18:00,165::API::308::vds::(destroy) vmContainerLock acquired by vm 4a9044d5-31b7-4aa0-89b9-5f5175a20427
Thread-4092::DEBUG::2013-02-25 22:18:00,165::libvirtvm::2423::vm.Vm::(destroy) vmId=`4a9044d5-31b7-4aa0-89b9-5f5175a20427`::destroy Called
Thread-4092::INFO::2013-02-25 22:18:00,165::libvirtvm::2372::vm.Vm::(releaseVm) vmId=`4a9044d5-31b7-4aa0-89b9-5f5175a20427`::Release VM resources
Thread-4092::DEBUG::2013-02-25 22:18:00,184::utils::353::vm.Vm::(stop) vmId=`4a9044d5-31b7-4aa0-89b9-5f5175a20427`::Stop statistics collection
Thread-4092::DEBUG::2013-02-25 22:18:00,184::vmChannels::152::vds::(unregister) Delete fileno 337 from listener.
Thread-4092::WARNING::2013-02-25 22:18:15,253::libvirtvm::2389::vm.Vm::(releaseVm) vmId=`4a9044d5-31b7-4aa0-89b9-5f5175a20427`::Failed to destroy VM '4a9044d5-31b7-4aa0-89b9-5f5175a20427' gracefully
Thread-4092::ERROR::2013-02-25 22:18:55,876::libvirtconnection::93::vds::(wrapper) connection to libvirt broken. taking vdsm down.
Thread-4092::INFO::2013-02-25 22:18:56,003::vmChannels::135::vds::(stop) VM channels listener was stopped.

qemu vm log:
LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name nfsTemplVm -S -M rhel6.3.0 -cpu Westmere -enable-kvm -m 1024 -smp 1,sockets=1,cores=1,threads=1 -uuid 4a9044d5-31b7-4aa0-89b9-5f5175a20427 -smbios type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=6Server-6.4.0.4.el6,serial=21886808-411e-4ba2-bc91-6ac013caa34e,uuid=4a9044d5-31b7-4aa0-89b9-5f5175a20427 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/nfsTemplVm.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2013-02-25T20:17:48,driftfix=slew -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive file=/rhev/data-center/e3890cba-246f-45cf-ad3b-d66c6ed1575a/baa66c74-7adb-48fe-8505-a3f7fa26b6d5/images/406af0d4-837a-4e9c-b984-15bc71c09857/a033b52d-d1ba-4be2-8bba-e9e7c33a1f4a,if=none,id=drive-ide0-0-0,format=qcow2,serial=406af0d4-837a-4e9c-b984-15bc71c09857,cache=none,werror=stop,rerror=stop,aio=threads -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,serial= -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:16:91:bf,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/nfsTemplVm.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/nfsTemplVm.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device 
virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -spice port=5900,tls-port=5901,addr=0,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on -k en-us -vga qxl -global qxl-vga.vram_size=67108864 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
qemu: terminating on signal 15 from pid 2627
block I/O error in device 'drive-ide0-0-0': No medium found (123)
2013-02-25 20:18:55.515+0000: shutting down

qemu complains that 'drive-ide0-0-0' has no medium, but according to the vdsm logs the image a033b52d-d1ba-4be2-8bba-e9e7c33a1f4a exists on the mounted filesystem.

Vdsm then restarts itself in a loop:
MainThread::INFO::2013-02-25 22:18:56,341::vdsm::88::vds::(run) I am the actual vdsm 4.10-1.6 puma13.scl.lab.tlv.redhat.com (2.6.32-358.el6.x86_64)
MainThread::ERROR::2013-02-25 22:19:06,696::vdsm::91::vds::(run) Exception raised
Traceback (most recent call last):
  File "/usr/share/vdsm/vdsm", line 89, in run
    serve_clients(log)
  File "/usr/share/vdsm/vdsm", line 56, in serve_clients
    cif = clientIF.getInstance(log)
  File "/usr/share/vdsm/clientIF.py", line 126, in getInstance
    cls._instance = clientIF(log)
  File "/usr/share/vdsm/clientIF.py", line 74, in __init__
    self._libvirt = libvirtconnection.get()
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 116, in get
    conn = utils.retry(libvirtOpenAuth, timeout=10, sleep=0.2)
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 934, in retry
    return func()
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 102, in openAuth
    if ret is None:raise libvirtError('virConnectOpenAuth() failed')
libvirtError: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused
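The vdsm frames in the traceback show `utils.retry(libvirtOpenAuth, timeout=10, sleep=0.2)` polling for the libvirt socket until the timeout expires. A minimal sketch of that retry pattern (an illustration of the behavior, not vdsm's actual implementation; the signature is assumed from the traceback):

```python
import time


def retry(func, timeout=10, sleep=0.2):
    """Call func repeatedly until it succeeds or `timeout` seconds elapse.

    Once the deadline passes, the last exception is re-raised -- which is
    how the libvirtError above ends up propagating out of vdsm's run().
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return func()
        except Exception:
            if time.monotonic() >= deadline:
                raise  # give up: surface the last failure to the caller
            time.sleep(sleep)
```

With libvirtd down, every attempt raises "Connection refused", so after ten seconds the retry loop gives up, vdsm exits, and the supervisor starts it again, producing the restart loop seen above.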



Version-Release number of selected component (if applicable):
vdsm-4.10.2-1.6.el6.x86_64
libvirt-0.10.2-18.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.348.el6.x86_64

How reproducible:
Sometimes


Steps to Reproduce:
1. Run VM via RHEV-M
2. Stop VM via RHEV-M
  
Actual results:
Libvirt crashes and vdsm keeps restarting itself

Expected results:
Libvirt does not crash and vdsm keeps running

Additional info:
The problem occurred at 22:18:55 on 25 Feb 2013

I'd suggest a z-stream fix - rhel-6.4.z?

Fails in Jenkins Jobs: http://jenkins.qa.lab.tlv.redhat.com:8080/view/Storage/view/3.1/job/3.1-export_import-nfs/289/consoleFull

Libvirt, engine, vdsm, and qemu logs are attached.

I see this in the libvirt log but I don't understand it:

2013-02-25 22:28:48.359+0000: 2632: info : libvirt version: 0.10.2, package: 18.el6 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2013-01-28-15:42:10, x86-003.build.bos.redhat.com)
2013-02-25 22:28:48.359+0000: 2632: debug : main:1020 : Decided on pid file path '/var/run/libvirtd.pid'
2013-02-25 22:28:48.377+0000: 2632: debug : main:1030 : Decided on socket paths '/var/run/libvirt/libvirt-sock' and '/var/run/libvirt/libvirt-sock-ro'

Comment 3 Jiri Denemark 2013-02-26 10:23:28 UTC
Since you're saying libvirtd crashed, can you get a backtrace of all libvirtd threads at the time of the crash? Also, there's no sign of any crash in the libvirtd.log.1 you attached. Moreover, the log looks strange: it contains mostly virObjectRef and virObjectUnref messages, and there's nothing else at the time vdsm complains about the broken connection (2013-02-25 22:18:55). Are you sure you provided the right logs? The timestamps in libvirtd.log are in UTC, btw.

Comment 4 Jakub Libosvar 2013-02-26 12:10:12 UTC
Created attachment 702825 [details]
libvirt log & backtrace

I forgot that the libvirt logs are in UTC; after syncing the timestamps, it looks more interesting:

2013-02-25 20:18:55.848+0000: 2635: debug : qemuDomainObjBeginJobInternal:808 : Starting job: destroy (async=none)
2013-02-25 20:18:55.848+0000: 2635: debug : qemuDomainDestroyFlags:2076 : Requested operation is not valid: domain is not running
2013-02-25 20:18:55.848+0000: 2635: debug : qemuDomainObjEndJob:937 : Stopping job: destroy (async=none)
2013-02-25 20:18:55.848+0000: 2635: debug : virObjectUnref:135 : OBJECT_UNREF: obj=0x7f4264011600
2013-02-25 20:18:55.848+0000: 2635: debug : virObjectUnref:137 : OBJECT_DISPOSE: obj=0x7f4264011600
Caught Segmentation violation dumping internal log buffer:


Backtrace:Thread 1 (Thread 0x7f42974b6700 (LWP 2635)):
#0  virDomainDefFree (def=0x7f426400f760) at conf/domain_conf.c:1634
#1  0x00000031b50879ee in virDomainObjDispose (obj=0x7f4264011600) at conf/domain_conf.c:1753
#2  0x00000031b506adbb in virObjectUnref (anyobj=<value optimized out>) at util/virobject.c:139
#3  0x00000000004678b7 in qemuDomainDestroyFlags (dom=<value optimized out>, flags=<value optimized out>) at qemu/qemu_driver.c:2095
#4  0x00000031b50f5390 in virDomainDestroy (domain=0x7f428c011eb0) at libvirt.c:2190
#5  0x000000000043f892 in remoteDispatchDomainDestroy (server=<value optimized out>, client=<value optimized out>, msg=<value optimized out>, rerr=0x7f42974b5b80, 
    args=<value optimized out>, ret=<value optimized out>) at remote_dispatch.h:1277
#6  remoteDispatchDomainDestroyHelper (server=<value optimized out>, client=<value optimized out>, msg=<value optimized out>, rerr=0x7f42974b5b80, args=<value optimized out>, 
    ret=<value optimized out>) at remote_dispatch.h:1255
#7  0x00000031b513f162 in virNetServerProgramDispatchCall (prog=0x1260600, server=0x1254ff0, client=0x1262810, msg=0x1260b50) at rpc/virnetserverprogram.c:431
#8  virNetServerProgramDispatch (prog=0x1260600, server=0x1254ff0, client=0x1262810, msg=0x1260b50) at rpc/virnetserverprogram.c:304
#9  0x00000031b513fdfe in virNetServerProcessMsg (srv=<value optimized out>, client=0x1262810, prog=<value optimized out>, msg=0x1260b50) at rpc/virnetserver.c:170
#10 0x00000031b514049c in virNetServerHandleJob (jobOpaque=<value optimized out>, opaque=<value optimized out>) at rpc/virnetserver.c:191
#11 0x00000031b5062c4c in virThreadPoolWorker (opaque=<value optimized out>) at util/threadpool.c:144
#12 0x00000031b5062539 in virThreadHelper (data=<value optimized out>) at util/threads-pthread.c:161
#13 0x00000037aa407851 in start_thread () from /lib64/libpthread.so.0
#14 0x00000037aa0e890d in clone () from /lib64/libc.so.6
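The segfault happens inside virDomainDefFree, reached from virObjectUnref's dispose path after qemuDomainDestroyFlags drops a reference. A crash in a dispose handler typically points to an unbalanced ref/unref freeing an object that is still in use. A toy Python illustration of that reference-counting failure mode (this is not libvirt's code, just a sketch of the mechanism):

```python
class RefCounted:
    """Toy reference-counted object: dispose() runs when the count hits zero."""

    def __init__(self, payload):
        self.refs = 1          # creator holds the initial reference
        self.payload = payload

    def ref(self):
        self.refs += 1

    def unref(self):
        # One unref too many drives the count to zero while another caller
        # still expects the object to be alive, so dispose() frees the
        # payload out from under them -- in C this is a use-after-free
        # and segfaults; here the payload merely becomes None.
        self.refs -= 1
        if self.refs == 0:
            self.dispose()

    def dispose(self):
        self.payload = None    # "free" the domain definition
```

In the C case, any later access to the freed domain definition (or a second unref of the already-disposed object) is exactly the kind of invalid access that produces the segmentation violation in the backtrace.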


You can find more details in the attachment.

Comment 5 Jiri Denemark 2013-02-26 13:33:15 UTC

*** This bug has been marked as a duplicate of bug 892079 ***

