Bug 604585 - [vdsm] [libvirt intg]Libvirtd dies and after a while sending an abort signal
Summary: [vdsm] [libvirt intg]Libvirtd dies and after a while sending an abort signal
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt   
(Show other bugs)
Version: 6.0
Hardware: All
OS: Linux
high
urgent
Target Milestone: rc
: ---
Assignee: Daniel Berrange
QA Contact: Virtualization Bugs
URL:
Whiteboard: vdsm & libvirt integration
Keywords:
Depends On:
Blocks: 581275 599016 Rhel6.0LibvirtTier2
TreeView+ depends on / blocked
 
Reported: 2010-06-16 10:01 UTC by Moran Goldboim
Modified: 2010-11-11 14:47 UTC (History)
6 users (show)

Fixed In Version: libvirt-0_8_1-14_el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-11-11 14:47:48 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
core dump (2.05 MB, application/octet-stream)
2010-06-16 10:01 UTC, Moran Goldboim
no flags Details
gdb backtrace (65.94 KB, text/plain)
2010-06-16 10:02 UTC, Moran Goldboim
no flags Details
libvirt log (46.96 KB, application/gzip)
2010-06-22 09:14 UTC, Moran Goldboim
no flags Details
backtrace (614.95 KB, application/octet-stream)
2010-06-29 12:18 UTC, Moran Goldboim
no flags Details
backtrace full (49.97 KB, text/plain)
2010-07-11 11:50 UTC, Moran Goldboim
no flags Details

Description Moran Goldboim 2010-06-16 10:01:39 UTC
Created attachment 424390 [details]
core dump

Description of problem:
Libvirtd dies and after a while sending an abort event, after getting: 04:37:03.691: debug : virEventInterruptLocked:668 : Interrupting
[root@gold-vdsa ~]# service libvirtd status
libvirtd dead but pid file exists
it kills vdsm 24 minutes after.

Version-Release number of selected component (if applicable):
libvirt-0.8.1-8.el6.x86_64
vdsm-4.9-8.el6.x86_64

How reproducible:
happened once

Steps to Reproduce:
1.happened during Rhevm vm sanity automation tests suite but i don't see a correlation 
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Moran Goldboim 2010-06-16 10:02:32 UTC
Created attachment 424391 [details]
gdb backtrace

Comment 3 RHEL Product and Program Management 2010-06-16 10:13:17 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 4 Daniel Berrange 2010-06-16 11:44:44 UTC
The stack trace & core dumps attached here are both from the VDSM python process, not libvirtd. 

Can you provide a stack trace from libvirtd itself. Attach to libvirtd with GDB while it is still running normally, and then do whatever you need to make it crash, and capture a dump with 'thread apply all backtrace'.  NB, make sure the libvirt-debuginfo is installed before running GDB.

Can you also edit libvirtd.conf and set

  log_filters="1:qemu 1:util 1:security"
  log_outputs="1:file:/var/log/libvirt/libvirtd.log"

and attach the libvirtd.log file after the crash occurs.

Finally,can you elaborate on what you mean by "libvirtd sending an abort event" ?

Comment 5 Moran Goldboim 2010-06-16 12:38:07 UTC
I meant Abort signal - will try to reproduce it again

Comment 6 Dan Kenigsberg 2010-06-21 07:54:40 UTC
Here's the tail of libvird.log after vdsm died (with abort signal while in libvirt_virEventRegisterImpl) and libvirtd died (I don't know why):

09:16:27.461: debug : qemuMonitorJSONIOProcessEvent:99 : handle BLOCK_IO_ERROR handler=0x478140 data=0x7fdce5c1a9d0
09:16:27.461: warning : qemuMonitorJSONHandleIOError:597 : missing reason in disk io error event
09:16:27.461: debug : qemuMonitorEmitIOError:849 : mon=0x7fdd0c0109f0
09:16:27.461: debug : qemuMonitorJSONIOProcess:188 : Total used 204 bytes out of 204 available in buffer
09:16:27.462: debug : qemuMonitorJSONIOProcessLine:115 : Line [{"timestamp": {"seconds": 1277100987, "microseconds": 462251}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-ide0-0-0", "__com.redhat_reason": "eother", "operation": "write", "action": "report"}}]
09:16:27.462: debug : virJSONValueFromString:901 : string={"timestamp": {"seconds": 1277100987, "microseconds": 462251}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-ide0-0-0", "__com.redhat_reason": "eother", "operation": "write", "action": "report"}}
09:16:27.462: debug : virJSONParserHandleStartMap:783 : parser=0x7fdd1a159660
09:16:27.462: debug : virJSONParserHandleMapKey:764 : parser=0x7fdd1a159660 key=0x7fdce5b63062
09:16:27.462: debug : virJSONParserHandleStartMap:783 : parser=0x7fdd1a159660
09:16:27.462: debug : virJSONParserHandleMapKey:764 : parser=0x7fdd1a159660 key=0x7fdce5b63070
09:16:27.462: debug : virJSONParserHandleNumber:723 : parser=0x7fdd1a159660 str=(null)
09:16:27.462: debug : virJSONParserHandleMapKey:764 : parser=0x7fdd1a159660 key=0x7fdce5b63087
09:16:27.462: debug : virJSONParserHandleNumber:723 : parser=0x7fdd1a159660 str=(null)
09:16:27.462: debug : virJSONParserHandleEndMap:810 : parser=0x7fdd1a159660
09:16:27.462: debug : virJSONParserHandleMapKey:764 : parser=0x7fdd1a159660 key=0x7fdce5b630a0
09:16:27.462: debug : virJSONParserHandleString:744 : parser=0x7fdd1a159660 str=0x7fdce5b630a9
09:16:27.462: debug : virJSONParserHandleMapKey:764 : parser=0x7fdd1a159660 key=0x7fdce5b630bb
09:16:27.462: debug : virJSONParserHandleStartMap:783 : parser=0x7fdd1a159660
09:16:27.462: debug : virJSONParserHandleMapKey:764 : parser=0x7fdd1a159660 key=0x7fdce5b630c4
09:16:27.462: debug : virJSONParserHandleString:744 : parser=0x7fdd1a159660 str=0x7fdce5b630ce
09:16:27.462: debug : virJSONParserHandleMapKey:764 : parser=0x7fdd1a159660 key=0x7fdce5b630e0
09:16:27.462: debug : virJSONParserHandleString:744 : parser=0x7fdd1a159660 str=0x7fdce5b630f7
09:16:27.462: debug : virJSONParserHandleMapKey:764 : parser=0x7fdd1a159660 key=0x7fdce5b63101
09:16:27.462: debug : virJSONParserHandleString:744 : parser=0x7fdd1a159660 str=0x7fdce5b6310e
09:16:27.462: debug : virJSONParserHandleMapKey:764 : parser=0x7fdd1a159660 key=0x7fdce5b63117
09:16:27.462: debug : virJSONParserHandleString:744 : parser=0x7fdd1a159660 str=0x7fdce5b63121
09:16:27.462: debug : virJSONParserHandleEndMap:810 : parser=0x7fdd1a159660
09:16:27.462: debug : virJSONParserHandleEndMap:810 : parser=0x7fdd1a159660
09:16:27.462: debug : virJSONValueFromString:931 : result=0x7fdce5c1b350
09:16:27.462: debug : qemuMonitorJSONIOProcessEvent:86 : mon=0x7fdd040d0970 obj=0x7fdce5c1b350
09:16:27.462: debug : qemuMonitorJSONIOProcessEvent:99 : handle BLOCK_IO_ERROR handler=0x478140 data=0x7fdce5c1a1a0
09:16:27.462: warning : qemuMonitorJSONHandleIOError:597 : missing reason in disk io error event
09:16:27.462: debug : qemuMonitorEmitIOError:849 : mon=0x7fdd040d0970
09:16:27.462: debug : qemuMonitorJSONIOProcess:188 : Total used 204 bytes out of 204 available in buffer
09:16:27.463: debug : qemuMonitorIO:572 : Triggering EOF callback error? 1
09:16:27.463: debug : qemuHandleMonitorEOF:1106 : Received EOF on 0x7fdcfc021000 'VmStrApiTestVm'
09:16:27.463: debug : qemudShutdownVMDaemon:4120 : Shutting down VM 'VmStrApiTestVm' migrated=0
09:16:27.466: debug : qemuMonitorJSONCommandWithFd:222 : Receive command reply ret=-1 errno=104 0 bytes '(null)'
09:16:27.466: error : qemuMonitorJSONCommandWithFd:242 : cannot send monitor command '{"execute":"query-blockstats"}': Connection reset by peer
09:16:27.466: debug : qemuMonitorClose:682 : mon=0x7fdcfc024ea0

Comment 7 Moran Goldboim 2010-06-22 09:14:36 UTC
Created attachment 425872 [details]
libvirt log

Comment 8 Moran Goldboim 2010-06-22 09:15:14 UTC
In another instance of VDSM getting an abort signal- libvirt didn't died. happened right after a create domain was started on VDSM.
No indication of it had been seen in libvirtd log (on 16:28:54,520 with- log_filters="1:qemu 1:util 1:security"), only after vdsm was up again and several other create had happened afterward libvirt generated the following errors (log is attached):
16:36:40.763: error : qemudDomainBlockStats:9596 : Domain not found: no domain with matching uuid 'e41ced35-4019-4784-a97d-f29f1f4cbf06'
16:37:31.427: error : qemuMonitorOpenUnix:277 : monitor socket did not show up.: Connection refused
16:37:31.427: error : qemuConnectMonitor:1584 : Failed to connect monitor for VmUpApiTestVm
16:37:31.427: error : qemudWaitForMonitor:2515 : internal error process exited while connecting to monitor: do_spice_init: starting 0.4.2-10..el6

Comment 9 Daniel Berrange 2010-06-22 14:25:17 UTC
The log from comment #7 doesn't show any problem with libvirt dieing.

All that's seen is that a QEMU guest is failing to startup due to a SPICE error

16:37:28.421: debug : qemudStartVMDaemon:4017 : Waiting for monitor to show up
16:37:28.421: debug : qemudWaitForMonitor:2479 : Connect monitor to 0x7f01a000eb00 'VmUpApiTestVm'
16:37:31.427: error : qemuMonitorOpenUnix:277 : monitor socket did not show up.: Connection refused
16:37:31.427: debug : qemuMonitorClose:682 : mon=0x7f01a0003010
16:37:31.427: debug : qemuMonitorFree:200 : mon=0x7f01a0003010
16:37:31.427: error : qemuConnectMonitor:1584 : Failed to connect monitor for VmUpApiTestVm
16:37:31.427: error : qemudWaitForMonitor:2515 : internal error process exited while connecting to monitor: do_spice_init: starting 0.4.2-10..el6
do_spice_init: statistics shm_open failed, Permission denied

Comment 10 Daniel Berrange 2010-06-22 14:31:20 UTC
Were there any SELinux AVC errors after the shm_open() call failure ?  It is possible there might need to be policy additions if SPICE is using SHM

Comment 11 Moran Goldboim 2010-06-29 12:18:49 UTC
Created attachment 427661 [details]
backtrace

bug was recreated with selinux enabled - backtrace is attached

Comment 12 Daniel Berrange 2010-06-29 12:47:53 UTC
That was a core dump, rather than a stack trace. Unfortunately this is not any help because I can't reliably extract data from the coredump without having the exact same set of RPM versions of everything libvirt links to, as you do. Can you capture a stack trace by running this command in GDB:

 'thread apply all backtrace'

Comment 13 Moran Goldboim 2010-07-11 11:50:35 UTC
Created attachment 430987 [details]
backtrace full

added a full backtrace on all threads

Comment 14 Daniel Berrange 2010-07-12 13:32:27 UTC
Posted patch upstream

https://www.redhat.com/archives/libvir-list/2010-July/msg00214.html

Comment 16 Dave Allan 2010-07-14 06:16:06 UTC
libvirt-0_8_1-14_el6 has been built in RHEL-6-candidate with the fix.

Dave

Comment 18 Moran Goldboim 2010-08-22 10:06:23 UTC
Verified on: libvirt-0.8.1-24.el6.x86_64

Comment 19 releng-rhel@redhat.com 2010-11-11 14:47:48 UTC
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.


Note You need to log in before you can comment on or make changes to this bug.