Bug 604585

Summary: [vdsm] [libvirt intg] libvirtd dies and after a while sends an abort signal
Product: Red Hat Enterprise Linux 6
Reporter: Moran Goldboim <mgoldboi>
Component: libvirt
Assignee: Daniel Berrangé <berrange>
Status: CLOSED CURRENTRELEASE
QA Contact: Virtualization Bugs <virt-bugs>
Severity: urgent
Priority: high
Version: 6.0
CC: berrange, dallan, danken, hbrock, mjenner, xen-maint
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Whiteboard: vdsm & libvirt integration
Fixed In Version: libvirt-0_8_1-14_el6
Doc Type: Bug Fix
Last Closed: 2010-11-11 14:47:48 UTC
Bug Blocks: 581275, 599016, 609432
Attachments:
  core dump (flags: none)
  gdb backtrace (flags: none)
  libvirt log (flags: none)
  backtrace (flags: none)
  backtrace full (flags: none)

Description Moran Goldboim 2010-06-16 10:01:39 UTC
Created attachment 424390 [details]
core dump

Description of problem:
libvirtd dies and after a while sends an abort event, after logging: 04:37:03.691: debug : virEventInterruptLocked:668 : Interrupting
[root@gold-vdsa ~]# service libvirtd status
libvirtd dead but pid file exists
vdsm is killed 24 minutes later.

Version-Release number of selected component (if applicable):
libvirt-0.8.1-8.el6.x86_64
vdsm-4.9-8.el6.x86_64

How reproducible:
happened once

Steps to Reproduce:
1. Happened during the RHEV-M VM sanity automation test suite, but I don't see a correlation.

Comment 1 Moran Goldboim 2010-06-16 10:02:32 UTC
Created attachment 424391 [details]
gdb backtrace

Comment 3 RHEL Program Management 2010-06-16 10:13:17 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 4 Daniel Berrangé 2010-06-16 11:44:44 UTC
The stack trace & core dumps attached here are both from the VDSM python process, not libvirtd. 

Can you provide a stack trace from libvirtd itself? Attach to libvirtd with GDB while it is still running normally, then do whatever is needed to make it crash, and capture a dump with 'thread apply all backtrace'. NB: make sure libvirt-debuginfo is installed before running GDB.
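
For reference, a minimal sketch of that capture procedure (the debuginfo-install step and the pidof lookup are illustrative assumptions, not part of the original request):

  # install debug symbols so the trace has usable frames
  debuginfo-install libvirt

  # attach to the running daemon and let it run on
  gdb -p $(pidof libvirtd)
  (gdb) continue
  ... reproduce the crash, then ...
  (gdb) thread apply all backtrace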

Can you also edit libvirtd.conf and set

  log_filters="1:qemu 1:util 1:security"
  log_outputs="1:file:/var/log/libvirt/libvirtd.log"

and attach the libvirtd.log file after the crash occurs.
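A restart is needed for those settings to take effect; on RHEL 6 that would be (assuming the stock init scripts):

  service libvirtd restart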

Finally, can you elaborate on what you mean by "libvirtd sending an abort event"?

Comment 5 Moran Goldboim 2010-06-16 12:38:07 UTC
I meant an abort signal; I will try to reproduce it again.

Comment 6 Dan Kenigsberg 2010-06-21 07:54:40 UTC
Here's the tail of libvirtd.log after vdsm died (with an abort signal while in libvirt_virEventRegisterImpl) and libvirtd died (I don't know why):

09:16:27.461: debug : qemuMonitorJSONIOProcessEvent:99 : handle BLOCK_IO_ERROR handler=0x478140 data=0x7fdce5c1a9d0
09:16:27.461: warning : qemuMonitorJSONHandleIOError:597 : missing reason in disk io error event
09:16:27.461: debug : qemuMonitorEmitIOError:849 : mon=0x7fdd0c0109f0
09:16:27.461: debug : qemuMonitorJSONIOProcess:188 : Total used 204 bytes out of 204 available in buffer
09:16:27.462: debug : qemuMonitorJSONIOProcessLine:115 : Line [{"timestamp": {"seconds": 1277100987, "microseconds": 462251}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-ide0-0-0", "__com.redhat_reason": "eother", "operation": "write", "action": "report"}}]
09:16:27.462: debug : virJSONValueFromString:901 : string={"timestamp": {"seconds": 1277100987, "microseconds": 462251}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-ide0-0-0", "__com.redhat_reason": "eother", "operation": "write", "action": "report"}}
09:16:27.462: debug : virJSONParserHandleStartMap:783 : parser=0x7fdd1a159660
09:16:27.462: debug : virJSONParserHandleMapKey:764 : parser=0x7fdd1a159660 key=0x7fdce5b63062
09:16:27.462: debug : virJSONParserHandleStartMap:783 : parser=0x7fdd1a159660
09:16:27.462: debug : virJSONParserHandleMapKey:764 : parser=0x7fdd1a159660 key=0x7fdce5b63070
09:16:27.462: debug : virJSONParserHandleNumber:723 : parser=0x7fdd1a159660 str=(null)
09:16:27.462: debug : virJSONParserHandleMapKey:764 : parser=0x7fdd1a159660 key=0x7fdce5b63087
09:16:27.462: debug : virJSONParserHandleNumber:723 : parser=0x7fdd1a159660 str=(null)
09:16:27.462: debug : virJSONParserHandleEndMap:810 : parser=0x7fdd1a159660
09:16:27.462: debug : virJSONParserHandleMapKey:764 : parser=0x7fdd1a159660 key=0x7fdce5b630a0
09:16:27.462: debug : virJSONParserHandleString:744 : parser=0x7fdd1a159660 str=0x7fdce5b630a9
09:16:27.462: debug : virJSONParserHandleMapKey:764 : parser=0x7fdd1a159660 key=0x7fdce5b630bb
09:16:27.462: debug : virJSONParserHandleStartMap:783 : parser=0x7fdd1a159660
09:16:27.462: debug : virJSONParserHandleMapKey:764 : parser=0x7fdd1a159660 key=0x7fdce5b630c4
09:16:27.462: debug : virJSONParserHandleString:744 : parser=0x7fdd1a159660 str=0x7fdce5b630ce
09:16:27.462: debug : virJSONParserHandleMapKey:764 : parser=0x7fdd1a159660 key=0x7fdce5b630e0
09:16:27.462: debug : virJSONParserHandleString:744 : parser=0x7fdd1a159660 str=0x7fdce5b630f7
09:16:27.462: debug : virJSONParserHandleMapKey:764 : parser=0x7fdd1a159660 key=0x7fdce5b63101
09:16:27.462: debug : virJSONParserHandleString:744 : parser=0x7fdd1a159660 str=0x7fdce5b6310e
09:16:27.462: debug : virJSONParserHandleMapKey:764 : parser=0x7fdd1a159660 key=0x7fdce5b63117
09:16:27.462: debug : virJSONParserHandleString:744 : parser=0x7fdd1a159660 str=0x7fdce5b63121
09:16:27.462: debug : virJSONParserHandleEndMap:810 : parser=0x7fdd1a159660
09:16:27.462: debug : virJSONParserHandleEndMap:810 : parser=0x7fdd1a159660
09:16:27.462: debug : virJSONValueFromString:931 : result=0x7fdce5c1b350
09:16:27.462: debug : qemuMonitorJSONIOProcessEvent:86 : mon=0x7fdd040d0970 obj=0x7fdce5c1b350
09:16:27.462: debug : qemuMonitorJSONIOProcessEvent:99 : handle BLOCK_IO_ERROR handler=0x478140 data=0x7fdce5c1a1a0
09:16:27.462: warning : qemuMonitorJSONHandleIOError:597 : missing reason in disk io error event
09:16:27.462: debug : qemuMonitorEmitIOError:849 : mon=0x7fdd040d0970
09:16:27.462: debug : qemuMonitorJSONIOProcess:188 : Total used 204 bytes out of 204 available in buffer
09:16:27.463: debug : qemuMonitorIO:572 : Triggering EOF callback error? 1
09:16:27.463: debug : qemuHandleMonitorEOF:1106 : Received EOF on 0x7fdcfc021000 'VmStrApiTestVm'
09:16:27.463: debug : qemudShutdownVMDaemon:4120 : Shutting down VM 'VmStrApiTestVm' migrated=0
09:16:27.466: debug : qemuMonitorJSONCommandWithFd:222 : Receive command reply ret=-1 errno=104 0 bytes '(null)'
09:16:27.466: error : qemuMonitorJSONCommandWithFd:242 : cannot send monitor command '{"execute":"query-blockstats"}': Connection reset by peer
09:16:27.466: debug : qemuMonitorClose:682 : mon=0x7fdcfc024ea0

Comment 7 Moran Goldboim 2010-06-22 09:14:36 UTC
Created attachment 425872 [details]
libvirt log

Comment 8 Moran Goldboim 2010-06-22 09:15:14 UTC
In another instance VDSM got an abort signal but libvirtd didn't die. It happened right after a create domain was started on VDSM.
No indication of it was seen in the libvirtd log (at 16:28:54,520, with log_filters="1:qemu 1:util 1:security"); only after vdsm was up again and several other creates had happened did libvirt generate the following errors (log is attached):
16:36:40.763: error : qemudDomainBlockStats:9596 : Domain not found: no domain with matching uuid 'e41ced35-4019-4784-a97d-f29f1f4cbf06'
16:37:31.427: error : qemuMonitorOpenUnix:277 : monitor socket did not show up.: Connection refused
16:37:31.427: error : qemuConnectMonitor:1584 : Failed to connect monitor for VmUpApiTestVm
16:37:31.427: error : qemudWaitForMonitor:2515 : internal error process exited while connecting to monitor: do_spice_init: starting 0.4.2-10..el6

Comment 9 Daniel Berrangé 2010-06-22 14:25:17 UTC
The log from comment #7 doesn't show any problem with libvirt dying.

All that's seen is a QEMU guest failing to start up due to a SPICE error:

16:37:28.421: debug : qemudStartVMDaemon:4017 : Waiting for monitor to show up
16:37:28.421: debug : qemudWaitForMonitor:2479 : Connect monitor to 0x7f01a000eb00 'VmUpApiTestVm'
16:37:31.427: error : qemuMonitorOpenUnix:277 : monitor socket did not show up.: Connection refused
16:37:31.427: debug : qemuMonitorClose:682 : mon=0x7f01a0003010
16:37:31.427: debug : qemuMonitorFree:200 : mon=0x7f01a0003010
16:37:31.427: error : qemuConnectMonitor:1584 : Failed to connect monitor for VmUpApiTestVm
16:37:31.427: error : qemudWaitForMonitor:2515 : internal error process exited while connecting to monitor: do_spice_init: starting 0.4.2-10..el6
do_spice_init: statistics shm_open failed, Permission denied

Comment 10 Daniel Berrangé 2010-06-22 14:31:20 UTC
Were there any SELinux AVC errors after the shm_open() call failure? It is possible that policy additions are needed if SPICE is using SHM.
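
For reference, two standard ways to check for AVC denials on RHEL 6 (illustrative commands, assuming auditd is running; not part of the original comment):

  ausearch -m avc -ts recent
  grep AVC /var/log/audit/audit.log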

Comment 11 Moran Goldboim 2010-06-29 12:18:49 UTC
Created attachment 427661 [details]
backtrace

The bug was reproduced with SELinux enabled; a backtrace is attached.

Comment 12 Daniel Berrangé 2010-06-29 12:47:53 UTC
That was a core dump rather than a stack trace. Unfortunately it is of no help, because I can't reliably extract data from the core dump without having the exact same set of RPM versions of everything libvirt links against as you do. Can you capture a stack trace by running this command in GDB:

 'thread apply all backtrace'
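
On the machine that produced the core (where the installed RPM versions match), the trace can also be pulled from the core file non-interactively; a sketch, where the core path and output file name are hypothetical:

  gdb /usr/sbin/libvirtd /path/to/core -batch -ex 'thread apply all backtrace full' > backtrace.txt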

Comment 13 Moran Goldboim 2010-07-11 11:50:35 UTC
Created attachment 430987 [details]
backtrace full

Added a full backtrace of all threads.

Comment 14 Daniel Berrangé 2010-07-12 13:32:27 UTC
Posted patch upstream

https://www.redhat.com/archives/libvir-list/2010-July/msg00214.html

Comment 16 Dave Allan 2010-07-14 06:16:06 UTC
libvirt-0_8_1-14_el6 has been built in RHEL-6-candidate with the fix.

Dave

Comment 18 Moran Goldboim 2010-08-22 10:06:23 UTC
Verified on: libvirt-0.8.1-24.el6.x86_64

Comment 19 releng-rhel@redhat.com 2010-11-11 14:47:48 UTC
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.