Bug 1261980 - failed LSM due to connection lost with qemu process
Summary: failed LSM due to connection lost with qemu process
Keywords:
Status: CLOSED DUPLICATE of bug 1279777
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
unspecified
urgent
Target Milestone: ovirt-3.6.1
: 3.6.1
Assignee: Francesco Romani
QA Contact: Aharon Canan
URL:
Whiteboard: virt
Depends On: 1279777
Blocks:
TreeView+ depends on / blocked
 
Reported: 2015-09-10 14:30 UTC by Kevin Alon Goldblatt
Modified: 2019-12-16 04:59 UTC (History)
20 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1279777 (view as bug list)
Environment:
Last Closed: 2015-11-15 15:29:11 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
engine, vdsm, server logs (2.08 MB, application/x-gzip)
2015-09-10 15:12 UTC, Kevin Alon Goldblatt
no flags Details
new vdsm, engine. gdb and libvirt logs added (466.30 KB, application/x-gzip)
2015-10-05 20:56 UTC, Kevin Alon Goldblatt
no flags Details
vdsm.log, engine.log, gdb (1.02 MB, application/x-gzip)
2015-10-12 12:41 UTC, Natalie Gavrielov
no flags Details
engine, vdsm, server logs (902.47 KB, application/x-gzip)
2015-11-10 15:09 UTC, Kevin Alon Goldblatt
no flags Details

Description Kevin Alon Goldblatt 2015-09-10 14:30:30 UTC
Description of problem:
During I/O to disk on guest start LSM. After a minute the operation hangs and an error is reported that connection has been lost with the qemu process

Version-Release number of selected component (if applicable):
rhevm-3.6.0-0.12.master.el6.noarch
vdsm-4.17.3-1.el7ev.noarch

How reproducible:
Ran this once

Steps to Reproduce:
1. Create a Vm with a 14g nfs thin disk, run the VM and install OS
2. Start writing to the disk with DD 
3. During the I/O on the guest start the LSM >>>>> After a minute the operation hangs and an error is reported indicating that connection to the qemu process has been lost

Actual results:
Connection to the qemu process is lost during I/O to the disk and LSM of the disk


Expected results:
The LSM and should suceed and the LSM should complete

Additional info:
Engine Log:
--------------------------
2015-09-10 13:54:48,905 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-3) [] FINISH, DestroyVDSCommand, log id: 4174c7cc
2015-09-10 13:54:49,004 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-3) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM VM18 is down with error. Exit mess
age: Lost connection with qemu process.
2015-09-10 13:54:49,006 INFO  [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (ForkJoinPool-1-worker-3) [] VM '082836ca-4a6d-418e-b46f-82e36e6906af(VM18) is running in db and not running in VDS 'level1_vm01'
2015-09-10 13:54:49,034 INFO  [org.ovirt.engine.core.bll.ProcessDownVmCommand] (org.ovirt.thread.pool-7-thread-5) [55d6e382] Running command: ProcessDownVmCommand internal: true.
2015-09-10 14:11:46,582 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler_Worker-29) [6f96321] SPMAsyncTask::PollTask: Polling task 'f642551d-3ab9-4108-ad15-7468cfe9179b' (Parent Command 'LiveMigrateDisk', Para
meters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'success'.
2015-09-10 14:11:46,595 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler_Worker-29) [6f96321] BaseAsyncTask::onTaskEndSuccess: Task 'f642551d-3ab9-4108-ad15-7468cfe9179b' (Parent Command 'LiveMigrateDisk', Par
ameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended successfully.
2015-09-10 14:11:46,599 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (DefaultQuartzScheduler_Worker-29) [6f96321] CommandAsyncTask::endActionIfNecessary: All tasks of command 'ed2ec1ed-eafa-485e-934d-711fdffe17fd' has ended -
> executing 'endAction'
2015-09-10 14:11:46,599 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (DefaultQuartzScheduler_Worker-29) [6f96321] CommandAsyncTask::endAction: Ending action for '1' tasks (command ID: 'ed2ec1ed-eafa-485e-934d-711fdffe17fd'): 
calling endAction '.
2015-09-10 14:11:46,599 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-7-thread-30) [6f96321] CommandAsyncTask::endCommandAction [within thread] context: Attempting to endAction 'LiveMigrateDisk', executi
onIndex: '1'
2015-09-10 14:11:46,863 INFO  [org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand] (org.ovirt.thread.pool-7-thread-30) [3b06388e] Ending command 'org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand' successfully.
2015-09-10 14:11:46,864 INFO  [org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand] (org.ovirt.thread.pool-7-thread-30) [3b06388e] Running command: LiveMigrateDiskCommand Task handler: VmReplicateDiskFinishTaskHandler internal: false. E
ntities affected :  ID: 3242921b-9642-4137-9e36-df7c2802e887 Type: DiskAction group CONFIGURE_DISK_STORAGE with role type USER,  ID: ca046d48-3cf1-4d80-a6e0-2bf076224033 Type: StorageAction group CREATE_DISK with role type USER
2015-09-10 14:11:46,890 ERROR [org.ovirt.engine.core.vdsbroker.ResourceManager] (org.ovirt.thread.pool-7-thread-30) [3b06388e] Cannot get vdsManager for vdsid='00000000-0000-0000-0000-000000000000'.
2015-09-10 14:11:46,891 ERROR [org.ovirt.engine.core.vdsbroker.ResourceManager] (org.ovirt.thread.pool-7-thread-30) [3b06388e] CreateCommand failed: EngineException: Vds with id: 00000000-0000-0000-0000-000000000000 was not found (Failed
 with error RESOURCE_MANAGER_VDS_NOT_FOUND and code 5004)
2015-09-10 14:11:46,892 ERROR [org.ovirt.engine.core.vdsbroker.ResourceManager] (org.ovirt.thread.pool-7-thread-30) [3b06388e] Exception: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) [rt.jar:1.8.0_51]
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) [rt.jar:1.8.0_51]
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) [rt.jar:1.8.0_51]
        at java.lang.reflect.Constructor.newInstance(Constructor.java:422) [rt.jar:1.8.0_51]
        at org.ovirt.engine.core.vdsbroker.ResourceManager.instantiateInjectedCommand(ResourceManager.java:432) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.ResourceManager.CreateCommand(ResourceManager.java:416) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:464) [vdsbroker.jar:]
        at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.RunVdsCommand(VDSBrokerFrontendImpl.java:33) [bll.jar:]
        at org.ovirt.engine.core.bll.AbstractSPMAsyncTaskHandler.runVdsCommand(AbstractSPMAsyncTaskHandler.java:48) [bll.jar:]
        at org.ovirt.engine.core.bll.lsm.VmReplicateDiskFinishTaskHandler.beforeTask(VmReplicateDiskFinishTaskHandler.java:60) [bll.jar:]
        at org.ovirt.engine.core.bll.AbstractSPMAsyncTaskHandler.execute(AbstractSPMAsyncTaskHandler.java:38) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1209) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1355) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1979) [bll.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:174) [utils.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:116) [utils.jar:]
        at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1392) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.internalEndSuccessfully(CommandBase.java:644) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.endActionInTransactionScope(CommandBase.java:591) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1981) [bll.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:174) [utils.jar:]
:



VDSM log
------------------
Thread-30::DEBUG::2015-09-10 13:54:50,110::persistentDict::192::Storage.PersistentDict::(__init__) Created a persistent dict with FileMetadataRW backend
Thread-28214::DEBUG::2015-09-10 13:54:49,972::fileSD::169::Storage.Misc.excCmd::(getReadDelay) /usr/bin/dd if=/rhev/data-center/mnt/10.35.160.108:_Storage_kev2/b0ae1c01-b71d-4ad4-a9a8-84ce1dc536f8/dom_md/metadata iflag=direct of=/dev/nul
l bs=4096 count=1 (cwd None)
Thread-28214::DEBUG::2015-09-10 13:54:50,160::fileSD::169::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n329 bytes (329 B) copied, 0.015572 s, 21.1 kB/s\n'; <rc> = 0
Thread-30::DEBUG::2015-09-10 13:54:50,198::persistentDict::234::Storage.PersistentDict::(refresh) read lines (FileMetadataRW)=['CLASS=Data', 'DESCRIPTION=nfs1', 'IOOPTIMEOUTSEC=10', 'LEASERETRIES=3', 'LEASETIMESEC=60', 'LOCKPOLICY=', 'LO
CKRENEWALINTERVALSEC=5', 'MASTER_VERSION=1', 'POOL_DESCRIPTION=dc1', 'POOL_DOMAINS=7e76fac6-3777-48e1-8533-1dce9ddff4e1:Active', 'POOL_SPM_ID=-1', 'POOL_SPM_LVER=-1', 'POOL_UUID=8cf6b080-2ca3-44ce-a0cb-f57cb0f715d2', 'REMOTE_PATH=10.35.1
60.108:/Storage/kev1', 'ROLE=Master', 'SDUUID=7e76fac6-3777-48e1-8533-1dce9ddff4e1', 'TYPE=NFS', 'VERSION=3', '_SHA_CKSUM=23f0390c1e39e94d4dd39b5d9ba4dc953dae3a11']
Thread-30::DEBUG::2015-09-10 13:54:50,210::fileSD::643::Storage.StorageDomain::(imageGarbageCollector) Removing remnants of deleted images []
Thread-30::INFO::2015-09-10 13:54:50,211::sd::442::Storage.StorageDomain::(_registerResourceNamespaces) Resource namespace 7e76fac6-3777-48e1-8533-1dce9ddff4e1_imageNS already registered
Thread-30::INFO::2015-09-10 13:54:50,211::sd::450::Storage.StorageDomain::(_registerResourceNamespaces) Resource namespace 7e76fac6-3777-48e1-8533-1dce9ddff4e1_volumeNS already registered
Thread-30::DEBUG::2015-09-10 13:54:50,225::fileSD::169::Storage.Misc.excCmd::(getReadDelay) /usr/bin/dd if=/rhev/data-center/mnt/10.35.160.108:_Storage_kev1/7e76fac6-3777-48e1-8533-1dce9ddff4e1/dom_md/metadata iflag=direct of=/dev/null b
s=4096 count=1 (cwd None)
Thread-30::DEBUG::2015-09-10 13:54:50,248::fileSD::169::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n455 bytes (455 B) copied, 0.00454271 s, 100 kB/s\n'; <rc> = 0
periodic/141::WARNING::2015-09-10 13:54:50,346::periodic::253::virt.periodic.VmDispatcher::(__call__) could not run <class 'virt.periodic.DriveWatermarkMonitor'> on [u'082836ca-4a6d-418e-b46f-82e36e6906af']
Thread-28228::DEBUG::2015-09-10 13:54:50,479::monitor::288::Storage.Monitor::(_refreshDomain) Refreshing domain eba7102c-4651-44d4-9285-2b45056710e9
Thread-28228::DEBUG::2015-09-10 13:54:50,480::persistentDict::192::Storage.PersistentDict::(__init__) Created a persistent dict with LvMetadataRW backend
Thread-28228::DEBUG::2015-09-10 13:54:50,480::blockSD::337::Storage.Misc.excCmd::(readlines) /usr/bin/dd iflag=direct skip=0 bs=2048 if=/dev/eba7102c-4651-44d4-9285-2b45056710e9/metadata count=1 (cwd None)
VM Channels Listener::DEBUG::2015-09-10 13:54:50,661::vmchannels::54::vds::(_handle_event) Received 00000019 on fileno 15
VM Channels Listener::DEBUG::2015-09-10 13:54:50,662::guestagent::218::virt.vm::(_connect) vmId=`082836ca-4a6d-418e-b46f-82e36e6906af`::Attempting connection to /var/lib/libvirt/qemu/channels/082836ca-4a6d-418e-b46f-82e36e6906af.com.redh
at.rhevm.vdsm
VM Channels Listener::DEBUG::2015-09-10 13:54:50,662::guestagent::232::virt.vm::(_connect) vmId=`082836ca-4a6d-418e-b46f-82e36e6906af`::Failed to connect to /var/lib/libvirt/qemu/channels/082836ca-4a6d-418e-b46f-82e36e6906af.com.redhat.r
hevm.vdsm with 2
periodic/137::ERROR::2015-09-10 13:54:50,663::vm::960::virt.vm::(_getExtendCandidates) vmId=`082836ca-4a6d-418e-b46f-82e36e6906af`::Unable to get watermarks for drive vda: Unable to read from monitor: Connection reset by peer
periodic/137::DEBUG::2015-09-10 13:54:50,663::executor::178::Executor::(_run) Worker was discarded
Thread-28228::DEBUG::2015-09-10 13:54:50,684::blockSD::337::Storage.Misc.excCmd::(readlines) SUCCESS: <err> = '1+0 records in\n1+0 records out\n2048 bytes (2.0 kB) copied, 0.0147833 s, 139 kB/s\n'; <rc> = 0
Thread-28228::DEBUG::2015-09-10 13:54:50,685::misc::260::Storage.Misc::(validateDDBytes) err: ['1+0 records in', '1+0 records out', '2048 bytes (2.0 kB) copied, 0.0147833 s, 139 kB/s'], size: 2048
Thread-28228::DEBUG::2015-09-10 13:54:50,685::persistentDict::234::Storage.PersistentDict::(refresh) read lines (LvMetadataRW)=[]
Thread-28228::DEBUG::2015-09-10 13:54:50,685::persistentDict::252::Storage.PersistentDict::(refresh) Empty metadata
Thread-28228::DEBUG::2015-09-10 13:54:50,685::persistentDict::192::Storage.PersistentDict::(__init__) Created a persistent dict with VGTagMetadataRW backend
Thread-28228::DEBUG::2015-09-10 13:54:50,686::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' got the operation mutex
Thread-28228::DEBUG::2015-09-10 13:54:50,686::lvm::506::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' released the operation mutex
Thread-28228::DEBUG::2015-09-10 13:54:50,686::lvm::514::Storage.OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' got the operation mutex
Thread-28228::DEBUG::2015-09-10 13:54:50,687::lvm::526::Storage.OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' released the operation mutex
Thread-28228::DEBUG::2015-09-10 13:54:50,687::lvm::371::Storage.OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex
Thread-28228::DEBUG::2015-09-10 13:54:50,688::lvm::291::Storage.Misc.excCmd::(cmd) /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_e
rror_count=3 obtain_device_list_from_udev=0 filter = [ '\''a|/dev/mapper/360060160f4a03000798dd6205b34e511|/dev/mapper/360060160f4a030007b8dd6205b34e511|/dev/mapper/360060160f4a030007d8dd6205b34e511|/dev/mapper/360060160f4a03000979e521de
bece411|/dev/mapper/360060160f4a03000999e521debece411|/dev/mapper/360060160f4a030009b9e521debece411|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50
  retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name eba7102c-4651-44d4-9285
-2b45056710e9 (cwd None)

Comment 1 Kevin Alon Goldblatt 2015-09-10 15:12:31 UTC
Created attachment 1072250 [details]
engine, vdsm, server logs

Added logs

Comment 2 Kevin Alon Goldblatt 2015-09-10 21:55:40 UTC
I REFINED THE TEST AND HERE ARE THE RESULTS:

1. Ran the scenario again and the error occured again - so this SPECIFICALLY happens when running dd BEFORE starting the LSM and AGAIN a second session during the SLM

2. Ran the same scenario as above running the dd session BEFORE the lsm BUT did NOT start a second dd write during the LSM and it worked FINE

3. Ran the same scenario with only installing the OS and no dd and it worked FINE!

ERROR in engine:
2015-09-10 19:36:04,935 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-9) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm20 is down with error. Exit message: Lost connection with qemu process.


Error in vdsm:
Thread-69696::DEBUG::2015-09-10 19:36:04,422::resourceManager::542::Storage.ResourceManager::(registerResource) Trying to register resource 'Storage.ca046d48-3cf1-4d80-a6e0-2bf076224033' for lock type 'shared'
Thread-69696::DEBUG::2015-09-10 19:36:04,422::resourceManager::566::Storage.ResourceManager::(registerResource) Resource 'Storage.ca046d48-3cf1-4d80-a6e0-2bf076224033' found in shared state and queue is empty, Joining current shared lock
 (2 active users)
Thread-69696::DEBUG::2015-09-10 19:36:04,423::resourceManager::238::Storage.ResourceManager.Request::(grant) ResName=`Storage.ca046d48-3cf1-4d80-a6e0-2bf076224033`ReqID=`ce5f0cdf-a7e1-4a68-9286-6611ec296cc9`::Granted request
Thread-69696::DEBUG::2015-09-10 19:36:04,423::task::827::Storage.TaskManager.Task::(resourceAcquired) Task=`004a38fa-7909-4371-872b-115ea7d845a4`::_resourcesAcquired: Storage.ca046d48-3cf1-4d80-a6e0-2bf076224033 (shared)
Thread-69696::DEBUG::2015-09-10 19:36:04,423::task::993::Storage.TaskManager.Task::(_decref) Task=`004a38fa-7909-4371-872b-115ea7d845a4`::ref 1 aborting False
Thread-69696::DEBUG::2015-09-10 19:36:04,424::resourceManager::198::Storage.ResourceManager.Request::(__init__) ResName=`ca046d48-3cf1-4d80-a6e0-2bf076224033_imageNS.ec548166-bfec-4fd1-9eeb-d81e3fae2348`ReqID=`27d27c49-5ac6-43cc-aa04-3bd
1ea5cd526`::Request was made in '/usr/share/vdsm/storage/sp.py' line '1915' at 'setVolumeDescription'
Thread-69696::DEBUG::2015-09-10 19:36:04,424::resourceManager::542::Storage.ResourceManager::(registerResource) Trying to register resource 'ca046d48-3cf1-4d80-a6e0-2bf076224033_imageNS.ec548166-bfec-4fd1-9eeb-d81e3fae2348' for lock type
 'exclusive'
Thread-69696::DEBUG::2015-09-10 19:36:04,425::lvm::428::Storage.OperationMutex::(_reloadlvs) Operation 'lvm reload operation' got the operation mutex
Thread-69696::DEBUG::2015-09-10 19:36:04,425::lvm::291::Storage.Misc.excCmd::(cmd) /usr/bin/sudo -n /usr/sbin/lvm lvs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_e
rror_count=3 obtain_device_list_from_udev=0 filter = [ '\''a|/dev/mapper/360060160f4a03000798dd6205b34e511|/dev/mapper/360060160f4a030007b8dd6205b34e511|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_fo
r_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,vg_name,attr,size,seg_start_pe,devices,tags ca046d48-3cf1-4d80-a6e0-2bf07622
4033 (cwd None)
periodic/255::WARNING::2015-09-10 19:36:04,905::periodic::253::virt.periodic.VmDispatcher::(__call__) could not run <class 'virt.periodic.DriveWatermarkMonitor'> on [u'169380d1-5def-4942-8a31-f42ffa9c9ba3']
mailbox.SPMMonitor::DEBUG::2015-09-10 19:36:06,192::storage_mailbox::735::Storage.Misc.excCmd::(_checkForMail) dd if=/rhev/data-center/8cf6b080-2ca3-44ce-a0cb-f57cb0f715d2/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000 (
cwd None)
mailbox.SPMMonitor::DEBUG::2015-09-10 19:36:06,244::storage_mailbox::735::Storage.Misc.excCmd::(_checkForMail) SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.0368906 s, 27.8 MB/s\n'; <rc> = 0
VM Channels Listener::DEBUG::2015-09-10 19:36:06,677::vmchannels::54::vds::(_handle_event) Received 00000019 on fileno 15
VM Channels Listener::DEBUG::2015-09-10 19:36:06,677::guestagent::218::virt.vm::(_connect) vmId=`169380d1-5def-4942-8a31-f42ffa9c9ba3`::Attempting connection to /var/lib/libvirt/qemu/channels/169380d1-5def-4942-8a31-f42ffa9c9ba3.com.redh
at.rhevm.vdsm
VM Channels Listener::DEBUG::2015-09-10 19:36:06,678::guestagent::232::virt.vm::(_connect) vmId=`169380d1-5def-4942-8a31-f42ffa9c9ba3`::Failed to connect to /var/lib/libvirt/qemu/channels/169380d1-5def-4942-8a31-f42ffa9c9ba3.com.redhat.r
hevm.vdsm with 111
periodic/251::ERROR::2015-09-10 19:36:06,892::vm::960::virt.vm::(_getExtendCandidates) vmId=`169380d1-5def-4942-8a31-f42ffa9c9ba3`::Unable to get watermarks for drive vda: Unable to read from monitor: Connection reset by peer
periodic/251::DEBUG::2015-09-10 19:36:06,892::executor::178::Executor::(_run) Worker was discarded
libvirtEventLoop::DEBUG::2015-09-10 19:36:06,894::vm::4415::virt.vm::(onLibvirtLifecycleEvent) vmId=`169380d1-5def-4942-8a31-f42ffa9c9ba3`::event Stopped detail 5 opaque None
libvirtEventLoop::INFO::2015-09-10 19:36:06,894::vm::807::virt.vm::(_onQemuDeath) vmId=`169380d1-5def-4942-8a31-f42ffa9c9ba3`::underlying process disconnected
libvirtEventLoop::INFO::2015-09-10 19:36:06,895::vm::3687::virt.vm::(releaseVm) vmId=`169380d1-5def-4942-8a31-f42ffa9c9ba3`::Release VM resources
libvirtEventLoop::DEBUG::2015-09-10 19:36:06,897::vmchannels::214::vds::(unregister) Delete fileno 15 from listener.
libvirtEventLoop::DEBUG::2015-09-10 19:36:06,897::task::595::Storage.TaskManager.Task::(_updateState) Task=`cedaa805-48d0-42e3-b98d-b36d3ce66c11`::moving from state init -> state preparing
libvirtEventLoop::INFO::2015-09-10 19:36:06,898::logUtils::48::dispatcher::(wrapper) Run and protect: teardownImage(sdUUID=u'7e76fac6-3777-48e1-8533-1dce9ddff4e1', spUUID=u'8cf6b080-2ca3-44ce-a0cb-f57cb0f715d2', imgUUID=u'1f7c93ee-a26a-4a24-8607-d9c36b58ecb9', volUUID=None)

Comment 3 Allon Mureinik 2015-09-13 08:05:51 UTC
Daniel - please take a look.

Kevin - what versions of qmeu and libvirt are you using?

Comment 4 Markus Stockhausen 2015-09-13 17:48:32 UTC
Maybe related: We saw similar behaviour during Live Merge. QEMU got unresponsive round about 30 seconds after start of action because of uninterruptible bitmap task. More details in BZ1262251

Comment 5 Nir Soffer 2015-09-15 21:19:54 UTC
Thanks Markus!

Kevin, can this be another issue related to the qemu patch Markus mention 
in https://bugzilla.redhat.com/show_bug.cgi?id=1262251#c13 ?

Comment 6 Kevin Wolf 2015-09-16 07:57:07 UTC
I think that's unlikely. What Markus reported stalls guest I/O while starting the
mirror block job (when it checks which parts of the image are allocated and must
be copied), but not when it's active and mirroring new data. In contrast, here the
problem seems to only occur when you write new data while it's active (as Kevin
described in comment 2). This is after the point where that patch could have an
effect.

Comment 7 Kevin Alon Goldblatt 2015-09-16 08:59:39 UTC
(In reply to Allon Mureinik from comment #3)
> Daniel - please take a look.
> 
> Kevin - what versions of qmeu and libvirt are you using?

qemu 
----------------------
qemu-kvm-rhev-2.3.0-19.el7.x86_64
libvirt-daemon-driver-qemu-1.2.17-5.el7.x86_64
qemu-kvm-tools-rhev-2.3.0-19.el7.x86_64
qemu-img-rhev-2.3.0-19.el7.x86_64
qemu-kvm-common-rhev-2.3.0-19.el7.x86_64
ipxe-roms-qemu-20130517-7.gitc4bce43.el7.noarch

libvirt
---------------------
libvirt-daemon-driver-qemu-1.2.17-5.el7.x86_64
libvirt-daemon-1.2.17-5.el7.x86_64
libvirt-daemon-driver-secret-1.2.17-5.el7.x86_64
libvirt-daemon-driver-network-1.2.17-5.el7.x86_64
libvirt-daemon-kvm-1.2.17-5.el7.x86_64
libvirt-client-1.2.17-5.el7.x86_64
libvirt-daemon-driver-nwfilter-1.2.17-5.el7.x86_64
libvirt-daemon-config-nwfilter-1.2.17-5.el7.x86_64
libvirt-daemon-driver-nodedev-1.2.17-5.el7.x86_64
libvirt-daemon-driver-interface-1.2.17-5.el7.x86_64
libvirt-daemon-driver-storage-1.2.17-5.el7.x86_64
libvirt-python-1.2.17-2.el7.x86_64
libvirt-lock-sanlock-1.2.17-5.el7.x86_64

Comment 8 Yaniv Kaul 2015-09-20 13:27:19 UTC
Aharon - what is the frequency of this issue? Additional repro (comment 2) seems to suggest it is happening in more specific scenarios - so why is it blocking testing the above BZs?

Let's try to find more issues in the other features and not hold up because of this one, if possible.

Comment 9 Daniel Erez 2015-09-20 14:07:05 UTC
Hi Kevin,

According to the described scenario, connection to qemu process is lost while dd to the guest during LSM [1]. What additional data is needed to determine whether there's an issue in qemu side (rather than vdsm)?

Thanks!,
Daniel

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1261980#c2
("Message: VM vm20 is down with error. Exit message: Lost connection with qemu process.")

Comment 10 Yaniv Kaul 2015-09-20 14:09:56 UTC
If it's an easily reproducible scenario:
1. What about the qemu log?
2. Please have Virt-QE reproduce it.
3. Attach GDB to the QEMU process (need to be quick - or VDSM will think the VM is not responsive!) and get a nice stack when it blows up.

Comment 11 Aharon Canan 2015-09-21 12:45:31 UTC
(In reply to Yaniv Kaul from comment #8)
> Aharon - what is the frequency of this issue? Additional repro (comment 2)
> seems to suggest it is happening in more specific scenarios - so why is it
> blocking testing the above BZs?
> 
> Let's try to find more issues in the other features and not hold up because
> of this one, if possible.

It happens every time when running steps from comment #2.
I blocked the tracker bugs because it looks very basic to me - we can discuss it and decide if it is not.

Kevin already completed running all other cases so other than that we are good.

Kevin will answer comment #10 asap.

Comment 12 Kevin Wolf 2015-09-28 11:16:18 UTC
Yaniv already answered for me in comment 10, and I have nothing to add at this
point, so I'm just unsetting my needinfo.

Comment 13 Kevin Alon Goldblatt 2015-10-05 20:33:35 UTC
(In reply to Yaniv Kaul from comment #10)
> If it's an easily reproducible scenario:
> 1. What about the qemu log?
> 2. Please have Virt-QE reproduce it.
> 3. Attach GDB to the QEMU process (need to be quick - or VDSM will think the
> VM is not responsive!) and get a nice stack when it blows up.

Ran this scenario again. This time connection to the VM was lost during the first write and LSM as follows:


Steps to Reproduce:
1. Create a Vm with a 14g nfs thin disk, run the VM and install OS
2. Start writing to the disk with DD 
3. During the I/O on the guest start the LSM >>>>> After a minute the operation hangs and an error is reported indicating that connection to the qemu process has been lost

Actual results:
Connection to the qemu process is lost during I/O to the disk and LSM of the disk


Expected results:
The LSM should suceed and the LSM should complete

Comment 14 Kevin Alon Goldblatt 2015-10-05 20:54:22 UTC
Engine log from previous run:

2015-10-05 22:09:41,935 INFO  [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-60) [4724dd39] VM 'f21b318b-2118-4980-95eb-13f68e689eb2'(vm24) moved from 'Up' --> 'NotResponding'
2015-10-05 22:09:41,981 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-60) [4724dd39] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm24 is not responding.


vdsm log:

ll bs=4096 count=1 (cwd None)
Thread-34::DEBUG::2015-10-05 19:10:01,980::fileSD::169::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n455 bytes (455 B) copied, 0.000604022 s, 753 kB/s\n'; <rc> = 0
Thread-344277::ERROR::2015-10-05 19:10:02,171::vm::3127::virt.vm::(snapshot) vmId=`f21b318b-2118-4980-95eb-13f68e689eb2`::Unable to take snapshot
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 3125, in snapshot
    self._dom.snapshotCreateXML(snapxml, snapFlags)

Comment 15 Kevin Alon Goldblatt 2015-10-05 20:56:31 UTC
Created attachment 1080078 [details]
new vdsm, engine. gdb and libvirt logs added

Added logs

Comment 16 Tal Nisan 2015-10-07 13:42:14 UTC
Kevin, any insights given the new information?

Comment 17 Aharon Canan 2015-10-07 13:48:25 UTC
Following my discussion with Allon and Yaniv, 
As this doesn't happen only on mixed LSM and as it is blocker bug to the version, 

Removing it from blocking the mixed LSM features as those were verified and working.

Comment 18 Kevin Wolf 2015-10-08 09:25:00 UTC
(In reply to Tal Nisan from comment #16)
> Kevin, any insights given the new information?

The uploadde archive contains a file qemu_trace_05_10_2015 that has the output
of a gdb session, however the only command used there was 'quit', so it's not
very helpful. 'thread apply all bt full' is what I would need at the very least
to get any clue about what qemu is doing.

Also, I don't really know yet what the "connection lost" message in VDSM
actually means in detail. Can someone explain? Is this about a monitor socket
being closed, or a command timing out or anything like that? And is libvirt
involved or is VDSM talking directly to qemu there?

Comment 19 Nir Soffer 2015-10-08 11:05:34 UTC
(In reply to Kevin Wolf from comment #18)
> Also, I don't really know yet what the "connection lost" message in VDSM
> actually means in detail. Can someone explain? Is this about a monitor socket
> being closed, or a command timing out or anything like that? And is libvirt
> involved or is VDSM talking directly to qemu there?

We use libvirt to preform virDomainBlockCopy (or virDomainBlockRebase if block copy is not supported, but it should be in libvirt 1.2.17).

The "connection lost" message is this:

1. We fail to get watermarks for drive vda (using virDomainBlockInfo, need to improve this log message):

periodic/137::ERROR::2015-09-10 13:54:50,663::vm::960::virt.vm::(_getExtendCandidates) vmId=`082836ca-4a6d-418e-b46f-82e36e6906af`::Unable to get watermarks for drive vda: Unable to read from monitor: Connection reset by peer

2. Immediately after that:

libvirtEventLoop::DEBUG::2015-09-10 13:54:50,859::vm::4415::virt.vm::(onLibvirtLifecycleEvent) vmId=`082836ca-4a6d-418e-b46f-82e36e6906af`::event Stopped detail 5 opaque None
libvirtEventLoop::INFO::2015-09-10 13:54:50,859::vm::807::virt.vm::(_onQemuDeath) vmId=`082836ca-4a6d-418e-b46f-82e36e6906af`::underlying process disconnected
 
This can be seen in the first vdsm log (attachment 1072250 [details])

I suggest to open a separate qemu bug for this issue and continue there.
Hopefully we can create a simple reproducer using virsh.

In second log the operation fails when trying to create a snapshot, and it
does not seems related to qemu, looks like libvirt issue:

1. We try to take a snapshot:

Thread-344277::INFO::2015-10-05 19:09:02,161::vm::3112::virt.vm::(snapshot) vmId=`f21b318b-2118-4980-95eb-13f68e689eb2`::<domainsnapshot>
        <disks>
                <disk name="vda" snapshot="external" type="file">
                        <source file="/rhev/data-center/8cf6b080-2ca3-44ce-a0cb-f57cb0f715d2/7e76fac6-3777-48e1-8533-1dce9ddff4e1/images/e0f320d0-ae7a-47cd-a027-c6b297f78535/07
f25c30-4a01-49ae-9e4a-6daa79c48171" type="file"/>
                </disk>
        </disks>
</domainsnapshot>

2. We freeze the guest file systems

Thread-344277::INFO::2015-10-05 19:09:02,161::vm::2889::virt.vm::(freeze) vmId=`f21b318b-2118-4980-95eb-13f68e689eb2`::Freezing guest filesystems

3. We failed to freeze (timeout)

Thread-344277::WARNING::2015-10-05 19:09:32,167::vm::2894::virt.vm::(freeze) vmId=`f21b318b-2118-4980-95eb-13f68e689eb2`::Unable to freeze guest filesystems: Timed out during operation: cannot acquire state change lock (held by remoteDispatchConnectGetAllDomainStats)

Since generally libvirt error reporting is not specific enough, we assume
that this failure means there is no guest agent on the host, so we continue
without freezing.

4. We try to take a snapshot (again, fails with timeout)

Thread-344277::ERROR::2015-10-05 19:10:02,171::vm::3127::virt.vm::(snapshot) vmId=`f21b318b-2118-4980-95eb-13f68e689eb2`::Unable to take snapshot
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 3125, in snapshot
    self._dom.snapshotCreateXML(snapxml, snapFlags)
  File "/usr/share/vdsm/virt/vm.py", line 214, in f
    raise toe
TimeoutError: Timed out during operation: cannot acquire state change lock (held by remoteDispatchConnectGetAllDomainStats)

5. We thaw guest file systems (always done if we tried to freeze, so we never leave guest in freezed state)

Thread-344277::INFO::2015-10-05 19:10:02,277::vm::2912::virt.vm::(thaw) vmId=`f21b318b-2118-4980-95eb-13f68e689eb2`::Thawing guest filesystems

6. thaw fail also with timeout

Thread-344277::WARNING::2015-10-05 19:10:32,278::vm::2917::virt.vm::(thaw) vmId=`f21b318b-2118-4980-95eb-13f68e689eb2`::Unable to thaw guest filesystems: Timed out during operation: cannot acquire state change lock (held by remoteDispatchConnectGetAllDomainStats)

This may be an issue with libvirt but it seems like regression in our
sampling code introduced during 3.6 development.

Francesco, would you open a virt bug for this and work with kgoldbla on
this issue?

Comment 20 Natalie Gavrielov 2015-10-12 12:41:40 UTC
Created attachment 1081998 [details]
vdsm.log, engine.log, gdb

I believe I have the same problem when running LSM.

Attached:
vdsm.log
engine.log
and gdb with "thread apply all bt full" - which I'm not sure if it's any good.

Versions:
vdsm-4.17.8-1.el7ev.noarch
rhevm-3.6.0-0.18.el6.noarch

qemu:

qemu-kvm-common-rhev-2.3.0-29.el7.x86_64
qemu-kvm-rhev-2.3.0-29.el7.x86_64
libvirt-daemon-driver-qemu-1.2.17-5.el7.x86_64
qemu-img-rhev-2.3.0-29.el7.x86_64
qemu-kvm-tools-rhev-2.3.0-29.el7.x86_64
ipxe-roms-qemu-20130517-7.gitc4bce43.el7.noarch

libvirt:

libvirt-daemon-kvm-1.2.17-5.el7.x86_64
libvirt-daemon-driver-nwfilter-1.2.17-5.el7.x86_64
libvirt-daemon-driver-lxc-1.2.17-5.el7.x86_64
libvirt-daemon-driver-storage-1.2.17-5.el7.x86_64
libvirt-lock-sanlock-1.2.17-5.el7.x86_64
libvirt-daemon-driver-network-1.2.17-5.el7.x86_64
libvirt-daemon-driver-qemu-1.2.17-5.el7.x86_64
libvirt-daemon-driver-interface-1.2.17-5.el7.x86_64
libvirt-daemon-1.2.17-5.el7.x86_64
libvirt-daemon-config-network-1.2.17-5.el7.x86_64
libvirt-daemon-driver-secret-1.2.17-5.el7.x86_64
libvirt-python-1.2.17-2.el7.x86_64
libvirt-client-1.2.17-5.el7.x86_64
libvirt-daemon-config-nwfilter-1.2.17-5.el7.x86_64
libvirt-daemon-driver-nodedev-1.2.17-5.el7.x86_64
libvirt-1.2.17-5.el7.x86_64

Relevant time-line in logs, starting @ 13:24

Comment 21 Kevin Alon Goldblatt 2015-10-13 14:37:42 UTC
(In reply to Kevin Wolf from comment #18)
> (In reply to Tal Nisan from comment #16)
> > Kevin, any insights given the new information?
> 
> The uploadde archive contains a file qemu_trace_05_10_2015 that has the
> output
> of a gdb session, however the only command used there was 'quit', so it's not
> very helpful. 'thread apply all bt full' is what I would need at the very
> least
> to get any clue about what qemu is doing.
> 
> Also, I don't really know yet what the "connection lost" message in VDSM
> actually means in detail. Can someone explain? Is this about a monitor socket
> being closed, or a command timing out or anything like that? And is libvirt
> involved or is VDSM talking directly to qemu there?

See comment 20 for the gdb trace.

Comment 22 Kevin Wolf 2015-10-14 09:38:28 UTC
This qemu process looks alive. Can you try whether it's possible to attach a
new monitor connection to the running process? I guess libvirt can't do that,
but you should be able to manually connect to the socket.

Comment 23 Francesco Romani 2015-10-16 15:39:20 UTC
(In reply to Nir Soffer from comment #19)
> (In reply to Kevin Wolf from comment #18)
> > Also, I don't really know yet what the "connection lost" message in VDSM
> > actually means in detail. Can someone explain? Is this about a monitor socket
> > being closed, or a command timing out or anything like that? And is libvirt
> > involved or is VDSM talking directly to qemu there?

> Francesco, would you open a virt bug for this and work with kgoldbla on
> this issue?

I'll be more than happy to work on this (and to find a simple reproducer)

So far, it seems to share the same root cause of https://bugzilla.redhat.com/show_bug.cgi?id=1250839

Will now properly review the logs, and file a new bug if this is not the case.

Comment 24 Daniel Erez 2015-10-26 19:18:00 UTC
(In reply to Francesco Romani from comment #23)
> (In reply to Nir Soffer from comment #19)
> > (In reply to Kevin Wolf from comment #18)
> > > Also, I don't really know yet what the "connection lost" message in VDSM
> > > actually means in detail. Can someone explain? Is this about a monitor socket
> > > being closed, or a command timing out or anything like that? And is libvirt
> > > involved or is VDSM talking directly to qemu there?
> 
> > Francesco, would you open a virt bug for this and work with kgoldbla on
> > this issue?
> 
> I'll be more than happy to work on this (and to find a simple reproducer)
> 
> So far, it seems to share the same root cause of
> https://bugzilla.redhat.com/show_bug.cgi?id=1250839
> 
> Will now properly review the logs, and file a new bug if this is not the
> case.

Thanks Francesco! Please let us know if there's any new outcome.
Adding needinfo for visibility..

Comment 25 Allon Mureinik 2015-10-28 14:08:30 UTC
Francesco - thanks!
Reassigning so it's easier to track this. Please send it back Daniel's way if it's more of a storage issue than a virt one.

And once again - thanks!

Comment 26 Francesco Romani 2015-11-02 18:49:00 UTC
I finally managed to find some time to play with this.

TL;DR: Kevin (Goldblatt), can you please check if this still fails with qemu-kvm-rhev >= 2.3.0-28?

+++

First, I wanted to reproduce locally the qemu-kvm-rhev being stuck.

I had a RHEL7.2 box with quite recent packages, and run the reproducer much like description:
- 3.6.0-ish (but Engine is irrelevant here, VDSM not yet into play) environment with NFS and ISCSI storage domains
- created VM on NFS storage, thin provisioned, 14 GB disk
- booted VM using live image, started brutal dd if=/dev/urandom of=/dev/vda oflag=direct. Please note that this *is* different from description, but at qemu side should not make any difference, it is just I/O on disk
- after something like 20-30s, started Live Storage Migration
- qemu was responsive all along (inspected using ps, virsh, vdsClient, through spice console)

after few minutes, stopped dd inside guest, stopped vm, removed snapshot and disk, all succeeded.

But I noticed that I was running a much more recent qemu-kvm-*EV* build. Checking the differences, I found some interesting patches to fix bugs in the drive-mirror area, like
https://bugzilla.redhat.com/show_bug.cgi?id=1251487
https://bugzilla.redhat.com/show_bug.cgi?id=1259229

Some reported behaviour matches our findings here.

Hence, let's see if the original issue is still present

Comment 27 Francesco Romani 2015-11-04 14:51:03 UTC
(In reply to Francesco Romani from comment #26)
> I finally managed to find some time to play with this.
> 
> TL;DR: Kevin (Goldblatt), can you please check if this still fails with
> qemu-kvm-rhev >= 2.3.0-28?
> 
> +++
> 
> First, I wanted to reproduce locally the qemu-kvm-rhev being stuck.
[...]
> But I noticed that I was running a much more recent qemu-kvm-*EV* build.

Which is identical to the -rhev builds minus the branding, so the above is still significant, even more the question:

> Hence, let's see if the original issue is still present

Comment 28 Michal Skrivanek 2015-11-05 18:31:14 UTC
Francesco, what about comment #20, that has 2.3.0-29 (which is not latest, but more recent than you pointed out in comment #27)?

Goldblatt, does it reproduce on latest build?

Comment 29 Francesco Romani 2015-11-06 08:08:28 UTC
(In reply to Michal Skrivanek from comment #28)
> Francesco, what about comment #20, that has 2.3.0-29 (which is not latest,
> but more recent than you pointed out in comment #27)?

Right, I missed that. So it seems it is still happening with recent qemu-kvm-rhev, will try again to reproduce locally; if reproduces, will attach new monitor and inspect what's going on.

Natalie, are you doing LSM from NFS to block storage (ISCSI maybe), right?
Are the images on NFS thin provisioned?

Please share all the details about the storage migration being performed, to make sure I'm reproducing the same environment.

Comment 30 Francesco Romani 2015-11-09 12:08:45 UTC
After additional investigation (and some failed attempts to reproduce), we should move this bug down to QEMU(-kvm).

On VDSM side the issue as per https://bugzilla.redhat.com/show_bug.cgi?id=1250839 is confirmed, but we con continue here.

Comment 31 Francesco Romani 2015-11-10 09:45:19 UTC
cloned this bug to https://bugzilla.redhat.com/show_bug.cgi?id=1279777 and moved the clone to qemu-kvm-rhev.

Keeping this bug open to consume the fix from 1279777.

As far as we know right now, no further action is expected on RHEV side.

Comment 32 Kevin Alon Goldblatt 2015-11-10 15:02:19 UTC
(In reply to Francesco Romani from comment #26)
> I finally managed to find some time to play with this.
> 
> TL;DR: Kevin (Goldblatt), can you please check if this still fails with
> qemu-kvm-rhev >= 2.3.0-28?
> 
> +++
> 
> First, I wanted to reproduce locally the qemu-kvm-rhev being stuck.
> 
> I had a RHEL7.2 box with quite recent packages, and run the reproducer much
> like description:
> - 3.6.0-ish (but Engine is irrelevant here, VDSM not yet into play)
> environment with NFS and ISCSI storage domains
> - created VM on NFS storage, thin provisioned, 14 GB disk
> - booted VM using live image, started brutal dd if=/dev/urandom of=/dev/vda
> oflag=direct. Please note that this *is* different from description, but at
> qemu side should not make any difference, it is just I/O on disk
> - after something like 20-30s, started Live Storage Migration
> - qemu was responsive all along (inspected using ps, virsh, vdsClient,
> through spice console)
> 
> after few minutes, stopped dd inside guest, stopped vm, removed snapshot and
> disk, all succeeded.
> 
> But I noticed that I was running a much more recent qemu-kvm-*EV* build.
> Checking the differences, I found some interesting patches to fix bugs in
> the drive-mirror area, like
> https://bugzilla.redhat.com/show_bug.cgi?id=1251487
> https://bugzilla.redhat.com/show_bug.cgi?id=1259229
> 
> Some reported behaviour matches our findings here.
> 
> Hence, let's see if the original issue is still present

I ran the scenario again with the following code:
--------------------------------------------------------
rhevm-3.6.0.3-0.1.el6.noarch - engine
vdsm-4.17.10.1-0.el7ev.noarch - vdsm

libvirt--------------------------------
libvirt-client-1.2.17-13.el7.x86_64
libvirt-python-1.2.17-2.el7.x86_64
libvirt-daemon-driver-interface-1.2.17-13.el7.x86_64
libvirt-daemon-driver-storage-1.2.17-13.el7.x86_64
libvirt-daemon-1.2.17-13.el7.x86_64
libvirt-daemon-driver-nwfilter-1.2.17-13.el7.x86_64
libvirt-daemon-config-nwfilter-1.2.17-13.el7.x86_64
libvirt-daemon-driver-nodedev-1.2.17-13.el7.x86_64
libvirt-daemon-driver-secret-1.2.17-13.el7.x86_64
libvirt-daemon-driver-qemu-1.2.17-13.el7.x86_64
libvirt-daemon-kvm-1.2.17-13.el7.x86_64
libvirt-daemon-driver-network-1.2.17-13.el7.x86_64
libvirt-lock-sanlock-1.2.17-13.el7.x86_64


qemu---------------------------------
ipxe-roms-qemu-20130517-7.gitc4bce43.el7.noarch
qemu-kvm-tools-rhev-2.3.0-31.el7.x86_64
qemu-kvm-common-rhev-2.3.0-31.el7.x86_64
libvirt-daemon-driver-qemu-1.2.17-13.el7.x86_64
qemu-img-rhev-2.3.0-31.el7.x86_64
qemu-kvm-rhev-2.3.0-31.el7.x86_64



I ran the following scenario:
--------------------------------------------------------
1. Created VM with 1 nfs thin disk of 14g
2. Installed Rhel7.1 on the VM
3. Started a dd (dd if=/dev/zero of/aaa/aa bs=512k count=5000) on the OS
4. After about 30 seconds started LSM to a block SD
5. Ran a 2nd dd on the vm OS when the first finished during the LSM and a 3 dd after that.
6. 5 minutes later the Mixed LSM failed
7. The qemu process was still alive


Engine log
--------------------------------------------------------
2015-11-10 16:34:30,803 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (org.ovirt.thread.pool-7-thread-35) [3e229702] Failed in 'VmReplicateDiskFinishVDS' method
2015-11-10 16:34:30,809 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-7-thread-35) [3e229702] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM blond-vdsf com
mand failed: Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainGetBlockInfo)
2015-11-10 16:34:30,809 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (org.ovirt.thread.pool-7-thread-35) [3e229702] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand' 
return value 'StatusOnlyReturnForXmlRpc [status=StatusForXmlRpc [code=-32603, message=Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainGetBlockInfo)]]'
2015-11-10 16:34:30,809 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (org.ovirt.thread.pool-7-thread-35) [3e229702] HostName = blond-vdsf
2015-11-10 16:34:30,809 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (org.ovirt.thread.pool-7-thread-35) [3e229702] Command 'VmReplicateDiskFinishVDSCommand(HostName = blond-vdsf, VmReplicateDiskParameters:{runAsync='true', hostId='e338a2b6-7a21-4a1a-b305-895682b0b7d2', vmId='827992ea-9404-4ce1-b438-679e3c83ed6a'})' execution failed: VDSGenericException: VDSErrorException: Failed to VmReplicateDiskFinishVDS, error = Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainGetBlockInfo), code = -32603
2015-11-10 16:34:30,809 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (org.ovirt.thread.pool-7-thread-35) [3e229702] FINISH, VmReplicateDiskFinishVDSCommand, log id: 1824df3e
2015-11-10 16:34:30,841 ERROR [org.ovirt.engine.core.bll.AbstractSPMAsyncTaskHandler] (org.ovirt.thread.pool-7-thread-35) [3e229702] Failed VmReplicateDiskFinish (Disk 'c2b79f7a-a587-443f-885f-93bf1edfbe2a', VM '827992ea-9404-4ce1-b438-679e3c83ed6a')
2015-11-10 16:34:30,841 ERROR [org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand] (org.ovirt.thread.pool-7-thread-35) [3e229702] Command 'org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to VmReplicateDiskFinishVDS, error = Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainGetBlockInfo), code = -32603 (Failed with error unexpected and code 16)
2015-11-10 16:34:30,845 ERROR [org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand] (org.ovirt.thread.pool-7-thread-35) [3e229702] Reverting task 'unknown', handler 'org.ovirt.engine.core.bll.lsm.VmReplicateDiskStartTaskHandler'
2015-11-10 16:34:30,846 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (org.ovirt.thread.pool-7-thread-35) [3e229702] START, VmReplicateDiskFinishVDSCommand(HostName = blond-vdsf, VmReplicateDiskParameters:{runAsync='true', hostId='e338a2b6-7a21-4a1a-b305-895682b0b7d2', vmId='827992ea-9404-4ce1-b438-679e3c83ed6a'}), log id: 7d83513c
2015-11-10 16:35:00,866 WARN  [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (org.ovirt.thread.pool-7-thread-35) [3e229702] Unexpected return value: StatusForXmlRpc [code=-32603, message=Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainGetBlockInfo)]
2015-11-10 16:35:00,866 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (org.ovirt.thread.pool-7-thread-35) [3e229702] Failed in 'VmReplicateDiskFinishVDS' method
2015-11-10 16:35:00,873 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-7-thread-35) [3e229702] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM blond-vdsf command failed: Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainGetBlockInfo)
2015-11-10 16:35:00,873 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (org.ovirt.thread.pool-7-thread-35) [3e229702] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand' return value 'StatusOnlyReturnForXmlRpc [status=StatusForXmlRpc [code=-32603, message=Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainGetBlockInfo)]]'
2015-11-10 16:35:00,873 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (org.ovirt.thread.pool-7-thread-35) [3e229702] HostName = blond-vdsf
2015-11-10 16:35:00,873 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (org.ovirt.thread.pool-7-thread-35) [3e229702] Command 'VmReplicateDiskFinishVDSCommand(HostName = blond-vdsf, VmReplicateDiskParameters:{runAsync='true', hostId='e338a2b6-7a21-4a1a-b305-895682b0b7d2', vmId='827992ea-9404-4ce1-b438-679e3c83ed6a'})' execution failed: VDSGenericException: VDSErrorException: Failed to VmReplicateDiskFinishVDS, error = Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainGetBlockInfo), code = -32603
2015-11-10 16:35:00,874 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (org.ovirt.thread.pool-7-thread-35) [3e229702] FINISH, VmReplicateDiskFinishVDSCommand, log id: 7d83513c
2015-11-10 16:35:00,878 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-7-thread-35) [3e229702] CommandAsyncTask::HandleEndActionResult [within thread]: endAction for action type 'LiveMigrateDisk' completed, handling the result.
2015-11-10 16:35:00,879 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-7-thread-35) [3e229702] CommandAsyncTask::HandleEndActionResult [within thread]: endAction for action type 'LiveMigrateDisk' hasn't succeeded, not clearing tasks, will attempt again next polling.
2015-11-10 16:35:10,587 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler_Worker-93) [6d91d3ce] BaseAsyncTask::onTaskEndSuccess: Task 'd61d55c7-f1da-45b9-ba4f-c665b66d1d46' (Parent Command 'LiveMigrateDisk', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended successfully.
2015-11-10 16:35:10,591 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (DefaultQuartzScheduler_Worker-93) [6d91d3ce] CommandAsyncTask::endActionIfNecessary: All tasks of command '7fa5af6d-de94-4ed6-b995-2cb0cc7ea2ce' has ended -> executing 'endAction'
2015-11-10 16:35:10,591 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (DefaultQuartzScheduler_Worker-93) [6d91d3ce] CommandAsyncTask::endAction: Ending action for '1' tasks (command ID: '7fa5af6d-de94-4ed6-b995-2cb0cc7ea2ce'): calling endAction '.
2015-11-10 16:35:10,591 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-7-thread-17) [6d91d3ce] CommandAsyncTask::endCommandAction [within thread] context: Attempting to endAction 'LiveMigrateDisk', executionIndex: '0'
2015-11-10 16:35:10,637 ERROR [org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand] (org.ovirt.thread.pool-7-thread-17) [3e229702] Ending command 'org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand' with failure.
2015-11-10 16:35:10,641 INFO  [org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand] (org.ovirt.thread.pool-7-thread-17) [3e229702] Lock freed to object 'EngineLock:{exclusiveLocks='null', sharedLocks='[827992ea-9404-4ce1-b438-679e3c83ed6a=<VM, ACTION_TYPE_FAILED_DISK_IS_BEING_MIGRATED$DiskName vmtest2_Disk1>]'}'
2015-11-10 16:35:10,671 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-7-thread-17) [3e229702] Correlation ID: 3e229702, Job ID: 0cfc1adc-c038-40d4-9602-f4b4c6497ede, Call Stack: null, Custom Event ID: -1, Message: User admin@internal have failed to move disk vmtest2_Disk1 to domain block1.



vdsm logs:
----------------------------------------------------------------------------
Thread-33734::DEBUG::2015-11-10 16:34:30,362::fileSD::173::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n461 bytes (461 B) copied, 0.000403639 s, 1.1 MB/s\n'; <rc> = 0
periodic/15::WARNING::2015-11-10 16:34:30,834::periodic::253::virt.periodic.VmDispatcher::(__call__) could not run <class 'virt.periodic.NumaInfoMonitor'> on [u'827992ea-9404-4ce1-b438-679e3c83ed6a']
Thread-69668::DEBUG::2015-11-10 16:34:31,187::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'StoragePool.getSpmStatus' in bridge with {u'storagepoolID': u'87566c2f-6c56-4b30-9669-0231fa11e957'}
Thread-69668::DEBUG::2015-11-10 16:34:31,189::task::595::Storage.TaskManager.Task::(_updateState) Task=`69ec61ea-a2df-4adb-a2d2-4916362a72b5`::moving from state init -> state preparing
Thread-69668::INFO::2015-11-10 16:34:31,189::logUtils::48::dispatcher::(wrapper) Run and protect: getSpmStatus(spUUID=u'87566c2f-6c56-4b30-9669-0231fa11e957', options=None)
Thread-69668::INFO::2015-11-10 16:34:31,202::logUtils::51::dispatcher::(wrapper) Run and protect: getSpmStatus, Return response: {'spm_st': {'spmId': 1, 'spmStatus': 'SPM', 'spmLver': 5L}}
Thread-69668::DEBUG::2015-11-10 16:34:31,202::task::1191::Storage.TaskManager.Task::(prepare) Task=`69ec61ea-a2df-4adb-a2d2-4916362a72b5`::finished: {'spm_st': {'spmId': 1, 'spmStatus': 'SPM', 'spmLver': 5L}}
Thread-69668::DEBUG::2015-11-10 16:34:31,203::task::595::Storage.TaskManager.Task::(_updateState) Task=`69ec61ea-a2df-4adb-a2d2-4916362a72b5`::moving from state preparing -> state finished
Thread-69668::DEBUG::2015-11-10 16:34:31,203::resourceManager::940::Storage.ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-69668::DEBUG::2015-11-10 16:34:31,204::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-69668::DEBUG::2015-11-10 16:34:31,205::task::993::Storage.TaskManager.Task::(_decref) Task=`69ec61ea-a2df-4adb-a2d2-4916362a72b5`::ref 0 aborting False
Thread-69668::DEBUG::2015-11-10 16:34:31,205::__init__::533::jsonrpc.JsonRpcServer::(_serveRequest) Return 'StoragePool.getSpmStatus' in bridge with {'spmId': 1, 'spmStatus': 'SPM', 'spmLver': 5L}
Thread-69657::ERROR::2015-11-10 16:34:31,097::__init__::526::jsonrpc.JsonRpcServer::(_serveRequest) Internal server error
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 521, in _serveRequest
    res = method(**params)
  File "/usr/share/vdsm/rpc/Bridge.py", line 277, in _dynamicMethod
    result = fn(*methodArgs)
  File "/usr/share/vdsm/API.py", line 615, in diskReplicateFinish
    return v.diskReplicateFinish(srcDisk, dstDisk)
  File "/usr/share/vdsm/virt/vm.py", line 3271, in diskReplicateFinish
    blkJobInfo = self._dom.blockJobInfo(drive.name, 0)
  File "/usr/share/vdsm/virt/virdomain.py", line 76, in f
    raise toe
TimeoutError: Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainGetBlockInfo)
Thread-69669::DEBUG::2015-11-10 16:34:31,246::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'StoragePool.getInfo' in bridge with {u'storagepoolID': u'87566c2f-6c56-4b30-9669-0231fa11e957'}
Thread-69669::DEBUG::2015-11-10 16:34:31,247::task::595::Storage.TaskManager.Task::(_updateState) Task=`4729dcdf-b4a1-4bf5-9ec9-fa28ade0365a`::moving from state init -> state preparing
Thread-69669::INFO::2015-11-10 16:34:31,247::logUtils::48::dispatcher::(wrapper) Run and protect: getStoragePoolInfo(spUUID=u'87566c2f-6c56-4b30-9669-0231fa11e957', options=None)
Thread-69669::DEBUG::2015-11-10 16:34:31,248::resourceManager::198::Storage.ResourceManager.Request::(__init__) ResName=`Storage.87566c2f-6c56-4b30-9669-0231fa11e957`ReqID=`60b459cb-b406-458c-927d-87da642bfd22`::Request was made in '/usr
/share/vdsm/storage/hsm.py' line '2634' at 'getStoragePoolInfo'
Thread-69669::DEBUG::2015-11-10 16:34:31,248::resourceManager::542::Storage.ResourceManager::(registerResource) Trying to register resource 'Storage.87566c2f-6c56-4b30-9669-0231fa11e957' for lock type 'shared'
Thread-69669::DEBUG::2015-11-10 16:34:31,249::resourceManager::601::Storage.ResourceManager::(registerResource) Resource 'Storage.87566c2f-6c56-4b30-9669-0231fa11e957' is free. Now locking as 'shared' (1 active user)
Thread-69669::DEBUG::2015-11-10 16:34:31,249::resourceManager::238::Storage.ResourceManager.Request::(grant) ResName=`Storage.87566c2f-6c56-4b30-9669-0231fa11e957`ReqID=`60b459cb-b406-458c-927d-87da642bfd22`::Granted request
Thread-69669::DEBUG::2015-11-10 16:34:31,250::task::827::Storage.TaskManager.Task::(resourceAcquired) Task=`4729dcdf-b4a1-4bf5-9ec9-fa28ade0365a`::_resourcesAcquired: Storage.87566c2f-6c56-4b30-9669-0231fa11e957 (shared)
Thread-69669::DEBUG::2015-11-10 16:34:31,250::task::993::Storage.TaskManager.Task::(_decref) Task=`4729dcdf-b4a1-4bf5-9ec9-fa28ade0365a`::ref 1 aborting False



gdb of the qemu process:
----------------------------------------------------------------------------
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-80.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Attaching to process 30618
Reading symbols from /usr/libexec/qemu-kvm...Reading symbols from /usr/libexec/qemu-kvm...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Reading symbols from /lib64/libssl3.so...Reading symbols from /lib64/libssl3.so...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libssl3.so
Reading symbols from /lib64/libsmime3.so...Reading symbols from /lib64/libsmime3.so...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libsmime3.so
Reading symbols from /lib64/libnss3.so...Reading symbols from /lib64/libnss3.so...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss3.so
Reading symbols from /lib64/libnssutil3.so...Reading symbols from /lib64/libnssutil3.so...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libnssutil3.so
Reading symbols from /lib64/libplds4.so...Reading symbols from /lib64/libplds4.so...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libplds4.so
Reading symbols from /lib64/libplc4.so...Reading symbols from /lib64/libplc4.so...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libplc4.so
Reading symbols from /lib64/libnspr4.so...Reading symbols from /lib64/libnspr4.so...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libnspr4.so
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[New LWP 4560]
[New LWP 4559]
[New LWP 4558]
[New LWP 4557]
[New LWP 4432]
[New LWP 4149]
[New LWP 4148]
[New LWP 4147]
[New LWP 4146]
[New LWP 4051]
[New LWP 4047]
[New LWP 4045]
[New LWP 4024]
[New LWP 4023]
[New LWP 3818]
[New LWP 3817]
[New LWP 3816]
[New LWP 3810]
[New LWP 30625]
[New LWP 30622]
[New LWP 30619]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libgthread-2.0.so.0...Reading symbols from /lib64/libgthread-2.0.so.0...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libgthread-2.0.so.0
Reading symbols from /lib64/libglib-2.0.so.0...Reading symbols from /lib64/libglib-2.0.so.0...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libglib-2.0.so.0
Reading symbols from /lib64/libz.so.1...Reading symbols from /lib64/libz.so.1...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libz.so.1
Reading symbols from /lib64/libaio.so.1...Reading symbols from /lib64/libaio.so.1...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libaio.so.1
Reading symbols from /usr/lib64/iscsi/libiscsi.so.2...Reading symbols from /usr/lib64/iscsi/libiscsi.so.2...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/iscsi/libiscsi.so.2
Reading symbols from /lib64/libcurl.so.4...Reading symbols from /lib64/libcurl.so.4...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libcurl.so.4
Reading symbols from /lib64/librbd.so.1...Reading symbols from /lib64/librbd.so.1...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/librbd.so.1
Reading symbols from /lib64/librados.so.2...Reading symbols from /lib64/librados.so.2...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/librados.so.2
Reading symbols from /lib64/libgfapi.so.0...Reading symbols from /lib64/libgfapi.so.0...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libgfapi.so.0
Reading symbols from /lib64/libglusterfs.so.0...Reading symbols from /lib64/libglusterfs.so.0...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libglusterfs.so.0
Reading symbols from /lib64/libgfrpc.so.0...Reading symbols from /lib64/libgfrpc.so.0...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libgfrpc.so.0
Reading symbols from /lib64/libgfxdr.so.0...Reading symbols from /lib64/libgfxdr.so.0...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libgfxdr.so.0
Reading symbols from /lib64/libssh2.so.1...Reading symbols from /lib64/libssh2.so.1...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libssh2.so.1
Reading symbols from /lib64/libtcmalloc.so.4...Reading symbols from /lib64/libtcmalloc.so.4...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libtcmalloc.so.4
Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/librt.so.1
Reading symbols from /lib64/libpixman-1.so.0...Reading symbols from /lib64/libpixman-1.so.0...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libpixman-1.so.0
Reading symbols from /lib64/libutil.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libutil.so.1
Reading symbols from /lib64/libnuma.so.1...Reading symbols from /lib64/libnuma.so.1...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libnuma.so.1
Reading symbols from /lib64/libasound.so.2...Reading symbols from /lib64/libasound.so.2...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libasound.so.2
Reading symbols from /lib64/libpulse.so.0...Reading symbols from /lib64/libpulse.so.0...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libpulse.so.0
Reading symbols from /lib64/libuuid.so.1...Reading symbols from /lib64/libuuid.so.1...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libuuid.so.1
Reading symbols from /lib64/libpng15.so.15...Reading symbols from /lib64/libpng15.so.15...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libpng15.so.15
Reading symbols from /lib64/libsasl2.so.3...Reading symbols from /lib64/libsasl2.so.3...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libsasl2.so.3
Reading symbols from /lib64/libgnutls.so.28...Reading symbols from /lib64/libgnutls.so.28...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libgnutls.so.28
Reading symbols from /lib64/liblzo2.so.2...Reading symbols from /lib64/liblzo2.so.2...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/liblzo2.so.2
Reading symbols from /lib64/libsnappy.so.1...Reading symbols from /lib64/libsnappy.so.1...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libsnappy.so.1
Reading symbols from /lib64/libseccomp.so.2...Reading symbols from /lib64/libseccomp.so.2...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libseccomp.so.2
Reading symbols from /lib64/librdmacm.so.1...Reading symbols from /lib64/librdmacm.so.1...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/librdmacm.so.1
Reading symbols from /lib64/libibverbs.so.1...Reading symbols from /lib64/libibverbs.so.1...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libibverbs.so.1
Reading symbols from /lib64/libspice-server.so.1...Reading symbols from /lib64/libspice-server.so.1...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libspice-server.so.1
Reading symbols from /lib64/libusb-1.0.so.0...Reading symbols from /lib64/libusb-1.0.so.0...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libusb-1.0.so.0
Reading symbols from /lib64/libusbredirparser.so.1...Reading symbols from /lib64/libusbredirparser.so.1...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libusbredirparser.so.1
Reading symbols from /lib64/libstdc++.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libstdc++.so.6
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libgcc_s.so.1
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libgcrypt.so.11...Reading symbols from /lib64/libgcrypt.so.11...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libgcrypt.so.11
Reading symbols from /lib64/libidn.so.11...Reading symbols from /lib64/libidn.so.11...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libidn.so.11
Reading symbols from /lib64/libgssapi_krb5.so.2...Reading symbols from /lib64/libgssapi_krb5.so.2...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libgssapi_krb5.so.2
Reading symbols from /lib64/libkrb5.so.3...Reading symbols from /lib64/libkrb5.so.3...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libkrb5.so.3
Reading symbols from /lib64/libk5crypto.so.3...Reading symbols from /lib64/libk5crypto.so.3...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libk5crypto.so.3
Reading symbols from /lib64/libcom_err.so.2...Reading symbols from /lib64/libcom_err.so.2...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libcom_err.so.2
Reading symbols from /lib64/liblber-2.4.so.2...Reading symbols from /lib64/liblber-2.4.so.2...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/liblber-2.4.so.2
Reading symbols from /lib64/libldap-2.4.so.2...Reading symbols from /lib64/libldap-2.4.so.2...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libldap-2.4.so.2
Reading symbols from /lib64/libboost_thread-mt.so.1.53.0...Reading symbols from /lib64/libboost_thread-mt.so.1.53.0...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libboost_thread-mt.so.1.53.0
Reading symbols from /lib64/libboost_system-mt.so.1.53.0...Reading symbols from /lib64/libboost_system-mt.so.1.53.0...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libboost_system-mt.so.1.53.0
Reading symbols from /lib64/libacl.so.1...Reading symbols from /lib64/libacl.so.1...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libacl.so.1
Reading symbols from /lib64/libcrypto.so.10...Reading symbols from /lib64/libcrypto.so.10...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libcrypto.so.10
Reading symbols from /lib64/libssl.so.10...Reading symbols from /lib64/libssl.so.10...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libssl.so.10
Reading symbols from /lib64/libunwind.so.8...Reading symbols from /lib64/libunwind.so.8...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libunwind.so.8
Reading symbols from /lib64/libjson-c.so.2...Reading symbols from /lib64/libjson-c.so.2...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libjson-c.so.2
Reading symbols from /usr/lib64/pulseaudio/libpulsecommon-6.0.so...Reading symbols from /usr/lib64/pulseaudio/libpulsecommon-6.0.so...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/pulseaudio/libpulsecommon-6.0.so
Reading symbols from /lib64/libdbus-1.so.3...Reading symbols from /lib64/libdbus-1.so.3...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libdbus-1.so.3
Reading symbols from /lib64/libcap.so.2...Reading symbols from /lib64/libcap.so.2...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libcap.so.2
Reading symbols from /lib64/libresolv.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libresolv.so.2
Reading symbols from /lib64/libcrypt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libp11-kit.so.0...Reading symbols from /lib64/libp11-kit.so.0...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libp11-kit.so.0
Reading symbols from /lib64/libtspi.so.1...Reading symbols from /lib64/libtspi.so.1...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libtspi.so.1
Reading symbols from /lib64/libtasn1.so.6...Reading symbols from /lib64/libtasn1.so.6...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libtasn1.so.6
Reading symbols from /lib64/libnettle.so.4...Reading symbols from /lib64/libnettle.so.4...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libnettle.so.4
Reading symbols from /lib64/libhogweed.so.2...Reading symbols from /lib64/libhogweed.so.2...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libhogweed.so.2
Reading symbols from /lib64/libgmp.so.10...Reading symbols from /lib64/libgmp.so.10...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libgmp.so.10
Reading symbols from /lib64/libnl-route-3.so.200...Reading symbols from /lib64/libnl-route-3.so.200...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libnl-route-3.so.200
Reading symbols from /lib64/libnl-3.so.200...Reading symbols from /lib64/libnl-3.so.200...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libnl-3.so.200
Reading symbols from /lib64/libcelt051.so.0...Reading symbols from /lib64/libcelt051.so.0...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libcelt051.so.0
Reading symbols from /lib64/libjpeg.so.62...Reading symbols from /lib64/libjpeg.so.62...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libjpeg.so.62
Reading symbols from /lib64/libgpg-error.so.0...Reading symbols from /lib64/libgpg-error.so.0...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libgpg-error.so.0
Reading symbols from /lib64/libkrb5support.so.0...Reading symbols from /lib64/libkrb5support.so.0...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libkrb5support.so.0
Reading symbols from /lib64/libkeyutils.so.1...Reading symbols from /lib64/libkeyutils.so.1...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libkeyutils.so.1
Reading symbols from /lib64/libattr.so.1...Reading symbols from /lib64/libattr.so.1...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libattr.so.1
Reading symbols from /lib64/libX11-xcb.so.1...Reading symbols from /lib64/libX11-xcb.so.1...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libX11-xcb.so.1
Reading symbols from /lib64/libX11.so.6...Reading symbols from /lib64/libX11.so.6...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libX11.so.6
Reading symbols from /lib64/libxcb.so.1...Reading symbols from /lib64/libxcb.so.1...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libxcb.so.1
Reading symbols from /lib64/libICE.so.6...Reading symbols from /lib64/libICE.so.6...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libICE.so.6
Reading symbols from /lib64/libSM.so.6...Reading symbols from /lib64/libSM.so.6...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libSM.so.6
Reading symbols from /lib64/libXtst.so.6...Reading symbols from /lib64/libXtst.so.6...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libXtst.so.6
Reading symbols from /lib64/libsystemd.so.0...Reading symbols from /lib64/libsystemd.so.0...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libsystemd.so.0
Reading symbols from /lib64/libwrap.so.0...Reading symbols from /lib64/libwrap.so.0...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libwrap.so.0
Reading symbols from /lib64/libsndfile.so.1...Reading symbols from /lib64/libsndfile.so.1...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libsndfile.so.1
Reading symbols from /lib64/libasyncns.so.0...Reading symbols from /lib64/libasyncns.so.0...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libasyncns.so.0
Reading symbols from /lib64/libfreebl3.so...Reading symbols from /lib64/libfreebl3.so...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libfreebl3.so
Reading symbols from /lib64/libffi.so.6...Reading symbols from /lib64/libffi.so.6...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libffi.so.6
Reading symbols from /lib64/libselinux.so.1...Reading symbols from /lib64/libselinux.so.1...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libselinux.so.1
Reading symbols from /lib64/libXau.so.6...Reading symbols from /lib64/libXau.so.6...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libXau.so.6
Reading symbols from /lib64/libXext.so.6...Reading symbols from /lib64/libXext.so.6...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libXext.so.6
Reading symbols from /lib64/libXi.so.6...Reading symbols from /lib64/libXi.so.6...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libXi.so.6
Reading symbols from /lib64/liblzma.so.5...Reading symbols from /lib64/liblzma.so.5...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/liblzma.so.5
Reading symbols from /lib64/libdw.so.1...Reading symbols from /lib64/libdw.so.1...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libdw.so.1
Reading symbols from /lib64/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /lib64/libgsm.so.1...Reading symbols from /lib64/libgsm.so.1...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libgsm.so.1
Reading symbols from /lib64/libFLAC.so.8...Reading symbols from /lib64/libFLAC.so.8...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libFLAC.so.8
Reading symbols from /lib64/libvorbisenc.so.2...Reading symbols from /lib64/libvorbisenc.so.2...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libvorbisenc.so.2
Reading symbols from /lib64/libvorbis.so.0...Reading symbols from /lib64/libvorbis.so.0...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libvorbis.so.0
Reading symbols from /lib64/libogg.so.0...Reading symbols from /lib64/libogg.so.0...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libogg.so.0
Reading symbols from /lib64/libpcre.so.1...Reading symbols from /lib64/libpcre.so.1...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libpcre.so.1
Reading symbols from /lib64/libelf.so.1...Reading symbols from /lib64/libelf.so.1...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libelf.so.1
Reading symbols from /lib64/libbz2.so.1...Reading symbols from /lib64/libbz2.so.1...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libbz2.so.1
Reading symbols from /usr/lib64/sasl2/libplain.so...Reading symbols from /usr/lib64/sasl2/libplain.so...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/sasl2/libplain.so
Reading symbols from /usr/lib64/sasl2/libsasldb.so...Reading symbols from /usr/lib64/sasl2/libsasldb.so...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/sasl2/libsasldb.so
Reading symbols from /lib64/libdb-5.3.so...Reading symbols from /lib64/libdb-5.3.so...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libdb-5.3.so
Reading symbols from /usr/lib64/sasl2/liblogin.so...Reading symbols from /usr/lib64/sasl2/liblogin.so...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/sasl2/liblogin.so
Reading symbols from /usr/lib64/sasl2/libdigestmd5.so...Reading symbols from /usr/lib64/sasl2/libdigestmd5.so...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/sasl2/libdigestmd5.so
Reading symbols from /usr/lib64/sasl2/libanonymous.so...Reading symbols from /usr/lib64/sasl2/libanonymous.so...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/sasl2/libanonymous.so
Reading symbols from /usr/lib64/sasl2/libcrammd5.so...Reading symbols from /usr/lib64/sasl2/libcrammd5.so...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/sasl2/libcrammd5.so
0x00007f1319ba2c3f in ppoll () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install qemu-kvm-rhev-2.3.0-31.el7.x86_64
(gdb) thread apply all bt full

Thread 22 (Thread 0x7f131134c700 (LWP 30619)):
#0  0x00007f1319ba7949 in syscall () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007f13216ff432 in qemu_event_wait ()
No symbol table info available.
#2  0x00007f132170d6e6 in call_rcu_thread ()
No symbol table info available.
#3  0x00007f131ffc3dc5 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4  0x00007f1319bad1cd in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 21 (Thread 0x7f130fb63700 (LWP 30622)):
#0  0x00007f1319ba4257 in ioctl () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007f13214ca075 in kvm_vcpu_ioctl ()
No symbol table info available.
#2  0x00007f13214ca12e in kvm_cpu_exec ()
No symbol table info available.
#3  0x00007f13214b7dca in qemu_kvm_cpu_thread_fn ()
No symbol table info available.
#4  0x00007f131ffc3dc5 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#5  0x00007f1319bad1cd in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 20 (Thread 0x7f12c81ff700 (LWP 30625)):
#0  0x00007f1319ba2b7d in poll () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007f131aafa1e7 in red_worker_main () from /lib64/libspice-server.so.1
No symbol table info available.
#2  0x00007f131ffc3dc5 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#3  0x00007f1319bad1cd in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 19 (Thread 0x7f12c71ff700 (LWP 3810)):
#0  0x00007f131ffc9870 in sem_timedwait () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00007f13216ff287 in qemu_sem_timedwait ()
No symbol table info available.
#2  0x00007f1321682a6c in worker_thread ()
No symbol table info available.
#3  0x00007f131ffc3dc5 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4  0x00007f1319bad1cd in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 18 (Thread 0x7f12c31f7700 (LWP 3816)):
#0  0x00007f131ffc9870 in sem_timedwait () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00007f13216ff287 in qemu_sem_timedwait ()
No symbol table info available.
#2  0x00007f1321682a6c in worker_thread ()
No symbol table info available.
#3  0x00007f131ffc3dc5 in start_thread () from /lib64/libpthread.so.0
---Type <return> to continue, or q <return> to quit---

Comment 33 Kevin Alon Goldblatt 2015-11-10 15:09:53 UTC
Created attachment 1092296 [details]
engine, vdsm, server logs

Added logs

Comment 34 Francesco Romani 2015-11-11 13:42:41 UTC
Thanks for the updates.

I see some thread leaking from the VDSM logs. This is a proof that you are hitting https://bugzilla.redhat.com/show_bug.cgi?id=1250839 and then you need patch https://gerrit.ovirt.org/#/c/48348/ - this can makes things worse if storage is already blocked.

But let me stress that bz1250839 is triggered by *blocked* storage, so I don't think patch 48348 fixes this bug. I think there is a common yet unknown cause
that triggers both this bug *and* the leak.

The evidence points in the lower levels of the stack: maybe libvirt, maybe qemu itself. The error message in turn points toward QEMU, so I cloned this issue
to QEMU on https://bugzilla.redhat.com/show_bug.cgi?id=1279777

As per https://bugzilla.redhat.com/show_bug.cgi?id=1261980#c31 , until (and if) new evidence comes out, I keep this bug open only to consume the fix -if any, on the lower level of the stack.

Please adopt bz1279777 (or move it to the appropriate persone) to continue the investigation on the low level and to learn why QEMU is unresponsive.


Note You need to log in before you can comment on or make changes to this bug.