Bug 1607130 - Live merge fails with "ret = libvirtmod.virDomainBlockCommit(self._o, disk, base, top, bandwidth, flags)"
Summary: Live merge fails with "ret = libvirtmod.virDomainBlockCommit(self._o, disk, base, top, bandwidth, flags)"
Keywords:
Status: CLOSED DUPLICATE of bug 1601212
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.5
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: pre-dev-freeze
Target Release: ---
Assignee: Libvirt Maintainers
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 1583424
 
Reported: 2018-07-22 09:04 UTC by Elad
Modified: 2019-01-21 15:27 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-08-30 17:08:13 UTC
Target Upstream Version:
Embargoed:


Attachments
logs (3.49 MB, application/x-gzip), 2018-07-22 09:04 UTC, Elad
Failure to start other VMs - vdsm.log (7.06 MB, text/plain), 2018-07-22 09:13 UTC, Elad
logs with libvirtd.log (4.36 MB, application/x-gzip), 2018-08-08 22:09 UTC, Elad


Links
Red Hat Knowledge Base (Solution) 3625291, last updated 2018-09-26 04:41:57 UTC

Description Elad 2018-07-22 09:04:10 UTC
Created attachment 1469745 [details]
logs

Description of problem:
Second live merge operation failed on vdsm

Version-Release number of selected component (if applicable):
vdsm-4.20.34-1.el7ev.x86_64
libvirt-3.9.0-14.el7_5.6.x86_64
qemu-kvm-rhev-2.10.0-21.el7_5.4.x86_64
kernel 3.10.0-862.9.1.el7.x86_64
rhvm-4.2.5.1-0.1.el7ev.noarch

How reproducible:
Happened once

Steps to Reproduce:
Occurred after the following scenario:
1. Create a VM with 1 thin-provisioned disk attached on iSCSI and install RHEL 7.5
2. Create and attach to the VM 2 disks residing on iSCSI, one thin and one preallocated
3. Within the guest, create a FS and write some content to the disks
4. Create a snapshot - snap1
5. Create a second snapshot - snap2
6. Live merge snap1 - succeeds
7. Create a third snapshot - snap3
8. Live merge snap2


Actual results:
Live merge fails on vdsm:

2018-07-22 11:38:30,455+0300 INFO  (jsonrpc/6) [api.host] FINISH getAllVmIoTunePolicies return={'status': {'message': 'Done', 'code': 0}, 'io_tune_policies_dict': {'460b64cd-19c2-435b-b231-f475d39ad062': {'policy': [], 'current_values': [{'ioTune': {'write_bytes_sec': 0L
, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path': '/rhev/data-center/mnt/blockSD/e0443a0d-bdb9-4590-a5e5-c54a3c9712ee/images/4c569de9-7376-4126-969e-477c960577be/1b1aef19-bc7c-4c0a-8ae0-0dd95e3a4ec4',
 'name': 'sda'}]}, '9ff4a31f-0311-463a-99ea-d55febab4abf': {'policy': [], 'current_values':
 [{'ioTune': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path': '/rhev/data-center/mnt/blockSD/e0443a0d-bdb9-4590-a5e5-c54a3c9712ee/images/9bc02e2f-28bf-4115-9853-d9c96bf891c3/de3a309d-6ae8-4594-8c0f-7c467f450b19', 'name': 'sda'}, {'ioTune': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path': '/rhev/data-center/mnt/blockSD/e0443a0d-bdb9-4590-a5e5-c54a3c9712ee/images/c8ef0ae4-3aab-4452-917f-86e8de535dce/bf7f72cc-0cf5-4d9f-93cd-1b8b809a2ab0', 'name': 'sdb'}, {'ioTune': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path': '/rhev/data-center/mnt/blockSD/fd1659df-b9fb-4c43-b264-edf3d54689d4/images/ad7fd9c4-c514-4d56-8414-3fca938b2d7a/d27dad12-ba66-4d20-ab7c-7294d3765cd4', 'name': 'sdc'}]}}} from=::1,47950 (api:52)
2018-07-22 11:38:30,455+0300 INFO  (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmIoTunePolicies succeeded in 0.01 seconds (__init__:573)
2018-07-22 11:38:31,051+0300 WARN  (vdsm.Scheduler) [Executor] Worker blocked: <Worker name=jsonrpc/5 running <Task <JsonRpcTask {'params': {'topVolUUID': 'b5564fdd-be28-48c1-80e5-7bdb239b6928', 'vmID': '9ff4a31f-0311-463a-99ea-d55febab4abf', 'drive': {'imageID': '9bc02e2f-28bf-4115-9853-d9c96bf891c3', 'volumeID': 'de3a309d-6ae8-4594-8c0f-7c467f450b19', 'domainID': 'e0443a0d-bdb9-4590-a5e5-c54a3c9712ee', 'poolID': '62fa9c06-8969-11e8-9e8c-001a4a168bf8'}, 'bandwidth': '0', 'jobUUID': '830e37b4-0135-4beb-9cf3-ff04ab3c81b4', 'baseVolUUID': 'df05835d-f92f-4e55-9cf0-e630108c76ed'}, 'jsonrpc': '2.0', 'method': 'VM.merge', 'id': '093b974f-2de2-4482-91f3-b132496addca'} at 0x7ff3defe7690> timeout=60, duration=600 at 0x7ff3defe7f90> task#=80371 at 0x7ff3dc52fc90>, traceback:
File: "/usr/lib64/python2.7/threading.py", line 785, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
  self.run()
File: "/usr/lib64/python2.7/threading.py", line 765, in run
  self.__target(*self.__args, **self.__kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 194, in run
  ret = func(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run
  self._execute_task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task
  task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__
  self._callable()
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 523, in __call__
  self._handler(self._ctx, self._req)
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 566, in _serveRequest
  response = self._handle_request(req, ctx)
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 606, in _handle_request
  res = method(**params)
File: "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 201, in _dynamicMethod
  result = fn(*methodArgs)
File: "<string>", line 2, in merge
File: "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
  ret = func(*args, **kwargs)
File: "<string>", line 2, in merge
File: "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 122, in method
  ret = func(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/API.py", line 736, in merge
  drive, baseVolUUID, topVolUUID, bandwidth, jobUUID)
File: "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 5976, in merge
  bandwidth, flags)
File: "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98, in f
  ret = attr(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
  ret = f(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
  return func(inst, *args, **kwargs)
File: "/usr/lib64/python2.7/site-packages/libvirt.py", line 707, in blockCommit
  ret = libvirtmod.virDomainBlockCommit(self._o, disk, base, top, bandwidth, flags) (executor:363)

Expected results:
Live merge should succeed.


Additional info:

Comment 1 Elad 2018-07-22 09:11:58 UTC
After this failure, the VM enters the 'Not Responding' state, and starting other VMs on the same host results in those VMs entering 'Not Responding' as well.


2018-07-22 12:11:31,130+0300 INFO  (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:573)
2018-07-22 12:11:31,133+0300 WARN  (vdsm.Scheduler) [Executor] Worker blocked: <Worker name=jsonrpc/1 running <Task <JsonRpcTask {'params': {'topVolUUID': '939efebc-7dbc-49a6-bfb3-0ade9a9857b6', 'vmID': '9ff4a31f-0311-463a-99ea-d55febab4abf', 'drive': {'imageID': 'c8ef0a
e4-3aab-4452-917f-86e8de535dce', 'volumeID': 'bf7f72cc-0cf5-4d9f-93cd-1b8b809a2ab0', 'domainID': 'e0443a0d-bdb9-4590-a5e5-c54a3c9712ee', 'poolID': '62fa9c06-8969-11e8-9e8c-001a4a168bf8'}, 'bandwidth': '0', 'jobUUID': 'fb8db2b5-5741-4c41-ba22-6101f99fe930', 'baseVolUUID':
 '8f5a668e-0059-4ef4-9669-ebebdf03c6aa'}, 'jsonrpc': '2.0', 'method': 'VM.merge', 'id': 'e66cf155-acb5-43a1-a259-6a1927fcc05d'} at 0x7ff3dee026d0> timeout=60, duration=2580 at 0x7ff3dee02410> task#=80354 at 0x7ff4100b4f10>, traceback:
File: "/usr/lib64/python2.7/threading.py", line 785, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
  self.run()
File: "/usr/lib64/python2.7/threading.py", line 765, in run
  self.__target(*self.__args, **self.__kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 194, in run
  ret = func(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run
  self._execute_task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task
  task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__
  self._callable()
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 523, in __call__
  self._handler(self._ctx, self._req)
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 566, in _serveRequest
  response = self._handle_request(req, ctx)
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 606, in _handle_request
  res = method(**params)
File: "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 201, in _dynamicMethod
  result = fn(*methodArgs)
File: "<string>", line 2, in merge
File: "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
  ret = func(*args, **kwargs)
File: "<string>", line 2, in merge
File: "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 122, in method
  ret = func(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/API.py", line 736, in merge
  drive, baseVolUUID, topVolUUID, bandwidth, jobUUID)
File: "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 5959, in merge
  with self._jobsLock:
File: "/usr/lib/python2.7/site-packages/pthreading.py", line 60, in __enter__
  self.acquire()
File: "/usr/lib/python2.7/site-packages/pthreading.py", line 68, in acquire
  rc = self.lock() if blocking else self.trylock()
File: "/usr/lib/python2.7/site-packages/pthread.py", line 96, in lock
  return _libpthread.pthread_mutex_lock(self._mutex) (executor:363)

Comment 2 Elad 2018-07-22 09:13:49 UTC
Created attachment 1469746 [details]
Failure to start other VMs - vdsm.log

Comment 3 Eyal Shenitzky 2018-08-07 05:16:22 UTC
Elad, can you please attach the libvirt and QEMU logs?

Comment 4 Eyal Shenitzky 2018-08-07 05:17:51 UTC
Peter,

I suspect that there is a libvirt bug that causes this error; it could be a duplicate of bug 1601212.

Can you please take a look at the libvirtd.log after Elad adds it?

Comment 5 Elad 2018-08-07 06:44:49 UTC
Eyal, the attached logs are all we have from the time the bug happened.

Comment 6 Peter Krempa 2018-08-08 11:37:46 UTC
(In reply to Eyal Shenitzky from comment #4)
> Peter,
> 
> I suspect that there is a Libvirt bug that causes this error, could be a
> duplication of bug 1601212.

From the VDSM log it looks like the thread calling the 'libvirtmod.virDomainBlockCommit' operation got stuck:

2018-07-22 11:38:31,051+0300 WARN  (vdsm.Scheduler) [Executor] Worker blocked: <Worker name=jsonrpc/5 running <Task <JsonRpcTask {'params': {'topVolUUID': 'b5564fdd-be28-48c1-80e5-7bdb239b6928', 'vmID': '9ff4a31f-0311-463a-99ea-d55febab4abf', 'drive': {'imageID': '9bc02e2f-28bf-4115-9853-d9c96bf891c3', 'volumeID': 'de3a309d-6ae8-4594-8c0f-7c467f450b19', 'domainID': 'e0443a0d-bdb9-4590-a5e5-c54a3c9712ee', 'poolID': '62fa9c06-8969-11e8-9e8c-001a4a168bf8'}, 'bandwidth': '0', 'jobUUID': '830e37b4-0135-4beb-9cf3-ff04ab3c81b4', 'baseVolUUID': 'df05835d-f92f-4e55-9cf0-e630108c76ed'}, 'jsonrpc': '2.0', 'method': 'VM.merge', 'id': '093b974f-2de2-4482-91f3-b132496addca'} at 0x7ff3defe7690> timeout=60, duration=600 at 0x7ff3defe7f90> task#=80371 at 0x7ff3dc52fc90>, traceback:

[SNIP]

File: "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
  return func(inst, *args, **kwargs)
File: "/usr/lib64/python2.7/site-packages/libvirt.py", line 707, in blockCommit
  ret = libvirtmod.virDomainBlockCommit(self._o, disk, base, top, bandwidth, flags) (executor:363)

It is very probable that it was caused by the stuck qemu process from the mentioned BZ.
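For context, here is a minimal libvirt-python sketch of the call the traceback ends in; the domain name and volume paths are placeholders, not values from this environment:

# Illustrative sketch only -- not vdsm code; domain name and paths are placeholders.
import libvirt

conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('example-vm')    # hypothetical domain name

# virDomain.blockCommit() wraps virDomainBlockCommit(): libvirtd sends the QMP
# 'block-commit' command and the API returns only once qemu replies and the
# block job has been started. If qemu never replies, this call never returns
# and the vdsm worker thread issuing it stays blocked.
dom.blockCommit('sda',                   # disk target
                '/path/to/base.img',     # base volume of the commit (placeholder)
                '/path/to/top.img',      # top volume being merged down (placeholder)
                0,                       # bandwidth, 0 = unlimited
                0)                       # flags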

> Can you please take a look at the libvirtd.log after Elad will add it?

Unfortunately, the libvirtd log in this BZ does not contain debug information, so I can't draw any further conclusions.

Comment 7 Eyal Shenitzky 2018-08-08 12:33:10 UTC
Ademar, 
Can you please take a look?

Comment 8 Ademar Reis 2018-08-08 14:51:54 UTC
(In reply to Eyal Shenitzky from comment #7)
> Ademar, 
> Can you please take a look?

The current information doesn't ring any bells. Please clone this BZ or transfer it to the QEMU component so it can be prioritized. Thanks.

Comment 9 Elad 2018-08-08 22:09:12 UTC
Created attachment 1474544 [details]
logs with libvirtd.log

Encountered once again, this time with the libvirt log.


vdsm.log:


2018-08-08 14:12:49,380+0300 WARN  (vdsm.Scheduler) [Executor] Worker blocked: <Worker name=jsonrpc/6 running <Task <JsonRpcTask {'params': {'topVolUUID': '44c2ce74-72c5-42aa-8c2c-12d902b8d975', 'vmID': '4e1fb14
d-ebcc-4792-a6bc-ddfd54832e38', 'drive': {'imageID': 'c41bc042-c19c-41a0-bc9c-ed58404e461d', 'volumeID': '5663c639-0379-4556-9034-4730194c5670', 'domainID': 'e741013d-9c36-4ea7-9d48-d487a71337f5', 'poolID': '9f3
31444-276b-4f73-b936-b3443a05f9dd'}, 'bandwidth': '0', 'jobUUID': 'a892d328-ed20-48ba-a94f-5b891fc19145', 'baseVolUUID': 'c47d9574-a9dd-4b7d-9780-db31a0bd802a'}, 'jsonrpc': '2.0', 'method': 'VM.merge', 'id': '42
c460ca-521e-4c29-8043-faf087dca39f'} at 0x7f66decc9f10> timeout=60, duration=60 at 0x7f6724e857d0> task#=179 at 0x7f66ec0bf710>, traceback:
File: "/usr/lib64/python2.7/threading.py", line 785, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
  self.run()
File: "/usr/lib64/python2.7/threading.py", line 765, in run
  self.__target(*self.__args, **self.__kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 194, in run
  ret = func(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run
  self._execute_task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task
  task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__
  self._callable()
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 523, in __call__
  self._handler(self._ctx, self._req)
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 566, in _serveRequest
  response = self._handle_request(req, ctx)
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 606, in _handle_request
  res = method(**params)
File: "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 201, in _dynamicMethod
  result = fn(*methodArgs)
File: "<string>", line 2, in merge
File: "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
  ret = func(*args, **kwargs)
File: "<string>", line 2, in merge
File: "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 122, in method
  ret = func(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/API.py", line 738, in merge
  drive, baseVolUUID, topVolUUID, bandwidth, jobUUID)
File: "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 5976, in merge
  bandwidth, flags)
File: "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98, in f
  ret = attr(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
  ret = f(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
  return func(inst, *args, **kwargs)
File: "/usr/lib64/python2.7/site-packages/libvirt.py", line 707, in blockCommit
  ret = libvirtmod.virDomainBlockCommit(self._o, disk, base, top, bandwidth, flags) (executor:363)





One minute earlier in libvirtd.log (there is a 3-hour time difference between libvirtd.log and vdsm.log):

2018-08-08 11:12:22.972+0000: 4516: error : qemuDomainObjBeginJobInternal:4721 : Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainBlockCommit)

And it keeps repeating every minute for the next few minutes:


2018-08-08 11:14:07.979+0000: 4517: error : qemuDomainObjBeginJobInternal:4721 : Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainBlockCommit)

Last time:

2018-08-08 11:18:52.977+0000: 4517: error : qemuDomainObjBeginJobInternal:4721 : Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainBlockCommit)

Comment 11 Ademar Reis 2018-08-15 14:36:33 UTC
(In reply to Elad from comment #9)
> Created attachment 1474544 [details]
> logs with libvirtd.log
> 
> Encountered once again, this time with libvirt log.
> 
> 
> vdsm.log:
> 
> 
> 2018-08-08 14:12:49,380+0300 WARN  (vdsm.Scheduler) [Executor] Worker
> blocked: <Worker name=jsonrpc/6 running <Task <JsonRpcTask {'params':
> {'topVolUUID': '44c2ce74-72c5-42aa-8c2c-12d902b8d975', 'vmID': '4e1fb14
> d-ebcc-4792-a6bc-ddfd54832e38', 'drive': {'imageID':
> 'c41bc042-c19c-41a0-bc9c-ed58404e461d', 'volumeID':
> '5663c639-0379-4556-9034-4730194c5670', 'domainID':
> 'e741013d-9c36-4ea7-9d48-d487a71337f5', 'poolID': '9f3
> 31444-276b-4f73-b936-b3443a05f9dd'}, 'bandwidth': '0', 'jobUUID':
> 'a892d328-ed20-48ba-a94f-5b891fc19145', 'baseVolUUID':
> 'c47d9574-a9dd-4b7d-9780-db31a0bd802a'}, 'jsonrpc': '2.0', 'method':
> 'VM.merge', 'id': '42
> c460ca-521e-4c29-8043-faf087dca39f'} at 0x7f66decc9f10> timeout=60,
> duration=60 at 0x7f6724e857d0> task#=179 at 0x7f66ec0bf710>, traceback:
> File: "/usr/lib64/python2.7/threading.py", line 785, in __bootstrap
>   self.__bootstrap_inner()
> File: "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
>   self.run()
> File: "/usr/lib64/python2.7/threading.py", line 765, in run
>   self.__target(*self.__args, **self.__kwargs)
> File: "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line
> 194, in run
>   ret = func(*args, **kwargs)
> File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run
>   self._execute_task()
> File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in
> _execute_task
>   task()
> File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in
> __call__
>   self._callable()
> File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 523, in
> __call__
>   self._handler(self._ctx, self._req)
> File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 566, in
> _serveRequest
>   response = self._handle_request(req, ctx)
> File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 606, in
> _handle_request
>   res = method(**params)
> File: "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 201, in
> _dynamicMethod
>   result = fn(*methodArgs)
> File: "<string>", line 2, in merge
> File: "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in
> method
>   ret = func(*args, **kwargs)
> File: "<string>", line 2, in merge
> File: "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 122, in
> method
>   ret = func(*args, **kwargs)
> File: "/usr/lib/python2.7/site-packages/vdsm/API.py", line 738, in merge
>   drive, baseVolUUID, topVolUUID, bandwidth, jobUUID)
> File: "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 5976, in merge
>   bandwidth, flags)
> File: "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98, in
> f
>   ret = attr(*args, **kwargs)
> File: "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py",
> line 130, in wrapper
>   ret = f(*args, **kwargs)
> File: "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92,
> in wrapper
>   return func(inst, *args, **kwargs)
> File: "/usr/lib64/python2.7/site-packages/libvirt.py", line 707, in
> blockCommit
>   ret = libvirtmod.virDomainBlockCommit(self._o, disk, base, top, bandwidth,
> flags) (executor:363)
> 
> 
> One minute earlier in libvirtd.log (there is a 3 hours time difference
> between libvird.log and vdsm.log):
> 
> 2018-08-08 11:12:22.972+0000: 4516: error :
> qemuDomainObjBeginJobInternal:4721 : Timed out during operation: cannot
> acquire state change lock (held by remoteDispatchDomainBlockCommit)
> 
> And keeps for the next few minutes (every 1 minute):
> 
> 
> 2018-08-08 11:14:07.979+0000: 4517: error :
> qemuDomainObjBeginJobInternal:4721 : Timed out during operation: cannot
> acquire state change lock (held by remoteDispatchDomainBlockCommit)
> 
> Last time:
> 
> 2018-08-08 11:18:52.977+0000: 4517: error :
> qemuDomainObjBeginJobInternal:4721 : Timed out during operation: cannot
> acquire state change lock (held by remoteDispatchDomainBlockCommit)


Peter, does it ring any bells? Can you tell what it is that libvirt is waiting for here?

Comment 12 Peter Krempa 2018-08-29 11:09:19 UTC
As said above, this is most probably a duplicate of bug 1601212, but there was not enough information to be 100% sure of this conclusion.

Basically, what is happening is that the block-commit call gets stuck in qemu, which is why the libvirt API does not return.
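For illustration, a rough libvirt-python sketch (placeholder names and paths) of how such an intermediate block commit normally runs to completion when qemu responds; in this bug the blockCommit() call itself never returns:

# Illustrative sketch only; domain name, disk target and paths are placeholders.
import time
import libvirt

conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('example-vm')    # hypothetical domain name
disk = 'sda'

# Start the commit of the top volume into its base; this returns once qemu accepts the command.
dom.blockCommit(disk, '/path/to/base.img', '/path/to/top.img', 0, 0)

# Track the resulting block job; an intermediate commit completes on its own,
# after which blockJobInfo() reports no active job for the disk.
while dom.blockJobInfo(disk, 0):
    time.sleep(1)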

Comment 13 Jaroslav Suchanek 2018-08-29 13:49:34 UTC
Elad, can you please provide libvirt debug logs as indicated in comment 6? Thanks.
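For reference, a sketch of how libvirtd debug logging is typically enabled on a host of this vintage; the filter list below is only a suggestion, adjust as needed. In /etc/libvirt/libvirtd.conf:

# Suggested example settings for capturing libvirtd debug logs
log_filters="1:qemu 1:libvirt 3:object 3:json 3:event 1:util"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"

followed by a restart of libvirtd (systemctl restart libvirtd) and another reproduction attempt.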

Comment 14 Elad 2018-08-29 14:13:20 UTC
See comment #9

Comment 15 Peter Krempa 2018-08-30 17:08:13 UTC
So in the log there is one VM with monitor object 0x7f057801b450 which issued the following qmp command:

{"execute":"block-commit","arguments":{"device":"drive-ua-c41bc042-c19c-41a0-bc9c-ed58404e461d","top":"/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge6__nfs__0/e741013d-9c36-4ea7-9d48-d487a71337f5/images/c41bc042-c19c-41a0-bc9c-ed58404e461d/44c2ce74-72c5-42aa-8c2c-12d902b8d975","base":"/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge6__nfs__0/e741013d-9c36-4ea7-9d48-d487a71337f5/images/c41bc042-c19c-41a0-bc9c-ed58404e461d/c47d9574-a9dd-4b7d-9780-db31a0bd802a","backing-file":"c47d9574-a9dd-4b7d-9780-db31a0bd802a"},"id":"libvirt-727"}

at timestamp 2018-08-08 11:11:49.757

And it never got any response after that. The whole life of the VM is captured in the log.

The log ends with timestamp: 

2018-08-08 12:03:38.796+0000

So it looks like the same thing as pointed out in comment 5.

*** This bug has been marked as a duplicate of bug 1601212 ***

