Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1636758

Summary: PrepareImageVDSCommand fails - Unable to update the image info - VDSM (vdsm.Scheduler) [Executor] Worker blocked
Product: [oVirt] ovirt-engine
Component: BLL.Storage
Version: 4.2.6.2
Status: CLOSED INSUFFICIENT_DATA
Severity: medium
Priority: unspecified
Reporter: Avihai <aefrat>
Assignee: Tal Nisan <tnisan>
QA Contact: Lucie Leistnerova <lleistne>
Docs Contact:
CC: aefrat, bugs, eshames, lsvaty, mperina, nsoffer, pkliczew, reliezer, tnisan
Keywords: Automation
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-01-14 09:06:47 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
- engine and vdsm logs (flags: none)
- Full engine logs for several TCs with the failure (flags: none)

Description Avihai 2018-10-07 12:42:16 UTC
Created attachment 1491322 [details]
engine and vdsm logs

Description of problem:

Occurred on several tests during automation Tier 3 on HE: after creating a VM and starting it, I add disks to the VM. Sometimes the issue is seen in the "disks_create" method and sometimes in "vms_create".

The failure is "Unable to update the image info".
I checked both the host status and the storage domain status, and both were 'up'.

Engine:
2018-09-21 00:17:18,383+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PrepareImageVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-80) [disks_create_0f2d16c8-5eec-4d82] Failed in 'PrepareImageVDS' method, for vds: 'host_mixed_1'; host: 'lynx14.lab.eng.tlv2.redhat.com': null
2018-09-21 00:17:18,383+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PrepareImageVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-80) [disks_create_0f2d16c8-5eec-4d82] Command 'PrepareImageVDSCommand(HostName = host_mixed_1, PrepareImageVDSCommandParameters:{hostId='309aaa0d-4eda-4a52-b64a-adc9394891d4'})' execution failed: null
2018-09-21 00:17:18,383+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.PrepareImageVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-80) [disks_create_0f2d16c8-5eec-4d82] FINISH, PrepareImageVDSCommand, log id: 1ac3b01c
2018-09-21 00:17:18,383+03 ERROR [org.ovirt.engine.core.bll.storage.disk.image.AddImageFromScratchCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-80) [disks_create_0f2d16c8-5eec-4d82] Unable to update the image info for image '9c382740-baa0-4383-9aca-886fd3f0347e' (image group: '6bf68208-3c93-4e1e-b458-8d4bc5366bb3') on domain 'a133c0a9-cb59-4e06-b777-6a10e9ff189d'

VDSM:
2018-09-21 00:18:18,967+0300 WARN  (vdsm.Scheduler) [Executor] Worker blocked: <Worker name=jsonrpc/6 running <Task <JsonRpcTask {'params': {'allowIllegal': True, 'storagepoolID': '9855db9c-bc2a-11e8-8310-001a4a168bf8', 'imageID': '1054f133-c558-4216-b9ab-0ca4c87e1adf', 'volumeID': '164acf8d-a5c1-4682-aaed-f4c8e34c719b', 'storagedomainID': 'a133c0a9-cb59-4e06-b777-6a10e9ff189d'}, 'jsonrpc': '2.0', 'method': 'Image.prepare', 'id': '834ff44a-c37f-416a-9d8d-a8cbb9949807'} at 0x7fd3b403ef90> timeout=60, duration=600 at 0x7fd3b403e650> task#=9293 at 0x7fd3b4186e50>, traceback:
File: "/usr/lib64/python2.7/threading.py", line 785, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
  self.run()
File: "/usr/lib64/python2.7/threading.py", line 765, in run
  self.__target(*self.__args, **self.__kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 194, in run
  ret = func(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run
  self._execute_task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task
  task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__
  self._callable()
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 523, in __call__
  self._handler(self._ctx, self._req)
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 566, in _serveRequest
  response = self._handle_request(req, ctx)
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 606, in _handle_request
  res = method(**params)
File: "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 201, in _dynamicMethod
  result = fn(*methodArgs)
File: "/usr/lib/python2.7/site-packages/vdsm/API.py", line 914, in prepare
  allowIllegal=allowIllegal)
File: "/usr/lib/python2.7/site-packages/vdsm/storage/dispatcher.py", line 73, in wrapper
  result = ctask.prepare(func, *args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 108, in wrapper
  return m(self, *a, **kw)
File: "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 1179, in prepare
  result = self._run(func, *args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
  return fn(*args, **kargs)
File: "<string>", line 2, in prepareImage
File: "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
  ret = func(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 3173, in prepareImage
  dom = sdCache.produce(sdUUID)
File: "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 110, in produce
  domain.getRealDomain()
File: "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 51, in getRealDomain
  return self._cache._realProduce(self._sdUUID)
File: "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 125, in _realProduce
  self._syncroot.wait()
File: "/usr/lib/python2.7/site-packages/pthreading.py", line 127, in wait
  return self.__cond.wait()
File: "/usr/lib/python2.7/site-packages/pthread.py", line 131, in wait
  return _libpthread.pthread_cond_wait(self._cond, m.mutex()) (executor:363)
2018-09-21 00:18:19,227+0300 DEBUG (periodic/12) [storage.TaskManager.Task] (Task='d90682e5-e52f-4730-a7f5-d95e6eb93c03') moving from state init -> state preparing (task:602)
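The frame at the bottom of the traceback is the key: the jsonrpc worker is parked in an untimed Condition.wait() inside the storage domain cache, so nothing bounds how long it can block, and the executor's watchdog can only report it as blocked. A toy reproduction of that pattern (the names and the scaled-down budget here are illustrative, not vdsm code):

```python
import threading

# The worker ends up waiting on a condition that nobody notifies -- the
# same shape as _realProduce() waiting on self._syncroot in sdc.py.
cond = threading.Condition()
blocked = threading.Event()
WORKER_TIMEOUT = 0.2  # the real executor budget is timeout=60 seconds

def worker():
    with cond:
        blocked.set()
        cond.wait()  # no timeout: blocks until someone calls notify()

t = threading.Thread(target=worker, daemon=True)
t.start()
blocked.wait()

# A watchdog (like vdsm.Scheduler) sees the worker exceed its budget:
t.join(WORKER_TIMEOUT)
print("worker blocked past its budget:", t.is_alive())

# Only when the wait is released (e.g. the domain refresh completing
# and notifying the condition) does the worker finish.
with cond:
    cond.notify()
t.join(1)
print("worker finished:", not t.is_alive())
```

This is why the warning repeats with growing `duration=` values: the worker is not crashed, just waiting indefinitely for a condition that the stuck storage operation never signals.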


Version-Release number of selected component (if applicable):
ovirt-engine-4.2.7-0.1.el7ev.noarch
vdsm-4.20.40-1.el7ev.x86_64
Red Hat Enterprise Linux Server 7.6 Beta (Maipo)


How reproducible:
Reproduced only once, across several TCs; did not see it in other runs done locally.

Steps to Reproduce:
1. Create a VM from a template
2. Start the VM
3. Add 4 disks to the VM

Actual results:
PrepareImageVDSCommand fails -  Unable to update the image info

Expected results:
PrepareImageVDSCommand succeeds and the image info is updated.

Additional info:
These are all the times it failed across these multiple tests, sometimes in the vms_create method and sometimes in disks_create:

> zcat engine.log-20180921.gz | grep "Unable to update" -B 2                                                                
2018-09-21 00:06:00,215+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PrepareImageVDSCommand] (EE-ManagedThreadFactory-engine-Thread-39587) [vms_syncAction_4ce8918e-ef3e-45eb] Command 'PrepareImageVDSCommand(HostName = host_mixed_1, PrepareImageVDSCommandParameters:{hostId='309aaa0d-4eda-4a52-b64a-adc9394891d4'})' execution failed: null
2018-09-21 00:06:00,215+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.PrepareImageVDSCommand] (EE-ManagedThreadFactory-engine-Thread-39587) [vms_syncAction_4ce8918e-ef3e-45eb] FINISH, PrepareImageVDSCommand, log id: 3becef05
2018-09-21 00:06:00,216+03 ERROR [org.ovirt.engine.core.bll.snapshots.TryBackToSnapshotCommand] (EE-ManagedThreadFactory-engine-Thread-39587) [vms_syncAction_4ce8918e-ef3e-45eb] Unable to update the image info for image 'c73d7eee-b4c0-4e82-9bb8-bf94c236e509' (image group: '46766572-0e81-49b5-884a-ea2958330d1f') on domain 'a133c0a9-cb59-4e06-b777-6a10e9ff189d'
--
2018-09-21 00:11:19,293+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PrepareImageVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-62) [vms_create_0cc6676d-1105-4ff2] Command 'PrepareImageVDSCommand(HostName = host_mixed_1, PrepareImageVDSCommandParameters:{hostId='309aaa0d-4eda-4a52-b64a-adc9394891d4'})' execution failed: null
2018-09-21 00:11:19,293+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.PrepareImageVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-62) [vms_create_0cc6676d-1105-4ff2] FINISH, PrepareImageVDSCommand, log id: 217ead5d
2018-09-21 00:11:19,293+03 ERROR [org.ovirt.engine.core.bll.snapshots.CreateSnapshotFromTemplateCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-62) [vms_create_0cc6676d-1105-4ff2] Unable to update the image info for image '164acf8d-a5c1-4682-aaed-f4c8e34c719b' (image group: '1054f133-c558-4216-b9ab-0ca4c87e1adf') on domain 'a133c0a9-cb59-4e06-b777-6a10e9ff189d'
--
2018-09-21 00:17:18,383+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PrepareImageVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-80) [disks_create_0f2d16c8-5eec-4d82] Command 'PrepareImageVDSCommand(HostName = host_mixed_1, PrepareImageVDSCommandParameters:{hostId='309aaa0d-4eda-4a52-b64a-adc9394891d4'})' execution failed: null
2018-09-21 00:17:18,383+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.PrepareImageVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-80) [disks_create_0f2d16c8-5eec-4d82] FINISH, PrepareImageVDSCommand, log id: 1ac3b01c
2018-09-21 00:17:18,383+03 ERROR [org.ovirt.engine.core.bll.storage.disk.image.AddImageFromScratchCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-80) [disks_create_0f2d16c8-5eec-4d82] Unable to update the image info for image '9c382740-baa0-4383-9aca-886fd3f0347e' (image group: '6bf68208-3c93-4e1e-b458-8d4bc5366bb3') on domain 'a133c0a9-cb59-4e06-b777-6a10e9ff189d'
--
2018-09-21 00:20:43,935+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PrepareImageVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-37) [vms_create_95356fa3-c8df-4cf0] Command 'PrepareImageVDSCommand(HostName = host_mixed_1, PrepareImageVDSCommandParameters:{hostId='309aaa0d-4eda-4a52-b64a-adc9394891d4'})' execution failed: null
2018-09-21 00:20:43,935+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.PrepareImageVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-37) [vms_create_95356fa3-c8df-4cf0] FINISH, PrepareImageVDSCommand, log id: 7d8c2587
2018-09-21 00:20:43,935+03 ERROR [org.ovirt.engine.core.bll.snapshots.CreateSnapshotFromTemplateCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-37) [vms_create_95356fa3-c8df-4cf0] Unable to update the image info for image '968e0e03-54ce-4660-89d6-0f34f34a3204' (image group: 'c979fb4c-e14f-4e83-897d-cb895a379190') on domain 'a133c0a9-cb59-4e06-b777-6a10e9ff189d'
--
2018-09-21 00:27:28,534+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PrepareImageVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-30) [disks_create_4291f5e2-c829-4bc0] Command 'PrepareImageVDSCommand(HostName = host_mixed_1, PrepareImageVDSCommandParameters:{hostId='309aaa0d-4eda-4a52-b64a-adc9394891d4'})' execution failed: null
2018-09-21 00:27:28,534+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.PrepareImageVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-30) [disks_create_4291f5e2-c829-4bc0] FINISH, PrepareImageVDSCommand, log id: b8336a
2018-09-21 00:27:28,534+03 ERROR [org.ovirt.engine.core.bll.storage.disk.image.AddImageFromScratchCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-30) [disks_create_4291f5e2-c829-4bc0] Unable to update the image info for image '9e4b17aa-dd2e-42e3-96bb-2d87e59cffbc' (image group: '4d0493d0-2366-4855-a910-a29a0fac2cbb') on domain 'a133c0a9-cb59-4e06-b777-6a10e9ff189d'
--
2018-09-21 00:30:32,579+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PrepareImageVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-42) [disks_create_ecbdf1d8-f02e-4550] Command 'PrepareImageVDSCommand(HostName = host_mixed_1, PrepareImageVDSCommandParameters:{hostId='309aaa0d-4eda-4a52-b64a-adc9394891d4'})' execution failed: null
2018-09-21 00:30:32,579+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.PrepareImageVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-42) [disks_create_ecbdf1d8-f02e-4550] FINISH, PrepareImageVDSCommand, log id: 251a9e1a
2018-09-21 00:30:32,579+03 ERROR [org.ovirt.engine.core.bll.storage.disk.image.AddImageFromScratchCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-42) [disks_create_ecbdf1d8-f02e-4550] Unable to update the image info for image 'e59428c3-5e94-4c8e-8182-846d9ef11af4' (image group: '1351f219-b204-4067-b5b9-a7ba8ee45a2a') on domain 'a133c0a9-cb59-4e06-b777-6a10e9ff189d'
--
2018-09-21 00:33:35,125+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PrepareImageVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-2) [vms_create_78f537e5-fdd6-4d0c] Command 'PrepareImageVDSCommand(HostName = host_mixed_1, PrepareImageVDSCommandParameters:{hostId='309aaa0d-4eda-4a52-b64a-adc9394891d4'})' execution failed: null
2018-09-21 00:33:35,125+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.PrepareImageVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-2) [vms_create_78f537e5-fdd6-4d0c] FINISH, PrepareImageVDSCommand, log id: 5ec1d0f2
2018-09-21 00:33:35,125+03 ERROR [org.ovirt.engine.core.bll.snapshots.CreateSnapshotFromTemplateCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-2) [vms_create_78f537e5-fdd6-4d0c] Unable to update the image info for image '56e795b6-1141-4416-aaf5-418a443f35f3' (image group: 'f84a2ab0-3fff-47ca-abda-95df2621d26c') on domain 'a133c0a9-cb59-4e06-b777-6a10e9ff189d'
--
2018-09-21 00:36:41,670+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PrepareImageVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-49) [disks_create_ad18e25e-751c-4558] Command 'PrepareImageVDSCommand(HostName = host_mixed_1, PrepareImageVDSCommandParameters:{hostId='309aaa0d-4eda-4a52-b64a-adc9394891d4'})' execution failed: null
2018-09-21 00:36:41,670+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.PrepareImageVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-49) [disks_create_ad18e25e-751c-4558] FINISH, PrepareImageVDSCommand, log id: 195eb5eb
2018-09-21 00:36:41,670+03 ERROR [org.ovirt.engine.core.bll.storage.disk.image.AddImageFromScratchCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-49) [disks_create_ad18e25e-751c-4558] Unable to update the image info for image '303d0951-b608-4034-bb1c-eeddab78efaf' (image group: '7c6b1f46-2ce7-4c81-a07b-26a5ebfdd7ac') on domain 'a133c0a9-cb59-4e06-b777-6a10e9ff189d'

Comment 1 Avihai 2018-10-07 12:45:34 UTC
Created attachment 1491323 [details]
Full engine logs for several TCs with the failure

Comment 2 Tal Nisan 2018-10-07 14:19:53 UTC
Seems like more of an infra issue since the workers are blocked

Comment 3 Roni 2018-12-20 12:13:12 UTC
see similar bug: https://bugzilla.redhat.com/show_bug.cgi?id=1653258

Comment 4 Piotr Kliczewski 2018-12-20 12:54:49 UTC
Correct, the worker is blocked in the sdc module. We can assume the thread executor is effectively broken, because a thread was stuck in the code running inside it.

From my perspective, the storage code should handle this and make sure a worker won't get stuck.
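Piotr's point — that the storage code should guarantee a worker cannot wait forever — could take the rough shape of a timed wait on the cache condition. This is only an illustrative sketch: `DomainCache`, `produce`, and the timeout values are hypothetical names, not the actual vdsm API.

```python
import threading

class DomainCache:
    """Hypothetical cache that refuses to block a caller indefinitely."""

    def __init__(self):
        self._cond = threading.Condition()
        self._domain = None  # filled in by a (not shown) refresh thread

    def produce(self, timeout=0.1):
        with self._cond:
            if self._domain is None:
                # Condition.wait(timeout) returns False if the deadline
                # passed without a notify -- fail fast instead of hanging.
                if not self._cond.wait(timeout=timeout):
                    raise TimeoutError(
                        "storage domain refresh did not finish in time")
            return self._domain

cache = DomainCache()
try:
    cache.produce(timeout=0.05)
except TimeoutError as e:
    print("worker released instead of blocked:", e)
```

With a bounded wait, the failure surfaces as an explicit error the engine can retry, rather than as a silently blocked executor worker.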

Comment 6 Nir Soffer 2019-01-08 15:02:29 UTC
Avihay, a blocked worker is not in itself a bug in storage (or any other subsystem). When you have
issues with blocked workers, you need to check what is blocking them.

Please attach these logs to the bug:
- sanlock.log
- /var/log/messages

And output of:
- grep WARN vdsm.log

Most likely the storage (NFS?) is not responding; this can quickly block all workers in the
system.
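The `grep WARN vdsm.log` step can be taken a bit further by pulling the stuck verb and its duration out of the warning itself. A sketch, run here against the warning quoted in the description; on a real host the input would be the contents of /var/log/vdsm/vdsm.log:

```python
import re

# Stand-in for a line from /var/log/vdsm/vdsm.log on the affected host.
sample = (
    "2018-09-21 00:18:18,967+0300 WARN  (vdsm.Scheduler) [Executor] "
    "Worker blocked: ... 'method': 'Image.prepare', ... "
    "timeout=60, duration=600 ..."
)

# What matters for triage: which RPC method is stuck, and for how long
# relative to the executor's budget.
m = re.search(r"'method': '([^']+)'.*timeout=(\d+), duration=(\d+)", sample)
method, timeout, duration = m.group(1), int(m.group(2)), int(m.group(3))
print(f"{method} stuck for {duration}s (budget {timeout}s)")
```

A duration that keeps growing across repeated warnings (as in this bug: duration=600 against timeout=60) indicates the worker never recovered, which is what points at unresponsive storage rather than a slow-but-progressing operation.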

Comment 7 Avihai 2019-01-08 15:31:55 UTC
(In reply to Nir Soffer from comment #6)
> Avihay, a blocked worker is not a bug in storage (or other system). When you
> have issues with blocked workers you need to check what is blocking the workers.
Can you be more specific on what exactly to check?

> Please attach these logs to the bug:
> - sanlock.log
> - /var/log/messages
> 
> And output of:
> - grep WARN vdsm.log
> 
> Most likely the storage (NFS?) is not responding; this can quickly block all
> workers in the system.

Nir, it's been almost 3 months since this bug was opened, and by now I obviously no longer have the requested logs.
We (QE) will try to see if this bug reproduces in our latest runs and will provide new logs once it recurs.

Elad, please notify me if you encounter this issue during automation Tier 3 runs.

Comment 8 Nir Soffer 2019-01-08 15:46:45 UTC
Avihay, without the logs we cannot do anything. Please close as insufficient data and reopen with proper logs if this happens again.

Comment 9 Avihai 2019-01-09 06:26:49 UTC
(In reply to Nir Soffer from comment #8)
> Avihay, without the logs we cannot do anything. Please close as insufficient
> data and reopen with proper logs if this happens again.

I leave it up to you as the developer to close it with whatever reason you see fit.