Bug 1565052 - Sometimes failed jobs on the engine only terminated after 50 hours leaving disk locked
Summary: Sometimes failed jobs on the engine only terminated after 50 hours leaving disk locked
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.2.2
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ovirt-4.3.0
Target Release: ---
Assignee: Idan Shaby
QA Contact: Elad
URL:
Whiteboard:
Depends On: 1565040
Blocks:
 
Reported: 2018-04-09 09:05 UTC by Kevin Alon Goldblatt
Modified: 2018-07-08 12:55 UTC
CC: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-07-08 12:55:57 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.3+
ylavi: exception+


Attachments
engine and vdsm logs (5.22 MB, application/x-gzip), 2018-04-09 09:05 UTC, Kevin Alon Goldblatt

Description Kevin Alon Goldblatt 2018-04-09 09:05:41 UTC
Created attachment 1419156 [details]
engine and vdsm logs

Description of problem:
During a cold move, access to the target domain was blocked. After restoring the access, the job is completed by vdsm and reported as finished, but the engine is still stuck waiting for the job to complete. The disk stays locked and is only released after 50 hours, when the task is failed. This means customers could have to wait a very long time before they can continue working with this disk. A hanging task should be managed with a shorter timeout and failed, so that the customer can retry.
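For illustration only, here is a minimal sketch (in Python, not the engine's actual Java code) of the behaviour suggested above: poll the storage job with a bounded, configurable timeout and, when that timeout expires, fail the command and release the disk lock so the user can retry. All names and values below are hypothetical.

import time

# Hypothetical values - the point is a bound much shorter than 50 hours.
JOB_TIMEOUT = 2 * 60 * 60   # fail a hanging job after e.g. 2 hours
POLL_INTERVAL = 10          # seconds between polls

def wait_for_storage_job(get_job_status, fail_command, release_disk_lock):
    """Poll a storage job until it finishes or the timeout expires."""
    deadline = time.monotonic() + JOB_TIMEOUT
    while time.monotonic() < deadline:
        status = get_job_status()        # e.g. a wrapper around Host.getJobs
        if status in ("done", "failed"):
            return status
        time.sleep(POLL_INTERVAL)
    # Timeout reached: fail the command and free the disk so the user can retry.
    fail_command()
    release_disk_lock()
    return "failed"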



Version-Release number of selected component (if applicable):
ovirt-engine-4.2.2.6-0.1.el7.noarch
vdsm-4.20.23-1.el7ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. On a VM with disks, write some data
2. Stop the VM and start a cold move of the disks
3. During the cold move, block access to the target storage with iptables
4. The host becomes non-operational
5. After about 2 hours, restore the connection and the host comes back up
6. The job is completed by vdsm and reported as finished; however, the engine never finishes the job and the disk remains locked
7. The timeout set for terminating this job is 50 hours. A better solution would be to fail the job sooner and allow the customer to retry the operation

Actual results:
The job is completed by vdsm and reported as finished; however, the engine never finishes the job and the disk remains locked.

Expected results:
The job is completed by vdsm and reported as finished, and the engine completes the task. The disk is no longer locked.


Additional info:
------ *****  START THE MOVE DISK OPERATION HERE  ***** ----
2018-04-08 16:23:23,709+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-20) [02801675-9a0c-4bf8-ae3b-18e3828ff78a] EVENT_ID: USER_MOVED_DISK(2,008), User admin@internal-authz moving disk vm08042018a_Disk1 to domain nfs_2.
2018-04-08 16:23:23,720+03 INFO  [org.ovirt.engine.core.bll.storage.disk.MoveOrCopyDiskCommand] (default task-20) [02801675-9a0c-4bf8-ae3b-18e3828ff78a] Lock freed to object 'EngineLock:{exclusiveLocks='[6d309216-37e7-4138-b09c-596a2838dfab=DISK]', sharedLocks=''}'
2018-04-08 16:23:23,730+03 WARN  [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (default task-20) [02801675-9a0c-4bf8-ae3b-18e3828ff78a] Trying to release exclusive lock which does not exist, lock key: '6d309216-37e7-4138-b09c-596a2838dfabDISK'
2018-04-08 16:23:23,730+03 INFO  [org.ovirt.engine.core.bll.storage.disk.MoveDiskCommand] (default task-20) [02801675-9a0c-4bf8-ae3b-18e3828ff78a] Lock freed to object 'EngineLock:{exclusiveLocks='[6d309216-37e7-4138-b09c-596a2838dfab=DISK]', sharedLocks=''}'
2018-04-08 16:23:23,770+03 INFO  [org.ovirt.engine.core.bll.storage.disk.MoveDiskCommand] (default task-25) [7add9547-a996-4797-8cf4-2821e8eb1e69] Lock Acquired to object 'EngineLock:{exclusiveLocks='[874c3126-144e-4dd1-abc3-292a208782d2=DISK]', sharedLocks=''}'
2018-04-08 16:23:23,778+03 INFO  [org.ovirt.engine.core.bll.storage.disk.MoveDiskCommand] (default task-25) [7add9547-a996-4797-8cf4-2821e8eb1e69] Running command: MoveDiskCommand internal: false. Entities affected :  ID: 874c3126-144e-4dd1-abc3-292a208782d2 Type: DiskAction group CONFIGURE_DISK_STORAGE with role type USER,  ID: f3d0fa77-a26f-4e99-99db-2d0a1770acbe Type: StorageAction group CREATE_DISK with role type USER
2018-04-08 16:23:24,116+03 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-91) [02801675-9a0c-4bf8-ae3b-18e3828ff78a] Command 'MoveOrCopyDisk' (id: '546f807e-6ba4-438b-85a8-cc644e4ccfbd') waiting on child command id: '77d60ad1-4767-4ca6-ab03-81e5626e8599' type:'MoveImageGroup' to complete
2018-04-08 16:23:24,126+03 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-91) [02801675-9a0c-4bf8-ae3b-18e3828ff78a] Command 'CopyImageGroupWithData' (id: 'f5db3294-c18a-4a97-9ea6-d47eccfe53f1') waiting on child command id: '4feb21b1-695d-46f4-a2b9-d542cb61a1ec' type:'CloneImageGroupVolumesStructure' to complete
2018-04-08 16:23:24,127+03 INFO  [org.ovirt.engine.core.bll.storage.disk.MoveOrCopyDiskCommand] (default task-25) [7add9547-a996-4797-8cf4-2821e8eb1e69] Running command: MoveOrCopyDiskCommand internal: true. Entities affected :  ID: 874c3126-144e-4dd1-abc3-292a208782d2 Type: DiskAction group CONFIGURE_DISK_STORAGE with role type USER,  ID: f3d0fa77-a26f-4e99-99db-2d0a1770acbe Type: StorageAction group CREATE_DISK with role type USER
2018-04-08 16:23:24,131+03 INFO  [org.ovirt.engine.core.bll.storage.disk.image.CloneImageGroupVolumesStructureCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-91) [02801675-9a0c-4bf8-ae3b-18e3828ff78a] Starting child command 1 of 1, image 'b82bed28-b15d-4c1e-8e0f-08a312697933'
2018-04-08 16:23:24,193+03 INFO  [org.ovirt.engine.core.bll.storage.disk.image.CreateVolumeContainerCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-91) [02801675-9a0c-4bf8-ae3b-18e3828ff78a] Running command: CreateVolumeContainerCommand internal: true.
.
.
. ---- ***** ADDING TASK HERE ***** ----
2018-04-08 16:23:25,132+03 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (EE-ManagedThreadFactory-engineScheduled-Thread-91) [02801675-9a0c-4bf8-ae3b-18e3828ff78a] Adding task 'afcd1e8e-877e-4035-93ba-b9ff29168239' (Parent Command 'CreateVolumeContainer', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters'), polling hasn't started yet..
2018-04-08 16:23:25,150+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-91) [02801675-9a0c-4bf8-ae3b-18e3828ff78a] BaseAsyncTask::startPollingTask: Starting to poll task 'afcd1e8e-877e-4035-93ba-b9ff29168239'.
2018-04-08 16:23:25,178+03 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-91) [02801675-9a0c-4bf8-ae3b-18e3828ff78a] Command 'MoveImageGroup' (id: '77d60ad1-4767-4ca6-ab03-81e5626e8599') waiting on child command id: 'f5db3294-c18a-4a97-9ea6-d47eccfe53f1' type:'CopyImageGroupWithData' to complete
.
.
.---- ******* STORAGE HEARTBEAT NO LONGER RECEIVED ***** ----
74c3126-144e-4dd1-abc3-292a208782d2', postZeros='false', discard='true', forceDelete='false'}), log id: 17fffde
2018-04-08 16:25:19,107+03 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Heartbeat exceeded for host 'storage-ge10-vdsm1.scl.lab.tlv.redhat.com', last response arrived 7121 ms ago.
2018-04-08 16:25:19,119+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-4) [5c533c95] EVENT_ID: IRS_BROKER_COMMAND_FAILURE(10,803), VDSM command SetVolumeDescriptionVDS failed: Heartbeat exceeded
2018-04-08 16:25:19,121+03 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-4) [5c533c95] ERROR, SetVolumeDescriptionVDSCommand( SetVolumeDescriptionVDSCommandParameters:{storagePoolId='a9e2b4b2-6c07-4af8-8fb0-5675878e2376', ignoreFailoverLimit='false', storageDomainId='2d0e3644-2ba7-428b-855c-13d35c4e650f', imageGroupId='ee2bb254-ffab-4d7c-9fc4-46712b2cf0b2', imageId='f369d96d-f77c-44b3-9871-647b479d3c50'}), exception: VDSGenericException: VDSNetworkException: Heartbeat exceeded, log id: 3d9ea579
2018-04-08 16:25:19,121+03 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-4) [5c533c95] Exception
2018-04-08 16:25:19,149+03 INFO  [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to storage-ge10-vdsm1.scl.lab.tlv.redhat.com/10.35.82.87
2018-04-08 16:25:19,153+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStatusVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-4) [5c533c95] Command 'SpmStatusVDSCommand(HostName = host_mixed_1, SpmStatusVDSCommandParameters:{hostId='f6e3b61d-a07a-40f1-820d-d7fd9c67f6f7', storagePoolId='a9e2b4b2-6c07-4af8-8fb0-5675878e2376'})' execution failed: java.net.ConnectException: Connection refused
2018-04-08 16:25:19,153+03 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engineScheduled-Thread-4) [5c533c95] Irs placed on server 'f6e3b61d-a07a-40f1-820d-d7fd9c67f6f7' failed. Proceed Failover
.
.
. ---- ***** HOST BACK UP HERE *** ---
2018-04-08 18:33:00,984+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-14) [247a258f] EVENT_ID: VDS_DETECTED(13), Status of host host_mixed_1 was set to Up.
.
.
. ---- ***** Still waiting for task to complete several hours later - THIS CONTINUES FOR 50 HOURS *** ---
2018-04-09 01:23:54,333+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetHostJobsVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-96) [02801675-9a0c-4bf8-ae3b-18e3828ff78a] START, GetHostJobsVDSCommand(HostName = host_mixed_1, GetHostJobsVDSCommandParameters:{hostId='f6e3b61d-a07a-40f1-820d-d7fd9c67f6f7', type='storage', jobIds='[1babe44f-82e2-451b-9d0d-87bd024c607c]'}), log id: 3fa59fa5
2018-04-09 01:23:54,344+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetHostJobsVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-96) [02801675-9a0c-4bf8-ae3b-18e3828ff78a] FINISH, GetHostJobsVDSCommand, return: {}, log id: 3fa59fa5
2018-04-09 01:23:54,349+03 INFO  [org.ovirt.engine.core.bll.StorageJobCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-96) [02801675-9a0c-4bf8-ae3b-18e3828ff78a] Command CopyData id: '5e547444-7c34-4aa1-b50f-518d36d2931c': attempting to determine the job status by polling the entity.
2018-04-09 01:23:54,361+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-96) [02801675-9a0c-4bf8-ae3b-18e3828ff78a] START, GetVolumeInfoVDSCommand(HostName = host_mixed_2, GetVolumeInfoVDSCommandParameters:{hostId='fea2403a-5581-4920-80db-a015412f72ef', storagePoolId='a9e2b4b2-6c07-4af8-8fb0-5675878e2376', storageDomainId='f3d0fa77-a26f-4e99-99db-2d0a1770acbe', imageGroupId='6d309216-37e7-4138-b09c-596a2838dfab', imageId='b82bed28-b15d-4c1e-8e0f-08a312697933'}), log id: 61872b66
2018-04-09 01:23:54,436+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-96) [02801675-9a0c-4bf8-ae3b-18e3828ff78a] FINISH, GetVolumeInfoVDSCommand, return: org.ovirt.engine.core.common.businessentities.storage.DiskImage@177c3c9a, log id: 61872b66
2018-04-09 01:23:54,436+03 INFO  [org.ovirt.engine.core.bll.storage.disk.image.VdsmImagePoller] (EE-ManagedThreadFactory-engineScheduled-Thread-96) [02801675-9a0c-4bf8-ae3b-18e3828ff78a] Command CopyData id: '5e547444-7c34-4aa1-b50f-518d36d2931c': the volume lease is not FREE - the job is running
2018-04-09 01:23:54,436+03 INFO  [org.ovirt.engine.core.bll.StorageJobCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-96) [02801675-9a0c-4bf8-ae3b-18e3828ff78a] Command CopyData id: '5e547444-7c34-4aa1-b50f-518d36d2931c': waiting for job '1babe44f-82e2-451b-9d0d-87bd024c607c' on host 'host_mixed_1' (id: 'f6e3b61d-a07a-40f1-820d-d7fd9c67f6f7') to complete


VDSM.log:
--------------------------------------------------------------
--- *** START COPY OPERATION **** ---
2018-04-08 16:23:44,471+0300 INFO  (jsonrpc/7) [vdsm.api] START sdm_copy_data(job_id='1babe44f-82e2-451b-9d0d-87bd024c607c', source={'img_id': '6d309216-37e7-4138-b09c-596a2838dfab', 'sd_id': '74668615-6112-4f5c-b0f0-8f817da34df6', 'endpoint_type': 'div', 'vol_id': 'b82bed28-b15d-4c1e-8e0f-08a312697933'}, destination={'generation': 0, 'img_id': '6d309216-37e7-4138-b09c-596a2838dfab', 'sd_id': 'f3d0fa77-a26f-4e99-99db-2d0a1770acbe', 'endpoint_type': 'div', 'vol_id': 'b82bed28-b15d-4c1e-8e0f-08a312697933'}) from=::ffff:10.35.162.9,43264, flow_id=02801675-9a0c-4bf8-ae3b-18e3828ff78a, task_id=b65a563b-e25e-42d1-b9ec-4aeda4a84abd (api:46)
2018-04-08 16:23:44,473+0300 DEBUG (jsonrpc/7) [storage.TaskManager] scheduled job copy_data for task b65a563b-e25e-42d1-b9ec-4aeda4a84abd  (taskManager:79)
2018-04-08 16:23:44,474+0300 INFO  (jsonrpc/7) [vdsm.api] FINISH sdm_copy_data return=None from=::ffff:10.35.162.9,43264, flow_id=02801675-9a0c-4bf8-ae3b-18e3828ff78a, task_id=b65a563b-e25e-42d1-b9ec-4aeda4a84abd (api:52)
2018-04-08 16:23:44,474+0300 DEBUG (jsonrpc/7) [storage.TaskManager.Task] (Task='b65a563b-e25e-42d1-b9ec-4aeda4a84abd') Prepare: 1 jobs exist, move to acquiring (task:1193)
2018-04-08 16:23:44,475+0300 DEBUG (jsonrpc/7) [storage.TaskManager.Task] (Task='b65a563b-e25e-42d1-b9ec-4aeda4a84abd') moving from state preparing -> state acquiring (task:602)
2018-04-08 16:23:44,475+0300 DEBUG (jsonrpc/7) [storage.TaskManager.Task] (Task='b65a563b-e25e-42d1-b9ec-4aeda4a84abd') moving from state acquiring -> state queued (task:602)
2018-04-08 16:23:44,476+0300 DEBUG (jsonrpc/7) [storage.TaskManager] queuing task: b65a563b-e25e-42d1-b9ec-4aeda4a84abd (taskManager:57)
2018-04-08 16:23:44,477+0300 DEBUG (jsonrpc/7) [storage.TaskManager] task queued: b65a563b-e25e-42d1-b9ec-4aeda4a84abd (taskManager:65)
2018-04-08 16:23:44,478+0300 DEBUG (jsonrpc/7) [storage.TaskManager.Task] (Task='b65a563b-e25e-42d1-b9ec-4aeda4a84abd') returning (task:1198)
2018-04-08 16:23:44,478+0300 DEBUG (jsonrpc/7) [storage.TaskManager.Task] (Task='b65a563b-e25e-42d1-b9ec-4aeda4a84abd') ref 0 aborting False (task:1002)
2018-04-08 16:23:44,479+0300 INFO  (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call SDM.copy_data succeeded in 0.01 seconds (__init__:573)
2018-04-08 16:23:44,477+0300 DEBUG (tasks/4) [storage.ThreadPool] Number of running tasks: 1 (threadPool:60)
2018-04-08 16:23:44,480+0300 INFO  (tasks/4) [storage.ThreadPool.WorkerThread] START task b65a563b-e25e-42d1-b9ec-4aeda4a84abd (cmd=<bound method Task.commit of <vdsm.storage.task.Task instance at 0x7f2391add5f0>>, args=None) (threadPool:208)
2018-04-08 16:23:44,481+0300 DEBUG (tasks/4) [storage.TaskManager.Task] (Task='b65a563b-e25e-42d1-b9ec-4aeda4a84abd') committing task: b65a563b-e25e-42d1-b9ec-4aeda4a84abd (task:1210)
2018-04-08 16:23:44,482+0300 DEBUG (tasks/4) [storage.TaskManager.Task] (Task='b65a563b-e25e-42d1-b9ec-4aeda4a84abd') moving from state queued -> state running (task:602)
2018-04-08 16:23:44,482+0300 DEBUG (tasks/4) [storage.TaskManager.Task] (Task='b65a563b-e25e-42d1-b9ec-4aeda4a84abd') Task.run: running job 0: copy_data: <bound method Job.run of <Job id=1babe44f-82e2-451b-9d0d-87bd024c607c status=pending at 0x139790744739152>> (args: () kwargs: {}) (task:914)
2018-04-08 16:23:44,483+0300 DEBUG (tasks/4) [storage.TaskManager.Task] (Task='b65a563b-e25e-42d1-b9ec-4aeda4a84abd') Job.run: running copy_data: <bound method Job.run of <Job id=1babe44f-82e2-451b-9d0d-87bd024c607c status=pending at 0x139790744739152>> (args: () kwargs: {}) callback None (task:333)
2018-04-08 16:23:44,483+0300 INFO  (tasks/4) [root] Running job '1babe44f-82e2-451b-9d0d-87bd024c607c'... (jobs:183)
2018-04-08 16:23:44,485+0300 DEBUG (tasks/4) [storage.ResourceManager] Trying to register resource '00_storage.74668615-6112-4f5c-b0f0-8f817da34df6' for lock type 'shared' (resourceManager:495)
2018-04-08 16:23:44,486+0300 DEBUG (tasks/4) [storage.ResourceManager] Resource '00_storage.74668615-6112-4f5c-b0f0-8f817da34df6' is free. Now locking as 'shared' (1 active user) (resourceManager:552)
2018-04-08 16:23:44,487+0300 DEBUG (tasks/4) [storage.ResourceManager.Request] (ResName='00_storage.74668615-6112-4f5c-b0f0-8f817da34df6', ReqID='7c872634-5fa2-4cc7-8a36-7d228734d9c4') Granted request (resourceManager:222)
2018-04-08 16:23:44,488+0300 DEBUG (tasks/4) [storage.ResourceManager] Trying to register resource '00_storage.f3d0fa77-a26f-4e99-99db-2d0a1770acbe' for lock type 'shared' (resourceManager:495)
2018-04-08 16:23:44,489+0300 DEBUG (tasks/4) [storage.ResourceManager] Resource '00_storage.f3d0fa77-a26f-4e99-99db-2d0a1770acbe' is free. Now locking as 'shared' (1 active user) (resourceManager:552)
2018-04-08 16:23:44,489+0300 DEBUG (tasks/4) [storage.ResourceManager.Request] (ResName='00_storage.f3d0fa77-a26f-4e99-99db-2d0a1770acbe', ReqID='1a6cd1d6-bfdb-47e3-a201-d2bf1fde31a6') Granted request (resourceManager:222)
2018-04-08 16:23:44,491+0300 DEBUG (tasks/4) [storage.ResourceManager] Trying to register resource '01_img_74668615-6112-4f5c-b0f0-8f817da34df6.6d309216-37e7-4138-b09c-596a2838dfab' for lock type 'shared' (resourceManager:495)
2018-04-08 16:23:44,493+0300 DEBUG (tasks/4) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-0 /usr/bin/sudo -n /sbin/lvm lvs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/3514f0c5a5160049e|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,vg_name,attr,size,seg_start_pe,devices,tags 74668615-6112-4f5c-b0f0-8f817da34df6 (cwd None) (commands:65)
2018-04-08 16:23:44,570+0300 INFO  (jsonrpc/2) [api.host] START getJobs(job_type='storage', job_ids=['1babe44f-82e2-451b-9d0d-87bd024c607c']) from=::ffff:10.35.162.9,43264, flow_id=02801675-9a0c-4bf8-ae3b-18e3828ff78a (api:46)
2018-04-08 16:23:44,572+0300 INFO  (jsonrpc/2) [api.host] FINISH getJobs return={'status': {'message': 'Done', 'code': 0}, 'jobs': {'1babe44f-82e2-451b-9d0d-87bd024c607c': {'status': 'running', 'job_type': 'storage', 'id': '1babe44f-82e2-451b-9d0d-87bd024c607c', 'description': 'copy_data'}}} from=::ffff:10.35.162.9,43264, flow_id=02801675-9a0c-4bf8-ae3b-18e3828ff78a (api:52)

--- *** FINISH job reported *** ---
2018-04-08 18:33:06,438+0300 INFO  (jsonrpc/2) [api.host] START getJobs(job_type='storage', job_ids=['1babe44f-82e2-451b-9d0d-87bd024c607c']) from=::ffff:10.35.162.9,42792, flow_id=02801675-9a0c-4bf8-ae3b-18e3828ff78a (api:46)
2018-04-08 18:33:06,439+0300 INFO  (jsonrpc/2) [api.host] FINISH getJobs return={'status': {'message': 'Done', 'code': 0}, 'jobs': {}} from=::ffff:10.35.162.9,42792, flow_id=02801675-9a0c-4bf8-ae3b-18e3828ff78a (api:52)
2018-04-08 18:33:06,440+0300 INFO  (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC call Host.getJobs succeeded in 0.00 seconds (__init__:573)
2018-04-08 18:33:06,568+0300 DEBUG (check/loop) [storage.check] START check u'/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge10__nfs__1/9e8c3e55-8a87-4528-9bbe-fae1f0ee05eb/dom_md/metadata' (delay=0.00) (check:289)

Comment 1 Yaniv Lavi 2018-04-11 08:28:23 UTC
Is this a regression from 4.1?

Comment 2 Yaniv Kaul 2018-04-24 11:01:07 UTC
This is neither a blocker nor an exception, yet it is targeted for 4.2.3?

Comment 3 Kevin Alon Goldblatt 2018-04-26 10:36:40 UTC
(In reply to Yaniv Lavi from comment #1)
> Is this a regression from 4.1?

I was not able to reproduce this on 4.1. This seems to be an intermittent issue, as it does not happen every time.

Comment 4 Idan Shaby 2018-07-08 12:55:57 UTC
As bug 1565040 was solved and there is no known way to reproduce this issue, closing for now.
If you manage to reproduce it, please reopen and state the steps to reproduce.

