Bug 1323478

Summary: Engine continually shows Copying Disk task while no running tasks are reported in connected vdsm hosts
Product: [oVirt] ovirt-engine
Reporter: Gilad Lazarovich <glazarov>
Component: BLL.Storage
Assignee: Allon Mureinik <amureini>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Aharon Canan <acanan>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 3.6.0
CC: acanan, bugs, glazarov
Target Milestone: ovirt-4.1.0-alpha
Keywords: Automation
Target Release: ---
Flags: amureini: ovirt-4.1?
       glazarov: planning_ack?
       glazarov: devel_ack?
       glazarov: testing_ack?
Hardware: Unspecified
OS: Unspecified
Whiteboard: storage
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-04-14 09:40:48 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Attachments: engine and vdsm logs (flags: none)

Description Gilad Lazarovich 2016-04-03 13:16:36 UTC
Created attachment 1142989 [details]
engine and vdsm logs

Description of problem:
Copying a template's disks into storage domains via the API hangs in the engine, although the copies complete successfully in vdsm

Version-Release number of selected component (if applicable):
3.6.5-0.1

How reproducible:
50% - 100%

Steps to Reproduce:
1. Create a template
2. Copy the template disks into several storage domains concurrently (using API)
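For reference, step 2 can be driven through the engine's REST API by POSTing a "copy" action on each template disk, one request per target storage domain, fired concurrently. The sketch below illustrates that pattern; the engine URL, credentials, and IDs are hypothetical placeholders, and the exact action endpoint should be checked against the REST API documentation for the engine version in use:

```python
import base64
import threading
import urllib.request
import xml.etree.ElementTree as ET

ENGINE = "https://engine.example.com/api"    # hypothetical engine URL
USER, PASSWORD = "admin@internal", "secret"  # hypothetical credentials


def copy_action_body(storage_domain_id):
    """Build the XML <action> body for a disk copy targeting one domain."""
    action = ET.Element("action")
    ET.SubElement(action, "storage_domain", id=storage_domain_id)
    return ET.tostring(action, encoding="unicode")


def copy_disk(disk_id, storage_domain_id):
    """POST the copy action for one template disk (endpoint assumed)."""
    creds = base64.b64encode(("%s:%s" % (USER, PASSWORD)).encode()).decode()
    req = urllib.request.Request(
        "%s/disks/%s/copy" % (ENGINE, disk_id),
        data=copy_action_body(storage_domain_id).encode(),
        headers={"Content-Type": "application/xml",
                 "Authorization": "Basic " + creds},
    )
    urllib.request.urlopen(req)


def copy_concurrently(disk_id, storage_domain_ids):
    """Start one copy per target domain at roughly the same time."""
    threads = [threading.Thread(target=copy_disk, args=(disk_id, sd))
               for sd in storage_domain_ids]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Firing the requests from parallel threads approximates the concurrency described in the reproduction steps.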

Actual results:
On the vdsm side the copy-template-disk operations succeed, but the corresponding tasks hang in the engine (for example: Copying Disk golden_mixed_virtio_template from <UNKNOWN> to iscsi_2)

Expected results:
The job/task state should be in sync between the engine and the vdsm hosts

Additional info:
Here are the relevant tasks from the vdsm host:
[root@storage-ge8-vdsm1 ~]# vdsClient -s 0 getAllTasks
3277109b-652f-41fc-a2bc-8193bec50dd4 :
	 verb = copyImage
	 code = 0
	 state = finished
	 tag = spm
	 result = {'uuid': '4eb81a0f-fa97-4510-b87c-c1cc72a6e533'}
	 message = 1 jobs completed successfully
	 id = 3277109b-652f-41fc-a2bc-8193bec50dd4
9c5d5aca-6c25-4aa6-a87c-7a18c6fa1e01 :
	 verb = copyImage
	 code = 0
	 state = finished
	 tag = spm
	 result = {'uuid': '4eb81a0f-fa97-4510-b87c-c1cc72a6e533'}
	 message = 1 jobs completed successfully
	 id = 9c5d5aca-6c25-4aa6-a87c-7a18c6fa1e01
addee2c9-1f57-4e35-99d6-7d540cbfcb3f :
	 verb = copyImage
	 code = 0
	 state = finished
	 tag = spm
	 result = {'uuid': '4eb81a0f-fa97-4510-b87c-c1cc72a6e533'}
	 message = 1 jobs completed successfully
	 id = addee2c9-1f57-4e35-99d6-7d540cbfcb3f
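The cleanup mentioned later in comment 3 amounts to finding the tasks vdsm reports as finished and clearing each one. A small sketch of that, parsing `getAllTasks` output of the shape shown above (the parsing helper itself is hypothetical, not part of vdsClient):

```python
import re


def finished_task_ids(get_all_tasks_output):
    """Return the ids of tasks whose reported state is 'finished'.

    Expects text in the `vdsClient -s 0 getAllTasks` layout above:
    a '<uuid> :' header line followed by indented 'key = value' lines.
    """
    states = {}
    current = None
    for line in get_all_tasks_output.splitlines():
        header = re.match(r"^([0-9a-f-]{36}) :", line)
        if header:
            current = header.group(1)
        elif current and line.strip().startswith("state ="):
            states[current] = line.split("=", 1)[1].strip()
    return [tid for tid, state in states.items() if state == "finished"]
```

Each returned id would then be cleared with `vdsClient -s 0 clearTask <task-id>`.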

Comment 1 Gilad Lazarovich 2016-04-03 13:30:20 UTC
Please note that the job monitoring shows each of the Copying Disk operations as started; this isn't updated once the copy goes through

Comment 2 Allon Mureinik 2016-04-03 14:29:47 UTC
(In reply to Gilad Lazarovich from comment #1)
> Please note that the job monitoring shows each of the Copying Disk
> operations as started, this isn't updated once the copy goes through
The engine, by definition, lags behind VDSM.
Does this get sorted out eventually?

Comment 3 Gilad Lazarovich 2016-04-11 12:25:52 UTC
Allon, note that I've seen this go on for more than 7 hours. The automation now clears up these running tasks between each test plan, so the environment isn't kept in this state for more than a few hours.

Comment 4 Allon Mureinik 2016-04-14 09:40:48 UTC
Nothing reproducible here then.
If you have a trustworthy reproduction, feel free to reopen.