Bug 1122639

Summary: Error complaining about missing images for exported VM
Product: Red Hat Enterprise Virtualization Manager Reporter: Fabian Salamanca <fsalaman>
Component: ovirt-engineAssignee: Ravi Nori <rnori>
Status: CLOSED CURRENTRELEASE QA Contact: Jiri Belka <jbelka>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.4.0CC: amureini, ecohen, gklein, iheim, laravot, lpeer, mlipchuk, oourfali, rbalakri, Rhev-m-bugs, rruiz, yeylon
Target Milestone: ---   
Target Release: 3.5.0   
Hardware: x86_64   
OS: Linux   
Whiteboard: infra
Fixed In Version: ovirt-3.5.0_rc1.1 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-02-17 17:13:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Bug Depends On:    
Bug Blocks: 1142923, 1156165    
Attachments:
Description Flags
engine.log, vdsm, messages none

Description Fabian Salamanca 2014-07-23 16:47:33 UTC
Description of problem: 
When I export any machine with more than 2 vdisks, any type, VDSM
complains about an ERROR and aborts de export of images, but the OVF is
left as a successful export.

RHEVM reports successful export and when I try to import de VM, I
receive a "missing image" error. Which is correct, the OVF is left
without any images.


Version-Release number of selected component (if applicable): rhevm-3.4.0-0.22


How reproducible: 100%


Steps to Reproduce:
1. Export VM with 3 vdisks or more
2. Verify RHEVM reports successful export
3. Import the exported VM
4. Error complaining about missing images for exported VM

Actual results:

Error complaining about missing images for exported VM

Expected results:

VM imported successfully

Additional info:

Comment 1 Fabian Salamanca 2014-07-23 16:49:27 UTC
Created attachment 920296 [details]
engine.log, vdsm, messages

Comment 3 Liron Aravot 2014-07-24 08:23:47 UTC
From brief look it seems that we have 2 issues here -

We can take for example task d9f0fba3-63ae-4a4c-a904-525907a0f16c [a]

1.  it can be seen that after a mintue the engine detects the task as zombie task and attempts to stop it, which is obviously wrong. this is either caused by wrong value of the config value AsyncTaskZombieTaskLifeInMinutes or by wrong comprasion.

-----------------------------------------------------------------------
2014-07-22 21:01:46,483 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (DefaultQuartzScheduler_Worker-96) SPMAsyncTask::PollTask: Polling task d9f0fba3-63ae-4
a4c-a904-525907a0f16c (Parent Command ExportVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) returned status running.
2014-07-22 21:01:46,484 INFO  [org.ovirt.engine.core.bll.AsyncTaskManager] (DefaultQuartzScheduler_Worker-96) Finished polling Tasks, will poll again in 10 se
conds.
2014-07-22 21:01:47,885 INFO  [org.ovirt.engine.core.bll.scheduling.PolicyUnitImpl] (DefaultQuartzScheduler_Worker-3) There is no over-utilized host in cluste
r Telcel_cluster_01
2014-07-22 21:02:38,719 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-16) Correlation ID: null, 
Call Stack: null, Custom Event ID: -1, Message: Stopping async task ExportVm that started at Tue Jul 22 21:01:36 CDT 2014
2014-07-22 21:02:38,719 INFO  [org.ovirt.engine.core.bll.AsyncTaskManager] (DefaultQuartzScheduler_Worker-16) Cleaning zombie tasks: Stopping async task Expor
tVm that started at 7/22/14 9:01 PM
2014-07-22 21:02:38,720 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (DefaultQuartzScheduler_Worker-16) SPMAsyncTask::StopTask: Attempting to stop task ab27
ab6c-0877-4a99-b0c0-f3403247fb10 (Parent Command ExportVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters).
-----------------------------------------------------------------------

2. when a task is stopped by the engine when its detected as zombie, endSuccesfully shouldn't be called for the parent command, but endWithFailue. as endSuccesfully is being executed the engine marks the export as succesful.

2014-07-22 21:02:50,115 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-4-thread-11) Correlation ID: 15933d80, Job ID: 23623117-7370-4d7e-9e1c-d13c1cb66efa, Call Stack: null, Custom Event ID: -1, Message: Vm Backups_02 was exported successfully to EXPORT_01


[a] :

-----------------------------------------------------------------------
2014-07-22 21:01:35,906 INFO  [org.ovirt.engine.core.bll.AsyncTaskManager] (org.ovirt.thread.pool-4-thread-1) [37d34d0d] Adding task d9f0fba3-63ae-4a4c-a904-5
25907a0f16c (Parent Command ExportVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters), polling hasn't started yet..

2014-07-22 21:01:36,872 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-4-thread-1) [5a1eea6a] Correlation ID: 15933d80, Job ID: 23623117-7370-4d7e-9e1c-d13c1cb66efa, Call Stack: null, Custom Event ID: -1, Message: Starting export Vm Backups_02 to EXPORT_01


2014-07-22 21:01:36,873 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (org.ovirt.thread.pool-4-thread-1) [5a1eea6a] BaseAsyncTask::startPollingTask: Starting to poll task d9f0fba3-63ae-4a4c-a904-525907a0f16c.
-----------------------------------------------------------------------

Comment 4 Liron Aravot 2014-07-24 08:27:21 UTC
Fabian, additionaly please run this query and report back the results so the issue will be better analyzed.

select * from vdc_options where option_name = 'AsyncTaskZombieTaskLifeInMinutes';

thanks.

Comment 5 Fabian Salamanca 2014-07-24 16:25:27 UTC
Sure!



engine=# select * from vdc_options where option_name =
engine-# 'AsyncTaskZombieTaskLifeInMinutes';
 option_id |           option_name            | option_value | version 
-----------+----------------------------------+--------------+---------
        28 | AsyncTaskZombieTaskLifeInMinutes | 1            | general
(1 row)

Comment 6 Liron Aravot 2014-07-24 16:33:22 UTC
Thanks Fabian,
Have you ever updated the config values? because the default value for that is 3000.

please run the following -
update vdc_options set option_value = 3000 where option_name = 'AsyncTaskZombieTaskLifeInMinutes';

and your problem should be solved (atleast for exports that take less than 50 hours).

regardless of that, what needs to be solved is what i specified in under #1 in my comment - https://bugzilla.redhat.com/show_bug.cgi?id=1122639#c3.

Comment 7 Fabian Salamanca 2014-07-24 16:45:25 UTC
Hi Liron!

This is an in place upgrade from a RHEV 3.3.2, in that version ZombieTasks where leaving me with locked disks, so I changed the value and decided to upgrade, forgot to take back that value to the default.

Now I have the default value:

[root@rhev-mgr dbscripts]# rhevm-config -g AsyncTaskZombieTaskLifeInMinutes
AsyncTaskZombieTaskLifeInMinutes: 3000 version: general
[root@rhev-mgr dbscripts]# 

And I still have the same issue, when I try to import the VM, it complains about missing images and indeed, there are none in the images directory in the export domain:

For a 2 vdisks VM:

[root@host images]# ls -lrt
total 4
drwxr-xr-x. 2 36 kvm 4096 Jul 24 13:17 7a1e8654-9cc0-4dc5-a9e6-ef413bb4c244
[root@host images]# 

From engine.log :

2014-07-24 13:19:18,725 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (ajp-/127.0.0.1:8702-11) [545aa28b] IrsBroker::getImageInfo::Failed getting image info imageId = d03d5e87-659d-4cfa-a458-5aacbca09672 does not exist on domainName = EXPORT_01 , domainId = e211e3f5-0249-4d9b-a583-bbc03e5a4c74,  error code: ImagePathError, message: Image path does not exist or cannot be accessed/created: ('/rhev/data-center/mnt/10.202.130.103:_var_lib_export/e211e3f5-0249-4d9b-a583-bbc03e5a4c74/images/a5f46711-307f-47d4-a483-e1f7af4751a1',)


Thanks!

Comment 8 Liron Aravot 2014-07-30 08:48:29 UTC
Fabian, as the export operation was already mistakenly marked as succesfull (see #2 at https://bugzilla.redhat.com/show_bug.cgi?id=1122639#c3), this export is already broken.

try to export and import again and report if the problem persists.

thanks,
Liron

Comment 9 Allon Mureinik 2014-07-30 15:18:11 UTC
(In reply to Liron Aravot from comment #3)
> 2. when a task is stopped by the engine when its detected as zombie,
> endSuccesfully shouldn't be called for the parent command, but
> endWithFailue. as endSuccesfully is being executed the engine marks the
> export as succesful.
Oved, this seems like an issue in the tasks infrastructure (it isn't specific to this command).
Can one of your guys please take a look?

Comment 10 Oved Ourfali 2014-07-31 07:36:07 UTC
Asked Ravi to look at it.

Comment 11 Jiri Belka 2014-09-11 09:21:36 UTC
ok, vt3.

it can't be reproduced as "zombie" task does not end anymore as ended successfully thus the VM is not exported at all. tested with ' 'AsyncTaskZombieTaskLifeInMinutes' set to '1' to get zombie task. then i see 'Failed to export Vm...' in events.

Comment 13 Eyal Edri 2015-02-17 17:13:10 UTC
rhev 3.5.0 was released. closing.