Description of problem: When I export any machine with more than 2 vdisks, any type, VDSM complains about an ERROR and aborts de export of images, but the OVF is left as a successful export. RHEVM reports successful export and when I try to import de VM, I receive a "missing image" error. Which is correct, the OVF is left without any images. Version-Release number of selected component (if applicable): rhevm-3.4.0-0.22 How reproducible: 100% Steps to Reproduce: 1. Export VM with 3 vdisks or more 2. Verify RHEVM reports successful export 3. Import the exported VM 4. Error complaining about missing images for exported VM Actual results: Error complaining about missing images for exported VM Expected results: VM imported successfully Additional info:
Created attachment 920296 [details] engine.log, vdsm, messages
From brief look it seems that we have 2 issues here - We can take for example task d9f0fba3-63ae-4a4c-a904-525907a0f16c [a] 1. it can be seen that after a mintue the engine detects the task as zombie task and attempts to stop it, which is obviously wrong. this is either caused by wrong value of the config value AsyncTaskZombieTaskLifeInMinutes or by wrong comprasion. ----------------------------------------------------------------------- 2014-07-22 21:01:46,483 INFO [org.ovirt.engine.core.bll.SPMAsyncTask] (DefaultQuartzScheduler_Worker-96) SPMAsyncTask::PollTask: Polling task d9f0fba3-63ae-4 a4c-a904-525907a0f16c (Parent Command ExportVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) returned status running. 2014-07-22 21:01:46,484 INFO [org.ovirt.engine.core.bll.AsyncTaskManager] (DefaultQuartzScheduler_Worker-96) Finished polling Tasks, will poll again in 10 se conds. 2014-07-22 21:01:47,885 INFO [org.ovirt.engine.core.bll.scheduling.PolicyUnitImpl] (DefaultQuartzScheduler_Worker-3) There is no over-utilized host in cluste r Telcel_cluster_01 2014-07-22 21:02:38,719 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-16) Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Stopping async task ExportVm that started at Tue Jul 22 21:01:36 CDT 2014 2014-07-22 21:02:38,719 INFO [org.ovirt.engine.core.bll.AsyncTaskManager] (DefaultQuartzScheduler_Worker-16) Cleaning zombie tasks: Stopping async task Expor tVm that started at 7/22/14 9:01 PM 2014-07-22 21:02:38,720 INFO [org.ovirt.engine.core.bll.SPMAsyncTask] (DefaultQuartzScheduler_Worker-16) SPMAsyncTask::StopTask: Attempting to stop task ab27 ab6c-0877-4a99-b0c0-f3403247fb10 (Parent Command ExportVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters). ----------------------------------------------------------------------- 2. when a task is stopped by the engine when its detected as zombie, endSuccesfully shouldn't be called for the parent command, but endWithFailue. as endSuccesfully is being executed the engine marks the export as succesful. 2014-07-22 21:02:50,115 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-4-thread-11) Correlation ID: 15933d80, Job ID: 23623117-7370-4d7e-9e1c-d13c1cb66efa, Call Stack: null, Custom Event ID: -1, Message: Vm Backups_02 was exported successfully to EXPORT_01 [a] : ----------------------------------------------------------------------- 2014-07-22 21:01:35,906 INFO [org.ovirt.engine.core.bll.AsyncTaskManager] (org.ovirt.thread.pool-4-thread-1) [37d34d0d] Adding task d9f0fba3-63ae-4a4c-a904-5 25907a0f16c (Parent Command ExportVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters), polling hasn't started yet.. 2014-07-22 21:01:36,872 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-4-thread-1) [5a1eea6a] Correlation ID: 15933d80, Job ID: 23623117-7370-4d7e-9e1c-d13c1cb66efa, Call Stack: null, Custom Event ID: -1, Message: Starting export Vm Backups_02 to EXPORT_01 2014-07-22 21:01:36,873 INFO [org.ovirt.engine.core.bll.SPMAsyncTask] (org.ovirt.thread.pool-4-thread-1) [5a1eea6a] BaseAsyncTask::startPollingTask: Starting to poll task d9f0fba3-63ae-4a4c-a904-525907a0f16c. -----------------------------------------------------------------------
Fabian, additionaly please run this query and report back the results so the issue will be better analyzed. select * from vdc_options where option_name = 'AsyncTaskZombieTaskLifeInMinutes'; thanks.
Sure! engine=# select * from vdc_options where option_name = engine-# 'AsyncTaskZombieTaskLifeInMinutes'; option_id | option_name | option_value | version -----------+----------------------------------+--------------+--------- 28 | AsyncTaskZombieTaskLifeInMinutes | 1 | general (1 row)
Thanks Fabian, Have you ever updated the config values? because the default value for that is 3000. please run the following - update vdc_options set option_value = 3000 where option_name = 'AsyncTaskZombieTaskLifeInMinutes'; and your problem should be solved (atleast for exports that take less than 50 hours). regardless of that, what needs to be solved is what i specified in under #1 in my comment - https://bugzilla.redhat.com/show_bug.cgi?id=1122639#c3.
Hi Liron! This is an in place upgrade from a RHEV 3.3.2, in that version ZombieTasks where leaving me with locked disks, so I changed the value and decided to upgrade, forgot to take back that value to the default. Now I have the default value: [root@rhev-mgr dbscripts]# rhevm-config -g AsyncTaskZombieTaskLifeInMinutes AsyncTaskZombieTaskLifeInMinutes: 3000 version: general [root@rhev-mgr dbscripts]# And I still have the same issue, when I try to import the VM, it complains about missing images and indeed, there are none in the images directory in the export domain: For a 2 vdisks VM: [root@host images]# ls -lrt total 4 drwxr-xr-x. 2 36 kvm 4096 Jul 24 13:17 7a1e8654-9cc0-4dc5-a9e6-ef413bb4c244 [root@host images]# From engine.log : 2014-07-24 13:19:18,725 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (ajp-/127.0.0.1:8702-11) [545aa28b] IrsBroker::getImageInfo::Failed getting image info imageId = d03d5e87-659d-4cfa-a458-5aacbca09672 does not exist on domainName = EXPORT_01 , domainId = e211e3f5-0249-4d9b-a583-bbc03e5a4c74, error code: ImagePathError, message: Image path does not exist or cannot be accessed/created: ('/rhev/data-center/mnt/10.202.130.103:_var_lib_export/e211e3f5-0249-4d9b-a583-bbc03e5a4c74/images/a5f46711-307f-47d4-a483-e1f7af4751a1',) Thanks!
Fabian, as the export operation was already mistakenly marked as succesfull (see #2 at https://bugzilla.redhat.com/show_bug.cgi?id=1122639#c3), this export is already broken. try to export and import again and report if the problem persists. thanks, Liron
(In reply to Liron Aravot from comment #3) > 2. when a task is stopped by the engine when its detected as zombie, > endSuccesfully shouldn't be called for the parent command, but > endWithFailue. as endSuccesfully is being executed the engine marks the > export as succesful. Oved, this seems like an issue in the tasks infrastructure (it isn't specific to this command). Can one of your guys please take a look?
Asked Ravi to look at it.
ok, vt3. it can't be reproduced as "zombie" task does not end anymore as ended successfully thus the VM is not exported at all. tested with ' 'AsyncTaskZombieTaskLifeInMinutes' set to '1' to get zombie task. then i see 'Failed to export Vm...' in events.
rhev 3.5.0 was released. closing.