Created attachment 1300372 [details]
engine and vdsm logs

Description of problem:
In our automation runs, a few cases failed because cloning a VM from a template failed.

From engine.log:
2017-07-16 18:49:17,797+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (DefaultQuartzScheduler7) [6cd5a2d] Failed in 'HSMGetAllTasksStatusesVDS' method
2017-07-16 18:49:17,801+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler7) [6cd5a2d] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_1 command HSMGetAllTasksStatusesVDS failed: Error creating a new volume
2017-07-16 18:49:17,801+03 INFO [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler7) [6cd5a2d] SPMAsyncTask::PollTask: Polling task '8e3b5318-2b14-450d-bcb2-a6caf841b142' (Parent Command 'CreateSnapshotFromTemplate', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'cleanFailure'.
2017-07-16 18:49:17,803+03 ERROR [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler7) [6cd5a2d] BaseAsyncTask::logEndTaskFailure: Task '8e3b5318-2b14-450d-bcb2-a6caf841b142' (Parent Command 'CreateSnapshotFromTemplate', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended with failure:
-- Result: 'cleanFailure'
-- Message: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = Error creating a new volume, code = 205',
-- Exception: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = Error creating a new volume, code = 205'
2017-07-16 18:49:17,805+03 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (DefaultQuartzScheduler7) [6cd5a2d] CommandAsyncTask::endActionIfNecessary: All tasks of command '9b80811c-c179-41b3-84ae-77a3f30d98ef' has ended -> executing 'endAction'

From vdsm.log:
2017-07-16 18:49:07,780+0300 INFO (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call Host.getAllTasksStatuses succeeded in 0.00 seconds (__init__:583)
2017-07-16 18:49:11,150+0300 ERROR (tasks/9) [storage.Volume] Unexpected error (volume:1200)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 1157, in create
    initialSize=initialSize)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileVolume.py", line 468, in _create
    volParent.clone(volPath, volFormat)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 972, in clone
    [self.sdUUID, self.imgUUID, self.volUUID]))
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 1067, in pushRecovery
    self.persist()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 1138, in persist
    self._save(self.store)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 776, in _save
    raise se.TaskPersistError("%s persist failed: %s" % (self, e))
TaskPersistError: Can't persist task: ("8e3b5318-2b14-450d-bcb2-a6caf841b142 persist failed: Can't save Task Metadata: ('/rhev/data-center/3272dac3-c83a-4bcf-8c9d-8c19054b7ad9/mastersd/master/tasks/8e3b5318-2b14-450d-bcb2-a6caf841b142.temp/8e3b5318-2b14-450d-bcb2-a6caf841b142.recover.4',)",)
2017-07-16 18:49:11,158+0300 ERROR (tasks/9) [storage.TaskManager.Task] (Task='8e3b5318-2b14-450d-bcb2-a6caf841b142') Unexpected error (task:870)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 877, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 333, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1932, in createVolume
    initialSize=initialSize)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 773, in createVolume
    initialSize=initialSize)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 1202, in create
    (volUUID, e))
VolumeCreationError: Error creating a new volume: ('Volume creation e0a3392e-ee63-439d-9b0d-36180f513943 failed: Can\'t persist task: ("8e3b5318-2b14-450d-bcb2-a6caf841b142 persist failed: Can\'t save Task Metadata: (\'/rhev/data-center/3272dac3-c83a-4bcf-8c9d-8c19054b7ad9/mastersd/master/tasks/8e3b5318-2b14-450d-bcb2-a6caf841b142.temp/8e3b5318-2b14-450d-bcb2-a6caf841b142.recover.4\',)",)',)
2017-07-16 18:49:11,395+0300 ERROR (tasks/9) [storage.TaskManager.Task] (Task='8e3b5318-2b14-450d-bcb2-a6caf841b142') Unexpected error (task:870)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 877, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 398, in run
    [self.moduleName])
ImportError: No module named storage.volume
2017-07-16 18:49:11,397+0300 WARN (tasks/9) [storage.TaskManager.Task] (Task='8e3b5318-2b14-450d-bcb2-a6caf841b142') task 8e3b5318-2b14-450d-bcb2-a6caf841b142: recovery failed: Task is aborted: u'No module named storage.volume' - code 100 (task:815)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 812, in _recover
    self._run(rec.run)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 891, in _run
    raise se.TaskAborted(message, code)
TaskAborted: Task is aborted: u'No module named storage.volume' - code 100
2017-07-16 18:49:11,529+0300 INFO (tasks/9) [storage.ThreadPool.WorkerThread] FINISH task 8e3b5318-2b14-450d-bcb2-a6caf841b142 (threadPool:210)

Version-Release number of selected component (if applicable):
vdsm-4.20.1-180.git07508e2.el7.centos.x86_64
ovirt-engine-4.2.0-0.0.master.20170712143425.gitcb02bb5.el7.centos.noarch

How reproducible:
Don't know - ~ 20%

Steps to Reproduce:
1. Clone a VM from a template as a thin copy

Actual results:
The clone failed

Expected results:
The VM is cloned successfully

Additional info:
(In reply to Raz Tamir from comment #0)
> How reproducible:
> Don't know - ~ 20%

This doesn't make any sense to me.
Assuming the file path that holds the module is on some flaky network storage, I'd expect imports to either consistently succeed or consistently fail.

Nir - any bright ideas?
Also happens when creating a snapshot.

engine.log:
2017-07-16 21:43:20,326+03 ERROR [org.ovirt.engine.core.bll.snapshots.CreateSnapshotFromTemplateCommand] (default task-16) [vms_create_a7b70299-bf91-4a84] Failed creating snapshot from image id 'bcd6e31d-7603-401e-a532-1408c89f2844'
2017-07-16 21:43:20,328+03 INFO [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (default task-16) [vms_create_a7b70299-bf91-4a84] Failing partially submitted task AsyncTaskType 'createVolume': Task 'd18f40b7-9ca7-4ca2-8856-ef9ebb1fe257' Parent Command 'CreateSnapshotFromTemplate'
2017-07-16 21:43:20,328+03 INFO [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (default task-16) [vms_create_a7b70299-bf91-4a84] Calling updateTask for partially submitted task and AsyncTaskType 'createVolume': Task 'd18f40b7-9ca7-4ca2-8856-ef9ebb1fe257' Parent Command 'CreateSnapshotFromTemplate' Parameters class '{}'
2017-07-16 21:43:20,328+03 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (default task-16) [vms_create_a7b70299-bf91-4a84] CommandAsyncTask::Adding CommandMultiAsyncTasks object for command 'eac52b4c-7214-4a9f-b3f6-20d587fffc95'
2017-07-16 21:43:20,328+03 INFO [org.ovirt.engine.core.bll.CommandMultiAsyncTasks] (default task-16) [vms_create_a7b70299-bf91-4a84] CommandMultiAsyncTasks::attachTask: Attaching task '00000000-0000-0000-0000-000000000000' to command 'eac52b4c-7214-4a9f-b3f6-20d587fffc95'.
2017-07-16 21:43:20,331+03 ERROR [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (default task-16) [vms_create_a7b70299-bf91-4a84] BaseAsyncTask::logEndTaskFailure: Task '00000000-0000-0000-0000-000000000000' (Parent Command 'CreateSnapshotFromTemplate', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended with failure:
-- Result: 'failure'
-- Message: 'Create snapshot failed at VDSM. DB task ID is d18f40b7-9ca7-4ca2-8856-ef9ebb1fe257',
-- Exception: '[null]'
2017-07-16 21:43:20,331+03 INFO [org.ovirt.engine.core.bll.CommandMultiAsyncTasks] (default task-16) [vms_create_a7b70299-bf91-4a84] Task with DB task ID '00000000-0000-0000-0000-000000000000' has empty vdsm task ID and is about to be cleared
2017-07-16 21:43:20,332+03 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (default task-16) [vms_create_a7b70299-bf91-4a84] CommandAsyncTask::endActionIfNecessary: All tasks of command 'eac52b4c-7214-4a9f-b3f6-20d587fffc95' has ended -> executing 'endAction'
2017-07-16 21:43:20,332+03 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (default task-16) [vms_create_a7b70299-bf91-4a84] CommandAsyncTask::endAction: Ending action for '1' tasks (command ID: 'eac52b4c-7214-4a9f-b3f6-20d587fffc95'): calling endAction '.
2017-07-16 21:43:20,332+03 ERROR [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (default task-16) [vms_create_a7b70299-bf91-4a84] BaseAsyncTask::logEndTaskFailure: Task '00000000-0000-0000-0000-000000000000' (Parent Command 'CreateSnapshotFromTemplate', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended with failure:
-- Result: 'failure'
-- Message: 'Create snapshot failed at VDSM. DB task ID is d18f40b7-9ca7-4ca2-8856-ef9ebb1fe257',
-- Exception: '[null]'
2017-07-16 21:43:20,332+03 INFO [org.ovirt.engine.core.bll.CommandMultiAsyncTasks] (default task-16) [vms_create_a7b70299-bf91-4a84] Task with DB task ID '00000000-0000-0000-0000-000000000000' has empty vdsm task ID and is about to be cleared

From vdsm.log:
2017-07-16 21:43:20,465+0300 ERROR (tasks/0) [storage.TaskManager.Task] (Task='9c49dc86-5f8f-46ea-be94-d2ca92386dbc') Unexpected error (task:870)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 877, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 398, in run
    [self.moduleName])
ImportError: No module named storage.sd
2017-07-16 21:43:20,466+0300 WARN (tasks/0) [storage.TaskManager.Task] (Task='9c49dc86-5f8f-46ea-be94-d2ca92386dbc') task 9c49dc86-5f8f-46ea-be94-d2ca92386dbc: recovery failed: Task is aborted: u'9c49dc86-5f8f-46ea-be94-d2ca92386dbc' - code 411 (task:815)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 812, in _recover
    self._run(rec.run)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 890, in _run
    self.stop()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 1251, in stop
    self._incref(force)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 983, in _incref
    raise se.TaskAborted(unicode(self))
TaskAborted: Task is aborted: u'9c49dc86-5f8f-46ea-be94-d2ca92386dbc' - code 411

Logs attached
Created attachment 1300474 [details]
new_logs
(In reply to Allon Mureinik from comment #1)
> (In reply to Raz Tamir from comment #0)
> > How reproducible:
> > Don't know - ~ 20%
> This doesn't make any sense to me.
> Assuming the file path that holds the module is on some flaky network
> storage, I'd expect imports to either consistently succeed or consistently
> fail.
>
> Nir - any bright ideas?

This is probably related to https://gerrit.ovirt.org/78070; we probably have bad code in task.py that does not import modules properly.
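For illustration only, here is a minimal Python sketch of the recovery-by-module-name pattern that the tracebacks above go through. It is not the actual vdsm code: run_recovery(), some_rollback and the persisted entry shown in the comments are hypothetical, and the assumption (per comment #3) is that the recovery metadata carries a dotted module name recorded before the storage modules moved under the vdsm.storage package:

    # Hypothetical sketch of re-importing a persisted recovery module by name.
    # Only the module names quoted from the logs ("storage.volume",
    # "vdsm.storage.volume") come from this bug; everything else is assumed.
    import importlib

    def run_recovery(module_name, function_name, params):
        # Task recovery metadata stores a dotted module name and re-imports it
        # when the task is recovered after a failure.
        mod = importlib.import_module(module_name)
        return getattr(mod, function_name)(*params)

    # A recovery entry persisted under the old, pre-move name fails on re-import:
    #   run_recovery("storage.volume", "some_rollback", [sd_uuid, img_uuid, vol_uuid])
    #   -> ImportError ("No module named storage.volume", as in the log above)
    # while the same module now imports only under its new package path:
    #   importlib.import_module("vdsm.storage.volume")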
Raz, the attached patch fixes the import error. That may be enough to fix the issue, or it may reveal the real issue behind it. Can you run the failing tests with this patch to verify?
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
Raz, can you update the bug with the results from the build when it's done?
The test execution finished with the proposed patch, and there is no trace of the error described in comment #0. Nir, do you need any additional information?
Thanks Raz, the fix was merged.
Verified on ovirt-engine-4.2.0-0.0.master.20170725202023.git561151b.el7.centos
(In reply to Raz Tamir from comment #11)
> Verified on
> ovirt-engine-4.2.0-0.0.master.20170725202023.git561151b.el7.centos

The bug is opened against the vdsm product but has been verified on an ovirt-engine build. Please ensure the correct product is set for this bug, or move it back to QE for testing against the correct product.
This bug is included in the oVirt 4.2.0 release, published on Dec 20th 2017. Since the problem described in this bug report should be resolved in that release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.
(In reply to Raz Tamir from comment #11)
> Verified on
> ovirt-engine-4.2.0-0.0.master.20170725202023.git561151b.el7.centos

Raz, can you update the vdsm version tested? (see comment 12)
Not really. It is really old and I can't find it.