Description of problem:
importing vm/template fails:
http://jenkins.qa.lab.tlv.redhat.com:8080/view/Storage/view/3.4/job/3.4-storage_full_import_export-iscsi/97/console

at 2014-11-09 10:40:07,615 - Import vm fails
at 2014-11-09 12:11:16,664 - Import template fails

Version-Release number of selected component (if applicable):
av13

How reproducible:

Steps to Reproduce:
1. import vm/template
2.
3.

Actual results:
failed to import vm/template

Expected results:

Additional info:
Created attachment 955693 [details] engine log
Created attachment 955694 [details]
engine log

Logs attached.
Search for '2014-11-09 10:40:07' to see the error for the import VM failure.
Search for '2014-11-09 12:11:16,664' to see the error for the import template failure.
For the import template, search for '2014-11-09 12:11:16'
yeah, well, exception on the host side. vdsm logs?
Created attachment 955707 [details] vdsm log
vdsm log is full of storage-related errors. Moving to storage for deeper investigation.

Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 603, in _updateState
    self.persist()
  File "/usr/share/vdsm/storage/task.py", line 1131, in persist
    self._save(self.store)
  File "/usr/share/vdsm/storage/task.py", line 750, in _save
    raise se.TaskDirError("_save: no such task dir '%s'" % origTaskDir)
TaskDirError: can't find/access task dir: ("_save: no such task dir '/rhev/data-center/aa0d7c86-f0e5-493b-905c-8e0a266fb9dc/mastersd/master/tasks/9d9453d5-6301-48dd-94e8-004b42342887'",)
VDSM receives SIGTERM, which triggers hsm.prepareForShutdown. That runs cleanupMasterMount while there are still running tasks, which leads to the error below and the failure of those tasks.

Postponing to 3.6; we may switch the order of operations (first end the tasks, then clean up the master mount).

----------------------------
MainThread::DEBUG::2014-11-09 12:30:41,243::sp::378::Storage.StoragePool::(cleanupMasterMount) unmounting /rhev/data-center/mnt/blockSD/acaff1df-69a7-4b57-9fc6-061e9effcced/master
MainThread::DEBUG::2014-11-09 12:30:42,278::mount::202::Storage.Misc.excCmd::(_runcmd) '/usr/bin/sudo -n /bin/umount /rhev/data-center/mnt/blockSD/acaff1df-69a7-4b57-9fc6-061e9effcced/master' (cwd None)
----------------------------
823eedac-6660-441d-9c53-e010abd83fc4::ERROR::2014-11-09 12:30:42,546::volume::505::Storage.Volume::(create) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/volume.py", line 491, in create
    map(str, metaId))
  File "/usr/share/vdsm/storage/task.py", line 1060, in pushRecovery
    self.persist()
  File "/usr/share/vdsm/storage/task.py", line 1131, in persist
    self._save(self.store)
  File "/usr/share/vdsm/storage/task.py", line 750, in _save
    raise se.TaskDirError("_save: no such task dir '%s'" % origTaskDir)
TaskDirError: can't find/access task dir: ("_save: no such task dir '/rhev/data-center/aa0d7c86-f0e5-493b-905c-8e0a266fb9dc/mastersd/master/tasks/823eedac-6660-441d-9c53-e010abd83fc4'",)
----------------------------
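To make the ordering problem concrete, here is a minimal, self-contained sketch (hypothetical names, not the actual vdsm code): it models a task store that lives under the master mount, so removing the mount before running tasks have persisted reproduces the same "no such task dir" failure, while persisting first avoids it.

# Illustrative sketch only -- hypothetical names, not the vdsm API.
import os
import shutil
import tempfile


class Task:
    def __init__(self, store, task_id):
        self.store = store
        self.task_id = task_id

    def persist(self):
        task_dir = os.path.join(self.store, self.task_id)
        if not os.path.isdir(task_dir):
            # Mirrors the se.TaskDirError raised in task.py:_save()
            raise RuntimeError("_save: no such task dir '%s'" % task_dir)
        # ... task metadata would be written here ...


def cleanup_master_mount(master_dir):
    # Stands in for cleanupMasterMount(): the umount takes the tasks
    # directory away together with the rest of the master mount.
    shutil.rmtree(master_dir)


def shutdown_current_order(tasks, master_dir):
    # Current order: unmount first, so later persist() calls fail.
    cleanup_master_mount(master_dir)
    for t in tasks:
        t.persist()


def shutdown_proposed_order(tasks, master_dir):
    # Proposed order: let tasks persist first, then clean up the mount.
    for t in tasks:
        t.persist()
    cleanup_master_mount(master_dir)


if __name__ == "__main__":
    master = tempfile.mkdtemp()
    store = os.path.join(master, "tasks")
    os.makedirs(os.path.join(store, "task-1"))
    try:
        shutdown_current_order([Task(store, "task-1")], master)
    except RuntimeError as e:
        print("current order fails:", e)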
We have 10 seconds before vdsm is killed, and we must unmount the master mount. In 4.0 we will not have a master mount or task persistence, so this problem will go away.
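For illustration only (assumed grace period and hypothetical helpers, not vdsm code), this is roughly what "end the tasks first" would have to look like under that 10-second budget: any waiting on tasks must be bounded so the master mount is still unmounted before the process is killed.

# Illustrative sketch, not vdsm code. Assumed: a ~10 second grace period
# between SIGTERM and a forced kill, and tasks that expose a join()-style
# wait (e.g. threading.Thread workers).
import signal
import time

SHUTDOWN_GRACE = 10.0  # assumed grace period in seconds


def prepare_for_shutdown(tasks, unmount_master):
    # Keep a margin so the unmount itself still fits inside the budget.
    deadline = time.monotonic() + SHUTDOWN_GRACE - 1.0
    for task in tasks:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break                      # out of budget; stop waiting for tasks
        task.join(timeout=remaining)   # bounded wait for each running task
    unmount_master()                   # must run even if some tasks did not finish


def install_sigterm_handler(tasks, unmount_master):
    def handler(signum, frame):
        prepare_for_shutdown(tasks, unmount_master)
    signal.signal(signal.SIGTERM, handler)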
Any functional impact to this other than errors in the log? How risky is the patch?
Besides the errors there shouldn't be any actual (verifiable) impact, other than different behavior for file and block domains (this bug is relevant for block storage only). As I understood from nsoffer, we previously used to kill vdsm instantly, whereas now it has 10 seconds from the time the signal is received; that means we managed before without this function's code even running. We can leave this code as is, so as not to add operations before the mount removal, and let the issue be solved by the removal of SPM/tasks storage persistence.
Pushing out based on that comment.
This bug was accidentally moved from POST to MODIFIED via an error in automation, please see mmccune with any questions
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has already been released and the bug is not ON_QA.
The patch was abandoned, returning to NEW.
oVirt 4.0 beta has been released, moving to RC milestone.
This is a rare corner case from 3.4; closing as won't fix after discussion with PM and QE.