Bug 1161934 - When VDSM is in prepareForShutdown stage, operations order is not correct, because of attempt to access the storage after it's unmounted
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: ---
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
: ---
Assignee: Liron Aravot
QA Contact: Raz Tamir
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-11-09 14:45 UTC by Raz Tamir
Modified: 2022-03-07 08:37 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-02-06 12:30:30 UTC
oVirt Team: Storage
Embargoed:


Attachments
engine log (966.75 KB, text/plain)
2014-11-10 08:25 UTC, Raz Tamir
engine log (966.75 KB, text/plain)
2014-11-10 08:26 UTC, Raz Tamir
vdsm log (7.13 MB, text/plain)
2014-11-10 09:19 UTC, Raz Tamir


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHV-45039 0 None None None 2022-03-07 08:37:00 UTC
oVirt gerrit 36162 0 master ABANDONED hsm: prepareForShutdown - operations order 2016-04-10 00:55:35 UTC

Description Raz Tamir 2014-11-09 14:45:43 UTC
Description of problem:
importing vm/template fails:

http://jenkins.qa.lab.tlv.redhat.com:8080/view/Storage/view/3.4/job/3.4-storage_full_import_export-iscsi/97/console

at 2014-11-09 10:40:07,615 - Import vm fails
at 2014-11-09 12:11:16,664 - Import template fails


Version-Release number of selected component (if applicable):
av13

How reproducible:


Steps to Reproduce:
1. import vm/template
2.
3.

Actual results:
failed to import vm/template

Expected results:


Additional info:

Comment 2 Raz Tamir 2014-11-10 08:25:52 UTC
Created attachment 955693 [details]
engine log

Comment 3 Raz Tamir 2014-11-10 08:26:26 UTC
Created attachment 955694 [details]
engine log

Logs attached.
search for '2014-11-09 10:40:07' to see the error for import vm failure
search for '2014-11-09 12:11:16,664' to see the error for import template failure

Comment 4 Raz Tamir 2014-11-10 08:27:23 UTC
For the import template, search for '2014-11-09 12:11:16'

Comment 5 Michal Skrivanek 2014-11-10 08:47:57 UTC
yeah, well, exception on the host side. vdsm logs?

Comment 6 Raz Tamir 2014-11-10 09:19:17 UTC
Created attachment 955707 [details]
vdsm log

Comment 7 Michal Skrivanek 2014-11-10 09:30:16 UTC
vdsm log is full of storage-related errors. Moving to storage for deeper investigation

Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 603, in _updateState
    self.persist()
  File "/usr/share/vdsm/storage/task.py", line 1131, in persist
    self._save(self.store)
  File "/usr/share/vdsm/storage/task.py", line 750, in _save
    raise se.TaskDirError("_save: no such task dir '%s'" % origTaskDir)
TaskDirError: can't find/access task dir: ("_save: no such task dir '/rhev/data-center/aa0d7c86-f0e5-493b-905c-8e0a266fb9dc/mastersd/master/tasks/9d9453d5-6301-48dd-94e8-004b42342887'",)

Comment 8 Liron Aravot 2014-12-14 22:24:20 UTC
VDSM receives SIGTERM and runs hsm.prepareForShutdown, which executes cleanupMasterMount while tasks are still running. This leads to the error shown in the logs below and to the failure of those tasks.
Postponing to 3.6; we may switch the order of operations (first end the tasks, then clean up the master mount).

----------------------------

MainThread::DEBUG::2014-11-09 12:30:41,243::sp::378::Storage.StoragePool::(cleanupMasterMount) unmounting /rhev/data-center/mnt/blockSD/acaff1df-69a7-4b57-9fc6-061e9effcced/master

MainThread::DEBUG::2014-11-09 12:30:42,278::mount::202::Storage.Misc.excCmd::(_runcmd) '/usr/bin/sudo -n /bin/umount /rhev/data-center/mnt/blockSD/acaff1df-69a7-4b57-9fc6-061e9effcced/master' (cwd None)
----------------------------

823eedac-6660-441d-9c53-e010abd83fc4::ERROR::2014-11-09 12:30:42,546::volume::505::Storage.Volume::(create) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/volume.py", line 491, in create
    map(str, metaId))
  File "/usr/share/vdsm/storage/task.py", line 1060, in pushRecovery
    self.persist()
  File "/usr/share/vdsm/storage/task.py", line 1131, in persist
    self._save(self.store)
  File "/usr/share/vdsm/storage/task.py", line 750, in _save
    raise se.TaskDirError("_save: no such task dir '%s'" % origTaskDir)
TaskDirError: can't find/access task dir: ("_save: no such task dir '/rhev/data-center/aa0d7c86-f0e5-493b-905c-8e0a266fb9dc/mastersd/master/tasks/823eedac-6660-441d-9c53-e010abd83fc4'",)
----------------------------
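
The proposed reordering (first end the tasks, then clean up the master mount) can be illustrated with a minimal sketch. This is not vdsm code: FakeMasterMount, run_task and prepare_for_shutdown are hypothetical stand-ins, and the RuntimeError only imitates the TaskDirError seen in the log above.

```python
import threading
import time


class FakeMasterMount:
    """Hypothetical stand-in for the master mount; not vdsm's real class."""

    def __init__(self):
        self.mounted = True

    def save_task_state(self, task_id):
        # Imitates the logged failure: persisting a task needs the mount.
        if not self.mounted:
            raise RuntimeError("no such task dir for task %s" % task_id)

    def unmount(self):
        self.mounted = False


def run_task(task_id, mount, stop_event, errors):
    """Persist task state repeatedly until asked to stop."""
    try:
        while not stop_event.is_set():
            mount.save_task_state(task_id)
            time.sleep(0.01)
    except RuntimeError as exc:
        errors.append(str(exc))


def prepare_for_shutdown(mount, threads, stop_event):
    """Stop the tasks first, and only then unmount."""
    stop_event.set()
    for t in threads:
        t.join()
    mount.unmount()


mount = FakeMasterMount()
stop_event = threading.Event()
errors = []
threads = [
    threading.Thread(target=run_task, args=(i, mount, stop_event, errors))
    for i in range(3)
]
for t in threads:
    t.start()
time.sleep(0.05)
prepare_for_shutdown(mount, threads, stop_event)
```

With this order no task can reach save_task_state after the unmount, so errors stays empty. Calling mount.unmount() before the joins would reproduce the kind of failure shown in the traceback.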

Comment 9 Nir Soffer 2015-11-01 18:53:50 UTC
We have 10 seconds before vdsm is killed, and we must unmount the master
mount.

In 4.0 we will not have a master mount or task persistence, so this problem
will go away.
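
A rough sketch of what the 10-second limit implies for any reordering: waiting for tasks has to be bounded so that the unmount still runs before the process is killed. The function name, the reserve parameter, and the thread-based tasks are assumptions made for illustration only; they do not correspond to vdsm code.

```python
import threading
import time


def shutdown_with_deadline(task_threads, unmount, budget=10.0, reserve=2.0):
    """Join tasks until (budget - reserve) seconds pass, then unmount.

    Returns the threads that were still alive when unmount was called.
    """
    deadline = time.monotonic() + max(budget - reserve, 0.0)
    for t in task_threads:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        t.join(timeout=remaining)
    stragglers = [t for t in task_threads if t.is_alive()]
    unmount()  # must run before the process is killed
    return stragglers


# Usage: one task that finishes at once, one that outlives the budget.
unmounted = []
release = threading.Event()
fast = threading.Thread(target=lambda: None)
slow = threading.Thread(target=release.wait, daemon=True)
fast.start()
slow.start()
left = shutdown_with_deadline(
    [fast, slow], lambda: unmounted.append(True), budget=0.3, reserve=0.1
)
release.set()  # let the slow thread exit after the demonstration
```

Tasks that do not finish within the budget would still hit the error described in this bug, which is why a bounded wait narrows the window rather than closing it.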

Comment 10 Yaniv Lavi 2015-11-02 12:19:11 UTC
Any functional impact to this other than errors in the log?
How risky is the patch?

Comment 11 Liron Aravot 2015-11-11 09:46:23 UTC
Besides the errors in the log, there should be no verifiable impact other than a difference in behavior between file and block domains (this bug is relevant to block storage only). As I understood from nsoffer, we previously killed vdsm instantly, whereas now it has 10 seconds from the time the signal is received; that means we managed before without this function's code running at all. We can leave the code as is, so that no operations are added before the mount removal, and let the issue be resolved by the removal of SPM/task persistence on storage.

Comment 12 Allon Mureinik 2015-11-11 11:43:55 UTC
Pushing out based on that comment.

Comment 13 Mike McCune 2016-03-28 22:54:17 UTC
This bug was accidentally moved from POST to MODIFIED by an error in automation. Please see mmccune with any questions.

Comment 14 Sandro Bonazzola 2016-05-02 09:49:25 UTC
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has already been released and the bug is not ON_QA.

Comment 15 Allon Mureinik 2016-05-23 12:35:30 UTC
The patch was abandoned, returning to NEW.

Comment 16 Yaniv Lavi 2016-05-23 13:13:42 UTC
oVirt 4.0 beta has been released, moving to RC milestone.

Comment 17 Tal Nisan 2017-02-06 12:30:30 UTC
This is a rare corner case from 3.4, closing as won't fix after discussion with PM and QE

