Bug 851151 - [LOG] sanlock error: "ReleaseHostIdFailure: Cannot release host id" when vdsm service is restarted during async task
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: unspecified
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: 3.3.0
Assignee: Sergey Gotliv
QA Contact: Haim
URL:
Whiteboard: storage
Depends On:
Blocks:
 
Reported: 2012-08-23 11:32 UTC by Haim
Modified: 2016-02-10 17:39 UTC (History)
10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-01-09 18:24:26 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments
vdsm log (933.13 KB, application/x-gzip)
2012-08-23 11:34 UTC, Haim

Description Haim 2012-08-23 11:32:18 UTC
Description of problem:

- 1 host 1 storage domain iscsi
- start an async task of post-zeroing a disk
- vdsmd service is restarted, the following error appears:


Traceback (most recent call last):
  File "/usr/share/vdsm/storage/image.py", line 116, in delete
    self._delete(sdUUID, imgUUID, postZero, force)
  File "/usr/share/vdsm/storage/image.py", line 176, in _delete
    mod.deleteMultipleVolumes(sdUUID, volumes, postZero)
  File "/usr/share/vdsm/storage/blockVolume.py", line 675, in deleteMultipleVolumes
    _postZero(sdUUID, volumes)
  File "/usr/share/vdsm/storage/blockVolume.py", line 669, in _postZero
    raise se.VolumesZeroingError(lv.name)
VolumesZeroingError: Cannot zero out volume: ('_remove_me_YAiPSVOD_8567929d-19ce-43b1-962c-c7a3feaf44fc',)
2e935445-1be9-44d1-902d-260fe7f5873a::ERROR::2012-08-23 14:25:35,190::task::853::TaskManager.Task::(_setError) Task=`2e935445-1be9-44d1-902d-260fe7f5873a`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 320, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/securable.py", line 63, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1797, in deleteImage
    img.delete(sdUUID=sdUUID, imgUUID=imgUUID, postZero=postZero, force=force)
  File "/usr/share/vdsm/storage/image.py", line 116, in delete
    self._delete(sdUUID, imgUUID, postZero, force)
  File "/usr/share/vdsm/storage/image.py", line 176, in _delete
    mod.deleteMultipleVolumes(sdUUID, volumes, postZero)
  File "/usr/share/vdsm/storage/blockVolume.py", line 675, in deleteMultipleVolumes
    _postZero(sdUUID, volumes)
  File "/usr/share/vdsm/storage/blockVolume.py", line 669, in _postZero
    raise se.VolumesZeroingError(lv.name)
VolumesZeroingError: Cannot zero out volume: ('_remove_me_YAiPSVOD_8567929d-19ce-43b1-962c-c7a3feaf44fc',)
Thread-766::INFO::2012-08-23 14:25:35,199::storage_mailbox::446::Storage.MailBox.HsmMailMonitor::(run) HSM_MailboxMonitor - Incoming mail monitoring thread stopped, clearing outgoing mail
Thread-711::DEBUG::2012-08-23 14:25:35,217::domainMonitor::182::Storage.DomainMonitor::(_monitorDomain) Monitoring for domain bb9d4516-3af4-4fec-b730-988d21d6b6ba is stopping
2e935445-1be9-44d1-902d-260fe7f5873a::DEBUG::2012-08-23 14:25:35,219::task::872::TaskManager.Task::(_run) Task=`2e935445-1be9-44d1-902d-260fe7f5873a`::Task._run: 2e935445-1be9-44d1-902d-260fe7f5873a () {} failed - stopping task
Thread-766::INFO::2012-08-23 14:25:35,219::storage_mailbox::340::Storage.MailBox.HsmMailMonitor::(_sendMail) HSM_MailMonitor sending mail to SPM - ['/bin/dd', 'of=/rhev/data-center/4882ff40-ed1a-11e1-bcb8-001a4a16970e/mastersd/dom_md/inbox', 'iflag=fullblock', 'oflag=direct', 'conv=notrunc', 'bs=512', 'seek=8']
Thread-711::INFO::2012-08-23 14:25:35,220::safelease::183::SANLock::(releaseHostId) Releasing host id for domain bb9d4516-3af4-4fec-b730-988d21d6b6ba (id: 1)
2e935445-1be9-44d1-902d-260fe7f5873a::DEBUG::2012-08-23 14:25:35,220::task::1199::TaskManager.Task::(stop) Task=`2e935445-1be9-44d1-902d-260fe7f5873a`::stopping in state running (force False)
Thread-766::DEBUG::2012-08-23 14:25:35,221::__init__::1164::Storage.Misc.excCmd::(_log) '/bin/dd of=/rhev/data-center/4882ff40-ed1a-11e1-bcb8-001a4a16970e/mastersd/dom_md/inbox iflag=fullblock oflag=direct conv=notrunc bs=512 seek=8' (cwd None)
2e935445-1be9-44d1-902d-260fe7f5873a::DEBUG::2012-08-23 14:25:35,222::task::978::TaskManager.Task::(_decref) Task=`2e935445-1be9-44d1-902d-260fe7f5873a`::ref 1 aborting True
Thread-711::DEBUG::2012-08-23 14:25:35,229::domainMonitor::191::Storage.DomainMonitor::(_monitorDomain) Unable to release the host id 1 for the domain bb9d4516-3af4-4fec-b730-988d21d6b6ba
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 188, in _monitorDomain
    domain.releaseHostId(hostId, unused=True)
  File "/usr/share/vdsm/storage/sd.py", line 413, in releaseHostId
    self._clusterLock.releaseHostId(hostId, async, unused)
  File "/usr/share/vdsm/storage/safelease.py", line 190, in releaseHostId
    raise se.ReleaseHostIdFailure(self._sdUUID, e)
ReleaseHostIdFailure: Cannot release host id: ('bb9d4516-3af4-4fec-b730-988d21d6b6ba', SanlockException(16, 'Sanlock lockspace remove failure', 'Device or resource busy'))
2e935445-1be9-44d1-902d-260fe7f5873a::DEBUG::2012-08-23 14:25:35,230::task::903::TaskManager.Task::(_runJobs) Task=`2e935445-1be9-44d1-902d-260fe7f5873a`::aborting: Task is aborted: 'Cannot zero out volume' - code 374

I suspect it's related to the fact that we are doing some LV deactivation during prepareForShutdown, and then we get a nice set of exceptions.
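To make the failure mode in the traceback easier to follow, here is a minimal self-contained sketch of the release path. The names SanlockException, ReleaseHostIdFailure, and rem_lockspace mirror the traceback above, but the bodies are simplified stand-ins written for illustration, not actual vdsm or sanlock code:

```python
import errno

class SanlockException(Exception):
    def __init__(self, err, msg, strerror):
        super().__init__(err, msg, strerror)
        self.errno = err

class ReleaseHostIdFailure(Exception):
    pass

def rem_lockspace(sdUUID, host_id, busy=True):
    # Stand-in for the sanlock lockspace removal call: while a resource
    # (e.g. the SPM lease) is still held in the lockspace, sanlock
    # refuses the removal with EBUSY -- exactly what the log shows.
    if busy:
        raise SanlockException(errno.EBUSY,
                               'Sanlock lockspace remove failure',
                               'Device or resource busy')

def releaseHostId(sdUUID, hostId):
    # Mirrors safelease.py releaseHostId: wrap the low-level sanlock
    # error into the storage-level ReleaseHostIdFailure from the log.
    try:
        rem_lockspace(sdUUID, hostId)
    except SanlockException as e:
        raise ReleaseHostIdFailure(sdUUID, e)
```

Calling `releaseHostId('bb9d4516-...', 1)` while the lockspace is still busy reproduces the shape of the logged error: a ReleaseHostIdFailure carrying the domain UUID and the underlying SanlockException(16, ..., 'Device or resource busy').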

user impact:

None

Comment 1 Haim 2012-08-23 11:34:36 UTC
Created attachment 606514 [details]
vdsm log

Comment 3 Federico Simoncelli 2012-08-28 21:04:07 UTC
From what I can see, the problem is that HSM.prepareForShutdown() doesn't include an explicit stopSpm call. The domainMonitor tries to release the lockspace, but it fails since the SPM resource is still in use. With safelease, HSM.__releaseLocks() (called in the prepareForShutdown flow) does something very similar to stopSpm.
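The ordering issue described in this comment can be sketched as follows. All names here are illustrative stand-ins (not the actual vdsm API): the point is only that releasing held resources before removing the lockspace avoids the EBUSY failure.

```python
def remove_lockspace(held_resources):
    # Stand-in for sanlock lockspace removal: it succeeds only when no
    # resource in the lockspace is still held, otherwise EBUSY.
    if held_resources:
        raise OSError(16, 'Device or resource busy')
    return True

def prepare_for_shutdown(held_resources):
    # Release held resources first (the analogue of an explicit
    # stopSpm call), so the later lockspace removal cannot hit EBUSY.
    held_resources.clear()
    return remove_lockspace(held_resources)
```

With this ordering, `prepare_for_shutdown(['SPM lease'])` succeeds, whereas removing the lockspace while the SPM lease is still held fails with errno 16, matching the logged SanlockException.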

Comment 8 Ayal Baron 2013-02-16 20:38:56 UTC
Fede, does this still happen?

Comment 9 Federico Simoncelli 2013-03-13 09:30:34 UTC
(In reply to comment #8)
> Fede, does this still happen?

Yes. At the moment this message indicates that the vdsm service has been restarted without stopping the SPM first (safe anyway, since sanlock will notice that the process has gone away and release it).
In the future (when volume leases are in use) this message might become more frequent (on HSMs too), and we might want to find a way to silence it.

Comment 10 Itamar Heim 2013-12-08 07:16:43 UTC
still relevant with planned changes to spm?

