Bug 851151 - [LOG] sanlock error: "ReleaseHostIdFailure: Cannot release host id" when vdsm service is restarted during async task
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: unspecified
Hardware: x86_64 Linux
Priority: unspecified  Severity: low
Target Milestone: ---
Target Release: 3.3.0
Assigned To: Sergey Gotliv
QA Contact: Haim
Whiteboard: storage
Depends On:
Blocks:
Reported: 2012-08-23 07:32 EDT by Haim
Modified: 2016-02-10 12:39 EST (History)
CC List: 10 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-01-09 13:24:26 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
vdsm log (933.13 KB, application/x-gzip)
2012-08-23 07:34 EDT, Haim

Description Haim 2012-08-23 07:32:18 EDT
Description of problem:

- 1 host, 1 iSCSI storage domain
- start an async task of post-zeroing a disk
- the vdsmd service is restarted, and the following error appears:


Traceback (most recent call last):
  File "/usr/share/vdsm/storage/image.py", line 116, in delete
    self._delete(sdUUID, imgUUID, postZero, force)
  File "/usr/share/vdsm/storage/image.py", line 176, in _delete
    mod.deleteMultipleVolumes(sdUUID, volumes, postZero)
  File "/usr/share/vdsm/storage/blockVolume.py", line 675, in deleteMultipleVolumes
    _postZero(sdUUID, volumes)
  File "/usr/share/vdsm/storage/blockVolume.py", line 669, in _postZero
    raise se.VolumesZeroingError(lv.name)
VolumesZeroingError: Cannot zero out volume: ('_remove_me_YAiPSVOD_8567929d-19ce-43b1-962c-c7a3feaf44fc',)
2e935445-1be9-44d1-902d-260fe7f5873a::ERROR::2012-08-23 14:25:35,190::task::853::TaskManager.Task::(_setError) Task=`2e935445-1be9-44d1-902d-260fe7f5873a`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 320, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/securable.py", line 63, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1797, in deleteImage
    img.delete(sdUUID=sdUUID, imgUUID=imgUUID, postZero=postZero, force=force)
  File "/usr/share/vdsm/storage/image.py", line 116, in delete
    self._delete(sdUUID, imgUUID, postZero, force)
  File "/usr/share/vdsm/storage/image.py", line 176, in _delete
    mod.deleteMultipleVolumes(sdUUID, volumes, postZero)
  File "/usr/share/vdsm/storage/blockVolume.py", line 675, in deleteMultipleVolumes
    _postZero(sdUUID, volumes)
  File "/usr/share/vdsm/storage/blockVolume.py", line 669, in _postZero
    raise se.VolumesZeroingError(lv.name)
VolumesZeroingError: Cannot zero out volume: ('_remove_me_YAiPSVOD_8567929d-19ce-43b1-962c-c7a3feaf44fc',)
Thread-766::INFO::2012-08-23 14:25:35,199::storage_mailbox::446::Storage.MailBox.HsmMailMonitor::(run) HSM_MailboxMonitor - Incoming mail monitoring thread stopped, clearing outgoing mail
Thread-711::DEBUG::2012-08-23 14:25:35,217::domainMonitor::182::Storage.DomainMonitor::(_monitorDomain) Monitoring for domain bb9d4516-3af4-4fec-b730-988d21d6b6ba is stopping
2e935445-1be9-44d1-902d-260fe7f5873a::DEBUG::2012-08-23 14:25:35,219::task::872::TaskManager.Task::(_run) Task=`2e935445-1be9-44d1-902d-260fe7f5873a`::Task._run: 2e935445-1be9-44d1-902d-260fe7f5873a () {} failed - stopping task
Thread-766::INFO::2012-08-23 14:25:35,219::storage_mailbox::340::Storage.MailBox.HsmMailMonitor::(_sendMail) HSM_MailMonitor sending mail to SPM - ['/bin/dd', 'of=/rhev/data-center/4882ff40-ed1a-11e1-bcb8-001a4a16970e/mastersd/dom_md/inbox', 'iflag=fullblock', 'oflag=direct', 'conv=notrunc', 'bs=512', 'seek=8']
Thread-711::INFO::2012-08-23 14:25:35,220::safelease::183::SANLock::(releaseHostId) Releasing host id for domain bb9d4516-3af4-4fec-b730-988d21d6b6ba (id: 1)
2e935445-1be9-44d1-902d-260fe7f5873a::DEBUG::2012-08-23 14:25:35,220::task::1199::TaskManager.Task::(stop) Task=`2e935445-1be9-44d1-902d-260fe7f5873a`::stopping in state running (force False)
Thread-766::DEBUG::2012-08-23 14:25:35,221::__init__::1164::Storage.Misc.excCmd::(_log) '/bin/dd of=/rhev/data-center/4882ff40-ed1a-11e1-bcb8-001a4a16970e/mastersd/dom_md/inbox iflag=fullblock oflag=direct conv=notrunc bs=512 seek=8' (cwd None)
2e935445-1be9-44d1-902d-260fe7f5873a::DEBUG::2012-08-23 14:25:35,222::task::978::TaskManager.Task::(_decref) Task=`2e935445-1be9-44d1-902d-260fe7f5873a`::ref 1 aborting True
Thread-711::DEBUG::2012-08-23 14:25:35,229::domainMonitor::191::Storage.DomainMonitor::(_monitorDomain) Unable to release the host id 1 for the domain bb9d4516-3af4-4fec-b730-988d21d6b6ba
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 188, in _monitorDomain
    domain.releaseHostId(hostId, unused=True)
  File "/usr/share/vdsm/storage/sd.py", line 413, in releaseHostId
    self._clusterLock.releaseHostId(hostId, async, unused)
  File "/usr/share/vdsm/storage/safelease.py", line 190, in releaseHostId
    raise se.ReleaseHostIdFailure(self._sdUUID, e)
ReleaseHostIdFailure: Cannot release host id: ('bb9d4516-3af4-4fec-b730-988d21d6b6ba', SanlockException(16, 'Sanlock lockspace remove failure', 'Device or resource busy'))
2e935445-1be9-44d1-902d-260fe7f5873a::DEBUG::2012-08-23 14:25:35,230::task::903::TaskManager.Task::(_runJobs) Task=`2e935445-1be9-44d1-902d-260fe7f5873a`::aborting: Task is aborted: 'Cannot zero out volume' - code 374

I suspect it's related to the fact that we are doing some LV deactivation during prepareForShutdown, and then we get a nice set of exceptions.

user impact:

None
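The EBUSY failure in the traceback is consistent with sanlock's semantics: a lockspace cannot be removed while a resource acquired in it (here the SPM lease) is still held. A minimal toy model of that behavior (not the real sanlock bindings; all names illustrative):

```python
import errno

class SanlockException(Exception):
    """Mimics the (errno, message, strerror) shape seen in the log."""
    def __init__(self, num, msg, strerror):
        super().__init__(num, msg, strerror)
        self.errno = num

class ToyLockspace:
    """Toy model: rem_lockspace fails with EBUSY while any resource
    (e.g. the SPM lease) acquired in the lockspace is still held."""
    def __init__(self):
        self._held = set()
        self.removed = False

    def acquire(self, resource):
        self._held.add(resource)

    def release(self, resource):
        self._held.discard(resource)

    def rem_lockspace(self):
        if self._held:
            # Matches SanlockException(16, 'Sanlock lockspace remove
            # failure', 'Device or resource busy') from the log above.
            raise SanlockException(errno.EBUSY,
                                   'Sanlock lockspace remove failure',
                                   'Device or resource busy')
        self.removed = True
```

In this model, releasing the SPM lease first makes rem_lockspace succeed, which is exactly the ordering issue comment 3 below identifies in the shutdown flow.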
Comment 1 Haim 2012-08-23 07:34:36 EDT
Created attachment 606514 [details]
vdsm log
Comment 3 Federico Simoncelli 2012-08-28 17:04:07 EDT
From what I can see, the problem is that HSM.prepareForShutdown() doesn't include an explicit stopSpm call. The domainMonitor tries to release the lockspace, but it fails since the SPM resource is still in use. With safelease, HSM.__releaseLocks() (called in the prepareForShutdown flow) does something very similar to stopSpm.
Comment 8 Ayal Baron 2013-02-16 15:38:56 EST
Fede, does this still happen?
Comment 9 Federico Simoncelli 2013-03-13 05:30:34 EDT
(In reply to comment #8)
> Fede, does this still happen?

Yes. At the moment this message indicates that the vdsm service has been restarted without stopping the SPM first (safe anyway since sanlock will notice that the process has gone away and release it).
In the future (when the volume leases will be used) this message might become more frequent (on HSMs too) and we might want to find a way to silence it.
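One way to silence the expected case, sketched with the standard logging module (the shutting_down flag and the helper are hypothetical, not vdsm code): demote the failure to DEBUG when it is the known shutdown race, since sanlock reaps the lease anyway once the process is gone.

```python
import logging

log = logging.getLogger('Storage.DomainMonitor')

def release_host_id_quietly(release, host_id, shutting_down):
    """Call release(host_id); log the failure at DEBUG instead of
    ERROR when it is the expected shutdown race."""
    try:
        release(host_id)
        return True
    except Exception:
        level = logging.DEBUG if shutting_down else logging.ERROR
        log.log(level, "Unable to release host id %s", host_id,
                exc_info=True)
        return False
```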
Comment 10 Itamar Heim 2013-12-08 02:16:43 EST
Still relevant with the planned changes to SPM?
