Bug 790331 - fenceSpmStorage raises SecureError, it breaks manual fencing
Summary: fenceSpmStorage raises SecureError, it breaks manual fencing
Keywords:
Status: CLOSED DUPLICATE of bug 787594
Alias: None
Product: oVirt
Classification: Retired
Component: vdsm
Version: unspecified
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ---
: ---
Assignee: Dan Kenigsberg
QA Contact: yeylon@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2012-02-14 09:39 UTC by Jakub Libosvar
Modified: 2016-04-18 06:43 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-02-14 10:01:19 UTC
oVirt Team: ---


Attachments (Terms of Use)
vdsm.log (367.55 KB, text/x-log)
2012-02-14 09:39 UTC, Jakub Libosvar
no flags Details

Description Jakub Libosvar 2012-02-14 09:39:44 UTC
Created attachment 561842 [details]
vdsm.log

Description of problem:
When 'Confirm Host has been rebooted' is clicked in the GUI on Non-responsive host, it fails in vdsm:

Thread-355::DEBUG::2012-02-14 10:21:48,728::clientIF::261::Storage.Dispatcher.Protect::(wrapper) [10.34.63.208]
Thread-355::DEBUG::2012-02-14 10:21:48,729::task::588::TaskManager.Task::(_updateState) Task=`a4180cae-d496-4f61-aa62-ee961982b870`::moving from state init -> state preparing
Thread-355::INFO::2012-02-14 10:21:48,729::logUtils::37::dispatcher::(wrapper) Run and protect: fenceSpmStorage(spUUID='b325b938-526e-11e1-aa15-001a4a013f0a', lastOwner=1, lastLver=2, options=None)
Thread-355::DEBUG::2012-02-14 10:21:48,730::resourceManager::175::ResourceManager.Request::(__init__) ResName=`Storage.b325b938-526e-11e1-aa15-001a4a013f0a`ReqID=`19bf27c2-4b18-45f7-b900-cde2f32ecba0`::Request was made in '/usr/share/vdsm/storage/hsm.py' line '2633' at 'fenceSpmStorage'
Thread-355::DEBUG::2012-02-14 10:21:48,730::resourceManager::483::ResourceManager::(registerResource) Trying to register resource 'Storage.b325b938-526e-11e1-aa15-001a4a013f0a' for lock type 'exclusive'
Thread-355::DEBUG::2012-02-14 10:21:48,731::resourceManager::525::ResourceManager::(registerResource) Resource 'Storage.b325b938-526e-11e1-aa15-001a4a013f0a' is free. Now locking as 'exclusive' (1 active user)
Thread-355::DEBUG::2012-02-14 10:21:48,731::resourceManager::212::ResourceManager.Request::(grant) ResName=`Storage.b325b938-526e-11e1-aa15-001a4a013f0a`ReqID=`19bf27c2-4b18-45f7-b900-cde2f32ecba0`::Granted request
Thread-355::DEBUG::2012-02-14 10:21:48,732::task::819::TaskManager.Task::(resourceAcquired) Task=`a4180cae-d496-4f61-aa62-ee961982b870`::_resourcesAcquired: Storage.b325b938-526e-11e1-aa15-001a4a013f0a (exclusive)
Thread-355::DEBUG::2012-02-14 10:21:48,732::task::980::TaskManager.Task::(_decref) Task=`a4180cae-d496-4f61-aa62-ee961982b870`::ref 1 aborting False
Thread-355::ERROR::2012-02-14 10:21:48,733::task::855::TaskManager.Task::(_setError) Task=`a4180cae-d496-4f61-aa62-ee961982b870`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 863, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2634, in fenceSpmStorage
    pool.forceFreeSpm()
  File "/usr/share/vdsm/storage/sp.py", line 150, in forceFreeSpm
    self.setMetaParams({PMDK_SPM_ID: -1, PMDK_LVER: -1})
  File "/usr/share/vdsm/storage/securable.py", line 78, in wrapper
    raise SecureError()
SecureError
Thread-355::DEBUG::2012-02-14 10:21:48,734::task::874::TaskManager.Task::(_run) Task=`a4180cae-d496-4f61-aa62-ee961982b870`::Task._run: a4180cae-d496-4f61-aa62-ee961982b870 ('b325b938-526e-11e1-aa15-001a4a013f0a', 1, 2) {} failed - stopping task
Thread-355::DEBUG::2012-02-14 10:21:48,734::task::1201::TaskManager.Task::(stop) Task=`a4180cae-d496-4f61-aa62-ee961982b870`::stopping in state preparing (force False)
Thread-355::DEBUG::2012-02-14 10:21:48,734::task::980::TaskManager.Task::(_decref) Task=`a4180cae-d496-4f61-aa62-ee961982b870`::ref 1 aborting True
Thread-355::INFO::2012-02-14 10:21:48,735::task::1159::TaskManager.Task::(prepare) Task=`a4180cae-d496-4f61-aa62-ee961982b870`::aborting: Task is aborted: '' - code 100
Thread-355::DEBUG::2012-02-14 10:21:48,735::task::1164::TaskManager.Task::(prepare) Task=`a4180cae-d496-4f61-aa62-ee961982b870`::Prepare: aborted:
Thread-355::DEBUG::2012-02-14 10:21:48,736::task::980::TaskManager.Task::(_decref) Task=`a4180cae-d496-4f61-aa62-ee961982b870`::ref 0 aborting True
Thread-355::DEBUG::2012-02-14 10:21:48,736::task::915::TaskManager.Task::(_doAbort) Task=`a4180cae-d496-4f61-aa62-ee961982b870`::Task._doAbort: force False
Thread-355::DEBUG::2012-02-14 10:21:48,736::resourceManager::841::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-355::DEBUG::2012-02-14 10:21:48,737::task::588::TaskManager.Task::(_updateState) Task=`a4180cae-d496-4f61-aa62-ee961982b870`::moving from state preparing -> state aborting
Thread-355::DEBUG::2012-02-14 10:21:48,737::task::537::TaskManager.Task::(__state_aborting) Task=`a4180cae-d496-4f61-aa62-ee961982b870`::_aborting: recover policy none
Thread-355::DEBUG::2012-02-14 10:21:48,737::task::588::TaskManager.Task::(_updateState) Task=`a4180cae-d496-4f61-aa62-ee961982b870`::moving from state aborting -> state failed
Thread-355::DEBUG::2012-02-14 10:21:48,738::resourceManager::806::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {'Storage.b325b938-526e-11e1-aa15-001a4a013f0a': < ResourceRef 'Storage.b325b938-526e-11e1-aa15-001a4a013f0a', isValid: 'True' obj: 'None'>}
Thread-355::DEBUG::2012-02-14 10:21:48,738::resourceManager::841::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-355::DEBUG::2012-02-14 10:21:48,738::resourceManager::535::ResourceManager::(releaseResource) Trying to release resource 'Storage.b325b938-526e-11e1-aa15-001a4a013f0a'
Thread-355::DEBUG::2012-02-14 10:21:48,739::resourceManager::550::ResourceManager::(releaseResource) Released resource 'Storage.b325b938-526e-11e1-aa15-001a4a013f0a' (0 active users)
Thread-355::DEBUG::2012-02-14 10:21:48,739::resourceManager::555::ResourceManager::(releaseResource) Resource 'Storage.b325b938-526e-11e1-aa15-001a4a013f0a' is free, finding out if anyone is waiting for it.
Thread-355::DEBUG::2012-02-14 10:21:48,739::resourceManager::562::ResourceManager::(releaseResource) No one is waiting for resource 'Storage.b325b938-526e-11e1-aa15-001a4a013f0a', Clearing records.
Thread-355::ERROR::2012-02-14 10:21:48,740::dispatcher::93::Storage.Dispatcher.Protect::(run)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/dispatcher.py", line 85, in run
    result = ctask.prepare(self.func, *args, **kwargs)
  File "/usr/share/vdsm/storage/task.py", line 1166, in prepare
    raise self.error
SecureError


Version-Release number of selected component (if applicable):
vdsm-4.9.3.2-0.fc16.x86_64
ovirt-engine-3.0.0_0001-1.6.fc16.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Have two hosts in datacenter, running VMs on SPM
2. Power off SPM host
3. After host becomes Non-responsive, try to Confirm that host has been rebooted
  
Actual results:
Host stays SPM

Expected results:
Host releases SPM and VMs go from Unknown to Down status

Additional info:
Nothing in SPM log

From backend log:

2012-02-14 10:32:47,401 INFO  [org.ovirt.engine.core.bll.storage.FenceVdsManualyCommand] (pool-5-thread-46) Lock Acquired to object EngineLock [exclusiveLocks= key: org.ovirt.engine.core.bll.storage.FenceVdsManualyCommand value: fd776e58-52f2-11e1-b6a6-001a4a013f0a
, sharedLocks= ]
2012-02-14 10:32:47,409 INFO  [org.ovirt.engine.core.bll.storage.FenceVdsManualyCommand] (pool-5-thread-46) Running command: FenceVdsManualyCommand internal: false. Entities affected :  ID: fd776e58-52f2-11e1-b6a6-001a4a013f0a Type: VDS
2012-02-14 10:32:47,457 INFO  [org.ovirt.engine.core.bll.storage.FenceVdsManualyCommand] (pool-5-thread-46) Trying to fence spm srh-11 via vds srh-03
2012-02-14 10:32:47,460 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceSpmStorageVDSCommand] (pool-5-thread-46) START, FenceSpmStorageVDSCommand(vdsId = 6819bdaa-56eb-11e1-953e-001a4a013f0a, storagePoolId = b325b938-526e-11e1-aa15-001a4a013f0a, prevId=1, prevLVER=2), log id: 3b26ad9a
2012-02-14 10:32:47,481 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-5-thread-46) Command org.ovirt.engine.core.vdsbroker.vdsbroker.FenceSpmStorageVDSCommand return value 
 Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc
mStatus                       Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc
mCode                         654
mMessage                      Not SPM


2012-02-14 10:32:47,481 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-5-thread-46) Vds: srh-03
2012-02-14 10:32:47,482 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-5-thread-46) Command FenceSpmStorageVDS execution failed. Exception: IRSNonOperationalException: IRSGenericException: IRSErrorException: IRSNonOperationalException: Not SPM
2012-02-14 10:32:47,482 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceSpmStorageVDSCommand] (pool-5-thread-46) FINISH, FenceSpmStorageVDSCommand, log id: 3b26ad9a
2012-02-14 10:32:47,482 WARN  [org.ovirt.engine.core.bll.storage.FenceVdsManualyCommand] (pool-5-thread-46) Could not fence spm on vds srh-03

Comment 1 Yaniv Kaul 2012-02-14 09:48:32 UTC
Dup of https://bugzilla.redhat.com/show_bug.cgi?id=787594?

Comment 2 Jakub Libosvar 2012-02-14 10:01:19 UTC
(In reply to comment #1)
> Dup of https://bugzilla.redhat.com/show_bug.cgi?id=787594?

It is.

*** This bug has been marked as a duplicate of bug 787594 ***


Note You need to log in before you can comment on or make changes to this bug.