Bug 1170202

Summary: [PPC] Failed to attach NFS storage: Error while executing action Attach Storage Domain: AcquireHostIdFailure
Product: Red Hat Enterprise Virtualization Manager
Reporter: Lukas Svaty <lsvaty>
Component: ovirt-engine
Assignee: Liron Aravot <laravot>
Status: CLOSED WONTFIX
QA Contact: Aharon Canan <acanan>
Severity: urgent
Docs Contact:
Priority: medium
Version: 3.4.3
CC: acanan, amureini, ecohen, gklein, iheim, laravot, lpeer, lsurette, lsvaty, michal.skrivanek, nsoffer, rbalakri, Rhev-m-bugs, scohen, tnisan, yeylon
Target Milestone: ---
Keywords: Reopened
Target Release: 3.4.5
Hardware: ppc64
OS: Linux
Whiteboard: storage
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1160204
Environment:
Last Closed: 2015-01-12 10:31:07 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1160204
Bug Blocks: 1122979

Comment 18 Liron Aravot 2014-12-14 11:17:46 UTC
In the provided log I can see the following:
1. A domain is created:
Thread-13::DEBUG::2014-12-05 11:45:02,591::resourceManager::421::ResourceManager::(registerNamespace) Registering namespace '00968ef6-441e-4313-9fbe-e49b08be657c_volumeNS'
Thread-13::DEBUG::2014-12-05 11:45:02,591::clusterlock::144::initSANLock::(initSANLock) Initializing SANLock for domain 00968ef6-441e-4313-9fbe-e49b08be657c
Thread-13::DEBUG::2014-12-05 11:45:02,705::sd::434::Storage.StorageDomain::(initSPMlease) lease initialized successfully
Thread-13::DEBUG::2014-12-05 11:45:02,705::hsm::2647::Storage.HSM::(createStorageDomain) knownSDs: {00968ef6-441e-4313-9fbe-e49b08be657c: storage.nfsSD.findDomain}

The initSANLock step performs the following:
        # Write the sanlock lockspace for the domain into the ids file
        sanlock.init_lockspace(sdUUID, idsPath)
        # Write the SDM lease resource into the leases file at the SDM offset
        sanlock.init_resource(sdUUID, SDM_LEASE_NAME,
                              [(leasesPath, SDM_LEASE_OFFSET)])

2. We disconnect from the domain's storage server.


3. We connect to the domain's storage server again.

4. We immediately attempt createStoragePool with that domain as the master, and it fails:

Thread-13::INFO::2014-12-05 11:45:03,963::logUtils::44::dispatcher::(wrapper) Run and protect: createStoragePool(poolType=None, spUUID='195047ba-93e1-4835-8287-61fa5b7fd1be', poolName='DC_NEW', masterDom='00968ef6-441e-4313-9fbe-e49b08be657c', domList=['00968ef6-441e-4313-9fbe-e49b08be657c'], masterVersion=1, lockPolicy=None, lockRenewalIntervalSec=5, leaseTimeSec=60, ioOpTimeoutSec=10, leaseRetries=3, options=None)


Thread-13::INFO::2014-12-05 11:45:04,036::clusterlock::184::SANLock::(acquireHostId) Acquiring host id for domain 00968ef6-441e-4313-9fbe-e49b08be657c (id: 250)
Thread-13::ERROR::2014-12-05 11:45:05,037::task::866::TaskManager.Task::(_setError) Task=`ba7f2e70-5313-417e-be34-40b4fb4ffc85`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 988, in createStoragePool
    leaseParams)
  File "/usr/share/vdsm/storage/sp.py", line 573, in create
    self._acquireTemporaryClusterLock(msdUUID, leaseParams)
  File "/usr/share/vdsm/storage/sp.py", line 515, in _acquireTemporaryClusterLock
    msd.acquireHostId(self.id)
  File "/usr/share/vdsm/storage/sd.py", line 468, in acquireHostId
    self._clusterLock.acquireHostId(hostId, async)
  File "/usr/share/vdsm/storage/clusterlock.py", line 199, in acquireHostId
    raise se.AcquireHostIdFailure(self._sdUUID, e)
AcquireHostIdFailure: Cannot acquire host id: ('00968ef6-441e-4313-9fbe-e49b08be657c', SanlockException(19, 'Sanlock lockspace add failure', 'No such device'))
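
For context, the step that fails is adding the host into the domain's sanlock lockspace. A minimal sketch of what SANLock.acquireHostId in clusterlock.py boils down to (not the actual vdsm code; the function shape, the idsPath name and the import path are assumptions for illustration):

    import sanlock
    import storage_exception as se  # vdsm's storage exception module (import path assumed)

    def acquire_host_id(sdUUID, hostId, idsPath):
        # Join the sanlock lockspace named after the domain, backed by the
        # domain's dom_md/ids file. This is the call that fails here with
        # SanlockException(19, ..., 'No such device').
        try:
            sanlock.add_lockspace(sdUUID, hostId, idsPath)
        except sanlock.SanlockException as e:
            # vdsm wraps the sanlock error, and the engine then reports it as
            # "Error while executing action Attach Storage Domain: AcquireHostIdFailure".
            raise se.AcquireHostIdFailure(sdUUID, e)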


In the sanlock log I see the following errors:
2014-12-05 11:45:04+0000 4032 [35940]: s1 lockspace 00968ef6-441e-4313-9fbe-e49b08be657c:250:/rhev/data-center/mnt/10.34.63.202:_mnt_export_nfs_lv3_lsvaty_ppc-nfs/00968ef6-441e-4313-9fbe-e49b08be657c/dom_md/ids:0
2014-12-05 11:45:04+0000 4032 [68886]: open error -13 /rhev/data-center/mnt/10.34.63.202:_mnt_export_nfs_lv3_lsvaty_ppc-nfs/00968ef6-441e-4313-9fbe-e49b08be657c/dom_md/ids
2014-12-05 11:45:04+0000 4032 [68886]: s1 open_disk /rhev/data-center/mnt/10.34.63.202:_mnt_export_nfs_lv3_lsvaty_ppc-nfs/00968ef6-441e-4313-9fbe-e49b08be657c/dom_md/ids error -13
2014-12-05 11:45:05+0000 4033 [35940]: s1 add_lockspace fail result -19
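
As a side note (decoding the codes, not something stated in the log itself): the negative values are errno numbers, so the open() failure on dom_md/ids is EACCES and the add_lockspace result maps to ENODEV, matching the 'No such device' text in the SanlockException above. A quick way to check:

    import errno
    import os

    # open error -13 on .../dom_md/ids
    print(errno.errorcode[13], os.strerror(13))   # EACCES Permission denied
    # s1 add_lockspace fail result -19
    print(errno.errorcode[19], os.strerror(19))   # ENODEV No such device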


Lukas, can you please check the files under /dom_md and attach the output?
Does this fail consistently? After the failure you should still have the domain - what happens if you try to create a storage pool with it again? Does it always happen with this domain?

Nir, any suggestion based on other related issues you've handled?

Comment 22 Michal Skrivanek 2015-01-12 10:31:07 UTC
Right. 3.4.3 on ppc is not compatible with 3.4.4, and vice versa.