Bug 1268075 - AcquireHostIdFailure during storage domain import
Summary: AcquireHostIdFailure during storage domain import
Keywords:
Status: CLOSED DUPLICATE of bug 1269768
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 3.6.0.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ovirt-3.6.1
Target Release: 3.6.1
Assignee: Maor
QA Contact: Aharon Canan
URL:
Whiteboard: storage
Depends On:
Blocks:
 
Reported: 2015-10-01 18:37 UTC by Richard Neuboeck
Modified: 2020-02-16 07:13 UTC (History)
7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-11-09 13:55:12 UTC
oVirt Team: Storage
Embargoed:
ylavi: ovirt-3.6.z?
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?


Attachments
vdsm log showing the import exception (1.08 MB, application/x-xz) - 2015-10-01 18:37 UTC, Richard Neuboeck
sanlock log file (64.16 KB, text/plain) - 2015-10-01 18:37 UTC, Richard Neuboeck
ovirt hosted engine setup log (514.00 KB, text/plain) - 2015-10-01 18:38 UTC, Richard Neuboeck
ovirt hosted engine setup answer file (3.17 KB, text/plain) - 2015-10-01 18:38 UTC, Richard Neuboeck
Logs from the host system. (376.40 KB, application/x-bzip) - 2015-10-14 09:58 UTC, Richard Neuboeck
Logs from the engine system. (266.64 KB, application/x-bzip) - 2015-10-14 09:58 UTC, Richard Neuboeck
Screenshots of the WebUI during the import. (4.45 MB, application/x-bzip) - 2015-10-14 10:00 UTC, Richard Neuboeck
Logs from the Engine RC3 (224.57 KB, application/x-xz) - 2015-10-29 12:44 UTC, Richard Neuboeck
Logs from the Host RC3 (327.01 KB, application/x-xz) - 2015-10-29 12:44 UTC, Richard Neuboeck

Description Richard Neuboeck 2015-10-01 18:37:05 UTC
Created attachment 1079221 [details]
vdsm log showing the import exception

Description of problem:

After finishing a self-hosted engine setup and logging into the web UI, importing the storage domain the engine VM is running on fails with 'Error while executing action Attach Storage Domain: AcquireHostIdFailure'.


Version-Release number of selected component (if applicable):

ovirt-release36-001-0.5.beta.noarch
vdsm-gluster-4.17.8-0.el7.centos.noarch


How reproducible:

Every time I tried.


Steps to Reproduce:

Setup for the host:
- CentOS 7.1 minimal installation
- Following the steps to set up oVirt in the 3.6 RC notes http://www.ovirt.org/OVirt_3.6_Release_Notes and http://www.ovirt.org/Hosted_Engine_Howto#Fresh_Install

Storage:
- CentOS 7.1 minimal installation
- GlusterFS 3.7
- replica 3 volume set up according to this http://www.ovirt.org/Features/Self_Hosted_Engine_Gluster_Support

1. Finish the self hosted engine setup and log in to the web UI
2. Select 'Import Pre-Configured Domain' from the Storage tab
3. Change storage type to glusterfs, fill out the name and export path information (same as during the hosted-engine --deploy)
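
[Editor's note: steps 2-3 can also be driven through the engine REST API. Below is a minimal sketch using the Python SDK that shipped with oVirt 3.6 (ovirt-engine-sdk-python, v3 API); the engine URL, password, gluster address and export path are placeholders, not values taken from this report. 'cube-one' is the host name that appears in the engine log later in this bug.

# Sketch: import a pre-configured glusterfs storage domain through the
# oVirt 3.6 Python SDK (v3 API). Credentials and addresses are placeholders.
from ovirtsdk.api import API
from ovirtsdk.xml import params

api = API(url='https://engine.example.com/ovirt-engine/api',
          username='admin@internal', password='secret',
          insecure=True)  # lab setup only: skips TLS verification

# Importing (rather than creating) a domain: pass the storage
# coordinates without any format/initialization parameters.
api.storagedomains.add(params.StorageDomain(
    name='hosted_storage',
    type_='data',
    host=params.Host(name='cube-one'),  # host that mounts the volume
    storage=params.Storage(type_='glusterfs',
                           address='gluster.example.com',
                           path='/engine')))

# Attaching the domain to a datacenter is what triggers acquireHostId
# on it, which is where the failure below occurs.
dc = api.datacenters.get(name='Default')
dc.storagedomains.add(api.storagedomains.get(name='hosted_storage'))
]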


Actual results:

'Error while executing action Attach Storage Domain: AcquireHostIdFailure'


Expected results:

Imported storage domain.


Additional info:

vdsm.log on the host shows this error when trying to import the storage domain:
Thread-1640::ERROR::2015-10-01 15:08:39,788::task::866::Storage.TaskManager.Task::(_setError) Task=`2d6cafb6-1192-4b68-9592-269aaffa7da9`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 998, in createStoragePool
    leaseParams)
  File "/usr/share/vdsm/storage/sp.py", line 574, in create
    self._acquireTemporaryClusterLock(msdUUID, leaseParams)
  File "/usr/share/vdsm/storage/sp.py", line 506, in _acquireTemporaryClusterLock
    msd.acquireHostId(self.id)
  File "/usr/share/vdsm/storage/sd.py", line 532, in acquireHostId
    self._clusterLock.acquireHostId(hostId, async)
  File "/usr/share/vdsm/storage/clusterlock.py", line 234, in acquireHostId
    raise se.AcquireHostIdFailure(self._sdUUID, e)
AcquireHostIdFailure: Cannot acquire host id: (u'df673b5c-3c20-4cf5-a0c2-f9b55559d917', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))
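
[Editor's note: for orientation, the raise at clusterlock.py line 234 is a thin wrapper around the sanlock Python binding. A simplified sketch of that call path, paraphrased from the traceback above rather than copied from the exact vdsm 4.17 source:

import sanlock

class AcquireHostIdFailure(Exception):
    """Stand-in for vdsm's se.AcquireHostIdFailure in this sketch."""

def acquireHostId(sdUUID, hostId, idsPath):
    # Register this host in the domain's sanlock lockspace, which is
    # backed by the "ids" file inside the storage domain.
    try:
        sanlock.add_lockspace(sdUUID, hostId, idsPath)
    except sanlock.SanlockException as e:
        # errno 22 (EINVAL, 'Invalid argument') is what this report
        # shows; sanlock returns it e.g. when the lockspace is already
        # registered or the ids file on the volume is not usable.
        raise AcquireHostIdFailure(sdUUID, e)

The tuple in the exception message matches this structure: the domain UUID plus sanlock's own rendering of errno 22.]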

Comment 1 Richard Neuboeck 2015-10-01 18:37:37 UTC
Created attachment 1079222 [details]
sanlock log file

Comment 2 Richard Neuboeck 2015-10-01 18:38:11 UTC
Created attachment 1079223 [details]
ovirt hosted engine setup log

Comment 3 Richard Neuboeck 2015-10-01 18:38:41 UTC
Created attachment 1079224 [details]
ovirt hosted engine setup answer file

Comment 4 Allon Mureinik 2015-10-06 13:18:56 UTC
Importing a domain involves attaching it to a pool, which can't be done since the self hosted engine's domain is in its own "pool".

Roy - weren't you guys working on a procedure that does this?

Comment 5 Roy Golan 2015-10-11 13:33:37 UTC
(In reply to Allon Mureinik from comment #4)
> Importing a domain involves attaching it to a pool, which can't be done
> since the self hosted engine's domain is in its own "pool".
> 
> Roy - weren't you guys working on a procedure that does this?

In 3.6 we import the hosted engine domain into the engine.


But here it seems that the lock is held by someone that isn't the SPM. Did you do some manual recovery procedure or something else?


Maor/Allon - how is gluster related to this? With iSCSI and NFS this works (and we use sanlock there too, correct me if I'm wrong).

Comment 6 Maor 2015-10-11 13:54:12 UTC
(In reply to Roy Golan from comment #5)
> (In reply to Allon Mureinik from comment #4)
> > Importing a domain involves attaching it to a pool, which can't be done
> > since the self hosted engine's domain is in its own "pool".
> > 
> > Roy - weren't you guys working on a procedure that does this?
> 
> In 3.6 we import the hosted engine domain into the engine.
> 
> 
> But here it seems that the lock is held by someone that isn't the SPM. Did
> you do some manual recovery procedure or something else?
> 
> 
> Maor/Allon - how is gluster related to this? With iSCSI and NFS this works
> (and we use sanlock there too, correct me if I'm wrong).

Basically, Gluster and NFS should follow similar logic.

AcquireHostIdFailure usually occurs when a host still holds a sanlock
lease on the Storage Domain.
This can happen when the environment has been recovered and the hosts were not rebooted afterwards.
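
[Editor's note: a quick way to check for such a leftover lease is to ask the sanlock daemon which lockspaces it currently holds. A small diagnostic sketch wrapping the sanlock CLI (run as root on the host; line format as printed by 'sanlock client status'):

import subprocess

def held_lockspaces():
    # "sanlock client status" prints one line per held lockspace,
    # starting with "s " and formatted as
    # "s <sdUUID>:<host_id>:<path to ids file>:<offset>".
    out = subprocess.check_output(['sanlock', 'client', 'status'],
                                  universal_newlines=True)
    return [line.split(None, 1)[1]
            for line in out.splitlines()
            if line.startswith('s ')]

for ls in held_lockspaces():
    print(ls)

If the storage domain's UUID shows up here on a host that is not supposed to hold it, that matches the stale-lease scenario described above.]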

Is this a recovered environment?
Can you please also attach the full engine log with the error from the import operation?

Comment 7 Richard Neuboeck 2015-10-14 09:57:18 UTC
The engine.log of the machine I created the bug report from got rotated and removed. Therefore I wiped everything and reinstalled from scratch. Notably, after the installation went through and the engine was shut down, the HA agent did not start it again. But I guess this is another problem.

Since I'm quite new to ovirt I'm not sure what you mean by 'recovered environment'. I followed these steps:

yum -y install http://resources.ovirt.org/pub/yum-repo/ovirt-release36.rpm
yum -y install ovirt-hosted-engine-setup screen vdsm-gluster
screen
hosted-engine --deploy

I install the engine from the local CentOS 7.1 ISO image.

Normally I access the engine immediately after the installation is done and try to import the storage domain. In this instance I rebooted.

After rebooting the system the engine was started automatically. As described in the initial report, I logged in. The Storage tab is empty. I tried to import the storage domain created during the deploy process. This failed again with 'AcquireHostIdFailure'.

The vdsm log shows the same error as above.

The engine log shows this error:
2015-10-14 11:18:06,864 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-10) [66bb3917] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM cube-one command failed: Cannot acquire host id: (u'2fe5d951-060c-455a-af2c-aeb77120b969', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))

If I try to import the domain again the error changes to 'Storage connection already exists'.
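
[Editor's note: the second error suggests the failed attempt left the storage server connection behind. A hedged sketch of how it could be listed and removed with the same v3 Python SDK used above; the storageconnections collection and getter names are assumed from the 3.6 REST API, and credentials are placeholders:

# Sketch: find and remove a leftover glusterfs storage connection
# after a failed import (oVirt 3.6 Python SDK, v3 API).
from ovirtsdk.api import API

api = API(url='https://engine.example.com/ovirt-engine/api',
          username='admin@internal', password='secret', insecure=True)

for conn in api.storageconnections.list():
    if conn.get_type() == 'glusterfs':
        print(conn.get_address(), conn.get_path())
        conn.delete()  # only safe while no storage domain uses it
]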

I'm attaching three archives to this bug containing all logs of the engine, the host, a list of packages installed on those systems and screenshots from the engine WebUI.

Comment 8 Richard Neuboeck 2015-10-14 09:58:24 UTC
Created attachment 1082765 [details]
Logs from the host system.

Comment 9 Richard Neuboeck 2015-10-14 09:58:56 UTC
Created attachment 1082766 [details]
Logs from the engine system.

Comment 10 Richard Neuboeck 2015-10-14 10:00:10 UTC
Created attachment 1082768 [details]
Screenshots of the WebUI during the import.

Comment 11 Red Hat Bugzilla Rules Engine 2015-10-19 10:52:48 UTC
Target release should be set once a package build is known to fix an issue. Since this bug has not been modified, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Comment 12 Sandro Bonazzola 2015-10-26 12:42:34 UTC
This is an automated message. oVirt 3.6.0 RC3 has been released and GA is targeted for next week, Nov 4th 2015.
Please review this bug and, if it is not a blocker, postpone it to a later release.
All bugs not postponed by the GA release will be automatically re-targeted to:

- 3.6.1 if severity >= high
- 4.0 if severity < high

Comment 13 Yaniv Lavi 2015-10-29 09:06:17 UTC
Can you try with the latest RC?
I think it is resolved; note that it still doesn't auto-import.
This should be fixed as part of BZ #1269768.

Comment 14 Richard Neuboeck 2015-10-29 12:42:59 UTC
I tried 3.6 RC 3 following the same installation steps as before. The manual import still fails with the same error. I'm attaching the latest log files from the host and the engine.

Comment 15 Richard Neuboeck 2015-10-29 12:44:12 UTC
Created attachment 1087470 [details]
Logs from the Engine RC3

Comment 16 Richard Neuboeck 2015-10-29 12:44:40 UTC
Created attachment 1087471 [details]
Logs from the Host RC3

Comment 17 Yaniv Lavi 2015-10-29 12:52:34 UTC
Do you know about this issue? Should this be on SLA?

Comment 18 Roy Golan 2015-11-09 13:55:12 UTC
(In reply to Yaniv Dary from comment #17)
> Do you know about this issue? should this be on SLA?

As has been stated before, this will be solved in bug 1269768.

*** This bug has been marked as a duplicate of bug 1269768 ***

