Bug 1408602
Summary: Failed to get auto-import for hosted_storage after 3.5.8-0.1 to 3.6.10.1-0.1 upgrade.

| Field | Value |
|---|---|
| Product | Red Hat Enterprise Virtualization Manager |
| Component | ovirt-engine |
| Status | CLOSED DUPLICATE |
| Severity | urgent |
| Priority | unspecified |
| Version | 3.6.9 |
| Target Milestone | --- |
| Target Release | --- |
| Hardware | x86_64 |
| OS | Linux |
| Keywords | Regression |
| Reporter | Nikolai Sednev <nsednev> |
| Assignee | Simone Tiraboschi <stirabos> |
| QA Contact | meital avital <mavital> |
| CC | dfediuck, gklein, lsurette, msivak, nsednev, nsoffer, rbalakri, Rhev-m-bugs, srevivo, ykaul, yquinn |
| oVirt Team | SLA |
| Type | Bug |
| Last Closed | 2017-01-02 11:09:38 UTC |
Description
Nikolai Sednev
2016-12-25 17:32:56 UTC
Created attachment 1235281 [details]: engine's sosreport
Created attachment 1235282 [details]: alma03's sosreport
Created attachment 1235283 [details]: alma04's sosreport
In case anyone is interested in why there is a time shift between the logs from the host (RHEV-H) and the engine, the answer is here: https://access.redhat.com/solutions/440343. "All RHEV Hypervisors use UTC as the system time and this can not be changed. Any guests that have the incorrect time need to have their time zones changed to reflect the proper time." I had around a 2-hour time shift between the engine and RHEV-H: the engine was first deployed on RHEL 6.7 with the correct time zone, but when the engine was migrated to RHEV-H, the time zone changed and was no longer synchronized with what it should have been.

Comment (Nir Soffer)

Nikolai, can you point me to the vdsm log containing this error?

```
Dec 25, 2016 6:38:49 PM
VDSM command failed: Cannot acquire host id: (u'81af2dad-ac74-4e6b-9f56-d69c5811686a', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))
```

Also, /var/log/sanlock.log from the same host can be useful.

Comment

I have found a possible cause for that VDSM failure in vdsm.log.1 of host alma04:

```
Thread-16::ERROR::2016-12-25 15:56:27,693::sdc::145::Storage.StorageDomainCache::(_findDomain) domain 81af2dad-ac74-4e6b-9f56-d69c5811686a not found
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 143, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 173, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('81af2dad-ac74-4e6b-9f56-d69c5811686a',)
Thread-16::ERROR::2016-12-25 15:56:27,697::task::866::Storage.TaskManager.Task::(_setError) Task=`c22ca9a6-9869-40aa-9ede-6ae723f293d1`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2835, in getStorageDomainInfo
    dom = self.validateSdUUID(sdUUID)
  File "/usr/share/vdsm/storage/hsm.py", line 277, in validateSdUUID
  ...
```

I'm not sure whether these logs correlate; this needs to be verified. This issue was raised in the past: https://bugzilla.redhat.com/show_bug.cgi?id=1269768#c31

Comment (Nikolai Sednev)

(In reply to Nir Soffer from comment #6)
> Nikolai, can you point me to the vdsm log containing this error?
>
> Dec 25, 2016 6:38:49 PM
> VDSM command failed: Cannot acquire host id:
> (u'81af2dad-ac74-4e6b-9f56-d69c5811686a', SanlockException(22, 'Sanlock
> lockspace add failure', 'Invalid argument'))"

The time gap should be +2 hours between UTC and IST, so the RHEV-H host always lags 2 hours behind the engine's IST. You should correlate with the engine log by subtracting 2 hours from its events and then looking at the vdsm log from the RHEV-H host.

Comment (Simone Tiraboschi)

This is exactly a duplicate of rhbz#1322849: the engine was trying to import the hosted-engine storage domain using the host alma03. On alma03, /etc/ovirt-hosted-engine/hosted-engine.conf contains host_id=3, so ovirt-ha-agent is locking with host_id=3. On the other side, the engine thinks that alma03 has host_id=1 and so it tries to acquire the lock again with host_id=1, which fails. Indeed, in /var/log/sanlock.log we see:

```
2016-12-25 16:38:49+0000 4219 [16357]: add_lockspace 81af2dad-ac74-4e6b-9f56-d69c5811686a:1:/rhev/data-center/mnt/10.35.110.11:_Compute__NFS_nsednev__he__1/81af2dad-ac74-4e6b-9f56-d69c5811686a/dom_md/ids:0 conflicts with name of list1 s4 81af2dad-ac74-4e6b-9f56-d69c5811686a:3:/rhev/data-center/mnt/10.35.110.11:_Compute__NFS_nsednev__he__1/81af2dad-ac74-4e6b-9f56-d69c5811686a/dom_md/ids:0
```

where the relevant information is the conflict between 81af2dad-ac74-4e6b-9f56-d69c5811686a:1 and 81af2dad-ac74-4e6b-9f56-d69c5811686a:3.

In the vdsm logs we see:

```
jsonrpc.Executor/1::DEBUG::2016-12-25 16:38:49,810::__init__::532::jsonrpc.JsonRpcServer::(_handle_request) Calling 'StorageDomain.detach' in bridge with {u'storagepoolID': u'00000000-0000-0000-0000-000000000000', u'force': True, u'storagedomainID': u'81af2dad-ac74-4e6b-9f56-d69c5811686a'}
jsonrpc.Executor/1::DEBUG::2016-12-25 16:38:49,811::task::595::Storage.TaskManager.Task::(_updateState) Task=`b95270fd-34e8-4b7f-adfd-6f587bab67cf`::moving from state init -> state preparing
jsonrpc.Executor/1::INFO::2016-12-25 16:38:49,811::logUtils::48::dispatcher::(wrapper) Run and protect: forcedDetachStorageDomain(sdUUID=u'81af2dad-ac74-4e6b-9f56-d69c5811686a', spUUID=u'00000000-0000-0000-0000-000000000000', options=None)
jsonrpc.Executor/1::INFO::2016-12-25 16:38:49,811::clusterlock::219::Storage.SANLock::(acquireHostId) Acquiring host id for domain 81af2dad-ac74-4e6b-9f56-d69c5811686a (id: 1)
jsonrpc.Executor/1::ERROR::2016-12-25 16:38:49,813::task::866::Storage.TaskManager.Task::(_setError) Task=`b95270fd-34e8-4b7f-adfd-6f587bab67cf`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 805, in forcedDetachStorageDomain
    self._deatchStorageDomainFromOldPools(sdUUID)
  File "/usr/share/vdsm/storage/hsm.py", line 781, in _deatchStorageDomainFromOldPools
    dom.acquireHostId(pool.id)
  File "/usr/share/vdsm/storage/sd.py", line 533, in acquireHostId
    self._clusterLock.acquireHostId(hostId, async)
  File "/usr/share/vdsm/storage/clusterlock.py", line 234, in acquireHostId
    raise se.AcquireHostIdFailure(self._sdUUID, e)
AcquireHostIdFailure: Cannot acquire host id: (u'81af2dad-ac74-4e6b-9f56-d69c5811686a', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))
```

and here too the relevant information is the conflict between (id: 1) from the engine and host_id=3 in /etc/ovirt-hosted-engine/hosted-engine.conf.

As in https://bugzilla.redhat.com/show_bug.cgi?id=1322849#c18, the workaround is to manually sync the host_id value used by ovirt-ha-agent with the value used by the engine. To do that:

1. Run on the engine VM: `sudo -u postgres psql engine -c "select vds_name, vds_spm_id from vds ORDER BY vds_spm_id"`
2. Check the value of host_id in /etc/ovirt-hosted-engine/hosted-engine.conf on all the involved hosts. If it doesn't match the value from step 1:
3. Put the host into maintenance mode from the engine, fix the value in /etc/ovirt-hosted-engine/hosted-engine.conf, and reboot the host.
4. When the host is up again, reactivate it from the engine.

*** This bug has been marked as a duplicate of bug 1322849 ***

Opened https://bugzilla.redhat.com/show_bug.cgi?id=1409771 to get this issue covered there.
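The mismatch check at the heart of the workaround (engine's vds_spm_id vs. the agent's host_id) can be sketched in Python. This is an illustrative helper, not part of oVirt: `parse_host_id` and `find_mismatches` are hypothetical names, and the engine-side id mapping is assumed to come from the psql query in step 1 above.

```python
import re

def parse_host_id(conf_text: str) -> int:
    """Extract host_id from hosted-engine.conf-style text
    (e.g. the contents of /etc/ovirt-hosted-engine/hosted-engine.conf)."""
    m = re.search(r"^host_id=(\d+)\s*$", conf_text, re.MULTILINE)
    if not m:
        raise ValueError("host_id not found in hosted-engine.conf")
    return int(m.group(1))

def find_mismatches(engine_ids: dict, host_confs: dict) -> dict:
    """Return {host: (engine_id, agent_id)} for every host whose
    hosted-engine.conf host_id disagrees with the engine's vds_spm_id."""
    return {
        host: (engine_ids[host], parse_host_id(conf))
        for host, conf in host_confs.items()
        if parse_host_id(conf) != engine_ids[host]
    }

# Values mirroring this report: the engine thinks alma03 has id 1
# (from the vds_spm_id query), but the host's config says host_id=3.
engine_ids = {"alma03": 1, "alma04": 2}
host_confs = {
    "alma03": "fqdn=alma03\nhost_id=3\n",  # hypothetical conf contents
    "alma04": "fqdn=alma04\nhost_id=2\n",
}
print(find_mismatches(engine_ids, host_confs))  # only alma03 conflicts
```

A host reported here would then go through steps 3 and 4 of the workaround (maintenance mode, fix the file, reboot, reactivate).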
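The 2-hour engine/host clock offset discussed earlier makes log correlation error-prone by hand. A minimal sketch of shifting engine-side timestamps back to UTC before grepping the host's vdsm/sanlock logs, assuming the fixed +2:00 offset stated in this report (adjust if DST applies):

```python
from datetime import datetime, timedelta

# Offset between the engine's local clock (IST, +2 hours per this report)
# and the hypervisor's UTC clock.
ENGINE_UTC_OFFSET = timedelta(hours=2)

def engine_to_utc(ts: str) -> str:
    """Convert an engine-log timestamp (engine local time) to UTC
    for matching against vdsm/sanlock log lines on the host."""
    local = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    return (local - ENGINE_UTC_OFFSET).strftime("%Y-%m-%d %H:%M:%S")

# The engine event at 18:38:49 local time corresponds to the sanlock
# add_lockspace failure logged at 16:38:49+0000 on the host.
print(engine_to_utc("2016-12-25 18:38:49"))  # -> 2016-12-25 16:38:49
```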