Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1100566

Summary: [hosted engine] - vdsm needs HA agent configuration before deployment
Product: Red Hat Enterprise Virtualization Manager
Reporter: Jiri Moskovcak <jmoskovc>
Component: vdsm
Assignee: Federico Simoncelli <fsimonce>
Status: CLOSED WORKSFORME
QA Contact: Nikolai Sednev <nsednev>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 3.3.0
CC: acanan, agk, alukiano, amureini, bazulay, cluster-maint, dfediuck, eedri, fsimonce, gchaplik, gpadgett, iheim, jmoskovc, lpeer, mavital, pablo.iranzo, pstehlik, sbonazzo, sherold, teigland, yeylon
Target Milestone: ---
Keywords: Triaged
Target Release: 3.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: storage
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1045053
Environment:
Last Closed: 2014-07-30 13:39:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1045053, 1142923, 1156165

Comment 2 David Teigland 2014-05-23 14:35:27 UTC
I don't see any reference to sanlock changes in the comments above; was this misassigned?

Comment 3 Jiri Moskovcak 2014-05-26 06:46:15 UTC
My assumption about the problem with sanlock was based on this:

Thread-49::DEBUG::2013-12-19 13:25:44,912::domainMonitor::263::Storage.DomainMonitorThread::(_monitorDomain) Unable to issue the acquire host id 1 request for domain 4eea45f1-0be1-4c5c-9ec3-1460a16de055
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 259, in _monitorDomain
    self.domain.acquireHostId(self.hostId, async=True)
  File "/usr/share/vdsm/storage/sd.py", line 458, in acquireHostId
    self._clusterLock.acquireHostId(hostId, async)
  File "/usr/share/vdsm/storage/clusterlock.py", line 189, in acquireHostId
    raise se.AcquireHostIdFailure(self._sdUUID, e)
AcquireHostIdFailure: Cannot acquire host id: ('4eea45f1-0be1-4c5c-9ec3-1460a16de055', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))

If you don't think it's a problem with sanlock, then please reassign it to whatever component you think is causing the problem.

Comment 4 David Teigland 2014-05-27 14:18:34 UTC
The most likely cause for -EINVAL from add_lockspace is that a lockspace with the same name has already been added.  In the next version I have included a log message when this happens.

sanlock cannot do anything about this. vdsm will need to handle this situation.
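
Something along these lines might work on the vdsm side (a rough sketch, not actual vdsm code; the helper name, retry policy, and errno check are all assumptions):

import errno
import logging
import time

import sanlock

log = logging.getLogger("Storage.ClusterLock")

def acquire_host_id_with_retry(lockspace, host_id, lease_path,
                               retries=3, delay=1.0):
    # EINVAL from add_lockspace can mean a previous instance of the
    # same lockspace is still being torn down, so retry a few times
    # instead of failing the domain monitor immediately.
    for attempt in range(1, retries + 1):
        try:
            sanlock.add_lockspace(lockspace, host_id, lease_path)
            return
        except sanlock.SanlockException as e:
            if e.errno != errno.EINVAL or attempt == retries:
                raise
            log.warning("Lockspace %s busy (attempt %d/%d), retrying",
                        lockspace, attempt, retries)
            time.sleep(delay)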

Comment 6 Federico Simoncelli 2014-07-14 11:36:15 UTC
I see that the host id was released successfully 100ms earlier:

Thread-31::INFO::2013-12-19 10:34:34,882::clusterlock::197::SANLock::(releaseHostId) Releasing host id for domain 6c57fd8e-d77b-4833-adff-0050415ac789 (id: 1)
Thread-31::DEBUG::2013-12-19 10:34:34,882::clusterlock::207::SANLock::(releaseHostId) Host id for domain 6c57fd8e-d77b-4833-adff-0050415ac789 released successfully (id: 1)
...(no other sanlock operation)...
Thread-54::INFO::2013-12-19 10:34:34,975::clusterlock::174::SANLock::(acquireHostId) Acquiring host id for domain 6c57fd8e-d77b-4833-adff-0050415ac789 (id: 1)
Thread-54::DEBUG::2013-12-19 10:34:34,976::domainMonitor::263::Storage.DomainMonitorThread::(_monitorDomain) Unable to issue the acquire host id 1 request for domain 6c57fd8e-d77b-4833-adff-0050415ac789
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 259, in _monitorDomain
    self.domain.acquireHostId(self.hostId, async=True)
  File "/usr/share/vdsm/storage/sd.py", line 458, in acquireHostId
    self._clusterLock.acquireHostId(hostId, async)
  File "/usr/share/vdsm/storage/clusterlock.py", line 189, in acquireHostId
    raise se.AcquireHostIdFailure(self._sdUUID, e)
AcquireHostIdFailure: Cannot acquire host id: ('6c57fd8e-d77b-4833-adff-0050415ac789', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))

I know we can't do much without the sanlock logs, but do you think there could be another reason for EINVAL, or maybe a race between release and acquire?
Thanks.
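
For reference, the sequence above corresponds roughly to these calls through the sanlock python bindings (a reconstruction for illustration only; the lease path is made up):

import sanlock

LOCKSPACE = "6c57fd8e-d77b-4833-adff-0050415ac789"
LEASE_PATH = "/rhev/data-center/mnt/example/dom_md/ids"  # illustrative path
HOST_ID = 1

# Thread-31: synchronous release; returns after sanlock reports success.
sanlock.rem_lockspace(LOCKSPACE, HOST_ID, LEASE_PATH)

# Thread-54, ~100ms later: re-add the same lockspace. This is the call
# that failed with SanlockException(22, ..., 'Invalid argument').
sanlock.add_lockspace(LOCKSPACE, HOST_ID, LEASE_PATH)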

Comment 7 David Teigland 2014-07-14 15:10:19 UTC
Is it doing sanlock_rem_lockspace(SANLK_REM_ASYNC)? If so, then the following sanlock_add_lockspace() would likely see the previous instance, which is not yet gone. In that case, add_lockspace returns EINVAL:
https://git.fedorahosted.org/cgit/sanlock.git/tree/src/lockspace.c?id=8163bc7b56be4cbe747dc1f3ad9a6f3bca368eb5#n642
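
That failure mode would look roughly like this through the python bindings (a sketch under the assumption that the release is asynchronous; the async kwarg spelling follows the old bindings, and the lease path is made up):

import sanlock

LOCKSPACE = "6c57fd8e-d77b-4833-adff-0050415ac789"
LEASE_PATH = "/rhev/data-center/mnt/example/dom_md/ids"  # illustrative path
HOST_ID = 1

# Asynchronous removal: returns immediately while the daemon is still
# tearing the lockspace down in the background.
sanlock.rem_lockspace(LOCKSPACE, HOST_ID, LEASE_PATH, **{"async": True})

# An immediate re-add can still see the previous instance, and the
# daemon rejects it with EINVAL (errno 22, 'Invalid argument').
sanlock.add_lockspace(LOCKSPACE, HOST_ID, LEASE_PATH)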

Comment 8 Federico Simoncelli 2014-07-14 15:49:40 UTC
(In reply to David Teigland from comment #7)
> Is it doing sanlock_rem_lockspace(SANLK_REM_ASYNC)? If so, then the
> following sanlock_add_lockspace() would likely see the previous instance,
> which is not yet gone. In that case, add_lockspace returns EINVAL:
> https://git.fedorahosted.org/cgit/sanlock.git/tree/src/lockspace.c?id=8163bc7b56be4cbe747dc1f3ad9a6f3bca368eb5#n642

No, we never use async for release (only for acquire). I double-checked the code as well.

Jiri, are you still hitting this issue?

Comment 9 Jiri Moskovcak 2014-07-15 06:32:55 UTC
Moving the needinfo to pstehlik, since he reported this bug.

Comment 12 Artyom 2014-07-17 11:11:38 UTC
I see that this bug was opened against 3.3. For which version would you prefer a reproduction, 3.3 or 3.4?

Comment 13 Artyom 2014-07-17 14:56:33 UTC
I checked it with ovirt-hosted-engine-setup-1.1.3-2.el6ev.noarch on an existing HE environment, without changing any modes:
1) yum erase ovirt-host* -y
2) rm -rf /etc/ovirt-hosted*
3) yum install ovirt-hosted-engine-setup-1.1.3-2.el6ev.noarch -y
4) The first hosted-engine --deploy failed because a VM was still running, so I destroyed it:
   vdsClient -s 0 destroy vm_id
5) hosted-engine --deploy on clean storage ran fine, with no problems, including during:
...
[ INFO  ] Initializing sanlock metadata
[ INFO  ] Creating VM Image
[ INFO  ] Disconnecting Storage Pool
[ INFO  ] Start monitoring domain
[ INFO  ] Configuring VM
[ INFO  ] Updating hosted-engine configuration
...
Please let me know if you need me to check it on 3.3 as well.

Comment 14 Allon Mureinik 2014-07-23 08:44:10 UTC
(In reply to Artyom from comment #13)
> 5) hosted-engine --deploy on clean storage ran fine, with no problems,
> including during:
Fede, based on this statement, can we close this BZ?

Comment 15 Allon Mureinik 2014-07-30 13:39:40 UTC
(In reply to Allon Mureinik from comment #14)
> (In reply to Artyom from comment #13)
> > 5) hosted-engine --deploy on clean storage ran fine, with no problems,
> > including during:
> Fede, based on this statement, can we close this BZ?
Closing.
If this was incorrect, please reopen with the relevant details.