Bug 1045053

Summary: [hosted engine] - vdsm needs HA agent configuration before deployment
Product: Red Hat Enterprise Virtualization Manager Reporter: Pavel Stehlik <pstehlik>
Component: vdsm Assignee: Doron Fediuck <dfediuck>
Status: CLOSED CURRENTRELEASE QA Contact: Artyom <alukiano>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 3.3.0 CC: bazulay, dfediuck, dougsland, eedri, gchaplik, lpeer, mavital, pablo.iranzo, sbonazzo, yeylon
Target Milestone: --- Keywords: Triaged
Target Release: 3.5.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: sla
Fixed In Version: vt13 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1100566 Environment:
Last Closed: 2015-04-29 06:25:59 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: SLA RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On: 1100566    
Bug Blocks:    
Attachments:
Description Flags
HE log & vdsm none

Description Pavel Stehlik 2013-12-19 14:29:42 UTC
Created attachment 839003
HE log & vdsm

Description of problem:
vdsm asks for the HE HA configuration before HE is deployed. See Additional info for the vdsm.log excerpt.
...
IOError: [Errno 2] No such file or directory: '/etc/ovirt-hosted-engine/hosted-engine.conf'
...

Version-Release number of selected component (if applicable):
is27

How reproducible:
100%

Steps to Reproduce:
1. Have HE installed, then try to remove it.
2. yum erase ovirt-host\* and remove /etc/ovirt*
3. Try to install HE again.

Actual results:
HE installation never ends - it stops on this line.
...
[ INFO  ] Start monitoring domain



Expected results:


Additional info:
...
Thread-1046::ERROR::2013-12-19 13:19:24,958::API::1223::vds::(getStats) failed to retrieve Hosted Engine HA score
Traceback (most recent call last):
  File "/usr/share/vdsm/API.py", line 1221, in getStats
    stats['haScore'] = haClient.HAClient().get_local_host_score()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 193, in get_local_host_score
    self._config = config.Config()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/env/config.py", line 57, in __init__
    self._load(Config.static_files)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/env/config.py", line 63, in _load
    with open(fname, 'r') as f:
IOError: [Errno 2] No such file or directory: '/etc/ovirt-hosted-engine/hosted-engine.conf'
...

Comment 2 Doron Fediuck 2013-12-23 12:09:45 UTC
Looks like setup left-overs which we should handle properly.

Comment 3 Sandro Bonazzola 2013-12-23 12:24:27 UTC
(In reply to Doron Fediuck from comment #2)
> Looks like setup left-overs which we should handle properly.

Seems more like a bug in vdsm getStats: it's trying to read a file that is not there because the deploy is not done yet.

I think Greg has already seen it.
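
For illustration only, here is a minimal sketch of how a getStats-style caller could avoid reading a configuration file that does not exist yet. It is not the actual vdsm fix: get_ha_score and fetch_score are hypothetical names, fetch_score stands in for haClient.HAClient().get_local_host_score(), and returning 0 when HE is not deployed is an assumption of this sketch. The path is the one from the traceback in this report.

```python
import os

# Path taken from the traceback in this report.
HE_CONF = '/etc/ovirt-hosted-engine/hosted-engine.conf'


def get_ha_score(fetch_score, conf_path=HE_CONF):
    """Return the HA score, or 0 when hosted engine is not deployed yet.

    fetch_score is a hypothetical callable standing in for
    haClient.HAClient().get_local_host_score().
    """
    if not os.path.exists(conf_path):
        # No configuration yet: deployment has not finished, so skip the
        # query. Returning 0 here is an assumption of this sketch.
        return 0
    return fetch_score()
```

A check like this still leaves a small window between the existence test and the read, so the caller may also want to handle IOError.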

Comment 4 Greg Padgett 2014-01-07 00:13:29 UTC
After looking into it - there's another error in the log that is causing setup to stall:

Thread-49::DEBUG::2013-12-19 13:25:44,912::domainMonitor::263::Storage.DomainMonitorThread::(_monitorDomain) Unable to issue the acquire host id 1 request for domain 4eea45f1-0be1-4c5c-9ec3-1460a16de055
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 259, in _monitorDomain
    self.domain.acquireHostId(self.hostId, async=True)
  File "/usr/share/vdsm/storage/sd.py", line 458, in acquireHostId
    self._clusterLock.acquireHostId(hostId, async)
  File "/usr/share/vdsm/storage/clusterlock.py", line 189, in acquireHostId
    raise se.AcquireHostIdFailure(self._sdUUID, e)
AcquireHostIdFailure: Cannot acquire host id: ('4eea45f1-0be1-4c5c-9ec3-1460a16de055', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))

That being said, the exception from the HAClient library is a bit unsightly but should be recorded somewhere.  We might be able to clean up the logs a little by reducing this to a simple error rather than always logging the whole backtrace.
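
As a rough sketch of the log cleanup suggested above (not the patch that was eventually merged), the caller could catch the failure and log a single error line, keeping the backtrace only at debug level. stats_with_ha_score and fetch_score are hypothetical names; fetch_score stands in for haClient.HAClient().get_local_host_score(), and the logger name 'vds' is taken from the log line in the description.

```python
import logging

log = logging.getLogger('vds')


def stats_with_ha_score(stats, fetch_score):
    """Add 'haScore' to stats, logging a one-line error on failure.

    fetch_score is a hypothetical stand-in for
    haClient.HAClient().get_local_host_score().
    """
    try:
        stats['haScore'] = fetch_score()
    except (IOError, OSError) as e:
        # One concise line at ERROR level instead of a full backtrace.
        log.error('failed to retrieve Hosted Engine HA score: %s', e)
        # Keep the backtrace available for debugging at DEBUG level.
        log.debug('HA score failure details', exc_info=True)
    return stats
```

With this shape, a missing hosted-engine.conf produces one error line per getStats call, and the stats dictionary is returned without the haScore key.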

Comment 5 Eyal Edri 2014-02-10 10:29:49 UTC
moving to 3.3.2 since 3.3.1 was built and moved to QE.

Comment 6 Jiri Moskovcak 2014-05-22 09:00:53 UTC
Adding a patch to reduce the noise from the haclient and reassigning to vdsm to deal with the sanlock part.

Comment 7 Sandro Bonazzola 2014-08-12 07:39:24 UTC
Is the change already in 3.5 and 3.4? Is this a 3.3 issue only?

Comment 8 Jiri Moskovcak 2014-08-13 07:18:14 UTC
(In reply to Sandro Bonazzola from comment #7)
> Is the change already in 3.5 and 3.4? Is this a 3.3 issue only?

- it exists in 3.4 and 3.5 and is not fixed, good catch, going to post a patch in a moment

Comment 9 Eyal Edri 2014-10-07 07:16:49 UTC
This bug's status was moved to MODIFIED before vdsm vt5 was built,
hence moving to ON_QA. If this was a mistake and the fix isn't in,
please contact rhev-integ.

Comment 10 Artyom 2014-10-08 07:12:44 UTC
Verified on vt5

Comment 11 Jiri Moskovcak 2014-10-10 13:06:44 UTC
*** Bug 1150285 has been marked as a duplicate of this bug. ***

Comment 12 Jiri Moskovcak 2014-10-10 13:14:45 UTC
The patch is missing in vdsm-4.16.7

Comment 13 Jiri Moskovcak 2014-12-10 08:19:51 UTC
Fixed in vt13

Comment 14 Artyom 2014-12-15 11:33:12 UTC
Verified on vdsm-4.16.8.1-2.el6ev.x86_64 and ovirt-hosted-engine-ha-1.2.4-2.el6ev.noarch
Redeployment succeeded.

Comment 18 Eyal Edri 2015-04-29 06:25:59 UTC
RHEV 3.5.1 was GA'd. closing current release.