Bug 1045053 - [hosted engine] - vdsm needs HA agent configuration before deployment
Summary: [hosted engine] - vdsm needs HA agent configuration before deployment
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 3.5.1
Assignee: Doron Fediuck
QA Contact: Artyom
URL:
Whiteboard: sla
Duplicates: 1150285
Depends On: 1100566
Blocks:
Reported: 2013-12-19 14:29 UTC by Pavel Stehlik
Modified: 2016-02-10 20:17 UTC (History)

Fixed In Version: vt13
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1100566
Environment:
Last Closed: 2015-04-29 06:25:59 UTC
oVirt Team: SLA
Target Upstream Version:
Embargoed:


Attachments
HE log & vdsm (823.64 KB, application/x-gzip)
2013-12-19 14:29 UTC, Pavel Stehlik


Links
System       | ID    | Private | Priority    | Status    | Summary                                                            | Last Updated
oVirt gerrit | 31432 | 0       | master      | MERGED    | hosted-engine: don't log the whole backtrace for expected problems | Never
oVirt gerrit | 34003 | 0       | ovirt-3.5   | MERGED    | hosted-engine: don't log the whole backtrace for expected problems | Never
oVirt gerrit | 34004 | 0       | ovirt-3.5.0 | ABANDONED | hosted-engine: don't log the whole backtrace for expected problems | Never

Description Pavel Stehlik 2013-12-19 14:29:42 UTC
Created attachment 839003
HE log & vdsm

Description of problem:
vdsm tries to read the hosted engine (HE) HA agent configuration before HE is deployed. See "Additional info" for the vdsm.log excerpt.
...
IOError: [Errno 2] No such file or directory: '/etc/ovirt-hosted-engine/hosted-engine.conf'
...

Version-Release number of selected component (if applicable):
is27

How reproducible:
100%

Steps to Reproduce:
1. Have HE installed, then tear it down.
2. Run yum erase ovirt-host\* and remove /etc/ovirt*.
3. Try to install HE again.

Actual results:
HE installation never finishes; it hangs at this line:
...
[ INFO  ] Start monitoring domain



Expected results:


Additional info:
...
Thread-1046::ERROR::2013-12-19 13:19:24,958::API::1223::vds::(getStats) failed to retrieve Hosted Engine HA score
Traceback (most recent call last):
  File "/usr/share/vdsm/API.py", line 1221, in getStats
    stats['haScore'] = haClient.HAClient().get_local_host_score()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 193, in get_local_host_score
    self._config = config.Config()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/env/config.py", line 57, in __init__
    self._load(Config.static_files)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/env/config.py", line 63, in _load
    with open(fname, 'r') as f:
IOError: [Errno 2] No such file or directory: '/etc/ovirt-hosted-engine/hosted-engine.conf'
...

Comment 2 Doron Fediuck 2013-12-23 12:09:45 UTC
Looks like setup left-overs which we should handle properly.

Comment 3 Sandro Bonazzola 2013-12-23 12:24:27 UTC
(In reply to Doron Fediuck from comment #2)
> Looks like setup left-overs which we should handle properly.

This seems more like a bug in vdsm getStats: it is trying to read a file that does not exist yet because the deployment is not finished.

I think Greg has already seen it.
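
The failure mode described in this comment can be illustrated with a minimal sketch. It assumes a hypothetical helper, read_ha_score, standing in for the getStats call path; it is not the vdsm or ovirt-hosted-engine-ha code. The helper checks for the configuration file before querying the score and falls back to 0 when the host is not deployed yet. The function name, the load_score callable, and the fallback value are all assumptions made for illustration.

```python
import os

# Path taken from the IOError in the description.
HE_CONF = '/etc/ovirt-hosted-engine/hosted-engine.conf'


def read_ha_score(load_score, conf_path=HE_CONF):
    """Return the HA score, or 0 if hosted engine is not deployed yet.

    `load_score` is a hypothetical callable standing in for the real
    HA client lookup; it is only invoked when the config file exists.
    """
    if not os.path.exists(conf_path):
        # Not deployed (or config removed): there is nothing to query.
        return 0
    return load_score()
```

With this shape, a host that has had /etc/ovirt* removed would report a score of 0 instead of raising from inside the stats call.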

Comment 4 Greg Padgett 2014-01-07 00:13:29 UTC
After looking into it, there is another error in the log that is causing setup to stall:

Thread-49::DEBUG::2013-12-19 13:25:44,912::domainMonitor::263::Storage.DomainMonitorThread::(_monitorDomain) Unable to issue the acquire host id 1 request for domain 4eea45f1-0be1-4c5c-9ec3-1460a16de055
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 259, in _monitorDomain
    self.domain.acquireHostId(self.hostId, async=True)
  File "/usr/share/vdsm/storage/sd.py", line 458, in acquireHostId
    self._clusterLock.acquireHostId(hostId, async)
  File "/usr/share/vdsm/storage/clusterlock.py", line 189, in acquireHostId
    raise se.AcquireHostIdFailure(self._sdUUID, e)
AcquireHostIdFailure: Cannot acquire host id: ('4eea45f1-0be1-4c5c-9ec3-1460a16de055', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))

That being said, the exception from the HAClient library is a bit unsightly but should be recorded somewhere.  We might be able to clean up the logs a little by reducing this to a simple error rather than always logging the whole backtrace.
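
The log cleanup suggested above could look roughly like the following sketch. It is not the merged patch; get_ha_score and the fetch_score callable are hypothetical names, and treating IOError/OSError as the "expected" case is an assumption based on the traceback in the description. Expected failures are logged as a single error line, while anything else keeps the full backtrace.

```python
import logging

log = logging.getLogger('vds')


def get_ha_score(fetch_score):
    """Return the HA score from `fetch_score`, or 0 on failure.

    `fetch_score` is a hypothetical callable standing in for the
    HA client call. A missing config file (IOError/OSError) is treated
    as an expected condition and logged as one line; any other
    exception is logged with its full backtrace.
    """
    try:
        return fetch_score()
    except (IOError, OSError) as e:
        # Expected before deployment: log the message only.
        log.error("failed to retrieve Hosted Engine HA score: %s", e)
    except Exception:
        # Unexpected: keep the whole backtrace for debugging.
        log.exception("failed to retrieve Hosted Engine HA score")
    return 0
```

The error is still recorded in the log in both cases; only the amount of detail differs.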

Comment 5 Eyal Edri 2014-02-10 10:29:49 UTC
Moving to 3.3.2 since 3.3.1 was built and moved to QE.

Comment 6 Jiri Moskovcak 2014-05-22 09:00:53 UTC
Adding a patch to reduce the noise from the haclient and reassigning to vdsm to deal with the sanlock part.

Comment 7 Sandro Bonazzola 2014-08-12 07:39:24 UTC
Is the change already in 3.5 and 3.4? Is this a 3.3 issue only?

Comment 8 Jiri Moskovcak 2014-08-13 07:18:14 UTC
(In reply to Sandro Bonazzola from comment #7)
> Is the change already in 3.5 and 3.4? Is this a 3.3 issue only?

It exists in 3.4 and 3.5 and is not fixed. Good catch; I am going to post a patch in a moment.

Comment 9 Eyal Edri 2014-10-07 07:16:49 UTC
This bug's status was moved to MODIFIED before vdsm vt5 was built,
hence moving to ON_QA. If this was a mistake and the fix isn't in,
please contact rhev-integ.

Comment 10 Artyom 2014-10-08 07:12:44 UTC
Verified on vt5

Comment 11 Jiri Moskovcak 2014-10-10 13:06:44 UTC
*** Bug 1150285 has been marked as a duplicate of this bug. ***

Comment 12 Jiri Moskovcak 2014-10-10 13:14:45 UTC
The patch is missing in vdsm-4.16.7

Comment 13 Jiri Moskovcak 2014-12-10 08:19:51 UTC
Fixed in vt13

Comment 14 Artyom 2014-12-15 11:33:12 UTC
Verified on vdsm-4.16.8.1-2.el6ev.x86_64 and ovirt-hosted-engine-ha-1.2.4-2.el6ev.noarch
Redeployment succeeded.

Comment 18 Eyal Edri 2015-04-29 06:25:59 UTC
RHEV 3.5.1 was GA'd. Closing as CURRENTRELEASE.
