Description of problem:
Exception was discovered during automation testing.

ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory' Is the Hosted Engine setup finished?
WARN Worker blocked: <Worker name=periodic/0 running <Task <Operation action=<vdsm.virt.sampling.HostMonitor object at 0x7f4d64342150> at 0x7f4d64342190> timeout=15, duration=15 at 0x7f4d64178e50> task#=9 at 0x7f4d64376b50>, traceback:
File: "/usr/lib64/python2.7/threading.py", line 785, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
  self.run()
File: "/usr/lib64/python2.7/threading.py", line 765, in run
  self.__target(*self.__args, **self.__kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 194, in run
  ret = func(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run
  self._execute_task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task
  task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__
  self._callable()
File: "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 220, in __call__
  self._func()
File: "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 578, in __call__
  stats = hostapi.get_stats(self._cif, self._samples.stats())
File: "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 77, in get_stats
  ret['haStats'] = _getHaInfo()
File: "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 182, in _getHaInfo
  stats = instance.get_all_stats()
File: "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 93, in get_all_stats
  stats = broker.get_stats_from_storage()
File: "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 135, in get_stats_from_storage
  result = self._proxy.get_stats()
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
  return self.__send(self.__name, args)
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1591, in __request
  verbose=self.__verbose
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request
  return self.single_request(host, handler, request_body, verbose)
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1303, in single_request
  response = h.getresponse(buffering=True)
File: "/usr/lib64/python2.7/httplib.py", line 1113, in getresponse
  response.begin()
File: "/usr/lib64/python2.7/httplib.py", line 444, in begin
  version, status, reason = self._read_status()
File: "/usr/lib64/python2.7/httplib.py", line 400, in _read_status
  line = self.fp.readline(_MAXLINE + 1)
File: "/usr/lib64/python2.7/socket.py", line 476, in readline
  data = self._sock.recv(self._rbufsize)

Version-Release number of selected component (if applicable):
vdsm-4.20.25-1.el7ev.x86_64

Steps to Reproduce:
Unclear - it happens during some different tests

Actual results:
Exception in journal log of vdsmd service

Expected results:
No exceptions in journal log of vdsmd service
Is hosted engine installed on the vdsm host with this issue?
That is a difficult question. During the tests it can happen that the machine was removed and then added as a host without the hosted-engine deployment process (I mean the host was added as a plain host, not as a hosted-engine host). I ran all the tests again and checked the webadmin and all hosts. In the engine there are crowns on all hosts, but at first lynx23 and lynx24 showed exclamation marks for a few seconds with "unable to compute HA score". I'm attaching the output of hosted-engine --vm-status from all hosts.
Created attachment 1425253 [details] hosted-engine --vm-status
We really need a reproducer if we want to fix this, but looking at the report and comment #3 I am not really sure I would consider this a bug.
Users shouldn't see multiple printed exceptions in the journal log; they should see only errors in this case. I cannot give you exact reproduction steps for triggering this exception. I only have the logs from automation, from all machines, which I posted in comment #1.
I was asked by a customer to double-check his upgraded RHV env (4.1 -> 4.2). There are some other issues, so I had to conduct a similar upgrade in my test environment. After upgrading the hypervisor from rhvh-4.1-0.20180126.0 to rhvh-4.2.3.0-0.20180518.0 I have problems with ovirt-ha-agent, which leads to ovirt-ha-broker and finally to vdsm; many errors are reported:

ERROR (periodic/1) [root] failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished? (api:196)
*** Bug 1569601 has been marked as a duplicate of this bug. ***
The error: ERROR (periodic/1) [root] failed to retrieve Hosted Engine HA score ... is logged when the ovirt-ha-broker service is not running, or is not ready to respond to requests. We could limit the frequency of these messages. The traceback beginning with 'WARN Worker blocked' is interesting. Is it somewhere in the provided logs? I didn't find it.
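Limiting the frequency could be a small throttle wrapped around the logger; a sketch (the class name and the 60-second window are arbitrary illustrations, not vdsm code):

```python
import time


class ThrottledLog:
    """Emit a repeated message at most once per `interval` seconds;
    swallowed repetitions are counted and reported on the next emit."""

    def __init__(self, log_fn, interval=60.0, clock=time.monotonic):
        self._log_fn = log_fn
        self._interval = interval
        self._clock = clock
        self._last = None
        self._suppressed = 0

    def log(self, msg):
        now = self._clock()
        if self._last is not None and now - self._last < self._interval:
            self._suppressed += 1  # within the window: swallow the repeat
            return False
        if self._suppressed:
            msg += " (%d similar messages suppressed)" % self._suppressed
            self._suppressed = 0
        self._last = now
        self._log_fn(msg)
        return True


# A fake clock makes the behaviour deterministic for the example.
t = [0.0]
emitted = []
tl = ThrottledLog(emitted.append, interval=60.0, clock=lambda: t[0])
for _ in range(5):
    tl.log("failed to retrieve Hosted Engine HA score")
    t[0] += 10.0                 # five attempts within one minute
print(len(emitted))              # -> 1: only the first got through
```

Injecting the clock keeps the throttle testable without sleeping.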
For example, in vdsm.log of the lynx23 server in the archive, at line 9924.
Petr, why is this an automation blocker? It just pollutes the log, no functionality is broken.
It blocks the test that searches for tracebacks in the journal log of the vdsmd service (as there shouldn't be any tracebacks). Due to this bug we cannot find other exceptions which could be more important.
Francesco, you mentioned that GSS asked for this traceback to stay visible, IIRC. Can we figure out how to make it work for us, QA, and GSS? One option might be showing a WARNING summary and a DEBUG traceback.
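For example, the split could be a small helper around the standard logging module (a sketch; the function name is made up):

```python
import logging


def log_soft_failure(log, summary, exc):
    # One-line summary at WARNING for operators and for automation that
    # scans the journal; the full traceback is preserved, but at DEBUG.
    log.warning("%s: %s", summary, exc)
    log.debug("%s", summary, exc_info=exc)


logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("vdsm.example")
try:
    open("/nonexistent/broker.socket")
except OSError as exc:
    log_soft_failure(log, "failed to retrieve Hosted Engine HA score", exc)
# At the default WARNING level only the summary line appears; enabling
# DEBUG on this logger reveals the traceback as well.
```

Automation scanning for tracebacks stays quiet in normal runs, while GSS can still get the full traceback by raising the logger to DEBUG.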
The traceback was implemented in Bug 1501012.
(In reply to Martin Sivák from comment #13)
> Francesco, you mentioned that GSS asked for this traceback to stay visible
> iirc. Can we figure out how to make it work for us, QA and GSS?
>
> One option might be showing WARNING summary and DEBUG traceback.

We can work on that. I myself have no preference; we "just" need consensus on a solution.
I see this error during addition of an additional ha-host to the engine:

2018-12-05 13:05:14,974+0200 ERROR (periodic/0) [root] failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished? (api:196)
2018-12-05 13:05:15,428+0200 INFO (hsm/init) [storage.HSM] FINISH HSM init succeeded in 1.37 seconds (hsm:410)
2018-12-05 13:05:15,718+0200 INFO (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC call Host.ping2 succeeded in 0.00 seconds (__init__:573)
2018-12-05 13:05:15,725+0200 INFO (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call Host.ping2 succeeded in 0.00 seconds (__init__:573)
2018-12-05 13:05:15,729+0200 INFO (jsonrpc/4) [vdsm.api] START getStorageDomainInfo(sdUUID=u'35c9172d-c2ca-407f-b54a-67a4fb24b757', options=None) from=::1,60576, task_id=4af8dbf5-a1f3-4522-a803-2ca16b51dc88 (api:46)
2018-12-05 13:05:15,834+0200 INFO (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call Host.ping2 succeeded in 0.00 seconds (__init__:573)
2018-12-05 13:05:15,894+0200 INFO (itmap/0) [IOProcessClient] (/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_nsednev__he__1) Starting client (__init__:308)
2018-12-05 13:05:15,904+0200 INFO (ioprocess/57753) [IOProcess] (/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_nsednev__he__1) Starting ioprocess (__init__:434)
2018-12-05 13:05:15,909+0200 INFO (jsonrpc/4) [IOProcessClient] (35c9172d-c2ca-407f-b54a-67a4fb24b757) Starting client (__init__:308)
2018-12-05 13:05:15,920+0200 INFO (ioprocess/57760) [IOProcess] (35c9172d-c2ca-407f-b54a-67a4fb24b757) Starting ioprocess (__init__:434)
2018-12-05 13:05:15,926+0200 INFO (jsonrpc/4) [storage.StorageDomain] Removing remnants of deleted images [] (fileSD:734)
2018-12-05 13:05:15,926+0200 INFO (jsonrpc/4) [storage.StorageDomain] sdUUID=35c9172d-c2ca-407f-b54a-67a4fb24b757 (fileSD:534)
2018-12-05 13:05:15,934+0200 INFO (jsonrpc/4) [vdsm.api] FINISH getStorageDomainInfo return={'info': {'uuid': u'35c9172d-c2ca-407f-b54a-67a4fb24b757', 'version': '4', 'role': 'Master', 'remotePath': 'yellow-vdsb.qa.lab.tlv.redhat.com:/Compute_NFS/nsednev_he_1', 'type': 'NFS', 'class': 'Data', 'pool': ['890ecd6b-6da7-4b1c-97d5-760239680b36'], 'name': 'hosted_storage'}} from=::1,60576, task_id=4af8dbf5-a1f3-4522-a803-2ca16b51dc88 (api:52)
2018-12-05 13:05:15,935+0200 INFO (jsonrpc/4) [jsonrpc.JsonRpcServer] RPC call StorageDomain.getInfo succeeded in 0.20 seconds (__init__:573)
2018-12-05 13:05:15,943+0200 INFO (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call Host.ping2 succeeded in 0.00 seconds (__init__:573)
2018-12-05 13:05:15,947+0200 INFO (jsonrpc/7) [vdsm.api] START getStorageDomainInfo(sdUUID=u'35c9172d-c2ca-407f-b54a-

Steps for reproduction:
1. Deploy HE over NFS on one ha-host.
2. Add an additional regular host (not an ha-host).
3. Set the additional regular host into maintenance.
4. Reinstall the additional non-ha-host as a hosted-engine host.
5. Check vdsm.log on the host reinstalled as an ha-host.
That part actually doesn't look bad to me, Nikolai -- no traceback which will confuse automation. Dominik -- what's the timeout for blocking workers? It seems to me that simply setting an appropriate timeout in ovirt_hosted_engine_ha (which doesn't have one right now, or any exception handling at all that I can see) would stop the blocking. Milan -- I know you've never touched this, but it's Python ;) Likely candidates seem to be the lack of any connection timeouts in ovirt_hosted_engine_ha/lib/unixrpc.py, and the absence of exception handling in ovirt_hosted_engine_ha/lib/brokerlink.py#L138. No need for an actual fix to this, since Martin indicated that the messages are harmless; just trying to avoid blocking vdsm workers by throwing exceptions around.
Answering here, for documentation purposes, a question received offline. Vdsm keeps two thread pools[1] to handle possibly blocking operations: one for JSON-RPC workers, another for periodic monitoring operations. These thread pools are smart: they can handle blocked workers and replace them, up to a configured limit, so there are always workers available to perform tasks submitted to the pool. Tasks submitted to the pools always have a timeout. When the timeout expires and the task is still running, the executor flags the slow worker and immediately creates a new idle one, unless the worker limit is reached. When the slow worker eventually completes the operation (if it does!), it checks the "slow" flag. If set, the worker self-destructs; otherwise, it puts itself back in the idle pool. +++ [1] these are the ones I'm aware of :)
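The flag-and-replace mechanism just described can be sketched in miniature (class and method names here are illustrative, not vdsm's actual executor API):

```python
import threading
import time


class Pool:
    """Toy model of a vdsm-style executor pool."""

    def __init__(self, limit=4):
        self.limit = limit
        self.idle = []        # workers that finished within their deadline
        self.spawned = 0      # total workers ever created

    def submit(self, task, timeout):
        self.spawned += 1
        Worker(self, task, timeout).start()

    def replace(self, worker):
        # Deadline passed: flag the blocked worker and account for its
        # replacement (a real pool would start a fresh idle worker here),
        # up to the configured limit.
        worker.discarded = True
        if self.spawned < self.limit:
            self.spawned += 1


class Worker(threading.Thread):
    def __init__(self, pool, task, timeout):
        super().__init__(daemon=True)
        self.pool, self.task, self.timeout = pool, task, timeout
        self.discarded = False

    def run(self):
        # The timer is the task's deadline; if it fires, the pool replaces us.
        timer = threading.Timer(self.timeout, self.pool.replace, args=(self,))
        timer.start()
        try:
            self.task()
        finally:
            timer.cancel()
            if not self.discarded:
                self.pool.idle.append(self)   # fast worker rejoins the pool
            # a flagged (slow) worker simply exits: "self-destruct"


pool = Pool()
pool.submit(lambda: time.sleep(0.3), timeout=0.05)  # blocks past its deadline
pool.submit(lambda: None, timeout=0.05)             # completes in time
time.sleep(0.6)
print(len(pool.idle), pool.spawned)
```

Only the fast worker rejoins the idle list; the slow one is flagged, replaced while still blocked, and discards itself on completion.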
(In reply to Ryan Barry from comment #18)
> That part actually doesn't look bad to me, Nikolai -- no traceback which
> will confuse automation.
>
> Dominik - what's the timeout for blocking workers? It seems to me that
> simply setting an appropriate timeout in ovirt_hosted_engine_ha (which
> doesn't have one right now, or any exception handling at all that I can see)
> would stop blocking.

Does comment 19 answer your question?
Almost -- what's the timeout?
(In reply to Ryan Barry from comment #21)
> Almost -- what's the timeout?

It depends.

For the JSON-RPC tasks, it is tunable using the configuration value [rpc] worker_timeout; the default is 60 seconds.

For the virt periodic operations, the timeout is half of the periodic interval. These intervals are again tunable using the configuration file; the relevant values are:
irs.vol_size_sample_interval
vars.vm_sample_jobs_interval
vars.vm_watermark_interval
sampling.external_vm_lookup_interval
vars.vm_kill_paused_time
vars.vm_sample_interval

This one -- host sampling -- has a timeout equal to its period, and no replacement attempts are made should the operation block for any reason:
vars.host_sample_stats_interval
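For illustration, the JSON-RPC timeout named above would be tuned in vdsm's configuration file like this (a sketch; /etc/vdsm/vdsm.conf is the usual location, and 60 is the default just described):

```ini
# /etc/vdsm/vdsm.conf
[rpc]
# Seconds a JSON-RPC task may run before its worker is flagged as
# blocked and replaced (default: 60).
worker_timeout = 60
```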
Thanks Francesco. Milan, let's try to just beat the timeout. Setting a socket timeout of 20 seconds should be more than enough for HE, and also hide ugly exceptions
Blocked calls cause the tracebacks even with oVirt gerrit 96618. I tried to add a timeout to the get_stats() call, but it didn't help either; maybe I did something wrong, or the timeout doesn't apply to Unix domain connections. I'm not sure it's worth further work. BTW, I think the related Vdsm worker timeout is 15 seconds in HostMonitor, as seen in comment 0.
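For what it's worth, a plain socket-level timeout does apply to AF_UNIX sockets, and a missing socket file surfaces immediately as ENOENT rather than as a blocked call. A minimal sketch (the helper name and path are hypothetical, not the actual unixrpc.py code):

```python
import errno
import socket


def try_connect_broker(socket_path, timeout=20.0):
    """Connect to the HA broker's Unix socket with a bounded timeout.

    Returns the connected socket, or None when the socket file does not
    exist -- the condition behind the '[Errno 2]' log message.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)  # bounds connect() and later recv() calls
    try:
        sock.connect(socket_path)
    except OSError as e:
        sock.close()
        if e.errno == errno.ENOENT:
            return None  # broker.socket not created yet
        raise
    return sock


print(try_connect_broker("/nonexistent/broker.socket"))  # -> None
```

The timeout set before connect() also governs every subsequent recv(), so a silent peer raises socket.timeout instead of blocking the worker indefinitely.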
Consistently reproduced during addition of the second host as an ha-host, just as described in comment #17.

2019-02-11 19:46:02,357+0200 ERROR (periodic/0) [root] failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished? (api:191)
2019-02-11 19:46:02,500+0200 INFO (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:38226 (protocoldetector:61)
2019-02-11 19:46:02,521+0200 INFO (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:38228 (protocoldetector:61)
2019-02-11 19:46:02,523+0200 INFO (Reactor thread) [ProtocolDetector.Detector] Detected protocol stomp from ::1:38226 (protocoldetector:125)
2019-02-11 19:46:02,523+0200 INFO (Reactor thread) [Broker.StompAdapter] Processing CONNECT request (stompserver:95)
2019-02-11 19:46:02,529+0200 INFO (JsonRpc (StompReactor)) [Broker.StompAdapter] Subscribe command received (stompserver:124)
2019-02-11 19:46:02,542+0200 INFO (Reactor thread) [ProtocolDetector.Detector] Detected protocol stomp from ::1:38228 (protocoldetector:125)
2019-02-11 19:46:02,542+0200 INFO (Reactor thread) [Broker.StompAdapter] Processing CONNECT request (stompserver:95)
2019-02-11 19:46:02,544+0200 INFO (JsonRpc (StompReactor)) [Broker.StompAdapter] Subscribe command received (stompserver:124)
2019-02-11 19:46:02,570+0200 INFO (jsonrpc/0) [jsonrpc.JsonRpcServer] In recovery, ignoring 'Host.ping2' in bridge with {} (__init__:324)
2019-02-11 19:46:02,570+0200 INFO (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call Host.ping2 failed (error 99) in 0.00 seconds (__init__:312)
2019-02-11 19:46:02,584+0200 INFO (jsonrpc/1) [jsonrpc.JsonRpcServer] In recovery, ignoring 'Host.ping2' in bridge with {} (__init__:324)
2019-02-11 19:46:02,584+0200 INFO (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call Host.ping2 failed (error 99) in 0.00 seconds (__init__:312)
2019-02-11 19:46:03,073+0200 INFO (hsm/init) [storage.HSM] FINISH HSM init succeeded in 1.21 seconds (hsm:410)
2019-02-11 19:46:03,576+0200 INFO (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC call Host.ping2 succeeded in 0.00 seconds (__init__:312)
2019-02-11 19:46:03,597+0200 INFO (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call Host.ping2 succeeded in 0.00 seconds (__init__:312)
2019-02-11 19:46:03,604+0200 INFO (jsonrpc/4) [jsonrpc.JsonRpcServer] RPC call Host.ping2 succeeded in 0.00 seconds (__init__:312)
2019-02-11 19:46:03,608+0200 INFO (jsonrpc/5) [vdsm.api] START getStorageDomainInfo(sdUUID=u'da4b370f-2b0b-485a-853e-
Created attachment 1530062 [details] sosreport from alma04
Tested on:
ovirt-hosted-engine-ha-2.3.1-1.el7ev.noarch
ovirt-hosted-engine-setup-2.3.3-1.el7ev.noarch
rhvm-appliance-4.3-20190201.0.el7.x86_64
Linux 3.10.0-957.5.1.el7.x86_64 #1 SMP Wed Dec 19 10:46:58 EST 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.6 (Maipo)
I reviewed this issue and traced through the code. Vdsm calls into the ovirt-hosted-engine-ha project's BrokerLink object, which attempts to send a get_stats request via the broker.socket file.

When deploying the hosted engine on the host, if ovirt-ha-agent and ovirt-ha-broker are running, they are brought down and the broker.socket file is deleted. From that moment, until ovirt-ha-broker is restarted, vdsm's polling of the get_stats message will fail with the error: "failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished?". Only when ovirt-ha-broker is restarted is the broker.socket file recreated. The error is therefore correct: from the time broker.socket is deleted (when the ovirt-ha-broker service is terminated) until the service is restarted and recreates the file, the file does not exist, so error 2 (errno.ENOENT) is accurate. However, even after ovirt-ha-broker has started and created the file, the get_stats polling in vdsm may still fail with "failed to retrieve Hosted Engine HA score" until the ovirt-ha-agent service is also running and can listen for broker connection requests.

One solution may be not to stop and restart the services, so that the broker.socket file continues to exist, but there may be a historical reason why the services are restarted. Another option may be to detect whether the services are running and only poll get_stats when both services are running and the broker.socket file exists; this would require additional code, could affect performance, and may not be worth the effort. Yet another solution may be to stop treating these diagnostics as errors, or to remove them altogether, but then there would be no proper diagnostics when a real problem occurs. I lean towards making them warnings rather than errors, but more input from others would be welcome.
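The second option above ("only poll when the broker.socket file exists") could be as small as a guard in front of the RPC. A sketch, where the function name, the socket path argument, and the returned dict shape are made up for illustration (a real check would also confirm the services are up):

```python
import logging
import os


def get_ha_stats(fetch, socket_path, log=logging.getLogger("ha")):
    """Poll broker stats only when the socket file exists.

    When it is absent (e.g. during redeployment, while the HA services
    are stopped), log at WARNING instead of ERROR and report the host
    as not configured rather than raising ENOENT.
    """
    if not os.path.exists(socket_path):
        log.warning("HA broker socket %s is missing; is the Hosted "
                    "Engine setup finished?", socket_path)
        return {"configured": False}
    return {"configured": True, "stats": fetch()}


print(get_ha_stats(lambda: {"score": 3400}, "/nonexistent/broker.socket"))
# -> {'configured': False}
```

The existence check is racy (the file can vanish between the check and the connect), so the RPC path would still need ENOENT handling; the guard only removes the noisy common case.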
In my opinion, the better and safer option is simply to downgrade the message from an error to a warning.
I still see this error on a newly deployed hosted-engine environment, right on the first ha-host:

2019-06-23 16:58:32,289+0300 ERROR (periodic/2) [root] failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished? (api:191)
2019-06-23 16:58:33,946+0300 INFO (jsonrpc/2) [api.host] START getAllVmStats() from=::1,37484 (api:48)
2019-06-23 16:58:33,947+0300 INFO (jsonrpc/2) [api.host] FINISH getAllVmStats return={'status': {'message': 'Done', 'code': 0}, 'statsList': (suppressed)} from=::1,37484 (api:54)
2019-06-23 16:58:33,947+0300 INFO (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:312)
2019-06-23 16:58:47,316+0300 INFO (periodic/3) [vdsm.api] START repoStats(domains=()) from=internal, task_id=c89996fb-f4b9-4e22-aca7-431b0d9c9834 (api:48)
2019-06-23 16:58:47,317+0300 INFO (periodic/3) [vdsm.api] FINISH repoStats return={u'ae57631d-65cb-49c6-b143-f5561bd08da4': {'code': 0, 'actual': True, 'version': 5, 'acquired': True, 'delay': '0.000359139', 'lastCheck': '2.3', 'valid': True}} from=internal, task_id=c89996fb-f4b9-4e22-aca7-431b0d9c9834 (api:54)

Components:
ovirt-hosted-engine-ha-2.3.2-1.el7ev.noarch
ovirt-hosted-engine-setup-2.3.10-1.el7ev.noarch
Linux 3.10.0-989.el7.bz1721503.x86_64 #1 SMP Fri Jun 21 15:10:35 EDT 2019 x86_64 x86_64 x86_64 GNU/Linux

Moving back to assigned.
This was a fix within the vdsm module, and it was not included in the last tag, v4.40.0. It should be tested in the next vdsm release.
Then why did you move it to POST?
we believe it's fixed, please retest
already present in 4.4, just retest already
Didn't see the reported error on any manually deployed RHEL-based deployment. Moving to verified. Please reopen in case you are still getting the error.

Tested on NFS deployment with these components:
rhvm-appliance.x86_64 2:4.4-20200123.0.el8ev
rhv-4.4.0
sanlock-3.8.0-2.el8.x86_64
qemu-kvm-4.2.0-12.module+el8.2.0+5858+afd073bc.x86_64
vdsm-4.40.5-1.el8ev.x86_64
libvirt-client-6.0.0-7.module+el8.2.0+5869+c23fe68b.x86_64
ovirt-hosted-engine-setup-2.4.2-2.el8ev.noarch
ovirt-hosted-engine-ha-2.4.2-1.el8ev.noarch
Linux 4.18.0-183.el8.x86_64 #1 SMP Sun Feb 23 20:50:47 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux release 8.2 Beta (Ootpa)

Engine is Software Version: 4.4.0-0.17.master.el7
Red Hat Enterprise Linux Server release 7.8 Beta (Maipo)
Linux 3.10.0-1123.el7.x86_64 #1 SMP Tue Jan 14 03:44:38 EST 2020 x86_64 x86_64 x86_64 GNU/Linux
This bugzilla is included in oVirt 4.4.0 release, published on May 20th 2020. Since the problem described in this bug report should be resolved in oVirt 4.4.0 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.