Created attachment 1367909 [details] logs from purple-vds1

Description of problem:
On two freshly reprovisioned hosts I'm getting failed SHE deployments. Here is what happened on the first host, purple-vds1: http://pastebin.test.redhat.com/540761, and here is what happened on the second host, purple-vds2: http://pastebin.test.redhat.com/540764.

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-ha-2.2.1-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.1-1.el7ev.noarch
rhvm-appliance-4.2-20171207.0.el7.noarch

How reproducible:
100% on this specific pair of hosts.

Steps to Reproduce:
1. Deploy SHE.

Actual results:
Deployment fails.

Expected results:
Deployment should succeed.

Additional info:
Logs from both hosts attached.
Created attachment 1367910 [details] logs from purple-vds2
Nikolai, when filing a vdsm bug, you must specify the vdsm version. According to one of the logs it is vdsm-4.20.9.1-1.el7ev.

Could it be that a failure to initialize storage is delaying vdsm startup?

MainProcess|hsm/init::ERROR::2017-12-13 18:09:34,699::supervdsm_server::98::SuperVdsm.ServerCallback::(wrapper) Error in resizeMap
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/supervdsm_server.py", line 96, in wrapper
    res = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/supervdsm_server.py", line 146, in resizeMap
    return multipath._resize_map(devName)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/multipath.py", line 140, in _resize_map
    % (name, out, err))
Error: Resizing map 'dm-10' failed: out='fail\n' err=''
(In reply to Dan Kenigsberg from comment #2)
> Nikolai, when filing vdsm bug, you must specify vdsm version. According to
> one of the logs it is vdsm-4.20.9.1-1.el7ev
>
> could it be that a failure to initialize storage is delaying vdsm startup?
>
> MainProcess|hsm/init::ERROR::2017-12-13
> 18:09:34,699::supervdsm_server::98::SuperVdsm.ServerCallback::(wrapper)
> Error in resizeMap
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/supervdsm_server.py", line 96,
> in wrapper
>     res = func(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/supervdsm_server.py", line
> 146, in resizeMap
>     return multipath._resize_map(devName)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/multipath.py", line
> 140, in _resize_map
>     % (name, out, err))
> Error: Resizing map 'dm-10' failed: out='fail\n' err=''

vdsm-4.20.9-37.gitb42ea79.el7.centos.x86_64
This is not part of a typical vdsm flow. From multipath.py in the storage code:

 84 def resize_devices():
 85     """
 86     This is needed in case a device has been increased on the storage server
 87     Resize multipath map if the underlying slaves are bigger than
 88     the map size.
 89     The slaves can be bigger if the LUN size has been increased on the storage
 90     server after the initial discovery.
 91     """

Did you make some changes to the storage environment during your setup of SHE?
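To illustrate the condition that docstring describes, here is a minimal sketch of how one could check whether a multipath map needs resizing, by comparing the map's size with its slave devices through sysfs. This is only an illustration of the idea, not the actual vdsm implementation; the sysfs layout is standard, but the helper names are assumptions.

import os

SECTOR_SIZE = 512  # sysfs 'size' files report 512-byte sectors


def read_size(path):
    # Return the size in bytes from a sysfs 'size' file.
    with open(path) as f:
        return int(f.read().strip()) * SECTOR_SIZE


def needs_resize(dm_name):
    # True if any slave of the dm device (e.g. 'dm-10') is larger than the
    # map itself, i.e. the LUN grew after the initial discovery.
    map_size = read_size("/sys/block/%s/size" % dm_name)
    slaves_dir = "/sys/block/%s/slaves" % dm_name
    for slave in os.listdir(slaves_dir):
        if read_size("/sys/block/%s/size" % slave) > map_size:
            return True
    return False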
I attached a patch to this bug by mistake; this doesn't seem to be a virt issue.

On purple1, I see plenty of storage errors, maybe relevant, maybe not (as Dan already mentioned).

On purple2, I see far fewer storage errors, but still some of them, and the HostedEngine VM successfully started:

2017-12-13 19:08:17,950+0200 INFO (vm/37cec957) [root] /usr/libexec/vdsm/hooks/before_vm_start/50_vfio_mdev: rc=0 err= (hooks:110)
2017-12-13 19:08:18,121+0200 INFO (vm/37cec957) [root] /usr/libexec/vdsm/hooks/before_vm_start/50_vhostmd: rc=0 err= (hooks:110)
2017-12-13 19:08:18,121+0200 INFO (vm/37cec957) [virt.vm] (vmId='37cec957-13ab-468c-8b40-3a86f7649f87') <?xml version="1.0" encoding="utf-8"?><domain type="kvm" xmlns:ovirt-tune="http://ovirt.org/vm/tune/1.0" xmlns:ovirt-vm="http://ovirt.org/vm/1.0">
    <name>HostedEngine</name>
    <uuid>37cec957-13ab-468c-8b40-3a86f7649f87</uuid>
    <memory>9118720</memory>
2017-12-13 19:08:21,361+0200 INFO (vm/37cec957) [virt.vm] (vmId='37cec957-13ab-468c-8b40-3a86f7649f87') CPU running: domain initialization (vm:5888)

I see no evidence of a virt bug, and I advise reassigning this bug for further analysis; I cannot do anything else on it for the time being.
Let's focus on purple-vds1, since the issues are probably different here.

The issue is for sure caused by:
2017-12-13 17:47:15,562+0200 INFO (jsonrpc/0) [jsonrpc.JsonRpcServer] In recovery, ignoring 'Host.ping2' in bridge with {} (__init__:585)
2017-12-13 17:47:15,562+0200 INFO (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call Host.ping2 failed (error 99) in 0.00 seconds (__init__:573)

as described in:
https://bugzilla.redhat.com/show_bug.cgi?id=1512534#c41

but the point here is why vdsm discarded that ping2 message.

According to vdsm logs (from purple-vds1), it started at
2017-12-13 17:47:13,954+0200 INFO (MainThread) [vds] (PID: 17560) I am the actual vdsm 4.20.9.1-1.el7ev purple-vds1.qa.lab.tlv.redhat.com (3.10.0-693.11.1.el7.x86_64) (vdsmd:148)

and completed recovery mode at
2017-12-13 17:47:14,703+0200 INFO (vmrecovery) [vds] recovery: completed in 0s (clientIF:569)

but it was still discarding ping2 at 17:47:15,562+0200 due to recovery mode, although recovery had completed at 17:47:14,703+0200, and this, in my opinion, is a bug by itself.
(In reply to Adam Litke from comment #4)
> This is not part of a typical vdsm flow. From multipath.py in the storage
> code:
>
> 84 def resize_devices():
>
> 85     """
>
> 86     This is needed in case a device has been increased on the storage
> server
> 87     Resize multipath map if the underlying slaves are bigger than
>
> 88     the map size.
>
> 89     The slaves can be bigger if the LUN size has been increased on the
> storage
>
> 90     server after the initial discovery.
>
> 91     """
>
> Did you make some changes to the storage environment while/during your setup
> of SHE?

No, I did not.
(In reply to Simone Tiraboschi from comment #6)
> and completed the recovery mode at
> 2017-12-13 17:47:14,703+0200 INFO (vmrecovery) [vds] recovery: completed
> in 0s (clientIF:569)
>
> but then it was still discarding the ping2 at 17:47:15,562+0200 due to the
> recovery mode although it completed it at 17:47:14,703+0200 and this on my
> opinion is a bug by itself.

VM recovery has finished, but not storage initialization. That's why I called in Adam 4 days ago.
How systematic is this issue?
My bad, I've provided you with vdsm-4.20.9-37.gitb42ea79.el7.centos.x86_64 by mistake. The right one was vdsm-4.20.9.1-1.el7ev.

(In reply to Doron Fediuck from comment #9)
> How systematic is this issue?

It happens on this specific HW every time. Both purple-vds1 and purple-vds2 suffer from this issue.
(In reply to Nikolai Sednev from comment #10)
> My bad, I've provided you with vdsm-4.20.9-37.gitb42ea79.el7.centos.x86_64
> by mistake. The right one was vdsm-4.20.9.1-1.el7ev.
> (In reply to Doron Fediuck from comment #9)
> > How systematic is this issue?
>
> It happened on specific HW all the time. Both purple-vds1 and purple-vds2
> suffers from this issue.

Please try to reproduce on a different pair of hosts (even nested), and report back the results here.
Thanks!
(In reply to Allon Mureinik from comment #11)
> (In reply to Nikolai Sednev from comment #10)
> > My bad, I've provided you with vdsm-4.20.9-37.gitb42ea79.el7.centos.x86_64
> > by mistake. The right one was vdsm-4.20.9.1-1.el7ev.
> > (In reply to Doron Fediuck from comment #9)
> > > How systematic is this issue?
> >
> > It happened on specific HW all the time. Both purple-vds1 and purple-vds2
> > suffers from this issue.
>
> Please try to reproduce on a different pair of hosts (even nested), and
> report back the results here.
> Thanks!

On different hosts it worked, and I have already reported that here:
https://bugzilla.redhat.com/show_bug.cgi?id=1511094#c9

The thing is, it really depends on the HW: on some machines it works flawlessly, on others it does not.
I've just verified on fast hosts (almas) and deployment over NFS was fine.

ovirt-hosted-engine-ha-2.2.1-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.1-1.el7ev.noarch
qemu-kvm-rhev-2.9.0-16.el7_4.13.x86_64
sanlock-3.5.0-1.el7.x86_64
libvirt-client-3.2.0-14.el7_4.5.x86_64
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
mom-0.5.11-1.el7ev.noarch
vdsm-4.20.9.2-1.el7ev.x86_64
ovirt-host-deploy-1.7.0-1.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch

Slow hosts retesting is still in progress.
(In reply to Nikolai Sednev from comment #13)
> I've just verified on fast hosts (almas) and deployment over NFS was fine.
> ovirt-hosted-engine-ha-2.2.1-1.el7ev.noarch
> ovirt-hosted-engine-setup-2.2.1-1.el7ev.noarch
> qemu-kvm-rhev-2.9.0-16.el7_4.13.x86_64
> sanlock-3.5.0-1.el7.x86_64
> libvirt-client-3.2.0-14.el7_4.5.x86_64
> ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
> mom-0.5.11-1.el7ev.noarch
> vdsm-4.20.9.2-1.el7ev.x86_64
> ovirt-host-deploy-1.7.0-1.el7ev.noarch
> ovirt-vmconsole-1.0.4-1.el7ev.noarch
>
> Slow hosts retesting is still in progress.

Deployment on the slow host over FC has failed:

[ ERROR ] Error creating Volume Group: Command LVMVolumeGroup.create with args {'devlist': ['3514f0c5a51601092'], 'force': True, 'name': 'f3f15c92-9d84-4c5e-a65c-fb5212181069'} failed: (code=601, message=Failed to initialize physical device: ("['/dev/mapper/3514f0c5a51601092']",))
[ ERROR ] Failed to execute stage 'Misc configuration': Command LVMVolumeGroup.create with args {'devlist': ['3514f0c5a51601092'], 'force': True, 'name': 'f3f15c92-9d84-4c5e-a65c-fb5212181069'} failed: (code=601, message=Failed to initialize physical device: ("['/dev/mapper/3514f0c5a51601092']",))

This looks like a different issue now. Sosreport is attached.
Created attachment 1369506 [details] logs from purple-vds1 after failed SHE deployment over FC.
Adding components from the host here:

ovirt-imageio-daemon-1.2.0-0.el7ev.noarch
ovirt-hosted-engine-ha-2.2.1-1.el7ev.noarch
ovirt-imageio-common-1.2.0-0.el7ev.noarch
mom-0.5.11-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-host-deploy-1.7.0-1.el7ev.noarch
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.1-1.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-setup-lib-1.1.4-1.el7ev.noarch
qemu-kvm-rhev-2.9.0-16.el7_4.13.x86_64
vdsm-4.20.9.2-1.el7ev.x86_64
ovirt-host-4.2.0-1.el7ev.x86_64
ovirt-provider-ovn-driver-1.2.2-1.el7ev.noarch
sanlock-3.5.0-1.el7.x86_64
ovirt-host-dependencies-4.2.0-1.el7ev.x86_64
libvirt-client-3.2.0-14.el7_4.5.x86_64

Linux version 3.10.0-693.15.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Thu Dec 14 05:13:32 EST 2017
Linux purple-vds1.qa.lab.tlv.redhat.com 3.10.0-693.15.1.el7.x86_64 #1 SMP Thu Dec 14 05:13:32 EST 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.4 (Maipo)
Adding more details from failed FC deployment: http://pastebin.test.redhat.com/541628
In the vdsm logs we see:

2017-12-18 15:27:21,752+0200 INFO (jsonrpc/0) [vdsm.api] START createVG(vgname='f3f15c92-9d84-4c5e-a65c-fb5212181069', devlist=['3514f0c5a51601092'], force=True, options=None) from=::1,49016, task_id=5b5cab39-060b-4a1d-9b24-911fbf741264 (api:46)
2017-12-18 15:27:21,890+0200 ERROR (jsonrpc/0) [storage.LVM] pvcreate failed with rc=5 (lvm:754)
2017-12-18 15:27:21,890+0200 ERROR (jsonrpc/0) [storage.LVM] [], [' Device /dev/mapper/3514f0c5a51601092 not found (or ignored by filtering).'] (lvm:755)
2017-12-18 15:27:21,891+0200 INFO (jsonrpc/0) [vdsm.api] FINISH createVG error=Failed to initialize physical device: ("['/dev/mapper/3514f0c5a51601092']",) from=::1,49016, task_id=5b5cab39-060b-4a1d-9b24-911fbf741264 (api:50)
2017-12-18 15:27:21,891+0200 ERROR (jsonrpc/0) [storage.TaskManager.Task] (Task='5b5cab39-060b-4a1d-9b24-911fbf741264') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in createVG
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2129, in createVG
    force=force)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 983, in createVG
    _initpvs(pvs, metadataSize, force)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 756, in _initpvs
    raise se.PhysDevInitializationError(str(devices))
PhysDevInitializationError: Failed to initialize physical device: ("['/dev/mapper/3514f0c5a51601092']",)
2017-12-18 15:27:21,951+0200 INFO (jsonrpc/0) [storage.TaskManager.Task] (Task='5b5cab39-060b-4a1d-9b24-911fbf741264') aborting: Task is aborted: 'Failed to initialize physical device: ("[\'/dev/mapper/3514f0c5a51601092\']",)' - code 601 (task:1181)
2017-12-18 15:27:21,951+0200 ERROR (jsonrpc/0) [storage.Dispatcher] FINISH createVG error=Failed to initialize physical device: ("['/dev/mapper/3514f0c5a51601092']",) (dispatcher:82)

The device seems fine:

[root@purple-vds1 ~]# multipath -ll
3514f0c5a51601092 dm-1 XtremIO ,XtremApp
size=60G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  |- 0:0:0:1 sda 8:0  active ready running
  `- 0:0:1:1 sdb 8:16 active ready running
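The "not found (or ignored by filtering)" message from pvcreate typically means that either an LVM filter in lvm.conf is hiding the device, or an existing signature on the LUN is getting in the way. A small triage sketch, assuming the device path from the log above and standard LVM/util-linux tools; this is not part of vdsm's flow, just how one might inspect the host:

import subprocess

# Assumed device path, taken from the createVG log above.
DEVICE = "/dev/mapper/3514f0c5a51601092"

# Show the effective LVM device filter from lvm.conf, if one is set.
subprocess.call(["lvmconfig", "devices/filter"])

# Check whether the LUN already carries a partition/filesystem/PV signature
# that would make pvcreate refuse to initialize it.
subprocess.call(["blkid", "-p", DEVICE])

# Ask pvcreate to explain itself in test mode, without touching the disk.
subprocess.call(["pvcreate", "--test", "-v", DEVICE])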
I've opened a separate bug, https://bugzilla.redhat.com/show_bug.cgi?id=1527077, further to comments 14-18.

The original issue has not been reproduced at the moment. I'll try to reproduce it again on another host and, if it does not reproduce, I will update this bug.
(In reply to Simone Tiraboschi from comment #18)

This could be caused by a "dirty" LUN. Check here for how to solve it:
https://bugzilla.redhat.com/show_bug.cgi?id=1343043

For now, we could not find any storage-related issues in this bug. Moving to Hosted Engine.
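In case the LUN really is dirty (stale partition table, filesystem label or old PV metadata from a previous deployment), the usual remedy is to wipe the start of the device before redeploying. A minimal, destructive sketch, assuming the device path from comment 18; only run it against a LUN whose contents are known to be disposable:

import subprocess

# Assumed LUN path; double-check it before running, this destroys data.
DEVICE = "/dev/mapper/3514f0c5a51601092"

# Remove known signatures (partition tables, filesystem labels, old LVM metadata).
subprocess.check_call(["wipefs", "--all", DEVICE])

# Zero the first MiB as well, where leftover metadata usually lives.
subprocess.check_call(
    ["dd", "if=/dev/zero", "of=" + DEVICE, "bs=1M", "count=1", "oflag=direct"])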
For sure hosted-engine-setup now correctly manages to talk with vdsm as a side effect of https://gerrit.ovirt.org/#/c/85416/ for https://bugzilla.redhat.com/1512534.

No other patches are needed, in my opinion. Moving to QE for repeating the tests.
Deployment still failed with:

[ INFO ] Engine is still unreachable
[ INFO ] Engine is still not reachable, waiting...
[ INFO ] Engine is still unreachable
[ INFO ] Engine is still not reachable, waiting...
[ INFO ] Engine is still unreachable
[ ERROR ] Engine is still not reachable
[ ERROR ] Failed to execute stage 'Closing up': Engine is still not reachable
[ INFO ] Stage: Clean up
[ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20171218173244.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: this system is not reliable, please check the issue,fix and redeploy
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20171218170411-tj2a1c.log
Moving back to assigned per Nir's hint: we also have to explicitly check for a successful pong.
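A rough sketch of the kind of explicit check meant here: keep calling Host.ping2 and treat a "still recovering" answer (the error 99 seen in the logs) as transient, only proceeding on a real, successful pong. The client object and its call() method are hypothetical placeholders for whatever jsonrpc client hosted-engine-setup actually uses:

import time


class StillRecovering(Exception):
    # Raised by the (hypothetical) client when vdsm answers with error 99.
    pass


def wait_for_pong(client, timeout=300, interval=2):
    # Poll Host.ping2 until vdsm really answers, instead of assuming it is up
    # after the first response or a discarded message.
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            client.call("Host.ping2")  # hypothetical jsonrpc call
            return True  # explicit, successful pong
        except StillRecovering:
            # vdsm is still in recovery mode: wait and retry.
            time.sleep(interval)
    raise RuntimeError("vdsm did not answer Host.ping2 within %s seconds" % timeout)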
Created attachment 1369564 [details] sosreport from purple vds1
And we would, but the jsonrpc client gets stuck anyway...
I see the same issue happening on FC deployment; the error looks the same to me:

[ INFO ] Engine is still not reachable, waiting...
[ INFO ] Engine is still unreachable
[ INFO ] Engine is still not reachable, waiting...
[ INFO ] Engine is still unreachable
[ INFO ] Engine is still not reachable, waiting...
[ INFO ] Engine is still unreachable
[ INFO ] Engine is still not reachable, waiting...
[ INFO ] Engine is still unreachable
[ ERROR ] Engine is still not reachable
[ ERROR ] Failed to execute stage 'Closing up': Engine is still not reachable
[ INFO ] Stage: Clean up
[ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20171219135626.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: this system is not reliable, please check the issue,fix and redeploy
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20171219124044-eizanb.log

Logs from purple-vds2 attached.
Created attachment 1370034 [details] logs from purple-vds2
Deployed on RHEL 7.4 purple-vds2 over FC with:

rhvm-appliance-4.2-20171207.0.el7.noarch
ovirt-provider-ovn-driver-1.2.2-1.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.2-1.el7ev.noarch
ovirt-host-deploy-1.7.0-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-imageio-common-1.2.0-0.el7ev.noarch
qemu-kvm-rhev-2.9.0-16.el7_4.13.x86_64
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
vdsm-4.20.9.3-1.el7ev.x86_64
ovirt-hosted-engine-setup-2.2.2-1.el7ev.noarch
mom-0.5.11-1.el7ev.noarch
libvirt-client-3.2.0-14.el7_4.5.x86_64
ovirt-imageio-daemon-1.2.0-0.el7ev.noarch
ovirt-host-dependencies-4.2.0-1.el7ev.x86_64
ovirt-host-4.2.0-1.el7ev.x86_64
sanlock-3.5.0-1.el7.x86_64
ovirt-setup-lib-1.1.4-1.el7ev.noarch

Linux version 3.10.0-693.15.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Thu Dec 14 05:13:32 EST 2017
Linux 3.10.0-693.15.1.el7.x86_64 #1 SMP Thu Dec 14 05:13:32 EST 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.4 (Maipo)
(In reply to Nikolai Sednev from comment #26)
> I see the same issue happening on FC deployment, error looks the same to me:
> [ INFO ] Engine is still not reachable, waiting...

The engine VM, configured to rely on DHCP, failed to get an IP address; see the attached screenshot. Please ensure that the network is working properly.
Created attachment 1370039 [details] Network on nsednev-he-2
*** Bug 1524076 has been marked as a duplicate of this bug. ***
(In reply to Simone Tiraboschi from comment #30)
> Created attachment 1370039 [details]
> Network on nsednev-he-2

Network issue fixed. FC deployment passed just fine with these components:

rhvm-appliance-4.2-20171207.0.el7.noarch
ovirt-imageio-daemon-1.2.0-0.el7ev.noarch
ovirt-imageio-common-1.2.0-0.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
qemu-kvm-rhev-2.9.0-16.el7_4.13.x86_64
ovirt-setup-lib-1.1.4-1.el7ev.noarch
ovirt-host-4.2.0-1.el7ev.x86_64
ovirt-vmconsole-1.0.4-1.el7ev.noarch
mom-0.5.11-1.el7ev.noarch
vdsm-4.20.9.3-1.el7ev.x86_64
ovirt-host-dependencies-4.2.0-1.el7ev.x86_64
ovirt-hosted-engine-setup-2.2.2-1.el7ev.noarch
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
sanlock-3.5.0-1.el7.x86_64
ovirt-host-deploy-1.7.0-1.el7ev.noarch
ovirt-provider-ovn-driver-1.2.2-1.el7ev.noarch
libvirt-client-3.2.0-14.el7_4.5.x86_64
ovirt-hosted-engine-ha-2.2.2-1.el7ev.noarch

Linux version 3.10.0-693.15.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Thu Dec 14 05:13:32 EST 2017
Linux 3.10.0-693.15.1.el7.x86_64 #1 SMP Thu Dec 14 05:13:32 EST 2017 x86_64 x86_64 x86_64 GNU/Linux

I did not reproduce the initial errors during the deployment.
Worked just fine using ansible deployment with these components:

ovirt-hosted-engine-ha-2.2.5-0.0.master.20180117160214.20180117160210.gitd5b58a8.el7.centos.noarch
ovirt-hosted-engine-setup-2.2.8-0.0.master.20180123120416.gitfa69a78.el7.centos.noarch
ovirt-engine-appliance-4.2-20180121.1.el7.centos.noarch
Linux 3.10.0-693.17.1.el7.x86_64 #1 SMP Sun Jan 14 10:36:03 EST 2018 x86_64 x86_64 x86_64 GNU/Linux

Moving to verified.
This bugzilla is included in oVirt 4.2.1 release, published on Feb 12th 2018. Since the problem described in this bug report should be resolved in oVirt 4.2.1 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.