Bug 1546017 - Hosted Engine Installation fails via cockpit due to sanlock calling watchdog and restarting VDSM.
Summary: Hosted Engine Installation fails via cockpit due to sanlock calling watchdog ...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: cockpit-ovirt
Version: 4.1.9
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Phillip Bailey
QA Contact: Yihui Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-02-16 05:02 UTC by Ribu Tho
Modified: 2022-04-16 09:25 UTC (History)
CC List: 14 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-06 08:37:11 UTC
oVirt Team: Integration
Target Upstream Version:
Embargoed:
lsvaty: testing_plan_complete-



Description Ribu Tho 2018-02-16 05:02:44 UTC
Description of problem:

Installation of the hosted engine via Cockpit fails: the storage monitor process becomes blocked, which results in sanlock calling the watchdog and restarting the vdsmd service on the host, terminating the install process.

Version-Release number of selected component (if applicable):

RHVH-4.2-20171218.0-RHVH-x86_64-dvd1.iso
vdsm-4.19.45-1.el7ev.x86_64
rhvm-appliance-4.1.20180125.0-1.el7.noarch

How reproducible:



Steps to Reproduce:

1. Install RHV on a host.

2. Install Hosted Engine via cockpit
 
3. Enter all info for storage/config etc and proceed with the installation. 
 

Actual results:

- The installation stops at the current step in the GUI.

- VDSM reports errors related to the creation of the hosted engine disk:

=-----------------------------------------------------
2018-02-16 15:54:06,888+1100 WARN  (check/loop) [storage.check] Checker u'/rhev/data-center/mnt/_var_lib_ovirt-hosted-engine-setup_tmp2gg__IU/d42ad173-214a-4135-9195-763f585ddb30/dom_md/metadata' is blocked for 50.00 seconds (check:279)
=-----------------------------------------------------

- sanlock reports -202 renewal timeout errors and calls the watchdog, restarting the vdsm service:

=----------------------------------------------------------
2018-02-16 15:53:40 3640 [28014]: d42ad173 aio timeout RD 0x7fccbc0008c0:0x7fccbc0008d0:0x7fcccf912000 ioto 10 to_count 1
2018-02-16 15:53:40 3640 [28014]: s3 delta_renew read timeout 10 sec offset 0 /rhev/data-center/mnt/_var_lib_ovirt-hosted-engine-setup_tmp2gg__IU/d42ad173-214a-4135-9195-763f585ddb30/dom_md/ids
2018-02-16 15:53:40 3640 [28014]: s3 renewal error -202 delta_length 10 last_success 3610
2018-02-16 15:54:00 3661 [28014]: d42ad173 aio timeout RD 0x7fccbc000910:0x7fccbc000920:0x7fcccc3fd000 ioto 10 to_count 2
2018-02-16 15:54:00 3661 [28014]: s3 delta_renew read timeout 10 sec offset 0 /rhev/data-center/mnt/_var_lib_ovirt-hosted-engine-setup_tmp2gg__IU/d42ad173-214a-4135-9195-763f585ddb30/dom_md/ids
2018-02-16 15:54:00 3661 [28014]: s3 renewal error -202 delta_length 20 last_success 3610
=----------------------------------------------------------
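
For context: the "ioto 10" above is sanlock's 10-second I/O timeout for renewing the delta lease on the dom_md/ids file; once renewals keep failing it starts escalating, eventually killing the lease owner or firing the watchdog. A rough way to check whether the backing storage answers direct reads within that window, reusing the mount path from the log above, is:

 # time a 1 MiB direct read of the lease area; healthy storage should answer well under 10 seconds
 time dd if=/rhev/data-center/mnt/_var_lib_ovirt-hosted-engine-setup_tmp2gg__IU/d42ad173-214a-4135-9195-763f585ddb30/dom_md/ids of=/dev/null iflag=direct bs=1M count=1
 # sanlock's own view of its lockspaces and resources
 sanlock client status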


Expected results:

Hosted Engine to continue installation and complete successfully.

Additional info:

Comment 6 Yaniv Lavi 2018-02-19 08:26:59 UTC
Can you try to reproduce with the RHV 4.2 Beta 2? I think this is already resolved.

Comment 7 Ribu Tho 2018-02-20 01:44:51 UTC
Yaniv,

The same issue is seen again with the latest RHV 4.2 beta. Below are the logs from that attempt; this time I tried using NFS storage.

=-----------------------------------------------------------------
2018-02-20 12:40:41,328+1100 WARN  (check/loop) [storage.check] Checker u'/rhev/data-center/mnt/_var_lib_ovirt-hosted-engine-setup_tmpdWJSkK/bd5f10ec-cb5c-4cfe-8816-4b7cae635be1/dom_md/metadata' is blocked for 70.01 seconds (check:278)
2018-02-20 12:40:43,197+1100 INFO  (jsonrpc/0) [api.host] START getAllVmStats() from=::1,58430 (api:46)
2018-02-20 12:40:43,198+1100 INFO  (jsonrpc/0) [api.host] FINISH getAllVmStats return={'status': {'message': 'Done', 'code': 0}, 'statsList': (suppressed)} from=::1,58430 (api:52)
2018-02-20 12:40:43,199+1100 INFO  (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:573)
2018-02-20 12:40:44,915+1100 INFO  (MainThread) [vds] Received signal 15, shutting down (vdsmd:67)
2018-02-20 12:40:44,916+1100 INFO  (MainThread) [jsonrpc.JsonRpcServer] Stopping JsonRPC Server (__init__:703)
2018-02-20 12:40:44,921+1100 INFO  (MainThread) [vds] Stopping http server (http:79)
2018-02-20 12:40:44,922+1100 INFO  (http) [vds] Server stopped (http:69)
2018-02-20 12:40:44,922+1100 INFO  (MainThread) [root] Unregistering all secrets (secret:91)
2018-02-20 12:40:44,923+1100 INFO  (MainThread) [vds] Stopping QEMU-GA poller (qemuguestagent:119)
2018-02-20 12:40:44,924+1100 INFO  (MainThread) [vdsm.api] START prepareForShutdown(options=None) from=internal, task_id=48ca73c6-2a83-4130-9d8f-af092d81a823 (api:46)
2018-02-20 12:40:46,568+1100 INFO  (MainThread) [vds] Received signal 15, shutting down (vdsmd:67)
=--------------------------------------------------------------------------

Also, when I tried using iSCSI storage during the installation, I received the following error:

=---------------------------------------------------------------------------------------------------
2018-02-20 11:48:37,940+1100 DEBUG otopi.plugins.gr_he_setup.storage.blockd blockd._customization:720 exception
Traceback (most recent call last):
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-setup/storage/blockd.py", line 716, in _customization
    lunGUID
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-setup/storage/blockd.py", line 517, in _validate_domain
    _('The requested device is not listed by VDSM')
RuntimeError: The requested device is not listed by VDSM
2018-02-20 11:48:37,943+1100 ERROR otopi.plugins.gr_he_setup.storage.blockd blockd._customization:721 The requested device is not listed by VDSM
2018-02-20 11:48:37,943+1100 DEBUG otopi.context context._executeMethod:143 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 133, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-setup/storage/blockd.py", line 723, in _customization
    raise RuntimeError(_('Cannot access LUN'))
RuntimeError: Cannot access LUN
2018-02-20 11:48:37,945+1100 ERROR otopi.context context._executeMethod:152 Failed to execute stage 'Environment customization': Cannot access LUN
=------------------------------------------------------------------------------------------------------

I am not sure why the above error is received, as the storage is accessible and I am able to log in to the target successfully from the host.
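
For reference, the checks I ran from the host to confirm target access were roughly the following (vdsm-client may be vdsClient on older builds; the getDeviceList output is where the LUN GUID would have to show up for setup to accept it):

 # iSCSI sessions the host is currently logged in to
 iscsiadm -m session -P 1
 # multipath devices as the host sees them
 multipath -ll
 # devices as VDSM itself reports them
 vdsm-client Host getDeviceList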


Ribu

Comment 8 Sandro Bonazzola 2018-02-26 08:27:19 UTC
Simone, please check whether this is a VDSM storage issue or a hosted-engine issue. If it is VDSM, please move it to the VDSM storage team.

Comment 10 Simone Tiraboschi 2018-02-26 09:58:03 UTC
At the time of the issue (around 2018-02-16 15:54) the system seemed to be really slow; according to the log it took 4'30" to locally extract a 1 GB tar.gz file (with less than 3 GB of content inside) and about 10 seconds just to run '/bin/qemu-img info --output json' on it, while normally both operations are far faster:

 2018-02-16 15:52:09 INFO otopi.plugins.gr_he_common.vm.boot_disk boot_disk.prepare:216 Extracting disk image from OVF archive (could take a few minutes depending on archive size)
 2018-02-16 15:56:32 INFO otopi.plugins.gr_he_common.vm.boot_disk boot_disk._validate_volume:91 Validating pre-allocated volume size
 2018-02-16 15:56:32 DEBUG otopi.plugins.gr_he_common.vm.boot_disk plugin.executeRaw:813 execute: ('/bin/sudo', '-u', 'vdsm', '-g', 'kvm', '/bin/qemu-img', 'info', '--output', 'json', '/var/tmp/tmpl1ssJY'), executable='None', cwd='None', env=None
 2018-02-16 15:56:42 DEBUG otopi.plugins.gr_he_common.vm.boot_disk plugin.executeRaw:863 execute-result: ('/bin/sudo', '-u', 'vdsm', '-g', 'kvm', '/bin/qemu-img', 'info', '--output', 'json', '/var/tmp/tmpl1ssJY'), rc=0

I suspect that the sanlock timeout could be due to that.
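
To compare, the same operations can be timed by hand on the host; the commands below assume the default appliance location, and <disk-image> stands for whichever image file the archive contains:

 mkdir -p /var/tmp/ova-timing
 # the appliance OVA is a tar archive; on the affected host extraction took about 4'30"
 time tar -xf /usr/share/ovirt-engine-appliance/rhvm-appliance-4.1.20180125.0-1.el7.ova -C /var/tmp/ova-timing
 # and qemu-img info on the extracted disk image
 time sudo -u vdsm -g kvm /bin/qemu-img info --output json /var/tmp/ova-timing/<disk-image>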

Also using up to 100% of the available memory on the host for the engine VM appears really risky:
 2018-02-16 15:49:28 DEBUG otopi.plugins.gr_he_setup.vm.memory dialog.queryEnvKey:90 queryEnvKey called for key OVEHOSTED_VM/vmMemSizeMB
 2018-02-16 15:49:28 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND       **%QStart: ovehosted_vmenv_mem
 2018-02-16 15:49:28 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND       ### Please specify the memory size of the VM in MB (Defaults to maximum available): [4993]: 
 2018-02-16 15:49:28 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND       **%QDefault: 4993
 2018-02-16 15:49:28 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND       **%QHidden: FALSE
 2018-02-16 15:49:28 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND       ***Q:STRING ovehosted_vmenv_mem
 2018-02-16 15:49:28 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND       **%QEnd: ovehosted_vmenv_mem
 2018-02-16 15:49:29 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:RECEIVE    4993
 2018-02-16 15:49:29 DEBUG otopi.context context.dumpEnvironment:760 ENVIRONMENT DUMP - BEGIN
 2018-02-16 15:49:29 DEBUG otopi.context context.dumpEnvironment:770 ENV OVEHOSTED_VM/vmMemSizeMB=int:'4993'

Ribu, could you please try reproducing on a system with more resources?
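
For reference, the engine VM memory can also be pinned explicitly instead of accepting the "maximum available" default, via the same answer-file key shown in the environment dump above (the value and file name here are just an example):

 # /root/he-answers.conf (example file name)
 [environment:default]
 OVEHOSTED_VM/vmMemSizeMB=int:4096

 # then, from the CLI flow:
 hosted-engine --deploy --config-append=/root/he-answers.conf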

Comment 11 Simone Tiraboschi 2018-02-26 10:14:42 UTC
(In reply to Ribu Tho from comment #7)
> I am not too sure for the above error being received as the storage is
> accessible and I am able to login to the target successfully from the host. 

This is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1542426
See https://bugzilla.redhat.com/show_bug.cgi?id=1539598#c17

Comment 12 Justin Pittman 2018-03-02 15:36:11 UTC
(In reply to Simone Tiraboschi from comment #10)
> At the time of the issue (around 2018-02-16 15:54 ) the system seamed to be
> really slow; according to the log it took 4'30" to locally extract a 1 GB
> tar.gz file (with a less than 3 GB inside ) and about 10 seconds just to run
> '/bin/qemu-img info --output json' on it while normally it's by far faster:

Simone,

I suspected hardware latency as well, but the sanlock timeout occurred on my test systems, which run much faster than the one reported above. For a qemu-img info of the 1.6 GB RHV Appliance image, the time was well under 1 second.

time sudo -u vdsm -g kvm qemu-img info --output json /usr/share/ovirt-engine-appliance/rhvm-appliance-4.1.20180125.0-1.el7.ova 
{
    "virtual-size": 1670002176,
    "filename": "/usr/share/ovirt-engine-appliance/rhvm-appliance-4.1.20180125.0-1.el7.ova",
    "format": "raw",
    "actual-size": 1670008832,
    "dirty-flag": false
}

real    0m0.069s
user    0m0.057s
sys     0m0.012s

> Ribu, could you please try reproducing on a system with more resources?

Ribu tried NFS (I suppose a remote storage backend?) and an iSCSI path; I've tested GlusterFS and NFS (locally backed storage) as well. The sanlock timeout is occurring in each of these scenarios on different test systems (I'm not sure about Ribu's test system). The RH doc says the HE minimum requirements are 4 GiB of memory and 1 Gb Ethernet. Perhaps that isn't enough...

Comment 13 Justin Pittman 2018-03-02 15:41:44 UTC
(In reply to Simone Tiraboschi from comment #11)
> (In reply to Ribu Tho from comment #7)
> > I am not too sure for the above error being received as the storage is
> > accessible and I am able to login to the target successfully from the host. 
> 
> This is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1542426
> See https://bugzilla.redhat.com/show_bug.cgi?id=1539598#c17

Those bugs are only about iSCSI-attached storage, so how are they related to storage exposed by NFS and Gluster for the other 2 tests?

I do see this in the supervdsm.log:

MainThread::DEBUG::2018-03-01 11:15:54,985::__init__::47::blivet::(register_device_format) registered device format class EFIVarFS as efivarfs
MainThread::DEBUG::2018-03-01 11:15:55,102::storage_log::69::blivet::(log_exception_info) IGNORED:        Caught exception, continuing.
MainThread::DEBUG::2018-03-01 11:15:55,102::storage_log::72::blivet::(log_exception_info) IGNORED:        Problem description: failed to get initiator name from iscsi firmware
MainThread::DEBUG::2018-03-01 11:15:55,103::storage_log::73::blivet::(log_exception_info) IGNORED:        Begin exception details.
MainThread::DEBUG::2018-03-01 11:15:55,103::storage_log::76::blivet::(log_exception_info) IGNORED:            Traceback (most recent call last):
MainThread::DEBUG::2018-03-01 11:15:55,103::storage_log::76::blivet::(log_exception_info) IGNORED:              File "/usr/lib/python2.7/site-packages/blivet/iscsi.py", line 146, in __init__
MainThread::DEBUG::2018-03-01 11:15:55,103::storage_log::76::blivet::(log_exception_info) IGNORED:                initiatorname = libiscsi.get_firmware_initiator_name()
MainThread::DEBUG::2018-03-01 11:15:55,103::storage_log::76::blivet::(log_exception_info) IGNORED:            IOError: Unknown error
MainThread::DEBUG::2018-03-01 11:15:55,103::storage_log::77::blivet::(log_exception_info) IGNORED:        End exception details.

Comment 14 Simone Tiraboschi 2018-03-02 15:57:42 UTC
(In reply to Justin Pittman from comment #13)
> (In reply to Simone Tiraboschi from comment #11)
> > (In reply to Ribu Tho from comment #7)
> > > I am not too sure for the above error being received as the storage is
> > > accessible and I am able to login to the target successfully from the host. 
> > 
> > This is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1542426
> > See https://bugzilla.redhat.com/show_bug.cgi?id=1539598#c17
> 
> Those bugs are only about iSCSI-attached storage, so how are they related to
> storage exposed by NFS and Gluster for the other 2 tests?  

I was just referring to
2018-02-20 11:48:37,940+1100 DEBUG otopi.plugins.gr_he_setup.storage.blockd blockd._customization:720 exception
Traceback (most recent call last):
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-setup/storage/blockd.py", line 716, in _customization
    lunGUID
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-setup/storage/blockd.py", line 517, in _validate_domain
    _('The requested device is not listed by VDSM')
RuntimeError: The requested device is not listed by VDSM

in comment 7 on this bug.
That one was on iSCSI.

Comment 16 Ribu Tho 2018-03-12 21:38:45 UTC
(In reply to Simone Tiraboschi from comment #10)
> At the time of the issue (around 2018-02-16 15:54 ) the system seamed to be
> really slow; according to the log it took 4'30" to locally extract a 1 GB
> tar.gz file (with a less than 3 GB inside ) and about 10 seconds just to run
> '/bin/qemu-img info --output json' on it while normally it's by far faster:
> 
>  2018-02-16 15:52:09 INFO otopi.plugins.gr_he_common.vm.boot_disk
> boot_disk.prepare:216 Extracting disk image from OVF archive (could take a
> few minutes depending on archive size)
>  2018-02-16 15:56:32 INFO otopi.plugins.gr_he_common.vm.boot_disk
> boot_disk._validate_volume:91 Validating pre-allocated volume size
>  2018-02-16 15:56:32 DEBUG otopi.plugins.gr_he_common.vm.boot_disk
> plugin.executeRaw:813 execute: ('/bin/sudo', '-u', 'vdsm', '-g', 'kvm',
> '/bin/qemu-img', 'info', '--output', 'json', '/var/tmp/tmpl1ssJY'),
> executable='None', cwd='None', env=None
>  2018-02-16 15:56:42 DEBUG otopi.plugins.gr_he_common.vm.boot_disk
> plugin.executeRaw:863 execute-result: ('/bin/sudo', '-u', 'vdsm', '-g',
> 'kvm', '/bin/qemu-img', 'info', '--output', 'json', '/var/tmp/tmpl1ssJY'),
> rc=0
> 
> I suspect that the sanlock timeout could be due to that.
> 
> Also using up to 100% of the available memory on the host for the engine VM
> appears really risky:
>  2018-02-16 15:49:28 DEBUG otopi.plugins.gr_he_setup.vm.memory
> dialog.queryEnvKey:90 queryEnvKey called for key OVEHOSTED_VM/vmMemSizeMB
>  2018-02-16 15:49:28 DEBUG otopi.plugins.otopi.dialog.machine
> dialog.__logString:204 DIALOG:SEND       **%QStart: ovehosted_vmenv_mem
>  2018-02-16 15:49:28 DEBUG otopi.plugins.otopi.dialog.machine
> dialog.__logString:204 DIALOG:SEND       ### Please specify the memory size
> of the VM in MB (Defaults to maximum available): [4993]: 
>  2018-02-16 15:49:28 DEBUG otopi.plugins.otopi.dialog.machine
> dialog.__logString:204 DIALOG:SEND       **%QDefault: 4993
>  2018-02-16 15:49:28 DEBUG otopi.plugins.otopi.dialog.machine
> dialog.__logString:204 DIALOG:SEND       **%QHidden: FALSE
>  2018-02-16 15:49:28 DEBUG otopi.plugins.otopi.dialog.machine
> dialog.__logString:204 DIALOG:SEND       ***Q:STRING ovehosted_vmenv_mem
>  2018-02-16 15:49:28 DEBUG otopi.plugins.otopi.dialog.machine
> dialog.__logString:204 DIALOG:SEND       **%QEnd: ovehosted_vmenv_mem
>  2018-02-16 15:49:29 DEBUG otopi.plugins.otopi.dialog.machine
> dialog.__logString:204 DIALOG:RECEIVE    4993
>  2018-02-16 15:49:29 DEBUG otopi.context context.dumpEnvironment:760
> ENVIRONMENT DUMP - BEGIN
>  2018-02-16 15:49:29 DEBUG otopi.context context.dumpEnvironment:770 ENV
> OVEHOSTED_VM/vmMemSizeMB=int:'4993'
> 
> Ribu, could you please try reproducing on a system with more resources?


Simone ,

As per my comment #15, I have reproduced the issue with cockpit again. Please check that comment for more info.

Ribu

Comment 17 Ying Cui 2018-03-14 09:07:10 UTC
According to comment 15, we are moving this to the cockpit-ovirt component for further investigation from the UI side.

Comment 18 Sandro Bonazzola 2018-03-14 09:11:48 UTC
Moving to Phillip since Ryan is on PTO

Comment 19 Yihui Zhao 2018-03-14 10:40:08 UTC
I cannot reproduce this issue with the given versions in the description

Test versions:
rhvh-4.2.0.6-0.20171218.0+1
ovirt-hosted-engine-ha-2.2.2-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.2-1.el7ev.noarch
cockpit-ws-157-1.el7.x86_64
cockpit-storaged-157-1.el7.noarch
cockpit-bridge-157-1.el7.x86_64
cockpit-system-157-1.el7.noarch
cockpit-dashboard-157-1.el7.x86_64
cockpit-ovirt-dashboard-0.11.3-0.1.el7ev.noarch
cockpit-157-1.el7.x86_64
rhvm-appliance-4.1.20180125.0-1.el7.noarch
vdsm-4.20.9.3-1.el7ev.x86_64

Test steps:
1. Install RHVH
2. Deploy HE via cockpit


Test results:
 
[root@hp-dl388g9-05 tmp]# cat he-setup-answerfile.conf 
[environment:default]
OVEHOSTED_CORE/rollbackProceed=none:None
OVEHOSTED_CORE/screenProceed=none:None
OVEHOSTED_CORE/deployProceed=bool:true
OVEHOSTED_CORE/upgradeProceed=none:None
OVEHOSTED_CORE/confirmSettings=bool:true
OVEHOSTED_STORAGE/domainType=str:nfs3
OVEHOSTED_STORAGE/imgSizeGB=str:58
OVEHOSTED_STORAGE/storageDomain=str:hosted_storage
OVEHOSTED_STORAGE/storageDomainConnection=str:10.66.148.11:/home/yzhao1/nfs8
OVEHOSTED_STORAGE/mntOptions=str:
OVEHOSTED_NETWORK/bridgeIf=str:eno1
OVEHOSTED_NETWORK/bridgeName=str:ovirtmgmt
OVEHOSTED_NETWORK/firewallManager=str:iptables
OVEHOSTED_NETWORK/gateway=str:10.73.75.254
OVEHOSTED_NETWORK/fqdn=str:rhevh-hostedengine-vm-05.lab.eng.pek2.redhat.com
OVEHOSTED_VM/bootDevice=str:disk
OVEHOSTED_VM/vmVCpus=str:2
OVEHOSTED_VM/vmMACAddr=str:52:54:00:5d:21:64
OVEHOSTED_VM/vmMemSizeMB=int:4096
OVEHOSTED_VM/cloudinitVMStaticCIDR=str:
OVEHOSTED_VM/cloudinitVMDNS=str:
OVEHOSTED_VM/cloudinitVMTZ=str:Asia/Shanghai 
OVEHOSTED_VM/cloudInitISO=str:generate
OVEHOSTED_VM/cloudinitInstanceHostName=str:rhevh-hostedengine-vm-05
OVEHOSTED_VM/cloudinitInstanceDomainName=str:lab.eng.pek2.redhat.com
OVEHOSTED_VM/cloudinitExecuteEngineSetup=bool:true
OVEHOSTED_VM/automateVMShutdown=bool:true
OVEHOSTED_VM/cloudinitRootPwd=str:redhat
OVEHOSTED_VM/rootSshPubkey=str:
OVEHOSTED_VM/rootSshAccess=str:yes
OVEHOSTED_VM/cloudinitVMETCHOSTS=bool:true
OVEHOSTED_ENGINE/adminPassword=str:redhat
OVEHOSTED_VDSM/consoleType=str:vnc
OVEHOSTED_VDSM/cpu=str:model_Broadwell
OVEHOSTED_NOTIF/smtpServer=str:localhost
OVEHOSTED_NOTIF/smtpPort=str:25
OVEHOSTED_NOTIF/sourceEmail=str:root@localhost
OVEHOSTED_NOTIF/destEmail=str:root@localhost

[root@hp-dl388g9-05 tmp]# hosted-engine --vm-status


--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : hp-dl388g9-05.lab.eng.pek2.redhat.com
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "Up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : ff9346da
local_conf_timestamp               : 3043
Host timestamp                     : 3042
Extra metadata (valid at timestamp):
	metadata_parse_version=1
	metadata_feature_version=1
	timestamp=3042 (Wed Mar 14 18:32:47 2018)
	host-id=1
	score=3400
	vm_conf_refresh_time=3043 (Wed Mar 14 18:32:48 2018)
	conf_on_shared_storage=True
	maintenance=False
	state=EngineUp
	stopped=False


Also, I have a question: this attachment, https://bugzilla.redhat.com/attachment.cgi?id=1396820, appears to show a RHVH 4.1 system. Why does the description say the tested version was RHVH-4.2-20171218.0-RHVH-x86_64-dvd1.iso?

Then, if RHVH-4.2-20171218.0-RHVH-x86_64-dvd1.iso was installed, the vdsm version should be vdsm-4.20.9.3-1.el7ev.x86_64, not vdsm-4.19.45-1.el7ev.x86_64. I am confused about that.
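
To help clear that up, could you also paste the output of the following from the host (imgbase/nodectl are the RHVH-specific bits, if they are installed):

 rpm -q vdsm cockpit-ovirt-dashboard ovirt-hosted-engine-setup rhvm-appliance
 cat /etc/redhat-release
 imgbase w      # currently booted RHVH layer
 nodectl info   # RHVH node status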

Comment 20 Sandro Bonazzola 2018-03-15 10:41:19 UTC
Not blocking 4.2.2 on this since it is not reproducible. Moving to 4.2.4 while waiting for the needed info.

Comment 23 Yihui Zhao 2018-03-21 06:41:32 UTC
Hi Ribu,

Please see the attached picture: https://bugzilla.redhat.com/attachment.cgi?id=1396820

From the cockpit screenshot, "The image not uploaded to the data domain" is shown; maybe that is the key point.


1. Confirm that rhvm-appliance is installed successfully.

2. Prepare a clean NFS storage domain and confirm that it connects (a rough check is sketched just below).
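
A rough check for both, using the server and path from my own answer file below as an example:

 # confirm the appliance rpm is installed
 rpm -q rhvm-appliance
 # confirm the NFS export is visible, mountable and empty before deployment
 showmount -e 10.66.148.11
 mkdir -p /mnt/he-check
 mount -t nfs -o vers=3 10.66.148.11:/home/yzhao1/nfs8 /mnt/he-check
 ls -la /mnt/he-check
 umount /mnt/he-check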

Also, you can attach your answer file to the bug as an attachment.
Here is mine.

[root@hp-dl388g9-05 tmp]# cat he-setup-answerfile.conf 
[environment:default]
OVEHOSTED_CORE/rollbackProceed=none:None
OVEHOSTED_CORE/screenProceed=none:None
OVEHOSTED_CORE/deployProceed=bool:true
OVEHOSTED_CORE/upgradeProceed=none:None
OVEHOSTED_CORE/confirmSettings=bool:true
OVEHOSTED_STORAGE/domainType=str:nfs3
OVEHOSTED_STORAGE/imgSizeGB=str:58
OVEHOSTED_STORAGE/storageDomain=str:hosted_storage
OVEHOSTED_STORAGE/storageDomainConnection=str:10.66.148.11:/home/yzhao1/nfs8
OVEHOSTED_STORAGE/mntOptions=str:
OVEHOSTED_NETWORK/bridgeIf=str:eno1
OVEHOSTED_NETWORK/bridgeName=str:ovirtmgmt
OVEHOSTED_NETWORK/firewallManager=str:iptables
OVEHOSTED_NETWORK/gateway=str:10.73.75.254
OVEHOSTED_NETWORK/fqdn=str:rhevh-hostedengine-vm-05.lab.eng.pek2.redhat.com
OVEHOSTED_VM/bootDevice=str:disk
OVEHOSTED_VM/vmVCpus=str:2
OVEHOSTED_VM/vmMACAddr=str:52:54:00:5d:21:64
OVEHOSTED_VM/vmMemSizeMB=int:4096
OVEHOSTED_VM/cloudinitVMStaticCIDR=str:
OVEHOSTED_VM/cloudinitVMDNS=str:
OVEHOSTED_VM/cloudinitVMTZ=str:Asia/Shanghai 
OVEHOSTED_VM/cloudInitISO=str:generate
OVEHOSTED_VM/cloudinitInstanceHostName=str:rhevh-hostedengine-vm-05
OVEHOSTED_VM/cloudinitInstanceDomainName=str:lab.eng.pek2.redhat.com
OVEHOSTED_VM/cloudinitExecuteEngineSetup=bool:true
OVEHOSTED_VM/automateVMShutdown=bool:true
OVEHOSTED_VM/cloudinitRootPwd=str:redhat
OVEHOSTED_VM/rootSshPubkey=str:
OVEHOSTED_VM/rootSshAccess=str:yes
OVEHOSTED_VM/cloudinitVMETCHOSTS=bool:true
OVEHOSTED_ENGINE/adminPassword=str:redhat
OVEHOSTED_VDSM/consoleType=str:vnc
OVEHOSTED_VDSM/cpu=str:model_Broadwell
OVEHOSTED_NOTIF/smtpServer=str:localhost
OVEHOSTED_NOTIF/smtpPort=str:25
OVEHOSTED_NOTIF/sourceEmail=str:root@localhost
OVEHOSTED_NOTIF/destEmail=str:root@localhost

Comment 24 Simone Tiraboschi 2018-04-06 08:37:11 UTC
From the logs I see that this happens when starting the vintage flow from cockpit, but we have now completely removed that flow, supporting only the new ansible-based flow from cockpit, where we are not directly dealing with sanlock.

According to comment 15 this is not reproducible from the command line, so let's close it as WONTFIX, since we removed the vintage flow support from cockpit and the CLI works.

Comment 26 Franta Kust 2019-05-16 13:08:14 UTC
BZ<2>Jira Resync

