Bug 1464664 - Self Hosted Engine deploy Stuck
Status: CLOSED WORKSFORME
Product: ovirt-node
Classification: oVirt
Component: Installation & Update
Version: 4.1
Hardware: x86_64 Unspecified
Priority: unspecified
Severity: medium
Assigned To: Ryan Barry
QA Contact: Yihui Zhao
Depends On:
Blocks:
Reported: 2017-06-24 07:08 EDT by KooV
Modified: 2017-08-01 05:25 EDT

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-01 05:25:42 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Node
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
cshao: testing_ack+


Attachments
hosted engine stuck screen (8.80 KB, image/png)
2017-06-24 07:08 EDT, KooV
VMWare guest vmx file (2.59 KB, text/plain)
2017-06-26 22:56 EDT, KooV
vmware setting capture (23.39 KB, image/png)
2017-06-26 22:57 EDT, KooV
/var/log/* (341.23 KB, application/x-bzip)
2017-07-19 01:30 EDT, Yihui Zhao
cockpit_error (57.78 KB, image/png)
2017-07-19 01:33 EDT, Yihui Zhao

Description KooV 2017-06-24 07:08:56 EDT
Created attachment 1291465
hosted engine stuck screen

Description of problem:

Self Hosted Engine deploy stuck

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Set up a nested hypervisor (VMware -> VM running oVirt Node as the hypervisor -> Self-Hosted Engine)
2. Install oVirt Node (CPU virtualization supported, VT-d enabled)
3. Deploy the self-hosted engine
4. The deployment gets stuck when the engine appliance VM boots (see screenshot)

Actual results:


Expected results:


Additional info:
Comment 1 Ryan Barry 2017-06-24 11:49:08 EDT
Interesting, because this definitely works with nested KVM.

VT-d shouldn't be relevant here unless you're passing through devices or otherwise using IOMMU, though.

Can you post your .vmx?

Can you also post the output of:

cat /proc/cpuinfo
modinfo kvm_intel (or kvm_amd, if you have an AMD CPU)

Simone, any ideas here? I've seen a couple of reports about this here and there, but I don't have a vmware environment to test on right now.
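
For anyone following along, the check behind the two commands above can be sketched in Python. This is only an illustration, not part of any oVirt tooling: the function name `virtualization_support` is made up here, and it assumes the usual /proc/cpuinfo layout where a "flags" line lists the CPU features. It looks for vmx (Intel) or svm (AMD) and reports which KVM module would go with it.

```python
def virtualization_support(cpuinfo_text):
    """Return (flag, module) for the first CPU listed in /proc/cpuinfo text.

    Hypothetical helper: 'vmx' maps to kvm_intel, 'svm' maps to kvm_amd.
    Returns (None, None) when neither flag is present.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags = set(value.split())
            if "vmx" in flags:
                return "vmx", "kvm_intel"
            if "svm" in flags:
                return "svm", "kvm_amd"
            return None, None
    return None, None


# Small inline sample instead of reading the real file, so this runs anywhere.
sample = "vendor_id\t: GenuineIntel\nflags\t\t: fpu vme vmx ept\n"
print(virtualization_support(sample))  # ('vmx', 'kvm_intel')
```

On a real host you would pass in the contents of /proc/cpuinfo. If neither flag shows up inside the VMware guest, hardware virtualization is probably not being exposed to it.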
Comment 2 KooV 2017-06-26 22:56 EDT
Created attachment 1292159
VMWare guest vmx file
Comment 3 KooV 2017-06-26 22:57 EDT
Created attachment 1292160
vmware setting capture
Comment 4 KooV 2017-06-26 23:51:04 EDT
/proc/cpuinfo

processor	: 7
vendor_id	: GenuineIntel
cpu family	: 6
model		: 58
model name	: Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz
stepping	: 9
microcode	: 0x1c
cpu MHz		: 2400.936
cache size	: 6144 KB
physical id	: 0
siblings	: 8
core id		: 7
cpu cores	: 8
apicid		: 7
initial apicid	: 7
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq vmx ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm ida arat epb pln pts dtherm tpr_shadow vnmi ept vpid fsgsbase tsc_adjust smep
bogomips	: 4802.00
clflush size	: 64
cache_alignment	: 64
address sizes	: 42 bits physical, 48 bits virtual
power management:

[root@node1 ~]# modinfo kvm_intel
filename:       /lib/modules/3.10.0-514.16.1.el7.x86_64/kernel/arch/x86/kvm/kvm-intel.ko
license:        GPL
author:         Qumranet
rhelversion:    7.3
srcversion:     BA361F72DD6AA866A792893
alias:          x86cpu:vendor:*:family:*:model:*:feature:*0085*
depends:        kvm
intree:         Y
vermagic:       3.10.0-514.16.1.el7.x86_64 SMP mod_unload modversions 
signer:         CentOS Linux kernel signing key
sig_key:        3F:E1:EB:8B:4F:91:D4:84:CD:55:44:84:54:A0:24:DE:56:34:E1:06
sig_hashalgo:   sha256
parm:           vpid:bool
parm:           flexpriority:bool
parm:           ept:bool
parm:           unrestricted_guest:bool
parm:           eptad:bool
parm:           emulate_invalid_guest_state:bool
parm:           vmm_exclusive:bool
parm:           fasteoi:bool
parm:           enable_apicv:bool
parm:           enable_shadow_vmcs:bool
parm:           nested:bool
parm:           pml:bool
parm:           ple_gap:int
parm:           ple_window:int
parm:           ple_window_grow:int
parm:           ple_window_shrink:int
parm:           ple_window_max:int
Comment 5 KooV 2017-06-26 23:54:17 EDT
[root@node1 modprobe.d]# cat kvm.conf 
# Setting modprobe kvm_intel/kvm_amd nested = 1
# only enables Nested Virtualization until the next reboot or
# module reload. Uncomment the option applicable
# to your system below to enable the feature permanently.
#
# User changes in this file are preserved across upgrades.
#
# For Intel
#options kvm_intel nested=1
#
# For AMD
#options kvm_amd nested=1


I uncommented the line "options kvm_intel nested=1", but the result is the same.
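
A quick way to confirm which options in a modprobe.d file are actually active (not commented out) is sketched below. It is a rough illustration only: `active_module_options` is a hypothetical helper, and it handles just the simple "options <module> key=value" form shown in kvm.conf above. As the comments in the file suggest, an edit only takes effect once the module is reloaded or the host is rebooted. The live value can be read from /sys/module/kvm_intel/parameters/nested while the module is loaded.

```python
def active_module_options(conf_text, module):
    """Collect 'options <module> key=value' settings from modprobe.d-style text.

    Hypothetical helper: lines starting with '#' are treated as comments.
    """
    options = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "options" and parts[1] == module:
            for item in parts[2:]:
                key, sep, value = item.partition("=")
                options[key] = value if sep else None
    return options


shipped = "# For Intel\n#options kvm_intel nested=1\n"
edited = "# For Intel\noptions kvm_intel nested=1\n"
print(active_module_options(shipped, "kvm_intel"))  # {}
print(active_module_options(edited, "kvm_intel"))   # {'nested': '1'}
```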
Comment 6 Ryan Barry 2017-07-03 10:50:24 EDT
The "nested" option for KVM applies to a host that is itself running KVM, when you want its guests to be able to run KVM as well.

I don't see anything obvious in the CPU flags which would prevent this, or in the vmx, but I haven't used vSphere since 5.0. 

I'd suggest searching for what changes are needed in a .vmx to run nested KVM on the version you're running. In 5.0, some changes to the virtual hardware were needed, as well as setting the vSwitch to promiscuous mode, but I can't recall the details.
Comment 7 Yihui Zhao 2017-07-19 01:30 EDT
Created attachment 1300804
/var/log/*
Comment 8 Yihui Zhao 2017-07-19 01:31:39 EDT
I tested this, but ran into some errors.

Test version:
rhvh-4.1-0.20170706.0+1
ovirt-host-deploy-1.6.6-1.el7ev.noarch
ovirt-imageio-common-1.0.0-0.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-hosted-engine-ha-2.1.4-1.el7ev.noarch
ovirt-setup-lib-1.1.3-1.el7ev.noarch
python-ovirt-engine-sdk4-4.1.5-1.el7ev.x86_64
ovirt-node-ng-nodectl-4.1.3-0.20170608.1.el7.noarch
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
ovirt-hosted-engine-setup-2.1.3.3-1.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-imageio-daemon-1.0.0-0.el7ev.noarch
cockpit-ovirt-dashboard-0.10.7-0.0.20.el7ev.noarch
rhvm-appliance-4.1.20170709.3-1.el7.noarch

Steps to Reproduce:
1. Set up a nested hypervisor (VMware Workstation 12 -> VM running oVirt Node as the hypervisor -> Self-Hosted Engine)
2. Install oVirt Node (CPU virtualization supported, VT-d enabled)
3. Deploy the self-hosted engine


Actual results:
Cockpit shows:

 Failed to execute stage 'Closing up': The VM is not powering up: please check VDSM logs
 Hosted Engine deployment failed: this system is not reliable, please check the issue,fix and redeploy

vdsm.log:

2017-07-19 13:28:48,904+0800 ERROR (periodic/2) [ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink] Failed to connect to broker, the number of errors has exceeded the limit (1) (brokerlink:75)
2017-07-19 13:28:48,905+0800 ERROR (periodic/2) [root] failed to retrieve Hosted Engine HA info (api:252)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231, in _getHaInfo
    stats = instance.get_all_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 102, in get_all_stats
    with broker.connection(self._retries, self._wait):
  File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 99, in connection
    self.connect(retries, wait)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 78, in connect
    raise BrokerConnectionError(error_msg)
BrokerConnectionError: Failed to connect to broker, the number of errors has exceeded the limit (1)
2017-07-19 13:28:53,214+0800 INFO  (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:539)
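
To pull just the ERROR entries out of a long vdsm.log, something like the following sketch may help. It is not a VDSM utility. The regular expression is an assumption based only on the line layout visible in the excerpt above (timestamp, level, thread in parentheses, logger in brackets, message), and `error_records` is a hypothetical name. Traceback lines do not match the pattern and are skipped.

```python
import re

# Assumed layout, taken from the excerpt:
# <timestamp> <LEVEL> (<thread>) [<logger>] <message>
LOG_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{4}) "
    r"(?P<level>[A-Z]+)\s+\((?P<thread>[^)]*)\) \[(?P<logger>[^\]]*)\] "
    r"(?P<message>.*)$"
)


def error_records(log_text):
    """Return (time, logger, message) tuples for every ERROR line."""
    records = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if match and match.group("level") == "ERROR":
            records.append(
                (match.group("time"), match.group("logger"), match.group("message"))
            )
    return records


sample = (
    "2017-07-19 13:28:48,905+0800 ERROR (periodic/2) [root] "
    "failed to retrieve Hosted Engine HA info (api:252)\n"
    "Traceback (most recent call last):\n"
    "2017-07-19 13:28:53,214+0800 INFO  (jsonrpc/7) [jsonrpc.JsonRpcServer] "
    "RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:539)\n"
)
for time, logger, message in error_records(sample):
    print(time, logger, message)
```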




Additional info:
1. [root@dhcp-10-241 vdsm]# cat /proc/cpuinfo 
processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 16
model		: 2
model name	: AMD Phenom(tm) 9600B Quad-Core Processor
stepping	: 3
microcode	: 0x1000083
cpu MHz		: 2304.134
cache size	: 512 KB
physical id	: 0
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 5
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc art rep_good nopl tsc_reliable nonstop_tsc pni cx16 popcnt hypervisor lahf_lm svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw npt svm_lock
bogomips	: 4609.61
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 42 bits physical, 48 bits virtual
power management:


2. [root@dhcp-10-241 vdsm]# modinfo kvm_intel
filename:       /lib/modules/3.10.0-514.26.1.el7.x86_64/kernel/arch/x86/kvm/kvm-intel.ko
license:        GPL
author:         Qumranet
rhelversion:    7.3
srcversion:     E993023D5DB19ECF002844B
alias:          x86cpu:vendor:*:family:*:model:*:feature:*0085*
depends:        kvm
intree:         Y
vermagic:       3.10.0-514.26.1.el7.x86_64 SMP mod_unload modversions 
signer:         Red Hat Enterprise Linux kernel signing key
sig_key:        F3:16:72:35:9E:67:63:62:03:AF:87:68:85:AC:CE:08:BB:07:0C:DA
sig_hashalgo:   sha256
parm:           vpid:bool
parm:           flexpriority:bool
parm:           ept:bool
parm:           unrestricted_guest:bool
parm:           eptad:bool
parm:           emulate_invalid_guest_state:bool
parm:           vmm_exclusive:bool
parm:           fasteoi:bool
parm:           enable_apicv:bool
parm:           enable_shadow_vmcs:bool
parm:           nested:bool
parm:           pml:bool
parm:           ple_gap:int
parm:           ple_window:int
parm:           ple_window_grow:int
parm:           ple_window_shrink:int
parm:           ple_window_max:int


The /var/log/* files are in attachment 1300804.
Comment 9 Yihui Zhao 2017-07-19 01:33 EDT
Created attachment 1300805
cockpit_error
Comment 10 Yihui Zhao 2017-07-19 01:39:51 EDT
rhvh.vmx file:

.encoding = "GBK"
config.version = "8"
virtualHW.version = "12"
vcpu.hotadd = "TRUE"
scsi0.present = "TRUE"
scsi0.virtualDev = "lsilogic"
sata0.present = "TRUE"
memsize = "2048"
mem.hotadd = "TRUE"
scsi0:0.present = "TRUE"
scsi0:0.fileName = "Red Hat Enterprise Linux 7 64-bit.vmdk"
sata0:1.present = "TRUE"
sata0:1.fileName = "C:\Users\boyang\Downloads\RHVH-4.1-20170706.1-RHVH-x86_64-dvd1.iso"
sata0:1.deviceType = "cdrom-image"
ethernet0.present = "TRUE"
ethernet0.connectionType = "bridged"
ethernet0.virtualDev = "e1000"
ethernet0.wakeOnPcktRcv = "FALSE"
ethernet0.addressType = "generated"
usb.present = "TRUE"
ehci.present = "TRUE"
ehci.pciSlotNumber = "35"
sound.present = "TRUE"
sound.startConnected = "FALSE"
sound.fileName = "-1"
sound.autodetect = "TRUE"
mks.enable3d = "TRUE"
svga.graphicsMemoryKB = "786432"
serial0.present = "TRUE"
serial0.fileType = "thinprint"
pciBridge0.present = "TRUE"
pciBridge4.present = "TRUE"
pciBridge4.virtualDev = "pcieRootPort"
pciBridge4.functions = "8"
pciBridge5.present = "TRUE"
pciBridge5.virtualDev = "pcieRootPort"
pciBridge5.functions = "8"
pciBridge6.present = "TRUE"
pciBridge6.virtualDev = "pcieRootPort"
pciBridge6.functions = "8"
pciBridge7.present = "TRUE"
pciBridge7.virtualDev = "pcieRootPort"
pciBridge7.functions = "8"
vmci0.present = "TRUE"
hpet0.present = "TRUE"
usb.vbluetooth.startConnected = "TRUE"
displayName = "Red Hat Enterprise Linux 7 64-bit"
guestOS = "rhel7-64"
nvram = "Red Hat Enterprise Linux 7 64-bit.nvram"
virtualHW.productCompatibility = "hosted"
powerType.powerOff = "soft"
powerType.powerOn = "soft"
powerType.suspend = "soft"
powerType.reset = "soft"
extendedConfigFile = "Red Hat Enterprise Linux 7 64-bit.vmxf"
uuid.bios = "56 4d 25 6d a4 90 2d 8b-6e 35 3c ea dc b9 2f f7"
uuid.location = "56 4d 25 6d a4 90 2d 8b-6e 35 3c ea dc b9 2f f7"
migrate.hostlog = ".\Red Hat Enterprise Linux 7 64-bit-85de2077.hlog"
scsi0:0.redo = ""
pciBridge0.pciSlotNumber = "17"
pciBridge4.pciSlotNumber = "21"
pciBridge5.pciSlotNumber = "22"
pciBridge6.pciSlotNumber = "23"
pciBridge7.pciSlotNumber = "24"
scsi0.pciSlotNumber = "16"
usb.pciSlotNumber = "32"
ethernet0.pciSlotNumber = "33"
sound.pciSlotNumber = "34"
vmci0.pciSlotNumber = "36"
sata0.pciSlotNumber = "37"
ethernet0.generatedAddress = "00:0C:29:B9:2F:F7"
ethernet0.generatedAddressOffset = "0"
vmci0.id = "-591843337"
monitor.phys_bits_used = "42"
vmotion.checkpointFBSize = "8388608"
vmotion.checkpointSVGAPrimarySize = "33554432"
cleanShutdown = "TRUE"
softPowerOff = "FALSE"
usb:1.speed = "2"
usb:1.present = "TRUE"
usb:1.deviceType = "hub"
usb:1.port = "1"
usb:1.parent = "-1"
svga.guestBackedPrimaryAware = "TRUE"
ethernet0.linkStatePropagation.enable = "true"
tools.syncTime = "FALSE"
tools.remindInstall = "TRUE"
vhv.enable = "TRUE"
vpmc.enable = "TRUE"
floppy0.present = "FALSE"
usb:0.present = "TRUE"
usb:0.deviceType = "hid"
usb:0.port = "0"
usb:0.parent = "-1"
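
When comparing .vmx files like the one above, it can be handy to load the settings into a dictionary and look at individual keys. The sketch below is only an illustration. `parse_vmx` and `nested_virt_enabled` are hypothetical names, the parser assumes the simple key = "value" layout seen in this file, and it assumes lines starting with '#' are comments. The `vhv.enable` key is the one from the file above that concerns exposing hardware virtualization to the guest.

```python
def parse_vmx(text):
    """Parse key = "value" lines from .vmx-style text into a dict."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        settings[key.strip()] = value.strip().strip('"')
    return settings


def nested_virt_enabled(settings):
    """True when vhv.enable is set to TRUE (case-insensitive)."""
    return settings.get("vhv.enable", "FALSE").upper() == "TRUE"


sample = 'memsize = "2048"\nvhv.enable = "TRUE"\n'
settings = parse_vmx(sample)
print(settings["memsize"], nested_virt_enabled(settings))  # 2048 True
```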
