Description of problem:
HE deployment fails on RHVH-4.4. It appears that vdsmd.service cannot be started.
-------------------------------------------
Version-Release number of selected component (if applicable):
vdsm-4.40.5-1.el8ev.x86_64
vdsm-client-4.40.5-1.el8ev.noarch
vdsm-api-4.40.5-1.el8ev.noarch
vdsm-hook-vhostmd-4.40.5-1.el8ev.noarch
vdsm-http-4.40.5-1.el8ev.noarch
vdsm-hook-ethtool-options-4.40.5-1.el8ev.noarch
vdsm-hook-openstacknet-4.40.5-1.el8ev.noarch
vdsm-common-4.40.5-1.el8ev.noarch
vdsm-network-4.40.5-1.el8ev.x86_64
vdsm-jsonrpc-4.40.5-1.el8ev.noarch
vdsm-hook-vmfex-dev-4.40.5-1.el8ev.noarch
vdsm-hook-fcoe-4.40.5-1.el8ev.noarch
vdsm-gluster-4.40.5-1.el8ev.x86_64
vdsm-yajsonrpc-4.40.5-1.el8ev.noarch
vdsm-python-4.40.5-1.el8ev.noarch
-------------------------------------------
How reproducible:
Always
-------------------------------------------
Steps to Reproduce:
1. In Cockpit, click Hyperconverged and deploy Gluster. This completes successfully.
2. Deploy HE. This step fails.
-------------------------------------------
Additional info:
[ ERROR ] fatal: [localhost]: FAILED!
=> {"ansible_facts": {"ovirt_hosts": [{"address": "rhsqa-grafton1.lab.eng.blr.redhat.com", "affinity_labels": [], "auto_numa_status": "unknown", "certificate": {"organization": "lab.eng.blr.redhat.com", "subject": "O=lab.eng.blr.redhat.com,CN=rhsqa-grafton1.lab.eng.blr.redhat.com"}, "cluster": {"href": "/ovirt-engine/api/clusters/56a6240c-5e01-11ea-a454-004554194801", "id": "56a6240c-5e01-11ea-a454-004554194801"}, "comment": "", "cpu": {"speed": 0.0, "topology": {}}, "device_passthrough": {"enabled": false}, "devices": [], "external_network_provider_configurations": [], "external_status": "ok", "hardware_information": {"supported_rng_sources": []}, "hooks": [], "href": "/ovirt-engine/api/hosts/362dd036-aa2e-403b-9aee-f47ff7fa7496", "id": "362dd036-aa2e-403b-9aee-f47ff7fa7496", "katello_errata": [], "kdump_status": "unknown", "ksm": {"enabled": false}, "max_scheduling_memory": 0, "memory": 0, "name": "rhsqa-grafton1.lab.eng.blr.redhat.com", "network_attachments": [], "nics": [], "numa_nodes": [], "numa_supported": false, "os": {"custom_kernel_cmdline": ""}, "permissions": [], "port": 54321, "power_management": {"automatic_pm_enabled": true, "enabled": false, "kdump_detection": true, "pm_proxies": []}, "protocol": "stomp", "se_linux": {}, "spm": {"priority": 5, "status": "none"}, "ssh": {"fingerprint": "SHA256:afIfjlqbi4e9fzOARDkN0wfg2IVI3qI/Dejc3kTUHPo", "port": 22}, "statistics": [], "status": "install_failed", "storage_connection_extensions": [], "summary": {"total": 0}, "tags": [], "transparent_huge_pages": {"enabled": false}, "type": "rhel", "unmanaged_networks": [], "update_available": false, "vgpu_placement": "consolidated"}]}, "attempts": 120, "changed": false, "deprecations": [{"msg": "The 'ovirt_host_facts' module has been renamed to 'ovirt_host_info', and the renamed one no longer returns ansible_facts", "version": "2.13"}]}
[ INFO ] TASK [ovirt.hosted_engine_setup : Fetch logs from the engine VM]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Set destination directory path]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Create destination directory]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Find the local appliance image]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Set local_vm_disk_path]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Give the vm time to flush dirty buffers]
[ INFO ] ok: [localhost -> localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Copy engine logs]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Remove local vm dir]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Remove temporary entry in /etc/hosts for the local VM]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Notify the user about a failure]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}
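For anyone triaging similar output: the only field in the failure payload above that carries the actual verdict is the host "status" (here "install_failed"). The following is a minimal sketch, not part of the deployment tooling, that pulls that field out of such a payload. The function name `failed_hosts` and the trimmed `sample` string are illustrative assumptions; the sample keeps only the keys needed for the example, copied from the output above.

```python
import json


def failed_hosts(payload):
    """Return (name, status) for every host whose status is not 'up'.

    `payload` is the JSON text that follows "FAILED! =>" in the
    deployment output. Hypothetical helper, for triage only.
    """
    data = json.loads(payload)
    hosts = data.get("ansible_facts", {}).get("ovirt_hosts", [])
    return [(h.get("name"), h.get("status")) for h in hosts
            if h.get("status") != "up"]


# Trimmed copy of the payload from this report (most keys omitted).
sample = (
    '{"ansible_facts": {"ovirt_hosts": [{'
    '"name": "rhsqa-grafton1.lab.eng.blr.redhat.com", '
    '"status": "install_failed"}]}, '
    '"attempts": 120, "changed": false}'
)

if __name__ == "__main__":
    for name, status in failed_hosts(sample):
        print(name, status)
```

A status of "install_failed" only says that host deployment failed. The cause has to be found on the host itself, for example in the unit status and journal.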
We were able to debug this up to a point. When HE deployment is initiated from the web console, the 'HostedEngineLocal' VM is created and comes up, and engine-setup completes. When the HE host is then added to the cluster, it never becomes operational and moves to non-operational.

When this happens, the engine is trying to restart vdsmd, and vdsmd can no longer start because a dependency failed.

<snip>
[root@rhsqa-grafton1 ~]# systemctl status vdsmd
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; disabled; vendor preset: enabled)
   Active: inactive (dead)

Mar 05 09:43:14 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: Dependency failed for Virtual Desktop Server Manager.
Mar 05 09:43:14 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: vdsmd.service: Job vdsmd.service/start failed with result 'dependency'.

[root@rhsqa-grafton1 ~]# systemctl status supervdsmd
● supervdsmd.service - Auxiliary vdsm service for running helper functions as root
   Loaded: loaded (/usr/lib/systemd/system/supervdsmd.service; static; vendor preset: enabled)
   Active: inactive (dead) since Thu 2020-03-05 11:25:18 UTC; 18h ago
 Main PID: 1577 (code=exited, status=0/SUCCESS)

Mar 05 09:43:00 localhost.localdomain systemd[1]: Started Auxiliary vdsm service for running helper functions as root.
Mar 05 09:43:03 localhost.localdomain supervdsmd[1577]: failed to load module lvm: libbd_lvm.so.2: cannot open shared object file: No such file or directory
Mar 05 09:43:03 localhost.localdomain supervdsmd[1577]: failed to load module mpath: libbd_mpath.so.2: cannot open shared object file: No such file or directory
Mar 05 09:43:03 localhost.localdomain supervdsmd[1577]: failed to load module dm: libbd_dm.so.2: cannot open shared object file: No such file or directory
Mar 05 09:43:03 localhost.localdomain supervdsmd[1577]: failed to load module nvdimm: libbd_nvdimm.so.2: cannot open shared object file: No such file or directory
Mar 05 11:25:18 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: Stopping Auxiliary vdsm service for running helper functions as root...
Mar 05 11:25:18 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: Stopped Auxiliary vdsm service for running helper functions as root.
Mar 05 11:25:24 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: Dependency failed for Auxiliary vdsm service for running helper functions as root.
Mar 05 11:25:24 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: supervdsmd.service: Job supervdsmd.service/start failed with result 'dependency'.

[root@rhsqa-grafton1 ~]# systemctl status libvirtd-tls.socket
● libvirtd-tls.socket - Libvirt TLS IP socket
   Loaded: loaded (/usr/lib/systemd/system/libvirtd-tls.socket; enabled; vendor preset: disabled)
   Active: failed (Result: service-start-limit-hit) since Thu 2020-03-05 09:43:16 UTC; 20h ago
   Listen: [::]:16514 (Stream)

Mar 05 09:42:54 localhost.localdomain systemd[1]: Listening on Libvirt TLS IP socket.
Mar 05 09:43:16 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: libvirtd-tls.socket: Failed with result 'service-start-limit-hit'.
Mar 05 11:25:24 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: libvirtd-tls.socket: Socket service libvirtd.service already active, refusing.
Mar 05 11:25:24 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: Failed to listen on Libvirt TLS IP socket.
</snip>

When we checked with Marcin Sobcyzk, he explained that this issue is the result of improper host configuration and suggested getting help from Martin Perina. Martin Perina is currently investigating the issue and has been given access to the setup to debug it and find the real cause.

This is not only an RHHI problem but also an HE deployment problem in RHV 4.4.
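The journal excerpts above show two units failing with result 'dependency' (vdsmd.service, supervdsmd.service) and one failing with a different result (libvirtd-tls.socket, 'service-start-limit-hit'). As a rough triage aid, the sketch below scans journal text for systemd "failed with result" messages and lists the units whose result is something other than 'dependency'. This is a heuristic under stated assumptions: it does not read the real unit dependency graph, so it only narrows down where to look first. The names `failure_results`, `root_cause_candidates`, and `SAMPLE_LOG` are hypothetical; the sample lines are copied from this report.

```python
import re

# Matches systemd messages such as:
#   "<unit>: Job <unit>/start failed with result '<result>'."
#   "<unit>: Failed with result '<result>'."
RESULT_RE = re.compile(
    r"systemd\[1\]: (?P<unit>\S+): "
    r"(?:Job \S+ failed|Failed) with result '(?P<result>[^']+)'"
)


def failure_results(log_text):
    """Map each unit to the last failure result reported for it."""
    results = {}
    for line in log_text.splitlines():
        match = RESULT_RE.search(line)
        if match:
            results[match.group("unit")] = match.group("result")
    return results


def root_cause_candidates(results):
    """Units that failed for a reason other than a failed dependency."""
    return sorted(unit for unit, result in results.items()
                  if result != "dependency")


SAMPLE_LOG = """\
Mar 05 09:43:14 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: Dependency failed for Virtual Desktop Server Manager.
Mar 05 09:43:14 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: vdsmd.service: Job vdsmd.service/start failed with result 'dependency'.
Mar 05 09:43:16 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: libvirtd-tls.socket: Failed with result 'service-start-limit-hit'.
Mar 05 11:25:24 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: supervdsmd.service: Job supervdsmd.service/start failed with result 'dependency'.
Mar 05 11:25:24 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: libvirtd-tls.socket: Socket service libvirtd.service already active, refusing.
"""

if __name__ == "__main__":
    print(root_cause_candidates(failure_results(SAMPLE_LOG)))
```

On the excerpt from this report the only candidate is libvirtd-tls.socket. The supervdsmd "failed to load module" messages are not systemd results, so this sketch does not pick them up, and they would need a separate look.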
RHHI-V 1.8 deployment with 3 nodes works well with the workaround from Bug 1823423. The issue reported in this bug is not seen.

The builds used for the verification are:
RHVH-4.4-20200417.0-RHVH-x86_64-dvd1.iso
rhvm-appliance-4.4-20200417.0.el8ev.x86_64.rpm

@Milind, could you also verify this bug with a single-node RHHI-V 1.8 deployment?
vdsmd restarts successfully and HE deployment completes successfully, hence marking this bug as verified.

[root@rhsqa-grafton1 vdsm]# imgbase w
You are on rhvh-4.4.0.18-0.20200417.0+1

[root@rhsqa-grafton1 vdsm]# systemctl status vdsmd
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2020-05-05 09:49:44 UTC; 1h 46min ago
 Main PID: 278336 (vdsmd)
    Tasks: 58 (limit: 1648316)
   Memory: 115.6M
   CGroup: /system.slice/vdsmd.service
           ├─278336 /usr/bin/python3 /usr/share/vdsm/vdsmd
           └─328561 /usr/libexec/ioprocess --read-pipe-fd 59 --write-pipe-fd 58 --max-threads 10 --max-queued-requests 10

May 05 09:49:45 rhsqa-grafton1.lab.eng.blr.redhat.com vdsm[278336]: WARN Not ready yet, ignoring event '|virt|VM_status|0a3a305e-0d89-4ce4-93f9-bb3441e27874' args={'0a3a305e-0d89-4ce4-93f9->
May 05 11:05:52 rhsqa-grafton1.lab.eng.blr.redhat.com vdsm[278336]: WARN Failed to retrieve Hosted Engine HA info, is Hosted Engine setup finished?
May 05 11:06:07 rhsqa-grafton1.lab.eng.blr.redhat.com vdsm[278336]: WARN Failed to retrieve Hosted Engine HA info, is Hosted Engine setup finished?
May 05 11:06:22 rhsqa-grafton1.lab.eng.blr.redhat.com vdsm[278336]: WARN Failed to retrieve Hosted Engine HA info, is Hosted Engine setup finished?
May 05 11:06:37 rhsqa-grafton1.lab.eng.blr.redhat.com vdsm[278336]: WARN Failed to retrieve Hosted Engine HA info, is Hosted Engine setup finished?
May 05 11:06:43 rhsqa-grafton1.lab.eng.blr.redhat.com vdsm[278336]: WARN unhandled write event
May 05 11:07:12 rhsqa-grafton1.lab.eng.blr.redhat.com vdsm[278336]: WARN Failed to retrieve Hosted Engine HA info: timed out
May 05 11:07:14 rhsqa-grafton1.lab.eng.blr.redhat.com vdsm[278336]: WARN Failed to retrieve Hosted Engine HA info: timed out
May 05 11:07:14 rhsqa-grafton1.lab.eng.blr.redhat.com vdsm[278336]: WARN Failed to retrieve Hosted Engine HA info: timed out
May 05 11:08:24 rhsqa-grafton1.lab.eng.blr.redhat.com vdsm[278336]: WARN Attempting to add an existing net user: ovirtmgmt/6d150246-22bc-48ac-8a6b-98abea28d4d3
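The verification above rests on two fields of the `systemctl status vdsmd` output: the enablement state on the "Loaded:" line (disabled before, enabled after) and the state on the "Active:" line (inactive (dead) before, active (running) after). The sketch below extracts those fields from captured status text so the before and after states can be compared. It is an illustration only: the helper name `unit_state` and the two sample strings are assumptions, trimmed from the outputs in this report.

```python
import re

# "Loaded: loaded (<unit file path>; <enablement>; vendor preset: ...)"
LOADED_RE = re.compile(r"Loaded: \S+ \([^;]+; (?P<enablement>[^;)]+)")
# "Active: <state> (<sub-state>)"
ACTIVE_RE = re.compile(r"Active: (?P<state>\S+) \((?P<sub>[^)]+)\)")


def unit_state(status_text):
    """Extract enablement, active state and sub-state from status text.

    Fields that cannot be found are returned as None.
    """
    loaded = LOADED_RE.search(status_text)
    active = ACTIVE_RE.search(status_text)
    return {
        "enablement": loaded.group("enablement") if loaded else None,
        "state": active.group("state") if active else None,
        "sub": active.group("sub") if active else None,
    }


# Trimmed from the failing run (March) and the verified run (May).
BEFORE = """\
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
"""
AFTER = """\
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2020-05-05 09:49:44 UTC; 1h 46min ago
"""

if __name__ == "__main__":
    print("before:", unit_state(BEFORE))
    print("after: ", unit_state(AFTER))
```

Parsing human-readable status output is fragile. On a live host, `systemctl is-active vdsmd` and `systemctl is-enabled vdsmd` give the same two answers directly.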
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (RHHI for Virtualization 1.8 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:3314