Bug 1810882

Summary: [vdsmd] Hosted Engine deployment failed when trying to restart vdsmd
Product: [oVirt] ovirt-engine
Component: Host-Deploy
Version: 4.4.0
Target Milestone: ovirt-4.4.0
Target Release: ---
Hardware: x86_64
OS: Linux
Reporter: SATHEESARAN <sasundar>
Assignee: Marcin Sobczyk <msobczyk>
QA Contact: SATHEESARAN <sasundar>
Docs Contact:
Status: CLOSED CURRENTRELEASE
Severity: urgent
Priority: urgent
Keywords: TestBlocker
CC: bugs, eslutsky, godas, michal.skrivanek, msobczyk, mtessun, mwaykole, nsednev, rgolan, rhs-bugs, sasundar, sbonazzo, stirabos
Flags: pm-rhel: ovirt-4.4+
       sasundar: blocker?
       mtessun: planning_ack+
       pm-rhel: devel_ack+
       sasundar: testing_ack+
Whiteboard:
Fixed In Version: rhv-4.4.0-29
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1810043
Environment: rhhiv, rhel8
Last Closed: 2020-05-20 20:00:07 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1810043

Description SATHEESARAN 2020-03-06 06:02:08 UTC
Description of problem:
HE deployment is failing on RHVH 4.4.
I suspect that it is unable to start vdsmd.service.
-------------------------------------------
Version-Release number of selected component (if applicable):

vdsm-4.40.5-1.el8ev.x86_64
vdsm-client-4.40.5-1.el8ev.noarch
vdsm-api-4.40.5-1.el8ev.noarch
vdsm-hook-vhostmd-4.40.5-1.el8ev.noarch
vdsm-http-4.40.5-1.el8ev.noarch
vdsm-hook-ethtool-options-4.40.5-1.el8ev.noarch
vdsm-hook-openstacknet-4.40.5-1.el8ev.noarch
vdsm-common-4.40.5-1.el8ev.noarch
vdsm-network-4.40.5-1.el8ev.x86_64
vdsm-jsonrpc-4.40.5-1.el8ev.noarch
vdsm-hook-vmfex-dev-4.40.5-1.el8ev.noarch
vdsm-hook-fcoe-4.40.5-1.el8ev.noarch
vdsm-gluster-4.40.5-1.el8ev.x86_64
vdsm-yajsonrpc-4.40.5-1.el8ev.noarch
vdsm-python-4.40.5-1.el8ev.noarch

-----------------------------------------
How reproducible:
Always
------------------------------------------
Steps to Reproduce:
1. From Cockpit, click on 'Hyperconverged' and deploy Gluster; this completes successfully.
2. Deploy HE; this step fails.

--------------------------------------------------------------------------------------


Additional info:
 ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_hosts": [{"address": "rhsqa-grafton1.lab.eng.blr.redhat.com", "affinity_labels": [], "auto_numa_status": "unknown", "certificate": {"organization": "lab.eng.blr.redhat.com", "subject": "O=lab.eng.blr.redhat.com,CN=rhsqa-grafton1.lab.eng.blr.redhat.com"}, "cluster": {"href": "/ovirt-engine/api/clusters/56a6240c-5e01-11ea-a454-004554194801", "id": "56a6240c-5e01-11ea-a454-004554194801"}, "comment": "", "cpu": {"speed": 0.0, "topology": {}}, "device_passthrough": {"enabled": false}, "devices": [], "external_network_provider_configurations": [], "external_status": "ok", "hardware_information": {"supported_rng_sources": []}, "hooks": [], "href": "/ovirt-engine/api/hosts/362dd036-aa2e-403b-9aee-f47ff7fa7496", "id": "362dd036-aa2e-403b-9aee-f47ff7fa7496", "katello_errata": [], "kdump_status": "unknown", "ksm": {"enabled": false}, "max_scheduling_memory": 0, "memory": 0, "name": "rhsqa-grafton1.lab.eng.blr.redhat.com", "network_attachments": [], "nics": [], "numa_nodes": [], "numa_supported": false, "os": {"custom_kernel_cmdline": ""}, "permissions": [], "port": 54321, "power_management": {"automatic_pm_enabled": true, "enabled": false, "kdump_detection": true, "pm_proxies": []}, "protocol": "stomp", "se_linux": {}, "spm": {"priority": 5, "status": "none"}, "ssh": {"fingerprint": "SHA256:afIfjlqbi4e9fzOARDkN0wfg2IVI3qI/Dejc3kTUHPo", "port": 22}, "statistics": [], "status": "install_failed", "storage_connection_extensions": [], "summary": {"total": 0}, "tags": [], "transparent_huge_pages": {"enabled": false}, "type": "rhel", "unmanaged_networks": [], "update_available": false, "vgpu_placement": "consolidated"}]}, "attempts": 120, "changed": false, "deprecations": [{"msg": "The 'ovirt_host_facts' module has been renamed to 'ovirt_host_info', and the renamed one no longer returns ansible_facts", "version": "2.13"}]}
[ INFO ] TASK [ovirt.hosted_engine_setup : Fetch logs from the engine VM]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Set destination directory path]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Create destination directory]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Find the local appliance image]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Set local_vm_disk_path]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Give the vm time to flush dirty buffers]
[ INFO ] ok: [localhost -> localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Copy engine logs]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Remove local vm dir]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Remove temporary entry in /etc/hosts for the local VM]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Notify the user about a failure]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}

--- Additional comment from milind on 2020-03-04 13:31:35 UTC ---

[root@ ovirt-hosted-engine-setup]# systemctl status vdsmd
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; disabled; vendor preset: enabled)
   Active: inactive (dead) since Wed 2020-03-04 12:55:35 UTC; 31min ago
  Process: 2345190 ExecStart=/usr/share/vdsm/daemonAdapter -0 /dev/null -1 /dev/null -2 /dev/null /usr/share/vdsm/vdsmd (code=exited, status=0/SUCCESS)
  Process: 2345133 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 2345190 (code=exited, status=0/SUCCESS)

Mar 04 12:54:30 rhsqa-grafton1.lab.eng.blr.redhat.com vdsm[2345190]: WARN MOM not available, KSM stats will be missing. Error:
Mar 04 12:54:30 rhsqa-grafton1.lab.eng.blr.redhat.com vdsm[2345190]: WARN Not ready yet, ignoring event '|virt|VM_status|49729d41-8a2c-418e-9252-e4ab9ff9dfae' args={'49729d41-8a2c-418e-9252>
Mar 04 12:55:35 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: Stopping Virtual Desktop Server Manager...
Mar 04 12:55:35 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: Stopped Virtual Desktop Server Manager.
Mar 04 12:58:02 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: Dependency failed for Virtual Desktop Server Manager.
Mar 04 12:58:02 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: vdsmd.service: Job vdsmd.service/start failed with result 'dependency'.
Mar 04 12:58:14 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: Dependency failed for Virtual Desktop Server Manager.
Mar 04 12:58:14 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: vdsmd.service: Job vdsmd.service/start failed with result 'dependency'.
Mar 04 12:58:26 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: Dependency failed for Virtual Desktop Server Manager.
Mar 04 12:58:26 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: vdsmd.service: Job vdsmd.service/start failed with result 'dependency'.
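
The 'dependency' result above means systemd refused to start vdsmd because one of its required units failed, but the journal does not name the failing unit. As a diagnostic sketch (not part of the original report), it can be identified with:

# List every unit currently in the failed state
systemctl --failed
# Walk vdsmd's dependency tree to spot the unit that is down
systemctl list-dependencies vdsmd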


------------------------------------------------------------------------------------

2020-03-04 10:20:47,769+0000 ERROR ansible failed {
    "ansible_host": "localhost",
    "ansible_playbook": "/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml",
    "ansible_result": {
        "_ansible_no_log": false,
        "changed": false,
        "invocation": {
            "module_args": {
                "ca_file": null,
                "compress": true,
                "headers": null,
                "hostname": null,
                "insecure": null,
                "kerberos": false,
                "ovirt_auth": {
                    "ansible_facts": {
                        "ovirt_auth": {
                            "ca_file": null,
                            "compress": true,
                            "headers": null,
                            "insecure": true,
                            "kerberos": false,
                            "timeout": 0,
                            "token": "-Y6DI1EMxYYOAB05BWlnV6Yq7HAkXcTzv7DibcdWb24NfHAJdUzVH2i5V8WrZcHkVijgmcmdMdewCcgE-0THlA",
                            "url": "https://hostedengineSM1.lab.eng.blr.redhat.com/ovirt-engine/api"
                        }
                    },
                    "attempts": 1,
                    "changed": false,
                    "failed": false
                },
                "password": null,
                "state": "absent",
                "timeout": 0,
                "token": null,
                "url": null,
                "username": null
            }
        },
        "msg": "You must specify either 'url' or 'hostname'."
    },
    "ansible_task": "Always revoke the SSO token",
    "ansible_type": "task",
    "status": "FAILED",
:

--- Additional comment from milind on 2020-03-04 14:02:30 UTC ---

sosreport:http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/milind/1810043/

--- Additional comment from SATHEESARAN on 2020-03-06 05:57:02 UTC ---

We were able to debug this to some extent.

When HE deployment is initiated from the web console, the 'HostedEngineLocal' VM is created and comes up, and engine-setup completes; but when that HE host is added to the cluster, it never becomes operational and moves into the non-operational state.

While these events happen, the engine tries to restart vdsmd, and vdsmd can no longer start due to a failed dependency:

<snip>
[root@rhsqa-grafton1 ~]# systemctl status vdsmd
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; disabled; vendor preset: enabled)
   Active: inactive (dead)

Mar 05 09:43:14 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: Dependency failed for Virtual Desktop Server Manager.
Mar 05 09:43:14 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: vdsmd.service: Job vdsmd.service/start failed with result 'dependency'.
[root@rhsqa-grafton1 ~]# systemctl status supervdsmd
● supervdsmd.service - Auxiliary vdsm service for running helper functions as root
   Loaded: loaded (/usr/lib/systemd/system/supervdsmd.service; static; vendor preset: enabled)
   Active: inactive (dead) since Thu 2020-03-05 11:25:18 UTC; 18h ago
 Main PID: 1577 (code=exited, status=0/SUCCESS)

Mar 05 09:43:00 localhost.localdomain systemd[1]: Started Auxiliary vdsm service for running helper functions as root.
Mar 05 09:43:03 localhost.localdomain supervdsmd[1577]: failed to load module lvm: libbd_lvm.so.2: cannot open shared object file: No such file or directory
Mar 05 09:43:03 localhost.localdomain supervdsmd[1577]: failed to load module mpath: libbd_mpath.so.2: cannot open shared object file: No such file or directory
Mar 05 09:43:03 localhost.localdomain supervdsmd[1577]: failed to load module dm: libbd_dm.so.2: cannot open shared object file: No such file or directory
Mar 05 09:43:03 localhost.localdomain supervdsmd[1577]: failed to load module nvdimm: libbd_nvdimm.so.2: cannot open shared object file: No such file or directory
Mar 05 11:25:18 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: Stopping Auxiliary vdsm service for running helper functions as root...
Mar 05 11:25:18 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: Stopped Auxiliary vdsm service for running helper functions as root.
Mar 05 11:25:24 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: Dependency failed for Auxiliary vdsm service for running helper functions as root.
Mar 05 11:25:24 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: supervdsmd.service: Job supervdsmd.service/start failed with result 'dependency'.
[root@rhsqa-grafton1 ~]# systemctl status libvirtd-tls.socket
● libvirtd-tls.socket - Libvirt TLS IP socket
   Loaded: loaded (/usr/lib/systemd/system/libvirtd-tls.socket; enabled; vendor preset: disabled)
   Active: failed (Result: service-start-limit-hit) since Thu 2020-03-05 09:43:16 UTC; 20h ago
   Listen: [::]:16514 (Stream)

Mar 05 09:42:54 localhost.localdomain systemd[1]: Listening on Libvirt TLS IP socket.
Mar 05 09:43:16 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: libvirtd-tls.socket: Failed with result 'service-start-limit-hit'.
Mar 05 11:25:24 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: libvirtd-tls.socket: Socket service libvirtd.service already active, refusing.
Mar 05 11:25:24 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: Failed to listen on Libvirt TLS IP socket.
</snip>
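
The socket above failed with 'service-start-limit-hit', i.e. systemd gave up after libvirtd was restarted too many times in a short window (the crash loop is shown in comment 2 below). As a hedged recovery sketch, the start-limit state can be cleared before retrying; this only resets the failure counters and does not fix the underlying certificate problem:

# Clear the failed/start-limit state on the affected units
systemctl reset-failed libvirtd-tls.socket libvirtd.service
# Retry the dependent service
systemctl start vdsmd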

When checked with Marcin Sobczyk, he explained that this issue is the result of improper host configuration and suggested getting help from Martin Perina.

Martin Perina is currently investigating the issue. Access to the setup has been made available for him to debug and find the real cause.

This is not only an RHHI problem, but also an HE deployment problem in RHV 4.4.

Comment 1 Marcin Sobczyk 2020-03-06 08:27:00 UTC
Satheesaran, please also include the output of 'journalctl --unit libvirtd' that shows the part where libvirtd is complaining about having misconfigured certificates.

Comment 2 SATHEESARAN 2020-03-09 11:24:01 UTC
(In reply to Marcin Sobczyk from comment #1)
> Satheesaran, please also include the output of 'journalctl --unit libvirtd'
> that shows the part where libvirtd is complaining about having
> misconfigured certificates.

Thanks, Marcin. Output from 'journalctl --unit libvirtd':

-- Logs begin at Thu 2020-03-05 09:42:44 UTC, end at Mon 2020-03-09 11:21:31 UTC. --
Mar 05 09:43:13 localhost.localdomain systemd[1]: Starting Virtualization daemon...
Mar 05 09:43:14 rhsqa-grafton1.lab.eng.blr.redhat.com libvirtd[2385]: libvirt version: 6.0.0, package: 7.module+el8.2.0+5869+c23fe68b (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2020-02-25-16:32:10, )
Mar 05 09:43:14 rhsqa-grafton1.lab.eng.blr.redhat.com libvirtd[2385]: hostname: localhost.localdomain
Mar 05 09:43:14 rhsqa-grafton1.lab.eng.blr.redhat.com libvirtd[2385]: Cannot read CA certificate '/etc/pki/CA/cacert.pem': No such file or directory
Mar 05 09:43:14 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: libvirtd.service: Main process exited, code=exited, status=6/NOTCONFIGURED
Mar 05 09:43:14 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: libvirtd.service: Failed with result 'exit-code'.
Mar 05 09:43:14 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: Failed to start Virtualization daemon.
Mar 05 09:43:15 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: libvirtd.service: Service RestartSec=100ms expired, scheduling restart.
Mar 05 09:43:15 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: libvirtd.service: Scheduled restart job, restart counter is at 1.
Mar 05 09:43:15 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: Stopped Virtualization daemon.
Mar 05 09:43:15 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: Starting Virtualization daemon...
Mar 05 09:43:15 rhsqa-grafton1.lab.eng.blr.redhat.com libvirtd[2713]: libvirt version: 6.0.0, package: 7.module+el8.2.0+5869+c23fe68b (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2020-02-25-16:32:10, )
Mar 05 09:43:15 rhsqa-grafton1.lab.eng.blr.redhat.com libvirtd[2713]: hostname: rhsqa-grafton1.lab.eng.blr.redhat.com
Mar 05 09:43:15 rhsqa-grafton1.lab.eng.blr.redhat.com libvirtd[2713]: Cannot read CA certificate '/etc/pki/CA/cacert.pem': No such file or directory
Mar 05 09:43:15 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: libvirtd.service: Main process exited, code=exited, status=6/NOTCONFIGURED
Mar 05 09:43:15 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: libvirtd.service: Failed with result 'exit-code'.
Mar 05 09:43:15 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: Failed to start Virtualization daemon.
Mar 05 09:43:15 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: libvirtd.service: Service RestartSec=100ms expired, scheduling restart.
Mar 05 09:43:15 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: libvirtd.service: Scheduled restart job, restart counter is at 2.
Mar 05 09:43:15 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: Stopped Virtualization daemon.
Mar 05 09:43:15 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: Starting Virtualization daemon...
Mar 05 09:43:15 rhsqa-grafton1.lab.eng.blr.redhat.com libvirtd[2763]: libvirt version: 6.0.0, package: 7.module+el8.2.0+5869+c23fe68b (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2020-02-25-16:32:10, )
Mar 05 09:43:15 rhsqa-grafton1.lab.eng.blr.redhat.com libvirtd[2763]: hostname: rhsqa-grafton1.lab.eng.blr.redhat.com
Mar 05 09:43:15 rhsqa-grafton1.lab.eng.blr.redhat.com libvirtd[2763]: Cannot read CA certificate '/etc/pki/CA/cacert.pem': No such file or directory
Mar 05 09:43:15 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: libvirtd.service: Main process exited, code=exited, status=6/NOTCONFIGURED
Mar 05 09:43:15 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: libvirtd.service: Failed with result 'exit-code'.
Mar 05 09:43:15 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: Failed to start Virtualization daemon.
Mar 05 09:43:15 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: libvirtd.service: Service RestartSec=100ms expired, scheduling restart.
Mar 05 09:43:15 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: libvirtd.service: Scheduled restart job, restart counter is at 3.
Mar 05 09:43:15 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: Stopped Virtualization daemon.
Mar 05 09:43:15 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: Starting Virtualization daemon...
Mar 05 09:43:15 rhsqa-grafton1.lab.eng.blr.redhat.com libvirtd[2826]: libvirt version: 6.0.0, package: 7.module+el8.2.0+5869+c23fe68b (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2020-02-25-16:32:10, )
Mar 05 09:43:15 rhsqa-grafton1.lab.eng.blr.redhat.com libvirtd[2826]: hostname: rhsqa-grafton1.lab.eng.blr.redhat.com
Mar 05 09:43:15 rhsqa-grafton1.lab.eng.blr.redhat.com libvirtd[2826]: Cannot read CA certificate '/etc/pki/CA/cacert.pem': No such file or directory
Mar 05 09:43:15 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: libvirtd.service: Main process exited, code=exited, status=6/NOTCONFIGURED
Mar 05 09:43:15 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: libvirtd.service: Failed with result 'exit-code'.
Mar 05 09:43:15 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: Failed to start Virtualization daemon.
Mar 05 09:43:16 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: libvirtd.service: Service RestartSec=100ms expired, scheduling restart.
Mar 05 09:43:16 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: libvirtd.service: Scheduled restart job, restart counter is at 4.
Mar 05 09:43:16 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: Stopped Virtualization daemon.
Mar 05 09:43:16 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: Starting Virtualization daemon...
Mar 05 09:43:16 rhsqa-grafton1.lab.eng.blr.redhat.com libvirtd[2980]: libvirt version: 6.0.0, package: 7.module+el8.2.0+5869+c23fe68b (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2020-02-25-16:32:10, )
Mar 05 09:43:16 rhsqa-grafton1.lab.eng.blr.redhat.com libvirtd[2980]: hostname: rhsqa-grafton1.lab.eng.blr.redhat.com
Mar 05 09:43:16 rhsqa-grafton1.lab.eng.blr.redhat.com libvirtd[2980]: Cannot read CA certificate '/etc/pki/CA/cacert.pem': No such file or directory
Mar 05 09:43:16 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: libvirtd.service: Main process exited, code=exited, status=6/NOTCONFIGURED
Mar 05 09:43:16 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: libvirtd.service: Failed with result 'exit-code'.
Mar 05 09:43:16 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: Failed to start Virtualization daemon.
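
The loop above shows libvirtd exiting with status 6/NOTCONFIGURED because '/etc/pki/CA/cacert.pem' is missing, which in turn trips the start limit on the libvirt sockets. Under the assumption that host deploy left the vdsm-managed libvirt TLS configuration incomplete, re-running vdsm's configurators may restore the expected certificate paths (a sketch, not a fix confirmed in this bug):

# Re-run vdsm's libvirt configurator (check available modules with 'vdsm-tool configure --help')
vdsm-tool configure --module libvirt --force
# Then clear the failed units and retry, as in the earlier sketch
systemctl reset-failed libvirtd-tls.socket libvirtd.service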

Comment 3 Michal Skrivanek 2020-03-18 08:47:42 UTC
IMO the libvirt service handling needs to be fixed. HE deploy starts 'libvirtd.service', which vdsm no longer uses later on. It's worth trying to stop and disable the service before host redeploy.
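
A minimal sketch of that suggestion, assuming the stock unit name on the host:

# Stop and disable the monolithic libvirtd service before redeploying the host,
# so it does not block the socket-activated units vdsm expects to use
systemctl stop libvirtd.service
systemctl disable libvirtd.service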

Comment 4 Michal Skrivanek 2020-03-25 08:42:46 UTC
It doesn't seem to be happening elsewhere. It could be RHHI-specific, but it's worth retesting on the latest build in any case.

Comment 5 Nikolai Sednev 2020-03-25 08:45:51 UTC
What are the reproduction steps, please? They're not clear.

Comment 6 SATHEESARAN 2020-03-30 07:14:27 UTC
(In reply to Nikolai Sednev from comment #5)
> What are reproduction steps please, they're not clear?

Hi Nikolai,

The steps are no different from setting up an el8.2 Self-Hosted Engine (SHE) on a bare-metal el8.2 machine.

In this test RHVH 4.4 is used, with the rhvm-appliance.

As per the deployment that I did with the latest rhv-4.4.0-27, I still see that HE deployment is blocked.
Once HE deployment succeeds, this bug can be CLOSED.

Comment 8 Nikolai Sednev 2020-03-31 08:56:23 UTC
(In reply to SATHEESARAN from comment #6)
> (In reply to Nikolai Sednev from comment #5)
> > What are reproduction steps please, they're not clear?
> 
> Hi Nikolai,
> 
> The steps are no different from setting up an el8.2 Self-Hosted Engine (SHE)
> on a bare-metal el8.2 machine.
> 
> In this test RHVH 4.4 is used, with the rhvm-appliance.
> 
> As per the deployment that I did with the latest rhv-4.4.0-27, I still see
> that HE deployment is blocked.
> Once HE deployment succeeds, this bug can be CLOSED.

Please proceed with the bug, as https://bugzilla.redhat.com/show_bug.cgi?id=1703429 was verified yesterday.

Deployment of HE 4.4 on NFS has succeeded.

Tested on host with these components:
rhvm-appliance.x86_64 2:4.4-20200326.0.el8ev
ovirt-hosted-engine-setup-2.4.4-1.el8ev.noarch
ovirt-hosted-engine-ha-2.4.2-1.el8ev.noarch
Red Hat Enterprise Linux release 8.2 Beta (Ootpa)
Linux 4.18.0-193.el8.x86_64 #1 SMP Fri Mar 27 14:35:58 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Engine:
ovirt-engine-setup-base-4.4.0-0.26.master.el8ev.noarch
ovirt-engine-4.4.0-0.26.master.el8ev.noarch
openvswitch2.11-2.11.0-48.el8fdp.x86_64
Linux 4.18.0-192.el8.x86_64 #1 SMP Tue Mar 24 14:06:40 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux release 8.2 Beta (Ootpa)

Comment 10 SATHEESARAN 2020-05-05 13:41:44 UTC
As vdsmd is successfully restarted and HE deployment succeeds, I'm marking this bug as verified.
HE deployment is successful with this version of the RHVH build:

[root@rhsqa-grafton1 vdsm]# imgbase w
You are on rhvh-4.4.0.18-0.20200417.0+1
[root@rhsqa-grafton1 vdsm]# systemctl status vdsmd
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2020-05-05 09:49:44 UTC; 1h 46min ago
 Main PID: 278336 (vdsmd)
    Tasks: 58 (limit: 1648316)
   Memory: 115.6M
   CGroup: /system.slice/vdsmd.service
           ├─278336 /usr/bin/python3 /usr/share/vdsm/vdsmd
           └─328561 /usr/libexec/ioprocess --read-pipe-fd 59 --write-pipe-fd 58 --max-threads 10 --max-queued-requests 10

May 05 09:49:45 rhsqa-grafton1.lab.eng.blr.redhat.com vdsm[278336]: WARN Not ready yet, ignoring event '|virt|VM_status|0a3a305e-0d89-4ce4-93f9-bb3441e27874' args={'0a3a305e-0d89-4ce4-93f9->
May 05 11:05:52 rhsqa-grafton1.lab.eng.blr.redhat.com vdsm[278336]: WARN Failed to retrieve Hosted Engine HA info, is Hosted Engine setup finished?
May 05 11:06:07 rhsqa-grafton1.lab.eng.blr.redhat.com vdsm[278336]: WARN Failed to retrieve Hosted Engine HA info, is Hosted Engine setup finished?
May 05 11:06:22 rhsqa-grafton1.lab.eng.blr.redhat.com vdsm[278336]: WARN Failed to retrieve Hosted Engine HA info, is Hosted Engine setup finished?
May 05 11:06:37 rhsqa-grafton1.lab.eng.blr.redhat.com vdsm[278336]: WARN Failed to retrieve Hosted Engine HA info, is Hosted Engine setup finished?
May 05 11:06:43 rhsqa-grafton1.lab.eng.blr.redhat.com vdsm[278336]: WARN unhandled write event
May 05 11:07:12 rhsqa-grafton1.lab.eng.blr.redhat.com vdsm[278336]: WARN Failed to retrieve Hosted Engine HA info: timed out
May 05 11:07:14 rhsqa-grafton1.lab.eng.blr.redhat.com vdsm[278336]: WARN Failed to retrieve Hosted Engine HA info: timed out
May 05 11:07:14 rhsqa-grafton1.lab.eng.blr.redhat.com vdsm[278336]: WARN Failed to retrieve Hosted Engine HA info: timed out
May 05 11:08:24 rhsqa-grafton1.lab.eng.blr.redhat.com vdsm[278336]: WARN Attempting to add an existing net user: ovirtmgmt/6d150246-22bc-48ac-8a6b-98abea28d4d3

Comment 11 Sandro Bonazzola 2020-05-20 20:00:07 UTC
This bugzilla is included in oVirt 4.4.0 release, published on May 20th 2020.

Since the problem described in this bug report should be
resolved in oVirt 4.4.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.