Bug 1770094 - Hosted engine deploy failed with libvirt.libvirtError: unable to connect to server: Connection refused
Summary: Hosted engine deploy failed with libvirt.libvirtError: unable to connect to s...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: cockpit-ovirt
Classification: oVirt
Component: Hosted Engine
Version: 0.13.8
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ovirt-4.4.0
Assignee: Yedidyah Bar David
QA Contact: Wei Wang
URL:
Whiteboard:
Depends On: 1785272
Blocks:
 
Reported: 2019-11-08 06:18 UTC by Wei Wang
Modified: 2020-05-20 20:02 UTC
CC List: 14 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2020-05-20 20:02:44 UTC
oVirt Team: Integration
Embargoed:
mtessun: ovirt-4.4+
mtessun: planning_ack+
sbonazzo: devel_ack+
cshao: testing_ack+


Attachments
connection refused picture (97.76 KB, image/png) - 2019-11-08 06:18 UTC, Wei Wang, no flags
connection refused logs (936.51 KB, application/gzip) - 2019-11-08 06:19 UTC, Wei Wang, no flags
/var/log files (2.04 MB, application/gzip) - 2019-12-26 07:38 UTC, Wei Wang, no flags
journalctl log (1.31 MB, text/plain) - 2019-12-26 07:39 UTC, Wei Wang, no flags

Description Wei Wang 2019-11-08 06:18:57 UTC
Created attachment 1633902 [details]
connection refused picture

Description of problem:
Hosted engine deploy failed with libvirt.libvirtError: unable to connect to server: Connection refused.

2019-11-08 00:11:59,001-0500 ERROR ansible failed {'status': 'FAILED', 'ansible_type': 'task', 'ansible_playbook': '/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml', 'ansible_host': 'localhost', 'ansible_task': 'Shutdown local VM', 'ansible_result': 'type: <class \'dict\'>\nstr: {\'msg\': "unable to connect to server at \'hp-dl388g9-04.lab.eng.pek2.**FILTERED**.com:16514\': Connection refused", \'exception\': \'Traceback (most recent call last):\\n  File "/tmp/ansible_virt_payload_83pxn_ex/__main__.py", line 593, in main\\n    rc, result = core(module)\\n  File "/tmp/ansible_virt_payload_8', 'task_duration': 2}
2019-11-08 00:11:59,001-0500 DEBUG ansible on_any args <ansible.executor.task_result.TaskResult object at 0x7fd064ac1908> kwargs ignore_errors:None
2019-11-08 00:11:59,003-0500 INFO ansible stats {
    "ansible_playbook": "/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml",
    "ansible_playbook_duration": "03:20 Minutes",
    "ansible_result": "type: <class 'dict'>\nstr: {'localhost': {'ok': 87, 'failures': 1, 'unreachable': 0, 'changed': 24, 'skipped': 10, 'rescued': 0, 'ignored': 0}}",
    "ansible_type": "finish",
    "status": "FAILED"
}
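
For reference, a minimal diagnostic sketch (assuming the libvirt Python bindings are installed on the host; the hostname below is a placeholder) that attempts the libvirt connections the failing Ansible virt task needs, so a refused connection can be confirmed outside the deploy flow:

import libvirt

# Check the local system daemon and the TLS endpoint (port 16514)
# named in the error above; the hostname below is a placeholder.
uris = ['qemu:///system', 'qemu+tls://host.example.com/system']

for uri in uris:
    try:
        conn = libvirt.openReadOnly(uri)
        print('%s: OK (libvirt %d)' % (uri, conn.getLibVersion()))
        conn.close()
    except libvirt.libvirtError as err:
        # "Connection refused" here usually means libvirtd (or its TLS
        # listener) is not running or not listening on that socket.
        print('%s: FAILED: %s' % (uri, err))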


Version-Release number of selected component (if applicable):
rhvh-4.4.0.8-0.20191107.0
cockpit-packagekit-197.3-1.el8.noarch
cockpit-196.3-1.el8.x86_64
cockpit-system-196.3-1.el8.noarch
cockpit-dashboard-197.3-1.el8.noarch
cockpit-storaged-197.3-1.el8.noarch
cockpit-bridge-196.3-1.el8.x86_64
subscription-manager-cockpit-1.25.17-1.el8.noarch
cockpit-ovirt-dashboard-0.13.8-1.el8ev.noarch
cockpit-ws-196.3-1.el8.x86_64
ovirt-hosted-engine-setup-2.4.0-0.1.master.20191104160243.git0c51343.el8ev.noarch
ovirt-hosted-engine-ha-2.4.0-0.0.master.git633a1db.el8ev.noarch
rhvm-appliance-4.4-20190823.0.el8.x86_64


How reproducible:
100%

Steps to Reproduce:
1. Clean install rhvh-4.4.0.8-0.20191107.0
2. Deploy hosted engine via the cockpit UI
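
For reference, step 2 can also be run without the cockpit UI; a minimal sketch, assuming ovirt-hosted-engine-setup is installed on the host (the cockpit wizard drives the same deployment flow):

import subprocess

# CLI counterpart of the cockpit "Hosted Engine" wizard; runs the
# interactive hosted-engine deployment on the host.
subprocess.run(['hosted-engine', '--deploy'], check=True)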

Actual results:
Hosted engine deploy failed with libvirt.libvirtError: unable to connect to server: Connection refused.

Expected results:
Hosted engine deploy successful without any error

Additional info:

Comment 1 Wei Wang 2019-11-08 06:19:52 UTC
Created attachment 1633903 [details]
connection refused logs

Comment 2 Yedidyah Bar David 2019-11-19 10:28:14 UTC
Isn't this a duplicate of bug 1766556?

Comment 3 Wei Wang 2019-11-20 01:53:40 UTC
(In reply to Yedidyah Bar David from comment #2)
> Isn't this a duplicate of bug 1766556?

I think they are different bugs; they occur at different deploy stages. This bug occurs after the local VM is created, while the local VM is being copied to the engine VM on the added storage. Bug 1766556 occurs earlier, during "Wait for the host to be up", while the local VM is being created.

Comment 4 Yedidyah Bar David 2019-12-22 11:34:43 UTC
I have now spent some time looking at the attached logs, and failed to find what shuts down libvirtd at this point. If this is reproducible, please attach all of /var/log and the journal logs. Thanks.

For the record, I now successfully finished a hosted-engine deploy using an up-to-date centos8 host (with nightly master snapshot) and ovirt-engine-appliance-4.4-20191221175026.1.el8.x86_64.
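
A minimal sketch of one way to gather what is being requested above (all of /var/log plus the full journal), with arbitrary output file names:

import subprocess
import tarfile

# Dump the full journal to a plain-text file.
with open('journalctl.log', 'w') as out:
    subprocess.run(['journalctl', '--no-pager'], stdout=out, check=True)

# Pack /var/log and the journal dump into one archive for attaching.
with tarfile.open('host-logs.tar.gz', 'w:gz') as tar:
    tar.add('/var/log')
    tar.add('journalctl.log')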

Comment 5 Wei Wang 2019-12-24 02:04:02 UTC
(In reply to Yedidyah Bar David from comment #4)
> I have now spent some time looking at the attached logs, and failed to find
> what shuts down libvirtd at this point. If this is reproducible, please
> attach all of /var/log and the journal logs. Thanks.
> 
> For the record, I now successfully finished a hosted-engine deploy using an
> up-to-date centos8 host (with nightly master snapshot) and
> ovirt-engine-appliance-4.4-20191221175026.1.el8.x86_64.

I will try this after I finish the RHVH 4.3.8 Tier 2 testing in the next few days.

Comment 7 Wei Wang 2019-12-26 07:38:26 UTC
Created attachment 1647664 [details]
/var/log files

Comment 8 Wei Wang 2019-12-26 07:39:05 UTC
Created attachment 1647665 [details]
journalctl log

Comment 9 Yedidyah Bar David 2019-12-26 09:45:42 UTC
According to the attached dnf.rpm.log, you have two-month-old packages:

2019-11-07T16:51:29Z SUBDEBUG Installed: ovirt-ansible-hosted-engine-setup-1.0.31-1.el8ev.noarch
2019-11-07T16:53:33Z SUBDEBUG Installed: ovirt-hosted-engine-setup-2.4.0-0.1.master.20191104160243.git0c51343.el8ev.noarch

Please try again with more recent packages, built after 2019-12-22. Thanks.
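
A minimal sketch of the kind of check used here, pulling the "Installed:" records for the relevant packages out of dnf.rpm.log (path and package names as in this report):

# Print when the hosted-engine setup packages were installed, based on
# the SUBDEBUG "Installed:" records in /var/log/dnf.rpm.log.
packages = ('ovirt-hosted-engine-setup', 'ovirt-ansible-hosted-engine-setup')

with open('/var/log/dnf.rpm.log') as log:
    for line in log:
        if 'Installed:' in line and any(p in line for p in packages):
            print(line.rstrip())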

Comment 10 Wei Wang 2019-12-27 08:25:17 UTC
Test Version (The latest ovirt version)
ovirt-node-ng-installer-4.4.0-2019122607.el8.iso
ovirt-engine-appliance-4.4-20191226174442.1.el8.x86_64

Test Result:
Hosted engine deploy failed with the following error:

[ INFO ] TASK [ovirt.hosted_engine_setup : Wait for the host to be up]
[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_hosts": [{"address": "hp-dl388g9-04.lab.eng.pek2.redhat.com", "affinity_labels": [], "auto_numa_status": "unknown", "certificate": {"organization": "lab.eng.pek2.redhat.com", "subject": "O=lab.eng.pek2.redhat.com,CN=hp-dl388g9-04.lab.eng.pek2.redhat.com"}, "cluster": {"href": "/ovirt-engine/api/clusters/5bce182a-28c0-11ea-a794-5254003404b0", "id": "5bce182a-28c0-11ea-a794-5254003404b0"}, "comment": "", "cpu": {"speed": 0.0, "topology": {}}, "device_passthrough": {"enabled": false}, "devices": [], "external_network_provider_configurations": [], "external_status": "ok", "hardware_information": {"supported_rng_sources": []}, "hooks": [], "href": "/ovirt-engine/api/hosts/9013d38e-71ec-460e-b1af-f510d85a2592", "id": "9013d38e-71ec-460e-b1af-f510d85a2592", "katello_errata": [], "kdump_status": "unknown", "ksm": {"enabled": false}, "max_scheduling_memory": 0, "memory": 0, "name": "hp-dl388g9-04.lab.eng.pek2.redhat.com", "network_attachments": [], "nics": [], "numa_nodes": [], "numa_supported": false, "os": {"custom_kernel_cmdline": ""}, "permissions": [], "port": 54321, "power_management": {"automatic_pm_enabled": true, "enabled": false, "kdump_detection": true, "pm_proxies": []}, "protocol": "stomp", "se_linux": {}, "spm": {"priority": 5, "status": "none"}, "ssh": {"fingerprint": "SHA256:CNEZTFA8dISuv4k96apQsdOPWdOS9YvOltDeVKyEAtU", "port": 22}, "statistics": [], "status": "install_failed", "storage_connection_extensions": [], "summary": {"total": 0}, "tags": [], "transparent_huge_pages": {"enabled": false}, "type": "rhel", "unmanaged_networks": [], "update_available": false, "vgpu_placement": "consolidated"}]}, "attempts": 120, "changed": false, "deprecations": [{"msg": "The 'ovirt_host_facts' module has been renamed to 'ovirt_host_info', and the renamed one no longer returns ansible_facts", "version": "2.13"}]}

didi,
Please check the logs in my machine environment (reserved until Jan 2, 2020). I will send the info via email.
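
For reference, a minimal sketch of what the failing "Wait for the host to be up" task is effectively polling, using the oVirt Python SDK (ovirtsdk4); the engine URL and credentials are placeholders. A host stuck in "install_failed", as above, points to the host-deploy log on the engine side rather than to the polling itself:

import ovirtsdk4 as sdk

# Connect to the engine API; URL and credentials are placeholders.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    insecure=True,
)

# The deploy role keeps polling until the host status becomes "up";
# "install_failed" means host deploy itself failed on the engine side.
hosts_service = connection.system_service().hosts_service()
for host in hosts_service.list():
    print(host.name, host.status)

connection.close()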

Comment 11 Yedidyah Bar David 2019-12-29 08:26:38 UTC
Thanks. I looked at the machine. It indeed failed in host deploy, as noted in the previous comment. host-deploy log (in /var/log/ovirt-hosted-engine-setup/engine-logs-2019-12-27T15:43:34Z/ovirt-engine/host-deploy/ovirt-host-deploy-ansible-20191227235344-hp-dl388g9-04.lab.eng.pek2.redhat.com-3ad05d87.log ) has:

2019-12-27 23:54:01 CST - TASK [ovirt-host-deploy-facts : Install yum-utils] *****************************                                                                                   
2019-12-27 23:54:10 CST - fatal: [hp-dl388g9-04.lab.eng.pek2.redhat.com]: FAILED! => {"changed": false, "failures": ["No package yum-utils available."], "msg": "Failed to install some of the specified packages", "rc": 1, "results": []}

which looks like bug 1785272. So, how to continue? Some options:

1. Close as duplicate of 1785272. But note that 1785272 only affects node, so you can also:

2. Try again on EL8 (not node)

3. It's not clear if the current bug is just plain 'hosted-engine deploy' or some more specific flow. If more specific, just keep it open, and depend on 1785272.

Thanks again.
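
For reference, a minimal sketch of checking the "No package yum-utils available" condition directly on the host with the dnf Python API (assuming the standard EL8 dnf stack):

import dnf

# Query the enabled repositories for yum-utils, mirroring what the
# host-deploy role tries to install.
base = dnf.Base()
base.read_all_repos()
base.fill_sack()

available = list(base.sack.query().available().filter(name='yum-utils'))
if not available:
    print('yum-utils is not available from any enabled repository')
for pkg in available:
    print('%s-%s from %s' % (pkg.name, pkg.evr, pkg.reponame))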

Comment 12 Wei Wang 2020-01-02 03:35:45 UTC
(In reply to Yedidyah Bar David from comment #11)
> Thanks. I looked at the machine. It indeed failed in host deploy, as noted
> in the previous comment. host-deploy log (in
> /var/log/ovirt-hosted-engine-setup/engine-logs-2019-12-27T15:43:34Z/ovirt-
> engine/host-deploy/ovirt-host-deploy-ansible-20191227235344-hp-dl388g9-04.
> lab.eng.pek2.redhat.com-3ad05d87.log ) has:
> 
> 2019-12-27 23:54:01 CST - TASK [ovirt-host-deploy-facts : Install yum-utils]
> *****************************                                               
> 
> 2019-12-27 23:54:10 CST - fatal: [hp-dl388g9-04.lab.eng.pek2.redhat.com]:
> FAILED! => {"changed": false, "failures": ["No package yum-utils
> available."], "msg": "Failed to install some of the specified packages",
> "rc": 1, "results": []}
> 
> which looks like bug 1785272. So, how to continue? Some options:
> 
> 1. Close as duplicate of 1785272. But note that 1785272 only affects node,
> so you can also:
> 
> 2. Try again on EL8 (not node)
> 
Sorry, do you mean I should retry it using a RHEL 8 host instead of oVirt Node?

> 3. It's not clear if the current bug is just plain 'hosted-engine deploy' or
> some more specific flow. If more specific, just keep it open, and depend on
> 1785272.
> 
I think it is the plain flow; there is no more specific operation while setting up the hosted engine. It fails when the host tries to come up after registering to the engine VM, so I think it is the same as BZ-1785272. The original bug occurred later, after the host was up, while creating the target VM, and it is not reproduced with the latest oVirt version.

> Thanks again.

Comment 13 Yedidyah Bar David 2020-01-02 07:16:02 UTC
(In reply to Wei Wang from comment #12)
> (In reply to Yedidyah Bar David from comment #11)
> > which looks like bug 1785272. So, how to continue? Some options:
> > 
> > 1. Close as duplicate of 1785272. But note that 1785272 only affects node,
> > so you can also:
> > 
> > 2. Try again on EL8 (not node)
> > 
> Sorry, do you mean I should retry it using a RHEL 8 host instead of oVirt Node?

Yes.

> 
> > 3. It's not clear if the current bug is just plain 'hosted-engine deploy' or
> > some more specific flow. If more specific, just keep it open, and depend on
> > 1785272.
> > 
> I think it is the plain flow; there is no more specific operation while
> setting up the hosted engine. It fails when the host tries to come up after
> registering to the engine VM, so I think it is the same as BZ-1785272. The
> original bug occurred later, after the host was up, while creating the
> target VM, and it is not reproduced with the latest oVirt version.

So I'd just close this as a duplicate. If either node or EL8 fails after that one is
fixed, please open a new bug. Thanks!

Comment 14 Sandro Bonazzola 2020-01-08 08:18:06 UTC
Marking this bug as dependent on bug #1785272 and moving it to MODIFIED since that bug is in the MODIFIED state.
We'll move both to QE at the same time, and we'll reopen this one if it still reproduces once the yum-utils issue is fixed.

Comment 15 Sandro Bonazzola 2020-01-14 08:44:38 UTC
Bug #1785272 moved to QE; moving this one as well.

Comment 16 Wei Wang 2020-01-14 09:47:25 UTC
QE will verify this bug once the new 4.4 build is available.

Comment 17 Wei Wang 2020-02-06 08:21:45 UTC
Test version:
RHVH-4.4-20200205.1-RHVH-x86_64-dvd1.iso
cockpit-ovirt-dashboard-0.14.1-1.el8ev.noarch
cockpit-bridge-211.1-1.el8.x86_64
cockpit-dashboard-211-1.el8.noarch
cockpit-system-211.1-1.el8.noarch
cockpit-ws-211.1-1.el8.x86_64
cockpit-211.1-1.el8.x86_64
cockpit-storaged-211-1.el8.noarch
rhvm-appliance-4.4-20200123.0.el8ev.x86_64

Test Result:
Hosted engine deploy failed with the following error:

[ INFO ] TASK [ovirt.hosted_engine_setup : Wait for the host to be up]
[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_hosts": [{"address": "hp-dl388g9-04.lab.eng.pek2.redhat.com", "affinity_labels": [], "auto_numa_status": "unknown", "certificate": {"organization": "lab.eng.pek2.redhat.com", "subject": "O=lab.eng.pek2.redhat.com,CN=hp-dl388g9-04.lab.eng.pek2.redhat.com"}, "cluster": {"href": "/ovirt-engine/api/clusters/3674fe3a-48b5-11ea-a66c-5254003404b0", "id": "3674fe3a-48b5-11ea-a66c-5254003404b0"}, "comment": "", "cpu": {"speed": 0.0, "topology": {}}, "device_passthrough": {"enabled": false}, "devices": [], "external_network_provider_configurations": [], "external_status": "ok", "hardware_information": {"supported_rng_sources": []}, "hooks": [], "href": "/ovirt-engine/api/hosts/54280d80-f3aa-4f68-b39c-1dcbbfcc8b45", "id": "54280d80-f3aa-4f68-b39c-1dcbbfcc8b45", "katello_errata": [], "kdump_status": "unknown", "ksm": {"enabled": false}, "max_scheduling_memory": 0, "memory": 0, "name": "hp-dl388g9-04.lab.eng.pek2.redhat.com", "network_attachments": [], "nics": [], "numa_nodes": [], "numa_supported": false, "os": {"custom_kernel_cmdline": ""}, "permissions": [], "port": 54321, "power_management": {"automatic_pm_enabled": true, "enabled": false, "kdump_detection": true, "pm_proxies": []}, "protocol": "stomp", "se_linux": {}, "spm": {"priority": 5, "status": "none"}, "ssh": {"fingerprint": "SHA256:piuq9fnOwos/lbsFaQKgYl7Mz+0rqlWNo/vqhZ39IPY", "port": 22}, "statistics": [], "status": "install_failed", "storage_connection_extensions": [], "summary": {"total": 0}, "tags": [], "transparent_huge_pages": {"enabled": false}, "type": "rhel", "unmanaged_networks": [], "update_available": false, "vgpu_placement": "consolidated"}]}, "attempts": 120, "changed": false, "deprecations": [{"msg": "The 'ovirt_host_facts' module has been renamed to 'ovirt_host_info', and the renamed one no longer returns ansible_facts", "version": "2.13"}]}

vdsm.log
2020-02-06 16:01:34,627+0800 ERROR (vm/7b97fbc8) [virt.vm] (vmId='7b97fbc8-4d9b-455c-b14a-afad0e136e5a') Failed to connect to guest agent channel (vm:2232)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 2230, in _vmDependentInit
    self.guestAgent.start()
  File "/usr/lib/python3.6/site-packages/vdsm/virt/guestagent.py", line 247, in start
    self._prepare_socket()
  File "/usr/lib/python3.6/site-packages/vdsm/virt/guestagent.py", line 289, in _prepare_socket
    supervdsm.getProxy().prepareVmChannel(self._socketName)
  File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 56, in __call__
    return callMethod()
  File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 54, in <lambda>
    **kwargs)
  File "<string>", line 2, in prepareVmChannel
  File "/usr/lib64/python3.6/multiprocessing/managers.py", line 772, in _callmethod
    raise convert_to_error(kind, result)
FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/libvirt/qemu/channels/7b97fbc8-4d9b-455c-b14a-afad0e136e5a.com.redhat.rhevm.vdsm'

QE is moving the bug status to "ASSIGNED".
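
For reference, a minimal sketch of checking whether the guest agent channel socket from the traceback above actually exists on the host (the VM id is the one from this report):

import glob
import os

vm_id = '7b97fbc8-4d9b-455c-b14a-afad0e136e5a'
channel_dir = '/var/lib/libvirt/qemu/channels'
sock = os.path.join(channel_dir, vm_id + '.com.redhat.rhevm.vdsm')

# List whatever channel sockets libvirt/qemu actually created, then
# check for the specific one vdsm failed to prepare.
print(glob.glob(os.path.join(channel_dir, '*')))
print(sock, 'exists' if os.path.exists(sock) else 'missing')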

Comment 18 Sandro Bonazzola 2020-03-03 08:49:07 UTC
It looks like the original issue, libvirt.libvirtError: unable to connect to server: Connection refused,
is fixed. The new issue in comment #17 should be handled in a different bug; can you please open one?
Moving this back to QE for verification of the original "unable to connect to server: Connection refused"
issue.

Comment 19 Wei Wang 2020-03-03 09:08:36 UTC
With the latest 4.4 build, verification of this issue is blocked by bugs 1808253 and 1701491. QE will try to verify this bug after those two are fixed.

Comment 20 cshao 2020-03-09 08:12:19 UTC
The original issue is gone, so verifying this bug.

Comment 21 Sandro Bonazzola 2020-05-20 20:02:44 UTC
This bug is included in the oVirt 4.4.0 release, published on May 20th 2020.

Since the problem described in this bug report should be resolved in the oVirt 4.4.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

