Created attachment 1641912 [details]
log files

Description of problem:
Hosted-engine deployment fails with "Failed to connect to guest agent channel" when creating the target VM.

ovirt-hosted-engine-setup-ansible-create_target_vm.log:

2019-12-04 14:22:27,176+0800 ERROR ansible failed {'status': 'FAILED', 'ansible_type': 'task', 'ansible_playbook': '/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml', 'ansible_host': 'localhost', 'ansible_task': 'Check OVF_STORE volume status', 'ansible_result': 'type: <class \'dict\'>\nstr: {\'results\': [{\'cmd\': [\'vdsm-client\', \'Volume\', \'getInfo\', \'storagepoolID=3d184f44-165b-11ea-8453-5254003404b0\', \'storagedomainID=f3489f19-12d7-4215-8025-22c714ced863\', \'imageID=ccf5fc3b-83b8-4b11-b8ef-4bdfb25d943e\', \'volumeID=ea4c4547-2368-4f4a-aca5-8db9c002b787\'], \'stdout\': \'{\\n "apparentsize": ', 'task_duration': 261}
2019-12-04 14:22:27,176+0800 DEBUG ansible on_any args <ansible.executor.task_result.TaskResult object at 0x7f171838eb70> kwargs ignore_errors:None
2019-12-04 14:22:27,178+0800 INFO ansible stats {
    "ansible_playbook": "/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml",
    "ansible_playbook_duration": "07:40 Minutes",
    "ansible_result": "type: <class 'dict'>\nstr: {'localhost': {'ok': 76, 'failures': 1, 'unreachable': 0, 'changed': 21, 'skipped': 10, 'rescued': 0, 'ignored': 0}}",
    "ansible_type": "finish",
    "status": "FAILED"
}

vdsm.log:

2019-12-04 14:04:38,489+0800 ERROR (vm/0c11a23b) [virt.vm] (vmId='0c11a23b-bc3a-4b97-9368-296c234e6611') Failed to connect to guest agent channel (vm:2252)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 2250, in _vmDependentInit
    self.guestAgent.start()
  File "/usr/lib/python3.6/site-packages/vdsm/virt/guestagent.py", line 257, in start
    self._prepare_socket()
  File "/usr/lib/python3.6/site-packages/vdsm/virt/guestagent.py", line 299, in _prepare_socket
    supervdsm.getProxy().prepareVmChannel(self._socketName)
  File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 56, in __call__
    return callMethod()
  File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 54, in <lambda>
    **kwargs)
  File "<string>", line 2, in prepareVmChannel
  File "/usr/lib64/python3.6/multiprocessing/managers.py", line 772, in _callmethod
    raise convert_to_error(kind, result)
FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/libvirt/qemu/channels/0c11a23b-bc3a-4b97-9368-296c234e6611.com.redhat.rhevm.vdsm'

Version-Release number of selected component (if applicable):
RHVH-4.4-20191201.7-RHVH-x86_64-dvd1.iso
cockpit-packagekit-197.3-1.el8.noarch
cockpit-196.3-1.el8.x86_64
cockpit-system-196.3-1.el8.noarch
cockpit-dashboard-197.3-1.el8.noarch
cockpit-storaged-197.3-1.el8.noarch
cockpit-bridge-196.3-1.el8.x86_64
subscription-manager-cockpit-1.25.17-1.el8.noarch
cockpit-ws-196.3-1.el8.x86_64
cockpit-ovirt-dashboard-0.14.0-1.el8ev.noarch
ovirt-hosted-engine-setup-2.4.0-1.el8ev.noarch
ovirt-hosted-engine-ha-2.4.0-1.el8ev.noarch
rhvm-appliance-4.4-20191202.1.el8ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Deploy hosted engine via the cockpit UI
2.
3.

Actual results:
Hosted-engine deployment fails with "Failed to connect to guest agent channel" when creating the target VM.

Expected results:
Hosted engine deploys successfully.

Additional info:
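For reference, the failing check can be reproduced by hand on the host with a small Python sketch. The directory and channel-name format are taken from the FileNotFoundError above; the helper name below is only for illustration and is not part of vdsm:

# Hypothetical reproduction of the check that fails inside supervdsm's
# prepareVmChannel: stat() the per-VM virtio channel socket that QEMU is
# expected to create for the oVirt guest agent.
import os

CHANNELS_DIR = "/var/lib/libvirt/qemu/channels"   # directory from the traceback
CHANNEL_NAME = "com.redhat.rhevm.vdsm"            # oVirt guest-agent channel name

def check_vm_channel(vm_id):
    """Return the channel socket path, raising FileNotFoundError if it is absent."""
    socket_path = os.path.join(CHANNELS_DIR, "%s.%s" % (vm_id, CHANNEL_NAME))
    os.stat(socket_path)   # same call that raises in the vdsm/supervdsm log
    return socket_path

print(check_vm_channel("0c11a23b-bc3a-4b97-9368-296c234e6611"))

A missing file here simply means QEMU did not create that channel socket for the VM, which is what the vdsm error above reports.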
Note that the appliance has no ovirt-guest-agent, only qemu-guest-agent. The check here seems to come from vdsm; if that's the case, this needs to move to the vdsm component.
(In reply to Wei Wang from comment #0)
> Created attachment 1641912 [details]
> log files
> 
> Description of problem:
> Hosted engine deploys failed as "Failed to connect to guest agent channel"
> when creating target vm.
> 
> ovirt-hosted-engine-setup-ansible-create_target_vm.log
> 2019-12-04 14:22:27,176+0800 ERROR ansible failed {'status': 'FAILED',
> 'ansible_type': 'task', 'ansible_playbook':
> '/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml',
> 'ansible_host': 'localhost', 'ansible_task': 'Check OVF_STORE volume
> status', 'ansible_result': 'type: <class \'dict\'>\nstr: {\'results\':
> [{\'cmd\': [\'vdsm-client\', \'Volume\', \'getInfo\',
> \'storagepoolID=3d184f44-165b-11ea-8453-5254003404b0\',
> \'storagedomainID=f3489f19-12d7-4215-8025-22c714ced863\',
> \'imageID=ccf5fc3b-83b8-4b11-b8ef-4bdfb25d943e\',
> \'volumeID=ea4c4547-2368-4f4a-aca5-8db9c002b787\'], \'stdout\': \'{\\n
> "apparentsize": ', 'task_duration': 261}
> 2019-12-04 14:22:27,176+0800 DEBUG ansible on_any args
> <ansible.executor.task_result.TaskResult object at 0x7f171838eb70> kwargs
> ignore_errors:None
> 2019-12-04 14:22:27,178+0800 INFO ansible stats {
> "ansible_playbook":
> "/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml",
> "ansible_playbook_duration": "07:40 Minutes",
> "ansible_result": "type: <class 'dict'>\nstr: {'localhost': {'ok': 76,
> 'failures': 1, 'unreachable': 0, 'changed': 21, 'skipped': 10, 'rescued': 0,
> 'ignored': 0}}",
> "ansible_type": "finish",
> "status": "FAILED"
> }

I am pretty certain that this was caused by a failure to update the OVF store, bug 1779085. Please check engine.log - I have errors there like:

2019-12-18 08:50:44,855+02 ERROR [org.ovirt.engine.core.bll.storage.ovfstore.UploadStreamCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-79) [46e81c97] Command 'org.ovirt.engine.core.bll.storage.ovfstore.UploadStreamCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: response was missing the following header: Task-Id (Failed with error GeneralException and code 100)

> 
> vdsm.log
> 2019-12-04 14:04:38,489+0800 ERROR (vm/0c11a23b) [virt.vm]
> (vmId='0c11a23b-bc3a-4b97-9368-296c234e6611') Failed to connect to guest
> agent channel (vm:2252)
> Traceback (most recent call last):
> File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 2250, in
> _vmDependentInit
> self.guestAgent.start()
> File "/usr/lib/python3.6/site-packages/vdsm/virt/guestagent.py", line 257,
> in start
> self._prepare_socket()
> File "/usr/lib/python3.6/site-packages/vdsm/virt/guestagent.py", line 299,
> in _prepare_socket
> supervdsm.getProxy().prepareVmChannel(self._socketName)
> File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 56,
> in __call__
> return callMethod()
> File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 54,
> in <lambda>
> **kwargs)
> File "<string>", line 2, in prepareVmChannel
> File "/usr/lib64/python3.6/multiprocessing/managers.py", line 772, in
> _callmethod
> raise convert_to_error(kind, result)
> FileNotFoundError: [Errno 2] No such file or directory:
> '/var/lib/libvirt/qemu/channels/0c11a23b-bc3a-4b97-9368-296c234e6611.com.
> redhat.rhevm.vdsm'

I am pretty certain that this didn't cause the failure, although it would be useful to fix it to minimize confusion.

Changing the summary line accordingly.
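For anyone who wants to re-check the OVF_STORE status manually, the following is roughly the same vdsm-client probe that the failing 'Check OVF_STORE volume status' task runs. The UUIDs are the ones from the log above and will differ on every deployment; as far as I can tell the deploy flow keeps polling this output until the volume metadata marks the OVF store as updated:

# Re-run the 'Check OVF_STORE volume status' probe by hand; the command and
# its arguments are copied verbatim from the ansible error in comment 0.
import json
import subprocess

cmd = [
    "vdsm-client", "Volume", "getInfo",
    "storagepoolID=3d184f44-165b-11ea-8453-5254003404b0",
    "storagedomainID=f3489f19-12d7-4215-8025-22c714ced863",
    "imageID=ccf5fc3b-83b8-4b11-b8ef-4bdfb25d943e",
    "volumeID=ea4c4547-2368-4f4a-aca5-8db9c002b787",
]
result = subprocess.run(cmd, stdout=subprocess.PIPE, universal_newlines=True, check=True)
info = json.loads(result.stdout)
# Printing the description/status is usually enough to see where the flow is stuck.
print(info.get("description"), info.get("status"))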
Guest agent presence is unrelated; this is about a missing socket file during the recovery flow. That shouldn't happen. But the logs only contain the recovery run, not the initial one with the domain XML, so I can't tell whether the VM was created wrongly or something else happened.
Please try to reproduce with current builds. If it still fails, please attach all relevant logs. If unsure, a sosreport should be enough. Thanks.
(In reply to Yedidyah Bar David from comment #4)
> Please try to reproduce with current builds. If it still fails, please
> attach all relevant logs. If unsure, a sosreport should be enough. Thanks.

OK, I will try it after the RHVH 4.3.8 tier2 test.
Reproduced this bug with:
RHVH-4.4-20191205.t.1-RHVH-x86_64-dvd1
rhvm-appliance-4.4-20191204.3.el8ev.x86_64

Attached all the relevant logs in the attachments.
Created attachment 1647655 [details] /var/log files
Created attachment 1647656 [details] journalctl log
(In reply to Wei Wang from comment #6)
> Reproduced this bug with:
> RHVH-4.4-20191205.t.1-RHVH-x86_64-dvd1
> rhvm-appliance-4.4-20191204.3.el8ev.x86_64
> 
> Attached all the relevant logs in the attachments.

That's still too old. The error there is the same as before, failing during 'Check OVF_STORE volume status'.

I didn't try a recent RHV build. I did try a few oVirt ones, and the one that worked for me was from 2019-12-22. So please try a later version.
(In reply to Yedidyah Bar David from comment #10)
> (In reply to Wei Wang from comment #6)
> > Reproduced this bug with:
> > RHVH-4.4-20191205.t.1-RHVH-x86_64-dvd1
> > rhvm-appliance-4.4-20191204.3.el8ev.x86_64
> > 
> > Attached all the relevant logs in the attachments.
> 
> That's still too old. The error there is the same as before, failing during
> 'Check OVF_STORE volume status'.
> 
> I didn't try a recent RHV build. I did try a few oVirt ones, and the one
> that worked for me was from 2019-12-22. So please try a later version.

Let me test with the upstream version now; I will give the result later.
Test Version ovirt-node-ng-installer-4.4.0-2019122607.el8.iso ovirt-engine-appliance-4.4-20191226174442.1.el8.x86_64 Test Result: Hosted engine deploy failed since [ INFO ] TASK [ovirt.hosted_engine_setup : Wait for the host to be up] [ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_hosts": [{"address": "hp-dl388g9-04.lab.eng.pek2.redhat.com", "affinity_labels": [], "auto_numa_status": "unknown", "certificate": {"organization": "lab.eng.pek2.redhat.com", "subject": "O=lab.eng.pek2.redhat.com,CN=hp-dl388g9-04.lab.eng.pek2.redhat.com"}, "cluster": {"href": "/ovirt-engine/api/clusters/5bce182a-28c0-11ea-a794-5254003404b0", "id": "5bce182a-28c0-11ea-a794-5254003404b0"}, "comment": "", "cpu": {"speed": 0.0, "topology": {}}, "device_passthrough": {"enabled": false}, "devices": [], "external_network_provider_configurations": [], "external_status": "ok", "hardware_information": {"supported_rng_sources": []}, "hooks": [], "href": "/ovirt-engine/api/hosts/9013d38e-71ec-460e-b1af-f510d85a2592", "id": "9013d38e-71ec-460e-b1af-f510d85a2592", "katello_errata": [], "kdump_status": "unknown", "ksm": {"enabled": false}, "max_scheduling_memory": 0, "memory": 0, "name": "hp-dl388g9-04.lab.eng.pek2.redhat.com", "network_attachments": [], "nics": [], "numa_nodes": [], "numa_supported": false, "os": {"custom_kernel_cmdline": ""}, "permissions": [], "port": 54321, "power_management": {"automatic_pm_enabled": true, "enabled": false, "kdump_detection": true, "pm_proxies": []}, "protocol": "stomp", "se_linux": {}, "spm": {"priority": 5, "status": "none"}, "ssh": {"fingerprint": "SHA256:CNEZTFA8dISuv4k96apQsdOPWdOS9YvOltDeVKyEAtU", "port": 22}, "statistics": [], "status": "install_failed", "storage_connection_extensions": [], "summary": {"total": 0}, "tags": [], "transparent_huge_pages": {"enabled": false}, "type": "rhel", "unmanaged_networks": [], "update_available": false, "vgpu_placement": "consolidated"}]}, "attempts": 120, "changed": false, "deprecations": [{"msg": "The 'ovirt_host_facts' module has been renamed to 'ovirt_host_info', and the renamed one no longer returns ansible_facts", "version": "2.13"}]} didi, Please check the log in my machine environment(reserved until Jan, 2, 2020). I will send the info via email.
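Side note: the 'Wait for the host to be up' task that fails here only polls the engine for the host status. A rough equivalent with the oVirt Python SDK is sketched below (this is not the actual role code; URL, credentials and host name are placeholders), and the "install_failed" status seen in the output above is what the loop would report:

# Rough equivalent of the 'Wait for the host to be up' check, using the oVirt
# Python SDK instead of the ovirt_host_info ansible module.
import time
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url="https://engine.example.com/ovirt-engine/api",   # placeholder engine URL
    username="admin@internal",
    password="secret",
    insecure=True,
)
hosts_service = connection.system_service().hosts_service()

for _ in range(120):   # the role also retries 120 times
    hosts = hosts_service.list(search="name=hp-dl388g9-04.lab.eng.pek2.redhat.com")
    if hosts and hosts[0].status == types.HostStatus.UP:
        print("host is up")
        break
    if hosts and hosts[0].status == types.HostStatus.INSTALL_FAILED:
        print("host install failed - check host-deploy and engine.log on the engine VM")
        break
    time.sleep(10)
connection.close()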
This is the same machine and failure as bug 1770094 comment 11. I guess we do want to continue tracking the 'Failed to connect to guest agent channel' message, but it's now clear that this isn't what's failing the deploy. So changing the subject accordingly and removing TestBlocker.
Moving this to MODIFIED since bug #1785272 is in MODIFIED state. We'll move both to QE at the same time, and we'll reopen this if it still reproduces once the issue in yum-utils is fixed.
QE will verify this bug once we get the new 4.4 build.
According to https://bugzilla.redhat.com/show_bug.cgi?id=1770094#c17, the bug is still reproduced, so moving the status to "ASSIGNED".

Test Version:
RHVH-4.4-20200205.1-RHVH-x86_64-dvd1.iso
cockpit-ovirt-dashboard-0.14.1-1.el8ev.noarch
cockpit-bridge-211.1-1.el8.x86_64
cockpit-dashboard-211-1.el8.noarch
cockpit-system-211.1-1.el8.noarch
cockpit-ws-211.1-1.el8.x86_64
cockpit-211.1-1.el8.x86_64
cockpit-storaged-211-1.el8.noarch
rhvm-appliance-4.4-20200123.0.el8ev.x86_64
Why is this a test blocker? What test does it block, other than things related to the guest agent?
(In reply to Yedidyah Bar David from comment #19) > Why is this a test blocker? What test does it block, other than things > related to the guest agent? All hosted engine deployment test cases are blocked by this bug, since hosted engine deploy failed before setting storage. Refer to https://bugzilla.redhat.com/show_bug.cgi?id=1770094#c17. Test version: RHVH-4.4-20200205.1-RHVH-x86_64-dvd1.iso cockpit-ovirt-dashboard-0.14.1-1.el8ev.noarch cockpit-bridge-211.1-1.el8.x86_64 cockpit-dashboard-211-1.el8.noarch cockpit-system-211.1-1.el8.noarch cockpit-ws-211.1-1.el8.x86_64 cockpit-211.1-1.el8.x86_64 cockpit-storaged-211-1.el8.noarch rhvm-appliance-4.4-20200123.0.el8ev.x86_64 Test Result: Hosted engine deploy failed since [ INFO ] TASK [ovirt.hosted_engine_setup : Wait for the host to be up] [ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_hosts": [{"address": "hp-dl388g9-04.lab.eng.pek2.redhat.com", "affinity_labels": [], "auto_numa_status": "unknown", "certificate": {"organization": "lab.eng.pek2.redhat.com", "subject": "O=lab.eng.pek2.redhat.com,CN=hp-dl388g9-04.lab.eng.pek2.redhat.com"}, "cluster": {"href": "/ovirt-engine/api/clusters/3674fe3a-48b5-11ea-a66c-5254003404b0", "id": "3674fe3a-48b5-11ea-a66c-5254003404b0"}, "comment": "", "cpu": {"speed": 0.0, "topology": {}}, "device_passthrough": {"enabled": false}, "devices": [], "external_network_provider_configurations": [], "external_status": "ok", "hardware_information": {"supported_rng_sources": []}, "hooks": [], "href": "/ovirt-engine/api/hosts/54280d80-f3aa-4f68-b39c-1dcbbfcc8b45", "id": "54280d80-f3aa-4f68-b39c-1dcbbfcc8b45", "katello_errata": [], "kdump_status": "unknown", "ksm": {"enabled": false}, "max_scheduling_memory": 0, "memory": 0, "name": "hp-dl388g9-04.lab.eng.pek2.redhat.com", "network_attachments": [], "nics": [], "numa_nodes": [], "numa_supported": false, "os": {"custom_kernel_cmdline": ""}, "permissions": [], "port": 54321, "power_management": {"automatic_pm_enabled": true, "enabled": false, "kdump_detection": true, "pm_proxies": []}, "protocol": "stomp", "se_linux": {}, "spm": {"priority": 5, "status": "none"}, "ssh": {"fingerprint": "SHA256:piuq9fnOwos/lbsFaQKgYl7Mz+0rqlWNo/vqhZ39IPY", "port": 22}, "statistics": [], "status": "install_failed", "storage_connection_extensions": [], "summary": {"total": 0}, "tags": [], "transparent_huge_pages": {"enabled": false}, "type": "rhel", "unmanaged_networks": [], "update_available": false, "vgpu_placement": "consolidated"}]}, "attempts": 120, "changed": false, "deprecations": [{"msg": "The 'ovirt_host_facts' module has been renamed to 'ovirt_host_info', and the renamed one no longer returns ansible_facts", "version": "2.13"}]} vdsm.log 2020-02-06 16:01:34,627+0800 ERROR (vm/7b97fbc8) [virt.vm] (vmId='7b97fbc8-4d9b-455c-b14a-afad0e136e5a') Failed to connect to guest agent channel (vm:2232) Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 2230, in _vmDependentInit self.guestAgent.start() File "/usr/lib/python3.6/site-packages/vdsm/virt/guestagent.py", line 247, in start self._prepare_socket() File "/usr/lib/python3.6/site-packages/vdsm/virt/guestagent.py", line 289, in _prepare_socket supervdsm.getProxy().prepareVmChannel(self._socketName) File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 56, in __call__ return callMethod() File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 54, in <lambda> **kwargs) File "<string>", line 2, in prepareVmChannel File 
"/usr/lib64/python3.6/multiprocessing/managers.py", line 772, in _callmethod raise convert_to_error(kind, result) FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/libvirt/qemu/channels/7b97fbc8-4d9b-455c-b14a-afad0e136e5a.com.redhat.rhevm.vdsm' Actually, this bug blocks bug 1770094 which occurs after setting storage.
The bug is reproduced with RHVH-UNSIGNED-ISO-4.4-20200212.0-RHVH-x86_64-dvd1.iso and rhvm-appliance-4.4-20200123.0.el8ev.rpm.
Wei, are you sure that hosted-engine deploy fails for you _because_ of current bug? About missing guest agent? I do not think so. I think you should be able to find some other error in the logs that explains why deploy failed.

Current bug is only about its current subject: During hosted engine deploy, vdsm log has: "Failed to connect to guest agent channel".

We might eventually fix it, not even sure about that, it might be harmless.

If you think it's a TestBlocker, please explain why.

If you think it causes deploy to fail, please explain why. As I said, if deploy fails, you should find other errors.

Thanks!

I personally got this error message in vdsm.log also during a successful deploy, so I do not think it's related to a failure to deploy.

Some things we might do about _current_ bug:

1. Actually fix it - install the guest agent and make sure vdsm can contact it, thus not log this error.

2. Talk with vdsm people, and if it's not a real error, change it to a warning.

3. Do nothing and close it (and document here that it's harmless).
(In reply to Yedidyah Bar David from comment #22)
> Wei, are you sure that hosted-engine deploy fails for you _because_ of
> current bug? About missing guest agent? I do not think so. I think you
> should be able to find some other error in the logs that explains why deploy
> failed.
> 
> Current bug is only about its current subject: During hosted engine deploy,
> vdsm log has: "Failed to connect to guest agent channel".
> 
> We might eventually fix it, not even sure about that, it might be harmless.
> 
> If you think it's a TestBlocker, please explain why.
> 
> If you think it causes deploy to fail, please explain why. As I said, if
> deploy fails, you should find other errors.
> 
> Thanks!
> 
> I personally got this error message in vdsm.log also during a successful
> deploy, so I do not think it's related to a failure to deploy.
> 
> Some things we might do about _current_ bug:
> 
> 1. Actually fix it - install the guest agent and make sure vdsm can contact
> it, thus not log this error.
> 
> 2. Talk with vdsm people, and if it's not a real error, change it to a
> warning.
> 
> 3. Do nothing and close it (and document here that it's harmless).

OK, I will check the logs again soon and will update then.
Test with RHVH-UNSIGNED-ISO-4.4-20200212.0-RHVH-x86_64-dvd1.iso and rhvm-appliance-4.4-20200123.0.el8ev.rpm again: Hosted engine deployment actually failed when "Wait for the host to be up" with the latest two builds and the symptom is same with this original bug. So add the comments #18, #20, #21. Besides, all the hosted engine test cases are failed or blocked since this failure, So move it to TestBlocker. From your comments I can see "vdsm log has: "Failed to connect to guest agent channel" is not the root cause of this hosted engine failure. I've checked all ERROR in the related /var/logs files, I list here: 1) ovirt-hosted-engine-setup-ansible-bootstrap_local_vm-2020118131514-z2hsax.log 2020-02-18 13:39:17,030+0800 ERROR ansible failed { "ansible_host": "localhost", "ansible_playbook": "/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml", "ansible_result": { "_ansible_no_log": false, "ansible_facts": { "ovirt_hosts": [ { "address": "hp-dl388g9-05.lab.eng.pek2.**FILTERED**.com", "affinity_labels": [], "auto_numa_status": "unknown", "certificate": { "organization": "lab.eng.pek2.**FILTERED**.com", "subject": "O=lab.eng.pek2.**FILTERED**.com,CN=hp-dl388g9-05.lab.eng.pek2.**FILTERED**.com" }, "cluster": { "href": "/ovirt-engine/api/clusters/c3957396-520e-11ea-889f-5254005d2164", "id": "c3957396-520e-11ea-889f-5254005d2164" }, "comment": "", "cpu": { "speed": 0.0, "topology": {} }, "device_passthrough": { "enabled": false }, "devices": [], "external_network_provider_configurations": [], "external_status": "ok", "hardware_information": { "supported_rng_sources": [] }, "hooks": [], "href": "/ovirt-engine/api/hosts/71298729-124e-4537-9bc3-af5e501dd484", "id": "71298729-124e-4537-9bc3-af5e501dd484", "katello_errata": [], "kdump_status": "unknown", "ksm": { "enabled": false }, "max_scheduling_memory": 0, "memory": 0, "name": "hp-dl388g9-05.lab.eng.pek2.**FILTERED**.com", "network_attachments": [], "nics": [], "numa_nodes": [], "numa_supported": false, "os": { "custom_kernel_cmdline": "" }, "permissions": [], "port": 54321, "power_management": { "automatic_pm_enabled": true, "enabled": false, "kdump_detection": true, "pm_proxies": [] }, "protocol": "stomp", "se_linux": {}, "spm": { "priority": 5, "status": "none" }, "ssh": { "fingerprint": "SHA256:GdXFF2XDf24xo5RoahSGbjVHcJkMIGgKmamMP0C3yuc", "port": 22 }, "statistics": [], "status": "install_failed", "storage_connection_extensions": [], "summary": { "total": 0 }, "tags": [], "transparent_huge_pages": { "enabled": false }, "type": "rhel", "unmanaged_networks": [], "update_available": false, "vgpu_placement": "consolidated" } ] }, "attempts": 120, "changed": false, "deprecations": [ { "msg": "The 'ovirt_host_facts' module has been renamed to 'ovirt_host_info', and the renamed one no longer returns ansible_facts", "version": "2.13" } ], "invocation": { "module_args": { "all_content": false, "cluster_version": null, "fetch_nested": false, "nested_attributes": [], "pattern": "name=hp-dl388g9-05.lab.eng.pek2.**FILTERED**.com" } } }, "ansible_task": "Wait for the host to be up", "ansible_type": "task", "status": "FAILED", "task_duration": 664 } 2) engine.log 2020-02-18 13:26:47,869+08 ERROR [org.ovirt.engine.core.bll.CpuFlagsManagerHandler] (ServerService Thread Pool -- 40) [] Error getting info for CPU ' ', not in expected format. 2020-02-18 13:26:47,870+08 ERROR [org.ovirt.engine.core.bll.CpuFlagsManagerHandler] (ServerService Thread Pool -- 40) [] Error getting info for CPU ' ', not in expected format. 
2020-02-18 13:36:43,061+08 ERROR [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-1) [6a42b7c1] Host installation failed for host '71298729-124e-4537-9bc3-af5e501dd484', 'hp-dl388g9-05.lab.eng.pek2.redhat.com': Task Ensure Open vSwitch is started failed to execute: 2020-02-18 13:36:43,066+08 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-engine-Thread-1) [6a42b7c1] START, SetVdsStatusVDSCommand(HostName = hp-dl388g9-05.lab.eng.pek2.redhat.com, SetVdsStatusVDSCommandParameters:{hostId='71298729-124e-4537-9bc3-af5e501dd484', status='InstallFailed', nonOperationalReason='NONE', stopSpmFailureLogged='false', maintenanceReason='null'}), log id: 1e580687 2020-02-18 13:36:43,075+08 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-engine-Thread-1) [6a42b7c1] FINISH, SetVdsStatusVDSCommand, return: , log id: 1e580687 2020-02-18 13:36:43,101+08 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-1) [6a42b7c1] EVENT_ID: VDS_INSTALL_FAILED(505), Host hp-dl388g9-05.lab.eng.pek2.redhat.com installation failed. Task Ensure Open vSwitch is started failed to execute: . 2020-02-18 13:36:43,121+08 INFO [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-1) [6a42b7c1] Lock freed to object 'EngineLock:{exclusiveLocks='[71298729-124e-4537-9bc3-af5e501dd484=VDS]', sharedLocks=''}' 3) vdsm.log 2020-02-18 13:35:37,251+0800 ERROR (vm/03110ca5) [virt.vm] (vmId='03110ca5-125b-41ef-b16b-abf13ec7b6f6') Failed to connect to guest agent channel (vm:2238) Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 2236, in _vmDependentInit self.guestAgent.start() File "/usr/lib/python3.6/site-packages/vdsm/virt/guestagent.py", line 247, in start self._prepare_socket() File "/usr/lib/python3.6/site-packages/vdsm/virt/guestagent.py", line 289, in _prepare_socket supervdsm.getProxy().prepareVmChannel(self._socketName) File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 56, in __call__ return callMethod() File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 54, in <lambda> **kwargs) File "<string>", line 2, in prepareVmChannel File "/usr/lib64/python3.6/multiprocessing/managers.py", line 772, in _callmethod raise convert_to_error(kind, result) FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/libvirt/qemu/channels/03110ca5-125b-41ef-b16b-abf13ec7b6f6.com.redhat.rhevm.vdsm' 4) supervdsm.log MainProcess|vm/03110ca5::DEBUG::2020-02-18 13:35:37,250::supervdsm_server::93::SuperVdsm.ServerCallback::(wrapper) call prepareVmChannel with ('/var/lib/libvirt/qemu/channels/03110ca5-125b-41ef-b16b-abf13ec7b6f6.com.redhat.rhevm.vdsm',) {} MainProcess|vm/03110ca5::ERROR::2020-02-18 13:35:37,250::supervdsm_server::97::SuperVdsm.ServerCallback::(wrapper) Error in prepareVmChannel Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/vdsm/supervdsm_server.py", line 95, in wrapper res = func(*args, **kwargs) File "/usr/lib/python3.6/site-packages/vdsm/supervdsm_api/virt.py", line 39, in prepareVmChannel fsinfo = os.stat(socketFile) FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/libvirt/qemu/channels/03110ca5-125b-41ef-b16b-abf13ec7b6f6.com.redhat.rhevm.vdsm' Maybe 1) and 2) have the key point of this issue, we can see host installed failed in engine. 
I also wonder whether this is related to rhvm-appliance, since the RHVH build has been updated but the rhvm-appliance is still the old one, and I got the same failure. Does DEV have an updated rhvm-appliance for 4.4? I am not familiar with debugging this, though. Could devel help to find the true cause of this failure, so that I can report a new bug to track it? Thanks.
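Since the engine error above points at "Task Ensure Open vSwitch is started failed to execute", one quick thing to check on the host, independent of the guest-agent message, is whether the Open vSwitch service actually runs; a trivial probe would be something like:

# Purely diagnostic: query the state of the Open vSwitch unit that the
# failed host-deploy step tries to start. Nothing oVirt-specific here.
import subprocess

state = subprocess.run(
    ["systemctl", "is-active", "openvswitch"],
    stdout=subprocess.PIPE, universal_newlines=True,
).stdout.strip()
print("openvswitch:", state)   # expect "active" on a correctly deployed host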
Wei, thanks a lot for trying again and reporting. I see that you too now understand that the deploy failure is not due to the guest agent but to some other reason.

We (dev) are currently working on moving the engine to el8, and things are a bit unstable. Hopefully we'll finish soon and then you can resume trying to deploy hosted-engine.
(In reply to Yedidyah Bar David from comment #25)
> Wei, thanks a lot for trying again and reporting. I see that you too now
> understand that the deploy failure is not due to the guest agent but to some
> other reason.
> 
> We (dev) are currently working on moving the engine to el8, and things are a
> bit unstable. Hopefully we'll finish soon and then you can resume trying to
> deploy hosted-engine.

OK. Is this tracked by a bug? Could you provide the bug ID? If necessary, I need this ID to mark the test case. Thanks.
(In reply to Wei Wang from comment #26)
> OK. Is this tracked by a bug? Could you provide the bug ID? If necessary, I
> need this ID to mark the test case. Thanks.

This is tracked in bug 1701491.
Removing TestBlocker according to the above comments.
Can you please lower the priority on this, as it is not an urgent bug?
How about you change the priority back to urgent and fix the issue. This is causing installs of the RHV oVirt engine to fail when following your install instructions. The only solution is to go back to the previous version and get a clean install without errors. All of my servers are wanting to update and I have new servers to set up. It would be nice if I could trust the new versions that were released and not have to vet and test every patch and release that Red Hat publishes. This needs to be the highest priority.
(In reply to Kenneth Weade from comment #31)
> How about you change the priority back to urgent and fix the issue. This is
> causing installs of the RHV oVirt engine to fail when following your install
> instructions. The only solution is to go back to the previous version and
> get a clean install without errors. All of my servers are wanting to update
> and I have new servers to set up. It would be nice if I could trust the new
> versions that were released and not have to vet and test every patch and
> release that Red Hat publishes. This needs to be the highest priority.

Hi Kenneth,

Please see comment #22: this specific bug is not what is failing the installation. If your installation fails, you are more than welcome to:
1. Open a bug, attach the system logs, and we will be happy to have a look.
2. Start a thread on the ovirt-users mailing list.
Decreasing prio/sev; this is not failing anything...
From gerrit review: > well, yeah, why do we create that channel anyway? > it's a slightly bigger change, so here i would indeed just changed to INFO so it's not too verbose, and open bug on clean removal of ovirt ga socket in engine
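For illustration only - this is not the actual vdsm patch - the direction discussed in that review would look roughly like catching the missing-socket case and logging it below ERROR level:

# Hypothetical sketch of the change discussed above: treat a missing
# ovirt-guest-agent channel socket as informational instead of an error.
# Function and logger names only mirror the traceback in comment 0; this is
# not the real vdsm code.
import logging
import os

log = logging.getLogger("virt.vm")

def prepare_guest_agent_channel(socket_name):
    try:
        os.stat(socket_name)
    except FileNotFoundError:
        # The appliance ships only qemu-guest-agent, so the legacy
        # com.redhat.rhevm.vdsm socket may legitimately be absent.
        log.info("Guest agent channel %s is not present, skipping", socket_name)
        return False
    return True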
The latest build, RHVH-4.4-20200722.1-RHVH-x86_64-dvd1.iso, includes vdsm-4.40.22-1.el8ev.x86_64. QE will wait for a build that includes vdsm-4.40.24 to do verification.
Tested with RHVH-4.4-20200812.4-RHVH-x86_64-dvd1.iso (includes vdsm-4.40.25-1.el8ev.x86_64).

The bug is fixed; moving the status to "VERIFIED".
This bugzilla is included in oVirt 4.4.2 release, published on September 17th 2020. Since the problem described in this bug report should be resolved in oVirt 4.4.2 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.