Bug 1953313
| Summary: | Failure to hot plug a lease to the VM leading to a libvirt failure, causing the VDSM host to become 'unresponsive' | ||
|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | Ilan Zuckerman <izuckerm> |
| Component: | BLL.Storage | Assignee: | Eyal Shenitzky <eshenitz> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Avihai <aefrat> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.4.6.5 | CC: | ahadas, bugs, lsvaty |
| Target Milestone: | ovirt-4.4.6-1 | Keywords: | Automation, AutomationBlocker, Regression, ZStream |
| Target Release: | --- | Flags: | sbonazzo:
ovirt-4.4+
lsvaty: blocker+ |
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-05-12 09:51:16 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1953577 | ||
| Bug Blocks: | |||
Ilan, are you testing with 4.6 cluster and RHEL-8.4 host? (In reply to Eyal Shenitzky from comment #2) > Ilan, are you testing with 4.6 cluster and RHEL-8.4 host? correct. [root@storage-ge13-vdsm1 ~]# cat /etc/redhat-release Red Hat Enterprise Linux release 8.4 (Ootpa) <cluster href="/ovirt-engine/api/clusters/b7228b2e-5e90-48ab-af54-e45f98e81f0c" id="b7228b2e-5e90-48ab-af54-e45f98e81f0c"> <actions> ... </actions> <name>golden_env_mixed_1</name> <description/> <comment/> ... <ballooning_enabled>true</ballooning_enabled> <bios_type>q35_sea_bios</bios_type> <cpu> ... </cpu> <custom_scheduling_policy_properties> ... </custom_scheduling_policy_properties> <error_handling> ... </error_handling> <fencing_policy> ... </fencing_policy> <fips_mode>disabled</fips_mode> <firewall_type>firewalld</firewall_type> <gluster_service>false</gluster_service> <ha_reservation>false</ha_reservation> <ksm> ... </ksm> <log_max_memory_used_threshold>95</log_max_memory_used_threshold> <log_max_memory_used_threshold_type>percentage</log_max_memory_used_threshold_type> <memory_policy> ... </memory_policy> <migration> ... </migration> <required_rng_sources> ... </required_rng_sources> <switch_type>legacy</switch_type> <threads_as_cores>false</threads_as_cores> <trusted_service>false</trusted_service> <tunnel_migration>false</tunnel_migration> <version> <major>4</major> <minor>6</minor> </version> <virt_service>true</virt_service> <vnc_encryption>true</vnc_encryption> <data_center href="/ovirt-engine/api/datacenters/14625613-3e8e-439f-b488-1c65925a0465" id="14625613-3e8e-439f-b488-1c65925a0465"/> <mac_pool href="/ovirt-engine/api/macpools/58ca604b-017d-0374-0220-00000000014e" id="58ca604b-017d-0374-0220-00000000014e"/> <scheduling_policy href="/ovirt-engine/api/schedulingpolicies/b4ed2332-a7ac-4d5f-9596-99a439cb2812" id="b4ed2332-a7ac-4d5f-9596-99a439cb2812"/> </cluster> Arik, are you aware of any changes in this flow? It smells like a libvirt issue to me (RHEL-8.4) BTW, Ilan the vdsm logs doesn't contain the error above. Attached the correct VDSM log. sorry about that. (In reply to Eyal Shenitzky from comment #4) > Arik, are you aware of any changes in this flow? nope > It smells like a libvirt issue to me (RHEL-8.4) yeah, if it happens only when hot-plugging the lease. Ilan, did you test if it works when starting the vm with that lease? Yes, I was able to create a VM with ISCSI lease on the same SD (iscsi_1) and start it.
But there was an ERROR during its creation [1]
[1]:
2021-04-26 11:45:06,835+03 INFO [org.ovirt.engine.core.bll.storage.lease.AddVmLeaseCommand] (default task-59) [7688815e-e757-41ba-9a7a-8d89099005fe] Verify that VM '183d8dce-7aab-4ce8-9b6c-afadc5ff9ecf' lease doesn't exists on storage domain '9d3ad0b6-2eb3-436e-b4da-f1092b0dd599'
2021-04-26 11:45:06,846+03 INFO [org.ovirt.engine.core.bll.storage.lease.GetVmLeaseInfoCommand] (default task-59) [7688815e-e757-41ba-9a7a-8d89099005fe] Running command: GetVmLeaseInfoCommand internal: true. Entities affected : ID: 9d3ad0b6-2eb3-436e-b4da-f1092b0dd599 Type: Storage
2021-04-26 11:45:06,846+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVmLeaseInfoVDSCommand] (default task-59) [7688815e-e757-41ba-9a7a-8d89099005fe] START, GetVmLeaseInfoVDSCommand( VmLeaseVDSParameters:{expectedEngineErrors='[NoSuchVmLeaseOnDomain]', storagePoolId='14625613-3e8e-439f-b488-1c65925a0465', ignoreFailoverLimit='false', leaseId='183d8dce-7aab-4ce8-9b6c-afadc5ff9ecf', storageDomainId='9d3ad0b6-2eb3-436e-b4da-f1092b0dd599'}), log id: 7bc8784f
2021-04-26 11:45:06,865+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVmLeaseInfoVDSCommand] (default task-59) [7688815e-e757-41ba-9a7a-8d89099005fe] Command 'GetVmLeaseInfoVDSCommand( VmLeaseVDSParameters:{expectedEngineErrors='[NoSuchVmLeaseOnDomain]', storagePoolId='14625613-3e8e-439f-b488-1c65925a0465', ignoreFailoverLimit='false', leaseId='183d8dce-7aab-4ce8-9b6c-afadc5ff9ecf', storageDomainId='9d3ad0b6-2eb3-436e-b4da-f1092b0dd599'})' execution failed: IRSGenericException: IRSErrorException: No such lease: 'lease=183d8dce-7aab-4ce8-9b6c-afadc5ff9ecf'
2021-04-26 11:45:06,865+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVmLeaseInfoVDSCommand] (default task-59) [7688815e-e757-41ba-9a7a-8d89099005fe] FINISH, GetVmLeaseInfoVDSCommand, return: , log id: 7bc8784f
2021-04-26 11:45:06,875+03 INFO [org.ovirt.engine.core.bll.storage.lease.AddVmLeaseCommand] (default task-59) [7688815e-e757-41ba-9a7a-8d89099005fe] Creating new VM '183d8dce-7aab-4ce8-9b6c-afadc5ff9ecf' lease, because the VM lease doesn't exists on storage domain '9d3ad0b6-2eb3-436e-b4da-f1092b0dd599'
(In reply to Ilan Zuckerman from comment #8) > Yes, I was able to create a VM with ISCSI lease on the same SD (iscsi_1) and > start it. > But there was an ERROR during its creation [1] > > > [1]: > 2021-04-26 11:45:06,835+03 INFO > [org.ovirt.engine.core.bll.storage.lease.AddVmLeaseCommand] (default > task-59) [7688815e-e757-41ba-9a7a-8d89099005fe] Verify that VM > '183d8dce-7aab-4ce8-9b6c-afadc5ff9ecf' lease doesn't exists on storage > domain '9d3ad0b6-2eb3-436e-b4da-f1092b0dd599' > 2021-04-26 11:45:06,846+03 INFO > [org.ovirt.engine.core.bll.storage.lease.GetVmLeaseInfoCommand] (default > task-59) [7688815e-e757-41ba-9a7a-8d89099005fe] Running command: > GetVmLeaseInfoCommand internal: true. Entities affected : ID: > 9d3ad0b6-2eb3-436e-b4da-f1092b0dd599 Type: Storage > 2021-04-26 11:45:06,846+03 INFO > [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVmLeaseInfoVDSCommand] > (default task-59) [7688815e-e757-41ba-9a7a-8d89099005fe] START, > GetVmLeaseInfoVDSCommand( > VmLeaseVDSParameters:{expectedEngineErrors='[NoSuchVmLeaseOnDomain]', > storagePoolId='14625613-3e8e-439f-b488-1c65925a0465', > ignoreFailoverLimit='false', leaseId='183d8dce-7aab-4ce8-9b6c-afadc5ff9ecf', > storageDomainId='9d3ad0b6-2eb3-436e-b4da-f1092b0dd599'}), log id: 7bc8784f > 2021-04-26 11:45:06,865+03 ERROR > [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVmLeaseInfoVDSCommand] > (default task-59) [7688815e-e757-41ba-9a7a-8d89099005fe] Command > 'GetVmLeaseInfoVDSCommand( > VmLeaseVDSParameters:{expectedEngineErrors='[NoSuchVmLeaseOnDomain]', > storagePoolId='14625613-3e8e-439f-b488-1c65925a0465', > ignoreFailoverLimit='false', leaseId='183d8dce-7aab-4ce8-9b6c-afadc5ff9ecf', > storageDomainId='9d3ad0b6-2eb3-436e-b4da-f1092b0dd599'})' execution failed: > IRSGenericException: IRSErrorException: No such lease: > 'lease=183d8dce-7aab-4ce8-9b6c-afadc5ff9ecf' > 2021-04-26 11:45:06,865+03 INFO > [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVmLeaseInfoVDSCommand] > (default task-59) [7688815e-e757-41ba-9a7a-8d89099005fe] FINISH, > GetVmLeaseInfoVDSCommand, return: , log id: 7bc8784f > 2021-04-26 11:45:06,875+03 INFO > [org.ovirt.engine.core.bll.storage.lease.AddVmLeaseCommand] (default > task-59) [7688815e-e757-41ba-9a7a-8d89099005fe] Creating new VM > '183d8dce-7aab-4ce8-9b6c-afadc5ff9ecf' lease, because the VM lease doesn't > exists on storage domain '9d3ad0b6-2eb3-436e-b4da-f1092b0dd599' This error is expected and it is part of the validation we are doing that the lease doesn't exist. Ilan, please create a libvirt bug on this and attach all the needed info. Keep this bug to track it. ok, so if this error is expected and the vm starts well with a lease we create, I'd ask Libvirt to check why it fails on hot-plug then (In reply to Eyal Shenitzky from comment #10) > Ilan, please create a libvirt bug on this and attach all the needed info. > Keep this bug to track it. Created libvirt bug: https://bugzilla.redhat.com/show_bug.cgi?id=1953577 The dependant bug is VERIFIED, can you check if this can be moved to VERIFIED as well? Verified on: libvirt-libs-7.0.0-14.module+el8.4.0+10886+79296686.x86_64 rhv-release-4.4.6-8-001.noarch |
Description of problem: Hot plugging an ISCSI lease (editing the VM to be highly available) causing libvirt failure which is leading to VDSM hosts 'unresponsive' state. If the customer has only one host with the VM, and and tries this scenario, he would fail to complete the VM reboot (in order for the HA VM to take place). And he would face a failed VDSM host. When invoking the 'edit' of the VM (to make it HA), engine fails with [1] IRSGenericException: IRSErrorException: No such lease: 'lease=95888924-202e-447a-91cc-73f08049aa63' Then, the libvirt is failing with [2]: libvirt.libvirtError: End of file while reading data: Input/output error The the host becomes unresponsive: 2021-04-25 13:05:04,733+03 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-5356) [] EVENT_ID: VDS_HOST_NOT_RESPONDING(9,027), Host host_mixed_1 is not responding. Host cannot be fenced automatically because power management for the host is disabled. Eventually (after few minutes) the VDSM host is able to recover: 2021-04-25 13:10:11,205+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorSe rvice-engineScheduledThreadPool-Thread-77) [5a9d3fd] EVENT_ID: VDS_DETECTED(13), Status of host host_mixed_1 was set to Up. [1]: 2021-04-25 13:04:50,722+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVmLeaseInfoVDSCommand] (default task-25) [vms_update_45f57b60-73da-481a] Command 'GetVmLeaseInfoVDSCommand( VmLeaseVDSParameters:{expectedEngineErrors='[NoSuchVmLeaseOnDomain]', storagePoolId='14625613-3e8e-439f-b488-1c65925a0465', ignoreFailoverLimit='false', leaseId='95888924-202e-447a-91cc-73f08049aa63', storageDomainId='9d3ad0b6-2eb3-436e-b4da-f1092b0dd599'})' execution failed: IRSGenericException: IRSErrorException: No such lease: 'lease=95888924-202e-447a-91cc-73f08049aa63' [2] 2021-04-25 13:04:59,253+0300 ERROR (jsonrpc/3) [api] FINISH hotplugLease error=Failed to hotplug lease: {'lease': <lease.Device sd_id=9d3ad0b6-2eb3-436e-b4da-f1092b0dd599, lease_id=95888924-202e-447a-91cc-73f08049aa63, path=/dev/9d3ad0b6-2eb3-436e-b4da-f1092b0dd599/xleases, offset=3145728 at 0x7faa835fbb48>, 'reason': 'End of file while reading data: Input/output error'} (api:131) Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 3837, in hotplugLease self._dom.attachDevice(leaseXml) File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 101, in f ret = attr(*args, **kwargs) File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper ret = f(*args, **kwargs) File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper return func(inst, *args, **kwargs) File "/usr/lib64/python3.6/site-packages/libvirt.py", line 682, in attachDevice raise libvirtError('virDomainAttachDevice() failed') libvirt.libvirtError: End of file while reading data: Input/output error During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 124, in method ret = func(*args, **kwargs) File "/usr/lib/python3.6/site-packages/vdsm/API.py", line 410, in hotplugLease return self.vm.hotplugLease(lease) File "<decorator-gen-263>", line 2, in hotplugLease File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 101, in method return func(*args, **kwargs) File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 3842, in hotplugLease raise exception.HotplugLeaseFailed(reason=str(e), lease=lease) vdsm.common.exception.HotplugLeaseFailed: Failed to hotplug lease: {'lease': <lease.Device sd_id=9d3ad0b6-2eb3-436e-b4da-f1092b0dd599, lease_id=95888924-202e-447a-91cc-73f08049aa63, path=/dev/9d3ad0b6-2eb3-436e-b4da-f1092b0dd599/xleases, offset=3145728 at 0x7faa835fbb48>, 'reason': 'End of file while reading data: Input/output error'} . . . . 2021-04-25 13:04:59,425+0300 ERROR (MainThread) [vds] Exception raised (vdsmd:167) Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/vdsm/vdsmd.py", line 165, in run serve_clients(log) File "/usr/lib/python3.6/site-packages/vdsm/vdsmd.py", line 132, in serve_clients cif.prepareForShutdown() File "/usr/lib/python3.6/site-packages/vdsm/clientIF.py", line 391, in prepareForShutdown secret.clear() File "/usr/lib/python3.6/site-packages/vdsm/virt/secret.py", line 96, in clear for virsecret in con.listAllSecrets(): File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper ret = f(*args, **kwargs) File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper return func(inst, *args, **kwargs) File "/usr/lib64/python3.6/site-packages/libvirt.py", line 6074, in listAllSecrets raise libvirtError("virConnectListAllSecrets() failed") libvirt.libvirtError: internal error: client socket is closed Marking this as a 'regression' bug because: It Works on rhv-4.4.5-11 (tested manually + automatic test). FAILS on rhv-release-4.4.6-6 According the report portal, this test was working fine on rhv-4.4.5-10 as well In between those versions we didnt have tier3 test suit run. Version-Release number of selected component (if applicable): rhv-release-4.4.6-6-001.noarch How reproducible: Fully reproducible manually AND with the automation test Steps to Reproduce: 1. Clone a VM out of rhel 8.3 template and run it 2. Edit the running VM, and make it HA VM with ISCSI lease (at this point i made the dump xml for this VM). In the opened dialog box, dont mark the 'apply later' option and click 'ok'. Actual results: Multiple errors on engine + vdsm logs VDSM host fails and becomes 'unresponsive'. Expected results: Shouldnt be any errors or failures VDSM host shouldnt fail Additional info: Attaching engine + vdsm logs + VM dumpxml + libvirtd log