Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1953313

Summary: Failure to hot plug a lease to the VM leading to a libvirt failure, causing the VDSM host to become 'unresponsive'
Product: [oVirt] ovirt-engine Reporter: Ilan Zuckerman <izuckerm>
Component: BLL.StorageAssignee: Eyal Shenitzky <eshenitz>
Status: CLOSED CURRENTRELEASE QA Contact: Avihai <aefrat>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.4.6.5CC: ahadas, bugs, lsvaty
Target Milestone: ovirt-4.4.6-1Keywords: Automation, AutomationBlocker, Regression, ZStream
Target Release: ---Flags: sbonazzo: ovirt-4.4+
lsvaty: blocker+
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-05-12 09:51:16 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1953577    
Bug Blocks:    

Description Ilan Zuckerman 2021-04-25 11:23:02 UTC
Description of problem:

Hot plugging an ISCSI lease (editing the VM to be highly available) causing libvirt failure which is leading to VDSM hosts 'unresponsive' state.
If the customer has only one host with the VM, and and tries this scenario, he would fail to complete the VM reboot (in order for the HA VM to take place). And he would face a failed VDSM host.

When invoking the 'edit' of the VM (to make it HA), engine fails with [1] IRSGenericException: IRSErrorException: No such lease: 'lease=95888924-202e-447a-91cc-73f08049aa63'

Then, the libvirt is failing with [2]:
libvirt.libvirtError: End of file while reading data: Input/output error

The the host becomes unresponsive:
2021-04-25 13:05:04,733+03 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-5356) [] EVENT_ID: VDS_HOST_NOT_RESPONDING(9,027), Host host_mixed_1 is not responding. Host cannot be fenced automatically because power management for the host is disabled.

Eventually (after few minutes) the VDSM host is able to recover:
2021-04-25 13:10:11,205+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorSe
rvice-engineScheduledThreadPool-Thread-77) [5a9d3fd] EVENT_ID: VDS_DETECTED(13), Status of host host_mixed_1 was set to Up.


[1]:
2021-04-25 13:04:50,722+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVmLeaseInfoVDSCommand] (default task-25) [vms_update_45f57b60-73da-481a] Command 'GetVmLeaseInfoVDSCommand( VmLeaseVDSParameters:{expectedEngineErrors='[NoSuchVmLeaseOnDomain]', storagePoolId='14625613-3e8e-439f-b488-1c65925a0465', ignoreFailoverLimit='false', leaseId='95888924-202e-447a-91cc-73f08049aa63', storageDomainId='9d3ad0b6-2eb3-436e-b4da-f1092b0dd599'})' execution failed: IRSGenericException: IRSErrorException: No such lease: 'lease=95888924-202e-447a-91cc-73f08049aa63'


[2]
2021-04-25 13:04:59,253+0300 ERROR (jsonrpc/3) [api] FINISH hotplugLease error=Failed to hotplug lease: {'lease': <lease.Device sd_id=9d3ad0b6-2eb3-436e-b4da-f1092b0dd599, lease_id=95888924-202e-447a-91cc-73f08049aa63, path=/dev/9d3ad0b6-2eb3-436e-b4da-f1092b0dd599/xleases, offset=3145728 at 0x7faa835fbb48>, 'reason': 'End of file while reading data: Input/output error'} (api:131)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 3837, in hotplugLease
    self._dom.attachDevice(leaseXml)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 101, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python3.6/site-packages/libvirt.py", line 682, in attachDevice
    raise libvirtError('virDomainAttachDevice() failed')
libvirt.libvirtError: End of file while reading data: Input/output error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 124, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/API.py", line 410, in hotplugLease
    return self.vm.hotplugLease(lease)
  File "<decorator-gen-263>", line 2, in hotplugLease
  File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 101, in method
    return func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 3842, in hotplugLease
    raise exception.HotplugLeaseFailed(reason=str(e), lease=lease)
vdsm.common.exception.HotplugLeaseFailed: Failed to hotplug lease: {'lease': <lease.Device sd_id=9d3ad0b6-2eb3-436e-b4da-f1092b0dd599, lease_id=95888924-202e-447a-91cc-73f08049aa63, path=/dev/9d3ad0b6-2eb3-436e-b4da-f1092b0dd599/xleases, offset=3145728 at 0x7faa835fbb48>, 'reason': 'End of file while reading data: Input/output error'}

.
.
.
.
2021-04-25 13:04:59,425+0300 ERROR (MainThread) [vds] Exception raised (vdsmd:167)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/vdsmd.py", line 165, in run
    serve_clients(log)
  File "/usr/lib/python3.6/site-packages/vdsm/vdsmd.py", line 132, in serve_clients
    cif.prepareForShutdown()
  File "/usr/lib/python3.6/site-packages/vdsm/clientIF.py", line 391, in prepareForShutdown
    secret.clear()
  File "/usr/lib/python3.6/site-packages/vdsm/virt/secret.py", line 96, in clear
    for virsecret in con.listAllSecrets():
  File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python3.6/site-packages/libvirt.py", line 6074, in listAllSecrets
    raise libvirtError("virConnectListAllSecrets() failed")
libvirt.libvirtError: internal error: client socket is closed




Marking this as a 'regression' bug because:
It Works on rhv-4.4.5-11 (tested manually + automatic test).
FAILS on rhv-release-4.4.6-6
According the report portal, this test was working fine on rhv-4.4.5-10 as well
In between those versions we didnt have tier3 test suit run.


Version-Release number of selected component (if applicable):
rhv-release-4.4.6-6-001.noarch


How reproducible:
Fully reproducible manually AND with the automation test

Steps to Reproduce:

1. Clone a VM out of rhel 8.3 template and run it
2. Edit the running VM, and make it HA VM with ISCSI lease (at this point i made the dump xml for this VM). In the opened dialog box, dont mark the 'apply later' option and click 'ok'.

Actual results:
Multiple errors on engine + vdsm logs
VDSM host fails and becomes 'unresponsive'.


Expected results:
Shouldnt be any errors or failures
VDSM host shouldnt fail


Additional info:
Attaching engine + vdsm logs + VM dumpxml + libvirtd log

Comment 2 Eyal Shenitzky 2021-04-25 11:34:55 UTC
Ilan, are you testing with 4.6 cluster and RHEL-8.4 host?

Comment 3 Ilan Zuckerman 2021-04-25 12:58:11 UTC
(In reply to Eyal Shenitzky from comment #2)
> Ilan, are you testing with 4.6 cluster and RHEL-8.4 host?

correct.

[root@storage-ge13-vdsm1 ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux release 8.4 (Ootpa)


<cluster href="/ovirt-engine/api/clusters/b7228b2e-5e90-48ab-af54-e45f98e81f0c" id="b7228b2e-5e90-48ab-af54-e45f98e81f0c">
<actions>
...
</actions>
<name>golden_env_mixed_1</name>
<description/>
<comment/>
...
<ballooning_enabled>true</ballooning_enabled>
<bios_type>q35_sea_bios</bios_type>
<cpu>
...
</cpu>
<custom_scheduling_policy_properties>
...
</custom_scheduling_policy_properties>
<error_handling>
...
</error_handling>
<fencing_policy>
...
</fencing_policy>
<fips_mode>disabled</fips_mode>
<firewall_type>firewalld</firewall_type>
<gluster_service>false</gluster_service>
<ha_reservation>false</ha_reservation>
<ksm>
...
</ksm>
<log_max_memory_used_threshold>95</log_max_memory_used_threshold>
<log_max_memory_used_threshold_type>percentage</log_max_memory_used_threshold_type>
<memory_policy>
...
</memory_policy>
<migration>
...
</migration>
<required_rng_sources>
...
</required_rng_sources>
<switch_type>legacy</switch_type>
<threads_as_cores>false</threads_as_cores>
<trusted_service>false</trusted_service>
<tunnel_migration>false</tunnel_migration>
<version>
<major>4</major>
<minor>6</minor>
</version>
<virt_service>true</virt_service>
<vnc_encryption>true</vnc_encryption>
<data_center href="/ovirt-engine/api/datacenters/14625613-3e8e-439f-b488-1c65925a0465" id="14625613-3e8e-439f-b488-1c65925a0465"/>
<mac_pool href="/ovirt-engine/api/macpools/58ca604b-017d-0374-0220-00000000014e" id="58ca604b-017d-0374-0220-00000000014e"/>
<scheduling_policy href="/ovirt-engine/api/schedulingpolicies/b4ed2332-a7ac-4d5f-9596-99a439cb2812" id="b4ed2332-a7ac-4d5f-9596-99a439cb2812"/>
</cluster>

Comment 4 Eyal Shenitzky 2021-04-26 06:05:07 UTC
Arik, are you aware of any changes in this flow?

It smells like a libvirt issue to me  (RHEL-8.4)

BTW, Ilan the vdsm logs doesn't contain the error above.

Comment 6 Ilan Zuckerman 2021-04-26 06:56:05 UTC
Attached the correct VDSM log. sorry about that.

Comment 7 Arik 2021-04-26 07:22:02 UTC
(In reply to Eyal Shenitzky from comment #4)
> Arik, are you aware of any changes in this flow?

nope
 
> It smells like a libvirt issue to me  (RHEL-8.4)

yeah, if it happens only when hot-plugging the lease.
Ilan, did you test if it works when starting the vm with that lease?

Comment 8 Ilan Zuckerman 2021-04-26 08:48:23 UTC
Yes, I was able to create a VM with ISCSI lease on the same SD (iscsi_1) and start it.
But there was an ERROR during its creation [1]


[1]:
2021-04-26 11:45:06,835+03 INFO  [org.ovirt.engine.core.bll.storage.lease.AddVmLeaseCommand] (default task-59) [7688815e-e757-41ba-9a7a-8d89099005fe] Verify that VM '183d8dce-7aab-4ce8-9b6c-afadc5ff9ecf' lease doesn't exists on storage domain '9d3ad0b6-2eb3-436e-b4da-f1092b0dd599'
2021-04-26 11:45:06,846+03 INFO  [org.ovirt.engine.core.bll.storage.lease.GetVmLeaseInfoCommand] (default task-59) [7688815e-e757-41ba-9a7a-8d89099005fe] Running command: GetVmLeaseInfoCommand internal: true. Entities affected :  ID: 9d3ad0b6-2eb3-436e-b4da-f1092b0dd599 Type: Storage
2021-04-26 11:45:06,846+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVmLeaseInfoVDSCommand] (default task-59) [7688815e-e757-41ba-9a7a-8d89099005fe] START, GetVmLeaseInfoVDSCommand( VmLeaseVDSParameters:{expectedEngineErrors='[NoSuchVmLeaseOnDomain]', storagePoolId='14625613-3e8e-439f-b488-1c65925a0465', ignoreFailoverLimit='false', leaseId='183d8dce-7aab-4ce8-9b6c-afadc5ff9ecf', storageDomainId='9d3ad0b6-2eb3-436e-b4da-f1092b0dd599'}), log id: 7bc8784f
2021-04-26 11:45:06,865+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVmLeaseInfoVDSCommand] (default task-59) [7688815e-e757-41ba-9a7a-8d89099005fe] Command 'GetVmLeaseInfoVDSCommand( VmLeaseVDSParameters:{expectedEngineErrors='[NoSuchVmLeaseOnDomain]', storagePoolId='14625613-3e8e-439f-b488-1c65925a0465', ignoreFailoverLimit='false', leaseId='183d8dce-7aab-4ce8-9b6c-afadc5ff9ecf', storageDomainId='9d3ad0b6-2eb3-436e-b4da-f1092b0dd599'})' execution failed: IRSGenericException: IRSErrorException: No such lease: 'lease=183d8dce-7aab-4ce8-9b6c-afadc5ff9ecf'
2021-04-26 11:45:06,865+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVmLeaseInfoVDSCommand] (default task-59) [7688815e-e757-41ba-9a7a-8d89099005fe] FINISH, GetVmLeaseInfoVDSCommand, return: , log id: 7bc8784f
2021-04-26 11:45:06,875+03 INFO  [org.ovirt.engine.core.bll.storage.lease.AddVmLeaseCommand] (default task-59) [7688815e-e757-41ba-9a7a-8d89099005fe] Creating new VM '183d8dce-7aab-4ce8-9b6c-afadc5ff9ecf' lease, because the VM lease doesn't exists on storage domain '9d3ad0b6-2eb3-436e-b4da-f1092b0dd599'

Comment 9 Eyal Shenitzky 2021-04-26 09:08:48 UTC
(In reply to Ilan Zuckerman from comment #8)
> Yes, I was able to create a VM with ISCSI lease on the same SD (iscsi_1) and
> start it.
> But there was an ERROR during its creation [1]
> 
> 
> [1]:
> 2021-04-26 11:45:06,835+03 INFO 
> [org.ovirt.engine.core.bll.storage.lease.AddVmLeaseCommand] (default
> task-59) [7688815e-e757-41ba-9a7a-8d89099005fe] Verify that VM
> '183d8dce-7aab-4ce8-9b6c-afadc5ff9ecf' lease doesn't exists on storage
> domain '9d3ad0b6-2eb3-436e-b4da-f1092b0dd599'
> 2021-04-26 11:45:06,846+03 INFO 
> [org.ovirt.engine.core.bll.storage.lease.GetVmLeaseInfoCommand] (default
> task-59) [7688815e-e757-41ba-9a7a-8d89099005fe] Running command:
> GetVmLeaseInfoCommand internal: true. Entities affected :  ID:
> 9d3ad0b6-2eb3-436e-b4da-f1092b0dd599 Type: Storage
> 2021-04-26 11:45:06,846+03 INFO 
> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVmLeaseInfoVDSCommand]
> (default task-59) [7688815e-e757-41ba-9a7a-8d89099005fe] START,
> GetVmLeaseInfoVDSCommand(
> VmLeaseVDSParameters:{expectedEngineErrors='[NoSuchVmLeaseOnDomain]',
> storagePoolId='14625613-3e8e-439f-b488-1c65925a0465',
> ignoreFailoverLimit='false', leaseId='183d8dce-7aab-4ce8-9b6c-afadc5ff9ecf',
> storageDomainId='9d3ad0b6-2eb3-436e-b4da-f1092b0dd599'}), log id: 7bc8784f
> 2021-04-26 11:45:06,865+03 ERROR
> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVmLeaseInfoVDSCommand]
> (default task-59) [7688815e-e757-41ba-9a7a-8d89099005fe] Command
> 'GetVmLeaseInfoVDSCommand(
> VmLeaseVDSParameters:{expectedEngineErrors='[NoSuchVmLeaseOnDomain]',
> storagePoolId='14625613-3e8e-439f-b488-1c65925a0465',
> ignoreFailoverLimit='false', leaseId='183d8dce-7aab-4ce8-9b6c-afadc5ff9ecf',
> storageDomainId='9d3ad0b6-2eb3-436e-b4da-f1092b0dd599'})' execution failed:
> IRSGenericException: IRSErrorException: No such lease:
> 'lease=183d8dce-7aab-4ce8-9b6c-afadc5ff9ecf'
> 2021-04-26 11:45:06,865+03 INFO 
> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVmLeaseInfoVDSCommand]
> (default task-59) [7688815e-e757-41ba-9a7a-8d89099005fe] FINISH,
> GetVmLeaseInfoVDSCommand, return: , log id: 7bc8784f
> 2021-04-26 11:45:06,875+03 INFO 
> [org.ovirt.engine.core.bll.storage.lease.AddVmLeaseCommand] (default
> task-59) [7688815e-e757-41ba-9a7a-8d89099005fe] Creating new VM
> '183d8dce-7aab-4ce8-9b6c-afadc5ff9ecf' lease, because the VM lease doesn't
> exists on storage domain '9d3ad0b6-2eb3-436e-b4da-f1092b0dd599'

This error is expected and it is part of the validation we are doing that the lease doesn't exist.

Comment 10 Eyal Shenitzky 2021-04-26 09:09:54 UTC
Ilan, please create a libvirt bug on this and attach all the needed info.
Keep this bug to track it.

Comment 11 Arik 2021-04-26 09:10:48 UTC
ok, so if this error is expected and the vm starts well with a lease we create, I'd ask Libvirt to check why it fails on hot-plug then

Comment 12 Ilan Zuckerman 2021-04-26 12:23:13 UTC
(In reply to Eyal Shenitzky from comment #10)
> Ilan, please create a libvirt bug on this and attach all the needed info.
> Keep this bug to track it.

Created libvirt bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1953577

Comment 13 Lukas Svaty 2021-05-11 07:23:40 UTC
The dependant bug is VERIFIED, can you check if this can be moved to VERIFIED as well?

Comment 14 Ilan Zuckerman 2021-05-11 09:16:11 UTC
Verified on:
libvirt-libs-7.0.0-14.module+el8.4.0+10886+79296686.x86_64
rhv-release-4.4.6-8-001.noarch