Description of problem:
The addition and removal of VM leases create SPM tasks. This is unexpected, since we agreed that for 4.1 we would treat these operations as if they were synchronous. Of course, treating async operations as sync means that possible failures won't be caught, but these are quick, relatively simple operations, and since the creation of a lease should probably move to run VM in the future, there's no point in complicating the import/add/edit VM flows with polling of these tasks.

However, there is a major problem with not polling these tasks: they remain on the SPM, and therefore a host that served as the SPM while a VM lease was created or removed cannot be switched to maintenance without restarting its VDSM process.

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with a lease
2. Try to switch the SPM to maintenance

Actual results:
Failure to move the SPM to maintenance, as there are uncleared SPM tasks.

Expected results:
The SPM should move to maintenance.

Additional info:
One option is that the tasks should not be created on the VDSM side - this is dangerous though, since no one can predict when, or if, lease creation will ever move to run VM, as I believe it should. Another option is for the engine to automatically clear lease-related tasks that are finished.
(In reply to Arik from comment #0)
> Description of problem:
> The addition and removal of VM leases create SPM tasks.
> This is unexpected since we agreed that for 4.1 we would treat these
> operations as if they are synchronous.

I don't remember such an agreement. The feature page is very clear about the behavior of these verbs:

Lease.create(lease)

    Starts an SPM task creating a lease on the xleases volume in the lease
    storage domain. Can be used only on the SPM. Creates a sanlock resource
    on the domain's xleases volume, and a mapping from lease_id to the
    resource offset in the volume.

    Arguments:
    - lease (Lease): the lease to create

    This is an asynchronous operation; the caller must check the task status
    using the vdsm tasks APIs and the usual SPM error handling policies.

Lease.delete(lease)

    Starts an SPM task removing a lease on the xleases volume of the lease
    storage domain. Can be used only on the SPM. Clears the sanlock resource
    allocated for lease_id, and removes the mapping from lease_id to the
    resource offset in the volume.

    Arguments:
    - lease (Lease): the lease to delete

    This is an asynchronous operation; the caller must check the task status
    using the vdsm tasks APIs and the usual SPM error handling policies.

The first version of the feature page talked about new storage jobs; these are asynchronous as well.

> Of course, treating async operations as sync mean that possible failures
> won't be caught, but these are quick operations, relatively simple and since
> the creation of a lease should probably move to run VM in the future,
> there's no point in complicating import/add/edit VM flows with polling of
> these tasks.

Creating and removing leases are metadata operations, and the only way we can do them now is on the SPM. We rely on the SPM lease to modify the xleases volume. The only safe way to do these operations is in an SPM task. This also prevents the engine from stopping the SPM in the middle of a lease operation.
> However, there is a major problem with not polling these tasks - they remain
> on the SPM and therefore host that served as SPM while VM lease was created
> or removed cannot be switched to maintenance without restarting its VDSM
> process.

Sure, SPM tasks must be monitored by the engine and cleared when the task completes.

> The tasks should either not be created on the VDSM side - this is dangerous
> though since no one can predict when and if the lease creation will ever be
> on run VM, as I believe it should. So another option is that the engine will
> automatically clear lease related tasks that are finished.

Of course, the engine must clear these tasks.

For 4.1 we must handle the task properly on the engine side. This should be treated like creating a disk. Adding or editing a VM can start the creation of some disks or leases; until all the operations are finished, the VM cannot be modified. If lease creation fails, the engine should show a clear error, the same way a disk creation error does, and when editing the VM, the lease should appear as "No lease".

For a future version, we would like to move the creation and deletion of leases out of the SPM, and run them on any host using storage jobs and a special lease in the xleases volume. This will make it easier to consume this API and to recover from errors much more quickly, but the APIs will be async as well.

Tal, do you know why this is a storage bug, when the root cause is misusing storage APIs in a virt flow?
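Until the engine clears these tasks automatically, the lease-related leftovers can be identified and removed by hand. A minimal sketch, not part of vdsm or the engine, assuming the getAllTasks output is available as a JSON object keyed by task UUID; the field names "verb" and "state" mirror the vdsClient task output shown later in this thread:

```shell
# Hypothetical helper: print the IDs of finished lease-related SPM tasks
# from getAllTasks JSON on stdin. The JSON shape is an assumption.
finished_lease_tasks() {
    python3 -c '
import json, sys
tasks = json.load(sys.stdin)
for tid, info in tasks.items():
    if info.get("state") == "finished" and \
       info.get("verb") in ("create_lease", "delete_lease"):
        print(tid)
'
}

# On the SPM host (requires vdsm):
#   vdsm-client Host getAllTasks | finished_lease_tasks |
#       while read -r tid; do vdsm-client Task clear taskID="$tid"; done
```

Filtering on the verb keeps the cleanup scoped to lease tasks, so unrelated tasks the engine is still tracking are left alone.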
*** Bug 1420023 has been marked as a duplicate of this bug. ***
What is the workaround for this bug? The affected code made it into the oVirt 4.1.0 release, so anyone trying to use leases will end up with the SPM role stuck on a node that fails to go into maintenance. Is it enough to manually clean tasks on the SPM using vdsClient?
Update: just cleared the finished lease-related tasks on the SPM, which made it possible to migrate the role:

# vdsClient -s 0 getAllTasks
62497948-ad87-4907-856c-56d26ccdb8bd :
         verb = create_lease
         code = 0
         state = finished
         tag = spm
         result =
         message = 1 jobs completed successfully
         id = 62497948-ad87-4907-856c-56d26ccdb8bd
e6dd3ea5-c280-4e47-bd68-e0b14c962579 :
         verb = delete_lease
         code = 0
         state = finished
         tag = spm
         result =
         message = 1 jobs completed successfully
         id = e6dd3ea5-c280-4e47-bd68-e0b14c962579

# vdsClient -s 0 clearTask 62497948-ad87-4907-856c-56d26ccdb8bd
{'status': {'message': 'OK', 'code': 0}}

# vdsClient -s 0 clearTask e6dd3ea5-c280-4e47-bd68-e0b14c962579
{'status': {'message': 'OK', 'code': 0}}

# vdsClient -s 0 getAllTasks
<empty output>
(In reply to Evgheni Dereveanchin from comment #3)
> Is it enough to manually clean tasks on the SPM using vdsClient?

Yes, but you should use vdsm-client, not vdsClient (deprecated).

The best way is to do:

    vdsm-client Host getAllTasks

And then for each task you want to clear:

    vdsm-client Task clear taskID=xxxyyy
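When several leftover tasks pile up, the two commands can be chained. A minimal sketch, assuming getAllTasks prints a JSON object keyed by task UUID with a "state" field (the exact output shape is an assumption):

```shell
# Print the IDs of all finished tasks from getAllTasks JSON on stdin.
finished_task_ids() {
    python3 -c '
import json, sys
for tid, info in json.load(sys.stdin).items():
    if info.get("state") == "finished":
        print(tid)
'
}

# On the SPM host (requires a running vdsm):
#   vdsm-client Host getAllTasks | finished_task_ids |
#       while read -r tid; do vdsm-client Task clear taskID="$tid"; done
```

Only tasks in the "finished" state are cleared; running tasks are left untouched.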
(In reply to Nir Soffer from comment #5)
> (In reply to Evgheni Dereveanchin from comment #3)
> > Is it enough to manually clean tasks on the SPM using vdsClient?
>
> Yes, but you should use vdsm-client, not vdsClient (deprecated).

Is there a note when running vdsClient that it'll be deprecated?

> The best way is to do:
>
>     vdsm-client Host getAllTasks
>
> And then for each task you want to clear:
>
>     vdsm-client Task clear taskID=xxxyyy
There will be a deprecation note for all modules using XML-RPC, and for vdsClient in particular. Please see https://gerrit.ovirt.org/#/c/71811/
--------------------------------------
Tested with the following code:
--------------------------------------
rhevm-4.1.1.2-0.1.el7.noarch
vdsm-4.19.6-1.el7ev.x86_64

Tested with the following scenario:

Steps to Reproduce:
1. Create a VM with a lease
2. Try to switch the SPM to maintenance

Actual results:
After the SPM task is cleared, the host can be switched to maintenance. The task is cleared automatically.

Expected results match.

Moving to VERIFIED!
I hit the same problem today. The host would not switch to maintenance mode due to uncleared tasks. After manually clearing them, the host entered maintenance mode.

OS Version: RHEL - 7 - 3.1611.el7.centos
VDSM Version: vdsm-4.19.4-1.el7.centos
(In reply to Simon from comment #9)
> I hit same problem today ... vdsm-4.19.4-1.el7.centos

This was fixed in vdsm-4.19.6-1.el7ev.x86_64. Until this version is available, you should clear the tasks manually.