Bug 1930643 - Adding HA lease for VM, next actions on VMs will fail
Summary: Adding HA lease for VM, next actions on VMs will fail
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-ansible-collection
Version: 4.4.3
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ovirt-4.5.1
Assignee: Martin Necas
QA Contact: Barbora Dolezalova
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-02-19 10:44 UTC by Steffen Froemer
Modified: 2022-08-03 20:24 UTC
CC List: 7 users

Fixed In Version: ovirt-ansible-collection
Doc Type: Bug Fix
Doc Text:
A wait_after_lease option has been added to the ovirt_vm Ansible module to provide a delay so that the VM lease creation completes before the next action starts (see the usage sketch after this field list).
Clone Of:
Environment:
Last Closed: 2022-07-14 12:55:59 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:
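
For illustration, a minimal sketch of the fixed flow, based on the Doc Text above and the linked pull request ("ovirt_vm: add wait_after_lease"). The option name comes from the PR title; the value and its unit (assumed here to be seconds) are an assumption, not taken from the module documentation:

```
    - name: configure VM high-availability
      ovirt.ovirt.ovirt_vm:
        auth: "{{ ovirt_auth }}"
        name: "{{ vm_name }}"
        high_availability: yes
        lease: "{{ ovirt_storage_domain }}"
        wait_after_lease: 10   # assumed: seconds to wait for the async lease job to finish

    - name: change memory
      ovirt.ovirt.ovirt_vm:
        auth: "{{ ovirt_auth }}"
        name: "{{ vm_name }}"
        memory: 2GiB
```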




Links
System ID                                        Status  Summary                         Last Updated
Github oVirt ovirt-ansible-collection pull 524   Merged  ovirt_vm: add wait_after_lease  2022-06-08 08:34:06 UTC
Red Hat Product Errata RHBA-2022:5584            None    None                            2022-07-14 12:56:08 UTC

Description Steffen Froemer 2021-02-19 10:44:19 UTC
Description of problem:
Setting up a lease for High Availability using Ansible causes any subsequent action performed on the VM to fail.

Version-Release number of selected component (if applicable):


How reproducible:
always

Steps to Reproduce:
1. Run the following playbook to reproduce the issue:


~~~
---
- name: Reproducer lease problem
  hosts: localhost

  vars:
    engine_url: "https://rhv-m.crazy.lab/ovirt-engine/api"
    engine_user: "admin@internal"
    engine_password: "redhat04"
    engine_cafile: "/etc/pki/ovirt-engine/ca.pem"

    ovirt_storage_domain: "TMP"
    ovirt_cluster: "Default"
    vm_name: "lease-vm-001.security.crazy.lab"

  pre_tasks:
    - name: Login to oVirt
      ovirt_auth:
        url: "{{ engine_url }}"
        username: "{{ engine_user }}"
        password: "{{ engine_password }}"
        ca_file: "{{ engine_cafile | default(omit) }}"
        insecure: "{{ engine_insecure | default(true) }}"
      tags:
        - always

  tasks:
    - name: ensure VM is absent
      ovirt_vm:
        auth: "{{ ovirt_auth }}"
        name: "{{ vm_name }}"
        cluster: "{{ ovirt_cluster }}"
        state: absent
        force: true
        wait: true

    - name: create vm
      ovirt_vm:
        auth: "{{ ovirt_auth }}"
        cluster: "{{ ovirt_cluster }}"
        name: "{{ vm_name }}"
        cpu_cores: 1
        memory: 1GiB
        memory_guaranteed: 1GiB
        operating_system: rhel_7x64
        type: server


    - name: configure VM high-availability
      ovirt_vm:
        auth: "{{ ovirt_auth }}"
        name: "{{ vm_name }}"
        high_availability: yes
        lease: "{{ ovirt_storage_domain }}"
        wait: true

    - name: change memory 
      ovirt_vm:
        auth: "{{ ovirt_auth }}"
        name: "{{ vm_name }}"
        memory: 2GiB

  post_tasks:
    - name: Logout from oVirt
      ovirt_auth:
        state: absent
        ovirt_auth: "{{ ovirt_auth }}"
      tags:
        - always
~~~



Actual results:
TASK [configure VM high-availability] **********************************************************************************************************************************************************************
changed: [localhost]

TASK [change memory] ***************************************************************************************************************************************************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[Cannot edit VM. VM is being updated.]". HTTP response code is 409.
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[Cannot edit VM. VM is being updated.]\". HTTP response code is 409."}



Expected results:
The "change memory" task should succeed once the lease has been configured.

Additional info:

It's possible to mitigate the issue with a small "sleep", but that's not a reliable solution.


[...]
    - name: configure VM high-availability
      ovirt_vm:
        auth: "{{ ovirt_auth }}"
        name: "{{ vm_name }}"
        high_availability: yes
        lease: "{{ ovirt_storage_domain }}"
        wait: true

    - name: wait 1 minute
      pause:
        minutes: 1

    - name: change memory 
      ovirt_vm:
        auth: "{{ ovirt_auth }}"
        name: "{{ vm_name }}"
        memory: 2GiB
[...]


Lease creation is an asynchronous task in oVirt; its state is not reflected in Ansible.

Comment 1 Martin Necas 2021-02-19 11:38:53 UTC
An easy workaround for this is to move the `memory: 2GiB` into the `configure VM high-availability` task:
```
    - name: configure VM high-availability
      ovirt.ovirt.ovirt_vm:
        auth: "{{ ovirt_auth }}"
        name: "{{ vm_name }}"
        high_availability: yes
        lease: "{{ ovirt_storage_domain }}"
        wait: true
        memory: 2GiB
```
It is just a workaround and I'll investigate a proper solution.

Comment 3 Steffen Froemer 2021-02-23 10:21:45 UTC
(In reply to Martin Necas from comment #1)
> An easy workaround for this is to move the `memory: 2GiB` to the `configure
...
> It is just a workaround and I'll investigate a proper solution.

Hi Martin, this 'memory' change was only used as an example to showcase the issue.
In reality there are multiple tasks which cannot be combined directly without a major re-engineering of the already existing solution.

Comment 4 Michal Skrivanek 2021-04-14 08:53:59 UTC
Steffen, you are right in comment #3; this is something that is unlikely to be changed. For asynchronous operations like setting an HA lease, the tasks have to check its status before making any further changes. It's cumbersome, but it's the way our API works. We prefer to start an async task rather than block until it is finished, as that can potentially take a very long time.
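
As an illustration of the status-check pattern described above, a generic polling loop with ovirt_vm_info might look like the sketch below. This is only the general pattern for async operations; as comments 7 and 12 explain, the lock created by adding a lease is not reflected in the VM status, so this particular loop does not detect it:

```
    - name: poll until the VM leaves a locked state (generic async pattern)
      ovirt.ovirt.ovirt_vm_info:
        auth: "{{ ovirt_auth }}"
        pattern: "name={{ vm_name }}"
      register: vm_info
      # 'image_locked' is one example of a transient locked status; the lease
      # lock discussed in this bug is not exposed here (see comments 7 and 12)
      until: vm_info.ovirt_vms[0].status != 'image_locked'
      retries: 30
      delay: 5
```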

Comment 6 Steffen Froemer 2021-04-15 06:53:02 UTC
(In reply to Michal Skrivanek from comment #4)
> Steffen, you are right in comment #3; this is something that is unlikely to
> be changed. For asynchronous operations like setting an HA lease, the tasks
> have to check its status before making any further changes. It's cumbersome,
> but it's the way our API works. We prefer to start an async task rather than
> block until it is finished, as that can potentially take a very long time.

Michal, thanks for the clarification. But how should async tasks be dealt with in Ansible?
How can I get appropriate information about running tasks using the API/Ansible? I checked the tasks, but nothing showed up.

Or should I use the ovirt_vm_info module to query the VM status until the operation is finished and the VM is available?

Comment 7 Martin Necas 2021-04-20 10:58:27 UTC
We don't know the status of the task from the API, so we can't use the ovirt_vm_info module.
Probably the only option is to add a manual sleep, even though that is not ideal.

Comment 8 Steffen Froemer 2021-04-30 14:31:41 UTC
Martin, that's weird.

Michal mentioned:
> For asynchronous operations like setting HA lease the tasks have to check for its status before making any further changes.

You said:
> We don't know the status of the task from the API


How can I reconcile these two statements? I do not have a problem with tracking the status of different tasks, but this needs to be possible via the API.
Adding a sleep is an error-prone approach by design. It isn't accepted by enterprise customers and should be avoided in production environments.

Comment 10 Michal Skrivanek 2021-06-22 08:40:43 UTC
It would be enough to get it reflected at the ovirt_vm_info level. A lock is being held; if that is not reflected in the status, then that is the thing to fix.

Comment 11 Steffen Froemer 2021-07-20 07:31:59 UTC
Martin, can you check whether it is possible to get this reflected in ovirt_vm_info?

Comment 12 Martin Necas 2021-07-20 09:02:09 UTC
It is not possible to get it in ovirt_vm_info, because adding the lease creates an async task which locks the VM, and this lock is not reflected in the API.

Comment 13 Steffen Froemer 2021-08-30 21:16:13 UTC
What needs to be done to make it possible to monitor the async processes and to see whether the VM is locked or not? That would be sufficient as a first step.
Once that is identified, what needs to be done to have this reflected in the API?

Comment 14 Michal Skrivanek 2022-04-08 16:24:25 UTC
Didn't make it in time for GA; untargeting for another review.

Comment 18 Barbora Dolezalova 2022-06-16 12:15:05 UTC
Verified in ovirt-ansible-collection-2.1.0-1.el8ev.noarch

Comment 23 errata-xmlrpc 2022-07-14 12:55:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHV Engine and Host Common Packages update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:5584

Comment 24 meital avital 2022-08-03 20:24:47 UTC
Due to QE capacity, we are not going to cover this issue in our automation

