Bug 1719271 - Document how to properly remove a host using ovirt_host Ansible module
Summary: Document how to properly remove a host using ovirt_host Ansible module
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ansible
Version: 4.3.4
Hardware: Unspecified
OS: Unspecified
medium
high
Target Milestone: ovirt-4.3.7
Assignee: Martin Necas
QA Contact: Petr Matyáš
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-06-11 11:50 UTC by Petr Matyáš
Modified: 2020-01-09 13:27 UTC (History)

Fixed In Version: ansible-2.9.0-2.el7
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-09 13:27:49 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
engine log (84.37 KB, text/plain)
2019-06-11 11:50 UTC, Petr Matyáš
engine log (119.24 KB, text/plain)
2019-07-30 10:42 UTC, Petr Matyáš


Links
System ID Private Priority Status Summary Last Updated
Github ansible ansible pull 58625 0 'None' closed ovirt add docs about remove retry 2020-06-05 20:41:09 UTC
Github ansible ansible pull 58718 0 'None' closed Ovirt add host retry doc backport 2020-06-05 20:41:09 UTC
Github ansible ansible pull 62491 0 'None' closed ovirt_host update force doc 2020-06-05 20:41:09 UTC
Github ansible ansible pull 62805 0 'None' closed Backport/2.9/docs2 2020-06-05 20:41:09 UTC

Description Petr Matyáš 2019-06-11 11:50:05 UTC
Created attachment 1579335 [details]
engine log

Description of problem:
When the Ansible module ovirt_host is used to remove a host, the removal fails because DisconnectStoragePoolVDSCommand finishes only after the removal has already been attempted.

Version-Release number of selected component (if applicable):
ovirt-engine-4.3.4.3-0.1.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Run the Ansible module ovirt_host with state: absent

Actual results:
host is in maintenance, but not removed

Expected results:
host is removed

Additional info:
2019-06-11 14:18:37,725+03 INFO  [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (default task-20) [a187e2fd-93ad-419e-b1f0-53ddeb5ec882] Running command: MaintenanceNumberOfVdssCommand internal: false. Entities affected :  ID: 67caae17-e5e6-42fb-b609-e1b119c0ee04 Type: VDSAction group MANIPULATE_HOST with role type ADMIN
2019-06-11 14:18:37,728+03 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (default task-20) [a187e2fd-93ad-419e-b1f0-53ddeb5ec882] START, SetVdsStatusVDSCommand(HostName = host_mixed_3, SetVdsStatusVDSCommandParameters:{hostId='67caae17-e5e6-42fb-b609-e1b119c0ee04', status='PreparingForMaintenance', nonOperationalReason='NONE', stopSpmFailureLogged='true', maintenanceReason='null'}), log id: 4065bc0a
2019-06-11 14:18:37,731+03 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (default task-20) [a187e2fd-93ad-419e-b1f0-53ddeb5ec882] FINISH, SetVdsStatusVDSCommand, return: , log id: 4065bc0a
2019-06-11 14:18:37,774+03 INFO  [org.ovirt.engine.core.bll.MaintenanceVdsCommand] (default task-20) [a187e2fd-93ad-419e-b1f0-53ddeb5ec882] Running command: MaintenanceVdsCommand internal: true. Entities affected :  ID: 67caae17-e5e6-42fb-b609-e1b119c0ee04 Type: VDS
2019-06-11 14:18:37,801+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SetHaMaintenanceModeVDSCommand] (default task-20) [a187e2fd-93ad-419e-b1f0-53ddeb5ec882] START, SetHaMaintenanceModeVDSCommand(HostName = host_mixed_3, SetHaMaintenanceModeVDSCommandParameters:{hostId='67caae17-e5e6-42fb-b609-e1b119c0ee04'}), log id: 2e7c1e5b
2019-06-11 14:18:37,804+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SetHaMaintenanceModeVDSCommand] (default task-20) [a187e2fd-93ad-419e-b1f0-53ddeb5ec882] FINISH, SetHaMaintenanceModeVDSCommand, return: , log id: 2e7c1e5b
2019-06-11 14:18:37,810+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-20) [a187e2fd-93ad-419e-b1f0-53ddeb5ec882] EVENT_ID: USER_VDS_MAINTENANCE_WITHOUT_REASON(620), Host host_mixed_3 was switched to Maintenance mode by admin@internal-authz.
2019-06-11 14:18:38,319+03 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-79) [a187e2fd-93ad-419e-b1f0-53ddeb5ec882] Command 'MaintenanceNumberOfVdss' id: '5ca57f6a-2c5d-401a-a00e-0d1bf54d6067' child commands '[]' executions were completed, status 'SUCCEEDED'
2019-06-11 14:18:39,174+03 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-65) [] Updated host status from 'Preparing for Maintenance' to 'Maintenance' in database, host 'host_mixed_3'(67caae17-e5e6-42fb-b609-e1b119c0ee04)
2019-06-11 14:18:39,185+03 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-36520) [] Clearing cache of pool: '3eee4c8c-b7a3-415e-ab32-aed29a97548f' for problematic entities of VDS: 'host_mixed_3'.
2019-06-11 14:18:39,185+03 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-36520) [] Removing vds '[67caae17-e5e6-42fb-b609-e1b119c0ee04]' from the domain in maintenance cache
2019-06-11 14:18:39,185+03 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-36520) [] Removing host(s) '[67caae17-e5e6-42fb-b609-e1b119c0ee04]' from hosts unseen domain report cache
2019-06-11 14:18:39,186+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-65) [] START, DisconnectStoragePoolVDSCommand(HostName = host_mixed_3, DisconnectStoragePoolVDSCommandParameters:{hostId='67caae17-e5e6-42fb-b609-e1b119c0ee04', storagePoolId='3eee4c8c-b7a3-415e-ab32-aed29a97548f', vds_spm_id='3'}), log id: 5ddead0e
2019-06-11 14:18:39,322+03 INFO  [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-16) [a187e2fd-93ad-419e-b1f0-53ddeb5ec882] Ending command 'org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand' successfully.
2019-06-11 14:18:41,078+03 INFO  [org.ovirt.engine.core.bll.RemoveVdsCommand] (default task-20) [f03bd365-fd94-4886-8826-ac490f90a654] Failed to Acquire Lock to object 'EngineLock:{exclusiveLocks='[67caae17-e5e6-42fb-b609-e1b119c0ee04=VDS, VDS_POOL_AND_STORAGE_CONNECTIONS67caae17-e5e6-42fb-b609-e1b119c0ee04=VDS_POOL_AND_STORAGE_CONNECTIONS]', sharedLocks=''}'
2019-06-11 14:18:41,078+03 WARN  [org.ovirt.engine.core.bll.RemoveVdsCommand] (default task-20) [f03bd365-fd94-4886-8826-ac490f90a654] Validation of action 'RemoveVds' failed for user admin@internal-authz. Reasons: VAR__ACTION__REMOVE,VAR__TYPE__HOST,ACTION_TYPE_FAILED_OBJECT_LOCKED
2019-06-11 14:18:41,079+03 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-20) [] Operation Failed: [Cannot remove Host. Related operation is currently in progress. Please try again later.]
2019-06-11 14:18:42,504+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-65) [] FINISH, DisconnectStoragePoolVDSCommand, return: , log id: 5ddead0e
2019-06-11 14:18:42,506+03 INFO  [org.ovirt.engine.core.bll.storage.pool.DisconnectHostFromStoragePoolServersCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-65) [1e54bd0d] Running command: DisconnectHostFromStoragePoolServersCommand internal: true. Entities affected :  ID: 3eee4c8c-b7a3-415e-ab32-aed29a97548f Type: StoragePool

Comment 1 Ondra Machacek 2019-06-19 08:34:52 UTC
We decided to add a retry action to the Ansible ovirt_host module. It should be configurable; by default it will retry 6 times at 20-second intervals.
Unfortunately, there is no error in the audit log that we could use as a waiting condition. We should consider adding this error to the audit log so that host removal is more reliable.

Comment 2 Martin Perina 2019-06-19 13:02:58 UTC
Moving to 4.4 for now; we can move it to 4.3.z once we know the release date of Ansible 2.9.

Comment 3 RHEL Program Management 2019-06-19 13:03:01 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 4 Martin Necas 2019-07-02 10:30:42 UTC
You can use the until keyword in Ansible (https://docs.ansible.com/ansible/latest/user_guide/playbooks_loops.html#retrying-a-task-until-a-condition-is-met).

example:
- ovirt_host:
    state: absent
    name: myhost
  register: result
  until: not result.failed
  retries: 6
  delay: 20
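
Comment 1 asks for the retry count and interval to be configurable. A minimal sketch of one way to do that at the playbook level, building on the example above: host_remove_retries and host_remove_delay are hypothetical variable names chosen for illustration, not options of the ovirt_host module, and ovirt_auth is assumed to come from an earlier ovirt_auth login task.

```yaml
# Sketch only: host_remove_retries and host_remove_delay are hypothetical
# playbook variables, not ovirt_host parameters. The defaults mirror the
# 6 retries / 20 s interval suggested in Comment 1.
- name: Remove host, retrying while the engine still holds the host lock
  ovirt_host:
    auth: "{{ ovirt_auth }}"
    state: absent
    name: myhost
  register: result
  until: not result.failed
  retries: "{{ host_remove_retries | default(6) }}"
  delay: "{{ host_remove_delay | default(20) }}"
```

The variables could then be overridden per run, for example with -e host_remove_retries=10, without editing the task.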

Comment 8 Sandro Bonazzola 2019-07-30 07:37:11 UTC
ansible-2.8.3 has been released, please check this bug is fixed there.

Comment 9 Petr Matyáš 2019-07-30 10:37:18 UTC
Using ansible-2.8.3-1.el7ae.noarch and this playbook:
---
- name: oVirt host
  hosts: localhost
  connection: local
  gather_facts: false

  vars_files:
    - engine_vars.yml
    - passwords.yml

  pre_tasks:
    - name: Login to oVirt
      ovirt_auth:
        hostname: "{{ engine_fqdn }}"
        username: "{{ engine_user }}"
        password: "{{ engine_password }}"
        ca_file: "{{ engine_cafile | default(omit) }}"
        insecure: "{{ engine_insecure | default(true) }}"
      tags:
        - always

  tasks:
    - ovirt_host:
        auth: "{{ ovirt_auth }}"
        state: absent
        name: host-01
        force: True

I'm getting this result:
TASK [ovirt_host] ********************************************************************************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: Error: Fault reason is "Operation Failed". Fault detail is "[Cannot remove Host. Related operation is currently in progress. Please try again later.]". HTTP response code is 409.
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[Cannot remove Host. Related operation is currently in progress. Please try again later.]\". HTTP response code is 409."}

Comment 10 Petr Matyáš 2019-07-30 10:42:31 UTC
Created attachment 1594578 [details]
engine log

Comment 11 Ondra Machacek 2019-07-30 10:45:16 UTC
We need to update the documentation of the 'force' parameter. The proper documentation should be:

 Indicates that the host should be removed even if it is non-responsive, or if it is part of a Gluster Storage cluster and has volume bricks on it.

It does not forcibly remove the host if it is already being removed.
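
Since force does not bypass the "Related operation is currently in progress" lock, a task that needs force will probably still want the retry loop. A minimal sketch, assuming host-01 is non-responsive and that ovirt_auth was set by an earlier ovirt_auth login task:

```yaml
# Sketch only: force covers the non-responsive / Gluster-brick cases
# described above; the until/retries/delay loop is what handles the
# temporary HTTP 409 "Related operation is currently in progress" failure.
- name: Force-remove a non-responsive host, retrying while it is locked
  ovirt_host:
    auth: "{{ ovirt_auth }}"
    state: absent
    name: host-01
    force: true
  register: result
  until: not result.failed
  retries: 6
  delay: 20
```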

Comment 12 Petr Matyáš 2019-07-30 10:48:14 UTC
This behaves the same with or without the force parameter: it fails with "Related operation is currently in progress".

Comment 13 Petr Matyáš 2019-07-30 11:40:06 UTC
Sorry, with:
  tasks:
    - ovirt_host:
        auth: "{{ ovirt_auth }}"
        state: absent
        name: host-01
      register: result
      until: not result.failed
      retries: 6
      delay: 20
this works correctly. However, as Ondra pointed out, there are still other issues with the documentation, so I am keeping the bug in ASSIGNED.

Comment 14 Martin Perina 2019-09-02 12:43:34 UTC
Sorry, missed Comment 13

Comment 17 Petr Matyáš 2019-11-19 11:43:17 UTC
Verified; https://docs.ansible.com/ansible/2.9/modules/ovirt_host_module.html looks good to me.

Comment 18 Sandro Bonazzola 2020-01-09 13:27:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3729

