Created attachment 1579335 [details]
engine log

Description of problem:
When using ansible module ovirt_host for removing a host the action to remove it fails as DisconnectStoragePoolVDSCommand is finished only after attempting to remove the host.

Version-Release number of selected component (if applicable):
ovirt-engine-4.3.4.3-0.1.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. run ansible module ovirt_host with state: absent

Actual results:
host is in maintenance, but not removed

Expected results:
host is removed

Additional info:
2019-06-11 14:18:37,725+03 INFO [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (default task-20) [a187e2fd-93ad-419e-b1f0-53ddeb5ec882] Running command: MaintenanceNumberOfVdssCommand internal: false. Entities affected : ID: 67caae17-e5e6-42fb-b609-e1b119c0ee04 Type: VDSAction group MANIPULATE_HOST with role type ADMIN
2019-06-11 14:18:37,728+03 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (default task-20) [a187e2fd-93ad-419e-b1f0-53ddeb5ec882] START, SetVdsStatusVDSCommand(HostName = host_mixed_3, SetVdsStatusVDSCommandParameters:{hostId='67caae17-e5e6-42fb-b609-e1b119c0ee04', status='PreparingForMaintenance', nonOperationalReason='NONE', stopSpmFailureLogged='true', maintenanceReason='null'}), log id: 4065bc0a
2019-06-11 14:18:37,731+03 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (default task-20) [a187e2fd-93ad-419e-b1f0-53ddeb5ec882] FINISH, SetVdsStatusVDSCommand, return: , log id: 4065bc0a
2019-06-11 14:18:37,774+03 INFO [org.ovirt.engine.core.bll.MaintenanceVdsCommand] (default task-20) [a187e2fd-93ad-419e-b1f0-53ddeb5ec882] Running command: MaintenanceVdsCommand internal: true. Entities affected : ID: 67caae17-e5e6-42fb-b609-e1b119c0ee04 Type: VDS
2019-06-11 14:18:37,801+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SetHaMaintenanceModeVDSCommand] (default task-20) [a187e2fd-93ad-419e-b1f0-53ddeb5ec882] START, SetHaMaintenanceModeVDSCommand(HostName = host_mixed_3, SetHaMaintenanceModeVDSCommandParameters:{hostId='67caae17-e5e6-42fb-b609-e1b119c0ee04'}), log id: 2e7c1e5b
2019-06-11 14:18:37,804+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SetHaMaintenanceModeVDSCommand] (default task-20) [a187e2fd-93ad-419e-b1f0-53ddeb5ec882] FINISH, SetHaMaintenanceModeVDSCommand, return: , log id: 2e7c1e5b
2019-06-11 14:18:37,810+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-20) [a187e2fd-93ad-419e-b1f0-53ddeb5ec882] EVENT_ID: USER_VDS_MAINTENANCE_WITHOUT_REASON(620), Host host_mixed_3 was switched to Maintenance mode by admin@internal-authz.
2019-06-11 14:18:38,319+03 INFO [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-79) [a187e2fd-93ad-419e-b1f0-53ddeb5ec882] Command 'MaintenanceNumberOfVdss' id: '5ca57f6a-2c5d-401a-a00e-0d1bf54d6067' child commands '[]' executions were completed, status 'SUCCEEDED'
2019-06-11 14:18:39,174+03 INFO [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-65) [] Updated host status from 'Preparing for Maintenance' to 'Maintenance' in database, host 'host_mixed_3'(67caae17-e5e6-42fb-b609-e1b119c0ee04)
2019-06-11 14:18:39,185+03 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-36520) [] Clearing cache of pool: '3eee4c8c-b7a3-415e-ab32-aed29a97548f' for problematic entities of VDS: 'host_mixed_3'.
2019-06-11 14:18:39,185+03 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-36520) [] Removing vds '[67caae17-e5e6-42fb-b609-e1b119c0ee04]' from the domain in maintenance cache
2019-06-11 14:18:39,185+03 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-36520) [] Removing host(s) '[67caae17-e5e6-42fb-b609-e1b119c0ee04]' from hosts unseen domain report cache
2019-06-11 14:18:39,186+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-65) [] START, DisconnectStoragePoolVDSCommand(HostName = host_mixed_3, DisconnectStoragePoolVDSCommandParameters:{hostId='67caae17-e5e6-42fb-b609-e1b119c0ee04', storagePoolId='3eee4c8c-b7a3-415e-ab32-aed29a97548f', vds_spm_id='3'}), log id: 5ddead0e
2019-06-11 14:18:39,322+03 INFO [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-16) [a187e2fd-93ad-419e-b1f0-53ddeb5ec882] Ending command 'org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand' successfully.
2019-06-11 14:18:41,078+03 INFO [org.ovirt.engine.core.bll.RemoveVdsCommand] (default task-20) [f03bd365-fd94-4886-8826-ac490f90a654] Failed to Acquire Lock to object 'EngineLock:{exclusiveLocks='[67caae17-e5e6-42fb-b609-e1b119c0ee04=VDS, VDS_POOL_AND_STORAGE_CONNECTIONS67caae17-e5e6-42fb-b609-e1b119c0ee04=VDS_POOL_AND_STORAGE_CONNECTIONS]', sharedLocks=''}'
2019-06-11 14:18:41,078+03 WARN [org.ovirt.engine.core.bll.RemoveVdsCommand] (default task-20) [f03bd365-fd94-4886-8826-ac490f90a654] Validation of action 'RemoveVds' failed for user admin@internal-authz. Reasons: VAR__ACTION__REMOVE,VAR__TYPE__HOST,ACTION_TYPE_FAILED_OBJECT_LOCKED
2019-06-11 14:18:41,079+03 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-20) [] Operation Failed: [Cannot remove Host. Related operation is currently in progress. Please try again later.]
2019-06-11 14:18:42,504+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-65) [] FINISH, DisconnectStoragePoolVDSCommand, return: , log id: 5ddead0e
2019-06-11 14:18:42,506+03 INFO [org.ovirt.engine.core.bll.storage.pool.DisconnectHostFromStoragePoolServersCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-65) [1e54bd0d] Running command: DisconnectHostFromStoragePoolServersCommand internal: true. Entities affected : ID: 3eee4c8c-b7a3-415e-ab32-aed29a97548f Type: StoragePool
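To make the timing in the log excerpt easier to see, here is a small Python sketch that checks whether the failed RemoveVds attempt fell between the START and FINISH of DisconnectStoragePoolVDSCommand. The three timestamps are copied from the log above; the helper names (`ts`, `in_window`) are made up for illustration, and the sketch only compares times. It does not prove which command held the lock.

```python
from datetime import datetime

# Timestamp layout used by the engine log, with the "+03" offset left out.
FMT = "%Y-%m-%d %H:%M:%S,%f"


def ts(text):
    """Parse an engine log timestamp such as '2019-06-11 14:18:39,186'."""
    return datetime.strptime(text, FMT)


def in_window(t, start, end):
    """Return True if t lies between start and end (inclusive)."""
    return start <= t <= end


# Values taken from the log excerpt in the description.
disconnect_start = ts("2019-06-11 14:18:39,186")   # DisconnectStoragePoolVDSCommand START
remove_attempt = ts("2019-06-11 14:18:41,078")     # RemoveVdsCommand: Failed to Acquire Lock
disconnect_finish = ts("2019-06-11 14:18:42,504")  # DisconnectStoragePoolVDSCommand FINISH

overlaps = in_window(remove_attempt, disconnect_start, disconnect_finish)
# Seconds the caller would have had to wait for the disconnect to finish.
wait_needed = (disconnect_finish - remove_attempt).total_seconds()

if __name__ == "__main__":
    print("remove attempted during disconnect:", overlaps)
    print("seconds until disconnect finished:", wait_needed)
```

In this run the removal was attempted roughly a second and a half before the disconnect finished, so a short wait before retrying would likely have been enough.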
So we decided to add a retry action to the Ansible ovirt_host module. It should be configurable; by default it will retry 6 times at an interval of 20 s. Unfortunately, there is no error in the audit log that we could use as a wait condition. Maybe we should consider adding this error to the audit log, so that removing the host is more reliable.
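A minimal Python sketch of the retry idea described above, assuming up to 6 attempts 20 s apart. This is not the ovirt_host module's code: `HostBusyError`, `remove_with_retry` and the `remove_host` callable are hypothetical names used only for illustration, and `attempts` here counts total tries, which may differ from how a given tool counts retries.

```python
import time


class HostBusyError(Exception):
    """Hypothetical stand-in for the HTTP 409 fault returned while the host is locked."""


def remove_with_retry(remove_host, attempts=6, delay=20.0, sleep=time.sleep):
    """Call remove_host() until it succeeds; return the number of calls made.

    Only HostBusyError is retried; any other exception propagates at once.
    """
    for attempt in range(1, attempts + 1):
        try:
            remove_host()
            return attempt
        except HostBusyError:
            if attempt == attempts:
                raise  # out of attempts, surface the last failure
            sleep(delay)  # give the engine time to release the lock


if __name__ == "__main__":
    # Simulated engine: busy for the first two calls, then the lock is released.
    state = {"calls": 0}

    def fake_remove():
        state["calls"] += 1
        if state["calls"] <= 2:
            raise HostBusyError("Related operation is currently in progress")

    # sleep is replaced with a no-op so the demo does not actually wait.
    print(remove_with_retry(fake_remove, sleep=lambda seconds: None))
```

Passing `sleep` as a parameter keeps the delay configurable and lets the loop be exercised without real waiting.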
Moving to 4.4 for now; we can move it to 4.3.z once we know the release date of Ansible 2.9.
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
You can use until in ansible (https://docs.ansible.com/ansible/latest/user_guide/playbooks_loops.html#retrying-a-task-until-a-condition-is-met)

example:

- ovirt_host:
    state: absent
    name: myhost
  register: result
  until: not result.failed
  retries: 6
  delay: 20
ansible-2.8.3 has been released; please check whether this bug is fixed there.
Using ansible-2.8.3-1.el7ae.noarch and this playbook:

---
- name: oVirt host
  hosts: localhost
  connection: local
  gather_facts: false

  vars_files:
    - engine_vars.yml
    - passwords.yml

  pre_tasks:
    - name: Login to oVirt
      ovirt_auth:
        hostname: "{{ engine_fqdn }}"
        username: "{{ engine_user }}"
        password: "{{ engine_password }}"
        ca_file: "{{ engine_cafile | default(omit) }}"
        insecure: "{{ engine_insecure | default(true) }}"
      tags:
        - always

  tasks:
    - ovirt_host:
        auth: "{{ ovirt_auth }}"
        state: absent
        name: host-01
        force: True

I'm getting this result:

TASK [ovirt_host] ********************************************************************************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: Error: Fault reason is "Operation Failed". Fault detail is "[Cannot remove Host. Related operation is currently in progress. Please try again later.]". HTTP response code is 409.
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[Cannot remove Host. Related operation is currently in progress. Please try again later.]\". HTTP response code is 409."}
Created attachment 1594578 [details] engine log
We need to update the documentation of the 'force' parameter. The proper documentation should be:

Indicates that the host should be removed even if it is non-responsive, or if it is part of a Gluster Storage cluster and has volume bricks on it. It does not forcibly remove the host if it is already being removed.
This behaves the same way with or without the force parameter: it fails with "Related operation is currently in progress".
Sorry, with:

  tasks:
    - ovirt_host:
        auth: "{{ ovirt_auth }}"
        state: absent
        name: host-01
      register: result
      until: not result.failed
      retries: 6
      delay: 20

this works correctly. However, as Ondra pointed out, there are still other issues with the documentation, so keeping this in ASSIGNED.
Sorry, missed Comment 13
Verified: https://docs.ansible.com/ansible/2.9/modules/ovirt_host_module.html looks good to me.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:3729