Description of problem:

The manager is rebooted while ansible is running the role ovirt-ova-pack.

~~~
2021-12-08 16:53:20 IST - TASK [ovirt-ova-pack : Run packing script] *************************************
~~~

When the engine is back, it tries to check the status of the export.

~~~
2021-12-08 16:56:19,067+05 INFO  [org.ovirt.engine.core.uutils.config.ShellLikeConfd] (ServerService Thread Pool -- 61) [] Loaded file '/usr/share/ovirt-engine/services/ovirt-engine/ovirt-engine.conf'.   >>> engine start
2021-12-08 16:56:42,509+05 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-57) [a9f35684-8b11-4dc9-804c-36db831c19a9] Command 'ExportVmToOva' (id: '9be2b484-1130-4c88-87d4-13453d8a6041') waiting on child command id: '3539c2f4-7f2b-4bf9-b3e3-18df1f0e4fa7' type:'CreateOva' to complete
2021-12-08 16:56:42,558+05 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-57) [a9f35684-8b11-4dc9-804c-36db831c19a9] Command 'CreateOva' (id: '3539c2f4-7f2b-4bf9-b3e3-18df1f0e4fa7') waiting on child command id: '8c7ae6d7-edb6-47f7-baee-27218e8b8881' type:'AnsiblePackOva' to complete
~~~

The run is no longer in the Ansible runner cache, and the status request fails with the error "found an incomplete artifacts directory...Possible ansible_runner error". The service checks the "status" file under the artifacts directory.

~~~
2021-12-08 17:02:03,912 - runner_service.controllers.playbooks - INFO - 127.0.0.1 - GET /api/v1/playbooks/363026fa-5819-11ec-be69-002b6a01557c
2021-12-08 17:02:03,913 - runner_service.services.playbook - DEBUG - runner_cache 'miss' for run 363026fa-5819-11ec-be69-002b6a01557c
2021-12-08 17:02:03,913 - runner_service.services.playbook - WARNING - Status Request for Play uuid '363026fa-5819-11ec-be69-002b6a01557c', found an incomplete artifacts directory...Possible ansible_runner error?
~~~

The ansible-runner status file is only created when the task fails or succeeds, so the status file does not exist yet.

~~~
# ls -lt /usr/share/ovirt-engine/ansible-runner-service-project/artifacts/363026fa-5819-11ec-be69-002b6a01557c
total 12
drwx------. 2 ovirt ovirt 4096 Dec 8 16:53 job_events
-rw-------. 1 ovirt ovirt 1323 Dec 8 16:53 stdout
drwxr-xr-x. 2 ovirt ovirt   47 Dec 8 16:53 fact_cache
-rw-------. 1 ovirt ovirt 1368 Dec 8 16:53 command
drwx------. 2 ovirt ovirt   54 Dec 8 16:53 env

# tail /usr/share/ovirt-engine/ansible-runner-service-project/artifacts/363026fa-5819-11ec-be69-002b6a01557c/stdout
TASK [ovirt-ova-export-pre-pack : Removing the temporary file] *****************
changed: [dell-r530-3.gsslab.pnq.redhat.com]

TASK [ovirt-ova-export-pre-pack : Allocating the temporary path for the OVA file] ***
changed: [dell-r530-3.gsslab.pnq.redhat.com]

TASK [ovirt-ova-export-pre-pack : Retrieving the temporary path for the OVA file] ***
changed: [dell-r530-3.gsslab.pnq.redhat.com]

TASK [ovirt-ova-pack : Run packing script] *************************************
~~~

Although the packing script finishes on the host, the engine loses the status of the ansible task and waits indefinitely, keeping the VM locked. We were only able to unlock the VM after manually creating a status file with the content "failed".

Version-Release number of selected component (if applicable):
rhvm-4.4.8.6-0.1.el8ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Export a VM as OVA into the host filesystem. Use a VM with a 10+ GB disk so that the "packing script" takes a while to run.
2. Watch the logs under "/var/log/ovirt-engine/ova/" and reboot the manager while it is running the role ovirt-ova-pack.
3. After the manager reboots, the VM stays locked forever even after pack_ova.py completes on the host (watch the ps output on the host).
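The manual workaround can be sketched as follows. This is an illustration only, not engine code: a throwaway directory from `tempfile` stands in for the real artifacts directory (in this report, `/usr/share/ovirt-engine/ansible-runner-service-project/artifacts/363026fa-5819-11ec-be69-002b6a01557c`), and "failed" is the terminal state ansible-runner would normally write on completion.

```python
import tempfile
from pathlib import Path

# Stand-in for the play's artifacts directory; on a real engine this would be
# /usr/share/ovirt-engine/ansible-runner-service-project/artifacts/<play uuid>
artifacts_dir = Path(tempfile.mkdtemp())

# ansible-runner writes "successful" or "failed" into the "status" file when
# the play finishes, and the runner service treats either as terminal.
# Creating the file by hand with "failed" made the engine fail the job
# and release the VM lock.
(artifacts_dir / "status").write_text("failed")

print((artifacts_dir / "status").read_text())  # -> failed
```

On a real engine the file would also need to be owned by the `ovirt` user, matching the other files in the artifacts directory.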
Actual results:
The VM stays locked forever if the manager is rebooted while exporting a VM as OVA.

Expected results:
The VM should be unlocked and the operation marked as failed.

Additional info:
The problem is that the ansible-runner-service loses track of the artifact:

~~~
2021-12-08 16:53:20,021 - runner_service.controllers.playbooks - INFO - 127.0.0.1 - GET /api/v1/playbooks/363026fa-5819-11ec-be69-002b6a01557c
2021-12-08 16:53:20,021 - runner_service.services.playbook - DEBUG - runner_cache 'hit' for playbook status request
2021-12-08 16:53:20,023 - runner_service.controllers.jobs - DEBUG - Request received, content-type :None
...
2021-12-08 16:53:20,064 - runner_service.controllers.jobs - INFO - 127.0.0.1 - GET /api/v1/jobs/363026fa-5819-11ec-be69-002b6a01557c/events/30-002b6a01-557c-a4a3-25e5-00000000001e
2021-12-08 16:53:30,082 - runner_service.controllers.playbooks - DEBUG - Request received, content-type :None
2021-12-08 16:53:30,083 - runner_service.controllers.playbooks - INFO - 127.0.0.1 - GET /api/v1/playbooks/363026fa-5819-11ec-be69-002b6a01557c
2021-12-08 16:53:30,083 - runner_service.services.playbook - DEBUG - runner_cache 'hit' for playbook status request
2021-12-08 16:53:30,085 - runner_service.controllers.jobs - DEBUG - Request received, content-type :None
2021-12-08 16:53:30,085 - runner_service.controllers.jobs - INFO - 127.0.0.1 - GET /api/v1/jobs/363026fa-5819-11ec-be69-002b6a01557c/events
2021-12-08 16:53:30,085 - runner_service.services.jobs - DEBUG - Job events for play 363026fa-5819-11ec-be69-002b6a01557c: 31
2021-12-08 16:53:30,085 - runner_service.services.jobs - DEBUG - Active filter is :{}
2021-12-08 16:53:30,087 - runner_service.controllers.jobs - DEBUG - Request received, content-type :None
2021-12-08 16:53:30,087 - runner_service.controllers.jobs - INFO - 127.0.0.1 - GET /api/v1/jobs/363026fa-5819-11ec-be69-002b6a01557c/events
2021-12-08 16:53:30,087 - runner_service.services.jobs - DEBUG - Job events for play 363026fa-5819-11ec-be69-002b6a01557c: 31
2021-12-08 16:53:30,087 - runner_service.services.jobs - DEBUG - Active filter is :{}
2021-12-08 16:56:28,306 - root - INFO - Analysing local configuration options from /etc/ansible-runner-service/config.yaml
2021-12-08 16:56:28,313 - root - INFO - - setting playbooks_root_dir to /usr/share/ovirt-engine/ansible-runner-service-project
2021-12-08 16:56:28,313 - root - INFO - - setting ssh_private_key to /etc/pki/ovirt-engine/keys/engine_id_rsa
2021-12-08 16:56:28,313 - root - INFO - - setting port to 50001
2021-12-08 16:56:28,313 - root - INFO - - setting target_user to root
2021-12-08 16:56:28,314 - root - INFO - - setting log_path to /var/log/ovirt-engine
2021-12-08 16:56:28,314 - root - INFO - - setting ssh_checks to False
2021-12-08 16:56:28,314 - root - INFO - Analysing runtime overrides from environment variables
2021-12-08 16:56:28,314 - root - INFO - No configuration settings overridden
2021-12-08 16:56:28,316 - root - INFO - Loaded logging configuration from /etc/ansible-runner-service/logging.yaml
2021-12-08 16:56:28,330 - runner_service.controllers.playbooks - DEBUG - Request received, content-type :None
2021-12-08 16:56:28,331 - runner_service.controllers.playbooks - INFO - 127.0.0.1 - GET /api/v1/playbooks/363026fa-5819-11ec-be69-002b6a01557c
2021-12-08 16:56:28,331 - runner_service.services.playbook - DEBUG - runner_cache 'miss' for run 363026fa-5819-11ec-be69-002b6a01557c
2021-12-08 16:56:28,333 - runner_service.services.playbook - WARNING - Status Request for Play uuid '363026fa-5819-11ec-be69-002b6a01557c', found an incomplete artifacts directory...Possible ansible_runner error?
2021-12-08 16:56:28,344 - runner_service.controllers.jobs - DEBUG - Request received, content-type :None
2021-12-08 16:56:28,344 - runner_service.controllers.jobs - INFO - 127.0.0.1 - GET /api/v1/jobs/363026fa-5819-11ec-be69-002b6a01557c/events
2021-12-08 16:56:28,361 - runner_service.services.jobs - DEBUG - Job events for play 363026fa-5819-11ec-be69-002b6a01557c: 31
2021-12-08 16:56:28,361 - runner_service.services.jobs - DEBUG - Active filter is :{}
~~~

Here we see that the ansible-runner-service restarted and then suddenly could not find the artifact for the play uuid. In that case the response to the engine is "unknown" and the engine keeps polling the job indefinitely. We need to figure out what happened to the artifact, if we can, and consider failing or succeeding the job when the response is 'unknown'.
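One possible direction for that last point, sketched in Python. This is illustrative pseudologic only, not the actual oVirt engine code; the function name, the `max_unknown` threshold, and the status strings are assumptions for the sketch. The idea: after a few consecutive 'unknown' answers (as happens after a service restart with a lost artifact), fail the job so the VM lock is released instead of polling forever.

```python
import time

def poll_play_status(get_status, interval=1.0, max_unknown=3):
    """Poll until the play reaches a terminal state.

    get_status() is assumed to return one of 'running', 'successful',
    'failed', or 'unknown' (e.g. runner cache miss plus an incomplete
    artifacts directory, as in this bug).
    """
    unknown_streak = 0
    while True:
        status = get_status()
        if status in ("successful", "failed"):
            return status
        if status == "unknown":
            unknown_streak += 1
            if unknown_streak >= max_unknown:
                # The artifacts are gone or incomplete and will not come
                # back; fail the job so the engine can unlock the VM.
                return "failed"
        else:
            unknown_streak = 0
        time.sleep(interval)

# A restart scenario: the play was running, then the service loses the artifact.
responses = iter(["running", "unknown", "unknown", "unknown"])
print(poll_play_status(lambda: next(responses), interval=0))  # -> failed
```

A threshold (or a wall-clock timeout) rather than failing on the first 'unknown' avoids killing jobs over a transient cache miss.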
Could not reproduce on the latest version; this should be fixed by other changes to ansible-runner.
Verified (didn't manage to reproduce):
ovirt-engine-4.5.1.1-0.14.el8ev
vdsm-4.50.1.2-1.el8ev.x86_64
qemu-kvm-6.2.0-11.module+el8.6.0+15489+bc23efef.1.x86_64
libvirt-8.0.0-5.2.module+el8.6.0+15256+3a0914fe.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: RHV Manager (ovirt-engine) [ovirt-4.5.1] security, bug fix and update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5555