Description of problem:

When exporting as an OVA, the export gets to the stage of:

TASK [ovirt-ova-pack : Run packing script] *************************************

and then hangs indefinitely, eventually failing after 50 hours on the engine side when it hits the AsyncTaskZombieTaskLifeInMinutes limit. However, the hang actually starts a few hours into the export, when ansible-runner-service.log shows:

~~~
2022-02-19 03:16:11,622 - runner_service.services.playbook - DEBUG - runner_cache 'miss' for run a0be3df4-9117-11ec-bb3f-0cc47a078176
2022-02-19 03:16:11,622 - runner_service.services.playbook - WARNING - Status Request for Play uuid 'a0be3df4-9117-11ec-bb3f-0cc47a078176', found an incomplete artifacts directory...Possible ansible_runner error?
~~~

After that it doesn't proceed; it seemingly just stays stuck in the running state until the 50 hour limit, when the engine fails it.

Normally, the pack_ova.py script is run on the host, and after it completes the .ova.tmp file is renamed to .ova on the host. Here it never gets to the rename, as it seems to get some unknown status back from the artifacts directory (not sure where/what that is?).

In any case, this results in the export failing and leaving a .ova.tmp file on the storage.

Version-Release number of selected component (if applicable):

4.4.9

How reproducible:

Unknown - this doesn't happen all the time, but it appears to have happened twice according to the logs in this environment.

Actual results:

Export fails, and ansible-runner-service complains about an incomplete artifacts directory.

Expected results:

Export should complete, and ansible-runner-service should not find an incomplete artifacts directory.

Additional info:

Logs will be attached shortly.
An update: our customer observed that the issue occurs during the daily reload of httpd by logrotate.

To reproduce:

1. Start an OVA export.
2. Wait until it reaches the pack_ova.py stage.
3. On RHV-M:
   # systemctl reload httpd

Then check /var/log/ovirt-engine/ansible-runner-service.log; it will have the cache miss and incomplete artifacts messages logged.

Other than disabling logrotate during a long export, are there any other suggestions to work around this?

Thanks,
Amar
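For anyone checking whether an environment has hit this, the log check in the last reproduction step could be scripted roughly as follows. This is only a sketch: the two patterns are copied from the log excerpt in this report, the function name `find_interrupted_runs` is made up for illustration, and the exact message wording may differ between ansible-runner-service versions.

```python
import re
from pathlib import Path

# Patterns taken from the log excerpt in this report; other versions of
# ansible-runner-service may word these messages differently (assumption).
CACHE_MISS = re.compile(r"runner_cache 'miss' for run (?P<uuid>[0-9a-f-]+)")
INCOMPLETE = re.compile(
    r"Status Request for Play uuid '(?P<uuid>[0-9a-f-]+)', "
    r"found an incomplete artifacts directory"
)


def find_interrupted_runs(log_text):
    """Count cache-miss and incomplete-artifacts messages per play uuid."""
    hits = {}
    for line in log_text.splitlines():
        for key, pattern in (
            ("cache_miss", CACHE_MISS),
            ("incomplete_artifacts", INCOMPLETE),
        ):
            match = pattern.search(line)
            if match:
                counts = hits.setdefault(
                    match.group("uuid"),
                    {"cache_miss": 0, "incomplete_artifacts": 0},
                )
                counts[key] += 1
    return hits


if __name__ == "__main__":
    # Log location as given in the reproduction steps.
    log_path = Path("/var/log/ovirt-engine/ansible-runner-service.log")
    if log_path.exists():
        text = log_path.read_text(errors="replace")
        for uuid, counts in find_interrupted_runs(text).items():
            print(uuid, counts)
```

A uuid that shows up with both counters non-zero around the time of the logrotate run is a likely candidate for an export that was cut off by the httpd reload.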
As part of BZ2052690 we are going to remove the dependency on ansible-runner-service and run Ansible playbooks directly from the engine. This means that reloading httpd will no longer affect currently running OVA exports. Of course, if you restart ovirt-engine during an OVA export, the export will still be interrupted, but as an ovirt-engine restart is much less common than an httpd reload, it might mitigate the issue.
We have seen this before, unrelated to the specific task that's mentioned. Try deleting the contents of share/ovirt-engine/ansible-runner-service-project/artifacts and running it again. Let me know the results.
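If it helps, clearing the directory contents while keeping the directory itself could be done with something like the sketch below. The helper name `clear_directory_contents` and its `dry_run` flag are illustrative only; the directory is passed in by the caller, so use the full path of the artifacts directory on your engine machine, and preferably run it while no export is in progress.

```python
import shutil
from pathlib import Path


def clear_directory_contents(directory, dry_run=True):
    """Remove everything inside `directory` but keep the directory itself.

    Returns the names of the entries that were (or, with dry_run=True,
    would be) removed.
    """
    root = Path(directory)
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not a directory")

    removed = []
    for entry in sorted(root.iterdir()):
        removed.append(entry.name)
        if dry_run:
            continue
        if entry.is_dir() and not entry.is_symlink():
            # Real subdirectory: remove it recursively.
            shutil.rmtree(entry)
        else:
            # Regular file or symlink: remove just the entry.
            entry.unlink()
    return removed
```

Calling it first with the default `dry_run=True` only lists what would be deleted; pass `dry_run=False` once the list looks right.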
Let's see how this behaves with Ansible 2.12 and the new ansible-runner.
Verified in ovirt-engine-4.5.0.6-0.7.el8ev.noarch Reproduction steps from comment #2 were used.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: RHV Manager (ovirt-engine) [ovirt-4.5.0] security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:4711
Due to QE capacity, we are not going to cover this issue in our automation