Bug 2058264 - Export as OVA playbook gets stuck with 'found an incomplete artifacts directory...Possible ansible_runner error?'
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.4.9
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ovirt-4.5.0
Target Release: ---
Assignee: Dana
QA Contact: Barbora Dolezalova
URL:
Whiteboard:
Depends On: 2052690
Blocks:
 
Reported: 2022-02-24 15:36 UTC by amashah
Modified: 2022-08-07 11:45 UTC (History)

Fixed In Version: ovirt-engine-4.5.0.2
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-26 16:23:55 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHV-44832 0 None None None 2022-02-24 15:41:03 UTC
Red Hat Knowledge Base (Solution) 6778591 0 None None None 2022-03-03 18:42:34 UTC
Red Hat Product Errata RHSA-2022:4711 0 None None None 2022-05-26 16:24:06 UTC

Description amashah 2022-02-24 15:36:05 UTC
Description of problem:
When exporting as an OVA, the export reaches the stage of:
TASK [ovirt-ova-pack : Run packing script] *************************************

It then hangs indefinitely and eventually fails after 50 hours on the engine side, when it hits the AsyncTaskZombieTaskLifeInMinutes limit.

However, according to ansible-runner-service.log, it actually hangs a few hours after starting, with:

~~~
2022-02-19 03:16:11,622 - runner_service.services.playbook - DEBUG - runner_cache 'miss' for run a0be3df4-9117-11ec-bb3f-0cc47a078176
2022-02-19 03:16:11,622 - runner_service.services.playbook - WARNING - Status Request for Play uuid 'a0be3df4-9117-11ec-bb3f-0cc47a078176', found an incomplete artifacts directory...Possible ansible_runner  error?
~~~


After that it does not proceed; it seemingly just stays in the running state until the 50-hour limit, when the engine fails it.

Normally, what happens is the pack_ova.py script gets run on the host. After that completes, the .ova.tmp is renamed to .ova on the host.

Here it never reaches the rename, apparently because some unknown status comes back from the artifacts directory (I am not sure where or what that is).

In any case, this results in the export failing and leaving a .ova.tmp file on the storage.
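A failed export of this kind leaves its temporary file behind, so it can help to list such leftovers before cleaning up. The following is only a minimal sketch: the function name `find_leftover_ova_tmp` and its `suffix` parameter are hypothetical, the `.ova.tmp` default is taken from the rename step described above, and the export directory is whatever path was chosen for the export.

```python
from pathlib import Path
from typing import List


def find_leftover_ova_tmp(export_dir: str, suffix: str = ".ova.tmp") -> List[Path]:
    """Return regular files under export_dir whose name ends with suffix.

    The suffix default is an assumption based on the ".ova.tmp is renamed
    to .ova" step in this report; pass another suffix if your leftovers
    are named differently.
    """
    root = Path(export_dir)
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.name.endswith(suffix)
    )


if __name__ == "__main__":
    import sys

    # Usage: python find_leftovers.py /path/to/export/dir
    for path in find_leftover_ova_tmp(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(path)
```

The sketch only lists files and never deletes anything; whether a listed file is really stale (and not an export still in progress) has to be checked separately.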


Version-Release number of selected component (if applicable):
4.4.9


How reproducible:
Unknown - this doesn't happen all the time, but appears to have happened twice according to the logs in this environment.



Actual results:
The export fails, and ansible-runner-service complains about an incomplete artifacts directory.


Expected results:
The export should complete, and ansible-runner-service should not find an incomplete artifacts directory.


Additional info:
Logs will be attached shortly.

Comment 2 amashah 2022-02-24 21:37:09 UTC
An update: our customer observed that the issue occurs during the daily reload of httpd by logrotate.

So to reproduce:
1. Start OVA export
2. Wait until it reaches pack_ova.py stage
3. On RHV-M, run:
    # systemctl reload httpd 


Then check /var/log/ovirt-engine/ansible-runner-service.log; it will contain the runner_cache 'miss' and incomplete artifacts directory messages.
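The log check in the last step can be scripted. The sketch below is an illustration only: the regular expression is derived from the two sample log lines in the description, and the function name `find_incomplete_artifact_warnings` is hypothetical. It returns the timestamp and play UUID of every "incomplete artifacts directory" warning.

```python
import re
from typing import Iterable, List, Tuple

# Pattern modelled on the WARNING line quoted in the description; other
# ansible-runner-service versions may format the message differently.
WARNING_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - "
    r"(?P<logger>\S+) - WARNING - "
    r"Status Request for Play uuid '(?P<uuid>[0-9a-f-]+)', "
    r"found an incomplete artifacts directory"
)


def find_incomplete_artifact_warnings(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, play_uuid) for each matching warning line."""
    hits = []
    for line in lines:
        match = WARNING_RE.match(line)
        if match:
            hits.append((match.group("ts"), match.group("uuid")))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python check_log.py /var/log/ovirt-engine/ansible-runner-service.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
            for ts, uuid in find_incomplete_artifact_warnings(log):
                print(ts, uuid)
```

A long-running export that hits this problem would be expected to log the same UUID repeatedly, once per status request, so deduplicating on the UUID may be useful when counting affected exports.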

Other than disabling logrotate during a long export, are there any other suggestions to work around this?

Thanks,
Amar

Comment 3 Martin Perina 2022-02-28 10:29:57 UTC
As part of BZ2052690 we are going to remove the dependency on ansible-runner-service and run Ansible playbooks directly from the engine. This means that reloading httpd will no longer affect currently running OVA exports. Of course, if you restart ovirt-engine during an OVA export, the export will still be interrupted, but since an ovirt-engine restart is much less common than an httpd reload, this should mitigate the issue.

Comment 4 Dana 2022-03-01 11:36:16 UTC
We have seen this before, unrelated to the specific task that's mentioned.
Try deleting the contents of share/ovirt-engine/ansible-runner-service-project/artifacts
and running it again.
Let me know the results.
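For anyone who prefers to script this cleanup, here is a minimal sketch that empties a directory while keeping the directory itself. The function name `clear_directory_contents` and the `dry_run` flag are hypothetical, and the directory path must be supplied by the caller. It is presumably safest to run this only when no playbook is in progress.

```python
import shutil
from pathlib import Path
from typing import List


def clear_directory_contents(directory: str, dry_run: bool = True) -> List[Path]:
    """Delete everything inside directory but keep the directory itself.

    With dry_run=True (the default) nothing is deleted; the entries that
    would be removed are only returned.
    """
    root = Path(directory)
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not a directory")
    entries = sorted(root.iterdir())
    if not dry_run:
        for entry in entries:
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)  # real subdirectory: remove recursively
            else:
                entry.unlink()  # regular file or symlink
    return entries


if __name__ == "__main__":
    import sys

    # Usage: python clear_artifacts.py <artifacts-dir> [--delete]
    if len(sys.argv) > 1:
        really_delete = "--delete" in sys.argv[2:]
        for entry in clear_directory_contents(sys.argv[1], dry_run=not really_delete):
            print(("removed " if really_delete else "would remove ") + str(entry))
```

Defaulting to a dry run is a deliberate choice here: the first invocation only lists what would be deleted, and the deletion has to be requested explicitly.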

Comment 6 Michal Skrivanek 2022-04-08 16:15:00 UTC
Let's see how this behaves with Ansible 2.12 and the new ansible-runner.

Comment 10 Barbora Dolezalova 2022-05-05 15:20:58 UTC
Verified in ovirt-engine-4.5.0.6-0.7.el8ev.noarch
Reproduction steps from comment #2 were used.

Comment 15 errata-xmlrpc 2022-05-26 16:23:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: RHV Manager (ovirt-engine) [ovirt-4.5.0] security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:4711

Comment 16 meital avital 2022-08-07 11:45:47 UTC
Due to QE capacity, we are not going to cover this issue in our automation.

