Bug 2058264

Summary: Export as OVA playbook gets stuck with 'found an incomplete artifacts directory...Possible ansible_runner error?'
Product: Red Hat Enterprise Virtualization Manager
Reporter: amashah
Component: ovirt-engine
Assignee: Dana <delfassy>
Status: CLOSED ERRATA
QA Contact: Barbora Dolezalova <bdolezal>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.4.9
CC: gdeolive, mavital, michal.skrivanek, mperina
Target Milestone: ovirt-4.5.0
Keywords: TestOnly
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ovirt-engine-4.5.0.2
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-05-26 16:23:55 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2052690
Bug Blocks:

Description amashah 2022-02-24 15:36:05 UTC
Description of problem:
When exporting a VM as an OVA, the export reaches the stage:
TASK [ovirt-ova-pack : Run packing script] *************************************

It then hangs indefinitely, eventually failing after 50 hours on the engine side when it hits the AsyncTaskZombieTaskLifeInMinutes limit.
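For reference, the 50-hour figure corresponds to the engine's zombie-task limit. A minimal sketch of checking the configured value on the RHV-M machine, assuming the standard engine-config tool is available and that the value is expressed in minutes (a default of 3000 minutes would match the 50 hours mentioned above):

~~~
# Run on the RHV-M machine: print the current zombie-task limit in minutes
# (assumed default 3000 minutes = 50 hours).
engine-config -g AsyncTaskZombieTaskLifeInMinutes
~~~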

However, the hang actually begins a few hours after the export starts, and ansible-runner-service.log shows:

~~~
2022-02-19 03:16:11,622 - runner_service.services.playbook - DEBUG - runner_cache 'miss' for run a0be3df4-9117-11ec-bb3f-0cc47a078176
2022-02-19 03:16:11,622 - runner_service.services.playbook - WARNING - Status Request for Play uuid 'a0be3df4-9117-11ec-bb3f-0cc47a078176', found an incomplete artifacts directory...Possible ansible_runner  error?
~~~


After that it does not proceed; it appears to stay stuck in the running state until the 50-hour limit is reached and the engine fails the task.

Normally, the pack_ova.py script is run on the host, and after it completes, the .ova.tmp file is renamed to .ova on the host.

Here it never gets to the rename; the service seems to get back an unknown status from the artifacts directory (it is not clear where that directory is or what it contains).

In any case, this results in the export failing and leaving a .tmp.ova file on the storage.
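A minimal sketch for locating such leftover temporary OVA files, assuming /export/path stands in for the directory chosen in the Export as OVA dialog; the exact temporary-file naming may differ, so both patterns mentioned in this report are checked:

~~~
# Hypothetical check on the host: list leftover temporary OVA files in the export directory.
# Replace /export/path with the actual path used for the export.
find /export/path -maxdepth 1 \( -name '*.ova.tmp' -o -name '*.tmp.ova' \) -ls
~~~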


Version-Release number of selected component (if applicable):
4.4.9


How reproducible:
Unknown - this doesn't happen all the time, but appears to have happened twice according to the logs in this environment.



Actual results:
Export fails, and ansible-runner-service complains about an incomplete artifacts directory.


Expected results:
Export should complete, and ansible-runner-service should not report an incomplete artifacts directory.


Additional info:
Logs will be attached shortly.

Comment 2 amashah 2022-02-24 21:37:09 UTC
An update: our customer observed that the issue occurs during the daily reload of httpd triggered by logrotate.

To reproduce:
1. Start OVA export
2. Wait until it reaches pack_ova.py stage
3. on RHV-M:
    # systemctl reload httpd 


Then check /var/log/ovirt-engine/ansible-runner-service.log; it will contain the cache miss / incomplete artifacts messages.
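A minimal sketch for spotting those messages, using the log path above and the patterns taken from the messages quoted in the description:

~~~
# Search the ansible-runner-service log for the cache-miss / incomplete-artifacts messages
grep -E "runner_cache 'miss'|incomplete artifacts directory" \
    /var/log/ovirt-engine/ansible-runner-service.log
~~~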

Other than disabling logrotate during a long export, are there any other suggestions to work around this?
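In case it helps, a minimal sketch of the "disable logrotate during the export" idea mentioned above, assuming httpd log rotation is configured via a drop-in file such as /etc/logrotate.d/httpd as on a default RHEL install (the file name and backup location are assumptions):

~~~
# Assumption: httpd rotation lives in /etc/logrotate.d/httpd. Moving it aside skips
# the daily httpd reload while the export runs; restore it once the export finishes.
mv /etc/logrotate.d/httpd /root/logrotate-httpd.bak
# ... run the OVA export to completion ...
mv /root/logrotate-httpd.bak /etc/logrotate.d/httpd
~~~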

Thanks,
Amar

Comment 3 Martin Perina 2022-02-28 10:29:57 UTC
As part of BZ2052690 we are going to remove the dependency on ansible-runner-service and run Ansible playbooks directly from the engine. This means that reloading httpd will no longer affect currently running OVA exports. Of course, if you restart ovirt-engine during an OVA export, the export will still be interrupted, but since an ovirt-engine restart is much less common than an httpd reload, this should mitigate the issue.

Comment 4 Dana 2022-03-01 11:36:16 UTC
We have seen this before, unrelated to the specific task mentioned.
Try deleting the contents of share/ovirt-engine/ansible-runner-service-project/artifacts
and running it again.
Let me know the results.
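A minimal sketch of that cleanup on the RHV-M machine, assuming the project directory sits under the usual /usr/share/ovirt-engine prefix (the absolute path is an assumption based on the relative path above):

~~~
# Assumption: the artifacts directory lives under /usr/share/ovirt-engine.
# Clear its contents only while no export/playbook is running, then retry the export.
rm -rf /usr/share/ovirt-engine/ansible-runner-service-project/artifacts/*
~~~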

Comment 6 Michal Skrivanek 2022-04-08 16:15:00 UTC
Let's see how this behaves with Ansible 2.12 and the new ansible-runner.

Comment 10 Barbora Dolezalova 2022-05-05 15:20:58 UTC
Verified in ovirt-engine-4.5.0.6-0.7.el8ev.noarch
Reproduction steps from comment #2 were used.

Comment 15 errata-xmlrpc 2022-05-26 16:23:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: RHV Manager (ovirt-engine) [ovirt-4.5.0] security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:4711

Comment 16 meital avital 2022-08-07 11:45:47 UTC
Due to QE capacity, we are not going to cover this issue in our automation.