Bug 2087738

Summary: ovirt-engine is not able to kill a hung ansible-runner process after the execution timeout has passed
Product: [oVirt] ovirt-engine Reporter: Dana <delfassy>
Component: ovirt-host-deploy-ansible Assignee: Dana <delfassy>
Status: CLOSED CURRENTRELEASE QA Contact: Pavol Brilla <pbrilla>
Severity: high Docs Contact:
Priority: high    
Version: 4.5.0.8 CC: bugs, dfodor, lrotenbe, mnecas, mperina
Target Milestone: ovirt-4.5.1 Flags: mperina: ovirt-4.5+
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ovirt-engine-4.5.1.1 Doc Type: Release Note
Doc Text:
The ansible-runner stop command is executed to kill the ansible-runner process after the execution timeout. If an error occurs during the stop operation, it is only logged.
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-06-27 07:10:40 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Dana 2022-05-18 11:53:12 UTC
Description of problem:
ansible-runner stop <UUID> fails with the following message in the audit log:

Host stream2 installation failed. Failed to execute Ansible host-deploy role: Cannot run program "ansible-runner stop /home/delfassy/ovirt-engine-master-git6/var/lib/ovirt-engine/ansible-runner/8e6d8743-b770-414a-9ef6-e24d88c563b9": error=2, No such file or directory. Please check logs for more details: /home/delfassy/ovirt-engine-master-git6/var/log/ovirt-engine/host-deploy/ovirt-host-deploy-ansible-20220518143210-192.168.100.194-787ec059-ff73-4121-aa7d-cd4ec80eb473.log.

* The host deploy log doesn't contain any failure info.
* The ansible-runner stop failure log is set here: https://github.com/oVirt/ovirt-engine/pull/261/files#diff-b3c46eeff4f2d9d10c5c1c21d106107aad63afa7c2f8c11d365f76dc35e267ce
  That file also doesn't contain any info.
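
The shape of the audit log message above (the whole command line quoted as the program name, followed by error=2) matches what Java's ProcessBuilder reports when a command and its arguments are passed as one string. Whether that is what happened in ovirt-engine is an assumption; this report does not confirm it. A minimal sketch of the difference, where the class name, helper names and directory are hypothetical:

```java
import java.io.IOException;
import java.util.List;

class StopCommandDemo {
    // Hypothetical helper: builds the stop command as separate tokens,
    // so the program name is just "ansible-runner".
    static List<String> stopCommand(String privateDataDir) {
        return List.of("ansible-runner", "stop", privateDataDir);
    }

    // Tries to start the command. Returns the IOException message if the
    // program could not be started, or null if it started.
    static String tryStart(List<String> command) {
        try {
            Process p = new ProcessBuilder(command).start();
            p.destroy();
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String dir = "/tmp/ansible-runner/example-uuid"; // hypothetical path
        // One token holding the whole command line: the OS is asked for an
        // executable literally named "ansible-runner stop /tmp/...".
        System.out.println(tryStart(List.of("ansible-runner stop " + dir)));
        // Separate tokens: program name plus two arguments.
        System.out.println(stopCommand(dir));
    }
}
```

With the single-token form the failure is raised before any process exists, which would also be consistent with the stop failure log staying empty.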


Version-Release number of selected component (if applicable):


How reproducible:
always

Steps to Reproduce:
1. Set the timeout to 2 min.
2. When the timeout is reached, the host deploy process executes ansible-runner stop <UUID>.

Actual results:
ansible-runner stop <UUID> fails


Expected results:
ansible-runner stop <UUID> process ends successfully, host deploy fails due to timeout.
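
The expected flow can be sketched as follows: wait for the play up to the timeout, run the stop command if the timeout is reached, only log a failure of the stop command, and still report the timeout to the caller. This is an illustration, not the ovirt-engine implementation; the class, method and enum names, the 30-second wait for the stop command, and the final forced kill are all assumptions of the sketch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.logging.Level;
import java.util.logging.Logger;

class TimeoutStopSketch {
    private static final Logger log = Logger.getLogger("TimeoutStopSketch");

    enum Outcome { FINISHED, TIMED_OUT }

    // Runs playCommand and waits up to timeoutMillis. On timeout, runs
    // stopCommand; any failure of the stop command is only logged, so the
    // caller still sees TIMED_OUT rather than a stop-related error.
    static Outcome runWithTimeout(List<String> playCommand,
                                  List<String> stopCommand,
                                  long timeoutMillis) {
        Process play;
        try {
            play = new ProcessBuilder(playCommand)
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            if (play.waitFor(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return Outcome.FINISHED;
            }
            try {
                Process stop = new ProcessBuilder(stopCommand)
                        .redirectErrorStream(true)
                        .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                        .start();
                if (!stop.waitFor(30, TimeUnit.SECONDS)) {
                    stop.destroyForcibly();
                    log.warning("Stop command did not finish in time");
                } else if (stop.exitValue() != 0) {
                    log.warning("Stop command exited with code " + stop.exitValue());
                }
            } catch (IOException e) {
                // Stop failure is logged only; the timeout is what gets reported.
                log.log(Level.WARNING, "Failed to run stop command", e);
            }
            return Outcome.TIMED_OUT;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        } finally {
            // Assumption of this sketch: make sure no child process is left behind.
            if (play.isAlive()) {
                play.destroyForcibly();
            }
        }
    }
}
```

A caller would map TIMED_OUT to the host deploy timeout failure, independent of whether the stop command itself succeeded.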


Additional info:

Comment 1 Dana 2022-05-18 12:02:42 UTC
I checked the artifacts:
All artifacts exist (the last one includes the recap) and the stdout file is complete,
so the host deploy process has ended and there is indeed nothing to stop.

Comment 2 Pavol Brilla 2022-06-23 10:09:51 UTC
After shortening the timeout to just 2 minutes, the deploy task reaches the timeout and the message now reflects it.

Software Version: 4.5.1.2-0.11.el8ev
Host test installation failed. Failed to execute Ansible host-deploy: Play execution has reached timeout. Please check logs for more details: /path/to/ansible/playbook.log.


Opening a new bug to improve the message further, as the current log doesn't contain more info.