Bug 1312965

Summary: Host is not released prior to second run of beaker provisionment
Product: [Community] khaleesi Reporter: Filip Hubík <fhubik>
Component: rdo-manager Assignee: wes hayutin <whayutin>
Status: CLOSED WONTFIX QA Contact: Attila Darazs <adarazs>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: unspecified   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-04-16 16:30:38 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Filip Hubík 2016-02-29 16:10:00 UTC
Description of problem:
The task "Return the Beaker machine if it is checked out" in playbook playbooks/provisioner/beaker/main.yml fails to release the machine before a new provisioning run (in the CI environment). This task executes the script return2beaker.sh on the host to release it back to the Beaker pool, but the host is not returned even when the script succeeds. Ansible output:

TASK: [Return the Beaker machine if it is checked out] ************************
cmd: ssh -o 'StrictHostKeyChecking=no' root.eng.brq.redhat.com "return2beaker.sh"
start: 2016-02-26 17:25:45.245267
end: 2016-02-26 17:25:45.624755
delta: 0:00:00.379488
stdout: Going on...

Version-Release number of selected component (if applicable):
2016-Feb: still hitting this issue, and have been for more than a month

Configuration (baremetal ospd7 deployment):
ksgen --config-dir settings generate \
    --provisioner=beaker \
    --provisioner-distro=rhel \
    --provisioner-distro-version=7.2 \
    --provisioner-site=bkr \
    --provisioner-site-user=rdoci-jenkins \
    --product-version=7_director \
    --product-version-build=latest \
    --product-version-repo=puddle \
    --distro=rhel-7.2 \
    --product=rhos \
    --installer=rdo_manager \
    --installer-deploy=templates \
    --installer-env=baremetal \
    --installer-images=import \
    --installer-introspection_method=node_by_node \
    --installer-network=neutron \
    --installer-network-isolation=none \
    --installer-network-variant=ml2-vxlan \
    --installer-post_action=default \
    --installer-topology=minimal \
    --installer-tempest=all \
    --workarounds=enabled \
    --extra-vars @../khaleesi-settings/hardware_environments/qe/dell_pe_c6220_1-5/network_configs/none/hw_settings.yml \
    --extra-vars @../khaleesi-settings/settings/product/rhos/private_settings/redhat_internal.yml \
    ksgen_settings.yml

Steps to Reproduce:
0. Have a clean host in Beaker: not loaned to anyone, not running any task, and previously cleanly provisioned by Beaker (so the script return2beaker.sh exists)
1. First run - "ansible-playbook -vvvv --extra-vars @ksgen_settings.yml -i local_hosts playbooks/provision.yml"
 A - The return2beaker.sh attempt either succeeds or fails with "command not found" or "ssh timeout"; the outcome does not matter and depends on the previous state
 B - The host is reserved for the deployment and then held for some time (~1 day) by the reservesys task
2. Second run - "ansible-playbook -vvvv --extra-vars @ksgen_settings.yml -i local_hosts playbooks/provision.yml"
 C - The return2beaker.sh attempt succeeds, but the host is not released and is still blocked by the previous reservesys task
 D - The job from C is queued and times out after 720 seconds while waiting for provisioning to succeed
3. All incoming tasks are queued from this point on (meaning the host is still blocked by the first task)
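
The bounded wait in step D can be pictured with a small polling helper. This is only a sketch: wait_until and the host_is_free check named in the usage comment are hypothetical and are not part of khaleesi or Beaker, and khaleesi's real wait logic may differ.

```shell
#!/usr/bin/env bash
# wait_until TIMEOUT INTERVAL COMMAND [ARGS...]
# Re-runs COMMAND every INTERVAL seconds until it succeeds or until
# TIMEOUT seconds have elapsed. Returns 0 on success, 1 on timeout.
wait_until() {
    local timeout=$1 interval=$2
    shift 2
    # SECONDS is bash's built-in counter of seconds since shell start.
    local deadline=$(( SECONDS + timeout ))
    while true; do
        if "$@"; then
            return 0
        fi
        if (( SECONDS >= deadline )); then
            return 1
        fi
        sleep "$interval"
    done
}

# Hypothetical usage (host_is_free is a placeholder you would supply):
# wait_until 720 30 host_is_free "$BEAKER_HOST" || echo "host still blocked" >&2
```

With a check like this in front of the second run, a host that is still held by the earlier reservesys task would show up as an explicit timeout rather than as a silently queued job.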

Workaround:
Remove reservesys task from recipe (https://review.gerrithub.io/#/c/264489/).
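
To confirm whether a given job XML still carries the task, a quick check along these lines may help. It is a sketch that assumes the recipe references the task by Beaker's usual name, /distribution/reservesys; has_reservesys and the job.xml path are made up for illustration.

```shell
#!/usr/bin/env bash
# has_reservesys FILE
# Succeeds if the job XML in FILE contains a task named
# /distribution/reservesys (assumed task name; adjust if your
# recipe reserves the system some other way).
has_reservesys() {
    grep -q 'name="/distribution/reservesys"' "$1"
}

# Hypothetical usage:
# if has_reservesys job.xml; then echo "reservesys task still present"; fi
```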

Examples of dummy deployments, with and without the workaround; each is two back-to-back khaleesi runs, provisioning only:
I.) reservesys task removed - no queued jobs in beaker:
1. http://jenkins-fhubik.usersys.redhat.com/job/beaker-reservation-debugging-13-noreserve/8/console
2. http://jenkins-fhubik.usersys.redhat.com/job/beaker-reservation-debugging-13-noreserve/9/console
II.) reservesys task present - the second job is queued; the first must be cancelled with "bkr job-cancel ..." (or must time out) before the second can run:
1. http://jenkins-fhubik.usersys.redhat.com/job/beaker-reservation-debugging-14-reserve/9/console
2. http://jenkins-fhubik.usersys.redhat.com/job/beaker-reservation-debugging-14-reserve/10/console
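
The manual cancellation used in case II could be scripted roughly as follows. This is a sketch, not what khaleesi does: cancel_unfinished_jobs is a hypothetical helper, and it assumes that "bkr job-list --mine --unfinished --format list" prints one job ID (such as J:12345) per line, which should be verified against the installed bkr client.

```shell
#!/usr/bin/env bash
# cancel_unfinished_jobs [EXTRA_JOB_LIST_ARGS...]
# Cancels every unfinished Beaker job owned by the current user.
# Extra arguments are passed through to "bkr job-list" so the
# selection can be narrowed.
# Assumption: --format list yields one job ID per line.
cancel_unfinished_jobs() {
    local job
    bkr job-list --mine --unfinished --format list "$@" |
    while read -r job; do
        [ -n "$job" ] || continue
        echo "Cancelling $job"
        # Keep bkr from consuming the loop's stdin.
        bkr job-cancel "$job" </dev/null
    done
}

# Hypothetical usage before the second provisioning run:
# cancel_unfinished_jobs
```

Because a CI account such as rdoci-jenkins may own jobs for several hosts at once, cancelling everything unfinished is probably too broad in practice; narrowing the job-list query first would be safer.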