Bug 1253918 - Multihost Beaker jobs hang forever
Status: NEW
Product: Beaker
Classification: Community
Component: general
Version: develop
Hardware: Unspecified Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: beaker-dev-list
QA Contact: tools-bugs
Reported: 2015-08-15 12:09 EDT by Jiri Hladky
Modified: 2016-03-14 05:15 EDT
Doc Type: Bug Fix
Type: Bug

Description Jiri Hladky 2015-08-15 12:09:18 EDT
Description of problem:


The multi-host jobs, with exactly the same XML as before, have suddenly started to fail. We started to experience this behaviour on August 10. Please refer to

https://beaker.engineering.redhat.com/jobs/1048984

and check console.log

2015-08-12 12:16:34,250 rhts_task.twisted emit: ERROR Unhandled Error 
Traceback (most recent call last): 
Failure: exceptions.RuntimeError: Timeout waiting for RHTS variable 
 
2015-08-12 12:16:41,257 rhts_task.twisted emit: ERROR Unhandled Error 
Traceback (most recent call last): 
Failure: exceptions.RuntimeError: Timeout waiting for RHTS variable 
 
2015-08-12 12:16:48,265 rhts_task.twisted emit: ERROR Unhandled Error 
Traceback (most recent call last): 
Failure: exceptions.RuntimeError: Timeout waiting for RHTS variable 
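
To check how often a saved console log hits this error, the log can be scanned for the message. A minimal Python sketch, assuming the log is available as plain text; the function name `find_rhts_timeouts` is hypothetical, and only the error string and the timestamp layout are taken from the excerpt above:

```python
import re

# Matches the timestamp at the start of a log line such as
# "2015-08-12 12:16:34,250 rhts_task.twisted emit: ERROR Unhandled Error".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ ")
ERROR_TEXT = "Timeout waiting for RHTS variable"


def find_rhts_timeouts(lines):
    """Return the most recent timestamp seen before each RHTS timeout failure."""
    hits = []
    last_timestamp = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if match:
            # Remember the timestamp of the entry this traceback belongs to.
            last_timestamp = match.group(1)
        if ERROR_TEXT in line:
            hits.append(last_timestamp)
    return hits
```

Passing an open file object (for example `open("console.log")`) as `lines` works, since the function only iterates over lines.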

Bill Peck has looked into it and recommended that we use restraint instead of beah as a workaround. The workaround is working fine, but we would like beah to be fixed.

Thanks a lot
Jirka
Comment 1 Jeff Burke 2015-09-09 14:54:14 EDT
Jirka,
 I took a brief look at this. I think this is a good job (one that worked):
https://beaker.engineering.redhat.com/jobs/1043190
 versus one that failed:
https://beaker.engineering.redhat.com/jobs/1048912
Also your originally reported one, 1048984.

 Although the XMLs are identical, the test versions that were used are different.
In the one that worked J:1043190 the version of the test was:
 Package kernel_netperf-performance-network_perftest.noarch 0:3.0-7
In the one that failed, J:1048912 (or the job in your description, J:1048984), the test was:
 Package kernel_netperf-performance-network_perftest.noarch 0:3.0-11

Do you know what changed in the test?

Thanks,
Jeff
Comment 2 Otto Sabart 2015-09-29 06:17:55 EDT
Hi Jeff,
in version 0:3.0-7 of our tests we did not use RHTS synchronization at all.
The tests were working because we used _our_ own implementation of synchronization, written in Python and using XML-RPC.
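
For readers unfamiliar with this style of synchronization, the general idea can be sketched as below. This is an illustrative Python 3 sketch only, not the code shipped in the kernel_netperf test package: the class `StateBoard`, the functions `serve` and `wait_for_state`, and the state names are all hypothetical. One host runs a small XML-RPC coordinator, every host announces the states it reaches, and peers poll until the state they need appears or a timeout expires.

```python
import threading
import time
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


class StateBoard:
    """Records the states each host has announced, e.g. {"server": {"READY"}}."""

    def __init__(self):
        self._lock = threading.Lock()
        self._states = {}

    def set_state(self, host, state):
        with self._lock:
            self._states.setdefault(host, set()).add(state)
        return True

    def has_state(self, host, state):
        with self._lock:
            return state in self._states.get(host, set())


def serve(board, address=("127.0.0.1", 0)):
    """Start the coordinator in a background thread; return (server, port)."""
    server = SimpleXMLRPCServer(address, logRequests=False)
    server.register_function(board.set_state, "set_state")
    server.register_function(board.has_state, "has_state")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]


def wait_for_state(url, host, state, timeout=30.0, poll=0.5):
    """Poll the coordinator until `host` has announced `state`; raise on timeout."""
    proxy = xmlrpc.client.ServerProxy(url)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if proxy.has_state(host, state):
            return True
        time.sleep(poll)
    raise RuntimeError("Timeout waiting for %s on %s" % (state, host))
```

In such a scheme a server host would call `set_state("server", "READY")` through an `xmlrpc.client.ServerProxy` once its listener is up, and the client host would call `wait_for_state(url, "server", "READY")` before starting traffic.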

Starting with version 0:3.0-11 we wanted to move our tests to RHTS
synchronization, following the Beaker documentation [0].

[0] https://beaker-project.org/docs/user-guide/multihost.html

After that, our jobs started stalling with "Timeout waiting for RHTS
variable" in the console log.
Comment 3 Nate Straz 2015-11-11 15:56:24 EST
I've run into something similar on the Cluster QE beaker instance and found that restarting beah-fwd-backend on the affected hosts works around the problem.  Not sure what the actual bug is.  Perhaps beah-fwd-backend.service needs a dependency on beah-srv.service.
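
If someone wants to try the dependency idea, one untested way to express it is a systemd drop-in for beah-fwd-backend.service. The Python sketch below only renders the drop-in text; the function name `render_ordering_dropin` is hypothetical, and whether `Wants=` plus `After=` actually cures the hang is an assumption that has not been verified in this bug.

```python
def render_ordering_dropin(dependency="beah-srv.service"):
    """Return systemd drop-in text that starts a unit after `dependency`."""
    return (
        "[Unit]\n"
        # Wants= pulls the dependency in when this unit is started.
        f"Wants={dependency}\n"
        # After= delays this unit's startup until the dependency has started.
        f"After={dependency}\n"
    )


if __name__ == "__main__":
    print(render_ordering_dropin(), end="")
```

The output would go into a `.conf` file under `/etc/systemd/system/beah-fwd-backend.service.d/`, followed by `systemctl daemon-reload`, for the change to take effect on a test host.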
Comment 4 Abhijeet Kasurde 2016-03-14 05:15:07 EDT
I am also observing the same results while running IPA QE downstream automation in Beaker:

https://beaker.engineering.redhat.com/jobs/1259997
