Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1143996

Summary:	multihost tests with uneven number of tasks fail
Product:	[Retired] Beaker	Reporter:	Patrik Kis <pkis>
Component:	beah	Assignee:	beaker-dev-list
Status:	CLOSED EOL	QA Contact:	tools-bugs <tools-bugs>
Severity:	medium	Docs Contact:
Priority:	low
Version:	0.17	CC:	ksrot, mastyk, mcsontos
Target Milestone:	---	Keywords:	Documentation
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2020-02-11 12:17:26 UTC	Type:	Bug

Description Patrik Kis 2014-09-18 12:19:54 UTC

Description of problem:
If a multihost test is scheduled with an uneven number of tasks in its recipes, the two sides cannot synchronise.

Version-Release number of selected component (if applicable):
beah-0.7.6-1.el5

How reproducible:
always

Steps to Reproduce:
1. Schedule a job with an uneven number of tasks. An example job can be seen here:
https://beaker.engineering.redhat.com/jobs/750609

SERVER side:
  <task name="/distribution/install" role="STANDALONE"/>
  <task name=" a multihost test" role="SERVERS"/>
  <task name=" whatever job" role="None"/>
  <task name=" a multihost test" role="SERVERS"/>

CLIENT side:
  <task name="/distribution/install" role="STANDALONE"/>
  <task name=" a multihost test" role="CLIENTS"/>
  <task name=" a multihost test" role="CLIENTS"/>

2. The first multihost test task will pass, but the second will fail in rhts-sync because the two hosts do not see each other. Investigation showed that not all environment variables required by rhts-sync-set and rhts-sync-block are set in this second task.

These scripts need the following variables set properly (otherwise the two sides do not see each other):
SERVERS - the hostname of the server
CLIENTS - the hostname of the client
RESULT_SERVER - the address where the /usr/bin/beah-rhts-task process listens for xmlrpc commands (e.g. localhost:port)
RECIPESETID - the ID of the recipe set
TESTORDER - the position of the task in the recipe multiplied by 8 (i.e. binary shifted left by 3); e.g. the 1st task has 8, the 2nd 16, etc.

The issue is that the SERVERS or CLIENTS and the TESTORDER parameters are not set properly in the second multihost test:

For the example above the SERVER has:
SERVERS=hostname of the server
CLIENTS is not set at all
TESTORDER=32

The CLIENT has:
SERVERS is not set at all
CLIENTS=hostname of the client
TESTORDER=24
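
The mismatch can be illustrated by annotating the two recipes with the TESTORDER each task would receive. This is only a sketch: it assumes TESTORDER is the task's position in the recipe multiplied by 8, which matches the values 32 and 24 reported above, and the task names other than /distribution/install are placeholders.

```xml
<recipeSet>
  <!-- SERVER side: four tasks -->
  <recipe>
    <task name="/distribution/install" role="STANDALONE"/>  <!-- TESTORDER=8 -->
    <task name="/example/multihost" role="SERVERS"/>        <!-- TESTORDER=16 -->
    <task name="/example/singlehost" role="None"/>          <!-- TESTORDER=24 -->
    <task name="/example/multihost" role="SERVERS"/>        <!-- TESTORDER=32 -->
  </recipe>
  <!-- CLIENT side: three tasks -->
  <recipe>
    <task name="/distribution/install" role="STANDALONE"/>  <!-- TESTORDER=8 -->
    <task name="/example/multihost" role="CLIENTS"/>        <!-- TESTORDER=16 -->
    <task name="/example/multihost" role="CLIENTS"/>        <!-- TESTORDER=24, the server side is at 32 -->
  </recipe>
</recipeSet>
```

The first multihost task has TESTORDER=16 on both sides, so it synchronises. The second one does not, because the extra single-host task shifts the server side by one position.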

For more details check the test results in https://beaker.engineering.redhat.com/jobs/750609. In the second multihost test task these parameters were monitored (see the TESTOUT.log).

IMHO the missing setting of CLIENTS and SERVERS is a bug and should be fixed.
The TESTORDER parameter seems to be correct, but unfortunately it prevents scheduling this kind of job. I'm not sure if this can be solved properly in beah.

Please note that these jobs have a real use case.

Comment 1 Dan Callaghan 2014-09-18 22:53:06 UTC

This is a known weakness with how multi-host testing in Beaker is structured. The tasks in each recipe must line up.

It is mentioned in this doc:

https://beaker-project.org/docs/user-guide/multihost.html
"Firstly, any multihost testing must ensure that the task execution order aligns correctly on all machines..."

But I will be the first to admit that that doc could be improved. It's a really long narrative tutorial-style doc which would benefit from being split into something more task-focused with discrete sections.

You can use /distribution/dummy as a placeholder to fill out gaps in your recipes so that all tasks correctly line up.
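
As a rough sketch of the placeholder approach, the shorter recipe can be padded so that the multihost tasks sit at the same position on both sides. Task names other than /distribution/install and /distribution/dummy are hypothetical, the role given to the placeholder is an assumption, and the other elements a real recipe needs (distro and host requirements) are omitted.

```xml
<recipeSet>
  <!-- server recipe: multihost tasks at positions 2 and 4 -->
  <recipe>
    <task name="/distribution/install" role="STANDALONE"/>
    <task name="/example/multihost" role="SERVERS"/>
    <task name="/example/singlehost" role="None"/>
    <task name="/example/multihost" role="SERVERS"/>
  </recipe>
  <!-- client recipe: padded so its multihost tasks are also at positions 2 and 4 -->
  <recipe>
    <task name="/distribution/install" role="STANDALONE"/>
    <task name="/example/multihost" role="CLIENTS"/>
    <!-- placeholder that fills the gap left by the single-host task on the server -->
    <task name="/distribution/dummy" role="STANDALONE"/>
    <task name="/example/multihost" role="CLIENTS"/>
  </recipe>
</recipeSet>
```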

I'm converting this to a docs bug, about making this limitation (and the recommended solution) much clearer and easier to find in the docs.

Comment 2 Patrik Kis 2014-09-19 08:08:11 UTC

Thanks Dan for the quick response.

It really looks like this multihost test weakness is by design, but is there any chance that at least the SERVERS and CLIENTS variables could be set properly when the tests do not line up?

IMHO this is still a bug and we have a real scenario where this flaw matters. In some tests we use an ad-hoc synchronisation mechanism (through NFS) instead of rhts-sync-*, and it requires only the SERVERS and CLIENTS variables (so the TESTORDER mismatch does not matter).
So while we still need placeholder tasks for the legacy synchronisation mechanism, as you pointed out, we wouldn't need them for this NFS-based synchronisation.

Comment 3 Marian Csontos 2014-09-19 08:15:09 UTC

Tasks are always paired by their order and the roles in tasks are not visible outside of corresponding "task-set".

But you can set a role in recipe which will be available in all tasks.
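
A sketch of what setting the role at recipe level might look like, assuming the role attribute on the recipe element is what is meant here. The task names other than /distribution/install are hypothetical, and how task-level roles interact with recipe-level roles is not covered by this example.

```xml
<recipeSet>
  <!-- role set on the recipe, so it applies to the whole recipe rather than to one task -->
  <recipe role="SERVERS">
    <task name="/distribution/install" role="STANDALONE"/>
    <task name="/example/multihost" role="STANDALONE"/>
    <task name="/example/singlehost" role="STANDALONE"/>
    <task name="/example/multihost" role="STANDALONE"/>
  </recipe>
  <recipe role="CLIENTS">
    <task name="/distribution/install" role="STANDALONE"/>
    <task name="/example/multihost" role="STANDALONE"/>
    <task name="/example/multihost" role="STANDALONE"/>
  </recipe>
</recipeSet>
```

This does not change how tasks are paired: they are still matched by their order, so anything that depends on the task position would still see a mismatch.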

Comment 4 Karel Srot 2014-09-19 13:09:08 UTC

The placeholder also won't solve the issue when the respective task on the second system takes significantly more time. I think the harness should synchronise the systems before each multihost task, to ensure that the tests start at the same moment and that one system is not waiting in the TC while the other system is still finishing the previous task. Such tests might be killed by the local watchdog.

Comment 5 Patrik Kis 2014-09-19 15:37:30 UTC

(In reply to Marian Csontos from comment #3)
> Tasks are always paired by their order and the roles in tasks are not
> visible outside of corresponding "task-set".
> 
> But you can set a role in recipe which will be available in all tasks.

I'm not sure I understand correctly what you mean, but the problem that should be fixed is the following.
Let's consider the following recipe set:

<recipe kernel_options="" kernel_options_post="" ks_meta="method=nfs" role="None" whiteboard="">
	<task name="/distribution/install" role="STANDALONE"/>
	<task name="/multihost test" role="SERVERS"/>
	<task name="/singlehost test" role="None"/>
	<task name="/multihost test" role="SERVERS"/>
</recipe>

<recipe kernel_options="" kernel_options_post="" ks_meta="method=nfs" role="None" whiteboard="">
	<task name="/distribution/install" role="STANDALONE"/>
	<task name="/multihost test" role="CLIENTS"/>
	<task name="/multihost test" role="CLIENTS"/>
</recipe>


As a result, the key environment variables will be set like this:

<recipe kernel_options="" kernel_options_post="" ks_meta="method=nfs" role="None" whiteboard="">
	<task name="/distribution/install" role="STANDALONE"/>

	<task name="/multihost test" role="SERVERS"/>
        SERVERS=<servers_hostname>
        CLIENTS=<clients_hostname>

	<task name="/singlehost test" role="None"/>
        CLIENTS=<clients_hostname>

	<task name="/multihost test" role="SERVERS"/>
        SERVERS=<servers_hostname>
</recipe>

<recipe kernel_options="" kernel_options_post="" ks_meta="method=nfs" role="None" whiteboard="">
	<task name="/distribution/install" role="STANDALONE"/>

	<task name="/multihost test" role="CLIENTS"/>
        SERVERS=<servers_hostname>
        CLIENTS=<clients_hostname>

	<task name="/multihost test" role="CLIENTS"/>
        CLIENTS=<clients_hostname>

</recipe>

Comment 6 Martin Styk 2020-02-11 12:17:26 UTC

Beah is no longer supported by the Beaker development team.
Instead, we are working on the Restraint test harness. You can find all the features of Restraint here.

https://restraint.readthedocs.io/en/latest/

If you think your RFE should still be implemented as part of Restraint, feel free to create a new BZ ticket.

https://bugzilla.redhat.com/enter_bug.cgi?product=Restraint

In case you have any questions, feel free to reach out to me.
Thank you,
Martin Styk <martin.styk>