Bug 960434 - task roles are not visible across guests and hosts in the same recipe set
Status: CLOSED CURRENTRELEASE
Product: Beaker
Classification: Community
Component: scheduler
Version: 0.12
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: 20.0
Target Release: ---
Assigned To: Dan Callaghan
QA Contact: tools-bugs
Keywords: Patch, TestBlocker
Reported: 2013-05-07 04:07 EDT by Michal Kovarik
Modified: 2018-02-05 19:41 EST
Doc Type: Bug Fix
Last Closed: 2015-04-19 22:22:33 EDT
Type: Bug

Description Michal Kovarik 2013-05-07 04:07:02 EDT
Description of problem:
I ran a job with a guest_recipe, where the role for both the recipe and the guest_recipe is 'RECIPE_MEMBERS'.

hypervisor_tasks:
/distribution/virt/install
/distribution/virt/start
/distribution/command role="SERVERS"

guest_tasks:
/distribution/dummy
/distribution/dummy
/distribution/command role="CLIENTS"

I ran the command 'env' in the /distribution/command task.
The RECIPE_MEMBERS environment variable contained both machines.

The hypervisor had only SERVERS; the CLIENTS variable was missing.
The guest had only CLIENTS; the SERVERS variable was missing.

Expected result:
The CLIENTS and SERVERS environment variables should both be set on both machines for the third task.
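
The difference between the observed and the expected environment can be illustrated with a small model. This is only a sketch of the role semantics as described in this report, not Beaker's scheduler code: the `Recipe` class, the `task_role_env` function, the hostnames, and the STANDALONE placeholder roles for the first two tasks are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Recipe:
    hostname: str
    is_guest: bool
    task_roles: List[str]  # role of each task, by position in the recipe


def task_role_env(recipes: List[Recipe], viewer: Recipe, task_index: int,
                  shared: bool) -> Dict[str, str]:
    """Role variables seen by `viewer` while running task number `task_index`.

    shared=False models the behaviour reported here (hosts and guests kept apart).
    shared=True models the expected behaviour (every recipe in the set counts).
    """
    env: Dict[str, List[str]] = {}
    for recipe in recipes:
        # In the split model, skip recipes on the other side of the host/guest divide.
        if not shared and recipe.is_guest != viewer.is_guest:
            continue
        if task_index < len(recipe.task_roles):
            env.setdefault(recipe.task_roles[task_index], []).append(recipe.hostname)
    return {role: " ".join(hosts) for role, hosts in env.items()}


# The third task (index 2) carries the SERVERS / CLIENTS roles, as in the job above.
host = Recipe("host.example.com", False, ["STANDALONE", "STANDALONE", "SERVERS"])
guest = Recipe("guest.example.com", True, ["STANDALONE", "STANDALONE", "CLIENTS"])
recipe_set = [host, guest]

if __name__ == "__main__":
    print(task_role_env(recipe_set, guest, 2, shared=False))  # reported behaviour
    print(task_role_env(recipe_set, guest, 2, shared=True))   # expected behaviour
```

With `shared=False` each machine sees only its own side's role, which matches the report; with `shared=True` both machines see SERVERS and CLIENTS.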
Comment 2 Nick Coghlan 2013-05-07 04:19:13 EDT
Task roles are currently considered to be distinct between hosts and guests. To get shared roles between hosts and guests, you must use recipe roles.

This was apparently a deliberate design decision, but the rationale hasn't been documented clearly. Bill, do you recall a specific reason we shouldn't treat task roles the same as recipe roles and let hosts and guests see each other? Was it simply due to the fact that until 0.12 you couldn't easily set up bidirectional communication between the hosts and the guests in the first place?
Comment 3 Bill Peck 2013-05-07 08:45:31 EDT
Hi Nick,

The original intent was to allow for migration tests in xen where you would be running a different set of tests on the host machines (migrate guestA from hostA to hostB) while running multi-host tests between guestA and guestB.

But this could probably be supported by simply using different roles between hosts and guests.

I wasn't aware that recipe roles went across hosts and guests.
Comment 4 Nick Coghlan 2013-05-09 02:30:16 EDT
OK, I expect we'll eventually change this to make task and recipe roles consistent (i.e. being shared between HOST and GUEST), but probably not for 1.0
Comment 5 Tomas Rusnak 2013-06-25 09:27:59 EDT
We are using roles specified per task:

<recipe>
  <guestrecipe>
    <task name="/distribution/task" role="ROLE_A">
    </task>
  </guestrecipe>
  <guestrecipe>
    <task name="/distribution/task" role="ROLE_A">
    </task>
  </guestrecipe>
  <task name="/distribution/task" role="ROLE_B">
  </task>
</recipe>

The result from environment is:
Guests: 
ROLE_A="guest1.hostname guest2.hostname"
Host:
ROLE_A="guest1.hostname guest2.hostname"
ROLE_B="host1.hostname"


The information about the Host role is missing completely on the guests. Do you think that is related to this bug, too?
Comment 6 Dan Callaghan 2013-06-25 18:59:51 EDT
(In reply to Tomas Rusnak from comment #5)
> The information about Host role is missing completely on guests. Do you
> think, that is related to this bug, too?

Yes, that is exactly the bug (or mis-feature :-).

One workaround is to use recipe roles instead, which are visible across all guests and hosts in the recipe set.
Comment 7 Luigi Toscano 2013-06-26 06:09:23 EDT
(In reply to Dan Callaghan from comment #6)

> One workaround is to use recipe roles instead, which are visible across all
> guests and hosts in the recipe set.

Is the content of RECIPE_MEMBERS ordered?
In some tests we rely on a special pattern for roles to properly order the list of hosts (not just CLIENTS/SERVERS).
While waiting for a fix, if the first host in RECIPE_MEMBERS is always the hypervisor, then that should be enough for a simple workaround. Otherwise we need a more complex workaround (i.e. passing more parameters).
Comment 8 Dan Callaghan 2013-07-11 02:01:17 EDT
(In reply to Luigi Toscano from comment #7)

The hostnames in role environment variables will be in the same order as the recipes in the recipe set. But unfortunately RECIPE_MEMBERS passes through a Python set, so its order is arbitrary.
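
Because the order of RECIPE_MEMBERS is arbitrary, a test that needs a predictable order has to impose one itself. The following is a minimal sketch of that idea, not part of Beaker: the `ordered_members` helper and its `first` parameter are hypothetical, and the caller has to know by some other means which hostname should come first.

```python
from typing import List, Optional


def ordered_members(recipe_members: str, first: Optional[str] = None) -> List[str]:
    """Return the hostnames from a RECIPE_MEMBERS-style string in a stable order.

    The input order is ignored: names are de-duplicated and sorted, and `first`
    (if given and present in the list) is moved to the front.
    """
    names = sorted(set(recipe_members.split()))
    if first in names:
        names.remove(first)
        names.insert(0, first)
    return names


if __name__ == "__main__":
    import os
    print(ordered_members(os.environ.get("RECIPE_MEMBERS", "")))
```

Every machine in the recipe set that calls the helper with the same arguments gets the same list, regardless of the order in which the hostnames arrived.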
Comment 9 Tomas Rusnak 2013-09-20 08:59:52 EDT
Our typical use case is to preconfigure the environment with one task first, then run another test on top of it. Both tasks have different variables (per task). Setting a global variable per recipe is therefore not a proper workaround for us.

This issue is blocking our test cases. Is there any estimate of when this will be fixed, so that we don't have to write our own code to work around it?
Comment 10 Raymond Mancy 2013-09-23 03:17:34 EDT
I'm not sure if it quite meets your requirement, but you can use 'HYPERVISOR_HOSTNAME' to get the hypervisor's hostname from within the guest.
Comment 11 Eric Sammons 2013-10-14 09:07:06 EDT
I'm raising the severity/priority of this issue as it is blocking my team. While there appears to be a workaround, it will take the team 2-3 weeks to apply and will result in tests taking even longer to run/complete than they already do.

Ideally, we'd like to see this addressed as soon as possible.
Comment 12 Nick Coghlan 2013-10-16 01:18:44 EDT
Unfortunately, it isn't simply a matter of changing the behaviour to make host roles visible on guest systems. As Bill indicated above, existing tests may be relying on the host's task role *not* being visible to the guests (for example, if the guests are using a role to coordinate scheduling through rhts-sync-set and rhts-sync-block, then adding the host with the same role to that mix could result in the rhts-sync-block waiting until the external watchdog times out).

That means resolving this issue on the Beaker side requires designing and deploying a mechanism to opt in to sharing the host's task role with the guests.

That's certainly possible, but it wasn't something we were planning on doing any time soon.

I'm not clear on why a workaround like setting a "hypervisor role" attribute in the task parameters and using $HYPERVISOR_HOSTNAME to adjust the role definitions appropriately would result in the tests taking longer to run though. (I don't know how many tests are affected, so I can't assess the projected implementation time for the incorporation of such a workaround).
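
The workaround suggested above could look roughly like the following on the guest side. This is a sketch under assumptions, not a tested Beaker recipe: `HYPERVISOR_ROLE` is a hypothetical task parameter name chosen for illustration, and `add_hypervisor_role` is a hypothetical helper; only `HYPERVISOR_HOSTNAME` is taken from this thread.

```python
from typing import Dict, Mapping


def add_hypervisor_role(env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of `env` with the hypervisor added to the role named by HYPERVISOR_ROLE.

    HYPERVISOR_ROLE is a hypothetical task parameter; HYPERVISOR_HOSTNAME is the
    variable mentioned in this thread.
    """
    result = dict(env)
    role = env.get("HYPERVISOR_ROLE")
    hypervisor = env.get("HYPERVISOR_HOSTNAME")
    if not role or not hypervisor:
        return result  # nothing to adjust, e.g. when running on the host itself
    members = result.get(role, "").split()
    if hypervisor not in members:
        members.append(hypervisor)
    result[role] = " ".join(members)
    return result


if __name__ == "__main__":
    import os
    adjusted = add_hypervisor_role(os.environ)
    print(adjusted.get(adjusted.get("HYPERVISOR_ROLE", ""), ""))
```

A guest task would call this once at start-up and read its role variables from the returned mapping instead of directly from the environment.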
Comment 14 Tomas Rusnak 2013-10-16 09:48:17 EDT
Let me clarify our requirements by describing the standard situation, where you are *not* using virtualization and you have 4 systems deployed. The same task runs on all deployed systems. On each system you can specify a different ROLE for that task and synchronize with rhts-sync-block (necessary when you want to test a multihost environment such as a cluster, grid, ...).

For example, every system in the recipe set gets the same environment variables, like:

ROLE1="host1 host2"
ROLE2="host3 host4"

Now imagine that you want to test a cluster with virtualization and fencing. The task runs on all GUESTs and on the HOST too, because you need to configure the cluster service on each GUEST and fencing on the HOST. If the environment is incomplete, it is impossible to synchronize the GUESTs' tasks with the task on the HOST system.

The requested feature is to have the same environment as on systems deployed without virtualization. That means all systems in the recipe set must have the complete environment. In our case, for each task the environment must contain the ROLEs of all GUESTs as well as the HOST's ROLE.

In general, all task parameters must be available on all systems in the recipe set which are running the same task. Then we can synchronize and do whatever other magic we need.

The workaround with HYPERVISOR_HOSTNAME is not enough. The main problem is that you cannot tell how many hostnames, and which ones, you need to synchronize. Each GUEST would know its HYPERVISOR, but how many other GUESTs are in the recipe set, and what ROLE are they running? The same problem applies to the other parameters set for a task.

In my view, it should not matter whether you are using virtualization or not. The environment should be the same in both cases for task parameters (not only the ROLEs).
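
The dependency on a complete environment can be made concrete with a short sketch. It assumes the usual `rhts-sync-block -s STATE host...` invocation; the `sync_block_command` helper, the READY state name, and the hostnames are hypothetical, and the command is only built here, not executed.

```python
from typing import List, Mapping, Sequence


def sync_block_command(env: Mapping[str, str], roles: Sequence[str],
                       state: str, me: str) -> List[str]:
    """Build an rhts-sync-block argument list covering every other host in `roles`."""
    peers: List[str] = []
    for role in roles:
        # A role variable that is missing from the environment contributes no peers.
        for host in env.get(role, "").split():
            if host != me and host not in peers:
                peers.append(host)
    return ["rhts-sync-block", "-s", state] + peers


if __name__ == "__main__":
    complete = {"ROLE1": "host1 host2", "ROLE2": "host3 host4"}
    incomplete = {"ROLE1": "host1 host2"}  # ROLE2 not visible on this system
    print(sync_block_command(complete, ["ROLE1", "ROLE2"], "READY", "host1"))
    print(sync_block_command(incomplete, ["ROLE1", "ROLE2"], "READY", "host1"))
```

With the incomplete environment the task has no way of knowing that host3 and host4 exist, so it cannot wait for them.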
Comment 15 Nick Coghlan 2013-10-16 22:48:40 EDT
Yes, we know the current behaviour is highly arguable and we *do* want to change it eventually (or at least provide the option to make task roles work more like recipe roles). However, we need to be careful with how we do that, since existing tests may be relying on the current (admittedly odd) behaviour.

The other alternative (as Dan mentioned earlier) is to use *recipe* roles, since those *are* properly shared across all systems. To expand on the example posted above:

<recipe role="ROLE_D">
  <guestrecipe role="ROLE_C">
    <task name="/distribution/task" role="ROLE_A">
    </task>
  </guestrecipe>
  <guestrecipe role="ROLE_C">
    <task name="/distribution/task" role="ROLE_A">
    </task>
  </guestrecipe>
  <task name="/distribution/task" role="ROLE_B">
  </task>
</recipe>

The result from the environment should then be:
Guests: 
ROLE_A="guest1.hostname guest2.hostname"
ROLE_C="guest1.hostname guest2.hostname"
ROLE_D="host1.hostname"
Host:
ROLE_A="guest1.hostname guest2.hostname"
ROLE_B="host1.hostname"
ROLE_C="guest1.hostname guest2.hostname"
ROLE_D="host1.hostname"

Beaker's multihost testing support *really* leans in the direction of hosts maintaining a consistent role for the duration of the recipe (see http://beaker-project.org/docs/user-guide/multihost.html). While task-specific roles are also supported, they're not quite the same thing (as Bill noted above, they were added to support particular kinds of virtualisation testing, while recipe roles are the preferred mechanism for general multihost testing).
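
The recipe-role part of the example above can be reproduced with a short script. It is a sketch only: it parses the abbreviated XML fragment from this comment (not a complete Beaker job), the `recipe_role_env` function is hypothetical, and the hostnames are supplied by the caller because the real ones are only known once systems are assigned.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Abbreviated fragment from the comment above, not a complete job definition.
JOB_XML = """
<recipe role="ROLE_D">
  <guestrecipe role="ROLE_C">
    <task name="/distribution/task" role="ROLE_A"/>
  </guestrecipe>
  <guestrecipe role="ROLE_C">
    <task name="/distribution/task" role="ROLE_A"/>
  </guestrecipe>
  <task name="/distribution/task" role="ROLE_B"/>
</recipe>
"""


def recipe_role_env(xml_text: str, hostnames: List[str]) -> Dict[str, str]:
    """Map each recipe-level role to the hostnames holding it.

    `hostnames` lists one name per recipe: the host recipe first, then its
    guest recipes in document order.
    """
    root = ET.fromstring(xml_text.strip())
    recipes = [root] + root.findall("guestrecipe")
    env: Dict[str, List[str]] = {}
    for recipe, hostname in zip(recipes, hostnames):
        role = recipe.get("role")
        if role:
            env.setdefault(role, []).append(hostname)
    return {role: " ".join(hosts) for role, hosts in env.items()}


if __name__ == "__main__":
    names = ["host1.hostname", "guest1.hostname", "guest2.hostname"]
    for role, hosts in recipe_role_env(JOB_XML, names).items():
        print(f'{role}="{hosts}"')
```

The resulting ROLE_C and ROLE_D values are the ones that every system in the example is expected to see, host and guests alike.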
Comment 16 Nick Coghlan 2013-11-06 22:44:12 EST
After a recent IRC discussion about this, I think the simplest thing to do is to offer a "share_task_roles" attribute at the recipe set level in the job XML, and offer "--shared-task-roles" and "--split-task-roles" common workflow options in the bkr CLI.

--shared-task-roles would set share_task_roles=True on the recipe sets in the generated job XML and ensure task roles, like recipe roles, are shared across both host and guest recipes.

--split-task-roles would set share_task_roles=False on the recipe sets in the generated job XML, and preserve the status quo, where task roles are separate between hosts and guests.

The default setting for share_task_roles would be configurable through the server configuration files. The upstream default would be to share them, but existing installations could choose to default to the old behaviour.

So we'll definitely fix this, but it isn't yet clear when the change will make it into a release.
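
The precedence described in this proposal (explicit option first, server default otherwise) can be sketched as follows. This only illustrates the proposal as written in this comment: nothing in this thread confirms that these options were ever added to bkr, the parser below is a standalone argparse parser rather than bkr's own, and `resolve_share_task_roles` is a hypothetical name.

```python
import argparse
from typing import List


def resolve_share_task_roles(argv: List[str], server_default: bool) -> bool:
    """Return the effective share_task_roles value for one invocation.

    An explicit --shared-task-roles or --split-task-roles wins; otherwise the
    (hypothetical) server-side default applies.
    """
    parser = argparse.ArgumentParser()
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--shared-task-roles", dest="share",
                       action="store_const", const=True)
    group.add_argument("--split-task-roles", dest="share",
                       action="store_const", const=False)
    parser.set_defaults(share=None)  # None means "not specified on the command line"
    args = parser.parse_args(argv)
    return server_default if args.share is None else args.share


if __name__ == "__main__":
    print(resolve_share_task_roles(["--split-task-roles"], server_default=True))
```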
Comment 17 Dan Callaghan 2013-11-11 02:15:40 EST
(In reply to Nick Coghlan from comment #16)

This seems like a situation where the cost for Beaker to maintain compatibility is much higher than the cost for users to adapt to the new behaviour...
Comment 18 Dan Callaghan 2014-11-12 03:03:45 EST
We can use our Teiid virtual database to check how many recipe sets in Red Hat's Beaker have used the same task role in host and guest. We filter out empty string and 'None' (which is what Beaker produces if you omit the XML attribute, and which beah ignores).

I *think* we can safely filter out STANDALONE as well... that is actually a role like any other, but by convention it's the role you use when you *aren't* doing any syncing. It's also filled in by default by Beaker in many cases so it occurs very often.

That gives us a reasonably small number of recipes:

public=> SELECT COUNT(DISTINCT host_recipe.id)
         FROM Beaker.recipe_task guest_task
         INNER JOIN recipe guest_recipe ON guest_task.recipe_id = guest_recipe.id
         INNER JOIN machine_guest_map ON machine_guest_map.guest_recipe_id = guest_recipe.id
         INNER JOIN recipe host_recipe ON machine_guest_map.machine_recipe_id = host_recipe.id
         INNER JOIN recipe_task host_task ON host_task.recipe_id = host_recipe.id
         WHERE guest_task.role = host_task.role
           AND guest_task.role != ''
           AND guest_task.role != 'None'
           AND guest_task.role != 'STANDALONE';
 expr1 
-------
    66
(1 row)

We can also filter out some well-known tasks which we know aren't doing any sync'ing. In these cases the roles were probably just copy-pasted blindly from somewhere.

public=> SELECT COUNT(DISTINCT host_recipe.id)
         FROM Beaker.recipe_task guest_task
         INNER JOIN recipe guest_recipe ON guest_task.recipe_id = guest_recipe.id
         INNER JOIN machine_guest_map ON machine_guest_map.guest_recipe_id = guest_recipe.id
         INNER JOIN recipe host_recipe ON machine_guest_map.machine_recipe_id = host_recipe.id
         INNER JOIN recipe_task host_task ON host_task.recipe_id = host_recipe.id
         WHERE guest_task.role = host_task.role
           AND guest_task.role != ''
           AND guest_task.role != 'None'
           AND guest_task.role != 'STANDALONE'
           AND guest_task.name NOT IN ('/distribution/reservesys', '/distribution/command', '/distribution/install', '/distribution/virt/install', '/distribution/dummy')
           AND host_task.name NOT IN ('/distribution/reservesys', '/distribution/command', '/distribution/install', '/distribution/virt/install', '/distribution/dummy');
 expr1 
-------
     2
(1 row)

So that leaves just one task. It was deleted in early 2014, and looking at its source code it *was* doing multihost sync/block but not related to guest recipes. It wasn't relying on the separation of host and guest roles.

So I think we can safely just change this in the next Beaker release.
Comment 19 Dan Callaghan 2014-11-12 03:45:13 EST
Actually there was a flaw in my queries... we need to consider not only guest recipes having the same task roles as their own host recipes, but *any other* host recipe in the recipe set. So the queries need to be a bit trickier.
Comment 20 Dan Callaghan 2014-11-12 03:54:34 EST
That gives a slightly more problematic number:

public=> SELECT COUNT(DISTINCT guest_recipe.id)
         FROM Beaker.recipe_task guest_task
         INNER JOIN recipe guest_recipe ON guest_task.recipe_id = guest_recipe.id
         INNER JOIN machine_guest_map ON machine_guest_map.guest_recipe_id = guest_recipe.id
         INNER JOIN recipe host_recipe ON machine_guest_map.machine_recipe_id = host_recipe.id
         INNER JOIN recipe host_sibling_recipe ON host_recipe.recipe_set_id = host_sibling_recipe.recipe_set_id
         INNER JOIN recipe_task host_task ON host_task.recipe_id = host_sibling_recipe.id
         WHERE guest_task.role = host_task.role
           AND guest_task.role != ''
           AND guest_task.role != 'None'
           AND guest_task.role != 'STANDALONE'
           AND guest_task.name NOT IN ('/distribution/reservesys', '/distribution/command', '/distribution/install', '/distribution/virt/install', '/distribution/dummy')
           AND host_task.name NOT IN ('/distribution/reservesys', '/distribution/command', '/distribution/install', '/distribution/virt/install', '/distribution/dummy');
 expr1 
-------
  3550
(1 row)

public=> SELECT COUNT(DISTINCT guest_task.name)
         FROM Beaker.recipe_task guest_task
         INNER JOIN recipe guest_recipe ON guest_task.recipe_id = guest_recipe.id
         INNER JOIN machine_guest_map ON machine_guest_map.guest_recipe_id = guest_recipe.id
         INNER JOIN recipe host_recipe ON machine_guest_map.machine_recipe_id = host_recipe.id
         INNER JOIN recipe host_sibling_recipe ON host_recipe.recipe_set_id = host_sibling_recipe.recipe_set_id
         INNER JOIN recipe_task host_task ON host_task.recipe_id = host_sibling_recipe.id
         WHERE guest_task.role = host_task.role
           AND guest_task.role != ''
           AND guest_task.role != 'None'
           AND guest_task.role != 'STANDALONE'
           AND guest_task.name NOT IN ('/distribution/reservesys', '/distribution/command', '/distribution/install', '/distribution/virt/install', '/distribution/dummy')
           AND host_task.name NOT IN ('/distribution/reservesys', '/distribution/command', '/distribution/install', '/distribution/virt/install', '/distribution/dummy');
 expr1 
-------
    89
(1 row)

That's 3550 different recipes to look at, with 89 different tasks...
Comment 21 Dan Callaghan 2014-11-12 04:06:25 EST
(In reply to Dan Callaghan from comment #20)
> That gives a slightly more problematic number:

Never mind, this query was wrong... host_sibling_recipe was not limited to host recipes so it was joining back to the same guest recipe.

The revised query gives the same two results:

public=> SELECT COUNT(DISTINCT guest_recipe.id)
         FROM Beaker.recipe_task guest_task
         INNER JOIN recipe guest_recipe ON guest_task.recipe_id = guest_recipe.id
         INNER JOIN machine_guest_map ON machine_guest_map.guest_recipe_id = guest_recipe.id
         INNER JOIN recipe host_recipe ON machine_guest_map.machine_recipe_id = host_recipe.id
         INNER JOIN recipe host_sibling_recipe ON host_recipe.recipe_set_id = host_sibling_recipe.recipe_set_id
           AND host_sibling_recipe.type = 'machine_recipe'
         INNER JOIN recipe_task host_task ON host_task.recipe_id = host_sibling_recipe.id
         WHERE guest_task.role = host_task.role
           AND guest_task.role != ''
           AND guest_task.role != 'None'
           AND guest_task.role != 'STANDALONE'
           AND guest_task.name NOT IN ('/distribution/reservesys', '/distribution/command', '/distribution/install', '/distribution/virt/install', '/distribution/dummy')
           AND host_task.name NOT IN ('/distribution/reservesys', '/distribution/command', '/distribution/install', '/distribution/virt/install', '/distribution/dummy');
 expr1 
-------
     2
(1 row)

So, unless I have another mistake in my query somewhere, I think we are all good.
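
The join mistake described above can be reproduced on a toy database. The sketch below uses SQLite with a drastically simplified, hypothetical schema that borrows only the table and column names appearing in the queries; it assumes, as the explanation above implies, that a guest recipe carries the same recipe_set_id as its host. The task name `/example/test` is made up, and the task-name filters are omitted for brevity.

```python
import sqlite3

SCHEMA_AND_DATA = """
CREATE TABLE recipe (id INTEGER PRIMARY KEY, recipe_set_id INTEGER, type TEXT);
CREATE TABLE recipe_task (id INTEGER PRIMARY KEY, recipe_id INTEGER, name TEXT, role TEXT);
CREATE TABLE machine_guest_map (machine_recipe_id INTEGER, guest_recipe_id INTEGER);
-- One recipe set (10) containing a host recipe (1) and its guest recipe (2).
INSERT INTO recipe VALUES (1, 10, 'machine_recipe'), (2, 10, 'guest_recipe');
INSERT INTO machine_guest_map VALUES (1, 2);
-- Host and guest use different task roles, so there is no genuine overlap.
INSERT INTO recipe_task VALUES (1, 1, '/example/test', 'SERVERS'),
                               (2, 2, '/example/test', 'CLIENTS');
"""

QUERY = """
SELECT COUNT(DISTINCT guest_recipe.id)
FROM recipe_task guest_task
INNER JOIN recipe guest_recipe ON guest_task.recipe_id = guest_recipe.id
INNER JOIN machine_guest_map ON machine_guest_map.guest_recipe_id = guest_recipe.id
INNER JOIN recipe host_recipe ON machine_guest_map.machine_recipe_id = host_recipe.id
INNER JOIN recipe host_sibling_recipe
    ON host_recipe.recipe_set_id = host_sibling_recipe.recipe_set_id {extra}
INNER JOIN recipe_task host_task ON host_task.recipe_id = host_sibling_recipe.id
WHERE guest_task.role = host_task.role
  AND guest_task.role NOT IN ('', 'None', 'STANDALONE')
"""


def count_overlaps(only_machine_siblings: bool) -> int:
    """Count guest recipes sharing a task role with a 'host' sibling recipe."""
    extra = "AND host_sibling_recipe.type = 'machine_recipe'" if only_machine_siblings else ""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA_AND_DATA)
        return conn.execute(QUERY.format(extra=extra)).fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    print("without type filter:", count_overlaps(False))
    print("with type filter:   ", count_overlaps(True))
```

Without the type filter the guest recipe is matched against its own task, so it is counted even though host and guest roles differ; with the filter the count drops to zero.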
Comment 22 Dan Callaghan 2014-11-12 05:03:42 EST
http://gerrit.beaker-project.org/3480
Comment 25 wangdong 2014-12-05 04:24:07 EST
Created attachment 965023
Client env
Comment 27 Dan Callaghan 2015-04-19 22:22:33 EDT
Beaker 20.0 has been released.
