Bug 1426210 - peer roles should not include roles of different tasks
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Beaker
Classification: Retired
Component: scheduler
Version: develop
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: beaker-dev-list
QA Contact: tools-bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-02-23 12:26 UTC by Jan Tluka
Modified: 2019-07-29 17:01 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-07-29 16:58:10 UTC
Embargoed:



Description Jan Tluka 2017-02-23 12:26:43 UTC
Description of problem:

I hit a job deadlock with job: https://beaker.engineering.redhat.com/jobs/1729599

With some manual debugging I found that the guest's task /kernel/distribution/lnst/install fetched SERVERS and CLIENTS from the host's /distribution/beaker/beah/misc/sync task. As a result, the guest's task stalls in
report_result() in rhts-test-runner.sh:
<snip>
    rhts-sync-set -s DONE
    rhts-sync-block -s DONE $SERVERS $CLIENTS $DRIVER
</snip>

I tried removing this multihost task from the job, and that helped:
https://beaker.engineering.redhat.com/jobs/1730043

Here is the task layout within the deadlocked job. For simplicity,
I'm showing only one host and one guest:

No | host1                                         | host1-guest1
---+-----------------------------------------------+------------------------------------------
1  | /distribution/virt/image-install              | /distribution/dummy
2  | /distribution/beaker/beah/misc/sync           | /distribution/dummy
3  | /distribution/virt/start                      | /distribution/dummy
4  | /kernel/networking/virt/add_iface             | /distribution/dummy
5  | /kernel/networking/virt/add_iface             | /distribution/dummy
6  | /distribution/command                         | /distribution/dummy
7  | /distribution/command                         | /distribution/dummy
8  | /kernel/distribution/lnst/install             | /distribution/dummy
9  | /distribution/command                         | /distribution/dummy
10 | /distribution/beaker/beah/misc/sync (SERVERS) | /kernel/distribution/lnst/install
11 | /kernel/distribution/lnst/prepare-testbed     | /kernel/distribution/lnst/prepare-testbed
...

The deadlock comes in this order:

1. The host runs all of its tasks up to task #9, where it runs
   wait4guesttasks /kernel/distribution/lnst/install
2. The guest executes all of its tasks up to task #10, where it runs the task
   and executes report_finish() from rhts-test-runner.sh.
   At this point it stalls, since it fetched SERVERS=host1 and blocks in rhts-sync-block.
3. The host stays in wait4guesttasks of command #9 indefinitely.
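
The ordering above amounts to a circular wait, which can be modelled as a small wait-for graph. This is only an illustrative sketch based on my reading of the steps in this report: the `waits_for` mapping and the `find_cycle` helper are hypothetical names and are not part of Beaker or rhts.

```python
# Hypothetical wait-for graph built from the steps in this report:
# each key is blocked until the recipe named by its value makes progress.
waits_for = {
    "host1": "host1-guest1",   # step 1: wait4guesttasks in task #9
    "host1-guest1": "host1",   # step 2: rhts-sync-block on SERVERS=host1 in task #10
}


def find_cycle(waits_for, start):
    """Follow wait-for edges from `start`; return the cycle as a list, or None."""
    seen = []
    node = start
    while node in waits_for and node not in seen:
        seen.append(node)
        node = waits_for[node]
    if node in seen:
        # We came back to a recipe we already visited: circular wait.
        return seen[seen.index(node):] + [node]
    return None


print(find_cycle(waits_for, "host1"))
# ['host1', 'host1-guest1', 'host1']
```

Removing either edge (for example, the guest no longer blocking on host1) leaves no cycle, which matches the observation that the job without the multihost task did not deadlock.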

This looks like a bug in how roles are fetched for a task. The roles should be fetched only from tasks of the same name.

Version-Release number of selected component (if applicable):
Beaker-24.0

How reproducible:
100%

Steps to Reproduce:
1. rerun the job in description

Actual results:
Deadlock

Expected results:
No deadlock

Additional info:

Comment 1 Dan Callaghan 2017-12-06 06:45:18 UTC
(In reply to Jan Tluka from comment #0)
> This looks like a bug in roles fetching for a task. The roles should be
> fetched from the tasks of same name only.

What makes you say this?

The roles have always been based on the TESTORDER: each task sees the role variables for the corresponding task in the same position in each recipe. It has never been limited to tasks of the same name, as far as I know.

In position 10 of your example you have /distribution/beaker/beah/misc/sync in the host recipe. Can you just use /distribution/dummy there instead, with role=STANDALONE, to avoid the syncing behaviour? And then let them sync at the start of task 11?
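
To illustrate the positional matching, here is a rough sketch in Python. It is not Beaker's actual code: `peer_roles` and the `recipes` layout are hypothetical, only positions 9 and 10 from the table in comment #0 are included, and roles the report does not state are left as `None`.

```python
def peer_roles(recipes, position):
    """Group hostnames by the role of the task at `position` in every recipe.

    Task names are ignored on purpose: only the position matters.
    """
    roles = {}
    for host, tasks in recipes.items():
        task = tasks.get(position)
        if task is None:
            continue
        _name, role = task
        if role is not None:
            roles.setdefault(role, []).append(host)
    return roles


# Hypothetical excerpt of the job layout: position -> (task name, role).
# None means the report does not state a role for that task.
recipes = {
    "host1": {
        9: ("/distribution/command", None),
        10: ("/distribution/beaker/beah/misc/sync", "SERVERS"),
    },
    "host1-guest1": {
        9: ("/distribution/dummy", None),
        10: ("/kernel/distribution/lnst/install", None),
    },
}

print(peer_roles(recipes, 10))
# {'SERVERS': ['host1']}
```

In this sketch the guest's lnst/install task at position 10 sees SERVERS=host1 even though the host task at that position has a different name, which is the behaviour reported in comment #0.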

Comment 2 Jan Tluka 2017-12-06 07:54:29 UTC
(In reply to Dan Callaghan from comment #1)
> (In reply to Jan Tluka from comment #0)
> > This looks like a bug in roles fetching for a task. The roles should be
> > fetched from the tasks of same name only.
> 
> What makes you say this?

Hi Dan,

I think it was the lack of deeper documentation for this feature. I'm not sure why I thought it worked this way. Anyway ...
 
> The roles have always been based on the TESTORDER, each task will see the
> role variables for the corresponding task in the same position in each
> recipe. It has never been limited to tasks of the same name as far as I know.
> 
> In position 10 of your example you have /distribution/beaker/beah/misc/sync
> in the host recipe. Can you just use /distribution/dummy there instead with
> role=STANDALONE to avoid the syncing behaviour? And then let them sync on
> the start of task 11?


Thanks for explaining and pointing me to TESTORDER. Do I understand correctly that adding role=STANDALONE would turn off the sync feature for a task?

Maybe it's enough to add role=STANDALONE to the lnst/install task. The beah/misc/sync task syncs with another task on the second bare-metal host.

Comment 3 Dan Callaghan 2017-12-07 06:27:09 UTC
(In reply to Jan Tluka from comment #2)
> Maybe it's just enough to add role=STANDALONE to lnst/install task. The
> beah/misc/sync task syncs with other task on second baremetal host.

Yeah, this sounds like the right approach if you don't actually want that task to sync with the others.

