Description of problem:

I hit a job deadlock with this job:
https://beaker.engineering.redhat.com/jobs/1729599

With manual debugging I found out that the guest's task /kernel/distribution/lnst/install fetched SERVERS and CLIENTS from the host's /distribution/beaker/beah/misc/sync task. As a result, the guest's task stalls in report_result() in rhts-test-runner.sh:

<snip>
rhts-sync-set -s DONE
rhts-sync-block -s DONE $SERVERS $CLIENTS $DRIVER
</snip>

I tried removing this multihost task from the job and it helped:
https://beaker.engineering.redhat.com/jobs/1730043

To show the task layout within the deadlocked job, here is a simplified view with only one host and one guest:

No  host1                                          host1-guest1
-------------------------------------------------------------------------------------------
 1  /distribution/virt/image-install               /distribution/dummy
 2  /distribution/beaker/beah/misc/sync            /distribution/dummy
 3  /distribution/virt/start                       /distribution/dummy
 4  /kernel/networking/virt/add_iface              /distribution/dummy
 5  /kernel/networking/virt/add_iface              /distribution/dummy
 6  /distribution/command                          /distribution/dummy
 7  /distribution/command                          /distribution/dummy
 8  /kernel/distribution/lnst/install              /distribution/dummy
 9  /distribution/command                          /distribution/dummy
10  /distribution/beaker/beah/misc/sync (SERVERS)  /kernel/distribution/lnst/install
11  /kernel/distribution/lnst/prepare-testbed      /kernel/distribution/lnst/prepare-testbed
...

The deadlock arises in this order:

1. The host runs all of its tasks up to task #9, where it runs
   wait4guesttasks /kernel/distribution/lnst/install
2. The guest executes all of its tasks up to task #10, where it runs the task and calls report_finish() from rhts-test-runner.sh. At this point it stalls, because it fetched SERVERS=host1 and blocks in rhts-sync-block.
3. The host waits indefinitely in wait4guesttasks in task #9.

This looks like a bug in how roles are fetched for a task. The roles should be fetched only from tasks of the same name.

Version-Release number of selected component (if applicable):
Beaker-24.0

How reproducible:
100%

Steps to Reproduce:
1. Rerun the job in the description.

Actual results:
Deadlock

Expected results:
No deadlock
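
The three steps in the description form a circular wait. As a rough illustration only (this is not Beaker or rhts code), the sketch below models them as a wait-for graph and looks for a cycle. The node labels and the `find_cycle` helper are made up for this example, and it assumes that host1 would only reach the DONE state for position 10 after getting past its task #9.

```python
def find_cycle(waits_for):
    """Return the nodes of a cycle in a wait-for graph, or None.

    `waits_for` maps each blocked party to the single party it is
    waiting on (a simplification that is enough for this report).
    """
    for start in waits_for:
        path = [start]
        seen = {start}
        node = waits_for.get(start)
        while node is not None:
            if node == start:
                return path          # walked back to where we began
            if node in seen:
                break                # a loop that does not include `start`
            seen.add(node)
            path.append(node)
            node = waits_for.get(node)
    return None


# Hypothetical labels describing the situation from the report.
waits_for = {
    # Step 1/3: host sits in wait4guesttasks until the guest's install finishes.
    "host1 #9 wait4guesttasks": "guest1 #10 lnst/install",
    # Step 2: guest blocks in rhts-sync-block until host1 (SERVERS) is DONE.
    "guest1 #10 lnst/install": "host1 #10 sync",
    # Assumption: host1 cannot start task #10 before task #9 returns.
    "host1 #10 sync": "host1 #9 wait4guesttasks",
}

print(find_cycle(waits_for))
```

If any one edge is removed (for example, if the guest's task does not block on host1), `find_cycle` returns None, which matches the observation that dropping the multihost task from the job avoided the deadlock.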

(In reply to Jan Tluka from comment #0)
> This looks like a bug in roles fetching for a task. The roles should be
> fetched from the tasks of same name only.

What makes you say this?

The roles have always been based on TESTORDER: each task sees the role variables for the corresponding task in the same position in each recipe. It has never been limited to tasks of the same name, as far as I know.

In position 10 of your example you have /distribution/beaker/beah/misc/sync in the host recipe. Can you just use /distribution/dummy there instead, with role=STANDALONE, to avoid the syncing behaviour? And then let them sync at the start of task 11?
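
To make the position-based lookup concrete, here is a minimal sketch of the behaviour described above. It is a toy model and not Beaker's implementation: the `peer_roles` function and the data layout are invented for illustration, and roles the report does not state are left as None.

```python
from collections import defaultdict


def peer_roles(recipes, position):
    """Map role -> hostnames for the tasks at `position` in every recipe.

    Task names are deliberately ignored: only the position in the
    recipe matters, as described in the comment above.
    """
    roles = defaultdict(list)
    for host, tasks in recipes.items():
        if position < len(tasks):
            _name, role = tasks[position]
            if role is not None:
                roles[role].append(host)
    return dict(roles)


# Trimmed-down layout: only positions 9 and 10 from the report,
# stored here at indexes 0 and 1. None means "role not stated".
recipes = {
    "host1": [
        ("/distribution/command", None),
        ("/distribution/beaker/beah/misc/sync", "SERVERS"),
    ],
    "host1-guest1": [
        ("/distribution/dummy", None),
        ("/kernel/distribution/lnst/install", None),
    ],
}

# The guest's lnst/install shares index 1 with the host's sync task,
# so it sees SERVERS=host1 even though the task names differ.
print(peer_roles(recipes, 1))
```

In this model the guest's /kernel/distribution/lnst/install picks up SERVERS=host1 purely because of where it sits in the recipe, which is the behaviour that surprised the reporter.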

(In reply to Dan Callaghan from comment #1)
> (In reply to Jan Tluka from comment #0)
> > This looks like a bug in roles fetching for a task. The roles should be
> > fetched from the tasks of same name only.
>
> What makes you say this?

Hi Dan, I think it comes down to a lack of deeper documentation for this feature. I'm not sure why I thought it worked this way. Anyway ...

> The roles have always been based on the TESTORDER, each task will see the
> role variables for the corresponding task in the same position in each
> recipe. It has never been limited to tasks of the same name as far as I know.
>
> In position 10 of your example you have /distribution/beaker/beah/misc/sync
> in the host recipe. Can you just use /distribution/dummy there instead with
> role=STANDALONE to avoid the syncing behaviour? And then let them sync on
> the start of task 11?

Thanks for explaining and pointing me to TESTORDER. Do I understand correctly that adding role=STANDALONE turns off the sync feature for a task?

Maybe it's enough to add role=STANDALONE to the lnst/install task. The beah/misc/sync task syncs with another task on the second bare-metal host.

(In reply to Jan Tluka from comment #2)
> Maybe it's just enough to add role=STANDALONE to lnst/install task. The
> beah/misc/sync task syncs with other task on second baremetal host.

Yeah, this sounds like the right approach if you don't actually want that task to sync with the others.