Bug 1147807 - beah can get stuck in systemd dependency cycle
Summary: beah can get stuck in systemd dependency cycle
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Beaker
Classification: Retired
Component: beah
Version: 0.18
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: 19.0
Assignee: Dan Callaghan
QA Contact: tools-bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-09-30 06:18 UTC by Dan Callaghan
Modified: 2018-02-06 00:41 UTC
CC List: 5 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2014-11-25 07:18:47 UTC
Embargoed:



Description Dan Callaghan 2014-09-30 06:18:11 UTC
Description of problem:
Beah can get stuck in systemd dependency cycle on Fedora 20 under some circumstances.

Version-Release number of selected component (if applicable):
beah-0.7.6-1.fc19.noarch
systemd-208-9.fc20.x86_64

How reproducible:
We found this while testing bug 1108455, booting the Fedora 20 cloud image and then installing beah into it. The problem does not seem to reproduce in ordinary installations of Fedora 20.

Steps to Reproduce:
1. As per bug 1108455, boot Fedora-x86_64-20-20140407-sda.qcow2 in a guest and install beah, then reboot.

Actual results:
The system boots, but beah does not start and so the recipe is not run. systemd seems to be stuck in a dependency cycle:

[root@dhcp70-216 ~]# systemctl list-jobs
JOB UNIT                                 TYPE  STATE  
 88 multi-user.target                    start waiting
145 systemd-readahead-done.timer         start waiting
148 beah-beaker-backend.service          start waiting
156 beah-srv.service                     start waiting
157 beah-fwd-backend.service             start waiting
168 systemd-update-utmp-runlevel.service start waiting

All six jobs are waiting for nothing (or each other, I suspect).
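One way to recognise this kind of stall is to look for queued jobs with none of them actually running. As comment 6 points out, a unit that is still starting (such as a slow cloud-init service) shows up in systemctl list-jobs as running. The Python sketch below parses output in the format shown above and flags the case where every job is waiting. It assumes the four-column JOB/UNIT/TYPE/STATE layout seen with systemd-208. The helper names are hypothetical and are not part of beah or Beaker.

```python
# Hypothetical helper (not part of beah or Beaker): detect a boot where
# systemd has queued jobs but none of them is running.
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    job_id: int
    unit: str
    job_type: str
    state: str


def parse_list_jobs(output: str) -> List[Job]:
    """Parse 'systemctl list-jobs' output.

    Assumes the four-column JOB UNIT TYPE STATE layout seen with
    systemd-208; header, blank and summary lines are skipped.
    """
    jobs = []
    for line in output.splitlines():
        fields = line.split()
        # Job rows have exactly four fields and start with a numeric job ID.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        job_id, unit, job_type, state = fields
        jobs.append(Job(int(job_id), unit, job_type, state))
    return jobs


def looks_stalled(jobs: List[Job]) -> bool:
    """True if jobs are queued but none is in the 'running' state."""
    return bool(jobs) and all(job.state != "running" for job in jobs)


# Output copied from the report above.
SAMPLE = """\
JOB UNIT                                 TYPE  STATE
 88 multi-user.target                    start waiting
145 systemd-readahead-done.timer         start waiting
148 beah-beaker-backend.service          start waiting
156 beah-srv.service                     start waiting
157 beah-fwd-backend.service             start waiting
168 systemd-update-utmp-runlevel.service start waiting
"""

if __name__ == "__main__":
    jobs = parse_list_jobs(SAMPLE)
    print(len(jobs), looks_stalled(jobs))  # prints: 6 True
```

Jobs can wait briefly during a perfectly normal boot, so a check like this is only meaningful if the all-waiting state persists across repeated samples.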

Expected results:
System boots, beah starts, runs recipe.

Additional info:
There are some differences in the cloud image vs. a normal installation, which might be triggering this bug:
* default.target is multi-user.target, not graphical.target
* systemd-readahead stuff is disabled
* possibly other services missing or disabled (not yet confirmed)

Comment 1 Dan Callaghan 2014-09-30 06:21:25 UTC
The cause seems to be the (undocumented) hack we used for making beah start after readahead collection in bug 1072284. Reverting that hack allows all services to start normally.

Given that the readahead hack has proven quite difficult to get right, my plan is to revert it and update Beaker to disable systemd-readahead for Beaker recipes, the same as we disable the readahead service on RHEL6. Upstream systemd has already removed systemd-readahead in its next release, and it will presumably be dropped from Fedora as well.

Comment 2 Dan Callaghan 2014-09-30 06:27:34 UTC
Unfortunately removing After=multi-user.target will effectively remove the workaround which was allowing beah to avoid bug 967502 (EIO from /dev/console, due to racing with plymouth) so we will need to fix that too.
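Comment 2 notes that beah's units were ordered After=multi-user.target. An ordering cycle of the kind suspected in the description can be found with an ordinary depth-first search over the "starts after" graph. The Python sketch below runs such a search on a hypothetical two-edge graph: the first edge mirrors the After=multi-user.target ordering mentioned here, while the reverse edge is invented purely for illustration, since the bug does not identify the exact cycle.

```python
# Generic DFS cycle detection over a "unit A starts after unit B" graph.
# The graph below is hypothetical; it is not the actual beah unit set.
from typing import Dict, List, Optional

WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished


def find_ordering_cycle(after: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle (first unit repeated at the end), or None.

    after[a] lists the units that a is ordered After=.
    """
    colour: Dict[str, int] = {}
    path: List[str] = []

    def visit(unit: str) -> Optional[List[str]]:
        colour[unit] = GREY
        path.append(unit)
        for dep in after.get(unit, []):
            state = colour.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path, so we have looped back.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        colour[unit] = BLACK
        return None

    for unit in list(after):
        if colour.get(unit, WHITE) == WHITE:
            cycle = visit(unit)
            if cycle:
                return cycle
    return None


HYPOTHETICAL_ORDERING = {
    # Mirrors comment 2: beah was ordered After=multi-user.target.
    "beah-srv.service": ["multi-user.target"],
    # Hypothetical reverse edge, for illustration only.
    "multi-user.target": ["beah-srv.service"],
}

if __name__ == "__main__":
    print(find_ordering_cycle(HYPOTHETICAL_ORDERING))
    # prints: ['beah-srv.service', 'multi-user.target', 'beah-srv.service']
```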

Comment 3 Dan Callaghan 2014-09-30 06:44:29 UTC
Beah patch to revert the ordering stuff:
http://gerrit.beaker-project.org/3360

Beaker patch to disable systemd-readahead collection:
http://gerrit.beaker-project.org/3361

Comment 5 Amit Saha 2014-10-01 00:56:38 UTC
(In reply to Dan Callaghan from comment #0)
> Description of problem:
> Beah can get stuck in systemd dependency cycle on Fedora 20 under some
> circumstances.
> 
> Version-Release number of selected component (if applicable):
> beah-0.7.6-1.fc19.noarch
> systemd-208-9.fc20.x86_64
> 
> How reproducible:
> We found this while testing bug 1108455, booting the Fedora 20 cloud image
> and then installing beah into it. The problem does not seem to reproduce in
> ordinary installations of Fedora 20.

I believe the issue that xma found, and that you saw as well, is caused by one of the cloud services still running, which prevents multi-user.target from being reached. For example, the cloud-final service once took a while to complete, and so reaching multi-user.target was delayed too. On the other hand, on another try, I could get the beah services running almost instantly upon boot with the Fedora 20 cloud image.

I am pretty sure this will *not* be seen all the time, but only when cloud-final or one of the other services needs a while to finish whatever it is doing. I think that explains why mjia didn't come across it.

Here is a quick script to try out fedora cloud with beah installed: http://fpaste.org/137975/12124481/

Hence, while removing the dependency hacks on systemd-readahead-done does no harm (for various reasons), I think we may still see this happen even after that.

Comment 6 Dan Callaghan 2014-10-01 01:03:41 UTC
(In reply to Amit Saha from comment #5)

In the cases where systemd got stuck, the cloud-init services had already finished starting. Otherwise their start jobs would have been listed by systemctl list-jobs as running. I think the problem is caused by something else.

Comment 9 Dan Callaghan 2014-11-25 07:18:47 UTC
Beaker 19.0 has been released.

