Bug 1188664 - beah sometimes starts before the network is ready, causing the recipe to hang
Summary: beah sometimes starts before the network is ready, causing the recipe to hang
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Beaker
Classification: Retired
Component: beah
Version: 19
Hardware: Unspecified
OS: Unspecified
Target Milestone: 21.0
Assignee: Dan Callaghan
QA Contact: tools-bugs
URL:
Whiteboard:
Depends On:
Blocks: 1084527 1085937
Reported: 2015-02-03 13:51 UTC by Pavel Holica
Modified: 2018-02-06 00:41 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-08-26 06:17:35 UTC
Embargoed:


Comment 3 Pavel Holica 2015-02-09 12:00:48 UTC
After some debugging, I found that beah starts before the network is up and before /etc/resolv.conf contains any nameservers. As a result, beah tries to resolve names through a DNS server on localhost, which is not running.

The same issue applies to the grosse harness (both are written in Python and use the same resolver code).
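
To illustrate the failure mode described above, here is a minimal sketch in Python. It is not Twisted's or beah's actual code: the function names and the fallback rule are assumptions that only model the reported behaviour (no nameservers in /etc/resolv.conf means queries go to localhost).

```python
LOCALHOST_DNS = ("127.0.0.1", 53)


def nameservers_from_resolv_conf(text):
    """Return (address, port) pairs for each 'nameserver' line in text."""
    servers = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, comments and other directives such as 'search'.
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append((fields[1], 53))
    return servers


def effective_servers(text):
    # Hypothetical fallback rule modelling the behaviour in this report:
    # with no nameservers configured, queries go to a local DNS server.
    return nameservers_from_resolv_conf(text) or [LOCALHOST_DNS]


# Example file contents (made up for illustration).
early_boot = "# Generated by NetworkManager\n"
after_dhcp = "# Generated by NetworkManager\nnameserver 192.0.2.1\n"

print(effective_servers(early_boot))
print(effective_servers(after_dhcp))
```

If the file is read only once at startup, a process that started in the "early boot" state keeps using the localhost fallback even after the nameserver line appears.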

Looking at: http://www.freedesktop.org/software/systemd/man/systemd.unit.html
The unit files should probably have both After= and Requires= (or Wants=).

I checked whether NetworkManager-wait-online.service is enabled, and it is not, so we are facing a race condition here. The reason it shows up on ARM is that the system is slow :) On another ARM system I am not seeing this issue.

# systemctl is-enabled NetworkManager-wait-online.service
disabled

# systemctl status NetworkManager-wait-online.service 
NetworkManager-wait-online.service - Network Manager Wait Online
   Loaded: loaded (/usr/lib/systemd/system/NetworkManager-wait-online.service; disabled)
   Active: inactive (dead)

Content of the current beah-beaker-backend.service file:
[Unit]
Description=The Beaker backend server.
After=network.target NetworkManager-wait-online.service time-sync.target

[Service]
Type=simple
ExecStart=/usr/bin/beah-beaker-backend

[Install]
WantedBy=multi-user.target

Comment 4 Pavel Holica 2015-02-09 14:22:11 UTC
A workaround for beah is to add the following lines to the kickstart:
%post
systemctl enable NetworkManager-wait-online.service
%end

Comment 5 Nick Coghlan 2015-02-10 02:47:49 UTC
Thanks for the explanation Pavel - we'll aim to get this into the next harness release.

Comment 6 Dan Callaghan 2015-02-13 05:50:18 UTC
We can't just unconditionally require and/or enable NetworkManager-wait-online.service because the recipe might not be using NetworkManager...

The real fix is to make beah use a sane resolver (i.e. the glibc one) that actually obeys changes in /etc/resolv.conf instead of Twisted's flawed reimplementation. Sigh.
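
As a rough sketch of what going through the system resolver could look like from Python: socket.getaddrinfo calls the platform's getaddrinfo(). The retry loop, the function name and its parameters are assumptions added for illustration, not something proposed in this bug, and whether a later call picks up changes to /etc/resolv.conf depends on the libc in use.

```python
import socket
import time


def resolve_with_retry(host, port, attempts=5, delay=1.0,
                       resolve=socket.getaddrinfo, sleep=time.sleep):
    """Resolve host via the system resolver, retrying on failure.

    'resolve' and 'sleep' are injectable only so that the sketch can be
    exercised without a network; both default to the stdlib functions.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return resolve(host, port)
        except socket.gaierror as exc:
            # Resolution failed, e.g. because no nameserver is reachable yet.
            last_error = exc
            if attempt < attempts - 1:
                sleep(delay)
    raise last_error
```

A caller would use something like resolve_with_retry("lab-controller.example.com", 8000), where the host name and port are placeholders.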

Comment 7 Pavel Holica 2015-02-13 08:22:11 UTC
Just to make the decision easier: NM is the default in RHEL-7 and is even in the core group, so a test that doesn't want to use NM already needs to take extra steps to disable it (either masking it or disabling it in the ifcfg script). One additional step of masking NetworkManager-wait-online.service in such tests shouldn't be a problem.

I'm aware that this may break some tests, but noting this change in the release notes should be OK.

Comment 8 Dan Callaghan 2015-04-07 03:58:45 UTC
After reading http://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/ I think the right answer is actually for the beah services to have:

After=network-online.target
Wants=network-online.target

That way it is independent of the network management daemon, *and* it doesn't impact any other services on the system (which enabling NetworkManager-wait-online.service would).
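
For illustration, a small Python check of what such a unit could look like. The unit text is an assumption: it combines the beah-beaker-backend.service content quoted in comment 3 with the two lines above, and may differ from the patch that was actually merged. The helper name is hypothetical.

```python
import configparser

# Assumed unit content: comment 3's unit with the two proposed lines applied.
UNIT_TEXT = """\
[Unit]
Description=The Beaker backend server.
After=network-online.target time-sync.target
Wants=network-online.target

[Service]
Type=simple
ExecStart=/usr/bin/beah-beaker-backend

[Install]
WantedBy=multi-user.target
"""


def orders_after_and_wants(unit_text, target="network-online.target"):
    """Return True if [Unit] lists target in both After= and Wants=."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep systemd's case-sensitive key names
    parser.read_string(unit_text)
    unit = parser["Unit"]
    after = unit.get("After", "").split()
    wants = unit.get("Wants", "").split()
    return target in after and target in wants


print(orders_after_and_wants(UNIT_TEXT))
```

Note that configparser is only an approximation of systemd's parser: real unit files may repeat a key within a section, which this sketch does not handle.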

Comment 9 Dan Callaghan 2015-07-16 07:23:48 UTC
http://gerrit.beaker-project.org/4301

Comment 10 Dan Callaghan 2015-07-16 07:51:20 UTC
Test builds are here:
http://galangal.usersys.redhat.com/~dcallagh/bz1188664/

To try it in your RHEL7 recipes:
<repos>
  <repo name="beaker-harness-bz1188664" url="http://galangal.usersys.redhat.com/~dcallagh/bz1188664/" />
</repos>

Comment 13 Dan Callaghan 2015-08-26 06:17:35 UTC
Beaker 21.0 has been released.
