Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1678549

Summary: restraint starts too early for the system to get ready for testing
Product: [Retired] Restraint
Reporter: Jun'ichi NOMURA <junichi.nomura>
Component: general
Assignee: Martin Styk <mastyk>
Status: CLOSED CURRENTRELEASE
QA Contact: tools-bugs <tools-bugs>
Severity: high
Docs Contact:
Priority: unspecified
Version: 0.1.32
CC: asavkov, bpeck, breilly, cbeer, kazuhito.hagio, kueda, mastyk, tatsu-ab1, tumeya
Target Milestone: 0.1.40
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-03-14 12:59:35 UTC
Type: Bug

Description Jun'ichi NOMURA 2019-02-19 04:37:57 UTC
Description of problem:
  restraint starts test programs before the system is ready for testing.

Version-Release number of selected component (if applicable):
  restraint-0.1.32-1.el8+5

How reproducible:
  Not always, but often, depending on the kind of system and tasks.

Steps to Reproduce:
  Run a test job on a system whose network interface is slow to bring its link up.

Expected results:
  Restraint should start test programs only after the system is ready for testing.
  The system should be considered 'ready for testing' once at least the following
  systemd units, if enabled, have started:
    - network-online.target
    - remote-fs.target
    - kdump.service

Actual results:
  Restraint starts test programs before the above-mentioned units have started.
  Examples:
    - A test program could not access expected data on NFS mount points
      listed in /etc/fstab, which should be available once systemd reaches
      remote-fs.target.
    - A crash dump could not be taken for a panic that occurred during the
      test, because restraint had started the task while the kdump service
      was still building its initramfs.

Additional information:
  We confirmed that the problem can be fixed by adding the following directive
  to /usr/lib/systemd/system/restraintd.service:
     After=network-online.target remote-fs.target kdump.service

  Since restraint is the only available harness for RHEL8, the problem
  affects our automated testing of RHEL8 in beaker. We currently apply
  the above workaround using a local beaker snippet.
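
  As a rough sketch of how such a workaround might be scripted, the After=
  directive above could be written as a systemd drop-in rather than edited
  into the packaged unit file, so that it is less likely to be lost on a
  package update. The function name and the file name 10-wait-for-system.conf
  are hypothetical and chosen only for illustration; this is not the beaker
  snippet referred to above.

```shell
# Write a systemd drop-in that orders a unit after the given units.
# usage: write_ordering_dropin DROPIN_DIR UNIT...
# (function name and drop-in file name are illustrative assumptions)
write_ordering_dropin() {
    local dir=$1
    shift
    mkdir -p "$dir" || return 1
    # "$*" joins the remaining arguments with single spaces.
    printf '[Unit]\nAfter=%s\n' "$*" > "$dir/10-wait-for-system.conf"
}

# Example (as root):
#   write_ordering_dropin /etc/systemd/system/restraintd.service.d \
#       network-online.target remote-fs.target kdump.service
#   systemctl daemon-reload
```

  After= only orders the units; it does not pull them in, so a unit that is
  not enabled on the system is not started by this drop-in.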

  The same ordering problem exists in beah though it occurs less often.

Comment 1 Martin Styk 2019-02-19 14:18:42 UTC
I'm not sure if it is a good idea to postpone the execution of restraintd.
Some teams may require restraintd execution as soon as possible.

You could write a task that checks the system's status before further execution (for example, one that actively waits until the NFS mount points are available).
In that case you wouldn't even have to change restraintd.service, or am I missing something?
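
A minimal sketch of what such a waiting task might look like, assuming a simple polling loop with a timeout. The helper name wait_until, the 300-second limit, and the /mnt/nfs path are made up for illustration and are not part of restraint.

```shell
# Poll a command until it succeeds or TIMEOUT seconds have elapsed.
# usage: wait_until TIMEOUT COMMAND [ARG...]
# (hypothetical helper, not provided by restraint)
wait_until() {
    local timeout=$1 waited=0
    shift
    until "$@"; do
        # Give up once the time budget is used up.
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example: wait up to 300 seconds for a (hypothetical) NFS mount point.
#   wait_until 300 mountpoint -q /mnt/nfs || exit 1
```

A task like this would have to run before every test that relies on the mount, which is the per-test burden discussed in the following comments.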

Comment 2 Jun'ichi NOMURA 2019-02-20 05:21:37 UTC
A test harness should provide a stable environment for test programs unless otherwise requested.

Individual test programs do not explicitly request NFS mount points and/or the kdump service.
It's the system that is configured to mount NFS and/or enable kdump. So requiring all test
programs to inspect the system configuration and wait for several things is not very reasonable.

Rather, I think it's natural for restraint, as a test harness, to delay the start of test
programs until the system reaches a stable point.

(In reply to Martin Styk from comment #1)
> Some teams may require restraintd execution as soon as possible.

Do you have such a requirement? If so, it may be good to have an option that allows such test
programs to declare themselves as 'start me as soon as possible'.
But I think the default should suit the many normal test programs.

Comment 3 Bill Peck 2019-02-22 15:32:44 UTC
Hello,

I understand you want a stable environment but what you define as stable may not be what someone else does.  

Currently we wait for network to be up and time to be synced with ntp or chrony.  This is the minimum because we can't fetch the recipe without the network, and if the tasks are fetched over SSL we need the correct time for encryption to work properly.

As it is I get complaints that restraintd is not running right away when the system boots up.  I would not want to add additional delays to that.

But because we are using systemd, it is very easy to add overrides, which either your tests or a kickstart %post can do.

# mkdir -p /etc/systemd/system/restraintd.service.d

# cat <<EOF > /etc/systemd/system/restraintd.service.d/customdependency.conf
[Unit]
Requires=new dependency
After=new dependency
EOF

I hope this makes sense.

Ultimately the tests themselves are responsible for a stable environment.  Only the test can know what that is.

Comment 4 Jun'ichi NOMURA 2019-02-25 06:17:22 UTC
(In reply to Bill Peck from comment #3)
> Currently we wait for network to be up and time to be synced with ntp or
> chrony.  This is the minimum because we can't fetch the recipe without the
> network

To achieve that, I think you need After=network-online.target.

> As it is I get complaints that restraintd is not running right away when the
> system boots up.  I would not want to add additional delays to that.

OK. But their task might suddenly or intermittently fail if it runs on system
that depends on remote-fs.target, for example.

And just adding 'After=' doesn't add delay if the specified unit is not being
activated on the system.

Comment 5 Martin Styk 2019-02-25 08:54:28 UTC
(In reply to Jun'ichi NOMURA from comment #4)
> (In reply to Bill Peck from comment #3)
> > Currently we wait for network to be up and time to be synced with ntp or
> > chrony.  This is the minimum because we can't fetch the recipe without the
> > network
> 
> To achieve that, I think you need After=network-online.target.

Requires - Configures dependencies on other units. The units listed in Requires are activated together with the unit. If any of the required units fail to start, the unit is not activated.
This is sufficient for us. network-online.target is activated with restraintd.service. 

> 
> > As it is I get complaints that restraintd is not running right away when the
> > system boots up.  I would not want to add additional delays to that.
> 
> OK. But their task might suddenly or intermittently fail if it runs on system
> that depends on remote-fs.target, for example.

Yes, and they should add this dependency via %post or create an additional task to ensure remote-fs.target is running.

Whether remote-fs.target is needed depends on the user's test case; network.target, time-sync.target, and network-online.target, however, are dependencies of Restraint itself, as Bill explained.

> 
> And just adding 'After=' doesn't add delay if the specified unit is not being
> activated on the system.

Comment 6 Jun'ichi NOMURA 2019-02-25 09:53:39 UTC
(In reply to Martin Styk from comment #5)
> > > Currently we wait for network to be up and time to be synced with ntp or
> > > chrony.  This is the minimum because we can't fetch the recipe without the
> > > network
> > 
> > To achieve that, I think you need After=network-online.target.
> 
> Requires - Configures dependencies on other units. The units listed in
> Requires are activated together with the unit. If any of the required units
> fail to start, the unit is not activated.
> This is sufficient for us. network-online.target is activated with
> restraintd.service. 

Could you elaborate on why it is sufficient?
Without 'After', systemd doesn't wait for the network to be up before
starting restraintd.service.
Where does the 'wait' happen?
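
One way to check which of the two relationships a unit actually has is to look at the output of `systemctl show -p After` and `systemctl show -p Requires`, which print lines of the form `Property=unit1 unit2 ...`. The sketch below is only an illustration: lists_unit is a hypothetical helper that tests whether a given unit appears in such a line.

```shell
# Read "Property=unit1 unit2 ..." lines on stdin (the format printed by
# `systemctl show -p Property UNIT`) and succeed if UNIT is listed
# for PROPERTY. (hypothetical helper for illustration)
lists_unit() {
    local property=$1 unit=$2 line word
    while IFS= read -r line; do
        case $line in
            "$property="*) ;;      # line is for the property we want
            *) continue ;;         # skip any other property
        esac
        # Unquoted on purpose: split the value into unit names.
        for word in ${line#"$property="}; do
            [ "$word" = "$unit" ] && return 0
        done
    done
    return 1
}

# Example:
#   systemctl show -p After restraintd.service \
#       | lists_unit After network-online.target \
#       && echo "restraintd is ordered after network-online.target"
```

If the unit shows up under Requires but not under After, systemd starts both units together and does not delay restraintd.service until the network is online.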

Comment 7 Martin Styk 2019-03-01 07:47:53 UTC
There is no 'wait'. restraintd.service and network-online.target are started at the same moment, and that has still given us enough time before we start fetching the data. However, based on a discussion between me and Bill we decided to include network-online.target to 'After'.

Comment 8 Jun'ichi NOMURA 2019-03-01 08:37:31 UTC
(In reply to Martin Styk from comment #7)
> However, based on a discussion between me and Bill we decided to
> include network-online.target to 'After'.

Thank you!

(In reply to Jun'ichi NOMURA from comment #4)
> And just adding 'After=' doesn't add delay if the specified unit is not being
> activated on the system.

What do you think about this?

I mean users who want quicker start up can do 'systemctl disable kdump.service'.
That's safer than users/admins to start playing various systemd config tricks and cause unexpected problems.

Comment 9 Martin Styk 2019-03-01 11:26:45 UTC
(In reply to Jun'ichi NOMURA from comment #8)
> (In reply to Martin Styk from comment #7)
> > However, based on a discussion between me and Bill we decided to
> > include network-online.target to 'After'.
> 
> Thank you!
> 
> (In reply to Jun'ichi NOMURA from comment #4)
> > And just adding 'After=' doesn't add delay if the specified unit is not being
> > activated on the system.
> 
> What do you think about this?
> 
> I mean users who want quicker start up can do 'systemctl disable
> kdump.service'.
> That's safer than users/admins to start playing various systemd config
> tricks and cause unexpected problems.

I think we should keep it as it is. restraintd.service doesn't really depend on kdump.
But as Bill mentioned, you can update your kickstart and put the override in the %post script.

# mkdir -p /etc/systemd/system/restraintd.service.d

# cat <<EOF > /etc/systemd/system/restraintd.service.d/customdependency.conf
[Unit]
# Requires= is not needed here; After= alone is enough.
After=kdump.service
EOF

Comment 10 Martin Styk 2019-09-10 05:56:14 UTC
Restraint 0.1.40 has been released.