Bug 1678549
| Summary: | restraint starts too early for the system to get ready for testing | ||
|---|---|---|---|
| Product: | [Retired] Restraint | Reporter: | Jun'ichi NOMURA <junichi.nomura> |
| Component: | general | Assignee: | Martin Styk <mastyk> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | tools-bugs <tools-bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 0.1.32 | CC: | asavkov, bpeck, breilly, cbeer, kazuhito.hagio, kueda, mastyk, tatsu-ab1, tumeya |
| Target Milestone: | 0.1.40 | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Last Closed: | 2019-03-14 12:59:35 UTC | Type: | Bug |
I'm not sure it is a good idea to postpone the execution of restraintd. Some teams may require restraintd to run as soon as possible. You could write a task that checks status (for example, actively waiting for NFS mount points to become available) before further execution. In that case you don't even have to change restraintd.service. Or am I missing something?

A test harness should provide a stable environment for test programs unless otherwise requested. Individual test programs do not explicitly request an NFS mount point and/or the kdump service. It is the system that is configured to mount NFS and/or enable kdump. So requiring every test program to inspect the system configuration and wait for several things is not very reasonable. Rather, I think it is natural for restraint, as a test harness, to delay the start of the test program until the system reaches a stable point.

(In reply to Martin Styk from comment #1)
> Some teams may require restraintd execution as soon as possible.

Do you have such a requirement? If so, it may be good to have an option that lets such test programs declare themselves as 'start me as soon as possible'. But I think the default should be good for many normal test programs.

Hello,

I understand you want a stable environment, but what you define as stable may not be what someone else does. Currently we wait for the network to be up and for time to be synced with ntp or chrony. This is the minimum, because we can't fetch the recipe without the network, and if the tasks are served over SSL we need correct time for encryption to work properly. As it is, I get complaints that restraintd is not running right away when the system boots up. I would not want to add additional delays to that. But because we are using systemd, it is very easy to add overrides, which your testing can do or kickstart %post can do.

    # mkdir -p /etc/systemd/system/restraintd.service.d
    # cat <<EOF > /etc/systemd/system/restraintd.service.d/customdependency.conf
    [Unit]
    Requires=new dependency
    After=new dependency
    EOF

I hope this makes sense. Ultimately the tests themselves are responsible for a stable environment. Only the test can know what that is.

(In reply to Bill Peck from comment #3)
> Currently we wait for network to be up and time to be synced with ntp or
> chrony. This is the minimum because we can't fetch the recipe without the
> network

To achieve that, I think you need After=network-online.target.

> As it is I get complaints that restraintd is not running right away when the
> system boots up. I would not want to add additional delays to that.

OK. But their task might suddenly or intermittently fail if it runs on a system that depends on remote-fs.target, for example.

And just adding 'After=' doesn't add delay if the specified unit is not being activated on the system.

(In reply to Jun'ichi NOMURA from comment #4)
> (In reply to Bill Peck from comment #3)
> > Currently we wait for network to be up and time to be synced with ntp or
> > chrony. This is the minimum because we can't fetch the recipe without the
> > network
>
> To achieve that, I think you need After=network-online.target.

Requires - Configures dependencies on other units. The units listed in Requires are activated together with the unit. If any of the required units fail to start, the unit is not activated.
This is sufficient for us. network-online.target is activated with restraintd.service.

> > As it is I get complaints that restraintd is not running right away when the
> > system boots up. I would not want to add additional delays to that.
>
> OK. But their task might suddenly or intermittently fail if it runs on system
> that depends on remote-fs.target, for example.

Yes, and they should add this dependency via %post or create an additional task to ensure remote-fs.target is running.
The remote-fs.target dependency is specific to the user's test case, whereas network.target, time-sync.target, and network-online.target are dependencies of Restraint itself, as Bill explained.

>
> And just adding 'After=' doesn't add delay if the specified unit is not being
> activated on the system.

(In reply to Martin Styk from comment #5)
> > > Currently we wait for network to be up and time to be synced with ntp or
> > > chrony. This is the minimum because we can't fetch the recipe without the
> > > network
> >
> > To achieve that, I think you need After=network-online.target.
>
> Requires - Configures dependencies on other units. The units listed in
> Requires are activated together with the unit. If any of the required units
> fail to start, the unit is not activated.
> This is sufficient for us. network-online.target is activated with
> restraintd.service.

Could you elaborate on why it is sufficient? Without 'After', systemd doesn't wait for the network to be up before starting restraintd.service. Where does the 'wait' happen?

There is no 'wait'. Restraint and network-online are started at the same moment, and that still gives us enough time before we start fetching the data. However, based on a discussion between Bill and me, we decided to add network-online.target to 'After'.

(In reply to Martin Styk from comment #7)
> However, based on a discussion between me and Bill we decided to
> include network-online.target to 'After'.

Thank you!

(In reply to Jun'ichi NOMURA from comment #4)
> And just adding 'After=' doesn't add delay if the specified unit is not being
> activated on the system.

What do you think about this?

I mean users who want a quicker start-up can do 'systemctl disable kdump.service'. That's safer than having users/admins play various systemd config tricks and cause unexpected problems.
(In reply to Jun'ichi NOMURA from comment #8)
> (In reply to Martin Styk from comment #7)
> > However, based on a discussion between me and Bill we decided to
> > include network-online.target to 'After'.
>
> Thank you!
>
> (In reply to Jun'ichi NOMURA from comment #4)
> > And just adding 'After=' doesn't add delay if the specified unit is not being
> > activated on the system.
>
> What do you think about this?
>
> I mean users who want quicker start up can do 'systemctl disable
> kdump.service'.
> That's safer than users/admins to start playing various systemd config
> tricks and cause unexpected problems.

I think we should keep it as it is. restraintd.service doesn't really depend on kdump. But as Bill mentioned, you can update your kickstart and put this in the %post script.

    # mkdir -p /etc/systemd/system/restraintd.service.d
    # cat <<EOF > /etc/systemd/system/restraintd.service.d/customdependency.conf
    [Unit]
    Requires=new dependency   <-- You don't need this
    After=new dependency      <-- kdump.service
    EOF

Restraint 0.1.40 has been released.
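
The other option raised in this thread, a task that actively waits for the system to become ready before the real tests run, could look roughly like the sketch below. It is only an illustration: the helper name `wait_until`, the timeout values, and the mount point `/mnt/nfs-data` are assumptions, not part of restraint. The helper re-runs a given command once per second until it succeeds or the timeout expires.

```shell
#!/bin/sh
# wait_until TIMEOUT CMD [ARG...]
#   Hypothetical helper: re-run CMD once per second until it succeeds
#   or TIMEOUT seconds have passed.
#   Returns 0 if CMD succeeded, 1 if the timeout was reached.
wait_until() {
    timeout=$1
    shift
    elapsed=0
    until "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Example use at the start of a task (path and unit are placeholders):
#   wait_until 300 mountpoint -q /mnt/nfs-data || exit 1
#   wait_until 300 systemctl is-active --quiet kdump.service || exit 1
```

Polling keeps restraintd.service itself unchanged, at the cost of every affected job having to include such a task, which is the trade-off debated above.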
Description of problem:
restraint starts test programs before the system is ready for testing.

Version-Release number of selected component (if applicable):
restraint-0.1.32-1.el8+5

How reproducible:
Not always, but often, depending on the kind of system and tasks.

Steps to Reproduce:
Run a test job on a system whose network interface is slow to link up.

Expected results:
Restraint should start the test program after the system is ready for testing.
The system is considered 'ready for testing' when at least the following systemd units have been started, if enabled:
- network-online.target
- remote-fs.target
- kdump.service

Actual results:
Restraint starts the test program before the above-mentioned units are started.
Examples:
- A test program could not access expected data on NFS mount points listed in /etc/fstab, which should be available once systemd has reached remote-fs.target.
- A crash dump could not be taken for a panic that occurred during the test, because restraint had started the task while the kdump service was still building its initramfs.

Additional information:
We confirmed that the problem can be fixed by adding the following directive to /usr/lib/systemd/system/restraintd.service:
After=network-online.target remote-fs.target kdump.service

Since restraint is the only available harness for RHEL8, the problem affects our automated testing of RHEL8 in beaker. We currently apply the above workaround using a local beaker snippet.

The same ordering problem exists in beah, though it occurs less often.
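
The same ordering can be applied without editing the packaged unit file by using a systemd drop-in, which for restraintd.service is read from a directory named `restraintd.service.d`. The following is a minimal sketch of such a snippet, not the reporter's actual beaker snippet: the function name `write_restraintd_dropin` and the file name `customdependency.conf` are illustrative choices.

```shell
#!/bin/sh
# write_restraintd_dropin ROOT UNIT [UNIT...]
#   Hypothetical helper: create a drop-in that orders restraintd.service
#   after the given units.
#   ROOT  directory holding unit overrides (normally /etc/systemd/system)
#   UNIT  one or more units restraintd.service should start after
write_restraintd_dropin() {
    root=$1
    shift
    dir="$root/restraintd.service.d"
    mkdir -p "$dir" || return 1
    # After= only orders start-up. It does not pull the listed units in,
    # so a unit that is not enabled on the system adds no delay.
    printf '[Unit]\nAfter=%s\n' "$*" > "$dir/customdependency.conf"
}

# Example (as root, e.g. from kickstart %post):
#   write_restraintd_dropin /etc/systemd/system \
#       network-online.target remote-fs.target kdump.service
```

On an already running system the change takes effect after `systemctl daemon-reload`. From kickstart %post no reload is needed, because the unit files are read on the next boot.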