Bug 1719366

Summary: OnCalendar timers with RandomizedDelaySec != 0 may never run
Product: Red Hat Enterprise Linux 7
Reporter: Renaud Métrich <rmetrich>
Component: systemd
Assignee: systemd-maint
Status: CLOSED WONTFIX
QA Contact: Frantisek Sumsal <fsumsal>
Severity: medium
Priority: medium
Version: 7.6
CC: dtardon, fsumsal, systemd-maint-list, systemd-maint
Target Milestone: rc
Keywords: Reopened
Hardware: All
OS: Linux
Clone Of: 1719364
Last Closed: 2020-11-11 21:46:31 UTC

Description Renaud Métrich 2019-06-11 15:25:21 UTC
+++ This bug was initially created as a clone of Bug #1719364 +++

Description of problem:

When an OnCalendar timer is used and RandomizedDelaySec is not zero (as is the case for insights-client.timer, for example), the timer may never run, typically when systemd regularly reloads its configuration.


Version-Release number of selected component (if applicable):

systemd-219


How reproducible:

Always


Steps to Reproduce:
1. Create a dummy service

# cat << EOF > /etc/systemd/system/my.service
[Unit]
Description=My one-shot service triggered by a timer
[Service]
Type=oneshot
ExecStart=/bin/echo "I'm running"
EOF

2. Create a timer triggered on calendar with a random delay (here, every 10 minutes, plus a random delay of up to 5 minutes)

# cat << EOF > /etc/systemd/system/my.timer
[Timer]
OnCalendar=*-*-* *:5,15,25,35,45,55:00
RandomizedDelaySec=300
EOF

3. Set debug level and reload systemd

# systemd-analyze set-log-level debug
# systemctl daemon-reload

4. Follow the journal and start the timer

# journalctl --follow -u my.service -u my.timer &
# systemctl start my.timer

5. Just before the timer elapses (but after the expected calendar time), issue "systemctl daemon-reload"

Journal:

Jun 11 17:08:24 vm-rhel8 systemd[1]: my.timer: Adding 3min 35.811652s random time.
Jun 11 17:08:24 vm-rhel8 systemd[1]: my.timer: Realtime timer elapses at Tue 2019-06-11 17:18:35 CEST.

--> issue "systemctl daemon-reload" between 17:15 and 17:18
In the example below, it was done at 17:16:

Jun 11 17:16:01 vm-rhel8 systemd[1]: my.timer: Adding 4min 19.759514s random time.
Jun 11 17:16:01 vm-rhel8 systemd[1]: my.timer: Realtime timer elapses at Tue 2019-06-11 17:29:19 CEST.

--> service will not run before 17:29

...
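
The two journal excerpts above are consistent with a simple model: every time the timer is (re)armed, the base time is the first calendar match strictly after "now", and a fresh random delay is added on top. The sketch below is a toy model of that observed behavior in POSIX shell arithmetic, not systemd's actual code. The function names (next_calendar, elapse_at, fmt) are made up for illustration, times are seconds since midnight, and the delays are the journal values truncated to whole seconds.

```shell
#!/bin/sh
# Toy model of the behavior seen in the journal; NOT systemd's implementation.
# All times are seconds since midnight.

# next_calendar NOW: first match of "*:5,15,25,35,45,55:00" strictly after NOW
next_calendar() {
    now=$1
    # shift by 5 minutes so that matches fall on multiples of 600 seconds
    shifted=$(( now - 300 ))
    if [ "$shifted" -lt 0 ]; then
        echo 300
    else
        echo $(( (shifted / 600 + 1) * 600 + 300 ))
    fi
}

# elapse_at NOW DELAY: elapse time when the timer is armed at NOW
elapse_at() {
    echo $(( $(next_calendar "$1") + $2 ))
}

# fmt SECS: print seconds since midnight as HH:MM:SS
fmt() {
    printf '%02d:%02d:%02d\n' $(( $1 / 3600 )) $(( $1 % 3600 / 60 )) $(( $1 % 60 ))
}

# armed at 17:08:24 with a delay of 3min 35s
fmt "$(elapse_at 61704 215)"    # prints 17:18:35
# re-armed by the reload at 17:16:01 with a delay of 4min 19s
fmt "$(elapse_at 62161 259)"    # prints 17:29:19
```

Under this model the 17:15 calendar match is skipped: at 17:16:01 it already lies in the past, so the reload computes the base from 17:25.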

Additional info:

This is a huge issue for timers that elapse only once a day, such as insights-client.timer, which runs daily with a random delay of up to 4 hours:
as soon as a daemon-reload occurs between midnight and the randomized elapse time (at most 4 o'clock), the timer won't elapse until the next day.

Comment 2 David Tardon 2020-04-28 06:31:34 UTC
Doing a lot of reloads is not a typical scenario, I assume...

Comment 3 RHEL Program Management 2020-04-28 06:31:42 UTC
Development Management has reviewed and declined this request. You may appeal this decision by using your Red Hat support channels, who will make certain the issue receives the proper prioritization with product and development management.

https://www.redhat.com/support/process/production/#howto

Comment 4 Renaud Métrich 2020-04-28 07:05:09 UTC
I disagree with the decision: this bug is hit regularly by our customers who have insights-client installed.

The insights-client.timer unit is configured as follows:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
[Timer]
OnCalendar=daily
RandomizedDelaySec=14400
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

According to systemd.time(7), "daily" means every day at midnight (00:00:00).
A RandomizedDelaySec of 14400 seconds is a delay of up to 4 hours.

With this setup, *one* reload some time after midnight is enough for insights-client to not run that day, something that customers notice through their monitoring tools.
Automated administration (ansible, puppet) usually runs at night, during exactly these hours, and this kind of administration task definitely causes spurious reloads leading to the issue.

Depending on the administration task performed, which may take a very long time, the spurious reloads can prevent the insights-client timer from executing at all, as shown in the example below:

1. Timestamp 00:00 (midnight): say systemd configured insights-client to run at 2 o'clock (middle of the interval)
2. Timestamp 01:30: an admin task implicitly provokes a reload (e.g. because "systemctl enable foo" is executed; *even* when foo was already enabled, systemctl *will* reload in any case)
  -> systemd reconfigures insights-client to run the NEXT DAY (because we are past midnight, the next midnight, which is the reference for the timer computation, is the day after)

  -> insights-client does not execute today

Since the admin task implicitly reloads every day, insights-client will *never* run.

Hence, please reconsider.
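
The scenario above can be replayed with a small simulation. This is only a sketch of the described behavior under stated assumptions, not systemd's code: the function name simulate is made up, times are seconds, and a fixed delay stands in for the random one (systemd picks a new random delay each time the timer is armed). The model assumes that a reload before the pending elapse time re-arms the timer from the next midnight.

```shell
#!/bin/sh
# Toy simulation of an "OnCalendar=daily" timer with a delay; NOT systemd's code.

# simulate DAYS RELOAD_AT DELAY: print how many times the service runs.
#   RELOAD_AT: seconds after midnight of a daily daemon-reload, -1 for none
#   DELAY:     fixed stand-in for the randomized delay, in seconds
simulate() {
    days=$1; reload_at=$2; delay=$3
    runs=0
    elapse=$(( 86400 + delay ))     # armed on day 0: next midnight + delay
    day=1
    while [ "$day" -le "$days" ]; do
        midnight=$(( day * 86400 ))
        reload=$(( midnight + reload_at ))
        if [ "$reload_at" -ge 0 ] && [ "$reload" -lt "$elapse" ]; then
            # reload happens before the timer fires: the base becomes the
            # NEXT midnight, so today's run is lost
            elapse=$(( midnight + 86400 + delay ))
        elif [ "$elapse" -lt $(( midnight + 86400 )) ]; then
            # timer fires today, then is re-armed for tomorrow
            runs=$(( runs + 1 ))
            elapse=$(( midnight + 86400 + delay ))
        fi
        day=$(( day + 1 ))
    done
    echo "$runs"
}

simulate 7 -1 7200      # no reloads, elapse at 02:00: prints 7
simulate 7 5400 7200    # reload at 01:30 every day, elapse at 02:00: prints 0
```

In this model a delay shorter than the reload offset would still let the timer fire (for example, simulate 7 5400 3600 prints 7), so a run is only lost on days where the delay lands after the reload.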

Comment 6 Chris Williams 2020-11-11 21:46:31 UTC
Red Hat Enterprise Linux 7 shipped its final minor release on September 29th, 2020. 7.9 was the last minor release scheduled for RHEL 7.
From initial triage it does not appear the remaining Bugzillas meet the inclusion criteria for Maintenance Phase 2, and they will now be closed.

From the RHEL life cycle page:
https://access.redhat.com/support/policy/updates/errata#Maintenance_Support_2_Phase
"During Maintenance Support 2 Phase for Red Hat Enterprise Linux version 7, Red Hat defined Critical and Important impact Security Advisories (RHSAs) and selected (at Red Hat discretion) Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available."

If this BZ was closed in error and meets the above criteria, please re-open it, flag it for 7.9.z, provide suitable business and technical justifications, and follow the process for Accelerated Fixes:
https://source.redhat.com/groups/public/pnt-cxno/pnt_customer_experience_and_operations_wiki/support_delivery_accelerated_fix_release_handbook  

Feature Requests can be re-opened and moved to RHEL 8 if the desired functionality is not already present in the product.

Please reach out to the applicable Product Experience Engineer[0] if you have any questions or concerns.  

[0] https://bugzilla.redhat.com/page.cgi?id=agile_component_mapping.html&product=Red+Hat+Enterprise+Linux+7