Bug 1941335
| Summary: | Starting raid-check.timer renders system unusable | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Jonathan Dieter <jonathan> | ||||||
| Component: | systemd | Assignee: | systemd-maint | ||||||
| Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||
| Severity: | urgent | Docs Contact: | |||||||
| Priority: | urgent | ||||||||
| Version: | 33 | CC: | aireilly, bugzilla, cleaver-redhat, cramerd, dominik, ed.greshko, ego.cordatus, fedoraproject, filbranden, flepied, fweimer, gryan, jen, john.kissane, kasong, kevin, lnykryn, mramendi, msekleta, murphy.john69, ol+redhat, przemo, rjones, samuel-rhbugs, scorreia, sgraf, ssahani, s, systemd-maint, tom, yuwatana, zbyszek | ||||||
| Target Milestone: | --- | ||||||||
| Target Release: | --- | ||||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | systemd-248~rc4-3.fc34 systemd-246.13-1.fc33 | Doc Type: | If docs needed, set a value | ||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2021-03-25 00:18:55 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Attachments: |
Description
Jonathan Dieter
2021-03-21 17:08:53 UTC
Created attachment 1765087 [details]
raid-check.timer
Additional experimentation has confirmed that this is date related. If the date is after Mar 3rd, 2021 at 1:00AM, this bug will be triggered. Possibly an overflow issue?

The bug is causing people's systems to fail to boot without a clear cause, leading to several posts on reddit and discussion on IRC. I was hit too; after being pointed to this bugzilla entry I was able to recover and will now post some instructions on reddit. Because of the massive user impact, I have upgraded severity and urgency to maximum available values.

A workaround is to add "systemd.mask=raid-check.timer" to the kernel command line when booting which should allow the machine to boot after which "systemctl disable raid-check.timer" can be used to prevent a recurrence.

Created attachment 1765096 [details]
Journal from affected system
This is from a system with a clean install of F33.
(In reply to Tom Hughes from comment #4)
> A workaround is to add "systemd.mask=raid-check.timer" to the kernel command
> line when booting which should allow the machine to boot after which
> "systemctl disable raid-check.timer" can be used to prevent a recurrence.

It looks like systemd won't allow you to disable a masked service, even if it's masked in the kernel command line. If using the above workaround, you'll need to run the following to manually remove the timer:

rm /etc/systemd/system/timers.target.wants/raid-check.timer

Some good news: as chrisawi pointed out on IRC, it looks like this is tied to the Europe/Dublin time zone. Switching to Etc/UTC, Europe/London, or other time zones fixes the problem.

Unfortunately I can't reproduce this here...
The most likely explanation is some infinite loop in the timer handling code. Could
someone who is affected provide a stack trace (with 'gdb -p1' or 'pstack 1'), or maybe
a core file ('kill -ABRT 1' and then look in the journal for information on the core file
and upload it here).
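
For anyone applying the workaround discussed above, the manual symlink removal can be wrapped in a small guard so that it is safe to run more than once. This is only a sketch: the function name `remove_raid_check_link` and its optional root-directory argument are hypothetical (they are not part of systemd or of this report) and exist so the same code can be pointed at a mounted rescue root or a scratch directory. Only the symlink path itself comes from the comment above.

```shell
#!/bin/sh
# Sketch: remove the enablement symlink for raid-check.timer by hand.
# The optional first argument is a hypothetical root prefix (assumption, not
# from the report); when omitted, the path on the live system is used.
remove_raid_check_link() {
    root="${1:-}"
    link="$root/etc/systemd/system/timers.target.wants/raid-check.timer"
    # -L is true for symlinks, including dangling ones; -e covers a regular file.
    if [ -L "$link" ] || [ -e "$link" ]; then
        rm -f -- "$link" && echo "removed $link"
    else
        echo "nothing to remove at $link"
    fi
}
```

After booting with "systemd.mask=raid-check.timer" on the kernel command line, calling `remove_raid_check_link` as root with no argument should have the same effect as the `rm` command quoted earlier; passing a directory such as a mounted rescue root operates on that tree instead.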
I see the issue. This also triggers an issue
[egreshko@f33g ~]$ date
Mon Mar 22 06:57:03 CST 2021
[egreshko@f33g ~]$ sudo systemctl --now disable raid-check.timer
Removed /etc/systemd/system/timers.target.wants/raid-check.timer.
[egreshko@f33g ~]$ sudo systemctl status raid-check.timer
● raid-check.timer - Weekly RAID setup health check
Loaded: loaded (/usr/lib/systemd/system/raid-check.timer; disabled; vendor p>
Active: inactive (dead)
Trigger: n/a
Triggers: ● raid-check.service
Mar 22 06:56:47 f33g.greshko.com systemd[1]: Started Weekly RAID setup health che>
Mar 22 07:17:56 f33g.greshko.com systemd[1]: raid-check.timer: Succeeded.
Mar 22 07:17:56 f33g.greshko.com systemd[1]: Stopped Weekly RAID setup health che>
[egreshko@f33g ~]$ timedatectl status | grep zone
Time zone: Asia/Taipei (CST, +0800)
[egreshko@f33g ~]$ sudo timedatectl set-timezone Europe/Dublin
[egreshko@f33g ~]$ sudo timedatectl set-timezone Asia/Taipei
Note there is no problem. And then,
[egreshko@f33g ~]$ sudo systemctl --now enable raid-check.timer
Created symlink /etc/systemd/system/timers.target.wants/raid-check.timer → /usr/lib/systemd/system/raid-check.timer.
[egreshko@f33g ~]$ sudo timedatectl set-timezone Europe/Dublin
[egreshko@f33g ~]$ sudo timedatectl set-timezone Asia/Taipei
Failed to set time zone: Connection timed out
Is this affecting all DST transitions?

FEDORA-2021-ea92e5703f has been submitted as an update to Fedora 34. https://bodhi.fedoraproject.org/updates/FEDORA-2021-ea92e5703f

FEDORA-2021-1c1a870ceb has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2021-1c1a870ceb

FEDORA-2021-ea92e5703f has been pushed to the Fedora 34 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-ea92e5703f`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-ea92e5703f
See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

*** Bug 1942298 has been marked as a duplicate of this bug. ***

FEDORA-2021-1c1a870ceb has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2021-1c1a870ceb

FEDORA-2021-ea92e5703f has been pushed to the Fedora 34 stable repository.
If problem still persists, please make note of it in this bug report.

FEDORA-2021-1c1a870ceb has been pushed to the Fedora 33 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-1c1a870ceb`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-1c1a870ceb
See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

FEDORA-2021-1c1a870ceb has been pushed to the Fedora 33 stable repository.
If problem still persists, please make note of it in this bug report.