Bug 1941335 - Starting raid-check.timer renders system unusable
Summary: Starting raid-check.timer renders system unusable
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: 33
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 1942298
Depends On:
Blocks:
Reported: 2021-03-21 17:08 UTC by Jonathan Dieter
Modified: 2021-03-27 01:11 UTC (History)
CC List: 32 users

Fixed In Version: systemd-248~rc4-3.fc34 systemd-246.13-1.fc33
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-25 00:18:55 UTC
Type: Bug


Attachments
raid-check.timer (155 bytes, text/plain)
2021-03-21 17:11 UTC, Jonathan Dieter
Journal from affected system (134.00 KB, text/plain)
2021-03-21 18:39 UTC, Jonathan Dieter

Description Jonathan Dieter 2021-03-21 17:08:53 UTC
Description of problem:
If raid-check.timer is started, the system ends up unusable with systemd no longer responding and loads of zombie processes.  This seems like it may be triggered by a specific date.

Version-Release number of selected component (if applicable):
systemd-246.10-1.fc33.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Set date to Mar 21st, 2021 (I think it's date related?)
2. Run `systemctl start raid-check.timer`

Actual results:
The system becomes unusable, with any systemctl commands hanging and a long list of zombie processes.

Expected results:
The system boots as normal

Additional info:
I'm assigning this to systemd because it seems to be a problem with how systemd is handling the timer file.  Starting raid-check.service works with no problems whatsoever, so it doesn't seem to be a problem with mdadm at all.

When tracking down the bug, I attempted to do a clean install of Fedora on one of my systems, and it turns out that raid-check.timer is enabled by default, which froze the new install.  This means that the problem affects the version of systemd on F33 GA as well as the latest updates.

I suspect that it has something to do with the day/date since I couldn't find any indication of anyone else seeing this bug before today.

A simple workaround is to boot into single user mode and disable raid-check.timer.  Unfortunately, this requires a root password.

Comment 1 Jonathan Dieter 2021-03-21 17:11:27 UTC
Created attachment 1765087
raid-check.timer

Comment 2 Jonathan Dieter 2021-03-21 17:54:02 UTC
Additional experimentation has confirmed that this is date related.  If the date is after Mar 21st, 2021 at 1:00AM, this bug will be triggered.  Possibly an overflow issue?

Comment 3 Misha Ramendik 2021-03-21 18:24:10 UTC
The bug is causing people's systems to fail to boot without a clear cause, leading to several posts on reddit and discussion on IRC. I was hit too; after being pointed to this bugzilla entry I was able to recover and will now post some instructions on reddit.

Because of the massive user impact, I have upgraded severity and urgency to maximum available values.

Comment 4 Tom Hughes 2021-03-21 18:31:57 UTC
A workaround is to add "systemd.mask=raid-check.timer" to the kernel command line when booting which should allow the machine to boot after which "systemctl disable raid-check.timer" can be used to prevent a recurrence.
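After booting with the extra parameter, it can be useful to confirm that the mask actually reached the kernel command line. The following is a minimal sketch, not part of the original report: `unit_masked_on_cmdline` is a hypothetical helper name, and its second argument is an illustrative override so the check can be tried against a sample file instead of `/proc/cmdline`.

```shell
# Sketch: succeed if systemd.mask=<unit> appears as a word on the kernel
# command line. Reads /proc/cmdline unless a file is given as $2
# (the override is an assumption made here for testing).
unit_masked_on_cmdline() {
    unit="$1"
    cmdline_file="${2:-/proc/cmdline}"
    # Put each space-separated parameter on its own line, then look for
    # an exact, fixed-string, whole-line match.
    tr ' ' '\n' < "$cmdline_file" | grep -Fxq -- "systemd.mask=$unit"
}
```

For example, `unit_masked_on_cmdline raid-check.timer && echo masked` prints `masked` only when the parameter was passed at boot.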

Comment 5 Jonathan Dieter 2021-03-21 18:39:38 UTC
Created attachment 1765096
Journal from affected system

This is from a system with a clean install of F33.

Comment 6 Jonathan Dieter 2021-03-21 18:45:30 UTC
(In reply to Tom Hughes from comment #4)
> A workaround is to add "systemd.mask=raid-check.timer" to the kernel command
> line when booting which should allow the machine to boot after which
> "systemctl disable raid-check.timer" can be used to prevent a recurrence.

It looks like systemd won't allow you to disable a masked service, even if it's masked in the kernel command line.

If using the above workaround, you'll need to run the following to manually remove the timer:
rm /etc/systemd/system/timers.target.wants/raid-check.timer
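
The manual removal above can be wrapped so that it is safe to run more than once. This is only a sketch: `remove_raid_check_symlink` is a hypothetical name, and the optional root-directory argument is an assumption added here so the function can be tried on a scratch directory (or a mounted system under, say, `/mnt`) before touching the live `/etc`.

```shell
# Sketch: remove the enablement symlink for raid-check.timer if present.
# $1 is an optional root directory (defaults to /); this parameter is an
# illustrative addition, not something systemd itself provides.
remove_raid_check_symlink() {
    root="${1:-/}"
    # Strip one trailing slash so "/" and "/mnt" both yield a clean path.
    link="${root%/}/etc/systemd/system/timers.target.wants/raid-check.timer"
    if [ -L "$link" ] || [ -e "$link" ]; then
        rm -- "$link" && echo "removed $link"
    else
        echo "not present: $link"
    fi
}
```

Run as root with no arguments, it acts on the same path as the `rm` command above and reports whether anything was removed.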

Comment 7 Jonathan Dieter 2021-03-21 19:29:17 UTC
Some good news: as chrisawi pointed out on IRC, it looks like this is tied to the Europe/Dublin time zone.  Switching to Etc/UTC, Europe/London, or other time zones fixes the problem.
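
The time zone dependence can be inspected from a shell. The sketch below assumes GNU `date` and an installed tzdata; `local_time_at` is a hypothetical helper name. It prints the local wall-clock time one second before and at 2021-03-28 01:00:00 UTC, which is when both Europe/Dublin and Europe/London move their clocks forward.

```shell
# Sketch: print local wall-clock time and UTC offset for a zone at a
# given Unix timestamp. Assumes GNU date (-d @EPOCH) and tzdata.
local_time_at() {
    zone="$1"
    epoch="$2"
    TZ="$zone" date -d "@$epoch" '+%Y-%m-%d %H:%M:%S %z'
}

# One second before, and exactly at, 2021-03-28 01:00:00 UTC.
local_time_at Europe/Dublin 1616893199   # 2021-03-28 00:59:59 +0000
local_time_at Europe/Dublin 1616893200   # 2021-03-28 02:00:00 +0100
local_time_at Europe/London 1616893200   # 2021-03-28 02:00:00 +0100
```

Both zones skip the local hour from 01:00 to 02:00 on that date, so the wall-clock jump alone does not distinguish Europe/Dublin from Europe/London; this thread does not say what does.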

Comment 8 Zbigniew Jędrzejewski-Szmek 2021-03-21 19:32:22 UTC
Unfortunately I can't reproduce this here...
The most likely explanation is some infinite loop in the timer handling code. Could
someone who is affected provide a stack trace (with 'gdb -p1' or 'pstack 1'), or maybe
a core file ('kill -ABRT 1', then look in the journal for information on the core file
and upload it here)?

Comment 9 Zbigniew Jędrzejewski-Szmek 2021-03-21 20:01:17 UTC
I see the issue.

Comment 10 Ed Greshko 2021-03-21 23:27:11 UTC
This also triggers an issue:

[egreshko@f33g ~]$ date

Mon Mar 22 06:57:03 CST 2021
[egreshko@f33g ~]$ sudo systemctl --now disable raid-check.timer

Removed /etc/systemd/system/timers.target.wants/raid-check.timer.
[egreshko@f33g ~]$ sudo systemctl status raid-check.timer
● raid-check.timer - Weekly RAID setup health check
     Loaded: loaded (/usr/lib/systemd/system/raid-check.timer; disabled; vendor p>
     Active: inactive (dead)
    Trigger: n/a
   Triggers: ● raid-check.service

Mar 22 06:56:47 f33g.greshko.com systemd[1]: Started Weekly RAID setup health che>
Mar 22 07:17:56 f33g.greshko.com systemd[1]: raid-check.timer: Succeeded.
Mar 22 07:17:56 f33g.greshko.com systemd[1]: Stopped Weekly RAID setup health che>
[egreshko@f33g ~]$ timedatectl status | grep zone
                Time zone: Asia/Taipei (CST, +0800)   
[egreshko@f33g ~]$ sudo timedatectl set-timezone Europe/Dublin
[egreshko@f33g ~]$ sudo timedatectl set-timezone Asia/Taipei

Note there is no problem.  And then,

[egreshko@f33g ~]$ sudo systemctl --now enable raid-check.timer
Created symlink /etc/systemd/system/timers.target.wants/raid-check.timer → /usr/lib/systemd/system/raid-check.timer.
[egreshko@f33g ~]$ sudo timedatectl set-timezone Europe/Dublin
[egreshko@f33g ~]$ sudo timedatectl set-timezone Asia/Taipei
Failed to set time zone: Connection timed out

Comment 11 Japheth Cleaver 2021-03-22 04:59:31 UTC
Is this affecting all DST transitions?

Comment 12 Zbigniew Jędrzejewski-Szmek 2021-03-22 13:00:25 UTC
https://github.com/systemd/systemd/pull/19075

Comment 13 Fedora Update System 2021-03-23 07:03:37 UTC
FEDORA-2021-ea92e5703f has been submitted as an update to Fedora 34. https://bodhi.fedoraproject.org/updates/FEDORA-2021-ea92e5703f

Comment 14 Fedora Update System 2021-03-23 22:47:13 UTC
FEDORA-2021-1c1a870ceb has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2021-1c1a870ceb

Comment 15 Fedora Update System 2021-03-24 02:44:21 UTC
FEDORA-2021-ea92e5703f has been pushed to the Fedora 34 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-ea92e5703f`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-ea92e5703f

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 16 Zbigniew Jędrzejewski-Szmek 2021-03-24 11:56:05 UTC
*** Bug 1942298 has been marked as a duplicate of this bug. ***

Comment 17 Fedora Update System 2021-03-24 11:58:06 UTC
FEDORA-2021-1c1a870ceb has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2021-1c1a870ceb

Comment 18 Fedora Update System 2021-03-25 00:18:55 UTC
FEDORA-2021-ea92e5703f has been pushed to the Fedora 34 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 19 Fedora Update System 2021-03-25 01:31:51 UTC
FEDORA-2021-1c1a870ceb has been pushed to the Fedora 33 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-1c1a870ceb`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-1c1a870ceb

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 20 Fedora Update System 2021-03-27 01:11:12 UTC
FEDORA-2021-1c1a870ceb has been pushed to the Fedora 33 stable repository.
If problem still persists, please make note of it in this bug report.

