Bug 1984651
Summary: | systemd[1]: Assertion 'a <= b' failed at src/libsystemd/sd-event/sd-event.c:2903, function sleep_between(). Aborting. | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Charles R. Anderson <cra>
Component: | systemd | Assignee: | systemd-maint
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | unspecified | Docs Contact: |
Priority: | unspecified | |
Version: | 34 | CC: | 319513897, craig91, fedoraproject, filbranden, flepied, glesage, kasong, lnykryn, marc.jeanmougin, mpitt, msekleta, ssahani, s, systemd-maint, yuwatana, zbyszek
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | systemd-248.6-1.fc34 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2021-07-25 01:01:42 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Charles R. Anderson
2021-07-21 20:25:03 UTC
We also started to see this in cockpit's CI in e.g. this test run [1]. Full journal at [2].

[1] https://logs-https-frontdoor.apps.ocp.ci.centos.org/logs/pull-16124-20210722-043022-771379a1-fedora-34/log.html#151
[2] https://logs-https-frontdoor.apps.ocp.ci.centos.org/logs/pull-16124-20210722-043022-771379a1-fedora-34/TestSOS-testWithUrlRoot-fedora-34-127.0.0.2-2201-FAIL.log.gz

I'm also hitting this on my ARM home server using systemd-248.4-1.fc34.aarch64 on Fedora 34 IoT. My personal laptop is running systemd-248.5-1.fc34.x86_64 and is fine, however. In my specific case, it looks like the problem is with a specific podman container. (It's an often-badly-behaving Minecraft Java server I've been running for my nephews.) I removed the Minecraft container (which will probably make my nephews and their friends sad, but they can play Fortnite instead 😉), pulled the plug (as rebooting doesn't even work without systemd operating properly), and did a fresh boot. So far, everything is working... and I even have a few other services in Podman containers that still work. I know Cockpit's CI uses containers and VMs... could this be something triggered by resource-hungry containers?

The particular test where we see this does not use containers in the sense of "podman". (Of course, every systemd service is also a container of some kind.) The whole qemu VM runs inside a container on the host, but the QEMU layer should sufficiently shield systemd from knowing that.

*** Bug 1985126 has been marked as a duplicate of this bug. ***

Just diagnosed this issue for several of my fc34 servers after having looked at lots of https://bugzilla.redhat.com/show_bug.cgi?id=1463745 -like issues in the last days (the symptoms are similar: systemctl commands are not responding (incl.
"reboot" without the -f) and timeouts after 30s, ssh login is very slow (waits for 90s)). However, solutions like kill -9 1 or systemctl daemon-reexec do not help here (a reboot helps for a few hours to a few days).

Same journalctl output as the reporter:

Jul 23 16:57:46 lame15.enst.fr systemd[1]: Assertion 'a <= b' failed at src/libsystemd/sd-event/sd-event.c:2903, function sleep_between(). Aborting.
Jul 23 16:57:46 lame15.enst.fr audit[15757]: ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 pid=15757 comm="systemd" exe="/usr/lib/systemd/systemd" sig=6 res=1
Jul 23 16:57:46 lame15.enst.fr systemd-coredump[15758]: Due to PID 1 having crashed coredump collection will now be turned off.
Jul 23 16:57:46 lame15.enst.fr systemd-coredump[15758]: [🡕] Process 15757 (systemd) of user 0 dumped core.

Stack trace of thread 15757:
#0  0x00007fac997a159b kill (libc.so.6 + 0x3d59b)
#1  0x0000556f87ffebc8 crash (systemd + 0x45bc8)
#2  0x00007fac99946a20 __restore_rt (libpthread.so.0 + 0x13a20)
#3  0x00007fac997a12a2 raise (libc.so.6 + 0x3d2a2)
#4  0x00007fac9978a8a4 abort (libc.so.6 + 0x268a4)
#5  0x00007fac99aeca42 log_assert_failed.cold (libsystemd-shared-248.so + 0x78a42)
#6  0x00007fac99c85d6f sleep_between (libsystemd-shared-248.so + 0x211d6f)
#7  0x00007fac99c8e2b9 event_arm_timer (libsystemd-shared-248.so + 0x21a2b9)
#8  0x00007fac99c8ecf5 sd_event_prepare (libsystemd-shared-248.so + 0x21acf5)
#9  0x00007fac99c91d50 sd_event_run (libsystemd-shared-248.so + 0x21dd50)
#10 0x0000556f880437d4 manager_loop (systemd + 0x8a7d4)
#11 0x0000556f87ffbc5f main (systemd + 0x42c5f)
#12 0x00007fac9978bb75 __libc_start_main (libc.so.6 + 0x27b75)
#13 0x0000556f87ffe77e _start (systemd + 0x4577e)

Jul 23 16:57:46 lame15.enst.fr systemd[1]: Caught <ABRT>, dumped core as pid 15757.
Jul 23 16:57:46 lame15.enst.fr systemd[1]: Freezing execution.
Jul 23 16:57:47 lame15.enst.fr systemd-oomd[826]: Failed to connect to /run/systemd/io.system.ManagedOOM: Connection refused
Jul 23 16:57:47 lame15.enst.fr systemd-oomd[826]: Failed to acquire varlink connection: Connection refused
Jul 23 16:57:47 lame15.enst.fr systemd-oomd[826]: Event loop failed: Connection refused

(System fully up to date, package info below:)

Name         : systemd
Version      : 248.5
Release      : 1.fc34
Architecture : x86_64
Size         : 14 M
Source       : systemd-248.5-1.fc34.src.rpm
Repository   : @System
From repo    : updates

FEDORA-2021-3141f0eff1 has been submitted as an update to Fedora 34. https://bodhi.fedoraproject.org/updates/FEDORA-2021-3141f0eff1

FEDORA-2021-3141f0eff1 has been pushed to the Fedora 34 testing repository. Soon you'll be able to install the update with the following command:

`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-3141f0eff1`

You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-3141f0eff1

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

FEDORA-2021-3141f0eff1 has been pushed to the Fedora 34 stable repository. If problem still persists, please make note of it in this bug report.

*** Bug 1985690 has been marked as a duplicate of this bug. ***