Created attachment 1758276 [details] system logs Description of problem: I was unable reboot computer after upgrade systemd from 247.3-2.fc35 to 247.3-3.fc35 version Demonstration: https://youtu.be/Kv5lPipOSfY
Created attachment 1758277 [details] backtrace
#9 0x00007f3885fc5a58 in __GI___libc_free (mem=<optimized out>) at malloc.c:3309 ar_ptr = <optimized out> p = <optimized out> hook = <optimized out> err = 0 #10 0x00007f38861ecb5d in regex_data_free (regex=0x564449efa630) at /usr/src/debug/libselinux-3.2-0.rc2.1.fc34.x86_64/src/regex.c:205 No locals. #11 closef (rec=<optimized out>) at /usr/src/debug/libselinux-3.2-0.rc2.1.fc34.x86_64/src/label_file.c:815 data = 0x564449bca340 area = <optimized out> last_area = <optimized out> spec = 0x7f3885248ae8 stem = <optimized out> i = 4085 #12 0x00007f38861f13cd in selabel_close (rec=0x564449bc9b20) at /usr/src/debug/libselinux-3.2-0.rc2.1.fc34.x86_64/src/label.c:366 No locals. #13 0x00007f38863aa549 in mac_selinux_finish () at ../src/basic/selinux-util.c:206 No locals. #14 0x0000564447fff582 in main (argc=<optimized out>, argv=<optimized out>) at ../src/core/main.c:2881 Hmm, could be something in libselinux. Or the memory is corrupted by systemd code, and the crash just happens in the libselinux code path. Hard to say at this point. libselinux maintainers, any ideas?
First thing I have to ask - it possible to boot in permissive mode? For instructions see https://docs.fedoraproject.org/en-US/quick-docs/changing-selinux-states-and-modes/#selinux-changing-to-permissive-mode or put 'enforcing=0' on kernel command line. Do you have any local policy modification, see `semanage export` ? I personally use Rawhide with libselinux-3.2-0.rc2.1.fc34.x86_64 systemd-247.3-3.fc35.x86_64 and it boots just fine.
> First thing I have to ask - it possible to boot in permissive mode? For instructions see https://docs.fedoraproject.org/en-US/quick-docs/changing-selinux-states-and-modes/#selinux-changing-to-permissive-mode or put 'enforcing=0' on kernel command line. I am not experienced issues with boot. I has issue only after upgrade systemd (daemons are out of control until computer reset). SELINUX=permissive or SELINUX=enforced mode doesn't matter. Issuessue reproduceble on machines with permissive and enforced SELinux modes.
Created attachment 1759175 [details] dnf output As you can see after systemd upgrade all daemons has "Connection timed out" when executing scriptlets.
So the reproducer seems to be simple: # rpm -q systemd systemd-247.3-1.fc34.x86_64 # dnf update systemd --nogpgcheck Running scriptlet: systemd-rpm-macros-248~rc2-1.fc35.noarch 1/1 Upgrading : systemd-rpm-macros-248~rc2-1.fc35.noarch 1/12 Upgrading : systemd-libs-248~rc2-1.fc35.x86_64 2/12 Running scriptlet: systemd-libs-248~rc2-1.fc35.x86_64 2/12 Running scriptlet: systemd-networkd-248~rc2-1.fc35.x86_64 3/12 Upgrading : systemd-networkd-248~rc2-1.fc35.x86_64 3/12 Running scriptlet: systemd-networkd-248~rc2-1.fc35.x86_64 3/12 Upgrading : systemd-pam-248~rc2-1.fc35.x86_64 4/12 Running scriptlet: systemd-248~rc2-1.fc35.x86_64 5/12 Upgrading : systemd-248~rc2-1.fc35.x86_64 5/12 Running scriptlet: systemd-248~rc2-1.fc35.x86_64 5/12 Broadcast message from systemd-journald@plautrba-1mt-fedora-rawhide-17506-2021-02-25-18-01 (Thu 2021-02-25 13:04:38 EST): systemd[1]: Caught <ABRT>, dumped core as pid 1062. Broadcast message from systemd-journald@plautrba-1mt-fedora-rawhide-17506-2021-02-25-18-01 (Thu 2021-02-25 13:04:38 EST): systemd[1]: Freezing execution.
OTOH update from 248~rc2-1.fc34 to 248~rc2-1.fc35 works without any problem. It looks like a problem between old and new systemd release.
(In reply to Petr Lautrbach from comment #7) > OTOH update from 248~rc2-1.fc34 to 248~rc2-1.fc35 works without any problem. > It looks like a problem between old and new systemd release. Yesterday it happens again on all my machines when I upgraded from systemd-247.3-3.fc35 to systemd-248~rc2-1.fc35
I started with systemd-247.3-1.fc34.x86_64. It happened when I updated to systemd-247.3-2.fc35.x86_64, after reboot I updated to systemd-247.3-3.fc35.x86_64 without any problem, but it happened again after update to systemd-udev-248~rc2-1.fc35.x86_64 The code referenced in https://bugzilla.redhat.com/show_bug.cgi?id=1930965#c2 is void regex_data_free(struct regex_data *regex) { if (regex) { if (regex->regex) pcre2_code_free(regex->regex); if (regex->match_data) pcre2_match_data_free(regex->match_data); __pthread_mutex_destroy(®ex->match_mutex); free(regex); } } It hasn't changed since 2016
This is breaking Fedora CI a lot, could somebody prioritize this please?
I wasn't able to reliably reproduce the problem and it looks like it fails on random positions: # coredumpctl info systemd PID: 8348 (systemd) UID: 0 (root) GID: 0 (root) Signal: 6 (ABRT) Timestamp: Tue 2021-03-02 09:53:56 UTC (5h 21min ago) Command Line: /usr/lib/systemd/systemd --switched-root --system --deserialize 30 Executable: /usr/lib/systemd/systemd Control Group: /init.scope Unit: init.scope Slice: -.slice Boot ID: f5ecd69c76fd4f0bb8dc34075c72e56d Machine ID: 2af87df091aa439383ed4f8589d5f592 Hostname: ip-172-31-31-198.us-east-2.compute.internal Storage: /var/lib/systemd/coredump/core.systemd.0.f5ecd69c76fd4f0bb8dc34075c72e56d.8348.1614678836000000.zst (present) Disk Size: 704.8K Message: Process 8348 (systemd) of user 0 dumped core. Stack trace of thread 8348: #0 0x00007fbb5f37358b kill (libc.so.6 + 0x3d58b) #1 0x00005592ba06848e do_reexecute (/usr/lib/systemd/systemd (deleted) + 0x4848e) PID: 3355 (systemd) UID: 0 (root) GID: 0 (root) Signal: 11 (SEGV) Timestamp: Tue 2021-03-02 10:32:00 UTC (4h 43min ago) Command Line: /usr/lib/systemd/systemd --system --deserialize 30 Executable: /usr/lib/systemd/systemd Control Group: /init.scope Unit: init.scope Slice: -.slice Boot ID: 4b50342a33d24b3bb92c90ab486250fe Machine ID: 2af87df091aa439383ed4f8589d5f592 Hostname: ip-172-31-31-198.us-east-2.compute.internal Storage: /var/lib/systemd/coredump/core.systemd.0.4b50342a33d24b3bb92c90ab486250fe.3355.1614681120000000.zst (present) Disk Size: 641.7K Message: Process 3355 (systemd) of user 0 dumped core. Stack trace of thread 3355: #0 0x00007f0fcf60b58b kill (libc.so.6 + 0x3d58b) #1 0x00005601a61aab78 crash (systemd + 0x45b78) #2 0x00007f0fcf7b04b0 __restore_rt (libpthread.so.0 + 0x134b0) #3 0x00007f0fcf72e955 __strlen_avx2 (libc.so.6 + 0x160955) #4 0x00007f0fcfa42c52 strjoin_real (libsystemd-shared-248.so + 0x16fc52) #5 0x00007f0fcfa506c4 strv_extend_strv_concat (libsystemd-shared-248.so + 0x17d6c4) #6 0x00007f0fcf978e03 config_parse_many (libsystemd-shared-248.so + 0xa5e03) #7 0x00005601a61b111b parse_config_file (systemd + 0x4c11b) #8 0x00005601a61b13fa parse_configuration (systemd + 0x4c3fa) #9 0x00005601a61a9c06 main (systemd + 0x44c06) #10 0x00007f0fcf5f5b75 __libc_start_main (libc.so.6 + 0x27b75) #11 0x00005601a61aa72e _start (systemd + 0x4572e) PID: 1411 (systemd) UID: 0 (root) GID: 0 (root) Signal: 6 (ABRT) Timestamp: Tue 2021-03-02 10:43:59 UTC (4h 31min ago) Command Line: /usr/lib/systemd/systemd --system --deserialize 29 Executable: /usr/lib/systemd/systemd Control Group: /init.scope Unit: init.scope Slice: -.slice Boot ID: f1b94465f3e347fbb3ebd1072d08e9cb Machine ID: 2af87df091aa439383ed4f8589d5f592 Hostname: ip-172-31-31-198.us-east-2.compute.internal Storage: /var/lib/systemd/coredump/core.systemd.0.f1b94465f3e347fbb3ebd1072d08e9cb.1411.1614681839000000.zst (present) Disk Size: 608.4K Message: Process 1411 (systemd) of user 0 dumped core. Stack trace of thread 1411: #0 0x00007f6d6a99e58b kill (libc.so.6 + 0x3d58b) #1 0x0000555f17e38b68 crash (/usr/lib/systemd/systemd (deleted) + 0x45b68) #2 0x00007f6d6ab434b0 __restore_rt (libpthread.so.0 + 0x134b0) #3 0x00007f6d6a99e292 raise (libc.so.6 + 0x3d292) #4 0x00007f6d6a9878a4 abort (libc.so.6 + 0x268a4) #5 0x00007f6d6acdcf26 n/a (/usr/lib/systemd/libsystemd-shared-248.so (deleted) + 0x76f26)
Created attachment 1761533 [details] dnf output It happened again between systemd-248~rc2-1.fc35 and 248~rc2-3.fc35 upgrade.
Created attachment 1761565 [details] system log
I think there might be more than one bug here. Comment 6 is almost certainly #1931034. The one in comment 13 too. OTOH, the one with strjoin_real() in the backtrace looks different.
Most of the crashes reported here should be fixed in systemd-248~rc2-5.fc3[45], i.e. bug #1931034. Let's reuse this bug for the issue with parse_config_file (). Petr, could you provide a full backtrace or the core file for the crash of PID 3355?
(In reply to Zbigniew Jędrzejewski-Szmek from comment #15) > Most of the crashes reported here should be fixed in > systemd-248~rc2-5.fc3[45], i.e. bug #1931034. > Let's reuse this bug for the issue with parse_config_file (). > > Petr, could you provide a full backtrace or the core file for the crash of > PID 3355? I used a cloud guest which is not available anymore, sorry.
> I used a cloud guest which is not available anymore, sorry. I'll close this then. Please reopen if this issue or a similar one occurs again. (We have a fuzzer for the config parsing stuff, so I'd be surprised if there was any easily-triggered bug in this code. Maybe it was caused by some unrelated corruption, and only surfaced here.)