Bug 1930965 - pid1 crash in parse_config_file()
Summary: pid1 crash in parse_config_file()
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: rawhide
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-02-19 21:40 UTC by Mikhail
Modified: 2021-03-23 09:44 UTC (History)
20 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-23 09:44:35 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
system logs (412.72 KB, text/plain)
2021-02-19 21:40 UTC, Mikhail
no flags Details
backtrace (5.10 KB, text/plain)
2021-02-19 21:41 UTC, Mikhail
no flags Details
dnf output (299.05 KB, text/plain)
2021-02-24 23:22 UTC, Mikhail
no flags Details
dnf output (12.78 KB, text/plain)
2021-03-08 10:02 UTC, Mikhail
no flags Details
system log (733.79 KB, text/plain)
2021-03-08 10:52 UTC, Mikhail
no flags Details

Description Mikhail 2021-02-19 21:40:50 UTC
Created attachment 1758276 [details]
system logs

Description of problem:
I was unable reboot computer after upgrade systemd from 247.3-2.fc35 to 247.3-3.fc35 version

Demonstration: https://youtu.be/Kv5lPipOSfY

Comment 1 Mikhail 2021-02-19 21:41:13 UTC
Created attachment 1758277 [details]
backtrace

Comment 2 Zbigniew Jędrzejewski-Szmek 2021-02-21 12:15:00 UTC
#9  0x00007f3885fc5a58 in __GI___libc_free (mem=<optimized out>) at malloc.c:3309
        ar_ptr = <optimized out>
        p = <optimized out>
        hook = <optimized out>
        err = 0
#10 0x00007f38861ecb5d in regex_data_free (regex=0x564449efa630) at /usr/src/debug/libselinux-3.2-0.rc2.1.fc34.x86_64/src/regex.c:205
No locals.
#11 closef (rec=<optimized out>) at /usr/src/debug/libselinux-3.2-0.rc2.1.fc34.x86_64/src/label_file.c:815
        data = 0x564449bca340
        area = <optimized out>
        last_area = <optimized out>
        spec = 0x7f3885248ae8
        stem = <optimized out>
        i = 4085
#12 0x00007f38861f13cd in selabel_close (rec=0x564449bc9b20) at /usr/src/debug/libselinux-3.2-0.rc2.1.fc34.x86_64/src/label.c:366
No locals.
#13 0x00007f38863aa549 in mac_selinux_finish () at ../src/basic/selinux-util.c:206
No locals.
#14 0x0000564447fff582 in main (argc=<optimized out>, argv=<optimized out>) at ../src/core/main.c:2881

Hmm, could be something in libselinux. Or the memory is corrupted by systemd code, and the crash
just happens in the libselinux code path. Hard to say at this point.

libselinux maintainers, any ideas?

Comment 3 Petr Lautrbach 2021-02-22 13:23:53 UTC
First thing I have to ask - it possible to boot in permissive mode? For instructions see https://docs.fedoraproject.org/en-US/quick-docs/changing-selinux-states-and-modes/#selinux-changing-to-permissive-mode or put 'enforcing=0' on kernel command line.

Do you have any local policy modification, see `semanage export` ?

I personally use Rawhide with libselinux-3.2-0.rc2.1.fc34.x86_64 systemd-247.3-3.fc35.x86_64 and it boots just fine.

Comment 4 Mikhail 2021-02-24 23:18:58 UTC
> First thing I have to ask - it possible to boot in permissive mode? For instructions see https://docs.fedoraproject.org/en-US/quick-docs/changing-selinux-states-and-modes/#selinux-changing-to-permissive-mode or put 'enforcing=0' on kernel command line.

I am not experienced issues with boot. I has issue only after upgrade systemd (daemons are out of control until computer reset).

SELINUX=permissive or SELINUX=enforced mode doesn't matter. Issuessue reproduceble on machines with permissive and enforced SELinux modes.

Comment 5 Mikhail 2021-02-24 23:22:13 UTC
Created attachment 1759175 [details]
dnf output

As you can see after systemd upgrade all daemons has "Connection timed out" when executing scriptlets.

Comment 6 Petr Lautrbach 2021-02-25 18:05:37 UTC
So the reproducer seems to be simple:

# rpm -q systemd
systemd-247.3-1.fc34.x86_64

# dnf update systemd --nogpgcheck

  Running scriptlet: systemd-rpm-macros-248~rc2-1.fc35.noarch                                                      1/1   Upgrading        : systemd-rpm-macros-248~rc2-1.fc35.noarch                                                     1/12   Upgrading        : systemd-libs-248~rc2-1.fc35.x86_64                                                           2/12   Running scriptlet: systemd-libs-248~rc2-1.fc35.x86_64                                                           2/12   Running scriptlet: systemd-networkd-248~rc2-1.fc35.x86_64                                                       3/12   Upgrading        : systemd-networkd-248~rc2-1.fc35.x86_64                                                       3/12   Running scriptlet: systemd-networkd-248~rc2-1.fc35.x86_64                                                       3/12   Upgrading        : systemd-pam-248~rc2-1.fc35.x86_64                                                            4/12   Running scriptlet: systemd-248~rc2-1.fc35.x86_64                                                                5/12   Upgrading        : systemd-248~rc2-1.fc35.x86_64                                                                5/12   Running scriptlet: systemd-248~rc2-1.fc35.x86_64                                                                5/12 
Broadcast message from systemd-journald@plautrba-1mt-fedora-rawhide-17506-2021-02-25-18-01 (Thu 2021-02-25 13:04:38 EST):

systemd[1]: Caught <ABRT>, dumped core as pid 1062.


Broadcast message from systemd-journald@plautrba-1mt-fedora-rawhide-17506-2021-02-25-18-01 (Thu 2021-02-25 13:04:38 EST):

systemd[1]: Freezing execution.

Comment 7 Petr Lautrbach 2021-02-25 20:49:10 UTC
OTOH update from 248~rc2-1.fc34 to 248~rc2-1.fc35 works without any problem. It looks like a problem between old and new systemd release.

Comment 8 Mikhail 2021-02-25 21:01:04 UTC
(In reply to Petr Lautrbach from comment #7)
> OTOH update from 248~rc2-1.fc34 to 248~rc2-1.fc35 works without any problem.
> It looks like a problem between old and new systemd release.

Yesterday it happens again on all my machines when I upgraded from systemd-247.3-3.fc35 to systemd-248~rc2-1.fc35

Comment 9 Petr Lautrbach 2021-02-25 21:22:02 UTC
I started with systemd-247.3-1.fc34.x86_64. It happened when I updated to systemd-247.3-2.fc35.x86_64, after reboot I updated to systemd-247.3-3.fc35.x86_64 without any problem, but it happened again after update to systemd-udev-248~rc2-1.fc35.x86_64



The code referenced in https://bugzilla.redhat.com/show_bug.cgi?id=1930965#c2 is 

void regex_data_free(struct regex_data *regex)
{
	if (regex) {
		if (regex->regex)
			pcre2_code_free(regex->regex);
		if (regex->match_data)
			pcre2_match_data_free(regex->match_data);
		__pthread_mutex_destroy(&regex->match_mutex);
		free(regex);
	}
}


It hasn't changed since 2016

Comment 10 Miroslav Vadkerti 2021-03-01 10:18:24 UTC
This is breaking Fedora CI a lot, could somebody prioritize this please?

Comment 11 Petr Lautrbach 2021-03-02 15:19:43 UTC
I wasn't able to reliably reproduce the problem and it looks like it fails on random positions:

# coredumpctl info systemd
           PID: 8348 (systemd)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 6 (ABRT)
     Timestamp: Tue 2021-03-02 09:53:56 UTC (5h 21min ago)
  Command Line: /usr/lib/systemd/systemd --switched-root --system --deserialize 30
    Executable: /usr/lib/systemd/systemd
 Control Group: /init.scope
          Unit: init.scope
         Slice: -.slice
       Boot ID: f5ecd69c76fd4f0bb8dc34075c72e56d
    Machine ID: 2af87df091aa439383ed4f8589d5f592
      Hostname: ip-172-31-31-198.us-east-2.compute.internal
       Storage: /var/lib/systemd/coredump/core.systemd.0.f5ecd69c76fd4f0bb8dc34075c72e56d.8348.1614678836000000.zst (present)
     Disk Size: 704.8K
       Message: Process 8348 (systemd) of user 0 dumped core.
                
                Stack trace of thread 8348:
                #0  0x00007fbb5f37358b kill (libc.so.6 + 0x3d58b)
                #1  0x00005592ba06848e do_reexecute (/usr/lib/systemd/systemd (deleted) + 0x4848e)

           PID: 3355 (systemd)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 11 (SEGV)
     Timestamp: Tue 2021-03-02 10:32:00 UTC (4h 43min ago)
  Command Line: /usr/lib/systemd/systemd --system --deserialize 30
    Executable: /usr/lib/systemd/systemd
 Control Group: /init.scope
          Unit: init.scope
         Slice: -.slice
       Boot ID: 4b50342a33d24b3bb92c90ab486250fe
    Machine ID: 2af87df091aa439383ed4f8589d5f592
      Hostname: ip-172-31-31-198.us-east-2.compute.internal
       Storage: /var/lib/systemd/coredump/core.systemd.0.4b50342a33d24b3bb92c90ab486250fe.3355.1614681120000000.zst (present)
     Disk Size: 641.7K
       Message: Process 3355 (systemd) of user 0 dumped core.
                
                Stack trace of thread 3355:
                #0  0x00007f0fcf60b58b kill (libc.so.6 + 0x3d58b)
                #1  0x00005601a61aab78 crash (systemd + 0x45b78)
                #2  0x00007f0fcf7b04b0 __restore_rt (libpthread.so.0 + 0x134b0)
                #3  0x00007f0fcf72e955 __strlen_avx2 (libc.so.6 + 0x160955)
                #4  0x00007f0fcfa42c52 strjoin_real (libsystemd-shared-248.so + 0x16fc52)
                #5  0x00007f0fcfa506c4 strv_extend_strv_concat (libsystemd-shared-248.so + 0x17d6c4)
                #6  0x00007f0fcf978e03 config_parse_many (libsystemd-shared-248.so + 0xa5e03)
                #7  0x00005601a61b111b parse_config_file (systemd + 0x4c11b)
                #8  0x00005601a61b13fa parse_configuration (systemd + 0x4c3fa)
                #9  0x00005601a61a9c06 main (systemd + 0x44c06)
                #10 0x00007f0fcf5f5b75 __libc_start_main (libc.so.6 + 0x27b75)
                #11 0x00005601a61aa72e _start (systemd + 0x4572e)

           PID: 1411 (systemd)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 6 (ABRT)
     Timestamp: Tue 2021-03-02 10:43:59 UTC (4h 31min ago)
  Command Line: /usr/lib/systemd/systemd --system --deserialize 29
    Executable: /usr/lib/systemd/systemd
 Control Group: /init.scope
          Unit: init.scope
         Slice: -.slice
       Boot ID: f1b94465f3e347fbb3ebd1072d08e9cb
    Machine ID: 2af87df091aa439383ed4f8589d5f592
      Hostname: ip-172-31-31-198.us-east-2.compute.internal
       Storage: /var/lib/systemd/coredump/core.systemd.0.f1b94465f3e347fbb3ebd1072d08e9cb.1411.1614681839000000.zst (present)
     Disk Size: 608.4K
       Message: Process 1411 (systemd) of user 0 dumped core.
                
                Stack trace of thread 1411:
                #0  0x00007f6d6a99e58b kill (libc.so.6 + 0x3d58b)
                #1  0x0000555f17e38b68 crash (/usr/lib/systemd/systemd (deleted) + 0x45b68)
                #2  0x00007f6d6ab434b0 __restore_rt (libpthread.so.0 + 0x134b0)
                #3  0x00007f6d6a99e292 raise (libc.so.6 + 0x3d292)
                #4  0x00007f6d6a9878a4 abort (libc.so.6 + 0x268a4)
                #5  0x00007f6d6acdcf26 n/a (/usr/lib/systemd/libsystemd-shared-248.so (deleted) + 0x76f26)

Comment 12 Mikhail 2021-03-08 10:02:32 UTC
Created attachment 1761533 [details]
dnf output

It happened again between systemd-248~rc2-1.fc35 and 248~rc2-3.fc35 upgrade.

Comment 13 Mikhail 2021-03-08 10:52:34 UTC
Created attachment 1761565 [details]
system log

Comment 14 Zbigniew Jędrzejewski-Szmek 2021-03-09 13:56:59 UTC
I think there might be more than one bug here. Comment 6 is almost certainly #1931034.
The one in comment 13 too.

OTOH, the one with strjoin_real() in the backtrace looks different.

Comment 15 Zbigniew Jędrzejewski-Szmek 2021-03-12 10:34:44 UTC
Most of the crashes reported here should be fixed in systemd-248~rc2-5.fc3[45], i.e. bug #1931034.
Let's reuse this bug for the issue with parse_config_file ().

Petr, could you provide a full backtrace or the core file for the crash of PID 3355?

Comment 16 Petr Lautrbach 2021-03-16 11:28:39 UTC
(In reply to Zbigniew Jędrzejewski-Szmek from comment #15)
> Most of the crashes reported here should be fixed in
> systemd-248~rc2-5.fc3[45], i.e. bug #1931034.
> Let's reuse this bug for the issue with parse_config_file ().
> 
> Petr, could you provide a full backtrace or the core file for the crash of
> PID 3355?

I used a cloud guest which is not available anymore, sorry.

Comment 17 Zbigniew Jędrzejewski-Szmek 2021-03-23 09:44:35 UTC
> I used a cloud guest which is not available anymore, sorry.

I'll close this then. Please reopen if this issue or a similar one occurs again.

(We have a fuzzer for the config parsing stuff, so I'd be surprised if there was any
easily-triggered bug in this code. Maybe it was caused by some unrelated corruption,
and only surfaced here.)


Note You need to log in before you can comment on or make changes to this bug.