RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 2001549 - udisksd sometimes crashes during shutdown with "malloc_consolidate(): unaligned fastbin chunk detected"
Summary: udisksd sometimes crashes during shutdown with "malloc_consolidate(): unalign...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: udisks2
Version: 9.0
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: rc
: ---
Assignee: Tomáš Bžatek
QA Contact: guazhang@redhat.com
URL:
Whiteboard:
Depends On: 2010690
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-09-06 11:21 UTC by Marius Vollmer
Modified: 2022-06-07 03:20 UTC (History)
1 user (show)

Fixed In Version: udisks2-2.9.4-1.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-17 12:52:43 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Attachments (Terms of Use)
Journal messages with the crash (2.20 MB, text/plain)
2021-09-06 11:21 UTC, Marius Vollmer
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-96305 0 None None None 2021-09-06 11:22:02 UTC
Red Hat Product Errata RHBA-2022:2413 0 None None None 2022-05-17 12:52:54 UTC

Description Marius Vollmer 2021-09-06 11:21:40 UTC
Created attachment 1820822 [details]
Journal messages with the crash

Description of problem:

We see a udisksd crash sometime during shutdown of a CI machine:

Sep 06 05:35:27 rhel-9-0-127-0-0-2-2201 udisksd[16264]: udisks daemon version 2.9.2 exiting
Sep 06 05:35:27 rhel-9-0-127-0-0-2-2201 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=gssproxy comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 06 05:35:27 rhel-9-0-127-0-0-2-2201 systemd[1]: gssproxy.service: Deactivated successfully.
Sep 06 05:35:27 rhel-9-0-127-0-0-2-2201 systemd[1]: Stopped GSSAPI Proxy Daemon.
Sep 06 05:35:27 rhel-9-0-127-0-0-2-2201 udisksd[16264]: malloc_consolidate(): unaligned fastbin chunk detected
Sep 06 05:35:27 rhel-9-0-127-0-0-2-2201 audit[16264]: ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 subj=system_u:system_r:devicekit_disk_t:s0 pid=16264 comm="udisksd" exe="/usr/libexec/udisks2/udisksd" sig=6 res=1
Sep 06 05:35:27 rhel-9-0-127-0-0-2-2201 systemd-coredump[166138]: Failed to connect to coredump service: Connection refused

The VM has run a good number of tests at that time, and it is not clear what exactly has gottin udisksd into the crashy state.  I will try to find something reproducible.

Also, systemd-coredump has already exited by the time udisksd crashes, so there is no backtrace either.

2350-20210906-090323-4a6c7db3-rhel-9-0-cockpit-project-cockpit/TestUpdates-testTracer-rhel-9-0-127.0.0.2-2301-FAIL.log.gz

Version-Release number of selected component (if applicable):
udisks2-2.9.2-6.el9.x86_64

How reproducible:
Not at all easy, but maybe I can find something

Steps to Reproduce:
1. Run the full Cockpit CI suite for rhel-9-0.

Comment 1 Marius Vollmer 2021-09-06 12:50:02 UTC
This will reproduce it, if you know how to run the Cockpit CI suite:

   $ export TEST_OS=rhel-9-0
   $ make bots
   $ ./test/image-prepare -q
   $ ./test/common/run-tests -n -b 1 --test-dir test/verify

This takes quite a bit of time.  I will try to cut this down a bit more, get a backtrace and maybe even try to run udisksd under valgrind.

Comment 2 Marius Vollmer 2021-09-06 15:02:13 UTC
I got this from valgrind, but I am not sure if it is relevant yet:

==5484== Thread 8 pool-udisksd:
==5484== Source and destination overlap in memcpy_chk(0xb126055, 0xb126050, 7)
==5484==    at 0x484C232: __memcpy_chk (vg_replace_strmem.c:1617)
==5484==    by 0x517FDED: blkid_devno_to_devname (in /usr/lib64/libblkid.so.1.1.0)
==5484==    by 0x518ACE1: ??? (in /usr/lib64/libblkid.so.1.1.0)
==5484==    by 0x5181DED: blkid_do_safeprobe (in /usr/lib64/libblkid.so.1.1.0)
==5484==    by 0x48C7DEF: mnt_cache_read_tags (in /usr/lib64/libmount.so.1.1.0)
==5484==    by 0x48C8180: mnt_get_fstype (in /usr/lib64/libmount.so.1.1.0)
==5484==    by 0x48D81E2: ??? (in /usr/lib64/libmount.so.1.1.0)
==5484==    by 0x48E5F99: mnt_context_prepare_mount (in /usr/lib64/libmount.so.1.1.0)
==5484==    by 0x48E6BB9: mnt_context_mount (in /usr/lib64/libmount.so.1.1.0)
==5484==    by 0x7C770AB: ??? (in /usr/lib64/libbd_fs.so.2.0.0)
==5484==    by 0x7C7887E: bd_fs_mount (in /usr/lib64/libbd_fs.so.2.0.0)
==5484==    by 0x13D7FA: ??? (in /usr/libexec/udisks2/udisksd)
==5484==

Comment 3 Marius Vollmer 2021-09-07 07:47:31 UTC
And this at exit:

==58962== Invalid read of size 4
==58962==    at 0x4B4CD24: g_resource_unref (in /usr/lib64/libgio-2.0.so.0.6800.4)
==58962==    by 0x4B4EAD0: g_static_resource_fini (in /usr/lib64/libgio-2.0.so.0.6800.4)
==58962==    by 0x4010ADB: _dl_fini (dl-fini.c:139)
==58962==    by 0x4E8BDC4: __run_exit_handlers (in /usr/lib64/libc.so.6)
==58962==    by 0x4E8BF3F: exit (in /usr/lib64/libc.so.6)
==58962==    by 0x4E74566: __libc_start_call_main (in /usr/lib64/libc.so.6)
==58962==    by 0x4E7460B: __libc_start_main@@GLIBC_2.34 (in /usr/lib64/libc.so.6)
==58962==    by 0x1286D4: (below main) (in /usr/libexec/udisks2/udisksd)
==58962==  Address 0x5da98d0 is 0 bytes inside a block of size 16 free'd
==58962==    at 0x48440E4: free (vg_replace_malloc.c:755)
==58962==    by 0x4D4BD1C: g_free (in /usr/lib64/libglib-2.0.so.0.6800.4)
==58962==    by 0x4B4EAC8: g_static_resource_fini (in /usr/lib64/libgio-2.0.so.0.6800.4)
==58962==    by 0x4010ADB: _dl_fini (dl-fini.c:139)
==58962==    by 0x4E8BDC4: __run_exit_handlers (in /usr/lib64/libc.so.6)
==58962==    by 0x4E8BF3F: exit (in /usr/lib64/libc.so.6)
==58962==    by 0x4E74566: __libc_start_call_main (in /usr/lib64/libc.so.6)
==58962==    by 0x4E7460B: __libc_start_main@@GLIBC_2.34 (in /usr/lib64/libc.so.6)
==58962==    by 0x1286D4: (below main) (in /usr/libexec/udisks2/udisksd)
==58962==  Block was alloc'd at
==58962==    at 0x484186F: malloc (vg_replace_malloc.c:380)
==58962==    by 0x4D4F408: g_malloc (in /usr/lib64/libglib-2.0.so.0.6800.4)
==58962==    by 0x4B4E347: g_resource_new_from_data (in /usr/lib64/libgio-2.0.so.0.6800.4)
==58962==    by 0x4B4E408: ??? (in /usr/lib64/libgio-2.0.so.0.6800.4)
==58962==    by 0x4B4EB0C: g_static_resource_get_resource (in /usr/lib64/libgio-2.0.so.0.6800.4)
==58962==    by 0x14F1F3: udisks_linux_mount_options_get_builtin (in /usr/libexec/udisks2/udisksd)
==58962==    by 0x12CAEE: ??? (in /usr/libexec/udisks2/udisksd)
==58962==    by 0x4CB9C86: ??? (in /usr/lib64/libgobject-2.0.so.0.6800.4)
==58962==    by 0x4CBB137: g_object_new_valist (in /usr/lib64/libgobject-2.0.so.0.6800.4)
==58962==    by 0x4CBB66C: g_object_new (in /usr/lib64/libgobject-2.0.so.0.6800.4)
==58962==    by 0x12998F: udisks_daemon_new (in /usr/libexec/udisks2/udisksd)
==58962==    by 0x129A05: ??? (in /usr/libexec/udisks2/udisksd)

Comment 4 Tomáš Bžatek 2021-09-07 10:57:44 UTC
Interesting! Might be caused by the fact RHEL 9 is compiled with much strict CFLAGS than Fedora. This looks like a stack corruption from a first look. From time to time we run udisks under Valgrind, though I have never tried that on RHEL 9.

I guess a good reproducer would be a key here. Please keep trying!

Comment 5 Marius Vollmer 2021-09-08 12:01:50 UTC
Got it!  (The reproducer, not the fix...)

This script causes udisksd to abort for me:

    # Make a fresh block device with some LUKS on it

    dd if=/dev/zero of=/disk.img bs=1M count=50
    DEV=$(losetup -f --show /disk.img)
    echo einszweidrei | cryptsetup luksFormat ${DEV}
    udevadm settle

    # Start udisksd, just so that a second copy refuses to start
    systemctl start udisks2

    # Now run udisksd.  It will immediately exit (because there is already
    # one running), and will crash during exit.
    /usr/libexec/udisks2/udisksd

    # Clean up for the next run
    losetup -d $DEV

Basically, if there is a LUKS device (doesn't need to be unlocked), just starting udisksd does something that will cause it to abort during exit.

Running udisksd under valgrind does not provide any hints, the invalid read in comment 3 happens always during exit.

Comment 6 Marius Vollmer 2021-09-08 12:05:54 UTC
And here is the backtrace of the abort:

#0  0x00007ffff7852763 in pthread_kill.5 () from /lib64/libc.so.6
#1  0x00007ffff7805686 in raise () from /lib64/libc.so.6
#2  0x00007ffff77ef7d3 in abort () from /lib64/libc.so.6
#3  0x00007ffff7846a07 in __libc_message () from /lib64/libc.so.6
#4  0x00007ffff785c72c in malloc_printerr () from /lib64/libc.so.6
#5  0x00007ffff785d49c in malloc_consolidate () from /lib64/libc.so.6
#6  0x00007ffff785e4a0 in _int_free () from /lib64/libc.so.6
#7  0x00007ffff7860ac5 in free () from /lib64/libc.so.6
#8  0x00007ffff709fa6d in sa_doall.lto_priv () from /lib64/libcrypto.so.3
#9  0x00007ffff70c457d in ossl_method_store_free.part () from /lib64/libcrypto.so.3
#10 0x00007ffff709a825 in CRYPTO_free_ex_data () from /lib64/libcrypto.so.3
#11 0x00007ffff709359f in context_deinit.part.0.lto_priv () from /lib64/libcrypto.so.3
#12 0x00007ffff709cfba in OPENSSL_cleanup () from /lib64/libcrypto.so.3
#13 0x00007ffff5592303 in libcryptsetup_exit () from /lib64/libcryptsetup.so.12
#14 0x00007ffff7fdaadc in _dl_fini () at dl-fini.c:139
#15 0x00007ffff7807dc5 in __run_exit_handlers () from /lib64/libc.so.6
#16 0x00007ffff7807f40 in exit () from /lib64/libc.so.6
#17 0x00007ffff77f0567 in __libc_start_call_main () from /lib64/libc.so.6
#18 0x00007ffff77f060c in __libc_start_main_impl () from /lib64/libc.so.6
#19 0x00005555555746d5 in _start ()

Comment 7 Tomáš Bžatek 2021-09-08 12:10:47 UTC
Thanks, this looks like cryptsetup bug 1998921

Comment 8 Tomáš Bžatek 2021-10-04 13:53:02 UTC
With bug 1998921 fixed and verified and the backtrace looking suspiciously similar, let's just close this one as a duplicate. No fix needed on udisks2 side.

*** This bug has been marked as a duplicate of bug 1998921 ***

Comment 9 Marius Vollmer 2021-10-05 07:28:14 UTC
cryptsetup-libs-2.4.0-2.el9_b.x86_64 unfortunately does not fix this crash.  The reproducer in comment 5 still results in a the same output:

udisks-Message: 03:26:22.116: udisks daemon version 2.9.2 starting
udisks-Message: 03:26:22.220: udisks daemon version 2.9.2 exiting
malloc_consolidate(): unaligned fastbin chunk detected
Aborted (core dumped)

Comment 11 Marius Vollmer 2021-10-05 07:30:08 UTC
(and udiskd doesn't actually link to libcryptsetup.so as far as ldd tells me, or is there some dlopening going on?)

Comment 12 Tomáš Bžatek 2021-10-05 11:10:02 UTC
You're right! Sorry for not noticing the backtrace difference, both are however related to at_exit handlers and library deinitialization.

I've reproduced this on my local development system with slightly different udisks setup.

I've opened bug 2010690 to keep the history clean.

Comment 13 Tomáš Bžatek 2021-10-05 14:10:55 UTC
(In reply to Marius Vollmer from comment #11)
> (and udiskd doesn't actually link to libcryptsetup.so as far as ldd tells
> me, or is there some dlopening going on?)

Yes, libcryptsetup.so gets dlopened in the process through bd_try_init() along with libbd_crypto.so.2 that actually links against libcryptsetup.so. There's no explicit dlclose() as far as I can tell, only exit handlers attached.

Comment 14 Marius Vollmer 2021-10-06 06:32:42 UTC
(In reply to Tomáš Bžatek from comment #13)
> (In reply to Marius Vollmer from comment #11)
> > (and udiskd doesn't actually link to libcryptsetup.so as far as ldd tells
> > me, or is there some dlopening going on?)
> 
> Yes, libcryptsetup.so gets dlopened in the process through bd_try_init()
> along with libbd_crypto.so.2 that actually links against libcryptsetup.so.
> There's no explicit dlclose() as far as I can tell, only exit handlers
> attached.

I see, thanks!

Comment 15 Tomáš Bžatek 2021-10-07 14:25:56 UTC
Turns out you were right again, Marius! The warning in comment 3 has actually caused memory corruption that caused a crash later in the cryptsetup code.

This upstream change should fix the issue: https://github.com/storaged-project/udisks/pull/926

I'll include it along with the 2.9.4 release rebase (bug 2010363).

Comment 16 guazhang@redhat.com 2021-10-27 07:25:25 UTC
Hi

reproduce the bug with udisks2-2.9.2-6.el9.x86_64.

# sh test.sh 
50+0 records in
50+0 records out
52428800 bytes (52 MB, 50 MiB) copied, 0.0198364 s, 2.6 GB/s
udisks-Message: 03:15:32.426: udisks daemon version 2.9.2 starting
udisks-Message: 03:15:32.730: udisks daemon version 2.9.2 exiting
malloc_consolidate(): unaligned fastbin chunk detected
test.sh: line 15:  5723 Aborted                 (core dumped) /usr/libexec/udisks2/udisksd




Do not reproduce the bug with udisks2-2.9.4-1.el9.

Comment 20 errata-xmlrpc 2022-05-17 12:52:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (new packages: udisks2), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2413


Note You need to log in before you can comment on or make changes to this bug.