Bug 2246048 - glibc: MALLOC_PERTURB_ causes ld.so to fail dlclose
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: 38
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Florian Weimer
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-10-25 06:40 UTC by Daiki Ueno
Modified: 2023-11-29 01:39 UTC (History)

Fixed In Version: glibc-2.38.9000-19.fc40 glibc-2.36-18.fc37 glibc-2.37-14.fc38 glibc-2.38-11.fc39
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-11-28 14:26:10 UTC
Type: ---
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 2244992 | 0 | unspecified | CLOSED | glibc: Improve compatibility of ELF destructor ordering | 2023-11-29 01:39:24 UTC

Internal Links: 2244992

Description Daiki Ueno 2023-10-25 06:40:51 UTC
After upgrading to glibc-2.37-12, several p11-kit tests started failing when MALLOC_PERTURB_ is set, with:

Inconsistency detected by ld.so: dl-close.c: 200: _dl_close_worker: Assertion `(*lp)->l_idx >= 0 && (*lp)->l_idx < nloaded' failed!

Reproducible: Always

Steps to Reproduce:
1. git clone --depth=1 --recurse-submodules https://github.com/p11-glue/p11-kit.git
2. cd p11-kit
3. ./autogen.sh --disable-doc && make -j$(nproc) && MALLOC_PERTURB_=55 p11-kit/p11-kit-testable list-modules
Actual Results:  
Error is printed: Inconsistency detected by ld.so: dl-close.c: 200: _dl_close_worker: Assertion `(*lp)->l_idx >= 0 && (*lp)->l_idx < nloaded' failed!

Expected Results:  
No error, and the command exits with status code 0.
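
For readers unfamiliar with the variable: glibc documents that MALLOC_PERTURB_ (the environment counterpart of mallopt's M_PERTURB) makes free() overwrite released memory with the low byte of the given value. The following is a minimal sketch, not glibc code, of why that could turn a stale read of l_idx into the assertion failure above. struct toy_map, toy_perturbed_free and idx_in_range are hypothetical names. The sketch only simulates the overwrite instead of really freeing memory, and it assumes a 32-bit int.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a loaded object; only l_idx is modeled.  */
struct toy_map
{
  int l_idx;
};

/* Models what free() does to the payload when MALLOC_PERTURB_ is set:
   every byte is overwritten with the low byte of the setting.
   The storage is NOT actually released here (an assumption of this
   sketch), so reading it back afterwards stays well-defined.  */
static void
toy_perturbed_free (struct toy_map *map, int perturb)
{
  assert (map != NULL);
  memset (map, perturb & 0xff, sizeof *map);
}

/* The range check from the assertion quoted in the report.  */
static bool
idx_in_range (const struct toy_map *map, int nloaded)
{
  return map->l_idx >= 0 && map->l_idx < nloaded;
}
```

With perturb set to 55 (0x37), a map whose l_idx was 3 reads back as 0x37373737 after toy_perturbed_free, so idx_in_range (&m, 5) goes from true to false. Without the perturbation the stale value would likely still look valid, which is consistent with the failure only showing up under MALLOC_PERTURB_.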

Comment 1 Florian Weimer 2023-10-25 09:49:43 UTC
I would love to have a reproducer for this, but I couldn't get it to work based on your instructions. Here is the Containerfile I tried:

FROM fedora:37
RUN dnf install -y 'dnf-command(builddep)' git automake autoconf libtool gettext-devel
RUN dnf build-dep -y p11-kit
WORKDIR /root
RUN git clone --depth=1 --recurse-submodules https://github.com/p11-glue/p11-kit.git
WORKDIR /root/p11-kit
RUN ./autogen.sh --disable-doc
RUN make -j`nproc`
RUN make -j`nproc` check
RUN MALLOC_PERTURB_=55 p11-kit/p11-kit-testable list-modules

I tried to install softhsm-devel as well before building p11-kit, based on some warning output, but it doesn't make a difference. Still no assertion failure.

Comment 2 Daiki Ueno 2023-10-25 10:03:23 UTC
(In reply to Florian Weimer from comment #1)
> I would love to have a reproducer for this, but I couldn't get it to work
> based on your instructions. Here is the Containerfile I tried:

I'm trying to create a standalone reproducer, but haven't managed so far.

> FROM fedora:37
> RUN dnf install -y 'dnf-command(builddep)' git automake autoconf libtool
> gettext-devel

With the stock fedora:37 image, glibc is at glibc-2.36-14.fc37.x86_64.  I guess you would need to use fedora:38 and run "dnf update -y glibc" before proceeding to the following steps.

> RUN dnf build-dep -y p11-kit
> WORKDIR /root
> RUN git clone --depth=1 --recurse-submodules
> https://github.com/p11-glue/p11-kit.git
> WORKDIR /root/p11-kit
> RUN ./autogen.sh --disable-doc
> RUN make -j`nproc`
> RUN make -j`nproc` check
> RUN MALLOC_PERTURB_=55 p11-kit/p11-kit-testable list-modules
> 
> I tried to install softhsm-devel as well before building p11-kit, based on
> some warning output, but it doesn't make a difference. Still no assertion
> failure.

For the above reproducer, softhsm-devel shouldn't be necessary. p11-kit-testable should load only mock modules.

Comment 3 Florian Weimer 2023-10-25 10:09:48 UTC
(In reply to Daiki Ueno from comment #2)
> (In reply to Florian Weimer from comment #1)
> > I would love to have a reproducer for this, but I couldn't get it to work
> > based on your instructions. Here is the Containerfile I tried:
> 
> I'm trying to create a standalone reproducer, but haven't managed so far.
> 
> > FROM fedora:37
> > RUN dnf install -y 'dnf-command(builddep)' git automake autoconf libtool
> > gettext-devel
> 
> With the stock fedora:37 image, glibc is at glibc-2.36-14.fc37.x86_64.  I
> guess you would need to use fedora:38 and run "dnf update -y glibc" before
> proceeding to the following steps.

The image has already been updated:

$ podman run -i -t --pull=always fedora:37 rpm -q glibc
Trying to pull registry.fedoraproject.org/fedora:37...
Getting image source signatures
Copying blob 5547adeb6c7b skipped: already exists  
Copying config a9bfb06409 done   | 
Writing manifest to image destination
glibc-2.36-14.fc37.x86_64

So that's not why reproduction fails, unfortunately.

Comment 4 Daiki Ueno 2023-10-25 10:17:41 UTC
(In reply to Florian Weimer from comment #3)

> The image has already been updated:
> 
> $ podman run -i -t --pull=always fedora:37 rpm -q glibc
> Trying to pull registry.fedoraproject.org/fedora:37...
> Getting image source signatures
> Copying blob 5547adeb6c7b skipped: already exists  
> Copying config a9bfb06409 done   | 
> Writing manifest to image destination
> glibc-2.36-14.fc37.x86_64
> 
> So that's not why reproduction fails, unfortunately.

Sorry Florian, I don't get what you mean: glibc-2.36-14 is older than glibc-2.37-12 (notice the minor version).

Comment 5 Florian Weimer 2023-10-25 10:34:14 UTC
Sorry, I confused Fedora 38 and Fedora 37 (a classic mistake due to the off-by-one glibc version). I tried to reproduce on Fedora 38 as well (outside a container), but couldn't. The fixed Containerfile looks like this:

FROM fedora:38
RUN dnf update -y
RUN dnf install -y 'dnf-command(builddep)' git automake autoconf libtool gettext-devel
RUN dnf build-dep -y p11-kit
WORKDIR /root
RUN git clone --depth=1 --recurse-submodules https://github.com/p11-glue/p11-kit.git
WORKDIR /root/p11-kit
RUN ./autogen.sh --disable-doc
RUN make -j`nproc`
RUN make -j`nproc` check
RUN MALLOC_PERTURB_=55 p11-kit/p11-kit-testable list-modules

Sadly, it still doesn't reproduce the issue.

I think I now see what the problem is, though. It looks like a latent use-after-free bug in dlclose that may hit after a recursive dlclose call.

Comment 6 Daiki Ueno 2023-10-25 11:10:40 UTC
(In reply to Florian Weimer from comment #5)

> Sadly, it still doesn't reproduce the issue.

Indeed, I was also not able to reproduce the issue with the Containerfile.
 
> I think I now see what the problem is, though. It looks like a latent
> use-after-free bug in dlclose that may hit after a recursive dlclose call.

It seems the trigger is that libsofthsm2.so gets loaded. I said in comment 2 that this shouldn't be the case, but it turns out it is: when p11-kit-devel (i.e., p11-kit-1.pc) is installed, one of the tests creates a symlink to /usr/lib64/libsofthsm2.so:
https://github.com/p11-glue/p11-kit/blob/96a8b145a33f95eae52d7d31d0d7ed1674c68423/p11-kit/test-generate-keypair.sh#L41

With the following Containerfile, I can reproduce the issue (note p11-kit-devel and softhsm are installed):

FROM fedora:38
RUN dnf update -y
RUN dnf install -y 'dnf-command(builddep)' git automake autoconf libtool gettext-devel p11-kit-devel softhsm
RUN dnf build-dep -y p11-kit
WORKDIR /root
RUN git clone --depth=1 --recurse-submodules https://github.com/p11-glue/p11-kit.git
WORKDIR /root/p11-kit
RUN ./autogen.sh --disable-doc
RUN make -j`nproc`
RUN make -j`nproc` check
RUN MALLOC_PERTURB_=55 p11-kit/p11-kit-testable list-modules

Comment 7 Florian Weimer 2023-10-25 12:13:42 UTC
Thank you so much, reproduced. We'll take it from here.

Comment 8 Florian Weimer 2023-10-25 16:36:46 UTC
valgrind reports:

==1== Invalid read of size 1
==1==    at 0x4011494: dfs_traversal (dl-sort-maps.c:145)
==1==    by 0x4011494: dfs_traversal.part.0 (dl-sort-maps.c:157)
==1==    by 0x401152B: dfs_traversal (dl-sort-maps.c:169)
==1==    by 0x401152B: dfs_traversal.part.0 (dl-sort-maps.c:172)
==1==    by 0x40118F3: dfs_traversal (dl-sort-maps.c:145)
==1==    by 0x40118F3: _dl_sort_maps_dfs (dl-sort-maps.c:228)
==1==    by 0x40118F3: _dl_sort_maps (dl-sort-maps.c:314)
==1==    by 0x4004FFD: _dl_fini (dl-fini.c:94)
==1==    by 0x48B91E5: __run_exit_handlers (in /usr/lib64/libc.so.6)
==1==    by 0x48B932D: exit (in /usr/lib64/libc.so.6)
==1==    by 0x48A0B90: (below main) (in /usr/lib64/libc.so.6)
==1==  Address 0x4a724a5 is 821 bytes inside a block of size 1,240 free'd
==1==    at 0x48452AC: free (vg_replace_malloc.c:974)
==1==    by 0x4002131: UnknownInlinedFun (rtld-malloc.h:50)
==1==    by 0x4002131: _dl_close_worker (dl-close.c:702)
==1==    by 0x400267A: _dl_close (dl-close.c:791)
==1==    by 0x4001522: _dl_catch_exception (dl-catch.c:237)
==1==    by 0x4001678: _dl_catch_error (dl-catch.c:256)
==1==    by 0x49011F2: _dlerror_run (in /usr/lib64/libc.so.6)
==1==    by 0x4900F25: dlclose@@GLIBC_2.34 (in /usr/lib64/libc.so.6)
==1==    by 0x4167FF: free_module_unlocked (modules.c:311)
==1==    by 0x4108B6: p11_dict_free (dict.c:310)
==1==    by 0x4171C9: free_modules_when_no_refs_unlocked (modules.c:891)
==1==    by 0x4171C9: free_modules_when_no_refs_unlocked (modules.c:871)
==1==    by 0x419460: p11_modules_release_inlock_reentrant (modules.c:1975)
==1==    by 0x419A58: p11_kit_modules_release (modules.c:2374)
==1==  Block was alloc'd at
==1==    at 0x484782C: calloc (vg_replace_malloc.c:1554)
==1==    by 0x400B8FD: UnknownInlinedFun (rtld-malloc.h:44)
==1==    by 0x400B8FD: _dl_new_object (dl-object.c:92)
==1==    by 0x400707E: _dl_map_object_from_fd (dl-load.c:1063)
==1==    by 0x4008B38: _dl_map_object (dl-load.c:2253)
==1==    by 0x40028BC: openaux (dl-deps.c:64)
==1==    by 0x4001522: _dl_catch_exception (dl-catch.c:237)
==1==    by 0x4002D17: _dl_map_object_deps (dl-deps.c:232)
==1==    by 0x400C724: dl_open_worker_begin (dl-open.c:592)
==1==    by 0x4001522: _dl_catch_exception (dl-catch.c:237)
==1==    by 0x400BE6F: dl_open_worker (dl-open.c:782)
==1==    by 0x4001522: _dl_catch_exception (dl-catch.c:237)
==1==    by 0x400C243: _dl_open (dl-open.c:884)

This aligns very well with the backtrace in bug 2244992 comment 8.

Comment 9 Florian Weimer 2023-10-25 17:09:35 UTC
Oops, it does not align well. This one here has _dl_fini in it. My fix attempt does not help at all because it's dlclose only.

Comment 10 Carlos O'Donell 2023-10-26 21:37:53 UTC
(In reply to Florian Weimer from comment #9)
> Oops, it does not align well. This one here has _dl_fini in it. My fix
> attempt does not help at all because it's dlclose only.

We are going to temporarily revert this change across f37, f38, f39 and f40 while we review the possible fixes upstream.

I will push reverts for this shortly across all the active branches.

Comment 11 Florian Weimer 2023-11-07 09:40:53 UTC
Finally found it.  It's this bit:

+  /* Put the dlclose'd map first, so that its destructor runs first.
+     The map variable is NULL after a retry.  */
+  maps[map->l_idx] = maps[0];
+  maps[0] = map;

It does not swap map->l_idx and maps[0]->l_idx, so the invariant maps[i]->l_idx == i is violated.
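
To illustrate the broken invariant, here is a small stand-alone model. It is a sketch only: struct toy_map and the three functions are hypothetical names, not glibc code, and move_to_front_with_idx is just one way the bookkeeping could be repaired, not necessarily what the final patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for glibc's struct link_map; only l_idx is modeled.  */
struct toy_map
{
  const char *name;
  int l_idx;   /* Expected to equal this map's position in maps[].  */
};

/* Returns true if maps[i]->l_idx == i for every i < n.  */
static bool
invariant_holds (struct toy_map **maps, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    if (maps[i]->l_idx != (int) i)
      return false;
  return true;
}

/* Mirrors the quoted hunk: the array slots are swapped, but the
   l_idx fields of the two maps are left untouched.  */
static void
move_to_front_slots_only (struct toy_map **maps, struct toy_map *map)
{
  assert (map != NULL && map->l_idx >= 0);
  maps[map->l_idx] = maps[0];
  maps[0] = map;
}

/* One possible repair (an assumption of this sketch): also update
   l_idx on both maps so that each again matches its slot.  */
static void
move_to_front_with_idx (struct toy_map **maps, struct toy_map *map)
{
  assert (map != NULL && map->l_idx >= 0);
  maps[map->l_idx] = maps[0];
  maps[map->l_idx]->l_idx = map->l_idx;
  maps[0] = map;
  map->l_idx = 0;
}
```

Starting from maps = {a, b, c} with l_idx 0, 1, 2 and moving c to the front, the first variant leaves maps[0]->l_idx == 2 and maps[2]->l_idx == 0, while the second keeps maps[i]->l_idx == i for all i.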

Comment 12 Carlos O'Donell 2023-11-07 14:12:44 UTC
The status is that this bug is fixed (via the revert) in F38, F39 and F40 (Rawhide). The bug is still present in F37 (https://bodhi.fedoraproject.org/updates/FEDORA-2023-3b9da12381) because we haven't seen enough karma to push the build with the fix.

Florian has found the bug in the fix, so the fix will be put back into the releases after testing.

Comment 13 Fedora Update System 2023-11-20 12:22:31 UTC
FEDORA-2023-13d599718b has been submitted as an update to Fedora 39. https://bodhi.fedoraproject.org/updates/FEDORA-2023-13d599718b

Comment 14 Fedora Update System 2023-11-20 12:23:13 UTC
FEDORA-2023-b5c7c2ff8b has been submitted as an update to Fedora 38. https://bodhi.fedoraproject.org/updates/FEDORA-2023-b5c7c2ff8b

Comment 15 Fedora Update System 2023-11-20 13:09:43 UTC
FEDORA-2023-04796d3939 has been submitted as an update to Fedora 37. https://bodhi.fedoraproject.org/updates/FEDORA-2023-04796d3939

Comment 16 Fedora Update System 2023-11-21 01:30:47 UTC
FEDORA-2023-b5c7c2ff8b has been pushed to the Fedora 38 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-b5c7c2ff8b`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-b5c7c2ff8b

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 17 Fedora Update System 2023-11-21 02:16:43 UTC
FEDORA-2023-04796d3939 has been pushed to the Fedora 37 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-04796d3939`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-04796d3939

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 18 Fedora Update System 2023-11-21 02:31:03 UTC
FEDORA-2023-13d599718b has been pushed to the Fedora 39 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-13d599718b`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-13d599718b

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 19 Fedora Update System 2023-11-23 01:24:23 UTC
FEDORA-2023-13d599718b has been pushed to the Fedora 39 stable repository.
If the problem still persists, please make note of it in this bug report.

Comment 20 Fedora Update System 2023-11-27 02:31:27 UTC
FEDORA-2023-b5c7c2ff8b has been pushed to the Fedora 38 stable repository.
If the problem still persists, please make note of it in this bug report.

Comment 21 Fedora Update System 2023-11-29 01:39:27 UTC
FEDORA-2023-04796d3939 has been pushed to the Fedora 37 stable repository.
If the problem still persists, please make note of it in this bug report.

