After upgrading to glibc-2.37-12, several tests started failing in p11-kit, when MALLOC_PERTURB_ is set, with:

Inconsistency detected by ld.so: dl-close.c: 200: _dl_close_worker: Assertion `(*lp)->l_idx >= 0 && (*lp)->l_idx < nloaded' failed!

Reproducible: Always

Steps to Reproduce:
1. git clone --depth=1 --recurse-submodules https://github.com/p11-glue/p11-kit.git
2. cd p11-kit
3. ./autogen.sh --disable-doc && make -j$(nproc) && MALLOC_PERTURB_=55 p11-kit/p11-kit-testable list-modules

Actual Results:
Error is printed:
Inconsistency detected by ld.so: dl-close.c: 200: _dl_close_worker: Assertion `(*lp)->l_idx >= 0 && (*lp)->l_idx < nloaded' failed!

Expected Results:
No error and exit with 0 status code
I would love to have a reproducer for this, but I couldn't get it to work based on your instructions. Here is the Containerfile I tried:

FROM fedora:37
RUN dnf install -y 'dnf-command(builddep)' git automake autoconf libtool gettext-devel
RUN dnf build-dep -y p11-kit
WORKDIR /root
RUN git clone --depth=1 --recurse-submodules https://github.com/p11-glue/p11-kit.git
WORKDIR /root/p11-kit
RUN ./autogen.sh --disable-doc
RUN make -j`nproc`
RUN make -j`nproc` check
RUN MALLOC_PERTURB_=55 p11-kit/p11-kit-testable list-modules

I tried to install softhsm-devel as well before building p11-kit, based on some warning output, but it doesn't make a difference. Still no assertion failure.
(In reply to Florian Weimer from comment #1)
> I would love to have a reproducer for this, but I couldn't get it to work
> based on your instructions. Here is the Containerfile I tried:

I'm trying to create a standalone reproducer, but haven't managed so far.

> FROM fedora:37
> RUN dnf install -y 'dnf-command(builddep)' git automake autoconf libtool
> gettext-devel

With the stock fedora:37 image, glibc is at glibc-2.36-14.fc37.x86_64. I guess you would need to use fedora:38 and run "dnf update -y glibc" before proceeding to the following steps.

> RUN dnf build-dep -y p11-kit
> WORKDIR /root
> RUN git clone --depth=1 --recurse-submodules
> https://github.com/p11-glue/p11-kit.git
> WORKDIR /root/p11-kit
> RUN ./autogen.sh --disable-doc
> RUN make -j`nproc`
> RUN make -j`nproc` check
> RUN MALLOC_PERTURB_=55 p11-kit/p11-kit-testable list-modules
>
> I tried to install softhsm-devel as well before building p11-kit, based on
> some warning output, but it doesn't make a difference. Still no assertion
> failure.

For the above reproducer, softhsm-devel shouldn't be necessary. p11-kit-testable should load only mock modules.
(In reply to Daiki Ueno from comment #2)
> (In reply to Florian Weimer from comment #1)
> > I would love to have a reproducer for this, but I couldn't get it to work
> > based on your instructions. Here is the Containerfile I tried:
>
> I'm trying to create a standalone reproducer, but haven't managed so far.
>
> > FROM fedora:37
> > RUN dnf install -y 'dnf-command(builddep)' git automake autoconf libtool
> > gettext-devel
>
> With the stock fedora:37 image, glibc is at glibc-2.36-14.fc37.x86_64. I
> guess you would need to use fedora:38 and run "dnf update -y glibc" before
> proceeding to the following steps.

The image has already been updated:

$ podman run -i -t --pull=always fedora:37 rpm -q glibc
Trying to pull registry.fedoraproject.org/fedora:37...
Getting image source signatures
Copying blob 5547adeb6c7b skipped: already exists
Copying config a9bfb06409 done |
Writing manifest to image destination
glibc-2.36-14.fc37.x86_64

So that's not why reproduction fails, unfortunately.
(In reply to Florian Weimer from comment #3)
> The image has already been updated:
>
> $ podman run -i -t --pull=always fedora:37 rpm -q glibc
> Trying to pull registry.fedoraproject.org/fedora:37...
> Getting image source signatures
> Copying blob 5547adeb6c7b skipped: already exists
> Copying config a9bfb06409 done |
> Writing manifest to image destination
> glibc-2.36-14.fc37.x86_64
>
> So that's not why reproduction fails, unfortunately.

Sorry Florian, I don't get what you mean: glibc-2.36-14 is older than glibc-2.37-12 (notice the minor version).
Sorry, I confused Fedora 38 and Fedora 37 (classical mistake due to the off-by-one glibc version). I tried to reproduce on Fedora 38 as well (outside a container), but couldn't. The fixed Containerfile looks like this:

FROM fedora:38
RUN dnf update -y
RUN dnf install -y 'dnf-command(builddep)' git automake autoconf libtool gettext-devel
RUN dnf build-dep -y p11-kit
WORKDIR /root
RUN git clone --depth=1 --recurse-submodules https://github.com/p11-glue/p11-kit.git
WORKDIR /root/p11-kit
RUN ./autogen.sh --disable-doc
RUN make -j`nproc`
RUN make -j`nproc` check
RUN MALLOC_PERTURB_=55 p11-kit/p11-kit-testable list-modules

Sadly, it still doesn't reproduce the issue.

I think I now see what the problem is, though. It looks like a latent use-after-free bug in dlclose that may hit after a recursive dlclose call.
(In reply to Florian Weimer from comment #5)
> Sadly, it still doesn't reproduce the issue.

Indeed, I was also not able to reproduce the issue with the Containerfile.

> I think I now see what the problem is, though. It looks like a latent
> use-after-free bug in dlclose that may hit after a recursive dlclose call.

It seems the trigger is that libsofthsm2.so is loaded. I said in comment 2 that this shouldn't be the case, but it turned out that it is: when p11-kit-devel (i.e., p11-kit-1.pc) is installed, one of the tests creates a symlink to /usr/lib64/libsofthsm2.so:
https://github.com/p11-glue/p11-kit/blob/96a8b145a33f95eae52d7d31d0d7ed1674c68423/p11-kit/test-generate-keypair.sh#L41

With the following Containerfile, I can reproduce the issue (note p11-kit-devel and softhsm are installed):

FROM fedora:38
RUN dnf update -y
RUN dnf install -y 'dnf-command(builddep)' git automake autoconf libtool gettext-devel p11-kit-devel softhsm
RUN dnf build-dep -y p11-kit
WORKDIR /root
RUN git clone --depth=1 --recurse-submodules https://github.com/p11-glue/p11-kit.git
WORKDIR /root/p11-kit
RUN ./autogen.sh --disable-doc
RUN make -j`nproc`
RUN make -j`nproc` check
RUN MALLOC_PERTURB_=55 p11-kit/p11-kit-testable list-modules
Thank you so much, reproduced. We'll take it from here.
valgrind reports:

==1== Invalid read of size 1
==1==    at 0x4011494: dfs_traversal (dl-sort-maps.c:145)
==1==    by 0x4011494: dfs_traversal.part.0 (dl-sort-maps.c:157)
==1==    by 0x401152B: dfs_traversal (dl-sort-maps.c:169)
==1==    by 0x401152B: dfs_traversal.part.0 (dl-sort-maps.c:172)
==1==    by 0x40118F3: dfs_traversal (dl-sort-maps.c:145)
==1==    by 0x40118F3: _dl_sort_maps_dfs (dl-sort-maps.c:228)
==1==    by 0x40118F3: _dl_sort_maps (dl-sort-maps.c:314)
==1==    by 0x4004FFD: _dl_fini (dl-fini.c:94)
==1==    by 0x48B91E5: __run_exit_handlers (in /usr/lib64/libc.so.6)
==1==    by 0x48B932D: exit (in /usr/lib64/libc.so.6)
==1==    by 0x48A0B90: (below main) (in /usr/lib64/libc.so.6)
==1==  Address 0x4a724a5 is 821 bytes inside a block of size 1,240 free'd
==1==    at 0x48452AC: free (vg_replace_malloc.c:974)
==1==    by 0x4002131: UnknownInlinedFun (rtld-malloc.h:50)
==1==    by 0x4002131: _dl_close_worker (dl-close.c:702)
==1==    by 0x400267A: _dl_close (dl-close.c:791)
==1==    by 0x4001522: _dl_catch_exception (dl-catch.c:237)
==1==    by 0x4001678: _dl_catch_error (dl-catch.c:256)
==1==    by 0x49011F2: _dlerror_run (in /usr/lib64/libc.so.6)
==1==    by 0x4900F25: dlclose@@GLIBC_2.34 (in /usr/lib64/libc.so.6)
==1==    by 0x4167FF: free_module_unlocked (modules.c:311)
==1==    by 0x4108B6: p11_dict_free (dict.c:310)
==1==    by 0x4171C9: free_modules_when_no_refs_unlocked (modules.c:891)
==1==    by 0x4171C9: free_modules_when_no_refs_unlocked (modules.c:871)
==1==    by 0x419460: p11_modules_release_inlock_reentrant (modules.c:1975)
==1==    by 0x419A58: p11_kit_modules_release (modules.c:2374)
==1==  Block was alloc'd at
==1==    at 0x484782C: calloc (vg_replace_malloc.c:1554)
==1==    by 0x400B8FD: UnknownInlinedFun (rtld-malloc.h:44)
==1==    by 0x400B8FD: _dl_new_object (dl-object.c:92)
==1==    by 0x400707E: _dl_map_object_from_fd (dl-load.c:1063)
==1==    by 0x4008B38: _dl_map_object (dl-load.c:2253)
==1==    by 0x40028BC: openaux (dl-deps.c:64)
==1==    by 0x4001522: _dl_catch_exception (dl-catch.c:237)
==1==    by 0x4002D17: _dl_map_object_deps (dl-deps.c:232)
==1==    by 0x400C724: dl_open_worker_begin (dl-open.c:592)
==1==    by 0x4001522: _dl_catch_exception (dl-catch.c:237)
==1==    by 0x400BE6F: dl_open_worker (dl-open.c:782)
==1==    by 0x4001522: _dl_catch_exception (dl-catch.c:237)
==1==    by 0x400C243: _dl_open (dl-open.c:884)

This aligns very well with the backtrace in bug 2244992 comment 8.
Oops, it does not align well. This one here has _dl_fini in it. My fix attempt does not help at all because it's dlclose only.
(In reply to Florian Weimer from comment #9)
> Oops, it does not align well. This one here has _dl_fini in it. My fix
> attempt does not help at all because it's dlclose only.

We are going to temporarily revert this change across f37, f38, f39 and f40 while we review the possible fixes upstream.

I will push reverts for this shortly across all the active branches.
Finally found it. It's this bit:

+  /* Put the dlclose'd map first, so that its destructor runs first.
+     The map variable is NULL after a retry.  */
+  maps[map->l_idx] = maps[0];
+  maps[0] = map;

It does not swap map->l_idx and maps[0]->l_idx, so the invariant maps[i]->l_idx == i is violated.
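
To make the broken invariant concrete, here is a minimal standalone sketch. It is not the glibc code and not the upstream patch: struct map is a hypothetical stand-in for struct link_map that models only l_idx, and the helper names (indices_consistent, move_first_slots_only, move_first_keep_indices) are made up for illustration. The first helper moves the map to slot 0 the way the hunk above does. The second also updates both l_idx fields, which is one way the invariant could be preserved.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct link_map; only l_idx is modeled.  */
struct map
{
  int l_idx;
};

/* Return 1 if maps[i]->l_idx == i holds for every slot, 0 otherwise.  */
static int
indices_consistent (struct map **maps, size_t nloaded)
{
  for (size_t i = 0; i < nloaded; ++i)
    if (maps[i]->l_idx != (int) i)
      return 0;
  return 1;
}

/* Move MAP to slot 0 as in the quoted hunk: the array slots are
   exchanged, but neither l_idx field is touched.  */
static void
move_first_slots_only (struct map **maps, struct map *map)
{
  maps[map->l_idx] = maps[0];
  maps[0] = map;
}

/* Same move, but both affected maps get their l_idx updated so that
   it matches the slot they now occupy.  */
static void
move_first_keep_indices (struct map **maps, struct map *map)
{
  maps[map->l_idx] = maps[0];
  maps[map->l_idx]->l_idx = map->l_idx;
  maps[0] = map;
  map->l_idx = 0;
}
```

With three maps in slots 0..2, calling move_first_slots_only on the map in slot 2 leaves that map in slot 0 with l_idx still 2, and the displaced map in slot 2 with l_idx still 0, so indices_consistent returns 0. After move_first_keep_indices on a fresh array it returns 1. Code that later uses l_idx to index into the array, as the assertion in the original report does, relies on that consistency.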
The status is that this bug is fixed (via the revert) in F38, F39 and F40 (Rawhide). The bug is still present in F37 (https://bodhi.fedoraproject.org/updates/FEDORA-2023-3b9da12381) because we haven't seen enough karma to push the build with the fix. Florian has found the bug in the fix, so the fix is going to get put back into the releases after testing.
FEDORA-2023-13d599718b has been submitted as an update to Fedora 39. https://bodhi.fedoraproject.org/updates/FEDORA-2023-13d599718b
FEDORA-2023-b5c7c2ff8b has been submitted as an update to Fedora 38. https://bodhi.fedoraproject.org/updates/FEDORA-2023-b5c7c2ff8b
FEDORA-2023-04796d3939 has been submitted as an update to Fedora 37. https://bodhi.fedoraproject.org/updates/FEDORA-2023-04796d3939
FEDORA-2023-b5c7c2ff8b has been pushed to the Fedora 38 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-b5c7c2ff8b` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-b5c7c2ff8b See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2023-04796d3939 has been pushed to the Fedora 37 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-04796d3939` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-04796d3939 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2023-13d599718b has been pushed to the Fedora 39 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-13d599718b` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-13d599718b See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2023-13d599718b has been pushed to the Fedora 39 stable repository. If problem still persists, please make note of it in this bug report.
FEDORA-2023-b5c7c2ff8b has been pushed to the Fedora 38 stable repository. If problem still persists, please make note of it in this bug report.
FEDORA-2023-04796d3939 has been pushed to the Fedora 37 stable repository. If problem still persists, please make note of it in this bug report.