Bug 2237392

Summary: Failed assertion on i386 when extracting debuginfo
Product: Fedora
Reporter: Frantisek Sumsal <fsumsal>
Component: gdb
Assignee: Kevin Buettner <kevinb>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: rawhide
CC: aburgess, ahajkova, arjun.is, codonell, decathorpe, dj, fberat, fweimer, guinevere, jakub, jan, jlaw, keiths, kevinb, marcdeop, mcermak, mcoufal, mfabian, mkolar, pfrankli, sipoyare, skolosov, zbyszek
Hardware: i386
OS: Linux
Fixed In Version: gdb-13.2-4.fc38, gdb-13.2-8.fc39
Last Closed: 2023-09-18 18:07:32 UTC
Attachments: reproducer executable

Description Frantisek Sumsal 2023-09-05 09:00:11 UTC
Hey,

Since about yesterday, I've been noticing hanging i386 Rawhide Copr jobs in our upstream systemd CI. On closer inspection, the jobs seem to get stuck on a failed assertion somewhere in glibc:

+ /usr/lib/rpm/find-lang.sh /builddir/build/BUILDROOT/systemd-254-1.20230905052623510728.pr29071.899.g0d239b6a0a.i386 systemd
+ python3 /builddir/build/SOURCES/split-files.py /builddir/build/BUILDROOT/systemd-254-1.20230905052623510728.pr29071.899.g0d239b6a0a.i386
+ /usr/bin/find-debuginfo -j2 --strict-build-id -m -i --build-id-seed 254-1.20230905052623510728.pr29071.899.g0d239b6a0a --unique-debug-suffix -254-1.20230905052623510728.pr29071.899.g0d239b6a0a.i386 --unique-debug-src-base systemd-254-1.20230905052623510728.pr29071.899.g0d239b6a0a.i386 --run-dwz --dwz-low-mem-die-limit 10000000 --dwz-max-die-limit 50000000 -S debugsourcefiles.list /builddir/build/BUILD/systemd-254
find-debuginfo: starting
Extracting debug info from 457 files
Fatal glibc error: malloc.c:2594 (sysmalloc): assertion failed: (old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)

A couple of affected jobs:
 - https://copr.fedorainfracloud.org/coprs/packit/systemd-systemd-29071/build/6372309/
 - https://copr.fedorainfracloud.org/coprs/packit/systemd-systemd-29074/build/6372581/
 - https://copr.fedorainfracloud.org/coprs/packit/systemd-systemd-29051/build/6372504/ 

Reproducible: Always

Comment 1 Florian Weimer 2023-09-05 09:15:11 UTC
I can reproduce it in mock:

“
+ /usr/bin/find-debuginfo -j4 --strict-build-id -m -i --build-id-seed 254.1-5.fc40 --unique-debug-suffix -254.1-5.fc40.i386 --unique-debug-src-base systemd-254.1-5.fc40.i386 --run-dwz --dwz-low-mem-die-limit 10000000 --dwz-max-die-limit 50000000 -S debugsourcefiles.list /builddir/build/BUILD/systemd-stable-254.1
find-debuginfo: starting
Extracting debug info from 451 files
Fatal glibc error: malloc.c:2594 (sysmalloc): assertion failed: (old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)


Fatal signal: 
”

Comment 2 Florian Weimer 2023-09-05 09:17:37 UTC
This works as a reproducer without a full rebuild:

cd /builddir/build/BUILD/systemd-stable-254.1
RPM_PACKAGE_NAME=systemd RPM_BUILD_DIR=`pwd` RPM_BUILD_ROOT=/builddir/build/BUILDROOT/systemd-254.1-5.fc40.i386 /usr/bin/find-debuginfo -j4 --strict-build-id -m -i --build-id-seed 254.1-5.fc40 --unique-debug-suffix -254.1-5.fc40.i386 --unique-debug-src-base systemd-254.1-5.fc40.i386 --run-dwz --dwz-low-mem-die-limit 10000000 --dwz-max-die-limit 50000000 -S debugsourcefiles.list /builddir/build/BUILD/systemd-stable-254.1

It likely hangs because the crash handler calls into malloc.

Comment 3 Florian Weimer 2023-09-05 09:26:16 UTC
Reproduces with glibc-2.38.9000-5.fc40.i686 as well.

Comment 4 Florian Weimer 2023-09-05 09:31:48 UTC
And even glibc-2.38-1.fc39.i686. Not sure if this is actually a glibc bug.

Comment 5 Florian Weimer 2023-09-05 09:33:14 UTC
Running under bash -x produces:

“
+ gdb-add-index /builddir/build/BUILDROOT/systemd-254.1-5.fc40.i386/usr/lib/systemd/tests/unit-tests/test-tpm2
Fatal glibc error: malloc.c:2589 (sysmalloc): assertion failed: (old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)


Fatal signal: ^C
/usr/bin/find-debuginfo: line 457:   221 Killed                  gdb-add-index "$f"
”

So I suspect this is a gdb issue.

Comment 6 Zbigniew Jędrzejewski-Szmek 2023-09-07 19:05:33 UTC
It only happens in rawhide builds. The same package built in F39 is fine.
https://koji.fedoraproject.org/koji/taskinfo?taskID=105853466 → bad
https://koji.fedoraproject.org/koji/taskinfo?taskID=105853461 → no problem

Comment 7 Andrew Burgess 2023-09-12 15:47:42 UTC
Would it be possible to attach to this bug the /builddir/build/BUILDROOT/systemd-254.1-5.fc40.i386/usr/lib/systemd/tests/unit-tests/test-tpm2 binary for which gdb-add-index fails?

Comment 8 Keith Seitz 2023-09-13 01:10:50 UTC
Created attachment 1988548
reproducer executable

Comment 9 Keith Seitz 2023-09-13 01:17:06 UTC
I've spent the day chasing this down a bit...

First off, you *must* use gdb.i686 to reproduce this with the (attached)
binary. Either use mock to build it, creating a rawhide/i386 env,
or grab the RPMs from koji and install the necessary 32-bit dependencies
on your workstation.  [I did this successfully on f38.]

While playing around in my mock environment, I noticed that upstream
origin/master worked. This is the commit that fixes it:

commit d06730bc0205f7c35bfccf057ef0ef83a12206d6
Author: Tom de Vries <tdevries>
Date:   Sat Aug 5 17:57:13 2023 +0200

    [gdb/symtab] Find main language without symtab expansion

However, simply grabbing this patch is insufficient. AFAICT, it requires
at least a dozen other patches -- gdb/dwarf2/cooked_index.[ch] have
changed a LOT since gdb-13-branch.

And I still don't know why this only fails on rawhide/i386...

Comment 10 Carlos O'Donell 2023-09-14 10:55:03 UTC
*** Bug 2238843 has been marked as a duplicate of this bug. ***

Comment 11 Andrew Burgess 2023-09-14 12:24:34 UTC
I believe I have a fix for this issue.  I'm running the GDB regression tests and will post the patch upstream later today.  Hopefully we can get the fix merged ASAP and then back-ported.  Just for the record, here's the GDB patch to fix this (yes, it's a 2 character change):

diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
index 98bedbc5d49..0f4d99109fb 100644
--- a/gdb/dwarf2/read.c
+++ b/gdb/dwarf2/read.c
@@ -10548,7 +10548,7 @@ read_call_site_scope (struct die_info *die, struct dwarf2_cu *cu)
 	  std::vector<unrelocated_addr> addresses;
 	  dwarf2_ranges_read_low_addrs (ranges_offset, target_cu,
 					target_die->tag, addresses);
-	  unrelocated_addr *saved = XOBNEWVAR (&objfile->objfile_obstack,
+	  unrelocated_addr *saved = XOBNEWVEC (&objfile->objfile_obstack,
 					       unrelocated_addr,
 					       addresses.size ());
 	  std::copy (addresses.begin (), addresses.end (), saved);

Comment 12 Andrew Burgess 2023-09-14 21:31:24 UTC
Created pull requests https://src.fedoraproject.org/rpms/gdb/pull-request/96 and https://src.fedoraproject.org/rpms/gdb/pull-request/97 to back-port the fix to rawhide and f38 respectively.

Comment 13 Fabio Valentini 2023-09-15 13:49:35 UTC
Out of curiosity, is there a reason why this patch was back-ported to Fedora 40/rawhide and Fedora 38, but not to the upcoming Fedora 39 release?

Comment 14 Keith Seitz 2023-09-15 15:10:23 UTC
Someone missed that rawhide had moved to 40. There is a PR working its way through now.

Comment 15 Fedora Update System 2023-09-15 20:44:03 UTC
FEDORA-2023-e55ab8d0a7 has been submitted as an update to Fedora 39. https://bodhi.fedoraproject.org/updates/FEDORA-2023-e55ab8d0a7

Comment 16 Fedora Update System 2023-09-15 20:44:54 UTC
FEDORA-2023-15aed01c68 has been submitted as an update to Fedora 38. https://bodhi.fedoraproject.org/updates/FEDORA-2023-15aed01c68

Comment 17 Fedora Update System 2023-09-16 01:48:42 UTC
FEDORA-2023-e55ab8d0a7 has been pushed to the Fedora 39 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-e55ab8d0a7`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-e55ab8d0a7

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 18 Fedora Update System 2023-09-16 03:17:57 UTC
FEDORA-2023-15aed01c68 has been pushed to the Fedora 38 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-15aed01c68`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-15aed01c68

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 19 Mark Wielaard 2023-09-18 13:42:50 UTC
*** Bug 2238268 has been marked as a duplicate of this bug. ***

Comment 20 Fedora Update System 2023-09-18 18:07:32 UTC
FEDORA-2023-15aed01c68 has been pushed to the Fedora 38 stable repository.
If the problem still persists, please make note of it in this bug report.

Comment 21 Fedora Update System 2023-09-20 00:19:44 UTC
FEDORA-2023-e55ab8d0a7 has been pushed to the Fedora 39 stable repository.
If the problem still persists, please make note of it in this bug report.