Bug 2368545 - malloc regression on i686 blocks the rebuild of python-pyside6
Summary: malloc regression on i686 blocks the rebuild of python-pyside6
Keywords:
Status: ASSIGNED
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: rawhide
Hardware: i686
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: DJ Delorie
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: PYTHON3.14 2325444
Reported: 2025-05-26 09:47 UTC by Miro Hrončok
Modified: 2025-06-08 16:52 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed:
Type: Bug
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
GNU Compiler Collection | 120592 | 0 | P3 | NEW | XMM register is used across ___tls_get_addr | 2025-06-08 09:46:05 UTC
Sourceware | 32996 | 0 | P2 | UNCONFIRMED | malloc regression on i686 | 2025-05-26 09:47:56 UTC

Description Miro Hrončok 2025-05-26 09:47:56 UTC
This is a bugzilla to track upstream issue https://sourceware.org/bugzilla/show_bug.cgi?id=32996

We cannot build python-pyside6 on i686. Building it is essential for us to land the mass Python 3.14 rebuild (starting most likely next week).

Comment 1 Miro Hrončok 2025-05-26 10:05:30 UTC
I can reproduce even with glibc-2.41.9000-14.fc43.i686

<mock-chroot> sh-5.2# rpm -q glibc python3-pyside6 python3
glibc-2.41.9000-14.fc43.i686
python3-pyside6-6.9.0-1.fc43.i686
python3-3.13.3-3.fc43.i686

<mock-chroot> sh-5.2# python3 /usr/lib/python3.13/site-packages/PySide6/support/generate_pyi.py --outpath . QtGui 
Segmentation fault (core dumped)

Comment 2 Florian Weimer 2025-05-26 13:12:29 UTC
Yes, it's unfortunate I renamed the patch for the other bug fix to reference this bug when it's apparently something else entirely.

Comment 3 Florian Weimer 2025-05-27 17:10:57 UTC
The error changes with glibc-2.41.9000-14.fc43.i686. The new failure looks like this:

Program received signal SIGABRT, Aborted.
Downloading 2.37 K source file /usr/src/debug/kernel-6.14.6/linux-6.14.6-300.fc42.x86_64/arch/x86/entry/vdso/vdso32/system_call.S
__kernel_vsyscall () at arch/x86/entry/vdso/vdso32/system_call.S:72                                             
72              popl    %ebp
(gdb) bt
#0  __kernel_vsyscall () at arch/x86/entry/vdso/vdso32/system_call.S:72
#1  0xf79048ef in __pthread_kill_implementation (threadid=threadid@entry=4151973632, signo=signo@entry=6, 
    no_tid=no_tid@entry=0) at pthread_kill.c:43
#2  0xf79049a9 in __pthread_kill_internal (threadid=4151973632, signo=6) at pthread_kill.c:89
#3  0xf78a9a61 in __GI_raise (sig=6) at ../sysdeps/posix/raise.c:26
#4  0xf7890f59 in __GI_abort () at abort.c:73
#5  0xf7892150 in __libc_message_impl (fmt=<optimized out>) at ../sysdeps/posix/libc_fatal.c:134
#6  0xf790fccd in malloc_printerr (str=<optimized out>) at malloc.c:5760
#7  0xf790fcf2 in malloc_printerr_tail (str=<optimized out>) at malloc.c:5777
#8  0xf6436744 in QBindingStoragePrivate::destroy (this=<optimized out>)
    at /usr/src/debug/qt6-qtbase-6.9.0-2.fc43.i386/src/corelib/kernel/qproperty.cpp:2273

I bisected this down to:

* Thu Apr 03 2025 Arjun Shankar <arjun> - 2.41.9000-8
- Auto-sync with upstream branch master,
  aaf94ec804830e0e273cfb45d54f4a04ab778fe5:
- stdio: fix hurd link for tst-setvbuf2
- stdlib: Fix qsort memory leak if callback throws (BZ 32058)
- sysdeps: powerpc: restore -mlong-double-128 check
- stdio: Add more setvbuf tests
- add ptmx support to test-container
- Update syscall lists for Linux 6.14
- x86: Link tst-gnu2-tls2-x86-noxsave{,c,xsavec} with libpthread
- elf: Fix tst-origin build when toolchain defaults to --as-needed (BZ 32823)
- Raise the minimum GCC version to 12.1 [BZ #32539]
- Fix typo in comment
- manual: tidy the longopt.c example
- manual: Document functions adopted by POSIX.1-2024.
- aarch64: Fix _dl_tlsdesc_dynamic unwind for pac-ret (BZ 32612)
- x86: Use separate variable for TLSDESC XSAVE/XSAVEC state size (bug 32810)
- x86: Skip XSAVE state size reset if ISA level requires XSAVE
- malloc: Improve performance of __libc_malloc
- stdio-common: Reject real data w/o exponent digits in scanf [BZ #12701]
- stdio-common: Reject significand prefixes in scanf [BZ #12701]
- stdio-common: Reject integer prefixes in scanf [BZ #12701]
- stdio-common: Also reject exp char w/o significand in i18n scanf [BZ #13988]
- stdio-common: Add tests for formatted vsscanf input specifiers
- stdio-common: Add tests for formatted vfscanf input specifiers
- stdio-common: Add tests for formatted vscanf input specifiers
- stdio-common: Add tests for formatted sscanf input specifiers
- stdio-common: Add tests for formatted fscanf input specifiers
- stdio-common: Add scanf long double data for Intel/Motorola 80-bit format
- Implement C23 pown
- support: Use unwinder in links-dso-program-c only with libgcc_s
- malloc: Use __always_inline for simple functions
- linux: Fix integer overflow warnings when including <sys/mount.h> [BZ #32708]
- malloc: Use _int_free_chunk for remainders
- Use MPFR 4.2.2 and Linux 6.14 in build-many-glibcs.py
- stdio-common: Add scanf long double data for IBM 128-bit format
- stdio-common: Add scanf long double data for IEEE 754 binary64 format
- stdio-common: Add scanf long double data for IEEE 754 binary128 format
- stdio-common: Add scanf double data for IEEE 754 binary64 format
- stdio-common: Add scanf float data for IEEE 754 binary32 format
- stdio-common: Add scanf integer data for LP64 targets
- stdio-common: Add scanf integer data for ILP32 targets
- stdio-common: Add tests for formatted scanf input specifiers

Comment 4 Łukasz Patron 2025-05-27 17:34:21 UTC
That stack trace looks like one I posted on sourceware bugzilla, which started happening as of "malloc: Use _int_free_chunk for remainders".

Comment 5 Łukasz Patron 2025-05-27 17:35:52 UTC
(In reply to Łukasz Patron from comment #4)
> That stack trace looks like one I posted on sourceware bugzilla, which
> started happening as of "malloc: Use _int_free_chunk for remainders".

More specifically, this exact part of the diff:

@@ -5087,7 +5084,7 @@ _int_realloc (mstate av, mchunkptr oldp, INTERNAL_SIZE_T oldsize,
                 (av != &main_arena ? NON_MAIN_ARENA : 0));
       /* Mark remainder as inuse so free() won't complain */
       set_inuse_bit_at_offset (remainder, remainder_size);
-      _int_free (av, remainder, 1);
+      _int_free_chunk (av, remainder, chunksize (remainder), 1);
     }
 
   check_inuse_chunk (av, newp);
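
For context, the hunk above sits on the path where realloc ends up with a chunk larger than requested and hands the unused tail (the "remainder") back to the allocator. A minimal sketch of user code that can reach such a path is a shrinking realloc. The helper name `shrink_preserves_prefix` is hypothetical, and whether a given call actually splits off a remainder depends on allocator internals and the sizes involved. The sketch only checks the guarantee callers rely on, that the kept prefix is unchanged:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper, not part of glibc: allocate `big` bytes, fill them,
// shrink the block to `small` bytes with realloc, and verify the prefix.
bool shrink_preserves_prefix(std::size_t big, std::size_t small)
{
    assert(small > 0 && small <= big);

    unsigned char *p = static_cast<unsigned char *>(std::malloc(big));
    if (p == nullptr)
        return false;
    std::memset(p, 0xAB, big);

    // A shrinking realloc may let the allocator split the chunk and free
    // the tail internally; that tail is the "remainder" in the diff above.
    unsigned char *q = static_cast<unsigned char *>(std::realloc(p, small));
    if (q == nullptr) {
        std::free(p);   // on failure the original block is still valid
        return false;
    }

    bool ok = true;
    for (std::size_t i = 0; i < small; ++i) {
        if (q[i] != 0xAB)
            ok = false;
    }
    std::free(q);
    return ok;
}
```

Calling `shrink_preserves_prefix(4096, 64)` exercises one such shrink. It is not a reproducer for this bug, only an illustration of where remainders come from.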

Comment 6 Miro Hrončok 2025-05-29 10:53:24 UTC
We start the Python 3.14 mass rebuild on Monday-ish. I bumped the severity accordingly.

https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/LHYTUTQDVNF2L5KM7BDGZRUJRWLGGICU/

Comment 7 DJ Delorie 2025-05-29 19:06:34 UTC
I've been working on this for the last two days.  While the line change noted above enables the error, I don't yet know if it's the *cause* of the error.  I can temporarily revert that line if it makes you happy and lets the rebuild run, but I doubt reverting it is the right fix.

Comment 8 Miro Hrončok 2025-05-29 20:16:50 UTC
The urgency is that we need to build python-pyside6. So, at least a temporary revert (if it enables the build) would help us and allow more time for a proper fix.

Comment 9 Florian Weimer 2025-05-30 07:16:23 UTC
(In reply to Miro Hrončok from comment #8)
> The urgency is that we need to build python-pyside6. So, at least a
> temporary revert (if it enables the build) would help us and allow more time
> for a proper fix.

I've pushed glibc-2.41.9000-15.fc43 with a revert of malloc to the glibc-2.41.9000-6.fc43 version, and it's now building.

Comment 10 Florian Weimer 2025-05-30 07:17:54 UTC
(In reply to Łukasz Patron from comment #5)
> (In reply to Łukasz Patron from comment #4)
> > That stack trace looks like one I posted on sourceware bugzilla, which
> > started happening as of "malloc: Use _int_free_chunk for remainders".
> 
> More specifically, this exact part of the diff:
> 
> @@ -5087,7 +5084,7 @@ _int_realloc (mstate av, mchunkptr oldp,
> INTERNAL_SIZE_T oldsize,
>                  (av != &main_arena ? NON_MAIN_ARENA : 0));
>        /* Mark remainder as inuse so free() won't complain */
>        set_inuse_bit_at_offset (remainder, remainder_size);
> -      _int_free (av, remainder, 1);
> +      _int_free_chunk (av, remainder, chunksize (remainder), 1);
>      }
>  
>    check_inuse_chunk (av, newp);

Thanks. How have you verified that this change is indeed responsible? Merely reverting it in isolation doesn't work because _int_free no longer exists in the current code.

Comment 11 Łukasz Patron 2025-05-30 07:22:08 UTC
(In reply to Florian Weimer from comment #10)
> (In reply to Łukasz Patron from comment #5)
> > (In reply to Łukasz Patron from comment #4)
> > > That stack trace looks like one I posted on sourceware bugzilla, which
> > > started happening as of "malloc: Use _int_free_chunk for remainders".
> > 
> > More specifically, this exact part of the diff:
> > 
> > @@ -5087,7 +5084,7 @@ _int_realloc (mstate av, mchunkptr oldp,
> > INTERNAL_SIZE_T oldsize,
> >                  (av != &main_arena ? NON_MAIN_ARENA : 0));
> >        /* Mark remainder as inuse so free() won't complain */
> >        set_inuse_bit_at_offset (remainder, remainder_size);
> > -      _int_free (av, remainder, 1);
> > +      _int_free_chunk (av, remainder, chunksize (remainder), 1);
> >      }
> >  
> >    check_inuse_chunk (av, newp);
> 
> Thanks. How have you verified that this change is indeed responsible? Merely
> reverting it in isolation doesn't work because _int_free no longer exists in
> the current code.

You can see the list of reverts I've been using here: https://sourceware.org/bugzilla/show_bug.cgi?id=32996#c1.
It only worked when all 13 were present, if I removed the last one it'd start crashing again.

Comment 12 Miro Hrončok 2025-05-30 08:28:05 UTC
At least this works:

<mock-chroot> sh-5.2# rpm -q glibc python3-pyside6 python3
glibc-2.41.9000-15.fc43.i686
python3-pyside6-6.9.0-1.fc43.i686
python3-3.13.3-3.fc43.i686

<mock-chroot> sh-5.2# python3 /usr/lib/python3.13/site-packages/PySide6/support/generate_pyi.py --outpath . QtGui 
INFO:generate_pyi:Generated: QtGui.pyi


I'll try building python-pyside6 entirely.

Next time, could you perhaps open a Pull Request so I can test it before you ship it?

Comment 13 Miro Hrončok 2025-05-30 09:46:53 UTC
> I'll try building python-pyside6 entirely.

Builds in i686 mock.

(If you don't want to ship this, we can tag it into the Python 3.14 rebuild side tag and then tag it out.)

Comment 14 Carlos O'Donell 2025-05-30 13:17:20 UTC
Bodhi update is in progress, waiting on Fedora Rawhide gating:
https://bodhi.fedoraproject.org/updates/FEDORA-2025-545d9e4ef9

We'll keep an eye on it and continue to triage the issue.

Comment 15 Miro Hrončok 2025-05-30 20:08:09 UTC
https://src.fedoraproject.org/rpms/python-pyside6/pull-request/11 built fine on 4 architectures incl. i686 (s390x still waits for a builder).

Comment 16 DJ Delorie 2025-05-31 01:17:16 UTC
This turned out to be more subtle than expected...

QBindingStorage::QBindingStorage() uses xmm0 to hold a temporary zero.  It calls ___tls_get_addr@plt, which needs to expand the DTV, which calls malloc, which uses xmm0 when it's processing the unsorted list...

So Qt expects xmm0 to be preserved across calls, but ___tls_get_addr@plt does not preserve it.

So there's nothing wrong with the malloc patches, other than they change the timing of when the unsorted list needs to be processed.

Comment 17 Carlos O'Donell 2025-05-31 14:48:13 UTC
The caller is responsible for saving and restoring xmm0, but in this case the caller is the compiler.

The call in the constructor stores to "Q_CONSTINIT static thread_local QBindingStatus bindingStatus;" which is TLS, via "bindingStatus = &QT_PREPEND_NAMESPACE(bindingStatus);" so there isn't a call before or after that point.

e.g.
2289 QBindingStorage::QBindingStorage()
2290 {
2291     bindingStatus = &QT_PREPEND_NAMESPACE(bindingStatus);
2292     Q_ASSERT(bindingStatus);
2293 }

The store on line 2291 is expanded by the compiler into the ABI-relevant TLS access sequence, which might include a call to ___tls_get_addr@plt. That sequence is the equivalent of a function call, so the compiler needs to save and restore all locals that would otherwise not be preserved across it.

Is this a compiler bug?
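
The shape of that constructor can be mirrored in a small standalone sketch. The names `Status`, `tls_status`, and `Storage` are hypothetical stand-ins, not the Qt definitions. Whether the compiler emits a call to ___tls_get_addr for the address computation depends on the TLS model in use (typically the case for position-independent code in a shared library, usually not for a plain executable), so this shows the source pattern rather than the failure:

```cpp
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical stand-ins for the Qt types; not the real Qt definitions.
struct Status { int depth = 0; };
thread_local Status tls_status;

struct Storage {
    Status *status;
    // No call appears in the source here, but computing the address of a
    // thread_local may be lowered to a TLS access sequence that calls a
    // runtime helper, depending on the TLS model.
    Storage() : status(&tls_status) { assert(status != nullptr); }
};

// Construct a Storage on a fresh thread and report where its TLS copy lives.
std::uintptr_t status_address_in_new_thread()
{
    std::uintptr_t addr = 0;
    std::thread worker([&addr] {
        Storage s;
        addr = reinterpret_cast<std::uintptr_t>(s.status);
    });
    worker.join();
    return addr;
}

// Same, but on the calling thread.
std::uintptr_t status_address_in_this_thread()
{
    Storage s;
    return reinterpret_cast<std::uintptr_t>(s.status);
}
```

The two functions return different addresses because each thread has its own copy of `tls_status`. The first access on a new thread is the point where the runtime may have to set up that copy.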

Comment 18 Miro Hrončok 2025-06-08 16:52:49 UTC
We have done the rebuild of pyside; resetting the severity.

