Bug 1900651 - lasso fails to build with Python 3.10: tests leads to core dump
Summary: lasso fails to build with Python 3.10: tests leads to core dump
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: lasso
Version: 35
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Miro Hrončok
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: AcceptedFreezeException
: 1968823 1977583 (view as bug list)
Depends On:
Blocks: PYTHON3.10 F35BetaFreezeException F35FTBFS F35FailsToInstall 1968820 1987580 F36FTBFS F36FailsToInstall
TreeView+ depends on / blocked
 
Reported: 2020-11-23 13:30 UTC by Tomáš Hrnčiar
Modified: 2021-09-24 14:34 UTC (History)
10 users (show)

Fixed In Version: lasso-2.7.0-4.fc35
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-09-22 00:16:10 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)

Description Tomáš Hrnčiar 2020-11-23 13:30:43 UTC
lasso fails to build with Python 3.10.0a2.

../../../test-driver: line 107: 21075 Aborted                 (core dumped) "$@" > $log_file 2>&1
FAIL: profiles_tests.py
FAIL: binding_tests.py
../../../test-driver: line 107: 21081 Aborted                 (core dumped) "$@" > $log_file 2>&1

For the build logs, see:
https://copr-be.cloud.fedoraproject.org/results/@python/python3.10/fedora-rawhide-x86_64/01772939-lasso/

For all our attempts to build lasso with Python 3.10, see:
https://copr.fedorainfracloud.org/coprs/g/python/python3.10/package/lasso/

Testing and mass rebuild of packages is happening in copr. You can follow these instructions to test locally in mock if your package builds with Python 3.10:
https://copr.fedorainfracloud.org/coprs/g/python/python3.10/

Let us know here if you have any questions.

Python 3.10 will be included in Fedora 35. To make that update smoother, we're building Fedora packages with early pre-releases of Python 3.10.
A build failure prevents us from testing all dependent packages (transitive [Build]Requires), so if this package is required a lot, it's important for us to get it fixed soon.
We'd appreciate help from the people who know this package best, but if you don't want to work on this now, let us know so we can try to work around it on our side.

Comment 1 Ben Cotton 2021-02-09 15:27:13 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 34 development cycle.
Changing version to 34.

Comment 2 Miro Hrončok 2021-06-04 20:14:01 UTC
This is a mass-posted update. Sorry if it is not 100% accurate to this bugzilla.


The Python 3.10 rebuild is in progress in a Koji side tag. If you manage to fix the problem, please commit the fix in the rawhide branch, but don't build the package in regular rawhide.

You can either build the package in the side tag, with:

    $ fedpkg build --target=f35-python

Or you can the build and we will eventually build it for you.

Note that the rebuild is still in progress, so not all (build) dependencies of this package might be available right away.

Thanks.

See also https://fedoraproject.org/wiki/Changes/Python3.10

If you have general questions about the rebuild, please use this mailing list thread: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/G47SGOYIQLRDTWGOSLSWERZSSHXDEDH5/

Comment 3 Miro Hrončok 2021-06-07 22:58:42 UTC
The f35-python side tag has been merged to Rawhide. From now on, build as you would normally build.

Comment 4 Miro Hrončok 2021-06-08 11:15:47 UTC
*** Bug 1968823 has been marked as a duplicate of this bug. ***

Comment 5 Miro Hrončok 2021-06-15 20:23:20 UTC
Hello,

This is the first reminder (step 3 from https://docs.fedoraproject.org/en-US/fesco/Fails_to_build_from_source_Fails_to_install/#_package_removal_for_long_standing_ftbfs_and_fti_bugs).

If you know about this problem and are planning on fixing it, please acknowledge so by setting the bug status to ASSIGNED. If you don't have time to maintain this package, consider orphaning it, so maintainers of dependent packages realize the problem.

Comment 6 Simo Sorce 2021-06-15 20:54:24 UTC
I o not have a lot of time, but looking at this I see a full core dump happening.
This is not a python stack trace so there isn't much to go on.
Is it possible to have core files?

Comment 7 Jakub Hrozek 2021-06-16 13:14:33 UTC
(In reply to Simo Sorce from comment #6)
> I o not have a lot of time, but looking at this I see a full core dump
> happening.
> This is not a python stack trace so there isn't much to go on.
> Is it possible to have core files?

I can take a look at the bug, I was able to reproduce locally and worked on lasso anyway this week.

Comment 8 Simo Sorce 2021-06-16 16:30:37 UTC
I did reproduce it locally in mock, that was not hard, but figuring out what breaks is a different thing.
I wrote to the lasso mailing list to see if smarter people than me have a clue.

Comment 9 Jakub Hrozek 2021-07-05 09:01:08 UTC
It turns out my not smart enough to fix the issue by myself either. The backtrace looks like this:

---
#0  __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:45
#1  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at pthread_kill.c:62
#2  0x00007ffff7ac5986 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff7aaf806 in __GI_abort () at abort.c:79
#4  0x00007ffff7b07087 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7c375ed "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#5  0x00007ffff7b1d29c in malloc_printerr (str=str@entry=0x7ffff7c35213 "free(): invalid pointer") at malloc.c:5643
#6  0x00007ffff7b1e664 in _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:4419
#7  0x00007ffff7b21f68 in __GI___libc_free (mem=<optimized out>) at malloc.c:3339
#8  0x00007ffff7d798f7 in _PyMem_RawFree (ptr=<optimized out>, ctx=<optimized out>) at /usr/src/debug/python3.10-3.10.0~b3-1.fc35.x86_64/Objects/obmalloc.c:125
#9  PyMem_RawFree (ptr=<optimized out>) at /usr/src/debug/python3.10-3.10.0~b3-1.fc35.x86_64/Objects/obmalloc.c:595
#10 _PyObject_Free (ctx=<optimized out>, p=<optimized out>) at /usr/src/debug/python3.10-3.10.0~b3-1.fc35.x86_64/Objects/obmalloc.c:2238
#11 _PyObject_Free (ctx=<optimized out>, p=<optimized out>) at /usr/src/debug/python3.10-3.10.0~b3-1.fc35.x86_64/Objects/obmalloc.c:2229
#12 0x00007ffff7d83042 in _Py_Dealloc (op=<optimized out>) at /usr/src/debug/python3.10-3.10.0~b3-1.fc35.x86_64/Objects/object.c:2288
#13 _Py_DECREF (op=<optimized out>) at /usr/src/debug/python3.10-3.10.0~b3-1.fc35.x86_64/Include/object.h:502
#14 _Py_XDECREF (op=<optimized out>) at /usr/src/debug/python3.10-3.10.0~b3-1.fc35.x86_64/Include/object.h:569
#15 tupledealloc (op=0x7ffff7924930) at /usr/src/debug/python3.10-3.10.0~b3-1.fc35.x86_64/Objects/tupleobject.c:276
#16 0x00007ffff7d962b9 in _Py_XDECREF (op=<optimized out>) at /usr/src/debug/python3.10-3.10.0~b3-1.fc35.x86_64/Include/object.h:569
#17 code_dealloc (co=0x7ffff79249d0) at /usr/src/debug/python3.10-3.10.0~b3-1.fc35.x86_64/Objects/codeobject.c:658
#18 0x00007ffff7d936cf in _Py_Dealloc (op=<optimized out>) at /usr/src/debug/python3.10-3.10.0~b3-1.fc35.x86_64/Objects/object.c:2288
#19 _Py_DECREF (op=<optimized out>) at /usr/src/debug/python3.10-3.10.0~b3-1.fc35.x86_64/Include/object.h:502
#20 func_clear (op=0x7ffff792dab0) at /usr/src/debug/python3.10-3.10.0~b3-1.fc35.x86_64/Objects/funcobject.c:612
#21 0x00007ffff7d9351e in func_dealloc (op=0x7ffff792dab0) at /usr/src/debug/python3.10-3.10.0~b3-1.fc35.x86_64/Objects/funcobject.c:634
#22 0x00007ffff7d83997 in _Py_Dealloc (op=<optimized out>) at /usr/src/debug/python3.10-3.10.0~b3-1.fc35.x86_64/Objects/object.c:2288
#23 _Py_DECREF (op=<optimized out>) at /usr/src/debug/python3.10-3.10.0~b3-1.fc35.x86_64/Include/object.h:502
#24 _Py_XDECREF (op=<optimized out>) at /usr/src/debug/python3.10-3.10.0~b3-1.fc35.x86_64/Include/object.h:569
#25 free_keys_object (keys=0x7ffff791c9b0) at /usr/src/debug/python3.10-3.10.0~b3-1.fc35.x86_64/Objects/dictobject.c:628
#26 0x00007ffff7e2483a in type_clear (type=0x55555559b990) at /usr/src/debug/python3.10-3.10.0~b3-1.fc35.x86_64/Objects/typeobject.c:4295
#27 0x00007ffff7d7c6d4 in delete_garbage (old=0x55555555b670, collectable=0x7fffffffe550, gcstate=0x55555555b628, tstate=0x555555577100) at /usr/src/debug/python3.10-3.10.0~b3-1.fc35.x86_64/Modules/gcmodule.c:1017
#28 gc_collect_main (tstate=0x555555577100, generation=2, n_collected=0x0, n_uncollectable=0x0, nofail=1) at /usr/src/debug/python3.10-3.10.0~b3-1.fc35.x86_64/Modules/gcmodule.c:1300
#29 0x00007ffff7e3135c in _PyGC_CollectNoFail (tstate=tstate@entry=0x555555577100) at /usr/src/debug/python3.10-3.10.0~b3-1.fc35.x86_64/Modules/gcmodule.c:2123
#30 0x00007ffff7e33de9 in interpreter_clear (interp=0x55555555b3c0, tstate=0x555555577100) at /usr/src/debug/python3.10-3.10.0~b3-1.fc35.x86_64/Python/pystate.c:326
#31 0x00007ffff7e328ea in _PyInterpreterState_Clear (tstate=0x555555577100) at /usr/src/debug/python3.10-3.10.0~b3-1.fc35.x86_64/Python/pystate.c:358
#32 finalize_interp_clear (tstate=0x555555577100) at /usr/src/debug/python3.10-3.10.0~b3-1.fc35.x86_64/Python/pylifecycle.c:1634
#33 0x00007ffff7e30557 in Py_FinalizeEx () at /usr/src/debug/python3.10-3.10.0~b3-1.fc35.x86_64/Python/pylifecycle.c:1812
#34 0x00007ffff7e23254 in Py_RunMain () at /usr/src/debug/python3.10-3.10.0~b3-1.fc35.x86_64/Modules/main.c:668
#35 0x00007ffff7df1e6d in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/python3.10-3.10.0~b3-1.fc35.x86_64/Modules/main.c:720
#36 0x00007ffff7ab07e0 in __libc_start_call_main (main=main@entry=0x555555555160 <main>, argc=argc@entry=2, argv=argv@entry=0x7fffffffe958) at ../sysdeps/nptl/libc_start_call_main.h:58
#37 0x00007ffff7ab088c in __libc_start_main_impl (main=0x555555555160 <main>, argc=2, argv=0x7fffffffe958, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe948) at ../csu/libc-start.c:410
---

I can reproduce the issue with a test that just imports lasso and quits. But I can't figure out which object that the GC is freeing is causing issues.

Is there any way to get more debug information from the python interpreter to see what is it that it's freeing?

Frame 25 says that it's a dict and its keys:
(gdb) frame 25
#25 free_keys_object (keys=0x7ffff791c9b0) at /usr/src/debug/python3.10-3.10.0~b3-1.fc35.x86_64/Objects/dictobject.c:628
628	        Py_XDECREF(entries[i].me_value);
(gdb) list
623	{
624	    PyDictKeyEntry *entries = DK_ENTRIES(keys);
625	    Py_ssize_t i, n;
626	    for (i = 0, n = keys->dk_nentries; i < n; i++) {
627	        Py_XDECREF(entries[i].me_key);
628	        Py_XDECREF(entries[i].me_value);
629	    }
630	    struct _Py_dict_state *state = get_dict_state();
631	#ifdef Py_DEBUG
632	    // free_keys_object() must not be called after _PyDict_Fini()

But of course all variables are optimized out..

valgrind is not very useful either:
==32685== Invalid free() / delete / delete[] / realloc()
==32685==    at 0x48430E4: free (vg_replace_malloc.c:755)
==32685==    by 0x49518F6: UnknownInlinedFun (obmalloc.c:127)
==32685==    by 0x49518F6: UnknownInlinedFun (obmalloc.c:595)
==32685==    by 0x49518F6: UnknownInlinedFun (obmalloc.c:2238)
==32685==    by 0x49518F6: _PyObject_Free (obmalloc.c:2229)
==32685==    by 0x495B8AC: UnknownInlinedFun (object.c:2288)
==32685==    by 0x495B8AC: UnknownInlinedFun (object.h:502)
==32685==    by 0x495B8AC: UnknownInlinedFun (object.h:569)
==32685==    by 0x495B8AC: dict_dealloc (dictobject.c:2068)
==32685==    by 0x497695C: UnknownInlinedFun (object.c:2288)
==32685==    by 0x497695C: UnknownInlinedFun (object.h:502)
==32685==    by 0x497695C: subtype_dealloc (typeobject.c:1434)
==32685==    by 0x495B996: UnknownInlinedFun (object.c:2288)
==32685==    by 0x495B996: UnknownInlinedFun (object.h:502)
==32685==    by 0x495B996: UnknownInlinedFun (object.h:569)
==32685==    by 0x495B996: free_keys_object (dictobject.c:628)
==32685==    by 0x49FC729: dict_tp_clear (dictobject.c:3265)
==32685==    by 0x49546D3: UnknownInlinedFun (gcmodule.c:1017)
==32685==    by 0x49546D3: gc_collect_main (gcmodule.c:1300)
==32685==    by 0x4A0935B: _PyGC_CollectNoFail.isra.0 (gcmodule.c:2123)
==32685==    by 0x4A0BDE8: interpreter_clear (pystate.c:326)
==32685==    by 0x4A0A8E9: UnknownInlinedFun (pystate.c:358)
==32685==    by 0x4A0A8E9: finalize_interp_clear (pylifecycle.c:1634)
==32685==    by 0x4A08556: Py_FinalizeEx (pylifecycle.c:1812)
==32685==    by 0x4A085CB: Py_Exit (pylifecycle.c:2827)
==32685==  Address 0x4b8f640 is 0 bytes inside data symbol "_Py_FalseStruct"

Comment 10 Miro Hrončok 2021-07-05 09:07:41 UTC
You might get more information with the debug build. Amongst other things, it is compiled with -O0.

Install python3-debug (and its debuginfo/debugsource package) on rawhide and try running gdb with python3-debug instead of python3.

Comment 11 Simo Sorce 2021-07-11 20:43:48 UTC
*** Bug 1977583 has been marked as a duplicate of this bug. ***

Comment 12 Miro Hrončok 2021-07-26 09:07:25 UTC
Any luck with python3-debug? How can we Python Miantianers) help?

Comment 13 Ben Cotton 2021-08-10 12:49:25 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 35 development cycle.
Changing version to 35.

Comment 14 Jakub Hrozek 2021-08-16 14:02:08 UTC
(In reply to Miro Hrončok from comment #12)
> Any luck with python3-debug? How can we Python Miantianers) help?

(Back from a long vacation).

I'm afraid I'm stuck. The python3-debug backtrace was more useful in the sense that I could see the keys and variables, but I still have no idea where they came from or what's the issue.

So, if anyone could provide a fresh set of eyes, I'd be grateful.

Comment 15 Miro Hrončok 2021-08-16 14:08:45 UTC
Could you please share the backtrace?

Comment 16 Victor Stinner 2021-08-31 10:23:01 UTC
The crash occurs at Python exit (Py_FinalizeEx) when destroying a code object (code_dealloc): _Py_XDECREF() on an item of a tuple does trigger a crash. The callback looks the same than this PySide bug:
https://bugreports.qt.io/browse/PYSIDE-1436?focusedCommentId=580046&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-580046

In PySide, the bug is a refcounting error in one of PySide hand-crafted code snippets.

Comment 17 Christian Tismer-Sperling 2021-08-31 12:05:23 UTC
Hi Jakub,

I just solved a similar problem with PySide6.
My approach is to use a debug CPython (which you did, I think)
and modify the lines where the deallocators are called, like

#15 tupledealloc (op=0x7ffff7924930) at /usr/src/debug/python3.10-3.10.0~b3-1.fc35.x86_64/Objects/tupleobject.c:276

I would try to get as much info as possible before the deallocations are performed.
Most probably it is the tuple dealloc, but not sure.
You should carefully fprintf the type of the deallocatee, the size, or even
try to print the complete tuple. This way, you might be able to guess where the
tuple comes from.

In my case it was easier because the type in question was a boolean, and the involved
test code was easy to find after knowing this.

Now you get a python which is pretty useless to build with, since it is throwing all
that debugging text at you. I am helping myself by making the debugging output depend
on a static environment variable that I only set when I want to debug.

On printing things in a debug CPython, I use the following helpers:
(probably you know these or even other ways)
Printing a type is simple:

    fprintf(stderr, "%s:%d %s\n", __func__, __LINE__, type->tp_name);

Printing an arbitrary object seems complicated but is easy, too:

    fprintf(stderr, "%s:%d %s\n", __func__, __LINE__, PyUnicode_AsUTF8(PyObject_Repr(obj)));

The latter would probably print the deallocated tuple in question, and that
will hopefully give you an idea what the object in question is.

Let me know if you can proceed this way a bit.

Cheers -- Chris

Comment 18 Victor Stinner 2021-09-01 14:28:49 UTC
I just modified the future Python 3.11 to fail with a fatal error when Python tries to delete True or False singleton, which is a bug in a C extension: https://bugs.python.org/issue45061

By the way, I also modified Python 3.10 to list imported third-party C extensions on a Python fatal error: https://bugs.python.org/issue42923

Comment 19 Miro Hrončok 2021-09-09 18:21:26 UTC
Jakub, is there anything else we can do to help you debug this?

Comment 20 Simo Sorce 2021-09-20 19:52:33 UTC
I do not have any better info, I tried months ago, but failed to understand what was breaking here.
Do we have any CPython ninja that would be able to help here?

Comment 21 Miro Hrončok 2021-09-21 09:08:17 UTC
With https://github.com/python/cpython/pull/28089 backported I indeed get:

Fatal Python error: bool_dealloc: deallocating True or False likely caused by a refcount bug in a C extension

Comment 22 Miro Hrončok 2021-09-21 09:13:05 UTC
This patch makes it pass:

diff --git a/bindings/python/lang.py b/bindings/python/lang.py
index bd7ef70..644de23 100644
--- a/bindings/python/lang.py
+++ b/bindings/python/lang.py
@@ -739,7 +739,6 @@ register_constants(PyObject *d)
             else:
                 print_('E: unknown constant type: %r' % c[0], file=sys.stderr)
             print_('    PyDict_SetItemString(d, "%s", obj);' % c[1][6:], file=fd)
-            print_('    Py_DECREF(obj);', file=fd)
         print_('}', file=fd)
         print_('', file=fd)
 


Would you like me to push it?

Comment 24 Jakub Hrozek 2021-09-21 11:38:16 UTC
(In reply to Miro Hrončok from comment #22)
> This patch makes it pass:
> 
> diff --git a/bindings/python/lang.py b/bindings/python/lang.py
> index bd7ef70..644de23 100644
> --- a/bindings/python/lang.py
> +++ b/bindings/python/lang.py
> @@ -739,7 +739,6 @@ register_constants(PyObject *d)
>              else:
>                  print_('E: unknown constant type: %r' % c[0],
> file=sys.stderr)
>              print_('    PyDict_SetItemString(d, "%s", obj);' % c[1][6:],
> file=fd)
> -            print_('    Py_DECREF(obj);', file=fd)
>          print_('}', file=fd)
>          print_('', file=fd)
>  
> 
> 
> Would you like me to push it?

Yes please and thanks for looking into this.

Comment 25 Miro Hrončok 2021-09-21 12:58:05 UTC
Ok, will do. Could you submit that upstream on Fedora's behalf please?

Comment 26 Fedora Update System 2021-09-21 16:03:02 UTC
FEDORA-2021-c3b25b1592 has been submitted as an update to Fedora 35. https://bodhi.fedoraproject.org/updates/FEDORA-2021-c3b25b1592

Comment 27 Fedora Update System 2021-09-21 18:52:56 UTC
FEDORA-2021-c3b25b1592 has been pushed to the Fedora 35 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-c3b25b1592`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-c3b25b1592

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 28 Adam Williamson 2021-09-21 22:35:01 UTC
+4 in https://pagure.io/fedora-qa/blocker-review/issue/468 , marking accepted.

Comment 29 Fedora Update System 2021-09-22 00:16:10 UTC
FEDORA-2021-c3b25b1592 has been pushed to the Fedora 35 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 30 Victor Stinner 2021-09-22 10:01:18 UTC
Miro Hrončok: "This patch makes it pass: (...)

Oh wow, nice finding! I'm happy that my code to detect refcount bugs was helpful to debug this tricky bug.

Comment 31 Jakub Hrozek 2021-09-24 14:34:50 UTC
Patch sent upstream.


Note You need to log in before you can comment on or make changes to this bug.