Description of problem:
When running a compose for Fedora 31 updates, we have recently been getting a segfault quite often. This has happened a few times before, but it is now very common.

Pungi issue: https://pagure.io/pungi/issue/1335

Version-Release number of selected component (if applicable):
pungi-4.1.41-3.fc30.noarch
python3-3.7.6-1.fc30.x86_64
libcomps-0.1.14-1.fc30.x86_64
libdnf-0.39.1-1.fc30.x86_64

How reproducible:
On my laptop, about 1 in 10 runs segfaults. On the bodhi compose machine, it is more like 9 out of 10.

Steps to Reproduce:
1. wget https://kojipkgs.fedoraproject.org/compose/updates/Fedora-31-updates-20200126.3/work/ppc64le/pungi/Everything.ppc64le.comps.conf
2. sed -i 's@file:///mnt/koji@https://kojipkgs.fedoraproject.org@g' Everything.ppc64le.comps.conf
3. pungi-gather --config=Everything.ppc64le.comps.conf --greedy=build --arch=ppc64le
4. while pungi-gather --config=Everything.ppc64le.comps.conf --greedy=build --arch=ppc64le ; do true ; done

Actual results:
It sometimes segfaults.

Expected results:
It should not segfault.

Additional info:
The repos used in the config file will be deleted in a couple of days. However, there will be a new compose with a different date in the URL, and that one can be used instead.
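The retry loop from the reproduction steps can also be sketched in Python, which makes it easy to record how many runs it took before the process was killed by a signal. This is only an illustration: the helper name runs_until_signal and the max_runs cap are assumptions, not part of pungi. Unlike the shell loop, which stops on any non-zero exit status, this sketch stops only when a run dies on a signal.

```python
import subprocess


def runs_until_signal(cmd, max_runs=100):
    """Run cmd repeatedly.

    Returns (attempt, signal_number) for the first run that was killed by a
    signal, or None if none of the max_runs attempts was.
    """
    for attempt in range(1, max_runs + 1):
        proc = subprocess.run(cmd, stdout=subprocess.DEVNULL)
        # On POSIX, a negative return code -N means "killed by signal N"
        # (SIGSEGV is signal 11 on Linux).
        if proc.returncode < 0:
            return attempt, -proc.returncode
    return None
```

It could be called as runs_until_signal(["pungi-gather", "--config=Everything.ppc64le.comps.conf", "--greedy=build", "--arch=ppc64le"]).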
I managed to get a stack trace with gdb:

(gdb) py-bt
Traceback (most recent call first):
  Garbage-collecting
  File "/usr/lib/python3.7/site-packages/pungi/dnf_wrapper.py", line 130, in get_langpacks
    result.append({"name": name, "install": install})
  File "/usr/bin/pungi-gather", line 127, in main
    gather_opts.langpacks = dnf_obj.comps_wrapper.get_langpacks()
  File "/usr/bin/pungi-gather", line 179, in <module>
    main(persistdir, cachedir)

From that I understand that the crash happens during garbage collection. My guess is that some C library is doing something wrong. How can I debug this further?
Created attachment 1655998
valgrind log

I tried running valgrind on the program. It took about 4 hours, and none of the three runs I did crashed. The logs were much the same for all three runs.
> Actual results:
> It sometimes segfaults.

Can you please attach the gdb traceback from when Python segfaults (gdb command "where")? To get a more complete traceback, it would help to install the Python debug symbols: dnf debuginfo-install python3

--

You may try to reproduce the crash with the Python Development Mode: add -X dev to the Python command line, or set the PYTHONDEVMODE=1 env var. It enables a few runtime checks which might help to detect some bugs earlier. See:
https://docs.python.org/dev/library/devmode.html

It would be even better if you could try the Python Development Mode of python3.9.

I'm not sure if you can test python3-debug, since it requires rebuilding the C extensions in debug mode.

Crashes during a garbage collection are the worst to debug, since such a crash basically means that "something is corrupted" but doesn't really say what. My notes on these kinds of bugs:
https://pythondev.readthedocs.io/debug_tools.html#debug-crash-in-garbage-collection-visit-decref

I added more assertions in Python 3.8 and Python 3.9 to help debug these issues, but you would need a debug build of Python 3.8 or 3.9 to get them.
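A small sketch of checking, from inside the process, whether the Development Mode is actually active. It relies on sys.flags.dev_mode (available since Python 3.7) and the standard faulthandler module; the helper name debug_status is made up for illustration.

```python
import faulthandler
import sys


def debug_status():
    """Report whether dev mode and faulthandler are active in this process."""
    return {
        "dev_mode": bool(sys.flags.dev_mode),
        "faulthandler": faulthandler.is_enabled(),
    }


# Dev mode (-X dev or PYTHONDEVMODE=1) enables faulthandler automatically.
# Without dev mode it can still be switched on by hand, so that a fatal
# signal such as SIGSEGV dumps the Python traceback.
# sys.__stderr__ is used in case sys.stderr was replaced by something
# that has no file descriptor.
if not faulthandler.is_enabled():
    faulthandler.enable(file=sys.__stderr__)

print(debug_status())
```

With faulthandler enabled, a segfault produces a "Fatal Python error: Segmentation fault" message followed by the Python-level stack of the current thread, which is often the quickest first hint even without gdb.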
Created attachment 1656216
Backtrace

I attached a backtrace. I think I have all relevant debuginfo packages installed.

I'll try the development mode and try Python 3.8 from Rawhide.
Enabling PYTHONDEVMODE with /usr/bin/python3 seems to remove the problem. So far, 72 attempts have not crashed.

I installed the python3-debug package, and if I run the program with it I get a segfault immediately. The traceback doesn't seem very helpful, though.

$ PYTHONDEVMODE=1 python3-debug /usr/bin/pungi-gather --config ~/temp/updates/Everything.ppc64le.comps.conf --greedy=build --arch=ppc64le >/dev/null
/usr/lib/python3.7/site-packages/ordered_set.py:39: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working
  class OrderedSet(collections.MutableSet):
Fatal Python error: Segmentation fault

Current thread 0x00007f92c4f26680 (most recent call first):
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1043 in create_module
  File "<frozen importlib._bootstrap>", line 583 in module_from_spec
  File "<frozen importlib._bootstrap>", line 670 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 967 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 983 in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006 in _gcd_import
  File "/usr/lib64/python3.7/importlib/__init__.py", line 127 in import_module
  File "/usr/lib64/python3.7/site-packages/libdnf/common_types.py", line 14 in swig_import_helper
  File "/usr/lib64/python3.7/site-packages/libdnf/common_types.py", line 17 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 728 in exec_module
  File "<frozen importlib._bootstrap>", line 677 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 967 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 983 in _find_and_load
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1035 in _handle_fromlist
  File "/usr/lib64/python3.7/site-packages/libdnf/__init__.py", line 3 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 728 in exec_module
  File "<frozen importlib._bootstrap>", line 677 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 967 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 983 in _find_and_load
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 953 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 983 in _find_and_load
  File "/usr/lib/python3.7/site-packages/dnf/base.py", line 29 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 728 in exec_module
  File "<frozen importlib._bootstrap>", line 677 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 967 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 983 in _find_and_load
  File "/usr/lib/python3.7/site-packages/dnf/__init__.py", line 30 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 728 in exec_module
  File "<frozen importlib._bootstrap>", line 677 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 967 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 983 in _find_and_load
  File "/usr/lib/python3.7/site-packages/pungi/dnf_wrapper.py", line 20 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 728 in exec_module
  File "<frozen importlib._bootstrap>", line 677 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 967 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 983 in _find_and_load
  File "/usr/bin/pungi-gather", line 11 in <module>
Segmentation fault (core dumped)

I wasn't able to replicate the crash with Python 3.8 from Rawhide.
Oh, I guess python3-debug is not working because I don't have the extensions rebuilt.
Side note: With 3.8+, you don't need the extensions rebuilt with the debug build, if they are not linked to libpython. The dnf stack Python extension modules unfortunately are linked to libpython, but that is not required and should be fixable in their cmake scripts, see for example https://src.fedoraproject.org/rpms/libarcus/pull-request/8
You may also try to call gc.set_threshold(5) at the very beginning of your application. It should make the crash more likely.

A different approach: call gc.disable() at the very beginning of your application. I'm not sure if it helps to debug such an issue.
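A minimal sketch of the first suggestion, to be placed at the very top of the application's entry point. The wrapper stress_gc is a hypothetical helper; only gc.get_threshold(), gc.set_threshold() and gc.disable() are the real standard-library calls.

```python
import gc


def stress_gc(threshold0=5):
    """Lower the generation-0 threshold so collections run far more often.

    Returns the previous thresholds so they can be restored later.
    """
    previous = gc.get_threshold()
    gc.set_threshold(threshold0)
    return previous


previous = stress_gc()        # equivalent to gc.set_threshold(5)
print(gc.get_threshold()[0])  # 5

# The opposite experiment: switch the cyclic collector off entirely.
# gc.disable()
```

A lower threshold means the collector traverses container objects much more often, so a corrupted object is more likely to be visited soon after the corruption happens.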
I can only replicate the crash with the program installed from the RPM package. If I try with a git checkout, the crash goes away.

What do I need to change in the extensions so that they are compatible with python3.7-debug? And how can I find all the extensions that need rebuilding?
To make the extensions compatible with python3.7-debug, you need to rebuild them with python3.7-debug. Packages built the regular way can be rebuilt by redefining %__python3 to /usr/bin/python3-debug. Unfortunately, most of the dnf-stack Python packages are built using cmake scripts that try to autodetect Python, and I have no idea how to rebuild them with python3-debug :(

To find out all the extensions you need to rebuild, install your Python RPM package into a minimal environment. All packages owning *.so files in /usr/lib64/python3.7/site-packages/ need to be rebuilt.
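The search for packages owning *.so files can be sketched as follows. The function names are made up for illustration, and the second helper assumes the rpm command-line tool is available; it shells out to rpm -qf with a --queryformat of %{NAME}.

```python
import os
import subprocess


def find_extension_modules(site_packages):
    """Return the sorted paths of all *.so files below site_packages."""
    found = []
    for root, _dirs, files in os.walk(site_packages):
        for name in files:
            if name.endswith(".so"):
                found.append(os.path.join(root, name))
    return sorted(found)


def owning_packages(paths):
    """Ask rpm which packages own the given files (needs the rpm CLI)."""
    if not paths:
        return set()
    out = subprocess.run(
        ["rpm", "-qf", "--queryformat", "%{NAME}\\n", *paths],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=False,
    ).stdout
    # Package names contain no spaces; messages such as
    # "file ... is not owned by any package" do, so they are dropped here.
    return {line for line in out.splitlines() if line and " " not in line}
```

For the case discussed here, the call would be owning_packages(find_extension_modules("/usr/lib64/python3.7/site-packages")).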
I rebuilt libdnf, rpm, libcomps and gpgme and can now run the program without an immediate segfault on missing import. However, I get this abort. Is this caused by incorrectly rebuilt packages?

$ PYTHONDEVMODE=1 python3.7dm /usr/bin/pungi-gather --config=Everything.ppc64le.comps.conf --greedy=build --arch=ppc64le
/usr/lib/python3.7/site-packages/ordered_set.py:39: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working
  class OrderedSet(collections.MutableSet):
/usr/lib/python3.7/site-packages/pungi/arch_utils.py:158: ResourceWarning: unclosed file <_io.BufferedReader name='/proc/self/auxv'>
  data = open("/proc/self/auxv", "rb").read()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/lib/python3.7/site-packages/pungi/arch_utils.py:281: ResourceWarning: unclosed file <_io.TextIOWrapper name='/proc/cpuinfo' mode='r' encoding='UTF-8'>
  break
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/lib/python3.7/site-packages/koji/__init__.py:48: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
python3.7dm: /builddir/build/BUILD/Python-3.7.6/Objects/typeobject.c:3670: excess_args: Assertion `PyTuple_Check(args)' failed.
Fatal Python error: Aborted

Current thread 0x00007fce65c63680 (most recent call first):
  File "/usr/lib64/python3.7/site-packages/libdnf/conf.py", line 1369 in pluginpath
  File "/usr/lib/python3.7/site-packages/dnf/conf/config.py", line 119 in _set_value
  File "/usr/lib/python3.7/site-packages/dnf/conf/config.py", line 212 in __init__
  File "/usr/lib/python3.7/site-packages/pungi/dnf_wrapper.py", line 37 in __init__
  File "/usr/bin/pungi-gather", line 81 in main
  File "/usr/bin/pungi-gather", line 179 in <module>
Aborted (core dumped)
Created attachment 1656716
Backtrace from python3.7dm

I attached a full backtrace from the abort.
...
#6  0x00007fca99a83566 in __GI___assert_fail (assertion=assertion@entry=0x7fca998fc23e "PyTuple_Check(args)", file=file@entry=0x7fca9990f418 "/builddir/build/BUILD/Python-3.7.6/Objects/typeobject.c", line=line@entry=3670, function=function@entry=0x7fca9996e220 <__PRETTY_FUNCTION__.16477> "excess_args") at assert.c:101
#7  0x00007fca9971c061 in excess_args (args=<optimized out>, kwds=<optimized out>) at /usr/src/debug/python3-3.7.6-1.fc30.x86_64/Objects/typeobject.c:3670
#8  0x00007fca9983d868 in object_new (type=0x555830410fc0, args=<optimized out>, kwds=<optimized out>) at /usr/src/debug/python3-3.7.6-1.fc30.x86_64/Objects/typeobject.c:3697
#9  0x00007fca9760a0a2 in SWIG_Python_NewShadowInstance (data=0x555830411730, swig_this=<SwigPyObject at remote 0x7fca98e77700>) at /usr/src/debug/libdnf-0.43.1-1.1.debug.fc30.x86_64/build-py3/bindings/python/CMakeFiles/_conf.dir/confPYTHON_wrap.cxx:2532
#10 0x00007fca9760a41a in SWIG_Python_NewPointerObj (self=0x0, ptr=0x555830d6ca80, type=0x7fca976b9760 <_swigt__p_libdnf__OptionStringList>, flags=0) at /usr/src/debug/libdnf-0.43.1-1.1.debug.fc30.x86_64/build-py3/bindings/python/CMakeFiles/_conf.dir/confPYTHON_wrap.cxx:2666
...

It looks like a bug in SWIG: it passes an object which is not a tuple as the 'args' argument of object_new().

* SWIG fix: https://github.com/swig/swig/commit/016518073537e2b88c8ac3f33f4caebd6bede3c6
* SWIG bug: https://github.com/swig/swig/issues/1321

The fix is part of SWIG version 4.0.0 (27 Apr 2019).

What is your swig package version? Try: rpm -q swig. On Fedora 31, I get: swig-4.0.1-3.fc31.x86_64.

Side note: did you consider upgrading to Fedora 31?
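The version question can also be checked mechanically. The sketch below assumes package strings in the name-version-release.arch form printed by rpm -q (optionally with an epoch, as repoquery prints them); the helper names are illustrative. It only compares against the upstream release 4.0.0 that contains the fix, so a distribution package carrying a backported patch would still be reported as lacking it.

```python
import re

FIXED_IN = (4, 0, 0)  # upstream SWIG release that contains the fix


def swig_version(nvr):
    """Extract (major, minor, patch) from e.g. 'swig-4.0.1-3.fc31.x86_64'."""
    match = re.match(r"swig-(?:\d+:)?(\d+)\.(\d+)\.(\d+)-", nvr)
    if match is None:
        raise ValueError("unrecognized swig package string: %r" % nvr)
    return tuple(int(part) for part in match.groups())


def has_upstream_fix(nvr):
    """True if the upstream version is at least the release with the fix."""
    return swig_version(nvr) >= FIXED_IN


print(has_upstream_fix("swig-3.0.12-25.fc30.x86_64"))  # False
print(has_upstream_fix("swig-4.0.1-3.fc31.x86_64"))    # True
```

Comparing tuples of integers avoids the trap of comparing version strings lexically, where "3.0.12" would sort after "3.0.2".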
$ repoquery --releasever=30 --repo={fedora,updates{,-testing}} --latest=1 swig
swig-0:3.0.12-25.fc30.x86_64
I'll backport the fix.
I tried building swig 4.0.1 (from F31) for Fedora 30. Then I rebuilt libdnf using this version of swig. And it indeed does seem to fix the issue. After 35 iterations of the previously segfaulting program I have not seen a single crash.
FEDORA-2020-ba4b52e9ff has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2020-ba4b52e9ff
swig-3.0.12-26.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-ba4b52e9ff
FEDORA-2020-71be871020 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2020-71be871020
libdnf-0.43.1-3.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-71be871020
swig-3.0.12-26.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.
libdnf-0.43.1-3.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.