Bug 1795130 - swig makes pungi-gather segfault, fix available upstream
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: swig
Version: 30
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Jitka Plesnikova
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-01-27 08:08 UTC by Lubomír Sedlář
Modified: 2020-02-27 16:43 UTC (History)

Fixed In Version: swig-3.0.12-26.fc30
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-02-13 01:36:34 UTC
Type: Bug
Embargoed:


Attachments
valgrind log (5.15 KB, text/plain)
2020-01-28 11:56 UTC, Lubomír Sedlář
Backtrace (11.65 KB, text/plain)
2020-01-29 11:55 UTC, Lubomír Sedlář
Backtrace from python3.7dm (31.58 KB, text/plain)
2020-01-31 14:08 UTC, Lubomír Sedlář


Links
System ID: GitHub swig/swig issues 1321
Private: 0
Priority: None
Status: closed
Summary: Wrong argument in call of tp_new in SWIG_Python_NewShadowInstance for Python3
Last Updated: 2020-07-22 21:39:19 UTC

Description Lubomír Sedlář 2020-01-27 08:08:30 UTC
Description of problem:
When running a compose for Fedora 31 updates, we have recently been getting segfaults quite often. This had happened a few times before, but it is now very common.

Pungi issue: https://pagure.io/pungi/issue/1335

Version-Release number of selected component (if applicable):
pungi-4.1.41-3.fc30.noarch
python3-3.7.6-1.fc30.x86_64
libcomps-0.1.14-1.fc30.x86_64
libdnf-0.39.1-1.fc30.x86_64

How reproducible:
On my laptop about 1 in 10 runs segfaults. On the Bodhi compose machine it is more like 9 out of 10.

Steps to Reproduce:
1. wget https://kojipkgs.fedoraproject.org/compose/updates/Fedora-31-updates-20200126.3/work/ppc64le/pungi/Everything.ppc64le.comps.conf
2. sed -i 's@file:///mnt/koji@https://kojipkgs.fedoraproject.org@g' Everything.ppc64le.comps.conf
3. pungi-gather --config=Everything.ppc64le.comps.conf --greedy=build --arch=ppc64le
4. while pungi-gather --config=Everything.ppc64le.comps.conf --greedy=build --arch=ppc64le ; do true ; done
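
The shell loop above stops at the first failure. To estimate the crash rate instead, the runs can be scripted and counted. This is only a minimal sketch, assuming a POSIX system (where a process killed by signal N reports return code -N); count_segfaults is a hypothetical helper, not part of pungi.

```python
import signal
import subprocess


def count_segfaults(cmd, runs):
    """Run cmd `runs` times and count how many runs died with SIGSEGV."""
    crashes = 0
    for _ in range(runs):
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # On POSIX, a negative return code -N means "killed by signal N".
        if proc.returncode == -signal.SIGSEGV:
            crashes += 1
    return crashes


# Example call (not executed here), using the command from the steps above:
# count_segfaults(
#     ["pungi-gather", "--config=Everything.ppc64le.comps.conf",
#      "--greedy=build", "--arch=ppc64le"],
#     runs=10,
# )
```

Any other non-zero exit (for example an abort from a failed assertion) is deliberately not counted as a segfault here.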


Actual results:
It sometimes segfaults. 

Expected results:
It should not segfault.

Additional info:
The repos used in the config file will be deleted in a couple of days. However, there will be a new compose with a different date in the URL, and that one can be used instead.

Comment 1 Lubomír Sedlář 2020-01-27 08:13:00 UTC
I managed to get a stack trace with gdb:

(gdb) py-bt
Traceback (most recent call first):
  Garbage-collecting
  File "/usr/lib/python3.7/site-packages/pungi/dnf_wrapper.py", line 130, in get_langpacks
    result.append({"name": name, "install": install})
  File "/usr/bin/pungi-gather", line 127, in main
    gather_opts.langpacks = dnf_obj.comps_wrapper.get_langpacks()
  File "/usr/bin/pungi-gather", line 179, in <module>
    main(persistdir, cachedir)

From that I understand that the crash happens during garbage collection. My guess is that some C library is doing something wrong.

How can I debug this further?

Comment 2 Lubomír Sedlář 2020-01-28 11:56:29 UTC
Created attachment 1655998
valgrind log

I tried running valgrind on the program. It took about 4 hours and none of the three runs I did crashed. The logs were pretty much similar for all three runs.

Comment 3 Victor Stinner 2020-01-29 00:07:24 UTC
> Actual results:
> It sometimes segfaults.

Can you please attach the gdb traceback from when Python segfaults? (gdb command "where".)

To get a more complete traceback, it would help if you can install Python debug symbols: dnf debuginfo-install python3.

--

You may try to reproduce the crash with the Python Development Mode: add -X dev to Python command line, or set PYTHONDEVMODE=1 env var. It enables a few runtime checks which might help to detect some bugs earlier. See: https://docs.python.org/dev/library/devmode.html
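
To confirm that the Development Mode is really active in the interpreter that runs the program, sys.flags.dev_mode (available since Python 3.7) can be inspected. A minimal sketch: dev_mode_status is a hypothetical helper that starts a child interpreter with "-X dev" and reports what that child sees, including whether faulthandler (which the Development Mode enables) is on.

```python
import subprocess
import sys

# Code executed in the child interpreter: report whether dev mode is on
# and whether faulthandler is active.
PROBE = (
    "import sys, faulthandler; "
    "print(sys.flags.dev_mode, faulthandler.is_enabled())"
)


def dev_mode_status(extra_args=("-X", "dev")):
    """Return the probe output of a child interpreter started with extra_args."""
    result = subprocess.run(
        [sys.executable, *extra_args, "-c", PROBE],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()
```

Setting PYTHONDEVMODE=1 in the environment of the child should have the same effect as passing "-X dev".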

It would be even better if you could try the Python Development Mode of python3.9.

I'm not sure if you can test python3-debug, since it requires rebuilding C extensions in debug mode.

Crashes during a garbage collection are the worst to debug, since they basically mean that "something is corrupted" without really saying what. My notes on these kinds of bugs:
https://pythondev.readthedocs.io/debug_tools.html#debug-crash-in-garbage-collection-visit-decref

I added more assertions in Python 3.8 and Python 3.9 to help debug these issues, but you would need a debug build of Python 3.8 or 3.9 to get them.

Comment 4 Lubomír Sedlář 2020-01-29 11:55:05 UTC
Created attachment 1656216
Backtrace

I attached a backtrace. I think I have all relevant debuginfo packages installed.

I'll try the development mode and try Python 3.8 from Rawhide.

Comment 5 Lubomír Sedlář 2020-01-29 12:38:50 UTC
Enabling PYTHONDEVMODE with /usr/bin/python3 seems to remove the problem. So far 72 attempts have not crashed.

I installed the python3-debug package, and if I run the program with it I get a segfault immediately. This traceback doesn't seem very helpful, though.

$ PYTHONDEVMODE=1 python3-debug /usr/bin/pungi-gather --config ~/temp/updates/Everything.ppc64le.comps.conf --greedy=build --arch=ppc64le >/dev/null
/usr/lib/python3.7/site-packages/ordered_set.py:39: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working
  class OrderedSet(collections.MutableSet):
Fatal Python error: Segmentation fault

Current thread 0x00007f92c4f26680 (most recent call first): 
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1043 in create_module
  File "<frozen importlib._bootstrap>", line 583 in module_from_spec
  File "<frozen importlib._bootstrap>", line 670 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 967 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 983 in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006 in _gcd_import
  File "/usr/lib64/python3.7/importlib/__init__.py", line 127 in import_module
  File "/usr/lib64/python3.7/site-packages/libdnf/common_types.py", line 14 in swig_import_helper
  File "/usr/lib64/python3.7/site-packages/libdnf/common_types.py", line 17 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 728 in exec_module
  File "<frozen importlib._bootstrap>", line 677 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 967 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 983 in _find_and_load
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1035 in _handle_fromlist
  File "/usr/lib64/python3.7/site-packages/libdnf/__init__.py", line 3 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 728 in exec_module
  File "<frozen importlib._bootstrap>", line 677 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 967 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 983 in _find_and_load
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 953 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 983 in _find_and_load
  File "/usr/lib/python3.7/site-packages/dnf/base.py", line 29 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 728 in exec_module
  File "<frozen importlib._bootstrap>", line 677 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 967 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 983 in _find_and_load
  File "/usr/lib/python3.7/site-packages/dnf/__init__.py", line 30 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 728 in exec_module
  File "<frozen importlib._bootstrap>", line 677 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 967 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 983 in _find_and_load
  File "/usr/lib/python3.7/site-packages/pungi/dnf_wrapper.py", line 20 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 728 in exec_module
  File "<frozen importlib._bootstrap>", line 677 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 967 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 983 in _find_and_load
  File "/usr/bin/pungi-gather", line 11 in <module>
Segmentation fault (core dumped)

I wasn't able to replicate the crash with Python 3.8 from Rawhide.

Comment 6 Lubomír Sedlář 2020-01-29 12:41:10 UTC
Oh, I guess python3-debug is not working because I don't have the extensions rebuilt.

Comment 7 Miro Hrončok 2020-01-29 12:55:51 UTC
Side note: With 3.8+, you don't need the extensions rebuilt with the debug build, if they are not linked to libpython.

The dnf stack Python extension modules unfortunately are linked to libpython, but that is not required and should be fixable in their cmake scripts, see for example https://src.fedoraproject.org/rpms/libarcus/pull-request/8

Comment 8 Victor Stinner 2020-01-29 15:07:40 UTC
You may also try to call gc.set_threshold(5) at the very beginning of your application. It should make the crash more likely.

A different approach: try calling "gc.disable()" at the very beginning of your application. I'm not sure whether it helps to debug such an issue.
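
Both suggestions amount to a one-line change at program start. A minimal sketch of the two variants: configure_gc_for_debugging is a hypothetical helper name, and only the gc calls themselves come from the comment above.

```python
import gc


def configure_gc_for_debugging(mode):
    """Apply one of the two GC tweaks and return the previous state.

    mode == "aggressive": trigger a generation-0 collection after only
                          5 net allocations, so a corrupted object is
                          visited much sooner.
    mode == "off":        disable automatic collection entirely.
    """
    previous = (gc.isenabled(), gc.get_threshold())
    if mode == "aggressive":
        gc.set_threshold(5)
    elif mode == "off":
        gc.disable()
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return previous
```

If the crash becomes more frequent with "aggressive" and disappears with "off", that supports the theory that the collector is only tripping over corruption that happened earlier.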

Comment 9 Lubomír Sedlář 2020-01-30 12:31:56 UTC
I can only replicate the crash with the program installed from the RPM package. If I try with a git checkout, the crash goes away.
What do I need to change in the extensions so that they are compatible with python3.7-debug? And how can I find out all the extensions to rebuild?

Comment 10 Miro Hrončok 2020-01-30 14:01:00 UTC
To make the extensions compatible with python3.7-debug, you need to rebuild them with python3.7-debug.

Packages built the regular way can be rebuilt by redefining %__python3 to /usr/bin/python3-debug. Unfortunately, most of the dnf-stack Python packages are built using cmake scripts that try to autodetect Python, and I have no idea how to rebuild them with python3-debug :(

To find out all the extensions you need to rebuild, install your Python RPM package to a minimal environment. All packages owning *.so files in /usr/lib64/python3.7/site-packages/ need to be rebuilt.
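
The search for extension modules can be scripted. A minimal sketch, assuming an RPM-based system for the second step: find_extension_modules and owning_packages are hypothetical helper names, and the package lookup relies on "rpm -qf", which reports the package owning a file.

```python
import os
import subprocess


def find_extension_modules(site_packages):
    """Return the sorted paths of all *.so files below site_packages."""
    found = []
    for root, _dirs, files in os.walk(site_packages):
        for name in files:
            if name.endswith(".so"):
                found.append(os.path.join(root, name))
    return sorted(found)


def owning_packages(paths):
    """Return the names of the RPM packages owning the given files."""
    if not paths:
        return set()
    result = subprocess.run(
        ["rpm", "-qf", "--queryformat", r"%{NAME}\n", *paths],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    # Package names contain no spaces; messages about unowned files do,
    # so lines with spaces are skipped.
    return {
        line for line in result.stdout.splitlines()
        if line and " " not in line
    }


# Example call (not executed here):
# owning_packages(find_extension_modules("/usr/lib64/python3.7/site-packages"))
```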

Comment 11 Lubomír Sedlář 2020-01-31 11:26:40 UTC
I rebuilt libdnf, rpm, libcomps and gpgme and can now run the program without the immediate segfault at import. However, I get this abort. Is this caused by incorrectly rebuilt packages?

$ PYTHONDEVMODE=1 python3.7dm /usr/bin/pungi-gather --config=Everything.ppc64le.comps.conf --greedy=build --arch=ppc64le
/usr/lib/python3.7/site-packages/ordered_set.py:39: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working
  class OrderedSet(collections.MutableSet):
/usr/lib/python3.7/site-packages/pungi/arch_utils.py:158: ResourceWarning: unclosed file <_io.BufferedReader name='/proc/self/auxv'>
  data = open("/proc/self/auxv", "rb").read()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/lib/python3.7/site-packages/pungi/arch_utils.py:281: ResourceWarning: unclosed file <_io.TextIOWrapper name='/proc/cpuinfo' mode='r' encoding='UTF-8'>
  break
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/lib/python3.7/site-packages/koji/__init__.py:48: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
python3.7dm: /builddir/build/BUILD/Python-3.7.6/Objects/typeobject.c:3670: excess_args: Assertion `PyTuple_Check(args)' failed.
Fatal Python error: Aborted

Current thread 0x00007fce65c63680 (most recent call first):
  File "/usr/lib64/python3.7/site-packages/libdnf/conf.py", line 1369 in pluginpath
  File "/usr/lib/python3.7/site-packages/dnf/conf/config.py", line 119 in _set_value
  File "/usr/lib/python3.7/site-packages/dnf/conf/config.py", line 212 in __init__
  File "/usr/lib/python3.7/site-packages/pungi/dnf_wrapper.py", line 37 in __init__
  File "/usr/bin/pungi-gather", line 81 in main
  File "/usr/bin/pungi-gather", line 179 in <module>
Aborted (core dumped)

Comment 12 Lubomír Sedlář 2020-01-31 14:08:46 UTC
Created attachment 1656716
Backtrace from python3.7dm

I attached a full backtrace from the abort.

Comment 13 Victor Stinner 2020-02-03 08:52:37 UTC
...
#6  0x00007fca99a83566 in __GI___assert_fail (assertion=assertion@entry=0x7fca998fc23e "PyTuple_Check(args)", file=file@entry=0x7fca9990f418 "/builddir/build/BUILD/Python-3.7.6/Objects/typeobject.c", line=line@entry=3670, 
    function=function@entry=0x7fca9996e220 <__PRETTY_FUNCTION__.16477> "excess_args") at assert.c:101
#7  0x00007fca9971c061 in excess_args (args=<optimized out>, kwds=<optimized out>) at /usr/src/debug/python3-3.7.6-1.fc30.x86_64/Objects/typeobject.c:3670
#8  0x00007fca9983d868 in object_new (type=0x555830410fc0, args=<optimized out>, kwds=<optimized out>) at /usr/src/debug/python3-3.7.6-1.fc30.x86_64/Objects/typeobject.c:3697
#9  0x00007fca9760a0a2 in SWIG_Python_NewShadowInstance (data=0x555830411730, swig_this=<SwigPyObject at remote 0x7fca98e77700>)
    at /usr/src/debug/libdnf-0.43.1-1.1.debug.fc30.x86_64/build-py3/bindings/python/CMakeFiles/_conf.dir/confPYTHON_wrap.cxx:2532
#10 0x00007fca9760a41a in SWIG_Python_NewPointerObj (self=0x0, ptr=0x555830d6ca80, type=0x7fca976b9760 <_swigt__p_libdnf__OptionStringList>, flags=0)
    at /usr/src/debug/libdnf-0.43.1-1.1.debug.fc30.x86_64/build-py3/bindings/python/CMakeFiles/_conf.dir/confPYTHON_wrap.cxx:2666
...

It looks like a bug in SWIG: it passes an object which is not a tuple as the 'args' argument of object_new().
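
As a rough Python-level illustration (this is not the SWIG-generated C code): object.__new__ is the Python-visible face of object_new(), and when it is called from Python the interpreter always packs the extra arguments into a tuple. That is why the PyTuple_Check(args) assertion can only be tripped by C code that calls tp_new directly, as frame #9 does. The class and function names below are hypothetical.

```python
class Wrapped:
    """Stand-in for a proxy class with an initializer (illustration only)."""

    def __init__(self):
        self.initialized = True


class Plain:
    """A class that overrides neither __new__ nor __init__."""


def new_without_init(cls):
    """Allocate an instance of cls without running cls.__init__."""
    # Only the type is passed, so object_new() receives an empty args tuple.
    return object.__new__(cls)


def rejects_excess_args(cls):
    """Return True if object.__new__ refuses an extra positional argument."""
    try:
        object.__new__(cls, 1)
    except TypeError:
        return True
    return False
```

Here new_without_init(Wrapped) yields an instance without the "initialized" attribute, and rejects_excess_args(Plain) reports that the excess-argument check in object_new() raised TypeError. Both paths go through the same check that aborted in the backtrace, only with a well-formed tuple.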

* SWIG fix: https://github.com/swig/swig/commit/016518073537e2b88c8ac3f33f4caebd6bede3c6
* SWIG bug: https://github.com/swig/swig/issues/1321

The fix is part of SWIG version 4.0.0 (27 Apr 2019).

What is your swig package version? Try: rpm -q swig. On Fedora 31, I get: swig-4.0.1-3.fc31.x86_64.
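
The output of "rpm -q swig" can be compared against the upstream fix version mechanically. A minimal sketch: parse_nvra and has_upstream_fix are hypothetical helpers, and the parsing assumes the usual name-version-release.arch form, optionally with an epoch prefix such as "0:" as printed by repoquery.

```python
def parse_nvra(nvra):
    """Split 'name-[epoch:]version-release.arch' into its four parts."""
    rest, arch = nvra.rsplit(".", 1)
    name, version, release = rest.rsplit("-", 2)
    # repoquery may print an epoch prefix such as '0:'; drop it.
    version = version.split(":", 1)[-1]
    return name, version, release, arch


def has_upstream_fix(nvra, fixed_in=(4, 0, 0)):
    """Return True if the upstream version part is at least fixed_in."""
    _name, version, _release, _arch = parse_nvra(nvra)
    return tuple(int(part) for part in version.split(".")) >= fixed_in
```

This only looks at the upstream version. A distribution build with a backported patch, such as the swig-3.0.12-26.fc30 listed under "Fixed In Version", carries the fix even though this check would say otherwise.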

Side note: have you considered upgrading to Fedora 31?

Comment 14 Miro Hrončok 2020-02-03 10:17:23 UTC
$ repoquery --releasever=30 --repo={fedora,updates{,-testing}} --latest=1 swig
swig-0:3.0.12-25.fc30.x86_64

Comment 15 Jitka Plesnikova 2020-02-03 13:07:46 UTC
I'll backport the fix.

Comment 16 Lubomír Sedlář 2020-02-03 14:07:01 UTC
I tried building swig 4.0.1 (from F31) for Fedora 30. Then I rebuilt libdnf using this version of swig. And it indeed does seem to fix the issue. After 35 iterations of the previously segfaulting program I have not seen a single crash.

Comment 17 Fedora Update System 2020-02-04 13:09:34 UTC
FEDORA-2020-ba4b52e9ff has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2020-ba4b52e9ff

Comment 18 Fedora Update System 2020-02-05 00:52:36 UTC
swig-3.0.12-26.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-ba4b52e9ff

Comment 19 Fedora Update System 2020-02-12 02:29:51 UTC
FEDORA-2020-71be871020 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2020-71be871020

Comment 20 Fedora Update System 2020-02-13 01:15:57 UTC
libdnf-0.43.1-3.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-71be871020

Comment 21 Fedora Update System 2020-02-13 01:36:34 UTC
swig-3.0.12-26.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.

Comment 22 Fedora Update System 2020-02-27 16:43:55 UTC
libdnf-0.43.1-3.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.

