librepo fails to build with Python 3.9.0a2. ... 2: test_download_packages_with_expectedsize (tests.test_yum_package_downloading.TestCaseYumPackagesDownloading) ... ok 2: test_download_packages_with_fastestmirror_enabled_1 (tests.test_yum_package_downloading.TestCaseYumPackagesDownloading) ... ok 2: test_download_packages_with_fastestmirror_enabled_2 (tests.test_yum_package_downloading.TestCaseYumPackagesDownloading) ... /builddir/build/BUILD/librepo-1.11.1/build-py3/tests/python/tests/run_nosetests.sh: line 1: 778 Segmentation fault (core dumped) LD_LIBRARY_PATH="/builddir/build/BUILD/librepo-1.11.1/build-py3/librepo/:" PYTHONPATH="/builddir/build/BUILD/librepo-1.11.1/build-py3/librepo/python/python3" /usr/bin/python3 -m nose -s -v "/builddir/build/BUILD/librepo-1.11.1/tests/python/tests/" (build hangs here) For the build logs, see: https://copr-be.cloud.fedoraproject.org/results/@python/python3.9/fedora-rawhide-x86_64/01141146-librepo/ For all our attempts to build librepo with Python 3.9, see: https://copr.fedorainfracloud.org/coprs/g/python/python3.9/package/librepo/ Testing and mass rebuild of packages is happening in copr. You can follow these instructions to test locally in mock if your package builds with Python 3.9: https://copr.fedorainfracloud.org/coprs/g/python/python3.9/ Let us know here if you have any questions. Python 3.9 will be included in Fedora 33, but the initial bootstrapping has already started. A build failure this early in the bootstrap sequence blocks us very much.
Hi Miro, I'm looking into this and I kind of suspect the cause may be potentially some mess in python versions against which the python packages being used during the tests were built. Either that or possibly a python bug. I've installed a systemd-nspawn container from the official rawhide repositories, added the copr repository with python 3.9 from a link above and installed python 3.9. I've reproduced the issue and got a full traceback (it is very long, btw., I've had over 100 frames one time and 246 frames another time). This is the important part: #0 get_small_int (ival=2) at /usr/src/debug/python3-3.9.0~a2-1.fc32.x86_64/Objects/longobject.c:47 #1 PyLong_FromLong (ival=2) at /usr/src/debug/python3-3.9.0~a2-1.fc32.x86_64/Objects/longobject.c:313 #2 0x00007fffe9718878 in fastestmirror_callback (data=0x7fffe81866d0, stage=LR_FMSTAGE_DETECTION, ptr=0x7fffffff9628) at /home/lu/dev/librepo/librepo/python/handle-py.c:155 So PyLong_FromLong() is called with ival=2, which leaves little room for librepo to be at fault, except for memory corruption. For small numbers PyLong_FromLong() calls get_small_int(), which pulls the return value from some cache for small numbers. The code is here: https://github.com/python/cpython/blob/d8efc1495194228c3a4cd472200275d6491d8e2d/Objects/longobject.c#L47 On this line, tstate is NULL (and it being dereferenced causes the segfault). Googling around, it seems one possibility for tstate being NULL is when a module tries to use a python library that is different than the one the interpreter uses. That's where I'm at now, trying to figure out what's built against what might be quite tedious, I'm not sure how to proceed yet. Is there a way to make sure different python library versions aren't in play here, e.g. by ensuring all the packages really are built against python 3.9? This setup with an ad-hoc repository lends itself to be susceptible to this kind of an issue.
The buildroot was: Packages built from our copr and hence 99% sure Python 3.9 based: file file-libs gdb-minimal gpgme gpgme-devel libbrotli libpwquality libselinux libselinux-devel libxml2 libxml2-devel python-pip-wheel python-setuptools-wheel python3 python3-babel python3-click python3-devel python3-docutils python3-flask python3-gpg python3-chardet python3-idna python3-imagesize python3-itsdangerous python3-jinja2 python3-libs python3-markupsafe python3-nose python3-packaging python3-pygments python3-pyparsing python3-pysocks python3-pytz python3-pyxattr python3-requests python3-rpm-generators python3-setuptools python3-six python3-snowballstemmer python3-sphinx python3-sphinxcontrib-applehelp python3-sphinxcontrib-devhelp python3-sphinxcontrib-htmlhelp python3-sphinxcontrib-jsmath python3-sphinxcontrib-qthelp python3-sphinxcontrib-serializinghtml python3-sphinx-theme-alabaster python3-urllib3 python3-werkzeug rpm rpm-build rpm-build-libs rpm-libs Packages built from Koji and hence 100% sure Python 3.8 based (or non-Python or Python version agnostic): adobe-mappings-cmap adobe-mappings-cmap-deprecated adobe-mappings-pdf alternatives annobin atk audit-libs avahi-libs basesystem bash binutils binutils-gold bzip2 bzip2-libs ca-certificates cairo cairo-gobject cmake cmake-data cmake-filesystem cmake-rpm-macros coreutils coreutils-common cpio cpp cracklib crypto-policies cups-libs curl cyrus-sasl-lib dbus-libs diffutils doxygen dwz efi-srpm-macros elfutils elfutils-default-yama-scope elfutils-libelf elfutils-libs emacs-filesystem expat fedora-gpg-keys fedora-release fedora-release-common fedora-repos fedora-repos-rawhide filesystem findutils fontconfig fontpackages-filesystem fpc-srpm-macros freetype fribidi gawk gc gcc gd gdbm-libs gdk-pixbuf2 gdk-pixbuf2-modules ghc-srpm-macros glibc glibc-common glibc-devel glibc-headers glibc-minimal-langpack glib2 glib2-devel gmp gnat-srpm-macros gnupg2 gnutls google-droid-sans-fonts go-srpm-macros graphite2 graphviz grep gtk-update-icon-cache gtk2 gts guile22 gzip harfbuzz hicolor-icon-theme check check-devel info isl jasper-libs jbigkit-libs jbig2dec-libs jsoncpp kernel-headers keyutils-libs krb5-libs lasi lcms2 libacl libarchive libassuan libassuan-devel libatomic_ops libattr libattr-devel libblkid libblkid-devel libcap libcap-ng libcom_err libcroco libcurl libcurl-devel libdatrie libdb libdb-utils libfdisk libffi libffi-devel libfontenc libgcc libgcrypt libgomp libgpg-error libgpg-error-devel libgs libICE libidn libidn2 libijs libimagequant libjpeg-turbo libksba libmcpp libmetalink libmount libmount-devel libmpc libnghttp2 libnsl2 libpaper libpkgconf libpng libpsl librsvg2 libsemanage libsepol libsepol-devel libsigsegv libSM libsmartcols libssh libssh-config libstdc++ libtasn1 libthai libtiff libtirpc libtool-ltdl libunistring libusbx libutempter libuuid libuv libverto libwebp libXau libXaw libxcb libXcomposite libxcrypt libxcrypt-devel libXcursor libXdamage libXext libXfixes libXft libXi libXinerama libXmu libXpm libXrandr libXrender libXt libXxf86vm libX11 libX11-common libzstd libzstd-devel lua-libs lz4-libs make mcpp mpfr ncurses ncurses-base ncurses-libs netpbm nettle nim-srpm-macros npth ocaml-srpm-macros openblas-srpm-macros openjpeg2 openldap openssl-devel openssl-libs pam pango patch pcre pcre-cpp pcre-devel pcre-utf16 pcre-utf32 pcre2 pcre2-devel pcre2-utf16 pcre2-utf32 perl-Carp perl-constant perl-Errno perl-Exporter perl-File-Path perl-interpreter perl-IO perl-libs perl-macros perl-parent perl-PathTools perl-Scalar-List-Utils perl-Socket perl-srpm-macros perl-Text-Tabs+Wrap perl-threads perl-threads-shared perl-Unicode-Normalize pixman pkgconf pkgconf-m4 pkgconf-pkg-config popt publicsuffix-list-dafsa python-rpm-macros python-srpm-macros python3-rpm-macros p11-kit p11-kit-trust qt5-srpm-macros readline redhat-rpm-config rhash rust-srpm-macros sed setup shadow-utils shared-mime-info sqlite-libs subunit subunit-devel systemd-libs tar tzdata unzip urw-base35-bookman-fonts urw-base35-c059-fonts urw-base35-d050000l-fonts urw-base35-fonts urw-base35-fonts-common urw-base35-gothic-fonts urw-base35-nimbus-mono-ps-fonts urw-base35-nimbus-roman-fonts urw-base35-nimbus-sans-fonts urw-base35-p052-fonts urw-base35-standard-symbols-ps-fonts urw-base35-z003-fonts util-linux which xapian-core-libs xorg-x11-fonts-ISO8859-1-100dpi xorg-x11-font-utils xorg-x11-server-utils xz xz-devel xz-libs zchunk-devel zchunk-libs zip zlib zlib-devel zstd
Victor, this looks like something you've worked on. Do you have any insight to share re. tstate & the small int cache?
Miro, thanks for the list, but apart from going through it and making sure there aren't any obvious python packages built with 3.8 and used in the tests, I'm still not sure what I can do. And since it seems librepo is most likely not the culprit and I don't have an effective way to debug this, I'm reluctant to spend more time on it. Also, do you have a way I could list the packages based on the repo they come from on my test system? It seems non-trivial to do that... I'm kind of thinking that making sure python 3.8 isn't a part of the build chain for any packages might be the easiest way to verify the issue, but I have little clue about the build process, so I might be totally off here. Petr, thanks for the link and I'm adding a needinfo on Victor, hoping he can help shed some light on this.
(In reply to Lukáš Hrázký from comment #4) > Miro, thanks for the list, but apart from going through it and making sure > there aren't any obvious python packages built with 3.8 and used in the > tests, I'm still not sure what I can do. I just posted it here for completeness. Here's the thing: Any "normal" Python package would require the correct python(abi) or libpython version - and hence it would bail the build already when resolving the dependencies. However, if an extension module is build to a nonstandard location, it is possible that a 3.8 built module is present in the buildroot. Apart from filenames, nothing gives us a hint about this. I will try to reproduce this in mock and search for 3.8 filenames in the buildroot and report back. > Also, do you have a way I could list the packages based on the repo they > come from on my test system? It seems non-trivial to do that... You can grep the output of `dnf list`, third column is the repo. > I'm kind of thinking that making sure python 3.8 isn't a part of the build > chain for any packages might be the easiest way to verify the issue, but I > have little clue about the build process, so I might be totally off here. Python is often present in the buildroot of non-Python packages as well, so this would be extremely hard and produce a lot of false positives :(
I have only found the following outliers: /usr/share/crypto-policies/python/**/*.cpython-38*.pyc /usr/share/gcc-9/python/libstdcxx/**/*.cpython-38*.pyc /usr/share/gdb/auto-load/usr/lib64/__pycache__/*.cpython-38*.pyc /usr/share/glib-2.0/codegen/__pycache__/*.cpython-38*.pyc /usr/share/glib-2.0/gdb/__pycache__/*.cpython-38*.pyc All just bytecode cache, should not harm anything during the build, certainly not segfault librepo.
(In reply to Miro Hrončok from comment #5) > Here's the thing: Any "normal" Python package would require the correct > python(abi) or libpython version - and hence it would bail the build already > when resolving the dependencies. Yeah, I know this should ideally be handled in RPM dependencies... > You can grep the output of `dnf list`, third column is the repo. Right, I tried to do it with `dnf repoquery`... So I've just verified all python* packages on my test setup are from the python 3.9 repo and also all the files (that are in the standard location) are under paths containing the 3.9 version (i.e. /usr/lib/python3.9 and /usr/lib64/python3.9), so those should obviously be the correct version. So unless there's something very wrong with the built packages (which I'd assume would manifest in other places), maybe this is a red herring and the null pointer is caused by something else. It would be a pain to debug and trying to minimize the reproducer also might not be much fun, I'll wait for Victor to chime in before attempting that.
It seems like librepo doesn't use the Python C API properly. librepo must hold the GIL to use this API, whereas it seems like it's not the case. Commented code: static void fastestmirror_callback(void *data, LrFastestMirrorStages stage, void *ptr) { ... if (!ptr) { pydata = Py_None; } else { switch (stage) { case LR_FMSTAGE_CACHELOADING: case LR_FMSTAGE_CACHELOADINGSTATUS: case LR_FMSTAGE_STATUS: pydata = PyStringOrNone_FromString((char *) ptr); break; case LR_FMSTAGE_DETECTION: pydata = PyLong_FromLong(*((long *) ptr)); break; default: pydata = Py_None; } } /* PyEval_RestoreThread = take the GIL */ EndAllowThreads(self->state); result = PyObject_CallFunction(self->fastestmirror_cb, "(OlO)", user_data, (long) stage, pydata); Py_XDECREF(result); /* PyEval_SaveThread() = release the GIL */ BeginAllowThreads(self->state); if (pydata != Py_None) Py_XDECREF(pydata); return; } https://github.com/rpm-software-management/librepo/blob/5ade10012d40fad27a7aa1085091c2c5fdd099ff/librepo/python/handle-py.c#L155 The GIL is only taken to call PyObject_CallFunction() and Py_XDECREF(result). For example, PyLong_FromLong() call at line 155 is done without holding the GIL. To me, it's a bug in librepo. The behavior changed in Python 3.9: get_small_int() now requires to get the Python thread state. -- FYI I'm the one who modified get_small_int(): https://bugs.python.org/issue38858 For the rationale, see my article: https://vstinner.github.io/cpython-pass-tstate.html
Thanks for the information, Victor. I've gone through the code and tried to make sure no Python API is called without holding the GIL. The tests are passing for me now. PR: https://github.com/rpm-software-management/librepo/pull/179
> Thanks for the information, Victor. I've gone through the code and tried to make sure no Python API is called without holding the GIL. The tests are passing for me now. You can use a debug build of Python, like python3-debug, to check if the GIL is held when using the Python C API.
(In reply to Victor Stinner from comment #10) > You can use a debug build of Python, like python3-debug, to check if the GIL > is held when using the Python C API. That's handy, but I'm not sure I'm using it correctly. I've installed the package, looked around briefly and haven't found much info on how to use it. So I used the `/usr/bin/python3-debug` binary to run the tests. It passed with my patch and I still got a segfault without my patch...
Can we please get a backport of https://github.com/rpm-software-management/librepo/pull/179 ?
I did the backport to rawhide. :)
Thank you, awesome!