Bug 2371769 - F43FailsToInstall: python3-pcp
Summary: F43FailsToInstall: python3-pcp
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: pcp
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Jan Kurik
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On: 2356166
Blocks: PYTHON3.14 F43FailsToInstall
Reported: 2025-06-11 08:05 UTC by Fedora Fails To Install
Modified: 2025-07-01 14:16 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2025-06-26 15:35:47 UTC
Type: ---
Embargoed:
jkurik: mirror+


Attachments
Back trace (7.03 KB, text/plain)
2025-06-23 04:55 UTC, Jan Kurik
Back trace when using python-debug (7.47 KB, text/plain)
2025-06-25 04:16 UTC, Jan Kurik

Description Fedora Fails To Install 2025-06-11 08:05:39 UTC
Hello,

Please note that this comment was generated automatically by https://pagure.io/releng/blob/main/f/scripts/ftbfs-fti/follow-policy.py
If you feel that this output has mistakes, please open an issue at https://pagure.io/releng/

Your package (pcp) Fails To Install in Fedora 43:

can't install python3-pcp:
  - nothing provides python(abi) = 3.13 needed by python3-pcp-6.3.7-5.fc43.x86_64
  
If you know about this problem and are planning on fixing it, please acknowledge so by setting the bug status to ASSIGNED. If you don't have time to maintain this package, consider orphaning it, so maintainers of dependent packages realize the problem.


If you don't react accordingly to the policy for FTBFS/FTI bugs (https://docs.fedoraproject.org/en-US/fesco/Fails_to_build_from_source_Fails_to_install/), your package may be orphaned in 8+ weeks.


P.S. The data was generated solely from koji buildroot, so it might be newer than the latest compose or the content on mirrors. To reproduce, use the koji/local repo only, e.g. in mock:

    $ mock -r fedora-43-x86_64 --config-opts mirrored=False install python3-pcp


P.P.S. If this bug has been reported in the middle of upgrading multiple dependent packages, please consider using side tags: https://docs.fedoraproject.org/en-US/fesco/Updates_Policy/#updating-inter-dependent-packages

Thanks!

Comment 1 Fedora Fails To Install 2025-06-20 19:56:15 UTC
Hello,

Please note that this comment was generated automatically by https://pagure.io/releng/blob/main/f/scripts/ftbfs-fti/follow-policy.py
If you feel that this output has mistakes, please open an issue at https://pagure.io/releng/

This package fails to install and maintainers are advised to take one of the following actions:

 - Fix this bug and close this bugzilla once the update makes it to the repository.
   (The same script that posted this comment will eventually close this bugzilla
   when the fixed package reaches the repository, so you don't have to worry about it.)

or

 - Move this bug to ASSIGNED if you plan on fixing this, but simply haven't done so yet.

or

 - Orphan the package if you no longer plan to maintain it.


If you do not take one of these actions, the process at https://docs.fedoraproject.org/en-US/fesco/Fails_to_build_from_source_Fails_to_install/#_package_removal_for_long_standing_ftbfs_and_fti_bugs will continue.
This package may be orphaned in 7+ weeks.
This is the first reminder (step 3) from the policy.

Don't hesitate to ask for help on https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/ if you are unsure how to fix this bug.

Comment 2 Miro Hrončok 2025-06-22 18:30:40 UTC
A gating failure is blocking a fix of this: https://bodhi.fedoraproject.org/updates/FEDORA-2025-60271a0c5d -- please investigate and fix.

Comment 3 Jan Kurik 2025-06-23 03:18:50 UTC
There are two failed gating tests:

* The first is the installation test, which fails during its "downgrade" phase. This is expected, because the old Python 3.13 and the new Python 3.14 are not both available at the same time. This failure can be waived.

* The second is test 1098 of the upstream pcp testsuite, which tests python-pcp itself. This needs to be investigated.

Comment 4 Jan Kurik 2025-06-23 04:53:12 UTC
Regarding the failing test 1098: after the upgrade of Python from 3.13 to 3.14, the new Python dumps core in the PyList_New function in Objects/listobject.c (this file is part of Python, not PCP).
It seems there was some change in how Python handles lists (vectors). I am attaching a back trace from my F43 test machine on x86_64.

As someone who is completely new to the Python API, I probably can't proceed with the investigation. Is this something you can move forward with, @wcohen, or do we need help from the Python team?

Comment 5 Jan Kurik 2025-06-23 04:55:12 UTC
Created attachment 2094793
Back trace

Comment 7 Jan Kurik 2025-06-23 06:06:50 UTC
Side note: Python 3.14 reworked the part of the code where we see these core dumps, namely the freelists: https://github.com/python/cpython/issues/100240 . I mention it here for reference, as it might be related to the issue we are facing.

Comment 8 Victor Stinner 2025-06-24 10:03:00 UTC
I analyzed the coredump: the PyList freelist is in a corrupted state. Its size is zero (empty), but 'obj' is not NULL. It looks like a memory corruption in a C extension used by pcp. It's not a bug in Python.

This bug is very similar to the crick bug: https://bugzilla.redhat.com/show_bug.cgi?id=2367454#c3

For pcp, I don't know how to run the test suite. It lacks a "localconfig" file, but I don't know how to create this file.
---
# cd /var/lib/pcp/testsuite/
# ./1089
QA output created by 1089
./common.product: line 9: ./localconfig: No such file or directory
---

gdb analysis of the pcp crash:
------------------
# gdb -c core.python3.986.d2b0d3eef0924bb8a71b78aa829066b2.38464.1750652843000000 

Program terminated with signal SIGSEGV, Segmentation fault.
Downloading source file /usr/src/debug/python3.14-3.14.0~b3-2.fc43.x86_64/Include/internal/pycore_freelist.h
#0  _PyFreeList_PopNoStats (fl=0x7f2489573058 <_PyRuntime+102104>)              
    at /usr/src/debug/python3.14-3.14.0~b3-2.fc43.x86_64/Include/internal/pycore_freelist.h:79
79              fl->freelist = *(void **)obj;

(gdb) l
74      _PyFreeList_PopNoStats(struct _Py_freelist *fl)
75      {
76          void *obj = fl->freelist;
77          if (obj != NULL) {
78              assert(fl->size > 0);
79              fl->freelist = *(void **)obj;
80              fl->size--;
81          }
82          return obj;
83      }

(gdb) p /x obj  # should be 0
$4 = 0xffffffff
(gdb) p fl->size
$3 = 0
------------------

If you use a debug build of Python, Python should fail on the assertion: assert(fl->size > 0).


> Side note: Python 3.14 reworked the part of the code where we see these core dumps, namely the freelists: https://github.com/python/cpython/issues/100240 . I mention it here for reference, as it might be related to the issue we are facing.

IMO the memory corruption exists with all Python versions; it simply went undetected before Python 3.14. The new freelist design makes the memory corruption more visible.
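
The state seen in gdb (a size of 0 together with a non-NULL head) violates the basic invariant of a freelist. The following is a minimal toy model of that invariant in C, not CPython's actual code: the struct and the toy_* function names are made up for illustration, and only the field names freelist and size come from the listing above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of a singly linked freelist (hypothetical, not CPython's layout). */
struct toy_freelist {
    void *freelist;  /* head of the list; each free block stores the next pointer */
    size_t size;     /* number of blocks currently on the list */
};

/* Invariant: the head is NULL exactly when the list is empty. */
bool toy_freelist_is_consistent(const struct toy_freelist *fl)
{
    return (fl->freelist == NULL) == (fl->size == 0);
}

/* Put a block on the list; the block's first word becomes the "next" link. */
void toy_freelist_push(struct toy_freelist *fl, void *block)
{
    *(void **)block = fl->freelist;
    fl->freelist = block;
    fl->size++;
}

/* Take a block off the list, or return NULL if the list is empty. */
void *toy_freelist_pop(struct toy_freelist *fl)
{
    void *obj = fl->freelist;
    if (obj != NULL) {
        assert(fl->size > 0);          /* same check as in the gdb listing */
        fl->freelist = *(void **)obj;  /* crashes if the head is a garbage pointer */
        fl->size--;
    }
    return obj;
}
```

In this model, setting the head to 0xffffffff while size is 0 makes toy_freelist_is_consistent() return false. That is the condition a debug build of Python is expected to catch with assert(fl->size > 0), whereas a release build goes on to dereference the bad pointer.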

Comment 9 Miro Hrončok 2025-06-24 12:19:13 UTC
Jan, to rerun the tests with python3-debug, run dnf install python3-debug and then use /usr/bin/python3.14d or /usr/bin/python3-debug wherever you normally use /usr/bin/python3.

Comment 10 Jan Kurik 2025-06-25 04:16:19 UTC
Created attachment 2095091
Back trace when using python-debug

Comment 12 Jan Kurik 2025-06-25 04:18:08 UTC
Attached back-trace and core-dump file when run using python-debug.

Comment 13 Jan Kurik 2025-06-25 05:59:46 UTC
@vstinner Thanks a lot for your help. May I ask you to review the hypothesis I put together while debugging the issue?

IMO the problem is in the PCP code, in the function refresh_all_clusters in src/python/pmda.c: https://github.com/performancecopilot/pcp/blob/main/src/python/pmda.c#L389 .

In the function, we are building a tuple from a list of PyObjects:
arglist = Py_BuildValue("(N)", list);

However, the content of arglist in the core file shows that its reference counter is zero. As such, the next call of
Py_DECREF(arglist);
results in a negative reference counter, hence the assertion failure and core dump.

When I change Py_BuildValue to use "O" instead of "N", so that the reference counter is set to a non-zero value, everything works correctly: no core dumps are seen and life is great and sunny :-)

Can you please confirm that using "O" instead of "N" is the right way to build the tuple in this case?

Comment 14 Jan Kurik 2025-06-25 06:45:02 UTC
Side note: I am trying to push the fix above ("N" -> "O") through the upstream PCP pipeline, just to be sure it will not cause any regression: https://github.com/performancecopilot/pcp/pull/2228

Comment 15 Victor Stinner 2025-06-25 12:46:13 UTC
> Can you please confirm that using "O" instead of "N" is the right way to build the tuple in this case?

I confirm that the current code is wrong and that using "O" is a good fix.

Simplified code:
---
    arglist = Py_BuildValue("(N)", list);
    Py_DECREF(list);
---

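This code is wrong: "N" steals the reference to 'list' (the tuple takes ownership without incrementing the reference count), yet Py_DECREF(list) then releases that same reference, so 'list' can be freed while the tuple still points to it.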
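
The reference accounting can be sketched with a toy model in plain C. This is an illustration only, not the CPython API or the PCP code: struct toy_obj and the toy_* functions are hypothetical, and the sequence modeled is the simplified code above followed by the tuple being destroyed.

```c
#include <assert.h>

/* Toy object carrying only a reference count; a stand-in for a PyObject. */
struct toy_obj {
    long refcnt;
};

/* Models the "O" format unit: the container takes its own new reference. */
void toy_build_with_O(struct toy_obj *item)
{
    item->refcnt++;
}

/* Models the "N" format unit: the container takes over the caller's
   reference, so the count is left unchanged. */
void toy_build_with_N(struct toy_obj *item)
{
    (void)item;
}

/* Models Py_DECREF; the real macro also frees the object at zero. */
void toy_decref(struct toy_obj *obj)
{
    obj->refcnt--;
}

/* Returns the count left after: build tuple, caller's DECREF(list),
   and destruction of the tuple (which releases its item reference). */
long toy_final_refcnt(int use_N)
{
    struct toy_obj list = { 1 };  /* the caller owns one reference */

    if (use_N) {
        toy_build_with_N(&list);
    } else {
        toy_build_with_O(&list);
    }
    toy_decref(&list);  /* the caller's Py_DECREF(list) */
    toy_decref(&list);  /* the tuple is destroyed and drops its item */
    return list.refcnt;
}
```

In this model toy_final_refcnt(0) (the "O" case) ends at 0, a balanced count, while toy_final_refcnt(1) (the "N" case) ends at -1, one release too many. In real CPython that extra release touches memory that has already been freed.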

Comment 16 Jan Kurik 2025-06-26 05:20:07 UTC
Upstream patch for PCP that fixes this: https://github.com/performancecopilot/pcp/pull/2228
Waiting for a Fedora rebuild with this fix.

Comment 17 Jan Kurik 2025-06-26 15:35:47 UTC
pcp-6.3.7-7.fc43 is available in Fedora
https://bodhi.fedoraproject.org/updates/FEDORA-2025-9682ef8660

