Bug 2231751

Summary: Python 3.12 ctypes regression
Product: [Fedora] Fedora
Reporter: Nathan Scott <nathans>
Component: python3.12
Assignee: Victor Stinner <vstinner>
Status: CLOSED ERRATA
Severity: urgent
Priority: unspecified
Version: 39
CC: code, cstratak, jkurik, mhroncok, python-maint, python-packagers-sig, thrnciar, torsava, vstinner
Keywords: Regression, Upgrades
Hardware: All
OS: Linux
URL: https://performancecopilot.github.io/qa-reports/reports/20230812_213907-b3cc0288/#suites/3d55d380e43a65924702ab7b74fce31b/8e114a031acf49ef/
Fixed In Version: python3.12-3.12.0~rc2-1.fc40 python3.12-3.12.0~rc3-1.fc39
Last Closed: 2023-09-26 00:18:26 UTC

Description Nathan Scott 2023-08-13 22:43:09 UTC
We are observing a significant number of new failures in the PCP test suite on rawhide with the recent python update there.

Unfortunately it has taken us this long to report because python3.12 made python3-pyodbc uninstallable on rawhide, which in turn made the dependent PCP sub-packages uninstallable.  We've had to (temporarily?) disable the PCP metrics based on that module (pcp-pmda-mssql) in order to observe the remaining fallout.

Anyway, the ctypes issues seem to have two similar but slightly different signatures:

>     desc.contents.type, PM_TYPE_U64)
>     ^^^^^^^^^^^^^^^^^^
> AttributeError: 'dict' object has no attribute 'type'


and

>     if desc.contents.indom == pmapi.c_api.PM_INDOM_NULL:
>        ^^^^^^^^^^^^^
> ValueError: Unexpected NULL pointer in _objects


This ctypes structure for variable 'desc' (pmDesc) is here:
https://github.com/performancecopilot/pcp/blob/83cf926e507ab1302c18663daa4ce7d2129b99a1/src/python/pcp/pmapi.py#L600
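
For readers without the PCP sources at hand, the access pattern behind both signatures can be sketched with ctypes alone. This is only an illustration of how `desc.contents.<field>` is normally expected to behave, not the PCP code and not a confirmed reproducer: `Desc` is a hypothetical, simplified stand-in for pmDesc (its field list is an assumption), and the property on the pointer type imitates the style of the `pmDescPtr.type` property in pmapi.py.

```python
import ctypes


class Desc(ctypes.Structure):
    # Hypothetical, simplified stand-in for PCP's pmDesc. The field names
    # and types are assumptions made for illustration only.
    _fields_ = [
        ("pmid", ctypes.c_uint),
        ("type", ctypes.c_int),
        ("indom", ctypes.c_uint),
        ("sem", ctypes.c_int),
    ]


DescPtr = ctypes.POINTER(Desc)
# Convenience property on the pointer type, in the same style as the
# pmDescPtr.type property in pmapi.py: it dereferences via .contents.
DescPtr.type = property(lambda self: self.contents.type)

desc = Desc(pmid=1, type=3, indom=0xFFFFFFFF, sem=4)
ptr = ctypes.pointer(desc)

# On an unaffected interpreter, .contents yields a Desc instance, so both
# spellings read the structure field.
print(type(ptr.contents).__name__, ptr.contents.type, ptr.type)
```

On an unaffected interpreter `ptr.contents` is a `Desc` instance; the two signatures above are what we see instead when that dereference goes wrong.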

These failure modes can be observed from daily PCP QA:
https://github.com/performancecopilot/pcp/actions
(click on a current "QA" action, then "fedora-rawhide-container", then the "QA" step therein - there are many examples of the above two failure signatures if you need a reproducible test case)

The test cases are also available in rawhide via the pcp-testsuite package,
which puts failing cases (such as qa/991) below /var/lib/pcp/testsuite

Please let me know if any further information is needed.  FWIW there's been no recent change to the PCP python wrapper library, and these tests pass on every other version of python (incl. python2) so we're 100% certain this is directly related to the python3.12 upgrade.

Reproducible: Always

Steps to Reproduce:
Run PCP python tools e.g. via pcp-testsuite
Actual Results:  
Numerous tests fail due to crashes in python ctypes code.

Expected Results:  
Tests pass, python tools using ctypes do not fail.

Comment 1 Fedora Release Engineering 2023-08-16 08:07:09 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 39 development cycle.
Changing version to 39.

Comment 2 Victor Stinner 2023-08-30 13:49:45 UTC
> We are observing a significant number of new failures in the PCP test suite on rawhide with the recent python update there.

Do you still have the issue? The latest GitHub Action QA job is a success, even on fedora-rawhide-container: https://github.com/performancecopilot/pcp/actions/runs/6015971693

I successfully built the package with Python 3.12 in the Python 3.12 COPR. But then I'm not sure how to run the PCP test suite.

On my Fedora 38, I successfully ran the QA tests using the commands given in .github/workflows/qa.yml for the "fedora-rawhide-container" platform:
---
set -e -x

PLATFORM=fedora-rawhide-container

source $PWD/VERSION.pcp
PACKAGE_BUILD="0.$(date +'%Y%m%d').$(git rev-parse --short HEAD)"
PCP_VERSION=${PACKAGE_MAJOR}.${PACKAGE_MINOR}.${PACKAGE_REVISION}
PCP_BUILD_VERSION=${PCP_VERSION}-${PACKAGE_BUILD}
sed -i "s/PACKAGE_BUILD=.*/PACKAGE_BUILD=${PACKAGE_BUILD}/" VERSION.pcp
sed -i "1 s/(.*)/(${PCP_BUILD_VERSION})/" debian/changelog
python3 -c 'import yaml' || python3 -m pip install pyyaml
mkdir -p artifacts/build artifacts/test
touch artifacts/build/.keep
build/ci/ci-run.py $PLATFORM setup

build/ci/ci-run.py $PLATFORM task build
build/ci/ci-run.py $PLATFORM artifacts build --path artifacts/build
build/ci/ci-run.py $PLATFORM task install
build/ci/ci-run.py $PLATFORM task init_qa
build/ci/ci-run.py $PLATFORM task qa
---

... Well, right now, the QA tests are still running: "[14%] 243". So far, so good: all QA tests run to this point have succeeded.

Can you please explain to me how to reproduce the issue on Python 3.12? Do you have a reproducer which doesn't require PCP, but only Python stdlib modules (ctypes)?

Comment 3 Victor Stinner 2023-08-30 13:54:49 UTC
Can you please explain to me how to reproduce the issue on Python 3.12? Do you have a reproducer which doesn't require PCP, but only Python stdlib modules (ctypes)?

Comment 4 Ben Beasley 2023-08-30 14:02:27 UTC
This sounds a lot like https://github.com/python/cpython/issues/107940, a regression that appeared between 3.12.0b4 and 3.12.0rc1, and should be fixed in 3.12.0rc2. I’m waiting to see if rc2 fixes https://github.com/Toblerity/rtree/issues/277, which also looks similar.
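
If that window is right, a quick way to tell whether a given interpreter falls inside it is to compare `sys.version_info`. The sketch below is only a hint under that assumption: `is_affected` and `AFFECTED` are hypothetical names, the check treats 3.12.0rc1 as the only affected release, and a distribution build may carry patches that the version number does not reflect.

```python
import sys

# Assumption taken from this report: the regression appeared after 3.12.0b4,
# is present in 3.12.0rc1, and should be fixed in 3.12.0rc2.
AFFECTED = (3, 12, 0, "candidate", 1)


def is_affected(version_info=sys.version_info):
    """Return True if the version tuple matches the assumed affected release."""
    return tuple(version_info[:5]) == AFFECTED


print(sys.version.split()[0], "affected" if is_affected() else "not affected")
```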

Comment 5 Victor Stinner 2023-08-30 15:03:41 UTC
Oh wait, I kept the QA tests running in the background and, many minutes later, I started to get errors!
---
...
[51%] 855
[51%] 856
[51%] 857
[51%] 858
[51%] 859 [failed, exit status 1] - output mismatch (see 859.out.bad)
2c2,15
< pmfg - OK
---
> pmfg -   File "/var/lib/pcp/testsuite/src/test_pmfg.py", line 132, in test_pmfg
>     test_pmfg_live(self, c_api.PM_CONTEXT_HOST, "local:")
>   File "/var/lib/pcp/testsuite/src/test_pmfg.py", line 28, in test_pmfg_live
>     v1 = pmfg.extend_item("sample.ulong.one") # infer type
>          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/usr/lib64/python3.12/site-packages/pcp/pmapi.py", line 3201, in extend_item
>     mtype = descs[0].type
>             ^^^^^^^^^^^^^
>   File "/usr/lib64/python3.12/site-packages/pcp/pmapi.py", line 614, in <lambda>
>     pmDescPtr.type = property(lambda x: x.contents.type, None, None, None)
>                                         ^^^^^^^^^^
> ValueError: Unexpected NULL pointer in _objects
> 
> ----------------------------------------------------------------------
Check local PMCD is still alive ...
PMDA probe: pminfo -h b620c05c31c0 -f sample.milliseconds
PMDA probe: pminfo -h b620c05c31c0 -f sampledso.milliseconds
PMDA probe: pminfo -h b620c05c31c0 -f simple.numfetch
[51%] 860
[51%] 861
[51%] 862
...
[52%] 879
[52%] 880 - output mismatch (see 880.out.bad)
2a3,20
> Traceback (most recent call last):
>   File "/usr/lib64/python3.12/site-packages/pcp/pmconfig.py", line 568, in check_metric
>     if desc.contents.indom == pmapi.c_api.PM_INDOM_NULL:
>        ^^^^^^^^^^^^^
> ValueError: Unexpected NULL pointer in _objects
> No matching instances found.
> Traceback (most recent call last):
>   File "/usr/lib64/python3.12/site-packages/pcp/pmconfig.py", line 568, in check_metric
>     if desc.contents.indom == pmapi.c_api.PM_INDOM_NULL:
>        ^^^^^^^^^^^^^
> ValueError: Unexpected NULL pointer in _objects
> No matching instances found.
> Traceback (most recent call last):
>   File "/usr/lib64/python3.12/site-packages/pcp/pmconfig.py", line 568, in check_metric
>     if desc.contents.indom == pmapi.c_api.PM_INDOM_NULL:
>        ^^^^^^^^^^^^^
> ValueError: Unexpected NULL pointer in _objects
> No matching instances found.
Check local PMCD is still alive ...
PMDA probe: pminfo -h b620c05c31c0 -f sample.milliseconds
PMDA probe: pminfo -h b620c05c31c0 -f sampledso.milliseconds
PMDA probe: pminfo -h b620c05c31c0 -f simple.numfetch
[52%] 881
[52%] 882
...
[59%] 991 - output mismatch (see 991.out.bad)
...
[59%] 991 - output mismatch (see 991.out.bad)
4,6c4,13
<               total        used        free      shared  buff/cache   available
< Mem:       16010088     7817828     2956804      651076     5235456     7208668
< Swap        8093692     1785600     6308092
---
> Traceback (most recent call last):
>   File "/usr/libexec/pcp/bin/pcp-free", line 239, in <module>
>     FREE.execute()
>   File "/usr/libexec/pcp/bin/pcp-free", line 156, in execute
>     values = self.extract(descs, result)
>              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/usr/libexec/pcp/bin/pcp-free", line 126, in extract
>     desc.contents.type, PM_TYPE_U64)
>     ^^^^^^^^^^^^^^^^^^
> AttributeError: 'dict' object has no attribute 'type'
9,11c16,25
<               total        used        free      shared  buff/cache   available
< Mem:       16010088    16010088           0           0           0           0
< Swap              0           0           0
---
> Traceback (most recent call last):
>   File "/usr/libexec/pcp/bin/pcp-free", line 239, in <module>
>     FREE.execute()
>   File "/usr/libexec/pcp/bin/pcp-free", line 156, in execute
>     values = self.extract(descs, result)
>              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/usr/libexec/pcp/bin/pcp-free", line 126, in extract
>     desc.contents.type, PM_TYPE_U64)
>     ^^^^^^^^^^^^^^^^^^
> AttributeError: 'dict' object has no attribute 'type'
14,16c28,37
<               total        used        free      shared  buff/cache   available
< Mem:    16394330112  8005455872  3027767296   666701824  5361106944  7381676032
< Swap     8287940608  1828454400  6459486208
---
...
[59%] 992
[60%] 993
...
[60%] 999 - output mismatch (see 999.out.bad)
...
> Traceback (most recent call last):
>   File "/usr/lib64/python3.12/site-packages/pcp/pmconfig.py", line 568, in check_metric
>     if desc.contents.indom == pmapi.c_api.PM_INDOM_NULL:
>        ^^^^^^^^^^^^^
> ValueError: Unexpected NULL pointer in _objects
...
[62%] 1062 - output mismatch (see 1062.out.bad)
...
> Traceback (most recent call last):
>  File "/usr/libxxx/pythonxxx.xxx/site-packages/pcp/pmconfig.py", line xxx, in check_metric
>  if desc.contents.indom == pmapi.c_api.PM_INDOM_NULL:
>  ^^^^^^^^^^^^^
> ValueError: Unexpected NULL pointer in _objects
> Traceback (most recent call last):
>  File "/usr/libxxx/pythonxxx.xxx/site-packages/pcp/pmconfig.py", line xxx, in check_metric
>  if desc.contents.indom == pmapi.c_api.PM_INDOM_NULL:
>  ^^^^^^^^^^^^^
...
---
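
When triaging many `.out.bad` files, the two signatures in the output above can be tallied mechanically. The following is a rough sketch and not part of PCP QA: the function name, the signature labels and the sample text are made up for illustration; only the two error messages come from this report.

```python
import re
from collections import Counter

# The two failure signatures seen in this report.
SIGNATURES = {
    "dict-attr": re.compile(r"AttributeError: 'dict' object has no attribute '\w+'"),
    "null-objects": re.compile(r"ValueError: Unexpected NULL pointer in _objects"),
}


def count_signatures(text):
    """Count how many lines of text match each known failure signature."""
    counts = Counter()
    for line in text.splitlines():
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Illustrative sample only, shaped like the diff output quoted above.
sample = """\
> AttributeError: 'dict' object has no attribute 'type'
> ValueError: Unexpected NULL pointer in _objects
> No matching instances found.
> ValueError: Unexpected NULL pointer in _objects
"""
print(count_signatures(sample))
```

To apply it to a real run, the contents of each `NNN.out.bad` file could be read and passed to `count_signatures` in place of `sample`.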

Comment 6 Nathan Scott 2023-08-31 07:28:07 UTC
Thanks for looking into it Victor.  Looks like you've reproduced it now (I don't have any simpler test case at hand, sorry) and also found the numpy project reference where they've possibly encountered the same issue.

https://github.com/numpy/numpy/issues/24399
https://github.com/python/cpython/issues/107940

Comment 7 Victor Stinner 2023-09-04 19:48:55 UTC
I reverted a recent ctypes change which introduced a regression in numpy. It's likely the same bug which affects PCP.

The revert got merged in the Python 3.12 branch:
https://github.com/python/cpython/commit/e0f6080819f00d456215646e3117ae77b9af40d1

You can expect it as part of the upcoming Python 3.12.0rc2 release, which is scheduled for today.

I propose to leave this issue open until Python 3.12.0rc2 is shipped in Fedora Rawhide and someone can confirm that the bug is fixed. Or at least, test it in the PCP upstream CI.

Comment 8 Nathan Scott 2023-09-04 22:12:10 UTC
Thanks Victor!  As soon as it's in rawhide we should see it in PCP daily CI the next day - I'll report back then.

Comment 9 Nathan Scott 2023-09-06 22:41:24 UTC
*** Bug 2237699 has been marked as a duplicate of this bug. ***

Comment 11 Nathan Scott 2023-09-11 00:57:37 UTC
I can confirm the issue is resolved now in rawhide - thanks!

Comment 12 Fedora Update System 2023-09-11 09:50:07 UTC
FEDORA-2023-623962bb38 has been submitted as an update to Fedora 39. https://bodhi.fedoraproject.org/updates/FEDORA-2023-623962bb38

Comment 13 Fedora Update System 2023-09-21 01:11:27 UTC
FEDORA-2023-9d033517d4 has been pushed to the Fedora 39 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-9d033517d4`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-9d033517d4

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 14 Fedora Update System 2023-09-26 00:18:26 UTC
FEDORA-2023-9d033517d4 has been pushed to the Fedora 39 stable repository.
If the problem still persists, please make note of it in this bug report.