.Programs using `papi` no longer abort when shutting down
Previously, `papi` initialized its thread structures before it initialized some components. As a result, the entries describing the number of array elements for those components had not yet been set to their correct values, and `papi` attempted zero-sized memory allocations. Later accesses to, and frees of, these zero-sized allocations caused programs to abort.
With this update, the bug has been fixed and programs using `papi` no longer abort when shutting down.
Description of problem:
During initialization, the thread structures and some components are initialized in the wrong order, so uninitialized data is used. This can cause illegal memory accesses, which in turn cause aborts. The abort can be observed on aarch64 machines; however, running the test under valgrind shows that x86_64 also has the same problematic accesses.
Version-Release number of selected component (if applicable):
papi-6.0.0-14.el9.src.rpm
How reproducible:
Every time.
Steps to Reproduce:
1. dnf install papi-testsuite
2. cd /usr/share/papi
3. ./components/stealtime/tests/stealtime_basic
Actual results:
# ./components/stealtime/tests/stealtime_basic
Trying all stealtime events
Found stealtime component 13 - stealtime
stealtime:::TOTAL value: 0
stealtime:::CPU1 value: 0
stealtime:::CPU2 value: 0
stealtime:::CPU3 value: 0
stealtime:::CPU4 value: 0
Note: for this test the values are expected to all be 0
unless run inside a VM on a busy system.
PASSED
free(): invalid next size (fast)
Aborted (core dumped)
Expected results:
No "invalid next size" or "Aborted (core dumped)" after "PASSED":
./components/stealtime/tests/stealtime_basic
Trying all stealtime events
Found stealtime component 13 - stealtime
stealtime:::TOTAL value: 0
stealtime:::CPU1 value: 0
stealtime:::CPU2 value: 0
stealtime:::CPU3 value: 0
stealtime:::CPU4 value: 0
Note: for this test the values are expected to all be 0
unless run inside a VM on a busy system.
PASSED
Additional info:
This can also be observed with the ./ctests/all_native_events test.
The upstream papi git commit 3625bdbad9fd57d1cdb1e5615854545167d4adcb, shown below, addresses the problem:
Author: Anthony Castaldo <TonyCastaldo.edu> 2020-08-26 17:18:29
Committer: Anthony Castaldo <TonyCastaldo.edu> 2020-08-26 17:18:29
Parent: 82fdd098d2c1c6aad20b139dcb7a3a6a508b5580 (Merged in master (pull request #126))
Child: 9266f6ebde64883f886793d7a8ce1d475d3589ea (Merged in master (pull request #131))
Branches: master, remotes/origin/master
This modifies PAPI_library_init() to initialize components in two classes,
separated by the initialization of the papi thread structure. The first class
is those that need no thread structure, currently everything but perf_event and
perf_event_uncore. Following the init of the threading structure, we init the
second class (perf_event and perf_event_uncore) that DOES need the thread
structure to successfully init_component(). This required a change to
_papi_hwi_init_global(), to add an argument to distinguish which class it
should initialize.
(In reply to William Cohen from comment #1)
> Built papi-6.0.0-15.el9 with the upstream patch to address this issue.
Hi Will,
Thank you very much.
I would like to test papi-6.0.0-15.el9 on FX700.
Could you please tell me the URL to download it?
Best,
Fuli
Hi,
I have tested papi-6.0.0-15.el9 on FX700.
Both ctests/all_native_events and components/stealtime/tests/stealtime_basic were "PASSED" without "invalid next size" or "Aborted (core dumped)".
Best,
Fuli