Bug 2215582 - papi initialization threads run in the wrong order
Keywords:
Status: VERIFIED
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: papi
Version: 9.3
Hardware: aarch64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Assignee: William Cohen
QA Contact: Lenka Špačková
Docs Contact: Jacob Taylor Valdez
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-06-16 15:33 UTC by William Cohen
Modified: 2023-08-02 12:24 UTC (History)

Fixed In Version: papi-6.0.0-15.el9
Doc Type: Bug Fix
Doc Text:
.Programs using `papi` no longer abort when shutting down

Previously, `papi` initialized threads before `papi` initialized some components. Because of this, entries for some components describing the number of elements in arrays were not set to correct values and zero-sized memory allocations were attempted. As a consequence, later accesses and frees of those zero-sized memory allocations would cause the programs to abort. With this update, the bug has been fixed and programs using `papi` no longer abort when shutting down.
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-160101 0 None None None 2023-06-16 15:34:44 UTC

Description William Cohen 2023-06-16 15:33:50 UTC
Description of problem:

During initialization, some threads run in the wrong order and use uninitialized data.  This can cause illegal memory accesses that lead to aborts.  The aborts can be observed on aarch64 machines.  However, running the test under valgrind shows that x86_64 also has these problematic accesses.


Version-Release number of selected component (if applicable):

papi-6.0.0-14.el9.src.rpm

How reproducible:

Every time.


Steps to Reproduce:
1. dnf install papi-testsuite
2. cd /usr/share/papi
3. ./components/stealtime/tests/stealtime_basic

Actual results:

#  ./components/stealtime/tests/stealtime_basic  
Trying all stealtime events
	Found stealtime component 13 - stealtime
  stealtime:::TOTAL  value: 0
  stealtime:::CPU1  value: 0
  stealtime:::CPU2  value: 0
  stealtime:::CPU3  value: 0
  stealtime:::CPU4  value: 0
Note: for this test the values are expected to all be 0
	 unless run inside a VM on a busy system.
PASSED
free(): invalid next size (fast)
Aborted (core dumped)


Expected results:

No "invalid next size" or "Aborted (core dumped)" after the "PASSED" line:


#  ./components/stealtime/tests/stealtime_basic  
Trying all stealtime events
	Found stealtime component 13 - stealtime
  stealtime:::TOTAL  value: 0
  stealtime:::CPU1  value: 0
  stealtime:::CPU2  value: 0
  stealtime:::CPU3  value: 0
  stealtime:::CPU4  value: 0
Note: for this test the values are expected to all be 0
	 unless run inside a VM on a busy system.
PASSED


Additional info:

This can also be observed with the ./ctests/all_native_events test.

The upstream papi git commit 3625bdbad9fd57d1cdb1e5615854545167d4adcb, shown below, addresses the problem:


Author: Anthony Castaldo <TonyCastaldo.edu>  2020-08-26 17:18:29
Committer: Anthony Castaldo <TonyCastaldo.edu>  2020-08-26 17:18:29
Parent: 82fdd098d2c1c6aad20b139dcb7a3a6a508b5580 (Merged in master (pull request #126))
Child:  9266f6ebde64883f886793d7a8ce1d475d3589ea (Merged in master (pull request #131))
Branches: master, remotes/origin/master
Follows: 
Precedes: 

    This modifies PAPI_library_init() to initialize components in two classes,
    separated by the initialization of the papi thread structure.  The first class
    is those that need no thread structure, currently everything but perf_event and
    perf_event_uncore. Following the init of the threading structure, we init the
    second class (perf_event and perf_event_uncore) that DOES need the thread
    structure to successfully init_component().  This required a change to
    _papi_hwi_init_global(), to add an argument to distinguish which class it
    should initialize.

Comment 1 William Cohen 2023-06-20 16:13:06 UTC
Built papi-6.0.0-15.el9 with the upstream patch to address this issue.

Comment 2 QI Fuli 2023-06-20 17:50:58 UTC
(In reply to William Cohen from comment #1)
> Built papi-6.0.0-15.el9 with the upstream patch to address this issue.

Hi Will,

Thank you very much.
I would like to test papi-6.0.0-15.el9 on FX700.
Could you please tell me the URL to download it?

Best,
Fuli

Comment 3 William Cohen 2023-06-20 23:53:17 UTC
Answered the question on internal chat.

Comment 5 QI Fuli 2023-06-21 18:38:53 UTC
Hi,

I have tested papi-6.0.0-15.el9 on FX700.
Both ctests/all_native_events and components/stealtime/tests/stealtime_basic were "PASSED" without "invalid next size" or "Aborted (core dumped)".

Best,
Fuli

Comment 10 William Cohen 2023-08-02 12:24:26 UTC
The doc text looks fine to me.

