Bug 2007883

Summary: [abrt] papi-testsuite: _papi_hwi_cleanup_eventset(): all_native_events killed by SIGABRT
Product: [Fedora] Fedora
Reporter: Török Edwin <edwin+bugs>
Component: papi
Assignee: William Cohen <wcohen>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 34
CC: lberk, wcohen
Hardware: x86_64   
OS: Unspecified   
URL: https://retrace.fedoraproject.org/faf/reports/bthash/096a835d7d0f103d40af8ab6c20f7772be8123df
Whiteboard: abrt_hash:ef320002a7e0c93e3a9843e3605eff05f128bcf7;VARIANT_ID=workstation;
Fixed In Version: papi-6.0.0-10.fc34
Last Closed: 2021-11-28 01:09:32 UTC
Attachments:
File: backtrace
File: core_backtrace
File: cpuinfo
File: dso_list
File: environ
File: limits
File: maps
File: mountinfo
File: open_fds
File: proc_pid_status

Description Török Edwin 2021-09-25 22:22:24 UTC
Version-Release number of selected component:
papi-testsuite-6.0.0-7.fc34

Additional info:
reporter:       libreport-2.15.2
backtrace_rating: 4
cgroup:         0::/user.slice/user-1000.slice/user/app.slice/app-org.gnome.Terminal.slice/vte-spawn-e72706b1-43bc-49a4-94ab-465a9eb697ef.scope
cmdline:        ./ctests/all_native_events TESTS_QUIET
crash_function: _papi_hwi_cleanup_eventset
executable:     /usr/share/papi/ctests/all_native_events
journald_cursor: s=68e38ebf976a4c7b9d23c531e3fbc082;i=11845;b=550b8228d45a43918585e9682c133605;m=ca45a962;t=5c9d9da9e2e5d;x=33c4cfa5f349032
kernel:         5.13.9-200.fc34.x86_64
rootdir:        /
runlevel:       N 5
type:           CCpp
uid:            0

Truncated backtrace:
Thread no. 1 (4 frames)
 #6 _papi_hwi_cleanup_eventset at papi_internal.c:1825
 #7 PAPI_cleanup_eventset at papi.c:3491
 #8 PAPI_shutdown at papi.c:5035
 #9 test_pass at test_utils.c:459

Comment 1 Török Edwin 2021-09-25 22:22:28 UTC
Created attachment 1826249
File: backtrace

Comment 2 Török Edwin 2021-09-25 22:22:29 UTC
Created attachment 1826250
File: core_backtrace

Comment 3 Török Edwin 2021-09-25 22:22:30 UTC
Created attachment 1826251
File: cpuinfo

Comment 4 Török Edwin 2021-09-25 22:22:31 UTC
Created attachment 1826252
File: dso_list

Comment 5 Török Edwin 2021-09-25 22:22:32 UTC
Created attachment 1826253
File: environ

Comment 6 Török Edwin 2021-09-25 22:22:33 UTC
Created attachment 1826254
File: limits

Comment 7 Török Edwin 2021-09-25 22:22:34 UTC
Created attachment 1826255
File: maps

Comment 8 Török Edwin 2021-09-25 22:22:35 UTC
Created attachment 1826256
File: mountinfo

Comment 9 Török Edwin 2021-09-25 22:22:36 UTC
Created attachment 1826257
File: open_fds

Comment 10 Török Edwin 2021-09-25 22:22:37 UTC
Created attachment 1826258
File: proc_pid_status

Comment 11 William Cohen 2021-09-29 21:54:28 UTC
I suspect that something specific to this processor implementation is triggering the problem. I don't have access to an AMD Ryzen 9 3900X machine. Could you run the test under valgrind and report the results?

valgrind ./ctests/all_native_events TESTS_QUIET

Also provide the results of:

papi_avail -a

Comment 12 Török Edwin 2021-10-17 22:35:17 UTC
papi_avail -a output:

Available PAPI preset and user defined events plus hardware information.
--------------------------------------------------------------------------------
PAPI version             : 6.0.0.0
Operating system         : Linux 5.14.11-200.fc34.x86_64
Vendor string and code   : AuthenticAMD (2, 0x2)
Model string and code    : AMD Ryzen 9 3900X 12-Core Processor (113, 0x71)
CPU revision             : 0.000000
CPUID                    : Family/Model/Stepping 23/113/0, 0x17/0x71/0x00
CPU Max MHz              : 4672
CPU Min MHz              : 2200
Total cores              : 24
SMT threads per core     : 2
Cores per socket         : 12
Sockets                  : 1
Cores per NUMA region    : 24
NUMA regions             : 1
Running in a VM          : no
Number Hardware Counters : 5
Max Multiplex Counters   : 384
Fast counter read (rdpmc): yes
--------------------------------------------------------------------------------

================================================================================
  PAPI Preset Events
================================================================================
    Name        Code    Deriv Description (Note)

PAPI_L1_ICM  0x80000001  No   Level 1 instruction cache misses
PAPI_TLB_DM  0x80000014  No   Data translation lookaside buffer misses
PAPI_TLB_IM  0x80000015  Yes  Instruction translation lookaside buffer misses
PAPI_STL_ICY 0x80000025  No   Cycles with no instruction issue
PAPI_BR_TKN  0x8000002c  No   Conditional branch instructions taken
PAPI_BR_MSP  0x8000002e  No   Conditional branch instructions mispredicted
PAPI_TOT_INS 0x80000032  No   Instructions completed
PAPI_FP_INS  0x80000034  No   Floating point instructions
PAPI_BR_INS  0x80000037  No   Branch instructions
PAPI_VEC_INS 0x80000038  No   Vector/SIMD instructions (could include integer)
PAPI_TOT_CYC 0x8000003b  No   Total cycles
PAPI_L1_DCA  0x80000040  No   Level 1 data cache accesses
PAPI_L1_ICH  0x80000049  Yes  Level 1 instruction cache hits
PAPI_L1_ICA  0x8000004c  No   Level 1 instruction cache accesses
PAPI_L2_ICA  0x8000004d  No   Level 2 instruction cache accesses
PAPI_L1_ICR  0x8000004f  No   Level 1 instruction cache reads
PAPI_L1_TCA  0x80000058  Yes  Level 1 total cache accesses
PAPI_FML_INS 0x80000061  No   Floating point multiply instructions
PAPI_FAD_INS 0x80000062  No   Floating point add instructions
PAPI_FDV_INS 0x80000063  No   Floating point divide instructions (Counts both divide and square root instructions)
PAPI_FSQ_INS 0x80000064  No   Floating point square root instructions (Counts both divide and square root instructions)
PAPI_FP_OPS  0x80000066  No   Floating point operations
PAPI_SP_OPS  0x80000067  No   Floating point operations; optimized to count scaled single precision vector operations
PAPI_DP_OPS  0x80000068  No   Floating point operations; optimized to count scaled double precision vector operations
--------------------------------------------------------------------------------
Of 24 available events, 3 are derived.


And here is the valgrind output; unfortunately, valgrind doesn't seem to know how to emulate rdpmc:
valgrind ./ctests/all_native_events
==64949== Memcheck, a memory error detector
==64949== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==64949== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==64949== Command: ./ctests/all_native_events
==64949== 
PAPI Error: Couldn't open hw_instructions in exclude_guest=0 test
Test case ALL_NATIVE_EVENTS: Available native events and hardware information.
vex amd64->IR: unhandled instruction bytes: 0xF 0x33 0x8B 0x4E 0x8 0x44 0x39 0xD1 0xF 0x84
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=0F
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==64949== valgrind: Unrecognised instruction at address 0x48993c7.
==64949==    at 0x48993C7: rdpmc (perf_helpers.h:48)
==64949==    by 0x48993C7: mmap_read_self (perf_helpers.h:130)
==64949==    by 0x48993C7: _pe_rdpmc_read (perf_event.c:1115)
==64949==    by 0x48993C7: _pe_read (perf_event.c:1281)
==64949==    by 0x48862D8: _papi_hwi_read (papi_internal.c:1710)
==64949==    by 0x4880BB8: PAPI_stop (papi.c:2888)
==64949==    by 0x10A7DC: ??? (in /usr/share/papi/ctests/all_native_events)
==64949==    by 0x10A9AF: ??? (in /usr/share/papi/ctests/all_native_events)
==64949==    by 0x491CB74: (below main) (libc-start.c:332)
==64949== Your program just tried to execute an instruction that Valgrind
==64949== did not recognise.  There are two possible reasons for this.
==64949== 1. Your program has a bug and erroneously jumped to a non-code
==64949==    location.  If you are running Memcheck and you just saw a
==64949==    warning about a bad jump, it's probably your program's fault.
==64949== 2. The instruction is legitimate but Valgrind doesn't handle it,
==64949==    i.e. it's Valgrind's fault.  If you think this is the case or
==64949==    you are not sure, please let us know and we'll try to fix it.
==64949== Either way, Valgrind will now raise a SIGILL signal which will
==64949== probably kill your program.
==64949== 
==64949== Process terminating with default action of signal 4 (SIGILL): dumping core
==64949==  Illegal opcode at address 0x48993C7
==64949==    at 0x48993C7: rdpmc (perf_helpers.h:48)
==64949==    by 0x48993C7: mmap_read_self (perf_helpers.h:130)
==64949==    by 0x48993C7: _pe_rdpmc_read (perf_event.c:1115)
==64949==    by 0x48993C7: _pe_read (perf_event.c:1281)
==64949==    by 0x48862D8: _papi_hwi_read (papi_internal.c:1710)
==64949==    by 0x4880BB8: PAPI_stop (papi.c:2888)
==64949==    by 0x10A7DC: ??? (in /usr/share/papi/ctests/all_native_events)
==64949==    by 0x10A9AF: ??? (in /usr/share/papi/ctests/all_native_events)
==64949==    by 0x491CB74: (below main) (libc-start.c:332)
==64949== 
==64949== HEAP SUMMARY:
==64949==     in use at exit: 465,593 bytes in 592 blocks
==64949==   total heap usage: 3,404 allocs, 2,812 frees, 3,474,443 bytes allocated
==64949== 
==64949== LEAK SUMMARY:
==64949==    definitely lost: 3,789 bytes in 42 blocks
==64949==    indirectly lost: 0 bytes in 0 blocks
==64949==      possibly lost: 0 bytes in 0 blocks
==64949==    still reachable: 461,804 bytes in 550 blocks
==64949==         suppressed: 0 bytes in 0 blocks
==64949== Rerun with --leak-check=full to see details of leaked memory
==64949== 
==64949== For lists of detected and suppressed errors, rerun with: -s
==64949== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
[1]    64949 illegal hardware instruction (core dumped)  valgrind ./ctests/all_native_events


However, I suspect this is related to the stealtime bug:
https://bugzilla.redhat.com/show_bug.cgi?id=2007882 and https://bugzilla.redhat.com/show_bug.cgi?id=2007877
The underlying problem appears to be the initialization order: code tries to access an array that was allocated with size 0.
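The suspected failure mode can be illustrated with a small, self-contained C sketch. It is not PAPI code: `thread_slots`, `thread_setup()`, `component_init()` and `component_slot()` are hypothetical names, and the only assumption carried over from the comment is that a component sizes an array from a value that is still 0 when the component is initialized too early.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* HYPOTHETICAL names throughout; this is not PAPI's implementation. */

/* Number of per-thread slots; stays 0 until thread_setup() runs. */
static size_t thread_slots = 0;

/* A component whose per-thread array is sized at init time. */
struct component {
    size_t nslots;
    int *slots;
};

/* Stand-in for initializing the thread structure. */
static void thread_setup(size_t n)
{
    assert(n > 0);
    thread_slots = n;
}

/* Sizes the array from whatever thread_slots holds right now.
   Called before thread_setup(), it records nslots == 0. */
static int component_init(struct component *c)
{
    c->nslots = thread_slots;
    c->slots = NULL;
    if (c->nslots == 0)
        return 0;              /* nothing allocated: every index is out of range */
    c->slots = calloc(c->nslots, sizeof *c->slots);
    return c->slots != NULL ? 0 : -1;
}

/* Bounds-checked access. An unchecked c->slots[i] with i >= nslots
   (every i when nslots == 0) would be an out-of-bounds access. */
static int *component_slot(struct component *c, size_t i)
{
    return i < c->nslots ? &c->slots[i] : NULL;
}

static void component_free(struct component *c)
{
    free(c->slots);            /* free(NULL) is a no-op */
    c->slots = NULL;
    c->nslots = 0;
}
```

In this model, calling `component_init()` before `thread_setup()` leaves `nslots == 0`, so the bounds check returns NULL where an unchecked index would have read or written past the (empty) array.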

Comment 13 William Cohen 2021-11-19 15:43:49 UTC
Backported the following patch, which addresses the problem:

commit 3625bdbad9fd57d1cdb1e5615854545167d4adcb
Author: Anthony Castaldo <TonyCastaldo.edu>
Date:   Wed Aug 26 17:18:29 2020 -0400

    This modifies PAPI_library_init() to initialize components in two classes,
    separated by the initialization of the papi thread structure.  The first class
    is those that need no thread structure, currently everything but perf_event and
    perf_event_uncore. Following the init of the threading structure, we init the
    second class (perf_event and perf_event_uncore) that DOES need the thread
    structure to successfully init_component().  This required a change to
    _papi_hwi_init_global(), to add an argument to distinguish which class it
    should initialize.
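The two-class ordering described in the commit message can be modeled with a short C sketch. This is a hypothetical illustration, not the patch itself: `struct component`, `needs_thread`, `thread_ready`, `init_class()` and `library_init()` are made-up names, and the `needs_thread` argument of `init_class()` only mirrors the idea of the argument the commit adds to `_papi_hwi_init_global()`.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL model of the two-class initialization; not PAPI code. */

/* Set once the per-thread structure exists. */
static int thread_ready = 0;

struct component {
    const char *name;
    int needs_thread;   /* 1 for the perf_event / perf_event_uncore class */
    int initialized;
};

/* A thread-dependent component refuses to initialize until the
   thread structure is ready; others initialize unconditionally. */
static int component_init(struct component *c)
{
    assert(c != NULL);
    if (c->needs_thread && !thread_ready)
        return -1;
    c->initialized = 1;
    return 0;
}

/* Initialize only the components belonging to one class. */
static int init_class(struct component *comps, size_t n, int needs_thread)
{
    int failures = 0;
    for (size_t i = 0; i < n; i++)
        if (comps[i].needs_thread == needs_thread && component_init(&comps[i]) != 0)
            failures++;
    return failures;
}

/* Stand-in for creating the thread structure. */
static void thread_setup(void)
{
    thread_ready = 1;
}

/* First class, then the thread structure, then the second class. */
static int library_init(struct component *comps, size_t n)
{
    int failures = init_class(comps, n, 0);  /* no thread structure needed */
    thread_setup();
    failures += init_class(comps, n, 1);     /* needs the thread structure */
    return failures;
}
```

Here a thread-dependent component fails if it is initialized before `thread_setup()`, and `library_init()` succeeds only because it defers that class until after the thread structure exists.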

Comment 14 Fedora Update System 2021-11-20 02:05:40 UTC
FEDORA-2021-752e807fdd has been pushed to the Fedora 34 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-752e807fdd`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-752e807fdd

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 15 Fedora Update System 2021-11-28 01:09:32 UTC
FEDORA-2021-752e807fdd has been pushed to the Fedora 34 stable repository.
If the problem still persists, please make note of it in this bug report.