Bug 2007883 - [abrt] papi-testsuite: _papi_hwi_cleanup_eventset(): all_native_events killed by SIGABRT
Summary: [abrt] papi-testsuite: _papi_hwi_cleanup_eventset(): all_native_events killed...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: papi
Version: 34
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: William Cohen
QA Contact: Fedora Extras Quality Assurance
URL: https://retrace.fedoraproject.org/faf...
Whiteboard: abrt_hash:ef320002a7e0c93e3a9843e3605...
Depends On:
Blocks:
Reported: 2021-09-25 22:22 UTC by Török Edwin
Modified: 2021-11-28 01:09 UTC (History)

Fixed In Version: papi-6.0.0-10.fc34
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-11-28 01:09:32 UTC
Type: ---
Embargoed:


Attachments
File: backtrace (16.69 KB, text/plain)
2021-09-25 22:22 UTC, Török Edwin
File: core_backtrace (3.28 KB, text/plain)
2021-09-25 22:22 UTC, Török Edwin
File: cpuinfo (2.57 KB, text/plain)
2021-09-25 22:22 UTC, Török Edwin
File: dso_list (509 bytes, text/plain)
2021-09-25 22:22 UTC, Török Edwin
File: environ (2.09 KB, text/plain)
2021-09-25 22:22 UTC, Török Edwin
File: limits (1.29 KB, text/plain)
2021-09-25 22:22 UTC, Török Edwin
File: maps (3.80 KB, text/plain)
2021-09-25 22:22 UTC, Török Edwin
File: mountinfo (3.05 KB, text/plain)
2021-09-25 22:22 UTC, Török Edwin
File: open_fds (194 bytes, text/plain)
2021-09-25 22:22 UTC, Török Edwin
File: proc_pid_status (1.38 KB, text/plain)
2021-09-25 22:22 UTC, Török Edwin

Description Török Edwin 2021-09-25 22:22:24 UTC
Version-Release number of selected component:
papi-testsuite-6.0.0-7.fc34

Additional info:
reporter:       libreport-2.15.2
backtrace_rating: 4
cgroup:         0::/user.slice/user-1000.slice/user/app.slice/app-org.gnome.Terminal.slice/vte-spawn-e72706b1-43bc-49a4-94ab-465a9eb697ef.scope
cmdline:        ./ctests/all_native_events TESTS_QUIET
crash_function: _papi_hwi_cleanup_eventset
executable:     /usr/share/papi/ctests/all_native_events
journald_cursor: s=68e38ebf976a4c7b9d23c531e3fbc082;i=11845;b=550b8228d45a43918585e9682c133605;m=ca45a962;t=5c9d9da9e2e5d;x=33c4cfa5f349032
kernel:         5.13.9-200.fc34.x86_64
rootdir:        /
runlevel:       N 5
type:           CCpp
uid:            0

Truncated backtrace:
Thread no. 1 (4 frames)
 #6 _papi_hwi_cleanup_eventset at papi_internal.c:1825
 #7 PAPI_cleanup_eventset at papi.c:3491
 #8 PAPI_shutdown at papi.c:5035
 #9 test_pass at test_utils.c:459

Comment 1 Török Edwin 2021-09-25 22:22:28 UTC
Created attachment 1826249 [details]
File: backtrace

Comment 2 Török Edwin 2021-09-25 22:22:29 UTC
Created attachment 1826250 [details]
File: core_backtrace

Comment 3 Török Edwin 2021-09-25 22:22:30 UTC
Created attachment 1826251 [details]
File: cpuinfo

Comment 4 Török Edwin 2021-09-25 22:22:31 UTC
Created attachment 1826252 [details]
File: dso_list

Comment 5 Török Edwin 2021-09-25 22:22:32 UTC
Created attachment 1826253 [details]
File: environ

Comment 6 Török Edwin 2021-09-25 22:22:33 UTC
Created attachment 1826254 [details]
File: limits

Comment 7 Török Edwin 2021-09-25 22:22:34 UTC
Created attachment 1826255 [details]
File: maps

Comment 8 Török Edwin 2021-09-25 22:22:35 UTC
Created attachment 1826256 [details]
File: mountinfo

Comment 9 Török Edwin 2021-09-25 22:22:36 UTC
Created attachment 1826257 [details]
File: open_fds

Comment 10 Török Edwin 2021-09-25 22:22:37 UTC
Created attachment 1826258 [details]
File: proc_pid_status

Comment 11 William Cohen 2021-09-29 21:54:28 UTC
I suspect something specific to this processor implementation is triggering the problem.  I don't have access to an AMD Ryzen 9 3900X machine.  Could you run the test under valgrind and report the results:

valgrind ./ctests/all_native_events TESTS_QUIET

Also provide the results of:

papi_avail -a

Comment 12 Török Edwin 2021-10-17 22:35:17 UTC
papi_avail -a output:

Available PAPI preset and user defined events plus hardware information.
--------------------------------------------------------------------------------
PAPI version             : 6.0.0.0
Operating system         : Linux 5.14.11-200.fc34.x86_64
Vendor string and code   : AuthenticAMD (2, 0x2)
Model string and code    : AMD Ryzen 9 3900X 12-Core Processor (113, 0x71)
CPU revision             : 0.000000
CPUID                    : Family/Model/Stepping 23/113/0, 0x17/0x71/0x00
CPU Max MHz              : 4672
CPU Min MHz              : 2200
Total cores              : 24
SMT threads per core     : 2
Cores per socket         : 12
Sockets                  : 1
Cores per NUMA region    : 24
NUMA regions             : 1
Running in a VM          : no
Number Hardware Counters : 5
Max Multiplex Counters   : 384
Fast counter read (rdpmc): yes
--------------------------------------------------------------------------------

================================================================================
  PAPI Preset Events
================================================================================
    Name        Code    Deriv Description (Note)

PAPI_L1_ICM  0x80000001  No   Level 1 instruction cache misses
PAPI_TLB_DM  0x80000014  No   Data translation lookaside buffer misses
PAPI_TLB_IM  0x80000015  Yes  Instruction translation lookaside buffer misses
PAPI_STL_ICY 0x80000025  No   Cycles with no instruction issue
PAPI_BR_TKN  0x8000002c  No   Conditional branch instructions taken
PAPI_BR_MSP  0x8000002e  No   Conditional branch instructions mispredicted
PAPI_TOT_INS 0x80000032  No   Instructions completed
PAPI_FP_INS  0x80000034  No   Floating point instructions
PAPI_BR_INS  0x80000037  No   Branch instructions
PAPI_VEC_INS 0x80000038  No   Vector/SIMD instructions (could include integer)
PAPI_TOT_CYC 0x8000003b  No   Total cycles
PAPI_L1_DCA  0x80000040  No   Level 1 data cache accesses
PAPI_L1_ICH  0x80000049  Yes  Level 1 instruction cache hits
PAPI_L1_ICA  0x8000004c  No   Level 1 instruction cache accesses
PAPI_L2_ICA  0x8000004d  No   Level 2 instruction cache accesses
PAPI_L1_ICR  0x8000004f  No   Level 1 instruction cache reads
PAPI_L1_TCA  0x80000058  Yes  Level 1 total cache accesses
PAPI_FML_INS 0x80000061  No   Floating point multiply instructions
PAPI_FAD_INS 0x80000062  No   Floating point add instructions
PAPI_FDV_INS 0x80000063  No   Floating point divide instructions (Counts both divide and square root instructions)
PAPI_FSQ_INS 0x80000064  No   Floating point square root instructions (Counts both divide and square root instructions)
PAPI_FP_OPS  0x80000066  No   Floating point operations
PAPI_SP_OPS  0x80000067  No   Floating point operations; optimized to count scaled single precision vector operations
PAPI_DP_OPS  0x80000068  No   Floating point operations; optimized to count scaled double precision vector operations
--------------------------------------------------------------------------------
Of 24 available events, 3 are derived.


And here is the valgrind output. Unfortunately, it looks like valgrind doesn't know how to emulate the rdpmc instruction:
valgrind ./ctests/all_native_events
==64949== Memcheck, a memory error detector
==64949== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==64949== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==64949== Command: ./ctests/all_native_events
==64949== 
PAPI Error: Couldn't open hw_instructions in exclude_guest=0 test
Test case ALL_NATIVE_EVENTS: Available native events and hardware information.
vex amd64->IR: unhandled instruction bytes: 0xF 0x33 0x8B 0x4E 0x8 0x44 0x39 0xD1 0xF 0x84
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=0F
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==64949== valgrind: Unrecognised instruction at address 0x48993c7.
==64949==    at 0x48993C7: rdpmc (perf_helpers.h:48)
==64949==    by 0x48993C7: mmap_read_self (perf_helpers.h:130)
==64949==    by 0x48993C7: _pe_rdpmc_read (perf_event.c:1115)
==64949==    by 0x48993C7: _pe_read (perf_event.c:1281)
==64949==    by 0x48862D8: _papi_hwi_read (papi_internal.c:1710)
==64949==    by 0x4880BB8: PAPI_stop (papi.c:2888)
==64949==    by 0x10A7DC: ??? (in /usr/share/papi/ctests/all_native_events)
==64949==    by 0x10A9AF: ??? (in /usr/share/papi/ctests/all_native_events)
==64949==    by 0x491CB74: (below main) (libc-start.c:332)
==64949== Your program just tried to execute an instruction that Valgrind
==64949== did not recognise.  There are two possible reasons for this.
==64949== 1. Your program has a bug and erroneously jumped to a non-code
==64949==    location.  If you are running Memcheck and you just saw a
==64949==    warning about a bad jump, it's probably your program's fault.
==64949== 2. The instruction is legitimate but Valgrind doesn't handle it,
==64949==    i.e. it's Valgrind's fault.  If you think this is the case or
==64949==    you are not sure, please let us know and we'll try to fix it.
==64949== Either way, Valgrind will now raise a SIGILL signal which will
==64949== probably kill your program.
==64949== 
==64949== Process terminating with default action of signal 4 (SIGILL): dumping core
==64949==  Illegal opcode at address 0x48993C7
==64949==    at 0x48993C7: rdpmc (perf_helpers.h:48)
==64949==    by 0x48993C7: mmap_read_self (perf_helpers.h:130)
==64949==    by 0x48993C7: _pe_rdpmc_read (perf_event.c:1115)
==64949==    by 0x48993C7: _pe_read (perf_event.c:1281)
==64949==    by 0x48862D8: _papi_hwi_read (papi_internal.c:1710)
==64949==    by 0x4880BB8: PAPI_stop (papi.c:2888)
==64949==    by 0x10A7DC: ??? (in /usr/share/papi/ctests/all_native_events)
==64949==    by 0x10A9AF: ??? (in /usr/share/papi/ctests/all_native_events)
==64949==    by 0x491CB74: (below main) (libc-start.c:332)
==64949== 
==64949== HEAP SUMMARY:
==64949==     in use at exit: 465,593 bytes in 592 blocks
==64949==   total heap usage: 3,404 allocs, 2,812 frees, 3,474,443 bytes allocated
==64949== 
==64949== LEAK SUMMARY:
==64949==    definitely lost: 3,789 bytes in 42 blocks
==64949==    indirectly lost: 0 bytes in 0 blocks
==64949==      possibly lost: 0 bytes in 0 blocks
==64949==    still reachable: 461,804 bytes in 550 blocks
==64949==         suppressed: 0 bytes in 0 blocks
==64949== Rerun with --leak-check=full to see details of leaked memory
==64949== 
==64949== For lists of detected and suppressed errors, rerun with: -s
==64949== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
[1]    64949 illegal hardware instruction (core dumped)  valgrind ./ctests/all_native_events


However, I suspect this is related to the stealtime bug:
https://bugzilla.redhat.com/show_bug.cgi?id=2007882 and https://bugzilla.redhat.com/show_bug.cgi?id=2007877
That bug appears to be an initialization-order problem: the code tries to access an array that was allocated with size 0.
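
To make the suspected failure mode concrete, here is a minimal sketch in C of a table that ends up with zero slots because it was sized too early, plus a bounds check that turns the bad access into an error return. The type and function names (event_table, table_init, table_set, table_free) are hypothetical and are not taken from the PAPI sources; this only illustrates the class of bug, not the actual code path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a component's per-event state; not a PAPI type. */
struct event_table {
    long long *values;  /* one slot per event */
    size_t     count;   /* number of slots actually allocated */
};

/* Allocate `count` slots. A count of 0 models a table that was sized
 * before the data it depends on had been initialized. */
static int table_init(struct event_table *t, size_t count)
{
    assert(t != NULL);
    t->values = count ? calloc(count, sizeof *t->values) : NULL;
    t->count  = (t->values != NULL) ? count : 0;
    /* Only a failed non-zero allocation is reported as an error here. */
    return (count == 0 || t->values != NULL) ? 0 : -1;
}

/* Bounds-checked store: returns -1 instead of writing past the array. */
static int table_set(struct event_table *t, size_t idx, long long v)
{
    assert(t != NULL);
    if (idx >= t->count)
        return -1;
    t->values[idx] = v;
    return 0;
}

/* Release the slots and reset the table to the empty state. */
static void table_free(struct event_table *t)
{
    assert(t != NULL);
    free(t->values);
    t->values = NULL;
    t->count  = 0;
}
```

With a zero-sized table, table_set(&t, 0, 1) returns -1 rather than writing out of bounds. Without such a check, the same store would corrupt the heap and could surface later as an abort during cleanup.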

Comment 13 William Cohen 2021-11-19 15:43:49 UTC
Backported the following patch, which addresses the problem:

commit 3625bdbad9fd57d1cdb1e5615854545167d4adcb
Author: Anthony Castaldo <TonyCastaldo.edu>
Date:   Wed Aug 26 17:18:29 2020 -0400

    This modifies PAPI_library_init() to initialize components in two classes,
    separated by the initialization of the papi thread structure.  The first class
    is those that need no thread structure, currently everything but perf_event and
    perf_event_uncore. Following the init of the threading structure, we init the
    second class (perf_event and perf_event_uncore) that DOES need the thread
    structure to successfully init_component().  This required a change to
    _papi_hwi_init_global(), to add an argument to distinguish which class it
    should initialize.
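
The two-class ordering described in the commit message can be sketched roughly as follows. This is an illustration only: the names (init_phase, component, init_components, library_init, thread_ready) are hypothetical and do not reproduce the real PAPI_library_init() or _papi_hwi_init_global() signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Which initialization class a component belongs to (hypothetical). */
enum init_phase { PHASE_NO_THREAD = 0, PHASE_NEEDS_THREAD = 1 };

struct component {
    const char     *name;
    enum init_phase phase;
    int             initialized;
};

/* Models whether the thread structure has been set up yet. */
static int thread_ready = 0;

/* A component that needs the thread structure fails if it is not ready. */
static int init_component(struct component *c)
{
    if (c->phase == PHASE_NEEDS_THREAD && !thread_ready)
        return -1;
    c->initialized = 1;
    return 0;
}

/* Initialize only the components of the requested class.
 * Returns the number initialized, or -1 on the first failure. */
static int init_components(struct component *cs, size_t n, enum init_phase phase)
{
    int done = 0;
    assert(cs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (cs[i].phase != phase)
            continue;
        if (init_component(&cs[i]) != 0)
            return -1;
        done++;
    }
    return done;
}

/* First class, then the thread structure, then the second class. */
static int library_init(struct component *cs, size_t n)
{
    if (init_components(cs, n, PHASE_NO_THREAD) < 0)
        return -1;
    thread_ready = 1;  /* stands in for initializing the thread structure */
    if (init_components(cs, n, PHASE_NEEDS_THREAD) < 0)
        return -1;
    return 0;
}
```

In this sketch, calling init_components() for the second class before thread_ready is set fails, whereas library_init() succeeds for the same array, because it sets up the thread structure between the two passes.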

Comment 14 Fedora Update System 2021-11-20 02:05:40 UTC
FEDORA-2021-752e807fdd has been pushed to the Fedora 34 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-752e807fdd`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-752e807fdd

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 15 Fedora Update System 2021-11-28 01:09:32 UTC
FEDORA-2021-752e807fdd has been pushed to the Fedora 34 stable repository.
If the problem still persists, please make a note of it in this bug report.

