Bug 1412952 - [RHEL7][RFE] Support of the Intel Xeon PHI KNL in papi
Summary: [RHEL7][RFE] Support of the Intel Xeon PHI KNL in papi
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: papi
Version: 7.3
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 7.4
Assignee: William Cohen
QA Contact: Michael Petlan
URL:
Whiteboard:
Depends On: 1412950
Blocks: 1420851 1446211
Reported: 2017-01-13 08:56 UTC by Renaud Marigny
Modified: 2020-06-11 13:11 UTC
CC List: 6 users

Fixed In Version: papi-5.2.0-23.el7
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-01 12:40:02 UTC
Target Upstream Version:


Attachments
Backport of patch to avoid tying search to build libpfm list of pmus (5.97 KB, patch)
2017-06-16 03:26 UTC, William Cohen
no flags


Links
System: Red Hat Product Errata
ID: RHBA-2017:2190
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: libpfm and papi bug fix and enhancement update
Last Updated: 2017-08-01 16:08:51 UTC

Comment 2 William Cohen 2017-01-13 15:49:44 UTC
It looks like newer RHEL 7 kernels have included identification and PMU events for Intel KNL. The remaining question is whether people at RH have access to a KNL machine to verify the functionality of the papi and libpfm support.

Comment 5 William Cohen 2017-05-30 16:00:14 UTC
papi built with the older libpfm-4.7.0-1 does not find the newer KNL events, even when using the newer shared libraries from libpfm-4.7.0-4 that contain those events:

Ah, here is where things seem to go wrong in papi_avail. Using a locally built papi rpm configured with "--with-debug=yes", I get different results for the following commands:

export PAPI_DEBUG=ALL
papi_avail >& papi.log

For the papi built with libpfm-4.7.0-1 installed on the machine:

SUBSTRATE:papi_libpfm4_events.c:_papi_libpfm4_init:118:71093 SUBSTRATE:components/appio/appio.c:fwrite:389:71093 appio: intercepted fwrite(0x7fec5eeef9d4,1,18,0x7fec5ec6a1c0)
pfm_get_version()
SUBSTRATE:components/perf_event/pe_libpfm4_events.c:_pe_libpfm4_init:1374:71093 SUBSTRATE:components/appio/appio.c:fwrite:389:71093 appio: intercepted fwrite(0x7fec5ef0edba,1,15,0x7fec5ec6a1c0)
Detected pmus:
SUBSTRATE:components/perf_event/pe_libpfm4_events.c:_pe_libpfm4_init:1383:71093         18 ix86arch Intel X86 architectural PMU 1
SUBSTRATE:components/perf_event/pe_libpfm4_events.c:_pe_libpfm4_init:1383:71093         51 perf perf_events generic PMU 3
SUBSTRATE:components/perf_event/pe_libpfm4_events.c:_pe_libpfm4_init:1383:71093         114 perf_raw perf_events raw PMU 3
SUBSTRATE:components/perf_event/pe_libpfm4_events.c:_pe_libpfm4_init:1406:71093 153 native events detected on 3 pmus
SUBSTRATE:components/perf_event/pe_libpfm4_events.c:_pe_libpfm4_init:1409:71093 SUBSTRATE:components/appio/appio.c:fwrite:389:71093 appio: intercepted fwrite(0x7fec5ef0ee1e,1,27,0x7fec5ec6a1c0)
Could not find default PMU

For the papi built with libpfm-4.7.0-4 installed on the machine:

SUBSTRATE:papi_libpfm4_events.c:_papi_libpfm4_init:118:57525 SUBSTRATE:components/appio/appio.c:fwrite:389:57525 appio: intercepted fwrite(0x7f686180f9d4,1,18,0x7f686158a1c0)
pfm_get_version()
SUBSTRATE:components/perf_event/pe_libpfm4_events.c:_pe_libpfm4_init:1374:57525 SUBSTRATE:components/appio/appio.c:fwrite:389:57525 appio: intercepted fwrite(0x7f686182edba,1,15,0x7f686158a1c0)
Detected pmus:
SUBSTRATE:components/perf_event/pe_libpfm4_events.c:_pe_libpfm4_init:1383:57525         18 ix86arch Intel X86 architectural PMU 1
SUBSTRATE:components/perf_event/pe_libpfm4_events.c:_pe_libpfm4_init:1383:57525         51 perf perf_events generic PMU 3
SUBSTRATE:components/perf_event/pe_libpfm4_events.c:_pe_libpfm4_init:1383:57525         114 perf_raw perf_events raw PMU 3
SUBSTRATE:components/perf_event/pe_libpfm4_events.c:_pe_libpfm4_init:1383:57525         203 knl Intel Knights Landing 1
SUBSTRATE:components/perf_event/pe_libpfm4_events.c:_pe_libpfm4_init:1394:57525           knl is default
SUBSTRATE:components/perf_event/pe_libpfm4_events.c:_pe_libpfm4_init:1406:57525 184 native events detected on 4 pmus
SUBSTRATE:components/perf_event/pe_libpfm4_events.c:_pe_libpfm4_init:1423:57525 num_counters: 5
SUBSTRATE:papi_preset.c:_papi_load_preset_table:318:57525 SUBSTRATE:components/appio/appio.c:fwrite:389:57525 appio: intercepted fwrite(0x7f686182c22b,1,6,0x7f686158a1c0)
ENTER

Comment 6 Michael Petlan 2017-05-30 16:21:00 UTC
Yep. And that's why it does not show any events when testing on KNL. A respin is probably unavoidable.

Comment 7 William Cohen 2017-05-30 16:40:50 UTC
The problem is caused by the use of the PFM_PMU_MAX enum value in papi's pe_libpfm4_events.c initialization loop:

   for(i=0;i<PFM_PMU_MAX;i++) {
   ...
   }

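The older libpfm-4.7.0-1 that papi was built with has a smaller value for PFM_PMU_MAX than libpfm-4.7.0-4. Because PFM_PMU_MAX is a compile-time constant, that smaller value is fixed into the papi binary, so the PMU entries added in the newer libpfm (including knl) were never scanned by papi, even with the newer shared library loaded.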

There should be a better way of doing this in libpfm. As new PMUs are added to libpfm, other code should not need to be recompiled because of a change in PFM_PMU_MAX.
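
A minimal, self-contained model of the failure described above, not papi or libpfm source. All names are hypothetical: `RUNTIME_PMUS` stands for the PMU table inside the newer shared library (libpfm-4.7.0-4) that is loaded at run time, and `BUILD_TIME_PMU_MAX` stands for the smaller PFM_PMU_MAX value compiled into papi from the older libpfm-4.7.0-1 headers. The positions in the table are illustrative only; real libpfm PMU ids are values of an enum.

```c
#include <string.h>

/* Hypothetical model, not papi or libpfm code.
 * RUNTIME_PMUS: PMUs known to the shared library actually loaded. */
static const char *const RUNTIME_PMUS[] = { "ix86arch", "perf", "perf_raw", "knl" };
#define RUNTIME_PMU_MAX ((int)(sizeof RUNTIME_PMUS / sizeof RUNTIME_PMUS[0]))

/* Hypothetical: the bound the caller saw in the OLD headers when it was
 * compiled. It is a constant, so installing a newer library cannot change it. */
#define BUILD_TIME_PMU_MAX 3

/* Returns 1 if a scan bounded by `bound` ever examines the PMU `name`.
 * The extra RUNTIME_PMU_MAX check only keeps the model from reading
 * past its own table when bound is larger. */
static int scan_finds(int bound, const char *name)
{
    for (int i = 0; i < bound && i < RUNTIME_PMU_MAX; i++) {
        if (strcmp(RUNTIME_PMUS[i], name) == 0)
            return 1;
    }
    return 0;
}
```

With these values, `scan_finds(BUILD_TIME_PMU_MAX, "knl")` returns 0 while `scan_finds(RUNTIME_PMU_MAX, "knl")` returns 1. The second call is only possible because the model can read its own table size directly; papi has no compile-time constant for the libpfm that happens to be installed, which is why the loop bound has to come from the library at run time.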

Comment 8 William Cohen 2017-05-30 21:31:42 UTC
Posted patch in progress for upstream at

https://groups.google.com/a/icl.utk.edu/forum/#!topic/ptools-perfapi/Dx3xjwxplWo

Comment 9 Michael Petlan 2017-06-09 11:21:15 UTC
How is this looking so far? Does the patch have a chance of getting upstream? Or should we just rebuild papi once more against the proper libpfm and close this bug?

Comment 10 William Cohen 2017-06-09 15:25:35 UTC
Vince Weaver expected that something like the proposed patch would be put in papi.  He did ask some questions on how to test, but I haven't heard back.  I pinged him on the mailing list.

Worst case, we do a rebuild against the newest version of libpfm in RHEL 7 next week. However, I would prefer to get the patch in so that we avoid future failures caused by papi being built with an older version of libpfm. That dependency on PFM_PMU_MAX really should go away; removing it would make builds less fragile, so that forgetting a buildroot override (as I did here) no longer breaks PMU detection.

Comment 11 William Cohen 2017-06-16 03:21:18 UTC
A patch has been included in upstream papi to address the problem found during the Intel KNL testing, where the older version of libpfm used to build papi prevented discovery of the Intel KNL perf hardware. papi only examined PMUs up to the last pmu enum in the libpfm used during the build, rather than up to the last pmu enum in the currently installed libpfm. The upstream commit is:

https://bitbucket.org/icl/papi/commits/ba786c0c040c9caea5e9aa2aaed60cfcd1e57e3a

The patch has been adapted into a backport for papi-5.2.0 in RHEL 7. However, an exception flag needs to be set to allow checking in the patch and rebuilding the papi package so that papi Intel KNL support works.

Comment 12 William Cohen 2017-06-16 03:26:34 UTC
Created attachment 1288239
Backport of patch to avoid tying search to build libpfm list of pmus

This is an adaptation of the upstream patch to avoid the use of PFM_PMU_MAX from the libpfm headers used at build time.  This allows checking of any additional pmus that might not have been in the libpfm used for the build.
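
The idea behind this backport can be sketched as a loop that asks the library where its PMU list ends instead of stopping at a compile-time maximum. This is a hypothetical model, not the upstream papi commit: `mock_get_pmu_info()` and the `MOCK_*` status codes are made up here, standing in for a libpfm query such as `pfm_get_pmu_info()`. The assumption is that the installed library rejects an id past the end of its own table with a distinct error (libpfm4 has `PFM_ERR_INVAL` for invalid arguments), while a valid but unavailable PMU gets a different status and is simply skipped.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes for this sketch only. */
enum mock_status { MOCK_SUCCESS = 0, MOCK_ERR_NOTSUPP = -1, MOCK_ERR_INVAL = -2 };

/* Hypothetical library-side table; NULL entries model PMU ids the
 * library knows about but that are not available on this machine. */
static const char *const LIB_PMU_TABLE[] = { "ix86arch", NULL, "perf", "perf_raw", NULL, "knl" };
#define LIB_PMU_COUNT ((int)(sizeof LIB_PMU_TABLE / sizeof LIB_PMU_TABLE[0]))

/* Library side: bounds-checks against its OWN run-time table size. */
static enum mock_status mock_get_pmu_info(int id, const char **name)
{
    if (id < 0 || id >= LIB_PMU_COUNT)
        return MOCK_ERR_INVAL;      /* past the end of the installed library's list */
    if (LIB_PMU_TABLE[id] == NULL)
        return MOCK_ERR_NOTSUPP;    /* valid id, PMU not available */
    *name = LIB_PMU_TABLE[id];
    return MOCK_SUCCESS;
}

/* Caller side: no compile-time maximum. Walk ids upward until the
 * library reports an invalid id, skipping unavailable entries.
 * Returns the number of PMUs found; *last gets the last one's name. */
static int scan_all_pmus(const char **last)
{
    int found = 0;
    *last = NULL;
    for (int id = 0; ; id++) {
        const char *name = NULL;
        enum mock_status st = mock_get_pmu_info(id, &name);
        if (st == MOCK_ERR_INVAL)
            break;                  /* end of the installed library's table */
        if (st != MOCK_SUCCESS)
            continue;               /* hole in the table: keep scanning */
        found++;
        *last = name;
    }
    return found;
}
```

Here `scan_all_pmus()` finds four PMUs and the last one is "knl". The loop relies entirely on the library to report the end of its table; that is the point of the design, since the bound then comes from whichever libpfm is installed rather than the one papi was compiled against.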

Comment 17 William Cohen 2017-06-19 13:43:25 UTC
According to mpetlan, the scratch build addressed the problem. I respun the official papi-5.2.0-23.el7 build with the patch to address the problem.

Comment 19 Michael Petlan 2017-06-20 12:56:19 UTC
Tested papi-5.2.0-23.el7 build and this one works great on Knights Landing! Thanks for fixing!

VERIFIED

Comment 20 errata-xmlrpc 2017-08-01 12:40:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2190

