Bug 1474999

Summary: [LLNL 7.5 FEAT] Access to Broadwell Uncore Counters
Product: Red Hat Enterprise Linux 7
Reporter: Trent D'Hooge <tdhooge>
Component: libpfm
Assignee: William Cohen <wcohen>
Status: CLOSED ERRATA
QA Contact: Michael Petlan <mpetlan>
Severity: high
Docs Contact: Vladimír Slávik <vslavik>
Priority: high
Version: 7.3
CC: achen35, bgollahe, brolley, fche, lberk, mbenitez, mcermak, mgoodwin, nathans, tgummels, vslavik, wcohen, woodard
Target Milestone: rc
Keywords: FutureFeature
Target Release: 7.5
Hardware: x86_64
OS: Unspecified
Whiteboard:
Fixed In Version: libpfm-4.7.0-6.el7
Doc Type: Enhancement
Doc Text: Support for Intel Xeon v4 uncore performance events in *libpfm*, *pcp*, and *papi*. This update adds support for Intel Xeon v4 uncore performance events to the *libpfm* performance monitoring library, the *pcp* tool, and the *papi* interface.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-10 14:25:27 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1400611, 1461180, 1522983

Description Trent D'Hooge 2017-07-25 19:58:27 UTC
Description of problem:

pcp-3.11.3-4 does not provide Broadwell server uncore support

Version-Release number of selected component (if applicable):


How reproducible:

always


Steps to Reproduce:

1. Access a Broadwell server node.
2. Run `pminfo | grep perf` and check for uncore counter metrics.

Actual results:

pminfo | grep perf
perfevent.version
perfevent.active
perfevent.hwcounters
perfevent.derived

Expected results:

 pminfo | grep perf
perfevent.version
perfevent.active
perfevent.hwcounters.bdx_unc_imc0__UNC_M_CAS_COUNT_RD.dutycycle
perfevent.hwcounters.bdx_unc_imc0__UNC_M_CAS_COUNT_RD.value
perfevent.derived.active

Additional info:

We need the updates to libpfm4 from commit 488227, "add Intel Broadwell server uncore PMUs support".

Comment 2 Nathan Scott 2017-07-25 23:04:53 UTC
No PCP changes are required here - it'll need a patch/rebase to libpfm AIUI.
Details of the perfmon2/libpfm4 commit mentioned earlier...


commit 488227bf2128e8b80f9b7573869fe33fcbd63342
Author: Stephane Eranian <eranian>
Date:   Fri Jun 2 12:09:31 2017 -0700

    add Intel Broadwell server uncore PMUs support
    
    This patch adds Intel Broadwell Server (model 79, 86) uncore PMU
    support.  It adds the following PMUs:
    
    - IMC
    - CBOX
    - HA
    - UBOX
    - SBOX
    - IRP
    - PCU
    - QPI
    - R2PCIE
    - R3QPI
    
    Based on Broadwell Server Uncore Performance Monitoring Reference Manual
    available here:
        http://www.intel.com/content/www/us/en/processors/xeon/xeon-e5-e7-v4-uncore-performance-monitoring.html
    
    Signed-off-by: Stephane Eranian <eranian>
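
To check whether a given libpfm4 tree already carries this change, something like the following should work in a git checkout of upstream perfmon2/libpfm4 (expected output sketched from the commit above):

$ git log --oneline --grep='Broadwell server uncore'
488227b add Intel Broadwell server uncore PMUs support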

Comment 3 Ben Woodard 2017-07-27 23:47:39 UTC
BTW you may want to start working on Skylake EP / Purley / Intel Xeon Platinum, Gold, Silver, or whatever they are calling it these days. They are going to need this too.

Comment 4 William Cohen 2017-09-25 19:16:10 UTC
The Intel Broadwell server uncore PMUs support patch applies cleanly.  However, the rhel-7.5 (and related) flag(s) still need to be set.

Comment 5 William Cohen 2017-09-27 20:25:33 UTC
The Broadwell uncore patches are in libpfm-4.7.0-6.el7

Comment 8 William Cohen 2017-12-11 19:14:30 UTC
Yes, the release notes should mention the addition of Broadwell uncore support in libpfm.  Developers doing performance analysis with pcp, libpfm, or papi would like to know whether they can access Broadwell uncore performance events.

Comment 10 William Cohen 2017-12-12 16:41:36 UTC
This release note info looks reasonable.

Comment 11 Michael Petlan 2018-01-10 19:01:41 UTC
Tested with libpfm-4.7.0-9.el7.x86_64. It knows the bdx_unc* PMUs, which libpfm-4.7.0-4.el7.x86_64 did not.

# showevtinfo | grep -e bdw -e bdx
	[155, bdw, "Intel Broadwell"]
	[201, bdw_ep, "Intel Broadwell EP"]
	[269, bdx_unc_cbo0, "Intel BroadwellX C-Box 0 uncore"]
	[270, bdx_unc_cbo1, "Intel BroadwellX C-Box 1 uncore"]
	[271, bdx_unc_cbo2, "Intel BroadwellX C-Box 2 uncore"]
	[272, bdx_unc_cbo3, "Intel BroadwellX C-Box 3 uncore"]
	[273, bdx_unc_cbo4, "Intel BroadwellX C-Box 4 uncore"]
	[274, bdx_unc_cbo5, "Intel BroadwellX C-Box 5 uncore"]
	[275, bdx_unc_cbo6, "Intel BroadwellX C-Box 6 uncore"]
	[276, bdx_unc_cbo7, "Intel BroadwellX C-Box 7 uncore"]
	[277, bdx_unc_cbo8, "Intel BroadwellX C-Box 8 uncore"]
	[278, bdx_unc_cbo9, "Intel BroadwellX C-Box 9 uncore"]
	[279, bdx_unc_cbo10, "Intel BroadwellX C-Box 10 uncore"]
	[280, bdx_unc_cbo11, "Intel BroadwellX C-Box 11 uncore"]
	[281, bdx_unc_cbo12, "Intel BroadwellX C-Box 12 uncore"]
	[282, bdx_unc_cbo13, "Intel BroadwellX C-Box 13 uncore"]
	[283, bdx_unc_cbo14, "Intel BroadwellX C-Box 14 uncore"]
	[284, bdx_unc_cbo15, "Intel BroadwellX C-Box 15 uncore"]
	[285, bdx_unc_cbo16, "Intel BroadwellX C-Box 16 uncore"]
	[286, bdx_unc_cbo17, "Intel BroadwellX C-Box 17 uncore"]
	[287, bdx_unc_cbo18, "Intel BroadwellX C-Box 18 uncore"]
	[288, bdx_unc_cbo19, "Intel BroadwellX C-Box 19 uncore"]
	[289, bdx_unc_cbo20, "Intel BroadwellX C-Box 20 uncore"]
	[290, bdx_unc_cbo21, "Intel BroadwellX C-Box 21 uncore"]
	[291, bdx_unc_cbo22, "Intel BroadwellX C-Box 22 uncore"]
	[292, bdx_unc_cbo23, "Intel BroadwellX C-Box 23 uncore"]
	[293, bdx_unc_ha0, "Intel BroadwellX HA 0 uncore"]
	[294, bdx_unc_ha1, "Intel BroadwellX HA 1 uncore"]
	[295, bdx_unc_imc0, "Intel BroadwellX IMC0 uncore"]
	[296, bdx_unc_imc1, "Intel BroadwellX IMC1 uncore"]
	[297, bdx_unc_imc2, "Intel BroadwellX IMC2 uncore"]
	[298, bdx_unc_imc3, "Intel BroadwellX IMC3 uncore"]
	[299, bdx_unc_imc4, "Intel BroadwellX IMC4 uncore"]
	[300, bdx_unc_imc5, "Intel BroadwellX IMC5 uncore"]
	[301, bdx_unc_imc6, "Intel BroadwellX IMC6 uncore"]
	[302, bdx_unc_imc7, "Intel BroadwellX IMC7 uncore"]
	[303, bdx_unc_pcu, "Intel BroadwellX PCU uncore"]
	[304, bdx_unc_qpi0, "Intel BroadwellX QPI0 uncore"]
	[305, bdx_unc_qpi1, "Intel BroadwellX QPI1 uncore"]
	[306, bdx_unc_qpi2, "Intel BroadwellX QPI2 uncore"]
	[307, bdx_unc_ubo, "Intel BroadwellX U-Box uncore"]
	[308, bdx_unc_r2pcie, "Intel BroadwellX R2PCIe uncore"]
	[309, bdx_unc_r3qpi0, "Intel BroadwellX R3QPI0 uncore"]
	[310, bdx_unc_r3qpi1, "Intel BroadwellX R3QPI1 uncore"]
	[311, bdx_unc_r3qpi2, "Intel BroadwellX R3QPI2 uncore"]
	[312, bdx_unc_irp, "Intel BroadwellX IRP uncore"]
	[313, bdx_unc_sbo0, "Intel BroadwellX S-BOX0 uncore"]
	[314, bdx_unc_sbo1, "Intel BroadwellX S-BOX1 uncore"]
	[315, bdx_unc_sbo2, "Intel BroadwellX S-BOX2 uncore"]
	[316, bdx_unc_sbo3, "Intel BroadwellX S-BOX3 uncore"]

However, I don't know how this is used/supported in pcp. When I run `pminfo | grep perf`, I don't see any perfevent.* entries.
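
For reference, one likely cause is that the perfevent PMDA simply is not installed on that machine. Assuming the pcp-pmda-perfevent package is available, the usual PMDA install procedure would be:

# yum install pcp-pmda-perfevent
# cd /var/lib/pcp/pmdas/perfevent
# ./Install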

Comment 12 William Cohen 2018-01-10 20:01:07 UTC
Right now the configuration file /var/lib/pcp/pmdas/perfevent/perfevent.conf needs to specify the events for each particular processor.  This looks pretty duplicative of information that already exists elsewhere.  It would be nice to have pcp extract that information from `perf list` rather than maintaining another encoding of it.
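
For illustration, a minimal perfevent.conf stanza might look like the sketch below. As I understand it, section headers are patterns matched against the CPU type and the event lines use libpfm's event syntax; the exact names here are an assumption, so verify them against the showevtinfo output above.

# hypothetical stanza -- check event names with showevtinfo / perf list
[Broadwell.*]
bdx_unc_imc0::UNC_M_CAS_COUNT:RD
bdx_unc_imc0::UNC_M_CAS_COUNT:WR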

Comment 13 Frank Ch. Eigler 2018-01-10 20:40:10 UTC
> It would be nice to have pcp extract that information

FWIW the pcp papi.* PMDA does extract the information dynamically, except at the PAPI layer rather than perf, and I am not sure how well the uncore counters come through.
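
A quick way to check would be something like the following, assuming PAPI's perf_event_uncore component is enabled in this build; with the updated libpfm in place, the bdx_unc* events should show up:

$ papi_native_avail | grep -i bdx_unc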

Comment 14 Michael Petlan 2018-01-12 17:04:49 UTC
TESTING ON A BROADWELL-EP machine:

# uname -r ; rpmquery pcp libpfm papi
3.10.0-826.el7.x86_64
pcp-3.11.8-7.el7.x86_64
libpfm-4.7.0-4.el7.x86_64
papi-5.2.0-23.el7.x86_64


# pminfo | grep perf
perfevent.version
perfevent.active
perfevent.hwcounters.software.task_clock.dutycycle
perfevent.hwcounters.software.task_clock.value
perfevent.hwcounters.software.page_faults.dutycycle
perfevent.hwcounters.software.page_faults.value
perfevent.hwcounters.software.minor_faults.dutycycle
perfevent.hwcounters.software.minor_faults.value
perfevent.hwcounters.software.major_faults.dutycycle
perfevent.hwcounters.software.major_faults.value
perfevent.hwcounters.software.emulation_faults.dutycycle
perfevent.hwcounters.software.emulation_faults.value
perfevent.hwcounters.software.cpu_migrations.dutycycle
perfevent.hwcounters.software.cpu_migrations.value
perfevent.hwcounters.software.cpu_clock.dutycycle
perfevent.hwcounters.software.cpu_clock.value
perfevent.hwcounters.software.context_switches.dutycycle
perfevent.hwcounters.software.context_switches.value
perfevent.hwcounters.software.alignment_faults.dutycycle
perfevent.hwcounters.software.alignment_faults.value
perfevent.hwcounters.uncore_imc_4.clockticks.dutycycle
perfevent.hwcounters.uncore_imc_4.clockticks.value
perfevent.hwcounters.uncore_imc_4.cas_count_read.dutycycle
perfevent.hwcounters.uncore_imc_4.cas_count_read.value
perfevent.hwcounters.uncore_imc_4.cas_count_write.dutycycle
perfevent.hwcounters.uncore_imc_4.cas_count_write.value
perfevent.hwcounters.uncore_imc_3.clockticks.dutycycle
perfevent.hwcounters.uncore_imc_3.clockticks.value
perfevent.hwcounters.uncore_imc_3.cas_count_read.dutycycle
perfevent.hwcounters.uncore_imc_3.cas_count_read.value
perfevent.hwcounters.uncore_imc_3.cas_count_write.dutycycle
perfevent.hwcounters.uncore_imc_3.cas_count_write.value
perfevent.hwcounters.uncore_imc_2.clockticks.dutycycle
perfevent.hwcounters.uncore_imc_2.clockticks.value
perfevent.hwcounters.uncore_imc_2.cas_count_read.dutycycle
perfevent.hwcounters.uncore_imc_2.cas_count_read.value
perfevent.hwcounters.uncore_imc_2.cas_count_write.dutycycle
perfevent.hwcounters.uncore_imc_2.cas_count_write.value
perfevent.hwcounters.uncore_imc_1.clockticks.dutycycle
perfevent.hwcounters.uncore_imc_1.clockticks.value
perfevent.hwcounters.uncore_imc_1.cas_count_read.dutycycle
perfevent.hwcounters.uncore_imc_1.cas_count_read.value
perfevent.hwcounters.uncore_imc_1.cas_count_write.dutycycle
perfevent.hwcounters.uncore_imc_1.cas_count_write.value
perfevent.hwcounters.uncore_imc_0.clockticks.dutycycle
perfevent.hwcounters.uncore_imc_0.clockticks.value
perfevent.hwcounters.uncore_imc_0.cas_count_read.dutycycle
perfevent.hwcounters.uncore_imc_0.cas_count_read.value
perfevent.hwcounters.uncore_imc_0.cas_count_write.dutycycle
perfevent.hwcounters.uncore_imc_0.cas_count_write.value
perfevent.hwcounters.power.energy_cores.dutycycle
perfevent.hwcounters.power.energy_cores.value
perfevent.hwcounters.power.energy_ram.dutycycle
perfevent.hwcounters.power.energy_ram.value
perfevent.hwcounters.power.energy_pkg.dutycycle
perfevent.hwcounters.power.energy_pkg.value
perfevent.hwcounters.msr.mperf.dutycycle
perfevent.hwcounters.msr.mperf.value
perfevent.hwcounters.msr.aperf.dutycycle
perfevent.hwcounters.msr.aperf.value
perfevent.hwcounters.msr.tsc.dutycycle
perfevent.hwcounters.msr.tsc.value
perfevent.hwcounters.msr.smi.dutycycle
perfevent.hwcounters.msr.smi.value
perfevent.hwcounters.cpu.cpu_cycles.dutycycle
perfevent.hwcounters.cpu.cpu_cycles.value
perfevent.hwcounters.cpu.tx_commit.dutycycle
perfevent.hwcounters.cpu.tx_commit.value
perfevent.hwcounters.cpu.instructions.dutycycle
perfevent.hwcounters.cpu.instructions.value
perfevent.hwcounters.cpu.cycles_t.dutycycle
perfevent.hwcounters.cpu.cycles_t.value
perfevent.hwcounters.cpu.cycles_ct.dutycycle
perfevent.hwcounters.cpu.cycles_ct.value
perfevent.hwcounters.cpu.branch_instructions.dutycycle
perfevent.hwcounters.cpu.branch_instructions.value
perfevent.hwcounters.cpu.mem_stores.dutycycle
perfevent.hwcounters.cpu.mem_stores.value
perfevent.hwcounters.cpu.cache_misses.dutycycle
perfevent.hwcounters.cpu.cache_misses.value
perfevent.hwcounters.cpu.ref_cycles.dutycycle
perfevent.hwcounters.cpu.ref_cycles.value
perfevent.hwcounters.cpu.bus_cycles.dutycycle
perfevent.hwcounters.cpu.bus_cycles.value
perfevent.hwcounters.cpu.el_start.dutycycle
perfevent.hwcounters.cpu.el_start.value
perfevent.hwcounters.cpu.el_abort.dutycycle
perfevent.hwcounters.cpu.el_abort.value
perfevent.hwcounters.cpu.cache_references.dutycycle
perfevent.hwcounters.cpu.cache_references.value
perfevent.hwcounters.cpu.el_conflict.dutycycle
perfevent.hwcounters.cpu.el_conflict.value
perfevent.hwcounters.cpu.el_capacity.dutycycle
perfevent.hwcounters.cpu.el_capacity.value
perfevent.hwcounters.cpu.tx_conflict.dutycycle
perfevent.hwcounters.cpu.tx_conflict.value
perfevent.hwcounters.cpu.tx_capacity.dutycycle
perfevent.hwcounters.cpu.tx_capacity.value
perfevent.hwcounters.cpu.tx_start.dutycycle
perfevent.hwcounters.cpu.tx_start.value
perfevent.hwcounters.cpu.tx_abort.dutycycle
perfevent.hwcounters.cpu.tx_abort.value
perfevent.hwcounters.cpu.el_commit.dutycycle
perfevent.hwcounters.cpu.el_commit.value
perfevent.hwcounters.cpu.mem_loads.dutycycle
perfevent.hwcounters.cpu.mem_loads.value
perfevent.hwcounters.cpu.branch_misses.dutycycle
perfevent.hwcounters.cpu.branch_misses.value
perfevent.hwcounters.UNHALTED_CORE_CYCLES.dutycycle
perfevent.hwcounters.UNHALTED_CORE_CYCLES.value
perfevent.hwcounters.INSTRUCTION_RETIRED.dutycycle
perfevent.hwcounters.INSTRUCTION_RETIRED.value
perfevent.hwcounters.UNHALTED_REFERENCE_CYCLES.dutycycle
perfevent.hwcounters.UNHALTED_REFERENCE_CYCLES.value
perfevent.hwcounters.LLC_MISSES.dutycycle
perfevent.hwcounters.LLC_MISSES.value
perfevent.derived.active

Comment 15 Michael Petlan 2018-01-15 12:52:01 UTC
So from my point of view, the Broadwell uncore counters are accessible to pcp now. VERIFIED.

Comment 18 errata-xmlrpc 2018-04-10 14:25:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0812