Bug 2213907 - glibc: Memcpy throughput lower on RH9.3 compared to RHEL 8.3/RHEL 7.5 - same Skylake hardware
Keywords:
Status: MODIFIED
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: glibc
Version: 9.3
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: DJ Delorie
QA Contact: Sergey Kolosov
URL:
Whiteboard:
Depends On: 2180462
Blocks: 2166710
Reported: 2023-06-09 20:42 UTC by Carlos O'Donell
Modified: 2023-08-16 13:15 UTC (History)

Fixed In Version: glibc-2.34-82.el9
Doc Type: Enhancement
Doc Text:
Feature: Improved performance of string and memory routines on Intel Skylake-based hardware. Reason: The default amount of cache that string and memory routines use is a balance between single-process and whole-system performance. On Intel Skylake-based systems this tuning could result in lower than expected performance, so the default was reviewed against industry-standard benchmarks. Result: The default amount of cache used by string and memory routines has been increased to improve performance.
Clone Of: 2180462
Environment:
Last Closed:
Type: Bug
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-159421 0 None None None 2023-06-09 20:43:18 UTC

Comment 1 Carlos O'Donell 2023-06-09 20:43:13 UTC
In RHEL 9 we should review the amount of L3 cache used for in-flight memory copies and adjust it based on upstream discussions with Intel.

The same issue for RHEL 8 is this one:
https://bugzilla.redhat.com/show_bug.cgi?id=2180462

Comment 2 Florian Weimer 2023-06-13 09:03:44 UTC
In particular, this should include a backport of this commit to benefit TDX environments as they exist today:

commit ed2f9dc9420c4c61436328778a70459d0a35556a
Author: Noah Goldstein <goldstein.w.n>
Date:   Mon May 8 22:10:20 2023 -0500

    x86: Use 64MB as nt-store threshold if no cacheinfo [BZ #30429]
    
    If `non_temporal_threshold` is below `minimum_non_temporal_threshold`,
    it almost certainly means we failed to read the systems cache info.
    
    In this case, rather than defaulting the minimum correct value, we
    should default to a value that gets at least reasonable
    performance. 64MB is chosen conservatively to be at the very high
    end. This should never cause non-temporal stores when, if we had read
    cache info, we wouldn't have otherwise.
    Reviewed-by: Florian Weimer <fweimer>
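
The fallback described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the glibc code: the function name pick_nt_threshold and both macro names are hypothetical, and the value used for the minimum threshold is an assumption chosen for the example. Only the 64MB fallback comes from the commit message itself.

```c
/* Illustrative sketch only; names and the minimum value are assumptions. */

/* Assumed lower bound below which the computed threshold is treated as
   "cache info could not be read". */
#define MIN_NT_THRESHOLD 0x4040UL

/* Conservative fallback named in the commit message: 64MB. */
#define FALLBACK_NT_THRESHOLD (64UL * 1024UL * 1024UL)

/* Return the non-temporal store threshold to use, given the value
   computed from the detected cache sizes. */
static unsigned long
pick_nt_threshold (unsigned long computed)
{
  /* A value below the minimum almost certainly means cache detection
     failed, so use the large fallback instead of clamping to the
     minimum, which would trigger non-temporal stores far too early. */
  if (computed < MIN_NT_THRESHOLD)
    return FALLBACK_NT_THRESHOLD;

  /* Otherwise trust the value derived from the real cache info. */
  return computed;
}
```

With these assumed values, a computed threshold of 0 (no cache info available) yields 64MB, while any value at or above the minimum is returned unchanged.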

