Bug 2213907

Summary: glibc: Memcpy throughput lower on RHEL 9.3 compared to RHEL 8.3/RHEL 7.5 - same Skylake hardware
Product: Red Hat Enterprise Linux 9 Reporter: Carlos O'Donell <codonell>
Component: glibc Assignee: DJ Delorie <dj>
Status: CLOSED ERRATA QA Contact: Sergey Kolosov <skolosov>
Severity: high Docs Contact: Jacob Taylor Valdez <jvaldez>
Priority: unspecified    
Version: 9.3 CC: ashankar, barend.havenga, bgray, codonell, dj, fweimer, jmario, jvaldez, mcoufal, pfrankli, qe-baseos-tools-bugs, sipoyare, skolosov
Target Milestone: rc Keywords: Regression, Triaged
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glibc-2.34-82.el9 Doc Type: Enhancement
Doc Text:
.Improved string and memory routine performance on Intel® Xeon® v5-based hardware in `glibc`
Previously, the default amount of cache used by `glibc` for string and memory routines resulted in lower than expected performance on Intel® Xeon® v5-based systems. With this update, the amount of cache to use has been tuned to improve performance.
Story Points: ---
Clone Of: 2180462 Environment:
Last Closed: 2023-11-07 08:37:51 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2180462    
Bug Blocks: 2166710    

Comment 1 Carlos O'Donell 2023-06-09 20:43:13 UTC
In RHEL 9 we should review the amount of L3 used for in-flight memory copies and adjust based on upstream discussions with Intel.

The same issue for RHEL 8 is this one:
https://bugzilla.redhat.com/show_bug.cgi?id=2180462

Comment 2 Florian Weimer 2023-06-13 09:03:44 UTC
In particular, this should include a backport of this commit to benefit TDX environments as they exist today:

commit ed2f9dc9420c4c61436328778a70459d0a35556a
Author: Noah Goldstein <goldstein.w.n>
Date:   Mon May 8 22:10:20 2023 -0500

    x86: Use 64MB as nt-store threshold if no cacheinfo [BZ #30429]
    
    If `non_temporal_threshold` is below `minimum_non_temporal_threshold`,
    it almost certainly means we failed to read the system's cache info.
    
    In this case, rather than defaulting to the minimum correct value, we
    should default to a value that gets at least reasonable
    performance. 64MB is chosen conservatively to be at the very high
    end. This should never cause non-temporal stores when, if we had read
    cache info, we wouldn't have otherwise.
    Reviewed-by: Florian Weimer <fweimer>
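
The fallback described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the glibc source: the function name `pick_nt_threshold` and both macro names are hypothetical, and the value used for the minimum threshold is an assumption chosen only for the example. Only the 64MB fallback figure comes from the commit message.

```c
#include <stddef.h>

/* Hypothetical names for illustration; these are not glibc identifiers. */

/* Assumed lower bound below which a computed threshold is treated as
   evidence that cache information could not be read.  */
#define MIN_NT_THRESHOLD      ((size_t) 0x4040)

/* Conservative fallback named in the commit message: 64MB.  */
#define FALLBACK_NT_THRESHOLD ((size_t) 64 * 1024 * 1024)

/* Return the copy size above which non-temporal stores would be used.
   COMPUTED is the value derived from the detected cache sizes; it is
   expected to be tiny (or zero) when cache detection failed.  */
static size_t
pick_nt_threshold (size_t computed)
{
  if (computed < MIN_NT_THRESHOLD)
    /* Cache info was probably unavailable: prefer a high threshold over
       the smallest legal one, so ordinary copies keep using the cache.  */
    return FALLBACK_NT_THRESHOLD;
  return computed;
}
```

With this shape, a system that reports no cache information (as in the TDX case mentioned in this comment) ends up with a 64MB threshold, while a system with a valid computed value keeps that value unchanged.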

Comment 15 errata-xmlrpc 2023-11-07 08:37:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (glibc bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:6582