Description of problem:
This feature request is to enhance qemu to present virtual L3 cache
info for vcpus.
This feature is available in upstream qemu as:
This feature is required to pass SAP HANA performance acceptance tests.
Version-Release number of selected component (if applicable):
Upstream qemu 2.8.
This feature is needed in qemu-kvm-rhev for RHEL7.4 and 7.3.z.
Very reproducible with the SAP HANA performance tests.
Steps to Reproduce:
1. Please see the performance team (Dave Dumas, Joe Mario)
Without this feature, SAP HANA performance tests fail to meet acceptance criteria.
SAP HANA performance tests need to meet acceptance criteria.
This L3 fix causes confusion, since many people incorrectly assume that without it KVM guests will not use the host cpu's L3 cache.
Here's a little clarification from the SAP HANA testing that we did with Dave Dumas and all.
Without this L3 fix, running lscpu in a guest did not show any L3. With this fix it does.
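For anyone who wants to script the lscpu check above inside a guest, here is a minimal sketch. It assumes lscpu prints a line whose field name is "L3 cache" when an L3 is visible. The helper names l3_cache_from_lscpu and read_lscpu are made up for illustration and are not part of any tool.

```python
import subprocess


def l3_cache_from_lscpu(lscpu_text):
    """Return the value of the 'L3 cache' field, or None if no such field exists.

    Assumes the usual "Field name:   value" layout of lscpu output.
    """
    for line in lscpu_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "l3 cache":
            return value.strip()
    return None


def read_lscpu():
    """Run lscpu and return its text output (hypothetical convenience wrapper)."""
    result = subprocess.run(["lscpu"], capture_output=True, text=True, check=True)
    return result.stdout
```

In a guest, l3_cache_from_lscpu(read_lscpu()) returning None would correspond to the "lscpu did not show any L3" case described above. It says nothing about whether the guest actually benefits from the host's L3.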
We ran SAP in a guest where qemu was backed by the host's default 4K pages, by the host's 2-meg hugepages, and by the host's 1-gig hugepages.
This L3 patch caused no change in performance when qemu was backed by the host's 4k or 2-meg hugepages. It was when we booted the guest where qemu was backed by the host's 1-gig hugepages that we saw a performance increase.
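Since the result depends on which page size backs qemu, it may help to confirm what the host actually has configured before comparing runs. A rough sketch, assuming the standard /proc/meminfo layout; parse_hugepage_info is a made-up helper name. Note that /proc/meminfo only reports the default hugepage size, so 1-gig pages set up as a non-default pool would have to be checked under /sys/kernel/mm/hugepages instead.

```python
def parse_hugepage_info(meminfo_text):
    """Collect the HugePages_* and Hugepagesize fields from /proc/meminfo text.

    Values are returned as integers (page counts, or kB for Hugepagesize).
    Transparent hugepage counters such as AnonHugePages are skipped on purpose.
    """
    info = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key.lower().startswith("hugepage"):
            fields = rest.split()
            if fields:
                info[key] = int(fields[0])
    return info


def read_host_hugepage_info(path="/proc/meminfo"):
    """Read and parse the host's meminfo file (path is the usual Linux location)."""
    with open(path) as f:
        return parse_hugepage_info(f.read())
```

On a host whose default hugepage size is 1 gig, the Hugepagesize entry would be 1048576 (kB).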
The reason this patch only showed a performance increase when qemu was backed by 1-gig hugepages (and not with 4k or 2-meg pages) was related to reduced TLB misses. With the L3 cache exposed and 1-gig hugepages, the KVM TLB-miss handling code can better see that two cpus share the same L3, which reduces the cost of handling a TLB miss.
If anyone says they need this L3 patch because they want their guests to be able to use the host cpu's L3 cache for better performance, that's mistaken. Their guests are already using L3 even though lscpu doesn't show it.
I forgot to answer this part of your question:
> Do we have numbers on what the performance was versus the "passthrough"
> L3 cache that was presented in older versions of RHEL (7.2)?
We have no numbers. Even though we experimented with both "passthrough" and "cpu-exact" on both 7.2 and 7.3 (where passthrough outperformed cpu-exact), we saw no significant performance gain between the two.
Understand that we were also experimenting with our xml files in those early days, but again, L3 only helped us once we got that L3 patch and booted the guest backed by 1-gig hugepages.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.