Bug 1428534

Summary: Enhance qemu to present virtual L3 cache info for vcpus
Product: Red Hat Enterprise Linux 7
Reporter: Hai Huang <hhuang>
Component: qemu-kvm-rhev
Assignee: Bandan Das <bdas>
Status: CLOSED ERRATA
QA Contact: Guo, Zhiyi <zhguo>
Severity: urgent
Priority: high
Version: 7.4
CC: bdas, chayang, coli, djdumas, hhuang, jinzhao, jmario, juzhang, knoel, michen, mrezanin, mtessun, pbonzini, rkrcmar, salmy, sgordon, snagar, virt-maint, zhguo
Target Milestone: rc
Keywords: ZStream
Target Release: 7.4
Hardware: x86_64
OS: Unspecified
Fixed In Version: qemu-kvm-rhev-2.8.0-2.el7
Clones: 1428952 1430802
Last Closed: 2017-08-02 03:39:56 UTC
Type: Bug
Bug Blocks: 1428952, 1430802, 1434537

Description Hai Huang 2017-03-02 18:56:00 UTC
Description of problem:
This feature request is to enhance qemu to present virtual L3 cache 
info for vcpus.

This feature is available in upstream qemu as:
https://patchwork.kernel.org/patch/9308401/

This feature is required to pass the SAP HANA performance acceptance
test.
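For context on what "present virtual L3 cache info" means at the guest level: on Intel-style x86 CPUs, a guest kernel discovers its cache hierarchy through CPUID leaf 4, where each subleaf describes one cache. The following is a minimal Python sketch of decoding one leaf-4 subleaf from its EAX/EBX/ECX values, following the field layout in the Intel SDM. It is not the upstream qemu patch; the helper names (`decode_leaf4`, `CacheInfo`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Names for the EAX[4:0] "cache type" field of CPUID leaf 4.
CACHE_TYPES = {1: "data", 2: "instruction", 3: "unified"}


@dataclass
class CacheInfo:
    level: int            # EAX[7:5]
    cache_type: str       # EAX[4:0], decoded
    max_sharing_ids: int  # EAX[25:14] + 1: max logical processor IDs sharing this cache
    line_size: int        # EBX[11:0] + 1, in bytes
    partitions: int       # EBX[21:12] + 1
    ways: int             # EBX[31:22] + 1
    sets: int             # ECX + 1

    @property
    def size_bytes(self) -> int:
        # Intel SDM formula: ways * partitions * line size * sets
        return self.ways * self.partitions * self.line_size * self.sets


def decode_leaf4(eax: int, ebx: int, ecx: int) -> Optional[CacheInfo]:
    """Decode one subleaf of CPUID leaf 4 (deterministic cache parameters).

    Returns None when the cache-type field is 0, meaning "no more caches".
    """
    ctype = eax & 0x1F
    if ctype == 0:
        return None
    return CacheInfo(
        level=(eax >> 5) & 0x7,
        cache_type=CACHE_TYPES.get(ctype, f"reserved({ctype})"),
        max_sharing_ids=((eax >> 14) & 0xFFF) + 1,
        line_size=(ebx & 0xFFF) + 1,
        partitions=((ebx >> 12) & 0x3FF) + 1,
        ways=((ebx >> 22) & 0x3FF) + 1,
        sets=ecx + 1,
    )
```

For example, the made-up values EAX=0x1C063, EBX=0x3C0003F, ECX=0x3FFF decode to a unified level-3 cache with 64-byte lines, 16 ways and 16384 sets (16 MiB), shared by up to 8 logical processor IDs. These are illustrative numbers, not the values qemu reports.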


Version-Release number of selected component (if applicable):
Upstream qemu 2.8.
This feature is needed in qemu-kvm-rhev for RHEL 7.4 and 7.3.z.


How reproducible:
Very reproducible with the SAP HANA performance tests.


Steps to Reproduce:
1. Please see the performance team (Dave Dumas, Joe Mario).

Actual results:
Without this feature, SAP HANA performance tests fail to meet acceptance criteria.

Expected results:
SAP HANA performance tests need to meet acceptance criteria.

Additional info:

Comment 17 Joe Mario 2017-07-06 19:17:31 UTC
Hi Steve:
This L3 fix is confusing, since many people incorrectly assume it implies that, without it, KVM guests will not use the host CPU's L3 cache.

Here's a little clarification from the SAP HANA testing that we did with Dave Dumas and others.

Without this L3 fix, running lscpu in a guest did not show any L3.  With this fix it does. 
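On Linux, lscpu takes its cache information from sysfs, where each cache a CPU sees has a directory /sys/devices/system/cpu/cpuN/cache/indexM/ with files such as `level` and `size`. A minimal Python sketch of the same "is there an L3?" check, assuming that sysfs layout; the function names are hypothetical, and the CPU directory is a parameter so the logic can be tried on a copy of the tree.

```python
from pathlib import Path
from typing import Dict, List


def caches_by_level(cpu_dir: Path) -> Dict[int, List[str]]:
    """Map cache level -> sizes listed under cpu_dir/cache/index*/ (e.g. "32K")."""
    levels: Dict[int, List[str]] = {}
    for index in sorted((cpu_dir / "cache").glob("index*")):
        level = int((index / "level").read_text().strip())
        size = (index / "size").read_text().strip()
        levels.setdefault(level, []).append(size)
    return levels


def has_l3(cpu_dir: Path = Path("/sys/devices/system/cpu/cpu0")) -> bool:
    """True if the kernel lists a level-3 cache for this CPU."""
    return 3 in caches_by_level(cpu_dir)
```

Run inside a guest, `has_l3()` should mirror whether lscpu prints an "L3 cache" line. As the rest of this comment explains, a False result only describes the advertised topology; it does not mean the guest is not using the host's L3.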

We ran SAP in a guest where qemu was backed by the host's default 4K pages, by the host's 2-meg hugepages, and by the host's 1-gig hugepages.
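To make the page-size comparison concrete, one can count how many host pages are needed to map the same guest memory at each size; fewer, larger pages mean fewer translations for the TLB to hold. A small Python sketch of that arithmetic, assuming a hypothetical 512 GiB guest (the guest memory size used in these runs is not stated here).

```python
# Backing page sizes mentioned above, in bytes.
PAGE_SIZES = {
    "4K": 4 * 1024,
    "2M": 2 * 1024 ** 2,
    "1G": 1024 ** 3,
}


def pages_needed(guest_mem_bytes: int, page_size: int) -> int:
    """Host pages of page_size needed to back guest_mem_bytes, rounded up."""
    return -(-guest_mem_bytes // page_size)


def page_counts(guest_mem_bytes: int) -> dict:
    """Page count for each backing page size."""
    return {name: pages_needed(guest_mem_bytes, size) for name, size in PAGE_SIZES.items()}
```

For a 512 GiB guest, `page_counts(512 * 1024 ** 3)` gives 134,217,728 pages at 4K, 262,144 at 2M, and 512 at 1G.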

This L3 patch caused no change in performance when qemu was backed by the host's 4K pages or 2-meg hugepages. Only when we booted the guest with qemu backed by the host's 1-gig hugepages did we see a performance increase.

The reason this patch showed a performance increase only when the guest's qemu was backed by 1-gig hugepages (and not with 4K or 2-meg pages) was related to reduced TLB misses. With this L3 cache info and 1-gig hugepages, the KVM TLB-miss handling code can better see that two CPUs share the same L3 and reduce the cost of handling a TLB miss.
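Whether two vCPUs appear to share an L3 can be checked from inside a Linux guest: each cache directory in sysfs has a `shared_cpu_list` file listing the CPUs that share that cache, in the usual kernel CPU-list format such as "0-3,8". A minimal Python sketch that parses that format and tests whether two CPUs share one L3; reading the file (for example /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list, assuming index3 is the L3) is left to the caller, and the function names are hypothetical. This only shows the topology the guest sees; it does not model the KVM TLB-miss handling described above.

```python
from typing import Set


def parse_cpu_list(text: str) -> Set[int]:
    """Parse a Linux CPU list such as "0-3,8,10-11" into a set of CPU numbers."""
    cpus: Set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def share_l3(shared_cpu_list: str, cpu_a: int, cpu_b: int) -> bool:
    """True if both CPUs appear in the shared_cpu_list of the same L3 cache."""
    cpus = parse_cpu_list(shared_cpu_list)
    return cpu_a in cpus and cpu_b in cpus
```

For instance, `share_l3("0-3\n", 1, 2)` is True, while `share_l3("0-3\n", 1, 4)` is False.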

If anyone says they need this L3 patch because they want their guests to be able to use the host CPU's L3 cache for better performance, that's mistaken. Their guests are already using L3 even though lscpu doesn't show it.

Joe

Comment 18 Joe Mario 2017-07-06 19:34:04 UTC
Steve:
 I forgot to answer this part of your question:

 > Do we have numbers on what the performance was versus the "passthrough" 
 > L3 cache that was presented in older versions of RHEL (7.2)?

We have no numbers. Although we experimented with both "passthrough" and "cpu-exact" on both 7.2 and 7.3 (where passthrough outperformed cpu-exact), we saw no significant performance gain between the two.

Understand that we were also experimenting with our XML files in those early days, but again, L3 only helped us once we had that L3 patch and booted the guest backed by 1-gig hugepages.

Comment 20 errata-xmlrpc 2017-08-02 03:39:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392