Bug 1507027 - [ESXi][RHEL7.6]x86/vmware: Add paravirt sched clock
Summary: [ESXi][RHEL7.6]x86/vmware: Add paravirt sched clock
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel   
Version: 7.6
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Vitaly Kuznetsov
QA Contact: ldu
Docs Contact: Jiri Herrmann
URL:
Whiteboard:
Keywords: FutureFeature
Depends On:
Blocks:
 
Reported: 2017-10-27 12:40 UTC by Daniele
Modified: 2018-10-30 08:21 UTC (History)

Fixed In Version: kernel-3.10.0-883.el7
Doc Type: Release Note
Doc Text:
Paravirtualized clock added to Red Hat Enterprise Linux VMs
With this update, the paravirtualized `sched_clock()` function has been integrated into the Red Hat Enterprise Linux kernel. This improves the performance of Red Hat Enterprise Linux virtual machines (VMs) running on VMware hypervisors. Note that the function is enabled by default. To disable it, add the "no-vmw-sched-clock" option to the kernel command line.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-10-30 08:19:58 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers
Tracker                  ID              Priority  Status  Summary  Last Updated
Red Hat Product Errata   RHSA-2018:3083  None      None    None     2018-10-30 08:21 UTC

Description Daniele 2017-10-27 12:40:24 UTC
TAM customer Motorola is asking for some upstream patches to be ported to the RHEL kernel to improve compatibility with VMware and avoid problems in the future.

This one specifically is a port request for:
https://patchwork.kernel.org/patch/9404895/

Comment 6 ldu 2018-04-16 02:59:25 UTC
Hi Vitaly,
I tried LKP on RHEL, but RHEL is not a supported system for LKP.
Instead, I ran another performance tool, unixbench, on the guest.
The test results show that this update brought some performance improvement.
The detailed test results are below:

Kernel without the sched patch:
=======================================================================
   BYTE UNIX Benchmarks (Version 5.1.3)

   System: bootp-73-199-91.lab.eng.pek2.redhat.com: GNU/Linux
   OS: GNU/Linux -- 3.10.0-862.el7.x86_64 -- #1 SMP Wed Mar 21 18:14:51 EDT 2018
   Machine: x86_64 (x86_64)
   Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
   CPU 0: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (4400.0 bogomips)
          x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   CPU 1: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (4400.0 bogomips)
          x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   10:34:15 up 16:42,  2 users,  load average: 0.18, 0.12, 0.09; runlevel 3

------------------------------------------------------------------------
Benchmark Run: Fri Apr 13 2018 10:34:15 - 11:02:19
2 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables       29005066.9 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     4330.7 MWIPS (9.8 s, 7 samples)
Execl Throughput                               2525.3 lps   (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        491421.2 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          129891.5 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       1609588.0 KBps  (30.0 s, 2 samples)
Pipe Throughput                              664258.2 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 136408.7 lps   (10.0 s, 7 samples)
Process Creation                               7613.7 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   5460.2 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   1048.3 lpm   (60.0 s, 2 samples)
System Call Overhead                         618401.1 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   29005066.9   2485.4
Double-Precision Whetstone                       55.0       4330.7    787.4
Execl Throughput                                 43.0       2525.3    587.3
File Copy 1024 bufsize 2000 maxblocks          3960.0     491421.2   1241.0
File Copy 256 bufsize 500 maxblocks            1655.0     129891.5    784.8
File Copy 4096 bufsize 8000 maxblocks          5800.0    1609588.0   2775.2
Pipe Throughput                               12440.0     664258.2    534.0
Pipe-based Context Switching                   4000.0     136408.7    341.0
Process Creation                                126.0       7613.7    604.3
Shell Scripts (1 concurrent)                     42.4       5460.2   1287.8
Shell Scripts (8 concurrent)                      6.0       1048.3   1747.2
System Call Overhead                          15000.0     618401.1    412.3
                                                                   ========
System Benchmarks Index Score                                         908.7

------------------------------------------------------------------------
Benchmark Run: Fri Apr 13 2018 11:02:19 - 11:30:24
2 CPUs in system; running 2 parallel copies of tests

Dhrystone 2 using register variables       57995632.4 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     8648.0 MWIPS (9.8 s, 7 samples)
Execl Throughput                               4966.4 lps   (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        874465.9 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          227793.2 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       2870742.7 KBps  (30.0 s, 2 samples)
Pipe Throughput                             1325348.0 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 270309.4 lps   (10.0 s, 7 samples)
Process Creation                              16341.7 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   7689.6 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   1087.2 lpm   (60.0 s, 2 samples)
System Call Overhead                        1161983.8 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   57995632.4   4969.6
Double-Precision Whetstone                       55.0       8648.0   1572.4
Execl Throughput                                 43.0       4966.4   1155.0
File Copy 1024 bufsize 2000 maxblocks          3960.0     874465.9   2208.2
File Copy 256 bufsize 500 maxblocks            1655.0     227793.2   1376.4
File Copy 4096 bufsize 8000 maxblocks          5800.0    2870742.7   4949.6
Pipe Throughput                               12440.0    1325348.0   1065.4
Pipe-based Context Switching                   4000.0     270309.4    675.8
Process Creation                                126.0      16341.7   1297.0
Shell Scripts (1 concurrent)                     42.4       7689.6   1813.6
Shell Scripts (8 concurrent)                      6.0       1087.2   1811.9
System Call Overhead                          15000.0    1161983.8    774.7
                                                                   ========
System Benchmarks Index Score                                        1618.3

################################################################################

Kernel with the sched patch:

Pipe Throughput                              669672.4 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 145046.6 lps   (10.0 s, 7 samples)
Process Creation                               7889.8 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   5320.7 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   1039.9 lpm   (60.0 s, 2 samples)
System Call Overhead                         618191.2 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   29015828.0   2486.4
Double-Precision Whetstone                       55.0       4326.6    786.7
Execl Throughput                                 43.0       2497.3    580.8
File Copy 1024 bufsize 2000 maxblocks          3960.0     491846.6   1242.0
File Copy 256 bufsize 500 maxblocks            1655.0     130984.0    791.4
File Copy 4096 bufsize 8000 maxblocks          5800.0    1630917.6   2811.9
Pipe Throughput                               12440.0     669672.4    538.3
Pipe-based Context Switching                   4000.0     145046.6    362.6
Process Creation                                126.0       7889.8    626.2
Shell Scripts (1 concurrent)                     42.4       5320.7   1254.9
Shell Scripts (8 concurrent)                      6.0       1039.9   1733.2
System Call Overhead                          15000.0     618191.2    412.1
                                                                   ========
System Benchmarks Index Score                                         914.9

------------------------------------------------------------------------
Benchmark Run: Fri Apr 13 2018 14:38:17 - 15:06:21
2 CPUs in system; running 2 parallel copies of tests

Dhrystone 2 using register variables       58034919.9 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     8632.3 MWIPS (9.8 s, 7 samples)
Execl Throughput                               4786.9 lps   (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        874006.4 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          229655.3 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       2960246.9 KBps  (30.0 s, 2 samples)
Pipe Throughput                             1325788.9 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 279446.8 lps   (10.0 s, 7 samples)
Process Creation                              16720.9 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   7744.1 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   1088.8 lpm   (60.0 s, 2 samples)
System Call Overhead                        1169166.7 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   58034919.9   4973.0
Double-Precision Whetstone                       55.0       8632.3   1569.5
Execl Throughput                                 43.0       4786.9   1113.2
File Copy 1024 bufsize 2000 maxblocks          3960.0     874006.4   2207.1
File Copy 256 bufsize 500 maxblocks            1655.0     229655.3   1387.6
File Copy 4096 bufsize 8000 maxblocks          5800.0    2960246.9   5103.9
Pipe Throughput                               12440.0    1325788.9   1065.7
Pipe-based Context Switching                   4000.0     279446.8    698.6
Process Creation                                126.0      16720.9   1327.1
Shell Scripts (1 concurrent)                     42.4       7744.1   1826.4
Shell Scripts (8 concurrent)                      6.0       1088.8   1814.7
System Call Overhead                          15000.0    1169166.7    779.4
                                                                   ========
System Benchmarks Index Score                                        1628.0



======= Script description and score comparison completed! =======
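To put a number on "some performance improvement", the index values reported above can be compared directly. The following is a minimal sketch, not part of the original test run: the helper name `percent_change` and the choice of which rows to compare are illustrative assumptions, and the figures are copied from the unixbench output in this comment.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (unpatched, patched) index values copied from the results above.
# The selection of rows is an illustrative choice, not the tester's.
scores = {
    "System Benchmarks Index Score, 1 copy": (908.7, 914.9),
    "System Benchmarks Index Score, 2 copies": (1618.3, 1628.0),
    "Pipe-based Context Switching index, 1 copy": (341.0, 362.6),
    "Pipe-based Context Switching index, 2 copies": (675.8, 698.6),
}

for label, (before, after) in scores.items():
    change = percent_change(before, after)
    print(f"{label}: {before} -> {after} ({change:+.2f}%)")
```

By this calculation the overall index score moves by less than 1% in both runs, while the pipe-based context switching index gains a few percent. With a single run per kernel, differences this small may be within run-to-run variation.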

Comment 7 Vitaly Kuznetsov 2018-04-16 08:55:54 UTC
(In reply to ldu from comment #6)
> Hi Vitaly,
> I tried LKP on RHEL, but RHEL is not a supported system for LKP.
> Instead, I ran another performance tool, unixbench, on the guest.
> The test results show that this update brought some performance improvement.

Thank you Lily, I'll go ahead with the patchset.

Comment 9 Bruno Meneguele 2018-05-04 22:02:08 UTC
Patch(es) committed on kernel repository and an interim kernel build is undergoing testing

Comment 11 Bruno Meneguele 2018-05-07 20:10:55 UTC
Patch(es) available on kernel-3.10.0-883.el7

Comment 13 ldu 2018-07-18 06:42:51 UTC
Verified this bug on RHEL 7.6 with kernel-3.10.0-915.el7.
The dmesg output shows the "sched"-related log message:
[root@bootp-73-199-225 ~]# dmesg |grep vmware
[    0.000000] vmware: TSC freq read from hypervisor : 2199.998 MHz
[    0.000000] vmware: Host bus clock speed read from hypervisor : 66000000 Hz
[    0.000000] vmware: using sched offset of 7509058513 ns
 
Performance also appears slightly improved when measured with unixbench.

Changing the status to VERIFIED.
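The dmesg check above could be scripted along these lines. This is only a sketch: the function name `find_sched_offset_ns` is hypothetical, and the pattern assumes the message format shown in the log above ("vmware: using sched offset of N ns"), which may differ on other kernel versions.

```python
import re
from typing import Optional

# Pattern assumed from the dmesg line quoted in this comment.
SCHED_OFFSET_RE = re.compile(r"vmware: using sched offset of (\d+) ns")


def find_sched_offset_ns(dmesg_text: str) -> Optional[int]:
    """Return the sched offset in nanoseconds, or None if no such line exists."""
    for line in dmesg_text.splitlines():
        match = SCHED_OFFSET_RE.search(line)
        if match:
            return int(match.group(1))
    return None


# Sample input copied from the dmesg output above.
sample = """\
[    0.000000] vmware: TSC freq read from hypervisor : 2199.998 MHz
[    0.000000] vmware: Host bus clock speed read from hypervisor : 66000000 Hz
[    0.000000] vmware: using sched offset of 7509058513 ns
"""

offset = find_sched_offset_ns(sample)
if offset is None:
    print("no sched offset line found")
else:
    print(f"sched offset: {offset} ns")
```

On a live guest, the text passed in would be the output of `dmesg`. A missing line only means that this message was not logged; it is not by itself proof of how the clock is configured.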

Comment 15 errata-xmlrpc 2018-10-30 08:19:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3083

