Bug 562037

Summary: cpu performance is not improved with advanced cpu flag
Product: Red Hat Enterprise Linux 5 Reporter: Suqin Huang <shuang>
Component: kvm Assignee: john cooper <john.cooper>
Status: CLOSED NOTABUG QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: low    
Version: 5.5 CC: acathrow, cpelland, dshaks, john.cooper, llim, mwagner, nobody, shuang, tburke, virt-maint, ykaul
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2011-01-13 19:51:50 UTC Type: ---
Bug Depends On:    
Bug Blocks: 580948    
Attachments:
Description                      Flags
cpu benchmark test result        none
sandra muti-media test report    none
have some improvement            none
without sse4.1, sse4.2           none

Description Suqin Huang 2010-02-05 03:48:16 UTC
Created attachment 388956
cpu benchmark test result

Description of problem:
Per https://bugzilla.redhat.com/show_bug.cgi?id=518090#c29, this bug was created to track VM CPU performance.

Version-Release number of selected component (if applicable):
kvm-83-155.el5

How reproducible:


Steps to Reproduce:
1. Boot the guest with each of the following -cpu parameters and compare performance.
Intel:
-cpu qemu64,+sse2,+ssse3,+sse4_1,+sse4_2,+popcnt 
-cpu qemu64,+sse2,+ssse3

AMD:
-cpu qemu64,+sse2,+cx16,+sse4a,+misalignsse,+popcnt,+abm
-cpu qemu64,+sse2,+cx16, RDTSCP 

RHEL system:
Ran phoronix-test-suite: cpu test suite, java test suite, encoding test suite.
There is no obvious difference on the VM between running with and without sse4.2.
The test-case results from physical box testing show that performance with sse4.2 is much better than without sse4.2.

Windows:
x264: little improvement with sse4.2; some improvement with sse4a.

sandra multi-media: no obvious improvement with sse4.2, but the result report
shows that sse4.1 instructions are used.

For details, please refer to the attachment.
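
To make explicit which flags the Intel comparison actually isolates, here is a minimal Python sketch that diffs the two -cpu values from step 1. The helper names (parse_cpu_arg, flags_under_test) are made up for illustration and are not part of qemu or kvm. The AMD baseline line is left out because its last token is ambiguous as written.

```python
def parse_cpu_arg(arg):
    """Split a qemu -cpu value into (model, set of '+'-enabled flags)."""
    model, *features = [part.strip() for part in arg.split(",")]
    enabled = {f[1:] for f in features if f.startswith("+")}
    return model, enabled


def flags_under_test(advanced, baseline):
    """Return the flags enabled in 'advanced' but not in 'baseline'."""
    model_a, flags_a = parse_cpu_arg(advanced)
    model_b, flags_b = parse_cpu_arg(baseline)
    if model_a != model_b:
        raise ValueError("base CPU models differ")
    return sorted(flags_a - flags_b)


# The two Intel -cpu values quoted in step 1.
intel_advanced = "qemu64,+sse2,+ssse3,+sse4_1,+sse4_2,+popcnt"
intel_baseline = "qemu64,+sse2,+ssse3"

print(flags_under_test(intel_advanced, intel_baseline))
# ['popcnt', 'sse4_1', 'sse4_2']
```

A benchmark whose hot paths do not use any of these three features would not be expected to differ between the two guests.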


2.
3.
  
Actual results:


Expected results:


Additional info:
Intel: 

host:
flags	: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc ida nonstop_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm

processor	: 7
vendor_id	: GenuineIntel
cpu family	: 6
model		: 26
model name	: Intel(R) Core(TM) i7 CPU         920  @ 2.67GHz


AMD:
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc nonstop_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw

processor	: 3
vendor_id	: AuthenticAMD
cpu family	: 16
model		: 2
model name	: Quad-Core AMD Opteron(tm) Processor 1352

Comment 1 Suqin Huang 2010-02-05 04:57:29 UTC
Created attachment 388967
sandra muti-media test report

Comment 2 Suqin Huang 2010-02-05 05:07:59 UTC
Mark,
Do you have any suggestions for the performance benchmark testing?

Dor,
Do you have any suggestions regarding the test results?

Comment 3 Dor Laor 2010-02-09 09:41:25 UTC
The differences on physical machines might be bigger since it is not only the flags that were changed. Can you post the /proc/cpuinfo of the guest? Since sandra sees sse4_2 it does work, not sure if it is a real bug.

Maybe there is an sse4_2 unit test that can provide better results?

Comment 5 john cooper 2010-02-10 16:03:37 UTC
(In reply to comment #3)
> The differences on physical machines might be bigger since it is not only the
> flags that were changed. Can you post the /proc/cpuinfo of the guest?

Please include an "x86info -a -f" dump in addition to /proc/cpuinfo
as reading the raw cpuid data is the best confirmation of the flag
state.

> Since sandra sees sse4_2 it does work, not sure if it is a real bug.

Yea, seems to be.  Just to validate that assumption how does the
Sandra Benchmark react if an sse4.2 instruction isn't discovered?
Might be useful to attach a log of that scenario here for reference.

Comment 6 Suqin Huang 2010-02-21 09:10:59 UTC
sandra benchmark is tested on windows, can not cat "x86info -a -f" info.
get cpu flag with CPUID: sse sse2 sse3 ssse3 sse4.1 sse4.2 VT

Comment 10 Yaniv Kaul 2010-03-08 09:23:48 UTC
(In reply to comment #6)
> sandra benchmark is tested on windows, can not cat "x86info -a -f" info.
> get cpu flag with CPUID: sse sse2 sse3 ssse3 sse4.1 sse4.2 VT    

So, using the exact same command line, just boot it from a Linux Live CD.
I'm pretty sure that, for a start, the flags won't change.
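
Once a Linux Live CD is booted with the same command line, the flag check could be scripted roughly as below. This is only a sketch: the helper names and the WANTED list are assumptions chosen for this bug, and it reads the "flags" line of /proc/cpuinfo rather than raw cpuid data.

```python
def cpuinfo_flags(text):
    """Return the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(text, wanted):
    """Return the wanted flags that the cpuinfo text does not list."""
    return sorted(set(wanted) - cpuinfo_flags(text))


# Flags the "advanced" Intel -cpu line is supposed to add (assumed list).
WANTED = ["sse4_1", "sse4_2", "popcnt"]

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            guest = f.read()
    except OSError:
        print("no /proc/cpuinfo here (not a Linux guest?)")
    else:
        print("missing:", missing_flags(guest, WANTED))
```

An empty "missing" list in the guest would suggest the flags are exposed, which still says nothing about whether a given benchmark uses them.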

Comment 12 john cooper 2010-03-08 18:16:28 UTC
[Forgot to set the "need additional info" request in previous comment.]

Comment 19 Suqin Huang 2010-11-26 10:41:58 UTC
1. host:
kernel:
2.6.18-232.el5

kvm:
kvm-83-215.el5

cpu:

processor	: 7
vendor_id	: GenuineIntel
cpu family	: 6
model		: 26
model name	: Intel(R) Core(TM) i7 CPU         920  @ 2.67GHz

flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc ida nonstop_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm

2. guest
Windows 2008 R2

Sandra:
http://download1us.softpedia.com/dl/bd9271c3262ac3049ef0134f3f538b57/4cee0a6d/100005280/software/system/info/san1720.exe

3. x86info 

boot with sse4.1,sse4.2
x86info v1.21.  Dave Jones 2001-2007
Feedback to <davej>.

Found 2 CPUs
--------------------------------------------------------------------------
CPU #1
Family: 6 Model: 6 Stepping: 3 Type: 0 Brand: 0
CPU Model: Celeron / Mobile Pentium II Original OEM
Feature flags:
 fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflsh mmx fxsr sse sse2
Extended feature flags:
 sse3 [19] [20] [31]
 [0] [2] [3] [4] [5] [6] [7] [8] [9] SYSCALL [13] [15] [16] xd [23] [24] em64t
Cache info
 L1 Instruction cache: 32KB, 8-way associative. 64 byte line size.
 L1 Data cache: 32KB, 8-way associative. 64 byte line size.
 L2 unified cache: 2MB, sectored, 8-way associative. 64 byte line size.
TLB info
--------------------------------------------------------------------------
CPU #2
Family: 6 Model: 6 Stepping: 3 Type: 0 Brand: 0
CPU Model: Celeron / Mobile Pentium II Original OEM
Feature flags:
 fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflsh mmx fxsr sse sse2
Extended feature flags:
 sse3 [19] [20] [31]
 [0] [2] [3] [4] [5] [6] [7] [8] [9] SYSCALL [13] [15] [16] xd [23] [24] em64t
Cache info
 L1 Instruction cache: 32KB, 8-way associative. 64 byte line size.
 L1 Data cache: 32KB, 8-way associative. 64 byte line size.
 L2 unified cache: 2MB, sectored, 8-way associative. 64 byte line size.
TLB info
--------------------------------------------------------------------------
WARNING: Detected SMP, but unable to access cpuid driver.
Used Uniprocessor CPU routines. Results inaccurate.
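
The bracketed numbers in the dump above can be read as bit positions, assuming the first "Extended feature flags" line is CPUID leaf 1 ECX and that this x86info version prints a bare index for bits it has no name for. Under that assumption, the following Python sketch maps them to names; the table covers only the ECX bits relevant to this bug, and name_ecx_tokens is a hypothetical helper.

```python
# CPUID leaf 1, ECX register: bit positions for the flags discussed in this bug.
CPUID_1_ECX = {
    0: "sse3",
    9: "ssse3",
    13: "cx16",
    19: "sse4_1",
    20: "sse4_2",
    23: "popcnt",
    31: "hypervisor",
}


def name_ecx_tokens(line):
    """Replace bracketed bit numbers such as '[19]' with a flag name when known."""
    out = []
    for token in line.split():
        inner = token[1:-1]
        if token.startswith("[") and token.endswith("]") and inner.isdigit():
            # Unknown bits are kept in their original bracketed form.
            out.append(CPUID_1_ECX.get(int(inner), token))
        else:
            out.append(token)
    return out


print(name_ecx_tokens("sse3 [19] [20] [31]"))
# ['sse3', 'sse4_1', 'sse4_2', 'hypervisor']
```

If that reading is right, the dump is consistent with sse4.1 and sse4.2 being exposed to the guest.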

Comment 20 Suqin Huang 2010-11-26 10:43:30 UTC
Created attachment 463057
have some improvement

Comment 21 Suqin Huang 2010-11-26 10:44:15 UTC
Created attachment 463058
without sse4.1, sse4.2

Comment 24 RHEL Program Management 2011-01-11 20:37:22 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated in the
current release, Red Hat is unfortunately unable to address this
request at this time. Red Hat invites you to ask your support
representative to propose this request, if appropriate and relevant,
in the next release of Red Hat Enterprise Linux.

Comment 25 RHEL Program Management 2011-01-11 22:49:30 UTC
This request was erroneously denied for the current release of
Red Hat Enterprise Linux.  The error has been fixed and this
request has been re-proposed for the current release.

Comment 26 john cooper 2011-01-13 19:51:50 UTC
Looks like a benchmark issue, citing comments/attachments #20/#21
above, which indicate a ~50% improvement in the case of +sse4_1,+sse4_2.