Currently we are not able to expose some of the newer CPU flags to the virtual machines. Two examples are sse4.1 and sse4.2, which we currently cannot pass to the guest. At the very least, sse4.1 and sse4.2 should be supported, but we should also research the other available flags, including the option of backporting -cpu host.
Not all CPUs that are capable of hardware virtualization (vmx, svm) support the SSE4.1 and SSE4.2 instructions. Migration will become a problem if these instructions are exposed to guests without any checking by the management layers.
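As an illustration of the kind of check a management layer would need before migrating a guest, here is a minimal Python sketch: a guest flag is only safe to expose if every host in the migration pool has it. It assumes each host's flags are taken from the "flags" line of /proc/cpuinfo; the helper names and the two abbreviated host flag lines in the usage are hypothetical examples, not output from the machines in this bug.

```python
def parse_cpuinfo_flags(text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def migration_safe_flags(host_cpuinfo: dict[str, str]) -> set[str]:
    """Flags present on every host in the pool (empty if the pool is empty)."""
    flag_sets = [parse_cpuinfo_flags(text) for text in host_cpuinfo.values()]
    return set.intersection(*flag_sets) if flag_sets else set()


def unsafe_requests(requested: set[str], host_cpuinfo: dict[str, str]) -> set[str]:
    """Requested guest flags that at least one host in the pool lacks."""
    return requested - migration_safe_flags(host_cpuinfo)


if __name__ == "__main__":
    # Hypothetical, abbreviated flag lines for a two-host pool.
    pool = {
        "intel-host": "flags : fpu sse sse2 ssse3 sse4_1 sse4_2 popcnt vmx",
        "amd-host": "flags : fpu sse sse2 pni cx16 popcnt abm sse4a svm",
    }
    print(sorted(unsafe_requests({"sse2", "sse4_1", "popcnt"}, pool)))  # ['sse4_1']
```

Taking the intersection is the simplest possible policy; a real management layer would also have to account for CPU models, vendors, and flags that the hypervisor itself cannot virtualize.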
*** Bug 518337 has been marked as a duplicate of this bug. ***
Cannot export the sse4 CPU flag to the guest:

1. Host CPU flags:

[root@s157 ~]# cat /proc/cpuinfo | grep flag
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc ida nonstop_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm

2. Command:

/usr/libexec/qemu-kvm -m 2G -smp 2 -drive file=rhel5.4-32.raw,if=ide,cache=off -net nic,model=e1000,vlan=1,macaddr=DE:AD:BE:EF:17:19 -net tap,vlan=1,script=/etc/qemu-ifup -no-hpet -rtc-td-hack -no-kvm-pit-reinjection -monitor stdio -notify all -cpu qemu64,+sse2,+sse4.1 -balloon none -vnc :1

3. Guest CPU flags:

[root@localhost x86info-1.25]# ./x86info -f
x86info v1.25. Dave Jones 2001-2009
Feedback to <davej>.

Found 2 CPUs, but found 16d CPUs in MPTable.
--------------------------------------------------------------------------
CPU #1
EFamily: 0 EModel: 0 Family: 6 Model: 6 Stepping: 3
CPU Model: Celeron / Mobile Pentium II
Type: 0 (Original OEM) Brand: 0 (Unsupported)
Number of cores per physical package=1
Number of logical processors per socket=1
Number of logical processors per core=1
APIC ID: 0x0 Package: 0 Core: 0 SMT ID 0
Feature flags:
fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflsh mmx
fxsr sse sse2
Extended feature flags:
sse3 [19] [31]
--------------------------------------------------------------------------
WARNING: Detected SMP, but unable to access cpuid driver.
Used Uniprocessor CPU routines. Results inaccurate.

[root@localhost x86info-1.25]# cat /proc/cpuinfo
flags : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx lm pni

4. Guest: rhel5.4-32

5. KVM:

[root@s157 ~]# rpm -qa | grep kvm
kvm-83-149.el5
etherboot-zroms-kvm-5.4.4-13.el5
kvm-qemu-img-83-149.el5
etherboot-roms-kvm-5.4.4-13.el5
kmod-kvm-83-149.el5
kvm-tools-83-149.el5
kvm-debuginfo-83-149.el5
[root@s157 ~]# uname -r
2.6.18-185.el5
(In reply to comment #13)
> can not export sse4 cpu flag to guest:
:
> 3. guest cpu flag:
>
> [root@localhost x86info-1.25]# ./x86info -f
> x86info v1.25. Dave Jones 2001-2009
> Feedback to <davej>.
>
> Found 2 CPUs, but found 16d CPUs in MPTable.
> --------------------------------------------------------------------------
> CPU #1
> EFamily: 0 EModel: 0 Family: 6 Model: 6 Stepping: 3
> CPU Model: Celeron / Mobile Pentium II
> Type: 0 (Original OEM) Brand: 0 (Unsupported)
> Number of cores per physical package=1
> Number of logical processors per socket=1
> Number of logical processors per core=1
> APIC ID: 0x0 Package: 0 Core: 0 SMT ID 0
> Feature flags:
> fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflsh mmx
> fxsr sse sse2
> Extended feature flags:
> sse3 [19] [31]

It looks like this version of x86info isn't naming the sse4.1 CPUID bit, which is the "Extended feature flags" bit [19] above. So it appears to be exported correctly to the guest. Note that x86info will display the raw data when invoked with the "-a -f" flags. You should see something like:

eax in: 0x00000001, eax = 00010676 ebx = 00020800 ecx = 0008e3fd edx = bfebfbff

where, for CPUID input (eax in) 0x1, bit 19 of ecx (ecx & (1 << 19)) is set, corresponding to sse4.1. Refer to the Intel and AMD CPUID specifications for an exhaustive definition of the bit encodings.
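The bit test described above can be sketched in Python. This is a minimal illustration, not x86info's own code: the name table covers only a few CPUID leaf 0x1 ECX bits (positions per the Intel CPUID documentation; "hypervisor" for bit 31 follows the Linux flag name), and unnamed bits are shown as "[n]" in the same style x86info prints them. decode_leaf1_ecx and has_sse41 are hypothetical helper names.

```python
# Partial map of CPUID leaf 0x1 ECX bit positions to feature names.
LEAF1_ECX_NAMES = {
    0: "sse3",
    3: "monitor",
    5: "vmx",
    9: "ssse3",
    13: "cx16",
    19: "sse4.1",
    20: "sse4.2",
    23: "popcnt",
    31: "hypervisor",
}


def decode_leaf1_ecx(ecx: int) -> list[str]:
    """List the set bits of ecx, by name where known, else as '[bit]'."""
    return [LEAF1_ECX_NAMES.get(bit, f"[{bit}]") for bit in range(32) if ecx & (1 << bit)]


def has_sse41(ecx: int) -> bool:
    """True if bit 19 (SSE4.1) is set in leaf 0x1 ECX."""
    return bool(ecx & (1 << 19))


if __name__ == "__main__":
    print(has_sse41(0x0008E3FD))  # True: the example ecx value quoted above
    # Guest ecx value from the later Intel-host test in this bug.
    print(decode_leaf1_ecx(0x80980221))
    # ['sse3', 'vmx', 'ssse3', 'sse4.1', 'sse4.2', 'popcnt', 'hypervisor']
```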
Or use RHEL 5.5 as the guest and check the flags with cat /proc/cpuinfo?
Moving back to ON_QA so that testing can be done following the instructions in comment #16.
I used /proc/cpuinfo and x86info to show the CPU flags in the RHEL guest, and CPU-Z to show them in the Windows guest. Neither /proc/cpuinfo nor x86info can confirm that a CPU flag actually takes effect.

1. Intel host:

CLI:
/usr/libexec/qemu-kvm -smp 2 -m 2G -drive file=/mnt/images/rhel5.4-64-virtio.bak,if=ide -net nic,vlan=0,macaddr=00:2a:4a:01:00:67,model=virtio -net tap,vlan=0,script=/etc/qemu-ifup -uuid `uuidgen` -no-hpet -usbdevice tablet -rtc-td-hack -startdate now -cpu qemu64,+sse2,+sse4.1,+sse4.2,+popcnt,+ssse3,+vmx -monitor stdio -boot c -vnc :8

windows08-R2-64:
CPUID: sse sse2 sse3 ssse3 sse4.1 sse4.2 VT

rhel5.5:

1. # cat /proc/cpuinfo
flags: fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx lm pni vmx ssse3 sse4_1 sse4_2 popcnt

2. # x86info -a -f
CPU #4
EFamily: 0 EModel: 0 Family: 6 Model: 6 Stepping: 3
CPU Model: Celeron / Mobile Pentium II
Type: 0 (Original OEM) Brand: 0 (Unsupported)
Number of reporting banks : 0
Erk, MCG_CTL not present! :0000000000000000:
Number of cores per physical package=1
Number of logical processors per socket=1
Number of logical processors per core=1
APIC ID: 0x3 Package: 0 Core: 0 SMT ID 0
eax in: 0x00000000, eax = 00000004 ebx = 756e6547 ecx = 6c65746e edx = 49656e69
eax in: 0x00000001, eax = 00000663 ebx = 03000800 ecx = 80980221 edx = 078bfbfd
eax in: 0x00000002, eax = 00000001 ebx = 00000000 ecx = 00000000 edx = 002c307d
eax in: 0x00000003, eax = 00000000 ebx = 00000000 ecx = 00000000 edx = 00000000
eax in: 0x00000004, eax = 00000121 ebx = 01c0003f ecx = 0000003f edx = 00000001
eax in: 0x80000000, eax = 8000000a ebx = 68747541 ecx = 444d4163 edx = 69746e65
eax in: 0x80000001, eax = 078bfbfd ebx = 00000000 ecx = 00000000 edx = 2191abfd
eax in: 0x80000002, eax = 554d4551 ebx = 72695620 ecx = 6c617574 edx = 55504320
eax in: 0x80000003, eax = 72657620 ebx = 6e6f6973 ecx = 392e3020 edx = 0000312e
eax in: 0x80000004, eax = 00000000 ebx = 00000000 ecx = 00000000 edx = 00000000
eax in: 0x80000005, eax = 01ff01ff ebx = 01ff01ff ecx = 40020140 edx = 40020140
eax in: 0x80000006, eax = 00000000 ebx = 42004200 ecx = 02008140 edx = 00000000
eax in: 0x80000007, eax = 00000000 ebx = 00000000 ecx = 00000000 edx = 00000000
eax in: 0x80000008, eax = 00003028 ebx = 00000000 ecx = 00000000 edx = 00000000
eax in: 0x80000009, eax = 00000000 ebx = 00000000 ecx = 00000000 edx = 00000000
eax in: 0x8000000a, eax = 00000001 ebx = 00000010 ecx = 00000000 edx = 00000000
Cache info
L1 Instruction cache: 32KB, 8-way associative. 64 byte line size.
L1 Data cache: 32KB, 8-way associative. 64 byte line size.
L2 cache: 2MB, 8-way associative. 64 byte line size.
TLB info
Feature flags:
fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflsh mmx fxsr sse sse2
Extended feature flags:
sse3 vmx ssse3 [19] [20] [23] [31]
Connector type: Socket 370 (370 Pin PGA)

MTRR registers:
MTRRcap (0xfe): 0x0000000000000508
MTRRphysBase0 (0x200): 0x00000000c0000000
MTRRphysMask0 (0x201): 0xffffffffe0000800
MTRRphysBase1 (0x202): 0x0000000000000000
MTRRphysMask1 (0x203): 0x0000000000000000
MTRRphysBase2 (0x204): 0x0000000000000000
MTRRphysMask2 (0x205): 0x0000000000000000
MTRRphysBase3 (0x206): 0x0000000000000000
MTRRphysMask3 (0x207): 0x0000000000000000
MTRRphysBase4 (0x208): 0x0000000000000000
MTRRphysMask4 (0x209): 0x0000000000000000
MTRRphysBase5 (0x20a): 0x0000000000000000
MTRRphysMask5 (0x20b): 0x0000000000000000
MTRRphysBase6 (0x20c): 0x0000000000000000
MTRRphysMask6 (0x20d): 0x0000000000000000
MTRRphysBase7 (0x20e): 0x0000000000000000
MTRRphysMask7 (0x20f): 0x0000000000000000
MTRRfix64K_00000 (0x250): 0x0606060606060606
MTRRfix16K_80000 (0x258): 0x0606060606060606
MTRRfix16K_A0000 (0x259): 0x0000000000000000
MTRRfix4K_C8000 (0x269): 0x0000000000000000
MTRRfix4K_D0000 0x26a: 0x0000000000000000
MTRRfix4K_D8000 0x26b: 0x0000000000000000
MTRRfix4K_E0000 0x26c: 0x0000000000000000
MTRRfix4K_E8000 0x26d: 0x0000000000000000
MTRRfix4K_F0000 0x26e: 0x0000000000000000
MTRRfix4K_F8000 0x26f: 0x0000000000000000
MTRRdefType (0x2ff): 0x0000000000000c06

2.65GHz processor (estimate).
AMD host:

/usr/libexec/qemu-kvm -smp 2 -m 2G -drive file=/mnt/images/rhel5.4-64-virtio.bak,media=disk,if=ide,cache=off,index=0,serial=fb-bde1-8bcf10f72b98 -net nic,vlan=0,macaddr=00:2a:4a:01:00:67,model=virtio -net tap,vlan=0,script=/etc/qemu-ifup -uuid `uuidgen` -no-hpet -usbdevice tablet -rtc-td-hack -startdate now -cpu qemu64,+sse2,+cx16,+sse4a,+misalignsse,+popcnt,+abm -monitor stdio -boot c -vnc :8

windows08-R2-64:
cpu-z: sse (1,2,3,4a), x86-64

rhel5.5:

# cat /proc/cpuinfo (does not show misalignsse)
flags: fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx lm pni cx16 popcnt altmovcr8 abm sse4a

# x86info -a -f
EFamily: 0 EModel: 0 Family: 6 Model: 6 Stepping: 3
CPU Model: Athlon XP (Palomino)
MSR: 0x0000002a=0x00000000 : 00000000 00000000 00000000 00000000
MSR: 0xc0000080=0x00000d01 : 00000000 00000000 00001101 00000001
MSR: 0xc0010010=0x00000000 : 00000000 00000000 00000000 00000000
MSR: 0xc0010015=0x00000000 : 00000000 00000000 00000000 00000000
Couldn't read MSR 0xc001001b
Erk, MCG_CTL not present! :0000000000000000:
Number of reporting banks : 0
31 23 15 7
PowerNOW! Technology information
Available features:
None
SVM: revision 1, 16 ASIDs
Address Size: 48 bits virtual, 40 bits physical
eax in: 0x00000000, eax = 00000004 ebx = 68747541 ecx = 444d4163 edx = 69746e65
eax in: 0x00000001, eax = 00000663 ebx = 01000800 ecx = 80802001 edx = 078bfbfd
eax in: 0x00000002, eax = 00000001 ebx = 00000000 ecx = 00000000 edx = 002c307d
eax in: 0x00000003, eax = 00000000 ebx = 00000000 ecx = 00000000 edx = 00000000
eax in: 0x00000004, eax = 00000121 ebx = 01c0003f ecx = 0000003f edx = 00000001
eax in: 0x80000000, eax = 8000000a ebx = 68747541 ecx = 444d4163 edx = 69746e65
eax in: 0x80000001, eax = 078bfbfd ebx = 00000000 ecx = 000000e0 edx = 2191abfd
eax in: 0x80000002, eax = 554d4551 ebx = 72695620 ecx = 6c617574 edx = 55504320
eax in: 0x80000003, eax = 72657620 ebx = 6e6f6973 ecx = 392e3020 edx = 0000312e
eax in: 0x80000004, eax = 00000000 ebx = 00000000 ecx = 00000000 edx = 00000000
eax in: 0x80000005, eax = 01ff01ff ebx = 01ff01ff ecx = 40020140 edx = 40020140
eax in: 0x80000006, eax = 00000000 ebx = 42004200 ecx = 02008140 edx = 00000000
eax in: 0x80000007, eax = 00000000 ebx = 00000000 ecx = 00000000 edx = 00000000
eax in: 0x80000008, eax = 00003028 ebx = 00000000 ecx = 00000000 edx = 00000000
eax in: 0x80000009, eax = 00000000 ebx = 00000000 ecx = 00000000 edx = 00000000
eax in: 0x8000000a, eax = 00000001 ebx = 00000010 ecx = 00000000 edx = 00000000
L1 Data TLB (2M/4M): Direct mapped. 255 entries.
L1 Instruction TLB (2M/4M): Direct mapped. 255 entries.
L1 Data TLB (4K): Direct mapped. 255 entries.
L1 Instruction TLB (4K): Direct mapped. 255 entries.
L1 Data cache: Size: 64Kb 2-way associative. lines per tag=1 line size=64 bytes.
L1 Instruction cache: Size: 64Kb 2-way associative. lines per tag=1 line size=64 bytes.
L2 Data TLB (2M/4M): Disabled. 0 entries.
L2 Instruction TLB (2M/4M): Disabled. 0 entries.
L2 Data TLB (4K): 4-way associative. 512 entries.
L2 Instruction TLB (4K): 4-way associative. 512 entries.
L2 cache: Size: 512Kb 16-way associative. lines per tag=1 line size=64 bytes.
Feature flags:
fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflsh mmx fxsr sse sse2 sse3 cmpxchg16b popcnt
Extended feature flags:
Connector type: Socket A (462 Pin PGA)

MTRR registers:
MTRRcap (0xfe): 0x0000000000000508
MTRRphysBase0 (0x200): 0x00000000c0000000
MTRRphysMask0 (0x201): 0xffffffffe0000800
MTRRphysBase1 (0x202): 0x0000000000000000
MTRRphysMask1 (0x203): 0x0000000000000000
MTRRphysBase2 (0x204): 0x0000000000000000
MTRRphysMask2 (0x205): 0x0000000000000000
MTRRphysBase3 (0x206): 0x0000000000000000
MTRRphysMask3 (0x207): 0x0000000000000000
MTRRphysBase4 (0x208): 0x0000000000000000
MTRRphysMask4 (0x209): 0x0000000000000000
MTRRphysBase5 (0x20a): 0x0000000000000000
MTRRphysMask5 (0x20b): 0x0000000000000000
MTRRphysBase6 (0x20c): 0x0000000000000000
MTRRphysMask6 (0x20d): 0x0000000000000000
MTRRphysBase7 (0x20e): 0x0000000000000000
MTRRphysMask7 (0x20f): 0x0000000000000000
MTRRfix64K_00000 (0x250): 0x0606060606060606
MTRRfix16K_80000 (0x258): 0x0606060606060606
MTRRfix16K_A0000 (0x259): 0x0000000000000000
MTRRfix4K_C8000 (0x269): 0x0000000000000000
MTRRfix4K_D0000 0x26a: 0x0000000000000000
MTRRfix4K_D8000 0x26b: 0x0000000000000000
MTRRfix4K_E0000 0x26c: 0x0000000000000000
MTRRfix4K_E8000 0x26d: 0x0000000000000000
MTRRfix4K_F0000 0x26e: 0x0000000000000000
MTRRfix4K_F8000 0x26f: 0x0000000000000000
MTRRdefType (0x2ff): 0x0000000000000c06

2.20GHz processor (estimate).
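To check one particular leaf without reading a whole dump by eye, the raw "eax in:" records can be extracted with a regular expression. This is a minimal Python sketch that assumes the record format printed by x86info v1.25 in these comments (other versions may format it differently); parse_raw_cpuid is a hypothetical helper name. It tolerates a record wrapped across lines and keeps only the first record per leaf, in case the output contains more than one CPU's block.

```python
import re

# One raw CPUID record as printed by `x86info -a` (format assumed from v1.25).
# \s+ and \s* also accept a record that has been wrapped across lines.
RAW_RECORD = re.compile(
    r"eax in:\s*0x([0-9a-fA-F]+),\s*"
    r"eax\s*=\s*([0-9a-fA-F]{8})\s+ebx\s*=\s*([0-9a-fA-F]{8})\s+"
    r"ecx\s*=\s*([0-9a-fA-F]{8})\s+edx\s*=\s*([0-9a-fA-F]{8})"
)


def parse_raw_cpuid(text: str) -> dict[int, dict[str, int]]:
    """Map each CPUID input leaf to its eax/ebx/ecx/edx output values."""
    leaves: dict[int, dict[str, int]] = {}
    for match in RAW_RECORD.finditer(text):
        leaf = int(match.group(1), 16)
        eax, ebx, ecx, edx = (int(value, 16) for value in match.groups()[1:])
        # Keep the first record seen for each leaf.
        leaves.setdefault(leaf, {"eax": eax, "ebx": ebx, "ecx": ecx, "edx": edx})
    return leaves


if __name__ == "__main__":
    sample = """
eax in: 0x00000001, eax = 00000663 ebx = 01000800 ecx = 80802001 edx = 078bfbfd
eax in: 0x80000001, eax = 078bfbfd ebx = 00000000 ecx = 000000e0 edx = 2191abfd
eax in: 0x80000005, eax = 01ff01ff ebx = 01ff01ff ecx =
40020140 edx = 40020140
"""
    leaves = parse_raw_cpuid(sample)
    print(hex(leaves[0x80000001]["ecx"]))  # 0xe0
```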
1. Intel:
command: -cpu qemu64,+sse2,+sse4.1,+sse4.2,+popcnt,+ssse3,+vmx
result:
rhel5.5 /proc/cpuinfo: vmx sse2 ssse3 sse4_1 sse4_2 popcnt
windows08-R2-64 CPUID: sse sse2 sse3 ssse3 sse4.1 sse4.2 VT

AMD: the guest does not show misalignsse, which is mentioned on the wiki: http://cleo.tlv.redhat.com/qumrawiki/KVM/VirtualizeCpus
command: -cpu qemu64,+sse2,+cx16,+sse4a,+misalignsse,+popcnt,+abm
result:
rhel5.5 /proc/cpuinfo: sse2 cx16 sse4a popcnt abm (misalignsse is missing)
windows08-R2-64 cpu-z: sse (1,2,3,4a), x86-64

For the x86info results, please refer to comment #19 and comment #20.

2. Performance testing is running; I am using phoronix-test-suite to test RHEL performance.
mwagner: do you have any suggestions for performance testing on Windows?
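The by-hand comparison above (features requested with -cpu versus what the guest's /proc/cpuinfo lists) can be scripted. This is a minimal Python sketch with stated assumptions: only the "+feature" entries of the -cpu argument are considered, and QEMU names are mapped to /proc/cpuinfo names by replacing "." with "_" (so sse4.1 becomes sse4_1); other naming differences are not handled. The helper names are hypothetical. A flag reported here is only absent from /proc/cpuinfo, which lists the bits the guest kernel knows a name for, so it should be confirmed against the raw CPUID data before being treated as not exported.

```python
def requested_features(cpu_arg: str) -> set[str]:
    """'+feature' entries of a '-cpu model,+feat,...' argument, in /proc/cpuinfo spelling."""
    _model, *features = cpu_arg.split(",")
    # Assumed mapping: QEMU "sse4.1" -> /proc/cpuinfo "sse4_1".
    return {feat[1:].replace(".", "_") for feat in features if feat.startswith("+")}


def not_listed_in_guest(cpu_arg: str, guest_flags_line: str) -> set[str]:
    """Requested features that do not appear in a guest 'flags: ...' line."""
    _, _, flags = guest_flags_line.partition(":")
    return requested_features(cpu_arg) - set(flags.split())


if __name__ == "__main__":
    amd_arg = "qemu64,+sse2,+cx16,+sse4a,+misalignsse,+popcnt,+abm"
    amd_guest = ("flags: fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat "
                 "pse36 clflush mmx fxsr sse sse2 syscall nx lm pni cx16 popcnt "
                 "altmovcr8 abm sse4a")
    print(not_listed_in_guest(amd_arg, amd_guest))  # {'misalignsse'}
```

Run against the original RHEL 5.4 guest (-cpu qemu64,+sse2,+sse4.1), the same check reports sse4_1 as missing even though bit 19 was set in the guest's CPUID data, which is exactly why the raw data is the more reliable test.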
Created attachment 388039: test result for cpu test suite

1. MCS CPU Benchmark result (windows 2008-r2-64), six runs per configuration; the values are CPU speed:
with sse4.2: 15490, 14973, 14781, 15144, 15420, 14385
no sse4.2: 15110, 15449, 15123, 15257, 15360, 15462
2. phoronix-test-suite, cpu test suite (rhel5.5 with the 186 kernel): no obvious difference between "with sse4.2" and "without sse4.2".
3. For detailed results, see http://focus.bne.redhat.com/~shuang/phoronix-test-suite/cpu-test-suite/
4. Command (without sse4.2 and sse4.1):
/usr/libexec/qemu-kvm -smp 4 -m 4G -drive file=/mnt/images/rhel5.4-64-virtio.bak,media=disk,if=ide,cache=off,index=0,serial=fb-bde1-8bcf10f72b98 -net nic,vlan=0,macaddr=00:2a:4a:01:00:67,model=virtio -net tap,vlan=0,script=/etc/qemu-ifup -uuid `uuidgen` -no-hpet -usbdevice tablet -rtc-td-hack -startdate now -cpu qemu64,+sse2,+ssse3,+vmx -monitor stdio -boot c -vnc :8
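A quick way to put the six MCS CPU Benchmark runs per configuration side by side is to compute their mean and sample standard deviation. This is a minimal Python sketch using the standard statistics module; the figures are the ones listed above, summarize is a hypothetical helper name, and the output is a descriptive summary only, not a significance test.

```python
from statistics import mean, stdev

# MCS CPU Benchmark "CPU speed" results from the six runs listed above.
RUNS = {
    "with sse4.2": [15490, 14973, 14781, 15144, 15420, 14385],
    "no sse4.2": [15110, 15449, 15123, 15257, 15360, 15462],
}


def summarize(runs: dict[str, list[int]]) -> dict[str, tuple[float, float]]:
    """Return (mean, sample standard deviation) for each configuration."""
    return {label: (mean(values), stdev(values)) for label, values in runs.items()}


if __name__ == "__main__":
    for label, (avg, sd) in summarize(RUNS).items():
        print(f"{label:12s} mean={avg:8.1f} stdev={sd:6.1f}")
```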
Concerning the AMD CPUID misalignsse flag, it appears the flag is being exposed correctly to the guest, given the raw data reported by x86info:

eax in: 0x80000001, eax = 078bfbfd ebx = 00000000 ecx = 000000e0 edx = 2191abfd

where, for cpuid(0x80000001), ecx indicates that misalignsse, sse4a, and abm are visible to the guest, as defined in AMD's CPUID Specification:

CPUID Fn8000_0001_ECX Feature Identifiers
Bits  Description
7     MisAlignSse: misaligned SSE mode.
6     SSE4A: EXTRQ, INSERTQ, MOVNTSS, and MOVNTSD instruction support.
5     ABM: advanced bit manipulation. LZCNT instruction support.

However, neither the guest kernel's /proc/cpuinfo nor x86info advertises the flag (x86info should at least print bit [7] for 0x80000001 ecx, which isn't apparent in the above dump).

Are there any other cases above where CPUID feature flags are not seen by the guest, or was this the last issue? Note that the performance question should probably be addressed in its own BZ, as we're only dealing with the export of flags here.
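The same kind of bit test for the three AMD-defined bits in the table above can be sketched in Python. The bit positions are the ones quoted from AMD's CPUID Specification; amd_ext_features is a hypothetical helper name.

```python
# Bit positions in CPUID Fn8000_0001_ECX, as listed in the table above.
AMD_EXT_ECX_BITS = {5: "abm", 6: "sse4a", 7: "misalignsse"}


def amd_ext_features(ecx: int) -> dict[str, bool]:
    """Report whether each of the AMD-defined bits above is set in ecx."""
    return {name: bool(ecx & (1 << bit)) for bit, name in AMD_EXT_ECX_BITS.items()}


if __name__ == "__main__":
    # ecx value for leaf 0x80000001 from the AMD guest's raw x86info dump.
    print(amd_ext_features(0x000000E0))
    # {'abm': True, 'sse4a': True, 'misalignsse': True}
```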
Yes, only misalignsse is not shown in the guest; it is shown on the host.
Performance test results:

1. RHEL
Tested with phoronix-test-suite, test suites: cpu, encoding, java (the detailed test cases are included in the result file).
No obvious improvement with sse4.2 or with sse4a.

2. Windows
x264: little improvement with sse4.2; obvious improvement with sse4a on the AMD machine.
Sandra multi-media: no obvious improvement with sse4.2, but the result report shows that sse4.1 instructions are used.

For details, please refer to the attachment.

Dor, John, do you have any suggestions regarding the test results?
Created attachment 388512: performance test result
Created attachment 388513: Sandra multi-media test report
(In reply to comment #24)
> yes only misalignsse can not be shown, which can be shown on host.

So if I understand the above correctly, we don't have a qemu issue since the flag is being exported as indicated by the raw cpuid data.

(In reply to comment #25)
> Performance test result:
>
> 1. RHEL
> test with phoronix-test-suite, test suite: cpu, encoding, java (detail test
> cases include in the result file)
>
> no obvious improvement with sse4.2, and with sse4a
>
> 2. Windows
> x264: fewer improvement with sse4.2, have obvious improvement with sse4a on AMD
> machine
>
> sandra muti-media: no obvious improvement with sse4.2, but the result report
> show that sse4.1 instruction is used.
>
> Details please refer to the attachment.
>
> Dor, John, do you have any suggestion for the testing result.

Nothing which immediately comes to mind. The performance folks most likely can offer some insight and/or a more appropriate test. At this point it seems we should close this BZ and open a separate case to track the performance issue.
Created new bug 562037 to track the performance issue.
I can get the flags passed through to the guest OS only with a 5.4 errata kernel (2.6.18-164.1.1.el5 or higher). So the fix for this issue requires the patch from BZ 520626 (for 5.4.z) or BZ 517928 (for 5.5). Now that the ssse3, sse4.1, and sse4.2 flags are passed through to the guest, we can run some CPU benchmarks. I will post here or to bug 562037 if performance is negatively impacted.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0271.html