Bug 1014682 - Enabling <cpu mode="host-model"> does not use correct cpuid level, causes kernel panics
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
7.1
Unspecified Unspecified
unspecified Severity unspecified
: rc
: ---
Assigned To: Jiri Denemark
Virtualization Bugs
:
Depends On: 870071
Blocks: TRACKER-bugs-affecting-libguestfs
 
Reported: 2013-10-02 10:22 EDT by Dave Allan
Modified: 2016-04-26 11:12 EDT
23 users

See Also:
Fixed In Version: libvirt-1.1.1-10.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 870071
: 1018251
Environment:
Last Closed: 2014-06-13 06:04:36 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments: None
Description Dave Allan 2013-10-02 10:22:30 EDT
+++ This bug was initially created as a clone of Bug #870071 +++

Description of problem:

This is using a SandyBridge CPU which has AVX instructions:
https://en.wikipedia.org/wiki/Advanced_Vector_Extensions

I'm booting a guest using <cpu mode="host-model"/>.  Inside
the guest, when initializing an mdadm device (yes, this guest
has RAID arrays inside), we see the trace attached below.

I think what is happening here:

 (a) CPU flags are copied from host to guest, advertising 'avx'
 (b) Guest tries to use 'avx'.
 (c) KVM doesn't emulate it, so it all falls in a hole.

Perhaps libvirt should filter flags based on what KVM can actually do?
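The filtering suggested above can be sketched as plain set arithmetic. This is a hypothetical illustration, not libvirt or QEMU code: `filter_cpu_flags`, its `enforce` argument and both example flag sets are assumptions, and the example assumes (as hypothesised above) that the hypervisor cannot handle avx. It mirrors the behaviour described later in this thread for `-cpu ...,+feature`: a requested flag the hypervisor cannot handle is dropped before the guest sees it, or the guest is refused outright when enforcement is on.

```python
def filter_cpu_flags(requested, supported, enforce=False):
    """Return the subset of requested CPU flags the hypervisor can expose.

    Hypothetical sketch: `supported` stands in for whatever the hypervisor
    reports it can virtualize; this is not a real libvirt/QEMU API.
    """
    requested = set(requested)
    dropped = requested - set(supported)
    if dropped and enforce:
        # Like an "enforce" mode: refuse to start instead of silently hiding flags.
        raise RuntimeError("unsupported CPU flags: " + ", ".join(sorted(dropped)))
    return requested - dropped


# Illustrative sets only, not taken from a real KVM capability probe.
host_flags = {"sse4_2", "aes", "xsave", "avx", "x2apic"}
kvm_flags = {"sse4_2", "aes", "xsave", "x2apic"}
guest_flags = filter_cpu_flags(host_flags, kvm_flags)
```

With these sets `guest_flags` no longer contains avx, so a guest would never be told it can use AVX.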

Version-Release number of selected component (if applicable):

qemu-1.2.0-16.fc18.x86_64
libvirt-0.10.2-3.fc18.x86_64
kernel-3.6.2-2.fc18.x86_64

How reproducible:

100%

Steps to Reproduce:
1. In the libguestfs test suite, run: make -C tests/md check

Additional info:

Host CPU flags:
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid

mdadm --create --run r5t1 --level 5 --raid-devices 4 --spare-devices 1 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2 missing
mdadm: Defaulting to version 1.2 metadata
[    5.131487] md: bind<sda2>
[    5.132218] md: bind<sdb2>
[    5.132966] md: bind<sdc2>
[    5.133773] md: bind<sdd2>
[    5.150258] async_tx: api initialized (async)
[    5.152459] xor: automatically using best checksumming function:
[    5.153064] invalid opcode: 0000 [#1] SMP 
[    5.153423] Modules linked in: xor(+) async_tx raid1 ghash_clmulni_intel microcode virtio_net virtio_scsi virtio_blk virtio_rng virtio_balloon virtio_mmio sparse_keymap rfkill sym53c8xx scsi_transport_spi crc8 crc_ccitt crc_itu_t crc32c_intel libcrc32c
[    5.154012] CPU 0 
[    5.154012] Pid: 262, comm: modprobe Not tainted 3.6.2-2.fc18.x86_64.debug #1 Bochs Bochs
[    5.154012] RIP: 0010:[<ffffffffa0095c6c>]  [<ffffffffa0095c6c>] xor_avx_2+0x5c/0x270 [xor]
[    5.154012] RSP: 0018:ffff88001abfdd00  EFLAGS: 00010202
[    5.154012] RAX: 000000008005003b RBX: ffff8800192d0000 RCX: 0000000000000001
[    5.154012] RDX: ffff8800192d3000 RSI: ffff8800192d0000 RDI: 0000000000001000
[    5.154012] RBP: ffff88001abfddc8 R08: 0000000000000000 R09: 0000000000000000
[    5.154012] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8800192d3000
[    5.154012] R13: 0000000000000008 R14: 000000008005003b R15: ffff8800192d0000
[    5.154012] FS:  00007f1e5e769740(0000) GS:ffff88001f000000(0000) knlGS:0000000000000000
[    5.154012] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.154012] CR2: 00007fff1f8d5000 CR3: 0000000019278000 CR4: 00000000000007f0
[    5.154012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    5.154012] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    5.154012] Process modprobe (pid: 262, threadinfo ffff88001abfc000, task ffff88001a0fa450)
[    5.154012] Stack:
[    5.154012]  ffffffff8134df3e 0000000000000000 0000000000000000 0000000000000001
[    5.154012]  0000000000000001 ffff88001abfdfd8 ffff88001abfc000 ffff88001a0fa450
[    5.154012]  ffff88001abfdd48 ffffffff8111c227 ffffffff816dcb30 0000000000000000
[    5.154012] Call Trace:
[    5.154012]  [<ffffffff8134df3e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[    5.154012]  [<ffffffff8111c227>] ? rcu_irq_exit+0x87/0xd0
[    5.154012]  [<ffffffff816dcb30>] ? retint_restore_args+0x13/0x13
[    5.154012]  [<ffffffffa0096a69>] do_xor_speed+0x7d/0xe7 [xor]
[    5.154012]  [<ffffffffa0005075>] calibrate_xor_blocks+0x75/0x1000 [xor]
[    5.154012]  [<ffffffffa0005000>] ? 0xffffffffa0004fff
[    5.154012]  [<ffffffff8100212a>] do_one_initcall+0x12a/0x180
[    5.154012]  [<ffffffff810e3553>] sys_init_module+0x133/0x2340
[    5.154012]  [<ffffffff81362490>] ? ddebug_proc_open+0xd0/0xd0
[    5.154012]  [<ffffffff81099143>] ? up_write+0x23/0x40
[    5.154012]  [<ffffffff816e55e9>] system_call_fastpath+0x16/0x1b
[    5.154012] Code: 98 00 00 00 31 c0 49 c1 ed 09 65 48 8b 04 25 b0 c8 00 00 83 80 44 e0 ff ff 01 e8 10 54 fb e0 66 90 49 89 c6 0f 06 66 66 90 66 90 <c5> fc 29 04 24 c5 fc 29 4c 24 20 c5 fc 29 54 24 40 c5 fc 29 5c 
[    5.154012] RIP  [<ffffffffa0095c6c>] xor_avx_2+0x5c/0x270 [xor]
[    5.154012]  RSP <ffff88001abfdd00>
[    5.174236] ---[ end trace 1500dad90bed99ad ]---
[    5.174615] BUG: sleeping function called from invalid context at kernel/rwsem.c:20
[    5.175227] in_atomic(): 1, irqs_disabled(): 0, pid: 262, name: modprobe
[    5.175754] INFO: lockdep is turned off.
[    5.176088] Pid: 262, comm: modprobe Tainted: G      D      3.6.2-2.fc18.x86_64.debug #1
[    5.176725] Call Trace:
[    5.176922]  [<ffffffff810a281a>] __might_sleep+0x18a/0x240
[    5.177381]  [<ffffffff816d9af6>] down_read+0x26/0x98
[    5.177790]  [<ffffffff81081e64>] exit_signals+0x24/0x130
[    5.178243]  [<ffffffff8106e6bd>] do_exit+0xbd/0xb00
[    5.178629]  [<ffffffff8106b348>] ? kmsg_dump+0x1b8/0x240
[    5.179089]  [<ffffffff8106b1b5>] ? kmsg_dump+0x25/0x240
[    5.179499]  [<ffffffff816dda5d>] oops_end+0x9d/0xe0
[    5.179893]  [<ffffffff8101d9e8>] die+0x58/0x90
[    5.180282]  [<ffffffff816dd320>] do_trap+0xc0/0x170
[    5.180669]  [<ffffffff8101ae96>] ? do_invalid_op+0x86/0xc0
[    5.181147]  [<ffffffff8101aec0>] do_invalid_op+0xb0/0xc0
[    5.181563]  [<ffffffffa0095c6c>] ? xor_avx_2+0x5c/0x270 [xor]
[    5.182016]  [<ffffffff8134df7d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[    5.182559]  [<ffffffff816dcb60>] ? restore_args+0x30/0x30
[    5.182981]  [<ffffffff816e663b>] invalid_op+0x1b/0x20
[    5.183428]  [<ffffffffa0095c6c>] ? xor_avx_2+0x5c/0x270 [xor]
[    5.183878]  [<ffffffff8134df3e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[    5.184418]  [<ffffffff8111c227>] ? rcu_irq_exit+0x87/0xd0
[    5.184844]  [<ffffffff816dcb30>] ? retint_restore_args+0x13/0x13
[    5.185347]  [<ffffffffa0096a69>] do_xor_speed+0x7d/0xe7 [xor]
[    5.185816]  [<ffffffffa0005075>] calibrate_xor_blocks+0x75/0x1000 [xor]
[    5.186381]  [<ffffffffa0005000>] ? 0xffffffffa0004fff
[    5.186810]  [<ffffffff8100212a>] do_one_initcall+0x12a/0x180
[    5.187292]  [<ffffffff810e3553>] sys_init_module+0x133/0x2340
[    5.187775]  [<ffffffff81362490>] ? ddebug_proc_open+0xd0/0xd0
[    5.188273]  [<ffffffff81099143>] ? up_write+0x23/0x40
[    5.188692]  [<ffffffff816e55e9>] system_call_fastpath+0x16/0x1b
[    5.189202] BUG: scheduling while atomic: modprobe/262/0x10000003
[    5.189683] INFO: lockdep is turned off.
[    5.190034] Modules linked in: xor(+) async_tx raid1 ghash_clmulni_intel microcode virtio_net virtio_scsi virtio_blk virtio_rng virtio_balloon virtio_mmio sparse_keymap rfkill sym53c8xx scsi_transport_spi crc8 crc_ccitt crc_itu_t crc32c_intel libcrc32c
[    5.192151] Pid: 262, comm: modprobe Tainted: G      D      3.6.2-2.fc18.x86_64.debug #1
[    5.192779] Call Trace:
[    5.192979]  [<ffffffff816cfa51>] __schedule_bug+0x67/0x75
[    5.193452]  [<ffffffff816da78b>] __schedule+0x98b/0x9f0
[    5.193872]  [<ffffffff810a500a>] __cond_resched+0x2a/0x40
[    5.194356]  [<ffffffff816da870>] _cond_resched+0x30/0x40
[    5.194783]  [<ffffffff816d9afb>] down_read+0x2b/0x98
[    5.195211]  [<ffffffff81081e64>] exit_signals+0x24/0x130
[    5.195657]  [<ffffffff8106e6bd>] do_exit+0xbd/0xb00
[    5.196079]  [<ffffffff8106b348>] ? kmsg_dump+0x1b8/0x240
[    5.196524]  [<ffffffff8106b1b5>] ? kmsg_dump+0x25/0x240
[    5.196946]  [<ffffffff816dda5d>] oops_end+0x9d/0xe0
[    5.197372]  [<ffffffff8101d9e8>] die+0x58/0x90
[    5.197755]  [<ffffffff816dd320>] do_trap+0xc0/0x170
[    5.198182]  [<ffffffff8101ae96>] ? do_invalid_op+0x86/0xc0
[    5.198642]  [<ffffffff8101aec0>] do_invalid_op+0xb0/0xc0
[    5.199095]  [<ffffffffa0095c6c>] ? xor_avx_2+0x5c/0x270 [xor]
[    5.199557]  [<ffffffff8134df7d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[    5.200119]  [<ffffffff816dcb60>] ? restore_args+0x30/0x30
[    5.200551]  [<ffffffff816e663b>] invalid_op+0x1b/0x20
[    5.200975]  [<ffffffffa0095c6c>] ? xor_avx_2+0x5c/0x270 [xor]
[    5.201465]  [<ffffffff8134df3e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[    5.201992]  [<ffffffff8111c227>] ? rcu_irq_exit+0x87/0xd0
[    5.202455]  [<ffffffff816dcb30>] ? retint_restore_args+0x13/0x13
[    5.202944]  [<ffffffffa0096a69>] do_xor_speed+0x7d/0xe7 [xor]
[    5.203446]  [<ffffffffa0005075>] calibrate_xor_blocks+0x75/0x1000 [xor]
[    5.203976]  [<ffffffffa0005000>] ? 0xffffffffa0004fff
[    5.204432]  [<ffffffff8100212a>] do_one_initcall+0x12a/0x180
[    5.204887]  [<ffffffff810e3553>] sys_init_module+0x133/0x2340
[    5.205395]  [<ffffffff81362490>] ? ddebug_proc_open+0xd0/0xd0
[    5.205858]  [<ffffffff81099143>] ? up_write+0x23/0x40
[    5.206314]  [<ffffffff816e55e9>] system_call_fastpath+0x16/0x1b
[    5.206857] note: modprobe[262] exited with preempt_count 2
[    5.207586] BUG: scheduling while atomic: modprobe/262/0x10000003
[    5.208102] INFO: lockdep is turned off.
[    5.208424] Modules linked in: xor(+) async_tx raid1 ghash_clmulni_intel microcode virtio_net virtio_scsi virtio_blk virtio_rng virtio_balloon virtio_mmio sparse_keymap rfkill sym53c8xx scsi_transport_spi crc8 crc_ccitt crc_itu_t crc32c_intel libcrc32c
[    5.210496] Pid: 262, comm: modprobe Tainted: G      D W    3.6.2-2.fc18.x86_64.debug #1
[    5.211161] Call Trace:
[    5.211369]  [<ffffffff816cfa51>] __schedule_bug+0x67/0x75
[    5.211825]  [<ffffffff816da78b>] __schedule+0x98b/0x9f0
[    5.212264]  [<ffffffff810976d0>] ? lock_hrtimer_base.isra.20+0x30/0x60
[    5.212772]  [<ffffffff810a500a>] __cond_resched+0x2a/0x40
[    5.213241]  [<ffffffff816da870>] _cond_resched+0x30/0x40
[    5.213657]  [<ffffffff816d9afb>] down_read+0x2b/0x98
[    5.214091]  [<ffffffff810e7fae>] acct_collect+0x4e/0x1b0
[    5.214507]  [<ffffffff8106ee05>] do_exit+0x805/0xb00
[    5.214896]  [<ffffffff8106b348>] ? kmsg_dump+0x1b8/0x240
[    5.215363]  [<ffffffff8106b1b5>] ? kmsg_dump+0x25/0x240
[    5.215773]  [<ffffffff816dda5d>] oops_end+0x9d/0xe0
[    5.216205]  [<ffffffff8101d9e8>] die+0x58/0x90
[    5.216557]  [<ffffffff816dd320>] do_trap+0xc0/0x170
[    5.216942]  [<ffffffff8101ae96>] ? do_invalid_op+0x86/0xc0
[    5.217419]  [<ffffffff8101aec0>] do_invalid_op+0xb0/0xc0
[    5.217834]  [<ffffffffa0095c6c>] ? xor_avx_2+0x5c/0x270 [xor]
[    5.218324]  [<ffffffff8134df7d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[    5.218834]  [<ffffffff816dcb60>] ? restore_args+0x30/0x30
[    5.219287]  [<ffffffff816e663b>] invalid_op+0x1b/0x20
[    5.219704]  [<ffffffffa0095c6c>] ? xor_avx_2+0x5c/0x270 [xor]
[    5.220181]  [<ffffffff8134df3e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[    5.220700]  [<ffffffff8111c227>] ? rcu_irq_exit+0x87/0xd0
[    5.221153]  [<ffffffff816dcb30>] ? retint_restore_args+0x13/0x13
[    5.221638]  [<ffffffffa0096a69>] do_xor_speed+0x7d/0xe7 [xor]
[    5.222140]  [<ffffffffa0005075>] calibrate_xor_blocks+0x75/0x1000 [xor]
[    5.222656]  [<ffffffffa0005000>] ? 0xffffffffa0004fff
[    5.223103]  [<ffffffff8100212a>] do_one_initcall+0x12a/0x180
[    5.223550]  [<ffffffff810e3553>] sys_init_module+0x133/0x2340
[    5.224034]  [<ffffffff81362490>] ? ddebug_proc_open+0xd0/0xd0
[    5.224492]  [<ffffffff81099143>] ? up_write+0x23/0x40
[    5.224887]  [<ffffffff816e55e9>] system_call_fastpath+0x16/0x1b
[    5.225927] md: personality for level 5 is not loaded!
[    5.226402] md: md125 stopped.
[    5.226664] md: unbind<sdd2>
[    5.226902] md: export_rdev(sdd2)
mdadm: RUN_ARRAY failed: Invalid argument
[    5.227520] md: unbind<sdc2>
[    5.227759] md: export_rdev(sdc2)
[    5.228085] md: unbind<sdb2>
[    5.228343] md: export_rdev(sdb2)
[    5.228626] md: unbind<sda2>
[    5.228863] md: export_rdev(sda2)

--- Additional comment from Richard W.M. Jones on 2012-10-25 10:30:42 EDT ---

Instruction that KVM failed to parse was:
     bfc:       c5 fc 29 04 24          vmovaps %ymm0,(%rsp)
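Why this instruction faults can be read off its encoding. `c5` is the 2-byte VEX prefix, and VEX.L = 1 in the following byte selects 256-bit ymm registers, i.e. an AVX instruction. Per the Intel SDM, VEX-encoded instructions raise #UD (the "invalid opcode" in the trace) unless the OS has set CR4.OSXSAVE and enabled SSE and AVX state in XCR0. The sketch below decodes only the fields these five bytes need; `decode_vex2` is a hypothetical helper, not a disassembler (it ignores displacements and the 3-byte `c4` VEX form).

```python
def decode_vex2(code):
    """Decode the fields of a 2-byte-VEX (0xC5) instruction with a ModRM/SIB operand.

    Hypothetical helper for illustration: ignores displacements and 3-byte VEX (0xC4).
    """
    if code[0] != 0xC5:
        raise ValueError("not a 2-byte VEX prefix")
    vex = code[1]
    f = {
        "R": ((vex >> 7) & 1) ^ 1,         # stored inverted
        "vvvv": ((vex >> 3) & 0xF) ^ 0xF,  # stored inverted; 0 means "no extra source"
        "L": (vex >> 2) & 1,               # 0 = 128-bit xmm, 1 = 256-bit ymm
        "pp": vex & 0x3,                   # implied prefix: 0 none, 1 66, 2 F3, 3 F2
        "opcode": code[2],                 # opcode in the implied 0F map
    }
    modrm = code[3]
    f["mod"] = modrm >> 6
    f["reg"] = ((modrm >> 3) & 7) | (f["R"] << 3)  # register operand number
    f["rm"] = modrm & 7
    if f["mod"] != 3 and f["rm"] == 4:  # rm == 100b: a SIB byte follows
        sib = code[4]
        f["scale"], f["index"], f["base"] = sib >> 6, (sib >> 3) & 7, sib & 7
    return f


fields = decode_vex2(bytes.fromhex("c5fc290424"))
width = "ymm" if fields["L"] else "xmm"
```

Here L is 1 and reg is 0 (ymm0); opcode 0x29 with no implied prefix is the aligned-store form of (V)MOVAPS; SIB index 4 with base 4 means no index register and base %rsp. Together these match objdump's `vmovaps %ymm0,(%rsp)`.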

--- Additional comment from Richard W.M. Jones on 2012-10-25 11:04:44 EDT ---

Apparently you can't just change the libvirt XML to disable
features that don't work:

  <cpu mode="host-model">
    <model fallback="allow"/>
    <feature policy="disable" name="avx"/>
  </cpu>

gives the error:

*stdin*:6: libguestfs: error: could not create appliance through libvirt: internal error Non-empty feature list specified without CPU model [code=1 domain=31]

--- Additional comment from Jiri Denemark on 2012-10-25 13:19:05 EDT ---

(In reply to comment #2)
> Apparently you can't just change the libvirt XML to disable
> features that don't work:
> 
>   <cpu mode="host-model">
>     <model fallback="allow"/>
>     <feature policy="disable" name="avx"/>
>   </cpu>

Right, bug 799354 is tracking that.

--- Additional comment from Jiri Denemark on 2012-10-25 13:25:07 EDT ---

Could you share the QEMU command line generated by libvirt? I believe it does not explicitly mention avx, i.e., it gets there through the SandyBridge model, right? Anyway, avx is supposed to work with KVM since QEMU supports the SandyBridge model, which enables avx. Thus, it's either a QEMU or a kernel bug. I'm moving this bug to the former for further investigation.

--- Additional comment from Richard W.M. Jones on 2012-10-25 14:03:43 EDT ---

LC_ALL=C LD_LIBRARY_PATH=/tmp/whenjobs2f2d92b86ba2111addc7e199fa77e648/libguestfs-1.19.53/src/.libs:/tmp/whenjobs2f2d92b86ba2111addc7e199fa77e648/libguestfs-1.19.53/gobject/.libs:/tmp/whenjobs2f2d92b86ba2111addc7e199fa77e648/libguestfs-1.19.53/ruby/ext/guestfs PATH=/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/rjones/.local/bin:/home/rjones/bin HOME=/home/rjones USER=rjones LOGNAME=rjones TMPDIR=/home/rjones/d/libguestfs/tmp /usr/bin/qemu-kvm -name guestfs-1t6f28e33d5tqbmu -S -M pc-1.2 -cpu Westmere,+rdtscp,+avx,+osxsave,+xsave,+tsc-deadline,+pdcm,+xtpr,+tm2,+est,+vmx,+ds_cpl,+monitor,+dtes64,+pclmuldq,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme -enable-kvm -m 500 -smp 1,sockets=1,cores=1,threads=1 -uuid 3ab7b5c6-31ff-2591-bc35-4d338347423c -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/home/rjones/.config/libvirt/qemu/lib/guestfs-1t6f28e33d5tqbmu.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-reboot -no-shutdown -no-acpi -kernel /home/rjones/d/libguestfs/tmp/.guestfs-1000/kernel.17966 -initrd /home/rjones/d/libguestfs/tmp/.guestfs-1000/initrd.17966 -append panic=1 console=ttyS0 udevtimeout=600 no_timer_check acpi=off printk.time=1 cgroup_disable=memory root=/dev/sde selinux=0 guestfs_verbose=1 TERM=xterm  -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive file=/home/rjones/d/libguestfs/tests/md/md-test1.img,if=none,id=drive-scsi0-0-0-0,format=raw,cache=none -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 -drive file=/home/rjones/d/libguestfs/tests/md/md-test2.img,if=none,id=drive-scsi0-0-1-0,format=raw,cache=none -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=1,lun=0,drive=drive-scsi0-0-1-0,id=scsi0-0-1-0 -drive file=/home/rjones/d/libguestfs/tests/md/md-test3.img,if=none,id=drive-scsi0-0-2-0,format=raw,cache=none 
-device scsi-hd,bus=scsi0.0,channel=0,scsi-id=2,lun=0,drive=drive-scsi0-0-2-0,id=scsi0-0-2-0 -drive file=/home/rjones/d/libguestfs/tests/md/md-test4.img,if=none,id=drive-scsi0-0-3-0,format=raw,cache=none -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=3,lun=0,drive=drive-scsi0-0-3-0,id=scsi0-0-3-0 -drive file=/home/rjones/d/libguestfs/tmp/libguestfsowvgCb/snapshot1,if=none,id=drive-scsi0-0-4-0,format=qcow2,cache=unsafe -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=4,lun=0,drive=drive-scsi0-0-4-0,id=scsi0-0-4-0 -chardev socket,id=charserial0,path=/home/rjones/d/libguestfs/tmp/libguestfsowvgCb/console.sock -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/home/rjones/d/libguestfs/tmp/libguestfsowvgCb/guestfsd.sock -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.libguestfs.channel.0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

--- Additional comment from Cole Robinson on 2012-10-31 11:49:35 EDT ---

Eduardo, can you comment here?

--- Additional comment from Jiri Denemark on 2012-11-01 05:24:16 EDT ---

Interesting, your SandyBridge machine is likely missing the x2apic feature. If it had that feature, libvirt would report SandyBridge instead of Westmere + avx. Could you try changing the XML to contain

  <cpu mode='custom' match='exact'>
    <model fallback='forbid'>SandyBridge</model>
    <feature policy='disable' name='x2apic'/>
  </cpu>

and see if that makes any difference?
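The explanation above (no x2apic on the host, so libvirt falls back to Westmere plus extra flags) can be illustrated with a deliberately simplified model picker. The feature sets are abbreviated and hypothetical, not libvirt's real CPU map, and `pick_model` is not libvirt's actual algorithm; it only shows how a single missing flag disqualifies the larger model. The optional `disable_ok` set corresponds to the idea, raised later in this thread, of features that "can be safely disabled".

```python
# Abbreviated, illustrative feature sets: NOT the real libvirt CPU map.
MODELS = [
    ("Westmere", {"sse4_2", "aes", "popcnt"}),
    ("SandyBridge", {"sse4_2", "aes", "popcnt", "avx", "xsave", "x2apic",
                     "tsc-deadline", "pclmulqdq"}),
]


def pick_model(host, disable_ok=frozenset()):
    """Return (model, features_to_add, features_to_disable).

    A model qualifies if the host has every feature it requires, except
    those in `disable_ok`, which may be switched off explicitly instead.
    Among qualifying models, the one with the most features wins.
    """
    best = None
    for name, feats in MODELS:
        missing = feats - host
        if missing - disable_ok:
            continue  # requires a host-missing feature that may not be dropped
        if best is None or len(feats) > len(best[1]):
            best = (name, feats, missing)
    if best is None:
        raise ValueError("no usable CPU model")
    name, feats, missing = best
    return name, sorted(host - feats), sorted(missing)


host = {"sse4_2", "aes", "popcnt", "avx", "xsave", "tsc-deadline", "pclmulqdq"}
```

Without `disable_ok`, this host gets Westmere plus avx, pclmulqdq, tsc-deadline and xsave, analogous to the "Westmere,+avx,..." command line in this bug; passing `{"x2apic"}` yields SandyBridge with x2apic disabled, the configuration suggested above.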

--- Additional comment from Richard W.M. Jones on 2012-11-01 05:52:22 EDT ---

Yes, this works:

  <cpu mode='custom' match='exact'>
    <model fallback='forbid'>SandyBridge</model>
    <feature policy='disable' name='x2apic'/>
  </cpu>

Host /proc/cpuinfo is below.  It is indeed missing x2apic.

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 42
model name	: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
stepping	: 7
microcode	: 0x28
cpu MHz		: 1600.000
cache size	: 8192 KB
physical id	: 0
siblings	: 8
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips	: 6822.32
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 42
model name	: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
stepping	: 7
microcode	: 0x28
cpu MHz		: 1600.000
cache size	: 8192 KB
physical id	: 0
siblings	: 8
core id		: 1
cpu cores	: 4
apicid		: 2
initial apicid	: 2
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips	: 6822.32
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 2
vendor_id	: GenuineIntel
cpu family	: 6
model		: 42
model name	: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
stepping	: 7
microcode	: 0x28
cpu MHz		: 1600.000
cache size	: 8192 KB
physical id	: 0
siblings	: 8
core id		: 2
cpu cores	: 4
apicid		: 4
initial apicid	: 4
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips	: 6822.32
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 3
vendor_id	: GenuineIntel
cpu family	: 6
model		: 42
model name	: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
stepping	: 7
microcode	: 0x28
cpu MHz		: 1600.000
cache size	: 8192 KB
physical id	: 0
siblings	: 8
core id		: 3
cpu cores	: 4
apicid		: 6
initial apicid	: 6
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips	: 6822.32
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 4
vendor_id	: GenuineIntel
cpu family	: 6
model		: 42
model name	: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
stepping	: 7
microcode	: 0x28
cpu MHz		: 1600.000
cache size	: 8192 KB
physical id	: 0
siblings	: 8
core id		: 0
cpu cores	: 4
apicid		: 1
initial apicid	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips	: 6822.32
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 5
vendor_id	: GenuineIntel
cpu family	: 6
model		: 42
model name	: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
stepping	: 7
microcode	: 0x28
cpu MHz		: 1600.000
cache size	: 8192 KB
physical id	: 0
siblings	: 8
core id		: 1
cpu cores	: 4
apicid		: 3
initial apicid	: 3
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips	: 6822.32
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 6
vendor_id	: GenuineIntel
cpu family	: 6
model		: 42
model name	: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
stepping	: 7
microcode	: 0x28
cpu MHz		: 1600.000
cache size	: 8192 KB
physical id	: 0
siblings	: 8
core id		: 2
cpu cores	: 4
apicid		: 5
initial apicid	: 5
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips	: 6822.32
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 7
vendor_id	: GenuineIntel
cpu family	: 6
model		: 42
model name	: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
stepping	: 7
microcode	: 0x28
cpu MHz		: 1600.000
cache size	: 8192 KB
physical id	: 0
siblings	: 8
core id		: 3
cpu cores	: 4
apicid		: 7
initial apicid	: 7
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips	: 6822.32
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:
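The x2apic check can also be done programmatically. A minimal sketch, assuming the usual `/proc/cpuinfo` layout of `key<TAB>: value` lines; `cpu_flags` and `missing_flags` are hypothetical names. It reads the `flags` line of the first processor entry, which is sufficient here because all eight entries above carry identical flags.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, wanted):
    """Return, sorted, the wanted flags that the CPU does not advertise."""
    return sorted(set(wanted) - cpu_flags(cpuinfo_text))
```

On a Linux host this could be called as `missing_flags(open("/proc/cpuinfo").read(), ["x2apic", "avx"])`; for the host above it should report only x2apic.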

--- Additional comment from Jiri Denemark on 2012-11-01 06:39:04 EDT ---

Thanks, what about:

  <cpu mode='custom' match='exact'>
    <model fallback='forbid'>SandyBridge</model>
    <feature policy='disable' name='x2apic'/>
    <feature policy='require' name='osxsave'/>
    <feature policy='require' name='pdcm'/>
    <feature policy='require' name='xtpr'/>
    <feature policy='require' name='tm2'/>
    <feature policy='require' name='est'/>
    <feature policy='require' name='vmx'/>
    <feature policy='require' name='ds_cpl'/>
    <feature policy='require' name='monitor'/>
    <feature policy='require' name='dtes64'/>
    <feature policy='require' name='pbe'/>
    <feature policy='require' name='tm'/>
    <feature policy='require' name='ht'/>
    <feature policy='require' name='ss'/>
    <feature policy='require' name='acpi'/>
    <feature policy='require' name='ds'/>
    <feature policy='require' name='vme'/>
  </cpu>

that should give you the same (feature-wise) CPU but using the SandyBridge model
rather than Westmere.

And BTW, it looks like we have a bug since

  <cpu mode='custom' match='exact'>
    <model fallback='forbid'>SandyBridge</model>
    <feature policy='force' name='x2apic'/>
  </cpu>

should work even if the host CPU does not support x2apic (and AFAIK x2apic is
one of the features that QEMU will emulate), but I tried that and libvirt
complains that x2apic is not supported by the host CPU.

--- Additional comment from Richard W.M. Jones on 2012-11-01 06:57:32 EDT ---

This works:

  <cpu mode='custom' match='exact'>
    <model fallback='forbid'>SandyBridge</model>
    <feature policy='disable' name='x2apic'/>
    <feature policy='require' name='osxsave'/>
    <feature policy='require' name='pdcm'/>
    <feature policy='require' name='xtpr'/>
    <feature policy='require' name='tm2'/>
    <feature policy='require' name='est'/>
    <feature policy='require' name='vmx'/>
    <feature policy='require' name='ds_cpl'/>
    <feature policy='require' name='monitor'/>
    <feature policy='require' name='dtes64'/>
    <feature policy='require' name='pbe'/>
    <feature policy='require' name='tm'/>
    <feature policy='require' name='ht'/>
    <feature policy='require' name='ss'/>
    <feature policy='require' name='acpi'/>
    <feature policy='require' name='ds'/>
    <feature policy='require' name='vme'/>
  </cpu>

I also tried the above plus:
    <feature policy='require' name='avx'/>
which *worked*.  Is that expected?

--- Additional comment from Jiri Denemark on 2012-11-01 10:21:45 EDT ---

The <feature policy='require' name='avx'/> element is redundant since avx is already required by the SandyBridge model; the guest OS should see exactly the same CPU regardless of this element.

Anyway, it's expected that SandyBridge works with avx since it explicitly has support for it. The fact that it doesn't work when it's added on top of Westmere is unfortunate but not entirely surprising. It's likely influenced by bits that are not covered by libvirt, such as cpu family, model, stepping and other stuff. We've seen this behaviour in the past.

I think we need a new mode in addition to custom, host-model, and host-passthrough, that would be similar to host-model but would only use the bare CPU model, without trying to add all the features that are not included in the model but are supported by the host CPU.

The situation may also become a bit better once we have a better interface for CPU probing (bug 824989).

Eduardo, could you confirm that the kernel panic might be caused by libvirt using Westmere + avx and that this is an unsupported configuration? If so, we can move this bug to libvirt.

--- Additional comment from Eduardo Habkost on 2012-11-01 14:17:25 EDT ---

"-cpu Westmere,+avx" actually should enable the bit on CPUID if and only if KVM is able to handle the feature. When KVM can't handle the feature, it should be filtered out before the guest CPUID table is built. I still don't understand why exactly the guest got an invalid operation exception, as the instruction was supposed to be working. Maybe it's related to the "level" field and the xsave feature (that is required for AVX, as far as I recall), that needs level >= 0xD.

I don't know if the guest is really allowed to use the feature when the AVX bit is set but the necessary xsave bits are not present (if it is not, then this is a guest bug). If the guest was simply misled by the CPUID information, and was correct in trying to use the instructions, it is a QEMU bug (QEMU should have disabled the feature, and aborted if the "enforce" flag is set). In either case, it is not a libvirt bug to ask for "-cpu Westmere,+avx".

But it would be interesting if libvirt could treat some CPU features as "can be safely disabled". It would be much better if libvirt used "-cpu SandyBridge,-x2apic" on that host, instead of "-cpu Westmere,+<lots of flags>".
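The xsave/level hypothesis above can be made concrete with the architectural rule for AVX. The AVX bit (CPUID.1:ECX[28]) alone is not enough: software should also see OSXSAVE (ECX[27], which reflects CR4 bit 18 and is set only once the kernel has enabled XSAVE) and have SSE and AVX state enabled in XCR0 (bits 1 and 2). The sketch below models a guest kernel that, as suggested above, enables XSAVE only when the maximum basic CPUID leaf is at least 0xD (the host reports `cpuid level : 13`). The boot logic is a simplification, the leaf value 0xB used for the lower-level CPU model is an assumption for illustration, and all function names are hypothetical. It is consistent with both oopses in this bug: CR4 is 0x7f0 in each, which has bit 18 (OSXSAVE) clear, so any VEX-encoded instruction would fault.

```python
# CPUID leaf 1, ECX feature bits (Intel SDM)
ECX_XSAVE = 1 << 26
ECX_OSXSAVE = 1 << 27
ECX_AVX = 1 << 28
CR4_OSXSAVE = 1 << 18  # OS has enabled XSAVE/XGETBV
XCR0_X87, XCR0_SSE, XCR0_AVX = 1 << 0, 1 << 1, 1 << 2
XSTATE_LEAF = 0xD      # CPUID leaf describing XSAVE state components


def guest_boot(max_leaf, advertised_ecx):
    """Simplified guest kernel: enable XSAVE only if advertised AND leaf 0xD exists.

    Returns (cr4, xcr0, ecx_seen_by_software). Architecturally, OSXSAVE
    mirrors CR4.OSXSAVE, so it is recomputed rather than taken as advertised.
    """
    cr4, xcr0 = 0, 0
    ecx = advertised_ecx & ~ECX_OSXSAVE
    if (advertised_ecx & ECX_XSAVE) and max_leaf >= XSTATE_LEAF:
        cr4 |= CR4_OSXSAVE
        xcr0 = XCR0_X87 | XCR0_SSE
        if advertised_ecx & ECX_AVX:
            xcr0 |= XCR0_AVX
        ecx |= ECX_OSXSAVE
    return cr4, xcr0, ecx


def avx_naive(ecx):
    """Checks only the AVX CPUID bit."""
    return bool(ecx & ECX_AVX)


def avx_usable(ecx, xcr0):
    """Requires the AVX bit, the OSXSAVE bit, and SSE+AVX state enabled in XCR0."""
    needed = XCR0_SSE | XCR0_AVX
    return bool(ecx & ECX_AVX) and bool(ecx & ECX_OSXSAVE) and (xcr0 & needed) == needed
```

With `guest_boot(0xB, ECX_XSAVE | ECX_AVX)` the naive check says AVX is available while `avx_usable` says it is not; with a maximum leaf of 0xD both agree that it is usable.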

--- Additional comment from Cole Robinson on 2012-12-14 17:11:54 EST ---

Reassigning to libvirt based on above discussion

--- Additional comment from Richard W.M. Jones on 2013-08-05 11:42:03 EDT ---

Do we have an update on this?  I would really like to
start using host-model.

--- Additional comment from Richard W.M. Jones on 2013-09-03 06:35:11 EDT ---

This still happens with Fedora 19, libvirt-1.0.5.5-1.fc19.x86_64.

Loading the btrfs module, which loads the xor module, fails
because it tries to run an AVX instruction:

modprobe btrfs
[    1.804591] xor: automatically using best checksumming function:
[    1.806020] invalid opcode: 0000 [#1] SMP 
[    1.806416] Modules linked in: xor(+) snd_pcsp snd_pcm snd_page_alloc ghash_clmulni_intel snd_timer microcode snd soundcore virtio_net virtio_scsi virtio_blk virtio_rng virtio_balloon virtio_mmio sparse_keymap rfkill sym53c8xx scsi_transport_spi crc8 crc_ccitt crc32 crc_itu_t crc32_pclmul crc32c_intel libcrc32c megaraid megaraid_sas megaraid_mbox megaraid_mm
[    1.809709] CPU: 0 PID: 150 Comm: modprobe Not tainted 3.10.9-200.fc19.x86_64.debug #1
[    1.810397] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[    1.810931] task: ffff880019cf8000 ti: ffff880019ca4000 task.ti: ffff880019ca4000
[    1.811597] RIP: 0010:[<ffffffffa0119da0>]  [<ffffffffa0119da0>] xor_avx_2+0x50/0x230 [xor]
[    1.812333] RSP: 0018:ffff880019ca5d08  EFLAGS: 00010202
[    1.812811] RAX: 0000000000000007 RBX: ffff880019eb8000 RCX: ffff880019cf8000
[    1.813429] RDX: ffff880019ca5fd8 RSI: 0000000000000000 RDI: 0000000000001000
[    1.814071] RBP: ffff880019ca5d20 R08: 0000000000000002 R09: 0000000000000000
[    1.814684] R10: 0000000000000001 R11: 0000000000000001 R12: ffff880019ebb000
[    1.815320] R13: 0000000000000008 R14: ffff880019eb8000 R15: 00000000fffb732e
[    1.815952] FS:  00007fb93b96a740(0000) GS:ffff88001f000000(0000) knlGS:0000000000000000
[    1.816647] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.817170] CR2: 00007f8d740e9000 CR3: 0000000019e2b000 CR4: 00000000000007f0
[    1.817785] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    1.818424] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    1.819047] Stack:
[    1.819245]  0000000000000000 ffffffffa011c000 ffff880019ebb000 ffff880019ca5d60
[    1.819929]  ffffffffa00a7080 0000000000000005 ffff880019eb8000 ffff880019ebb000
[    1.820651]  ffffffffa011c110 0000000000000001 ffffffffa011c0c0 ffff880019ca5d80
[    1.821354] Call Trace:
[    1.821581]  [<ffffffffa00a7080>] do_xor_speed+0x80/0xe0 [xor]
[    1.822097]  [<ffffffffa00a714b>] calibrate_xor_blocks+0x6b/0xf20 [xor]
[    1.822683]  [<ffffffffa00a70e0>] ? do_xor_speed+0xe0/0xe0 [xor]
[    1.823214]  [<ffffffff810020e2>] do_one_initcall+0xe2/0x1a0
[    1.823723]  [<ffffffff810e95e2>] load_module+0x1c62/0x27d0
[    1.824217]  [<ffffffff813749d0>] ? ddebug_proc_write+0xf0/0xf0
[    1.824748]  [<ffffffff810ea2e6>] SyS_finit_module+0x86/0xb0
[    1.825247]  [<ffffffff81720099>] system_call_fastpath+0x16/0x1b
[    1.825779] Code: 01 00 00 65 48 8b 04 25 f0 c8 00 00 83 80 44 e0 ff ff 01 e8 23 7a f0 e0 4d 85 ed 49 8d 45 ff 0f 84 9b 01 00 00 66 0f 1f 44 00 00 <c4> c1 7d 6f 04 24 c5 fc 57 03 c5 fd 7f 03 c4 c1 7d 6f 4c 24 20 
[    1.828335] RIP  [<ffffffffa0119da0>] xor_avx_2+0x50/0x230 [xor]
[    1.828869]  RSP <ffff880019ca5d08>
[    1.829213] ---[ end trace 70ce68c981f09edb ]---

However, using plain old -cpu host on the qemu command line works fine.

--- Additional comment from Richard W.M. Jones on 2013-09-23 17:24:40 EDT ---

Since this bug has been around for almost *a year*, and it's extremely
annoying, I'm trying to work out if this is a bug in the guest kernel,
qemu, or libvirt.  I'm not any closer to working that out.

libvirt passes the following CPU/machine-related flags:

  -machine pc-i440fx-1.6,accel=kvm,usb=off
  -cpu Westmere,+rdtscp,+avx,+osxsave,+xsave,+tsc-deadline,+pcid,+pdcm,+xtpr,+tm2,+est,+vmx,+ds_cpl,+monitor,+dtes64,+pclmuldq,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme
  -m 500
  -realtime mlock=off
  -smp 1,sockets=1,cores=1,threads=1

Host CPU flags are reported to be:

flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
cpuid level	: 13

Guest CPU flags are reported to be:

flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc rep_good nopl pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes avx hypervisor lahf_lm
cpuid level	: 13
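Diffing the two `flags` lines makes the gap easier to see. Below is a minimal Python sketch; `flag_diff` and `avx_without_xsave` are hypothetical helpers, and the two strings are copied from the host and guest lines quoted above. It shows that the guest advertises `avx` but not `xsave`, and that `hypervisor` is the only guest-only flag.

```python
from __future__ import annotations

# Host and guest "flags" lines from /proc/cpuinfo, copied from the comment above.
HOST_FLAGS = """
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush
dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc
arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu
pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid
sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb
xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
"""

GUEST_FLAGS = """
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush
mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc rep_good nopl pni
pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes avx
hypervisor lahf_lm
"""


def flag_diff(host: str, guest: str) -> tuple[set[str], set[str]]:
    """Return (flags only on the host, flags only in the guest)."""
    h, g = set(host.split()), set(guest.split())
    return h - g, g - h


def avx_without_xsave(flags: str) -> bool:
    """True if 'avx' is advertised but 'xsave', which AVX depends on, is not."""
    s = set(flags.split())
    return "avx" in s and "xsave" not in s


host_only, guest_only = flag_diff(HOST_FLAGS, GUEST_FLAGS)
print("host only: ", " ".join(sorted(host_only)))
print("guest only:", " ".join(sorted(guest_only)))  # guest only: hypervisor
print("guest avx without xsave:", avx_without_xsave(GUEST_FLAGS))  # True
```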

(In reply to Eduardo Habkost from comment #12)
> "-cpu Westmere,+avx" actually should enable the bit on CPUID if and only if
> KVM is able to handle the feature. When KVM can't handle the feature, it
> should be filtered out before the guest CPUID table is built. I still don't
> understand why exactly the guest got an invalid operation exception, as the
> instruction was supposed to be working. Maybe it's related to the "level"
> field and the xsave feature (that is required for AVX, as far as I recall),
> that needs level >= 0xD.

I see the xsave flag in the host CPU flags, and in the libvirt-generated qemu
command line.  I do NOT see the xsave flag in the guest flags.  Not sure what
that means.

Assuming "level" means "cpuid level", then both report 13 == 0xD.

> I don't know if the guest is really allowed to use the feature when the AVX
> bit is set but the necessary xsave bits are not present (if it is not, then
> this is a guest bug).

As far as I can tell from the kernel code, cpu_has_avx just checks the
avx feature flag.  It doesn't check for xsave.  The xor code which is
throwing the invalid opcode is only checking cpu_has_avx, i.e. only checking
for the avx flag.

According to the Intel PRM it does appear that you shouldn't use avx unless
xsave is supported, although it doesn't appear to be an absolute requirement.
I'm assuming it's something to do with those extra registers not being
saved over a context switch, which doesn't sound like an invalid opcode
situation to me (corrupt data OTOH).

Why would xsave bit not be present in the guest?

> If the guest was simply misled by the CPUID
> information, and correct in trying to use the instructions, it is a QEMU bug
> (QEMU should have disabled the feature, and aborted if the "enforce" flag
> was set). In either case, it is not a libvirt bug to ask for "-cpu
> Westmere,+avx".
> 
> But it would be interesting if libvirt could treat some CPU features as "can
> be safely disabled". It would be much better if libvirt used "-cpu
> SandyBridge,-x2apic" on that host, instead of "-cpu Westmere,+<lots of
> flags>".

--- Additional comment from Richard W.M. Jones on 2013-09-23 17:33:14 EDT ---

Also to confirm, the instruction which fails is an AVX instruction
(not xsave):

    1c60:       c4 c1 7d 6f 04 24       vmovdqa (%r12),%ymm0
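The faulting bytes `c4 c1 7d 6f 04 24` (the ones the oops marks with `<c4>` in its `Code:` line) begin with a 3-byte VEX prefix, the encoding introduced with AVX. Below is a small decoder sketch that follows the VEX field layout in the Intel SDM; the function name is hypothetical. It confirms the reading: opcode map 0F with an implied 66 prefix and opcode 6F is MOVDQA, VEX.L=1 selects the 256-bit ymm form, and VEX.B=1 extends the SIB base register to r12.

```python
def decode_vex3(code: bytes) -> dict:
    """Decode the fields of a 3-byte VEX prefix (0xC4) followed by an opcode."""
    if len(code) < 4 or code[0] != 0xC4:
        raise ValueError("not a 3-byte VEX-encoded instruction")
    b1, b2, opcode = code[1], code[2], code[3]
    return {
        # R, X, B (register-number extensions) are stored inverted.
        "R": ((b1 >> 7) & 1) ^ 1,
        "X": ((b1 >> 6) & 1) ^ 1,
        "B": ((b1 >> 5) & 1) ^ 1,
        # m-mmmm selects the implied opcode map.
        "map": {1: "0F", 2: "0F38", 3: "0F3A"}.get(b1 & 0x1F, "reserved"),
        "W": (b2 >> 7) & 1,
        # vvvv names an extra source register, also stored inverted.
        "vvvv": ((b2 >> 3) & 0xF) ^ 0xF,
        # L selects vector length: 0 = 128-bit xmm, 1 = 256-bit ymm.
        "L": 256 if (b2 >> 2) & 1 else 128,
        # pp encodes an implied legacy prefix.
        "pp": {0: None, 1: "66", 2: "F3", 3: "F2"}[b2 & 0x3],
        "opcode": opcode,
    }


# The faulting instruction from the oops: vmovdqa (%r12),%ymm0
fields = decode_vex3(bytes.fromhex("c4c17d6f0424"))
print(fields["map"], fields["pp"], hex(fields["opcode"]), fields["L"])  # 0F 66 0x6f 256
```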

--- Additional comment from Richard W.M. Jones on 2013-09-24 06:33:42 EDT ---

Still happening in Rawhide (albeit using the F19 kernel, because
the Rawhide kernel has other issues):

libvirt-1.1.2-1.fc21.x86_64
qemu-1.6.0-5.fc21.x86_64
kernel-3.10.10-200.fc19.x86_64

--- Additional comment from Richard W.M. Jones on 2013-09-24 06:51:05 EDT ---

The following program compiled and ran fine on the host, so I
guess that indicates that the host has no problem with AVX
instructions:

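        /* AVX smoke test: load 32 bytes into the 256-bit register ymm0. */
        .text
        .globl main
main:
        leaq testdata(%rip),%r12   /* RIP-relative address, so it also links as PIE */
        vmovdqa (%r12),%ymm0       /* VEX-encoded AVX load; #UD if AVX is not enabled */
        /* non-AVX control variant: movq (%r12),%r10 */
        xorl %eax,%eax             /* main returns 0 */
        ret

        .data
        .align 32                  /* vmovdqa requires a 32-byte-aligned operand */
testdata:
        .float 1,2,3,4,5,6,7,8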
Comment 2 Paolo Bonzini 2013-10-04 08:12:50 EDT
Ironically, right now for AVX to work you do not require the AVX CPUID bit (though applications will probably not use AVX unless the CPUID bit is set).  AVX works if the XCR0 register's bit 2 is set.  This requires:

- the XSAVE CPUID feature, otherwise the kernel will not try to set the OSXSAVE bit in CR4.

- the OSXSAVE CPUID feature, otherwise the processor will not enable the XSETBV instruction that writes to XCR0.  This feature however is ignored on the command line.  KVM sets it when the kernel writes 1 to the OSXSAVE bit of CR4.

- the bit 2 of EAX to be set in CPUID leaf EAX=0xD/ECX=0 (in current RHEL7 QEMU this is always true; later it will be keyed on the AVX CPUID bit, bug 1005695), otherwise the processor will not enable AVX instructions.

- CPUID level to be 13 or higher, otherwise the CPUID leaf is not available.

Thus, "-cpu Westmere,+xsave,+avx,level=13" is required to enable AVX.  Current QEMU will enable it even if you omit "+avx" but that's not future-proof.
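The explicit part of these requirements can be checked mechanically against a `-cpu` argument. Below is a minimal Python sketch with hypothetical helper names. It only parses the `+feature`, `-feature` and `key=value` forms, and it does not know which features or CPUID level a named model such as Westmere implies by default. As a result, it can only say whether the settings are explicit. Applied to the value libvirt generated in this bug, it reports that `level` was left to the model default (the guest above nevertheless reported cpuid level 13).

```python
from __future__ import annotations


def parse_cpu_arg(arg: str) -> tuple[str, set[str], set[str], dict[str, str]]:
    """Split a QEMU -cpu value into model, +features, -features and key=value
    properties. Only these three option forms are handled (an assumption)."""
    model, *opts = arg.split(",")
    plus: set[str] = set()
    minus: set[str] = set()
    props: dict[str, str] = {}
    for opt in opts:
        if opt.startswith("+"):
            plus.add(opt[1:])
        elif opt.startswith("-"):
            minus.add(opt[1:])
        elif "=" in opt:
            key, value = opt.split("=", 1)
            props[key] = value
    return model, plus, minus, props


def missing_avx_requirements(arg: str) -> list[str]:
    """Report which explicit settings (+xsave, +avx, level>=13) are absent.

    The CPU model's own defaults are not known here, so an unset level is
    reported as unknown rather than as too low."""
    _, plus, minus, props = parse_cpu_arg(arg)
    missing = [f"+{feat}" for feat in ("xsave", "avx")
               if feat not in plus or feat in minus]
    level = props.get("level")
    if level is None:
        missing.append("level (not set; model default unknown)")
    elif int(level) < 13:
        missing.append("level>=13")
    return missing


print(missing_avx_requirements("Westmere,+xsave,+avx,level=13"))  # []
# The -cpu value libvirt generated for host-model in this bug:
libvirt_cpu = ("Westmere,+rdtscp,+avx,+osxsave,+xsave,+tsc-deadline,+pcid,+pdcm,"
               "+xtpr,+tm2,+est,+vmx,+ds_cpl,+monitor,+dtes64,+pclmuldq,+pbe,"
               "+tm,+ht,+ss,+acpi,+ds,+vme")
print(missing_avx_requirements(libvirt_cpu))  # ['level (not set; model default unknown)']
```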
Comment 3 Jiri Denemark 2013-10-04 09:40:02 EDT
Oh cool, finally someone who knows something about this area :-) Thanks Paolo. Libvirt currently doesn't model anything but the CPU model and features. And when detecting the host CPU, we only use CPU features, which means we may easily detect the CPU as an older model plus additional features. Thus host-model can select a model+features combination that does not actually work. We need to make host CPU probing smarter (we plan to involve QEMU in the process; see bug 824989) so that the CPU it creates is always usable. Until we do that, using "host-model" is fine if it works for you, but it's too fragile to be generally recommended. The same applies to a full copy of the host CPU from the capabilities XML. I'd suggest using either of the following:

- host-passthrough CPU mode
- just the CPU model from capabilities XML without the additional features; it should be possible to force-add features that QEMU is able to emulate, e.g., <feature name="x2apic" policy="force"/> but I'm not sure if that's safe for all CPU models or not.
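The two suggested alternatives can be expressed as libvirt domain XML `<cpu>` elements. To stay in the same language as the other sketches, the fragments below are built with Python's `xml.etree.ElementTree`. The helper names are hypothetical, and `match="exact"` is an assumption. `Westmere` is simply the model libvirt picked on this host, and `x2apic` with `policy="force"` is the example from the comment above, which the comment itself is unsure is safe for every CPU model.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def host_passthrough_cpu() -> str:
    """<cpu mode="host-passthrough"/>: expose the host CPU to the guest as-is."""
    return ET.tostring(ET.Element("cpu", mode="host-passthrough"), encoding="unicode")


def custom_cpu(model: str, forced: tuple[str, ...] = ()) -> str:
    """<cpu mode="custom"> with only a named model plus optional forced features."""
    cpu = ET.Element("cpu", mode="custom", match="exact")
    ET.SubElement(cpu, "model").text = model
    for name in forced:
        # policy="force" asks for the feature even if the host lacks it.
        ET.SubElement(cpu, "feature", policy="force", name=name)
    return ET.tostring(cpu, encoding="unicode")


print(host_passthrough_cpu())
print(custom_cpu("Westmere", forced=("x2apic",)))
```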
Comment 4 Jiri Denemark 2013-10-10 09:03:04 EDT
So all we can do for 7.0 is to better document how fragile host-model is and finally make it better once we have bug 824989 fixed. I'll keep this bug for the documentation changes and clone it for the real work later.
Comment 5 Jiri Denemark 2013-10-22 11:02:51 EDT
Fixed upstream by v1.1.3-190-g34adf62:

commit 34adf622a352cbb80a98162a0f9ac3f9de3f95cb
Author: Jiri Denemark <jdenemar@redhat.com>
Date:   Thu Oct 17 16:02:38 2013 +0200

    docs: Expand description of host-model CPU mode
    
    host-model is a nice idea but it's current implementation make it
    useless on some hosts so it should be used with care.
Comment 8 Jincheng Miao 2013-10-24 06:58:25 EDT
# rpm -q libvirt 
libvirt-1.1.1-10.el7.x86_64

# grep -rn Beware -A 9 /usr/share/doc/libvirt-docs-1.1.1/html/formatdomain.html 
1044:          the capabilities of the new host. <strong>Beware</strong>, due to the
1045-          way libvirt detects host CPU and due to the fact libvirt does not
1046-          talk to QEMU/KVM when creating the CPU model, CPU configuration
1047-          created using <code>host-model</code> may not work as expected. The
1048-          guest CPU may differ from the configuration and it may also confuse
1049-          guest OS by using a combination of CPU features and other parameters
1050-          (such as CPUID level) that don't work. Until these issues are fixed,
1051-          it's a good idea to avoid using <code>host-model</code> and use
1052-          <code>custom</code> mode with just the CPU model from host
1053-          capabilities XML.</dd><dt><code>host-passthrough</code></dt><dd>With this mode, the CPU visible to the guest should be exactly

The explanation is present in the HTML documentation, so I am changing the status to VERIFIED.
Comment 9 Ludek Smid 2014-06-13 06:04:36 EDT
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.
