Bug 1184125 - -cpu host: cache CPUID passthrough may not make sense depending on VM CPU topology
Summary: -cpu host: cache CPUID passthrough may not make sense depending on VM CPU topology
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Eduardo Habkost
QA Contact: Guo, Zhiyi
URL:
Whiteboard:
Depends On: 1169577
Blocks:
 
Reported: 2015-01-20 15:38 UTC by Eduardo Habkost
Modified: 2016-11-07 20:19 UTC (History)
CC List: 19 users

Fixed In Version: qemu-kvm-rhev-2.6.0-1.el7
Doc Type: Bug Fix
Doc Text:
Clone Of: 1169577
Environment:
Last Closed: 2016-11-07 20:19:31 UTC
Target Upstream Version:
Embargoed:


Attachments: none


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:2673 0 normal SHIPPED_LIVE qemu-kvm-rhev bug fix and enhancement update 2016-11-08 01:06:13 UTC

Description Eduardo Habkost 2015-01-20 15:38:49 UTC
Cloning for RHEL-7. The RHEL-6 bug is likely to be closed.

+++ This bug was initially created as a clone of Bug #1169577 +++

Description of problem:
Running a redhat-6.4-64bit (kernel 2.6.32-358.el6.x86_64) or older guest on
qemu-2.1, with KVM enabled, -cpu host, a non-default CPU topology, and guest
NUMA, I'm seeing a reliable kernel panic from the guest shortly after boot. It
happens in find_busiest_group().


Version-Release number of selected component (if applicable):
qemu-kvm-2.1.0 (git bisect shows the problem started with commit
787aaf5703a702094f395db6795e74230282cd62.)

How reproducible:
100%

Steps to Reproduce:
1. Configure the VM with -cpu host, a CPU topology, and NUMA nodes. The full qemu command line:
qemu-system-x86_64 -machine pc-i440fx-2.1,accel=kvm,usb=off \
-cpu host -m 16384 \
-smp 16,sockets=2,cores=4,threads=2 \
-object memory-backend-ram,size=8192M,id=ram-node0 \
-numa node,nodeid=0,cpus=0-7,memdev=ram-node0 \
-object memory-backend-ram,size=8192M,id=ram-node1 \
-numa node,nodeid=1,cpus=8-15,memdev=ram-node1 \
-boot c -drive file=/image/dir/redhat_6.4_64 \
-vnc 0.0.0.0:0 -device cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x1.0x4 \
-msg timestamp=on


Actual results:
Guest kernel divide error panic.

Expected results:
The VM starts with the correct NUMA topology.

Additional info:

(1) The guest kernel messages:

divide error: 0000 [#1] SMP
last sysfs file:
CPU 0
Modules linked in:

Pid: 1, comm: swapper Not tainted 2.6.32-358.el6.x86_64 #1 QEMU Standard PC (i440FX + PIIX, 1996)
RIP: 0010:[<ffffffff81059a9c>]  [<ffffffff81059a9c>] find_busiest_group+0x55c/0x9f0
RSP: 0018:ffff88023c85f9e0  EFLAGS: 00010046
RAX: 0000000000100000 RBX: ffff88023c85fbdc RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000010 RDI: 0000000000000010
RBP: ffff88023c85fb50 R08: ffff88023ca16c10 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: 00000000ffffff01
R13: 0000000000016700 R14: ffffffffffffffff R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000001a85000 CR4: 00000000000407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff88023c85e000, task ffff88043d27c040)
Stack:
 ffff88023c85faf0 ffff88023c85fa60 ffff88023c85fbc8 0000000200000000
<d> 0000000100000000 ffff880028210b60 0000000100000001 0000000000000008
<d> 0000000000016700 0000000000016700 ffff88023ca16c00 0000000000016700
Call Trace:
 [<ffffffff8150da2a>] thread_return+0x398/0x76e
 [<ffffffff8150e555>] schedule_timeout+0x215/0x2e0
 [<ffffffff81065905>] ? enqueue_entity+0x125/0x410
 [<ffffffff8150e1d3>] wait_for_common+0x123/0x180
 [<ffffffff81063310>] ? default_wake_function+0x0/0x20
 [<ffffffff8150e2ed>] wait_for_completion+0x1d/0x20
 [<ffffffff81096a89>] kthread_create+0x99/0x120
 [<ffffffff81090950>] ? worker_thread+0x0/0x2a0
 [<ffffffff81167769>] ? alternate_node_alloc+0xc9/0xe0
 [<ffffffff810908d9>] create_workqueue_thread+0x59/0xd0
 [<ffffffff8150ebce>] ? mutex_lock+0x1e/0x50
 [<ffffffff810911bd>] __create_workqueue_key+0x14d/0x200
 [<ffffffff81c47233>] init_workqueues+0x9f/0xb1
 [<ffffffff81c2788c>] kernel_init+0x25e/0x2fe
 [<ffffffff8100c0ca>] child_rip+0xa/0x20
 [<ffffffff81c2762e>] ? kernel_init+0x0/0x2fe
 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
Code: 8b b5 b0 fe ff ff 48 8b bd b8 fe ff ff e8 9d 85 ff ff 0f 1f 44 00 00 48 8b 95 e0
fe ff ff 48 8b 45 a8 8b 4a 08 48 c1 e0 0a 31 d2 <48> f7 f1 48 8b 4d b0 48 89 45 a0
31 c0 48 85 c9 74 0c 48 8b 45
RIP  [<ffffffff81059a9c>] find_busiest_group+0x55c/0x9f0
 RSP <ffff88023c85f9e0>
divide error: 0000 [#2]
---[ end trace d7d20afc6dd05e71 ]---
Kernel panic - not syncing: Fatal exception
Pid: 1, comm: swapper Tainted: G      D    ---------------    2.6.32-358.el6.x86_64 #1
Call Trace:
 [<ffffffff8150cfc8>] ? panic+0xa7/0x16f
 [<ffffffff815111f4>] ? oops_end+0xe4/0x100
 [<ffffffff8100f19b>] ? die+0x5b/0x90
 [<ffffffff81510a34>] ? do_trap+0xc4/0x160
 [<ffffffff8100cf7f>] ? do_divide_error+0x8f/0xb0
 [<ffffffff81059a9c>] ? find_busiest_group+0x55c/0x9f0
 [<ffffffff8113b3a9>] ? zone_statistics+0x99/0xc0
 [<ffffffff8100bdfb>] ? divide_error+0x1b/0x20
 [<ffffffff81059a9c>] ? find_busiest_group+0x55c/0x9f0
 [<ffffffff8150da2a>] ? thread_return+0x398/0x76e
 [<ffffffff8150e555>] ? schedule_timeout+0x215/0x2e0
 [<ffffffff81065905>] ? enqueue_entity+0x125/0x410
 [<ffffffff8150e1d3>] ? wait_for_common+0x123/0x180
 [<ffffffff81063310>] ? default_wake_function+0x0/0x20
 [<ffffffff8150e2ed>] ? wait_for_completion+0x1d/0x20
 [<ffffffff81096a89>] ? kthread_create+0x99/0x120
 [<ffffffff81090950>] ? worker_thread+0x0/0x2a0
 [<ffffffff81167769>] ? alternate_node_alloc+0xc9/0xe0
 [<ffffffff810908d9>] ? create_workqueue_thread+0x59/0xd0
 [<ffffffff8150ebce>] ? mutex_lock+0x1e/0x50
 [<ffffffff810911bd>] ? __create_workqueue_key+0x14d/0x200
 [<ffffffff81c47233>] ? init_workqueues+0x9f/0xb1
 [<ffffffff81c2788c>] ? kernel_init+0x25e/0x2fe
 [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
 [<ffffffff81c2762e>] ? kernel_init+0x0/0x2fe
 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20

-- the divide error line:
"sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) / group->cpu_power; "
in update_sg_lb_stats(), file sched.c, line 4094

(2) Host info

/proc/cpuinfo on the host has 16 of these:

processor       : 15
vendor_id       : GenuineIntel
cpu family      : 6
model           : 45
model name      : Intel(R) Xeon(R) CPU E5-2643 0 @ 3.30GHz
stepping        : 7
microcode       : 1803
cpu MHz         : 3301.000
cache size      : 10240 KB
physical id     : 1
siblings        : 8
core id         : 3
cpu cores       : 4
apicid          : 39
initial apicid  : 39
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx
est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt
tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts
dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips        : 6599.83
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:


Host NUMA topology:

node 0 cpus: 0 1 2 3 8 9 10 11
node 0 size: 40936 MB
node 0 free: 39625 MB
node 1 cpus: 4 5 6 7 12 13 14 15
node 1 size: 40960 MB
node 1 free: 39876 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10

(3) With "sched_debug loglevel=8" kernel parameter command line,
you can see follow error log(those "ERROR"s):

 CPU0 attaching sched-domain:
  domain 0: span 0-15 level MC
   groups: 0 (cpu_power = 1023) 1 2 3 4 5 6 7 8 9 10 (cpu_power = 1023) 11 12
 13 14 15
 ERROR: parent span is not a superset of domain->span
   domain 1: span 0-7 level CPU
 ERROR: domain->groups does not contain CPU0
    groups: 8-15 (cpu_power = 16382)
 ERROR: groups don't span domain->span
    domain 2: span 0-15 level NODE
     groups:
 ERROR: domain->cpu_power not set

--- Additional comment from Wang Xin on 2014-12-01 21:56:10 EST ---

We found that after QEMU commit 787aaf57 (target-i386: forward CPUID cache
leaves when -cpu host is used), the guest gets its CPU cache information from
the host when -cpu host is used. But if we configure guest NUMA as:
   node 0: cpus 0~7
   node 1: cpus 8~15
then both NUMA nodes lie within the same host CPU cache (vCPUs 0~15).
When the guest OS boots and calculates group->cpu_power, it finds that the two
different nodes share the same cache, so node 1's group->cpu_power is never
assigned and stays at its initial value of 0. When a vCPU is then scheduled,
the division by 0 causes the kernel panic.
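
To make this concrete, here is a minimal user-space sketch of the division quoted earlier from update_sg_lb_stats(). The struct names and values are simplified stand-ins for illustration only, not the actual RHEL 6 kernel code:

#include <stdio.h>

#define SCHED_LOAD_SCALE 1024UL   /* same scale factor the kernel uses */

/* Illustrative stand-ins for struct sched_group / sg_lb_stats. */
struct sched_group { unsigned long cpu_power; };
struct sg_lb_stats { unsigned long group_load; unsigned long avg_load; };

int main(void)
{
    /* Node 1's group never gets its cpu_power computed, so it keeps the
     * initial value 0, as described above. */
    struct sched_group node1_group = { .cpu_power = 0 };
    struct sg_lb_stats sgs = { .group_load = 2048, .avg_load = 0 };

    if (node1_group.cpu_power == 0) {
        /* The guest kernel has no such guard on this path:
         * "sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) / group->cpu_power;"
         * performs an integer division by zero, producing the divide error
         * seen in find_busiest_group(). */
        printf("cpu_power == 0: the kernel would take a divide error here\n");
        return 1;
    }

    sgs.avg_load = (sgs.group_load * SCHED_LOAD_SCALE) / node1_group.cpu_power;
    printf("avg_load = %lu\n", sgs.avg_load);
    return 0;
}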

--- Additional comment from Andrew Jones on 2014-12-02 04:31:52 EST ---

This should be fixed since kernel-2.6.32-395.el6 with

commit 08d7ef55afc468ed6cb29d892b53063dc382c9fa
Author: Radim Krcmar <rkrcmar>
Date:   Wed Jun 5 10:19:02 2013 -0400

    [kernel] sched: make weird topologies bootable

Please update your guest kernel.

--- Additional comment from Wang Xin on 2014-12-03 01:02:01 EST ---

Thanks, Andrew.

Yes, that patch avoids the guest kernel panic.
Still, I think the other problem is that QEMU, given correct arguments:

" -cpu host -m 16384 \
-smp 16,sockets=2,cores=4,threads=2 \
-object memory-backend-ram,size=8192M,id=ram-node0 \
-numa node,nodeid=0,cpus=0-7,memdev=ram-node0 \
-object memory-backend-ram,size=8192M,id=ram-node1 \
-numa node,nodeid=1,cpus=8-15,memdev=ram-node1 "

nevertheless emulates a weird topology for the VM.

It seems QEMU cannot emulate a sane topology when both
'-cpu host' and guest NUMA are used.

In my example, QEMU emulates the right CPU topology and
guest NUMA nodes according to the user's config.
But with '-cpu host', QEMU uses the host CPU cache info directly
instead of emulating it, which makes the CPU cache info and
the emulated NUMA topology mismatch.
In any case, QEMU should ensure that vCPUs sharing a cache are in
the same guest NUMA node. Could QEMU enforce some rules to avoid
creating weird topologies for the guest?

--- Additional comment from Wang Xin on 2014-12-03 02:16:20 EST ---

Furthermore:

The Linux kernel assumes that CPUs sharing a last-level cache belong to the same NUMA node, while current QEMU cannot guarantee that. So some guests panic at boot, and newer guests such as RHEL 7 (Linux 3.10) and Linux 3.17 warn about it, like this:
[    0.139016] ------------[ cut here ]------------
[    0.139016] WARNING: at arch/x86/kernel/smpboot.c:326 topology_sane.isra.1+0x6f/0x80()
[    0.139016] sched: CPU #8's smt-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
[    0.139016] Modules linked in:
[    0.139016] CPU: 8 PID: 0 Comm: swapper/8 Not tainted 3.10.0-123.el7.x86_64 #1
[    0.139016] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[    0.139016]  ffff88003e2ffe38 0f82a03969bb4fc8 ffff88003e2ffdf0 ffffffff815e19ba
[    0.139016]  ffff88003e2ffe28 ffffffff8105dee1 0000000000000001 0000000000013c80
[    0.139016]  0000000000000008 0000000000000001 0000000000000000 ffff88003e2ffe90
[    0.139016] Call Trace:
[    0.139016]  [<ffffffff815e19ba>] dump_stack+0x19/0x1b
[    0.139016]  [<ffffffff8105dee1>] warn_slowpath_common+0x61/0x80
[    0.139016]  [<ffffffff8105df5c>] warn_slowpath_fmt+0x5c/0x80
[    0.139016]  [<ffffffff8102ee8d>] ? __mcheck_cpu_init_timer+0x4d/0x60
[    0.139016]  [<ffffffff815cf731>] topology_sane.isra.1+0x6f/0x80
[    0.139016]  [<ffffffff815cfa35>] set_cpu_sibling_map+0x2b9/0x500
[    0.139016]  [<ffffffff815cfe17>] start_secondary+0x19b/0x27b
[    0.139016] ---[ end trace 5508d90aed792a9b ]---

The relevant code in Linux kernel 3.17 is:
1. topology_sane() checks whether two CPUs are in the same NUMA node, and prints a warning if they are not;
2. match_llc() invokes topology_sane() if two CPUs have the same cpu_llc_id (last-level cache ID);
3. in init_intel_cacheinfo(), l2_id or l3_id (which later becomes the CPU's cpu_llc_id) is calculated from the CPU's apicid divided by num_threads_sharing (the number of CPUs sharing that L2 or L3 cache). num_threads_sharing comes from CPUID leaf 04H in the guest, which is generated by QEMU.

So with CPU model host-passthrough and Benoît's patch (http://git.qemu.org/?p=qemu.git;a=commit;h=787aaf5703a702094f395db6795e74230282cd62), CPUID leaf 04H is passed through from the host to the guest. On our host (dual Xeon E5620), num_threads_sharing is 32, host CPU socket 0's APIC IDs run from 0 to 31 and socket 1's from 32 to 63, so on the host the result is correct. But for, say, an 8-vCPU guest, QEMU and SeaBIOS assign guest CPU APIC IDs 0 to 7, so the guest concludes that all vCPUs share the L3 cache. If guest NUMA then places vcpu0-vcpu3 in node 0 and the other vCPUs in node 1, this confuses the guest: it sees a weird topology in which CPUs sharing the last-level cache sit in different NUMA nodes, unless the vCPU topology puts every vCPU in a different socket (see the sketch below).
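
To illustrate the arithmetic with the numbers from this bug, here is a rough user-space sketch of the init_intel_cacheinfo() derivation. It assumes num_threads_sharing is a power of two and approximates the kernel's APIC-ID masking; all names are illustrative, and the 16-vCPU layout is the one from the original report:

#include <stdio.h>

/* Approximation of how the guest derives cpu_llc_id from CPUID leaf 04H:
 * the APIC ID is masked by the number of threads sharing the cache. */
static unsigned int llc_id(unsigned int apicid, unsigned int num_threads_sharing)
{
    return apicid & ~(num_threads_sharing - 1);
}

int main(void)
{
    /* Host leaf 04H passed through: 32 threads share the L3.
     * QEMU/SeaBIOS give the 16 vCPUs APIC IDs 0..15, and the -numa options
     * put vCPUs 0-7 in node 0 and vCPUs 8-15 in node 1. */
    unsigned int num_threads_sharing = 32;

    for (unsigned int apicid = 0; apicid < 16; apicid++) {
        unsigned int node = (apicid < 8) ? 0 : 1;
        printf("vcpu apicid %2u -> llc_id %u, guest numa node %u\n",
               apicid, llc_id(apicid, num_threads_sharing), node);
    }

    /* Every vCPU ends up with llc_id 0, so CPUs in node 0 and node 1 appear
     * to share the same last-level cache -- the mismatch that trips
     * topology_sane() and breaks the scheduler domains. */
    return 0;
}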

Even without Benoît’s patch, or if we use cpu model host-model, qemu will present to guest that all hyper-threads sharing same l2 cache(and no l3 cache), and if we configure vcpus from the same hyper-threads group in different numa nodes(ie, vcpu topology is 1 socket, 1 core, 2 threads, and vcpu0 in numa node 0 and vcpu1 in numa node 1), this is also a weird topology.

So I think:
1. CPUID leaf 4 should not be passed through directly to the guest;
2. QEMU should check whether vCPUs in the same hyper-thread group (or sharing a last-level cache) are placed in different NUMA nodes, and if they are, refuse to boot.

Benoît and Paolo, what do you think about reverting Benoît's patch for CPUID leaf 4? Or do you have a better idea?

Thanks.

--- Additional comment from Andrew Jones on 2014-12-03 07:06:13 EST ---

Let's see what Benoît and Paolo say, but in my opinion, if you ask for -cpu host, then you should expect to see what the host sees. That said, a patch that makes QEMU complain and refuse to start the guest when a user requests -cpu host together with a NUMA topology that doesn't exactly match the host does seem reasonable.

Another thing to ask is, why is '-cpu host' necessary in your config? Maybe we should be looking at what features we're missing from the cpu models instead. Then, if we add those, it'll allow the config to stay emulated.

--- Additional comment from Paolo Bonzini on 2014-12-04 17:39:52 EST ---

I think it makes sense to remove the automatic "-cpu host" -> pass-through-the-cache-info behavior, and instead add a property like "host_cache_info" that defaults to false and can also be used with models other than "-cpu host".

--- Additional comment from Benoît Canet on 2014-12-04 18:24:12 EST ---


Hello,

A bit about the use case which pushed me to write the patch.

Some CPU-intensive applications (3DS Simulia) running in the guest use the CPUID leaves to make a best guess about the CPU topology and autotune themselves based on the CPUID results.

So there is a real business use case for this patch, and Red Hat probably has users running similar compute-intensive workloads on KVM, so I think the option to pass this leaf through should be kept in one way or another.

Best regards

Benoît

--- Additional comment from Wang Xin on 2014-12-05 03:12:07 EST ---

Thanks, Paolo.
It's a good idea: we just need to initialize x86_cpu_def.cache_info_passthrough to false and turn it on via an option such as "host_cache_info=[on/off]".

But there are still problems:
1) The cache info we pass through to the guest is read from host CPUID at vCPU initialization. If vCPUs are not pinned to physical CPUs, the guest will get the wrong cache info once a vCPU is scheduled onto another physical core or node.

2) QEMU passes host CPUID leaf 04H through to the guest, while the APIC ID is emulated by QEMU itself. Without the host CPU APIC ID, the guest cannot correctly parse the Deterministic Cache Parameters from CPUID 04H.

Unless we solve these two problems, passing the cache info through to the guest does not make sense, and the information is not correct. Do you think the APIC ID generation code in QEMU/SeaBIOS needs to be modified?

Benoît, do you know exactly which information from CPUID 04H the guest application needs?

3) Also, if the CPU model is custom or host-model, QEMU should check whether threads of the vCPU topology are in the same guest NUMA node; otherwise the same problem exists. Do you think this is OK?

--- Additional comment from Benoît Canet on 2014-12-05 03:28:52 EST ---

> Benoît, do you know exactly which information from CPUID 04H the guest application needs?

I have just asked the user; now I need to wait for a response.

Best regards

Benoît

--- Additional comment from Benoît Canet on 2014-12-05 10:21:41 EST ---


Hello,

If my memory and my customer's are correct, this particular application needs to guess the L3 cache topology (mainly its size).

Best regards

Benoît

--- Additional comment from Wang Xin on 2014-12-08 02:11:25 EST ---

(In reply to Benoît Canet from comment #11)
> Hello,
> 
> If my memory and my customer's are correct, this particular application
> needs to guess the L3 cache topology (mainly its size).
> 

Hi, Benoît.
Why not emulate the L3 cache info according to the host CPU? Have you ever tried it?

> Best regards
> 
> Benoît

Comment 2 Eduardo Habkost 2015-08-19 16:41:42 UTC
This is a regression from qemu-kvm-rhev-1.5.3; proposing the exception+ flag.

Comment 4 Eduardo Habkost 2015-08-19 17:13:52 UTC
Upstream patch submitted:

From: Eduardo Habkost <ehabkost>
To: qemu-devel
Subject: [PATCH] target-i386: Disable cache info passthrough by default
Date: Wed, 19 Aug 2015 10:08:22 -0700
Message-Id: <1440004102-4822-1-git-send-email-ehabkost>

http://article.gmane.org/gmane.comp.emulators.qemu/356417
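
For reference, assuming the property keeps the name used in the posted series (host-cache-info) and defaults to off, the old passthrough behaviour would then have to be requested explicitly, roughly like:

qemu-system-x86_64 -machine pc,accel=kvm \
-cpu host,host-cache-info=on \
-smp 16,sockets=2,cores=4,threads=2 ...

(Hypothetical invocation; with the new default, the guest would instead see emulated cache information consistent with the -smp topology.)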

Comment 5 Karen Noel 2015-08-29 02:26:26 UTC
(In reply to Eduardo Habkost from comment #4)
> Upstream patch submitted:
> 
> From: Eduardo Habkost <ehabkost>
> To: qemu-devel
> Subject: [PATCH] target-i386: Disable cache info passthrough by default
> Date: Wed, 19 Aug 2015 10:08:22 -0700
> Message-Id: <1440004102-4822-1-git-send-email-ehabkost>
> 
> http://article.gmane.org/gmane.comp.emulators.qemu/356417

No upstream reviews yet. Many people are on vacation. Maybe worth a ping? Or, choose others to review?

Comment 7 Eduardo Habkost 2015-09-02 14:21:42 UTC
New upstream patch posted to qemu-devel:

  Subject: [PATCH v2] target-i386: Disable cache info passthrough by default
  Date: Wed,  2 Sep 2015 11:19:11 -0300
  Message-Id: <1441203551-15403-1-git-send-email-ehabkost>

Comment 11 Eduardo Habkost 2016-06-23 22:57:07 UTC
Fixed on v2.6.0 rebase.

Comment 12 Guo, Zhiyi 2016-09-08 12:30:26 UTC
Hi Eduardo,

Could you tell QE how to verify this bug? Thanks!

BR/
Guo,Zhiyi

Comment 13 Eduardo Habkost 2016-09-09 19:44:39 UTC
(In reply to Guo, Zhiyi from comment #12)
> Hi Eduardo,
> 
> Could you tell QE how to verify this bug? Thanks!
> 

In case it is not possible to reproduce the crash from the original bug report, you can do this:

1) Start a guest with -cpu host with a CPU topology different from the host (preferably make it very different: use a host with a large number of threads/cores per socket, and start a VM with a smaller number of threads/cores per socket, or vice-versa).
2) Check "lscpu -e" output on the guest.
2.1) On old qemu-kvm-rhev (or latest qemu-kvm-rhev using rhel7.2.0 machine-type), cache topology information will be arbitrary and may not make sense, because it is copied from host CPUID (e.g. CPUs in different sockets may appear sharing the same L1 cache ID, or CPU threads inside the same core may appear _not_ sharing the same L3 cache ID).
2.2) Using newer QEMU with rhel7.3.0 machine-type, cache information should be reasonable and match the CPU topology specified in the command-line (more exactly: different ID for L1i and L1d for each thread, same L2 ID for threads inside the same core, and no L3 cache)

Comment 14 Guo, Zhiyi 2016-09-13 12:54:45 UTC
Reproduced the issue with qemu-kvm-rhev-2.3.0-31.el7_2.21.x86_64 and a RHEL 7.3 guest.

qemu cli used:
/usr/libexec/qemu-kvm -name rhel7.3 -m 2048 \
	-cpu host \
        -smp 24,threads=4,cores=6,sockets=1 \
        -device virtio-serial -chardev spicevmc,id=vdagent,debug=0,name=vdagent \
	-spice port=3003,disable-ticketing \
	-device qxl \
        -serial unix:/tmp/m,server,nowait \
        -device virtserialport,chardev=vdagent,name=com.redhat.spice.0 \
        -drive file=/home/ss1rhel73.qcow2,if=none,id=drive-scsi-disk0,format=qcow2,cache=none,werror=stop,rerror=stop -device virtio-scsi-pci,id=scsi0 -device scsi-hd,drive=drive-scsi-disk0,bus=scsi0.0,scsi-id=0,lun=0,id=scsi-disk0,bootindex=1 \
        -monitor stdio \
        -netdev tap,id=idinWyYp,vhost=on -device virtio-net-pci,mac=42:ce:a9:d2:4d:d7,id=idlbq7eA,netdev=idinWyYp \

Steps:
1. Boot the guest with the qemu command line above.
2. Inside the guest, run lscpu -e:
CPU NODE SOCKET CORE L1d:L1i:L2 ONLINE
0   0    0      0    0:0:0      yes
1   0    0      0    0:0:0      yes
2   0    0      0    0:0:0      yes
3   0    0      0    0:0:0      yes
4   0    0      1    1:1:0      yes
5   0    0      1    1:1:0      yes
6   0    0      1    1:1:0      yes
7   0    0      1    1:1:0      yes
8   0    0      2    2:2:1      yes
9   0    0      2    2:2:1      yes
10  0    0      2    2:2:1      yes
11  0    0      2    2:2:1      yes
12  0    0      3    3:3:1      yes
13  0    0      3    3:3:1      yes
14  0    0      3    3:3:1      yes
15  0    0      3    3:3:1      yes
16  0    0      4    4:4:2      yes
17  0    0      4    4:4:2      yes
18  0    0      4    4:4:2      yes
19  0    0      4    4:4:2      yes
20  0    0      5    5:5:2      yes
21  0    0      5    5:5:2      yes
22  0    0      5    5:5:2      yes
23  0    0      5    5:5:2      yes

Threads within the same core share the same L1d and L1i IDs, and threads in different cores share the same L2 ID, which does not match the configured topology.

Verify against qemu-kvm-rhev-2.6.0-24.el7.x86_64:
# lscpu -e
CPU NODE SOCKET CORE L1d:L1i:L2 ONLINE
0   0    0      0    0:0:0      yes
1   0    0      0    1:1:0      yes
2   0    0      0    2:2:0      yes
3   0    0      0    3:3:0      yes
4   0    0      1    4:4:1      yes
5   0    0      1    5:5:1      yes
6   0    0      1    6:6:1      yes
7   0    0      1    7:7:1      yes
8   0    0      2    8:8:2      yes
9   0    0      2    9:9:2      yes
10  0    0      2    10:10:2    yes
11  0    0      2    11:11:2    yes
12  0    0      3    12:12:3    yes
13  0    0      3    13:13:3    yes
14  0    0      3    14:14:3    yes
15  0    0      3    15:15:3    yes
16  0    0      4    16:16:4    yes
17  0    0      4    17:17:4    yes
18  0    0      4    18:18:4    yes
19  0    0      4    19:19:4    yes
20  0    0      5    20:20:5    yes
21  0    0      5    21:21:5    yes
22  0    0      5    22:22:5    yes
23  0    0      5    23:23:5    yes

Each thread has different L1i and L1d IDs, threads inside the same core share the same L2 ID, and there is no L3 cache, matching the expected result from comment 13.

Comment 18 errata-xmlrpc 2016-11-07 20:19:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2673.html

