Bug 1200685

Summary: RHEL6 64bit guest hangs during boot on 7.2 host when default VCPU->NUMA mapping is used
Product: Red Hat Enterprise Linux 7 Reporter: Yanhui Ma <yama>
Component: qemu-kvm-rhev Assignee: Igor Mammedov <imammedo>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: high Docs Contact:
Priority: high    
Version: 7.2 CC: ehabkost, imammedo, juzhang, lmiksik, michen, mrezanin, virt-maint, xfu, zhguo
Target Milestone: rc Keywords: Regression
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: upstream 2.3 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-12-04 16:31:59 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
  dmesg (flags: none)
  host cpu info (flags: none)
  guest full console logs (flags: none)
  host full dmesg (flags: none)
  RHEL6.6 guest full console logs (flags: none)
  call trace info (flags: none)

Description Yanhui Ma 2015-03-11 07:51:27 UTC
Created attachment 1000284 [details]
dmesg

Description of problem:
When I boot a RHEL6.7 64-bit guest on a 7.2 host and hotplug 17 CPUs in the guest (sometimes 25; to reproduce, you can increase the count), call trace info appears or the guest hangs, and only 4 CPUs show up in /proc/cpuinfo.

Version-Release number of selected component (if applicable):

qemu-kvm-rhev-2.2.0-5.el7.x86_64
3.10.0-230.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1.boot a guest with the following qemu command:
/usr/libexec/qemu-kvm -M pc-i440fx-rhel7.1.0 -m 4G \
 -cpu Opteron_G3,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,hv_time \
 -smp 1,sockets=60,cores=4,threads=1,maxcpus=240 \
 -monitor stdio -vga qxl -spice port=5900,disable-ticketing \
 -drive file=/home/RHEL-Server-6.7-64-virtio.qcow2,if=none,id=drive-data-disk1,cache=none,format=qcow2,aio=threads,werror=stop,rerror=stop \
 -device ide-drive,drive=drive-data-disk1,id=data-disk1 \
 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup \
 -device e1000,netdev=hostnet0,id=virtio-net-pci0,mac=00:24:21:7f:b6:11,bus=pci.0,addr=0x9 \
 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 \
 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 \
 -object memory-backend-ram,host-nodes=0,id=mem-0,policy=bind,prealloc=yes,size=2G \
 -numa node,nodeid=0,memdev=mem-0 \
 -object memory-backend-ram,host-nodes=0,id=mem-1,policy=bind,prealloc=yes,size=2G \
 -numa node,nodeid=1,memdev=mem-1 \
 -usb -device usb-tablet,id=input0 -qmp tcp:0:5559,server,nowait
2. Hotplug 17 vCPUs (e.g., "cpu-add 1" through "cpu-add 17" in the QEMU monitor).

Actual results:
Call trace info appears, only 4 CPUs show up in /proc/cpuinfo, and the guest hangs.

Expected results:
No call trace info; the guest works well and gets the 17 hotplugged CPUs.

Additional info:
The problem cannot be hit with qemu-kvm-rhev-2.1.2-23.el7 or qemu-kvm-1.5.3-86.el7.
There are 32 CPUs in the host. The host CPU info is attached.

Comment 1 Yanhui Ma 2015-03-11 07:57:12 UTC
Created attachment 1000288 [details]
host cpu info

Comment 3 FuXiangChun 2015-03-11 08:03:18 UTC
Since qemu-kvm-rhev-2.1.2-23.el7 does not hit this issue, I am setting this bug as a regression. If I am wrong, please remove Regression from the keywords.

Also, 7.1 works well.

Comment 4 FuXiangChun 2015-03-11 08:05:27 UTC
Sorry, a correction to comment 3: it should read 7.1 host/7.1 guest.

Comment 6 Igor Mammedov 2015-03-11 12:57:23 UTC
FuXiangChun,

Please always attach full console/dmesg logs from guest/host to the BZ.

Also, could you please reproduce the issue, leave it in the hung state, and provide access to the host where it happened?

Comment 7 Yanhui Ma 2015-03-12 02:36:57 UTC
Created attachment 1000749 [details]
guest full console logs

Comment 8 Yanhui Ma 2015-03-12 02:38:59 UTC
Created attachment 1000750 [details]
host full dmesg

Comment 10 Igor Mammedov 2015-03-12 16:28:39 UTC
Yanhui Ma,

Could you try to reproduce bug with RHEL6.6 guest kernel?

Comment 11 Yanhui Ma 2015-03-13 03:46:01 UTC
(In reply to Igor Mammedov from comment #10)
> Yanhui Ma,
> 
> Could you try to reproduce bug with RHEL6.6 guest kernel?

I have reproduced the bug with the RHEL6.6 guest kernel. The guest hang is not 100% reproducible, but the call trace and only 4 CPUs being successfully hotplugged are 100% reproducible. I have also attached the RHEL6.6 guest full console logs.
You can access the host per comment 9.

Comment 12 Yanhui Ma 2015-03-13 03:47:40 UTC
Created attachment 1001271 [details]
RHEL6.6 guest full console logs

Comment 13 Igor Mammedov 2015-03-17 16:36:47 UTC
So here goes my analysis:

 1. I wasn't able to reproduce the bug locally (probably due to lack of effort),
    but it's reliably reproducible on the host from comment 9

 2. The issue has nothing to do with CPU hotplug; regular boot is affected as well when the CPU count is 5 and there are 2 NUMA nodes. Here is a stripped-down reproducer:

 qemu-kvm -m 4G -smp 5,sockets=1,cores=4,threads=1,maxcpus=8 -numa node,nodeid=0 -numa node,nodeid=1 -drive file=/home/rhel66-64-virtio.qcow2,if=virtio

 3. CPU[0] hangs in smp_call_function_many() waiting for a call to execute on CPU[4], but CPU[4] loops forever in update_sd_lb_stats() due to incorrectly initialized
sched_groups: the loop condition "while (sg != sd->groups)" never becomes true, since the last 'sg->next', instead of pointing back to the head 'sd->groups', points to itself (a minimal sketch of this loop follows the list).

crash> bt -a
PID: 1      TASK: ffff88007f307500  CPU: 0   COMMAND: "swapper"
    [exception RIP: smp_call_function_many+482]
[...]
 #0 [ffff88007f309ce8] smp_call_function at ffffffff810c7c22
 #1 [ffff88007f309cf8] on_each_cpu at ffffffff8108e1fd
 #2 [ffff88007f309d28] do_tune_cpucache at ffffffff8118186b
 #3 [ffff88007f309d98] enable_cpucache at ffffffff81181d36
 #4 [ffff88007f309dc8] setup_cpu_cache at ffffffff8150fcc2
 #5 [ffff88007f309e08] kmem_cache_create at ffffffff81182232
 #6 [ffff88007f309eb8] shmem_init at ffffffff81c4f27e
 #7 [ffff88007f309ed8] kernel_init at ffffffff81c29ef6
 #8 [ffff88007f309f48] kernel_thread at ffffffff8100c1ca
[...]

PID: 27     TASK: ffff88013f28eae0  CPU: 4   COMMAND: "events/4"
[...]
 #0 [ffff88013f295a58] update_sd_lb_stats at ffffffff8106fc8d
 #1 [ffff88013f295b38] find_busiest_group at ffffffff8106fe7a
 #2 [ffff88013f295c18] load_balance_newidle at ffffffff81070a57
 #3 [ffff88013f295d08] idle_balance at ffffffff8107e38e
 #4 [ffff88013f295d88] schedule at ffffffff815304e0
 #5 [ffff88013f295e38] worker_thread at ffffffff810aa640
 #6 [ffff88013f295ee8] kthread at ffffffff810af4d0
 #7 [ffff88013f295f48] kernel_thread at ffffffff8100c1ca

 4. The issue is caused by commit:
     dd0247e0 pc: acpi: mark all possible CPUs as enabled in SRAT
    which makes the kernel actually use the QEMU-supplied NUMA mapping in SRAT for vCPUs; before that commit, the guest kernel was discarding CPU-related SRAT info (see the SRAT sketch after this list).

 5. The problem is that QEMU by default distributes vCPUs among NUMA nodes in round-robin order, which leads to an insane topology where vCPU threads from one socket (package) end up in different NUMA nodes (see the mapping sketch after this list).

Setting the CPU-to-node mapping manually with a sane topology (i.e. threads from the same socket on the same node) makes the bug go away:
 -numa node,nodeid=0,cpus=0-3 -numa node,nodeid=1,cpus=4-7
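
As a rough illustration of point 3, here is a minimal, self-contained C sketch (not the actual RHEL6 scheduler source; struct and field names only mirror the kernel's sched_domain/sched_group) showing why the list walk never terminates when the last 'next' pointer points to itself:

#include <stdio.h>

struct sched_group  { struct sched_group *next; };
struct sched_domain { struct sched_group *groups; };

int main(void)
{
    struct sched_group a, b;
    struct sched_domain sd = { .groups = &a };
    struct sched_group *sg = sd.groups;
    long steps = 0;

    a.next = &b;
    b.next = &b; /* the bug: should point back to the head (&a) */

    /* update_sd_lb_stats() walks the circular list like this; capped
     * here so the demo terminates instead of spinning like CPU[4] */
    do {
        sg = sg->next;
        steps++;
    } while (sg != sd.groups && steps < 1000000);

    printf("head never reached; gave up after %ld steps\n", steps);
    return 0;
}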
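
And a hedged sketch of the SRAT entries point 4 refers to (field layout follows the ACPI spec's Processor Local APIC/SAPIC Affinity structure; the builder code is illustrative, not QEMU's actual implementation). After dd0247e0, every possible vCPU gets an entry with the enabled flag set, so the guest kernel trusts the proximity domain (node) it carries:

#include <stdint.h>
#include <stdio.h>

/* ACPI SRAT Processor Local APIC/SAPIC Affinity structure (type 0);
 * trailing fields (SAPIC EID, proximity domain high bytes, clock
 * domain) omitted for brevity. */
struct srat_cpu_affinity {
    uint8_t  type;          /* 0 */
    uint8_t  length;        /* 16 */
    uint8_t  proximity_lo;  /* low byte of proximity domain (NUMA node) */
    uint8_t  local_apic_id;
    uint32_t flags;         /* bit 0: entry enabled */
};

int main(void)
{
    /* Mark every possible (including not-yet-hotplugged) vCPU enabled,
     * as dd0247e0 does, instead of only the present ones; the node here
     * follows the (then round-robin) default mapping from point 5. */
    for (uint8_t cpu = 0; cpu < 8; cpu++) {
        struct srat_cpu_affinity e = {
            .type = 0, .length = 16,
            .proximity_lo = cpu % 2,
            .local_apic_id = cpu,
            .flags = 1, /* enabled */
        };
        printf("apic %u -> node %u\n", e.local_apic_id, e.proximity_lo);
    }
    return 0;
}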
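
Finally, a sketch contrasting the old round-robin default mapping from point 5 with a topology-aware default (function names are illustrative; the real upstream fix, fb43b73, routes this through the cpu_index_to_socket_id machine callback added by 57924bc):

#include <stdio.h>

/* Old default: plain round-robin over nodes, splitting a socket's
 * threads across nodes. */
static int node_old(int cpu, int nodes) { return cpu % nodes; }

/* Fixed default: map whole sockets to nodes so all threads of a
 * socket share a node. */
static int node_fixed(int cpu, int nodes, int cores, int threads)
{
    return (cpu / (cores * threads)) % nodes;
}

int main(void)
{
    /* Reproducer topology: 2 nodes, cores=4, threads=1, 8 possible CPUs */
    for (int cpu = 0; cpu < 8; cpu++)
        printf("cpu %d: old node %d, fixed node %d\n",
               cpu, node_old(cpu, 2), node_fixed(cpu, 2, 4, 1));
    return 0;
}

With the old mapping, CPUs 0 and 2 (node 0) and CPUs 1 and 3 (node 1) come from the same 4-core socket; with the fixed one, CPUs 0-3 land on node 0 and 4-7 on node 1, matching the manual -numa cpus= workaround above.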

Comment 14 Igor Mammedov 2015-03-17 16:52:10 UTC
 Yanhui Ma,

1. Could you test whether the problem affects RHEL7, RHEL5, and Windows guests?

2. Also, after #1, try the following scratch build with the fix:
https://brewweb.devel.redhat.com/taskinfo?taskID=8861393

Comment 15 Igor Mammedov 2015-03-19 10:14:20 UTC
To reproduce on an Intel host, the following option has to be added to the reproducer from comment 13:
 -cpu Opteron_G3,vendor=AuthenticAMD

Comment 16 Igor Mammedov 2015-03-19 10:34:15 UTC
Upstream fix posted:
https://lists.gnu.org/archive/html/qemu-devel/2015-03/msg04008.html

Comment 18 Igor Mammedov 2015-03-19 13:04:08 UTC
Tested with different Windows versions: it's 90% reproducible with WS2012x64; also seen it with WS2012R2x64, but only once.

Windows fails to boot, going into a reboot cycle with a C4 error.

Comment 19 Yanhui Ma 2015-03-20 04:51:45 UTC
(In reply to Igor Mammedov from comment #14)
>  Yanhui Ma,
> 
> 1. Could you test if problem affects RHEL7, RHEL5 and Windows guests?
> 
Steps to Reproduce:
1. Boot a RHEL7.1/RHEL5.11/win-server-2008r2 guest with the following qemu command line:

/usr/libexec/qemu-kvm -M pc-i440fx-rhel7.1.0 -m 4G \
 -cpu Opteron_G3,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,hv_time \
 -smp 5,sockets=1,cores=4,threads=1,maxcpus=8 \
 -numa node,nodeid=0 -numa node,nodeid=1 \
 -monitor stdio -vga qxl -spice port=5900,disable-ticketing \
 -drive file=/home/rhel59-64-virtio.qcow2,if=none,id=drive-data-disk1,cache=none,format=qcow2,aio=threads,werror=stop,rerror=stop \
 -device ide-drive,drive=drive-data-disk1,id=data-disk1 \
 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup \
 -device e1000,netdev=hostnet0,id=virtio-net-pci0,mac=00:24:21:7f:b6:11,bus=pci.0,addr=0x9 \
 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 \
 -usb -device usb-tablet,id=input0


results:

RHEL5.11 and RHEL6.7 hit the issue; RHEL7.1 and win-server-2008r2 do not hit the issue.

> 2. Also after #1 try following scratch build with fix:
> https://brewweb.devel.redhat.com/taskinfo?taskID=8861393


Tested again with the above fixed build: RHEL5.11, RHEL6.7, RHEL7.1, and win-server-2008r2 do not hit the issue.

Comment 20 Igor Mammedov 2015-03-20 13:51:03 UTC
Fixed upstream in 2.3

fb43b73 pc: fix default VCPU to NUMA node mapping
57924bc numa: introduce machine callback for VCPU to node mapping

Please retest once qemu-kvm-rhev is rebased to version 2.3.

Comment 23 Yanhui Ma 2015-07-27 07:09:45 UTC
Reproduced this issue.

host info:
qemu-kvm-rhev-2.2.0-8.el7.x86_64
3.10.0-230.el7.x86_64

Steps to Reproduce:
1. Boot a RHEL6.7/win2012r2x64 guest with the following qemu command line:

/usr/libexec/qemu-kvm -M pc-i440fx-rhel7.1.0 -m 4G \
 -cpu Opteron_G3,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,hv_time \
 -smp 5,sockets=1,cores=4,threads=1,maxcpus=8 \
 -numa node,nodeid=0 -numa node,nodeid=1 \
 -monitor stdio -vga qxl -spice port=5900,disable-ticketing \
 -drive file=/home/RHEL-Server-6.7-64-virtio.qcow2,if=none,id=drive-data-disk1,cache=none,format=qcow2,aio=threads,werror=stop,rerror=stop \
 -device ide-drive,drive=drive-data-disk1,id=data-disk1 \
 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup \
 -device e1000,netdev=hostnet0,id=virtio-net-pci0,mac=00:24:21:7f:b6:11,bus=pci.0,addr=0x9 \
 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 \
 -usb -device usb-tablet,id=input0

results:
The RHEL6.7 guest hits call trace info during boot (see attachment); the win2012r2x64 guest does not hit the issue.

#########################################################

Verified this issue.
host info:
3.10.0-297.el7.x86_64
qemu-kvm-rhev-2.3.0-12.el7.x86_64

Steps are the same as above.

Result:
qemu prints out following info:
cpu topology: error: sockets (1) * cores (4) * threads (1) < smp_cpus (5)

Hi Igor,
Is this the expected result? I remember the guest could boot up successfully with the scratch build from comment 14. If the QEMU error is the expected result, then the bug has been fixed.

Comment 24 Yanhui Ma 2015-07-27 07:11:00 UTC
Created attachment 1056469 [details]
call trace info

Comment 25 Igor Mammedov 2015-07-27 12:16:48 UTC
(In reply to Yanhui Ma from comment #23)
> Reproduce this issue.
[...]
> Result:
> qemu prints out following info:
> cpu topology: error: sockets (1) * cores (4) * threads (1) < smp_cpus (5)
> 
> hi, Igor
> Is it the expected result? I remember guest can boot up successfully with
> scratch build in comment 14. If the qemu error is the expected result, then
> the bug has been fixed.

The bug was fixed upstream by commit
 fb43b73b "pc: fix default VCPU to NUMA node mapping"
and the error message was introduced upstream by
 ec2cbbdd8 "vl: Don't silently change topology when all -smp options were set",
both after the 2.2 release that the scratch build is based on.

So yes, I'd say exiting with an error is the expected result.
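
For reference, a minimal C sketch of the kind of consistency check ec2cbbdd8 adds (the message format is copied from the QEMU error above; the surrounding code is illustrative, not QEMU's vl.c):

#include <stdio.h>
#include <stdlib.h>

/* With all -smp topology options given, sockets * cores * threads
 * must be able to cover smp_cpus; otherwise exit with an error
 * instead of silently adjusting the topology. */
static void check_smp(int sockets, int cores, int threads, int smp_cpus)
{
    if (sockets * cores * threads < smp_cpus) {
        fprintf(stderr,
                "cpu topology: error: sockets (%d) * cores (%d) * "
                "threads (%d) < smp_cpus (%d)\n",
                sockets, cores, threads, smp_cpus);
        exit(1);
    }
}

int main(void)
{
    check_smp(1, 4, 1, 5); /* the reproducer's -smp: 1 * 4 * 1 = 4 < 5 */
    return 0;
}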

Comment 26 Yanhui Ma 2015-07-28 02:02:47 UTC
(In reply to Igor Mammedov from comment #25)
> (In reply to Yanhui Ma from comment #23)
> > Reproduce this issue.
> [...]
> > Result:
> > qemu prints out following info:
> > cpu topology: error: sockets (1) * cores (4) * threads (1) < smp_cpus (5)
> > 
> > hi, Igor
> > Is it the expected result? I remember guest can boot up successfully with
> > scratch build in comment 14. If the qemu error is the expected result, then
> > the bug has been fixed.
> 
> bug was fixed upstream by commit
>  fb43b73b "pc: fix default VCPU to NUMA node mapping"
> and error message was introduced upstream by 
>  ec2cbbdd8 "vl: Don't silently change topology when all -smp options were
> set"
> after 2.2 release which scratch build is based on.
> 
> So yes, I'd say exit with error is expected result.

Thanks. If so, I think the bug is fixed.

Comment 27 juzhang 2015-08-03 04:12:47 UTC
According to comments 23 through 26, setting this issue to VERIFIED.

Comment 29 errata-xmlrpc 2015-12-04 16:31:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2546.html