Bug 826811

Summary: xen: RHEL7 guest Call Trace when pinning vCPUs to part of the physical CPUs
Product: Red Hat Enterprise Linux 5
Reporter: Qin Guan <qguan>
Component: kernel-xen
Assignee: Xen Maintainance List <xen-maint>
Status: CLOSED WONTFIX
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Docs Contact:
Priority: medium
Version: 5.9
CC: drjones, imammedo, juzhang, knoel, leiwang, moli, qwan, xen-maint, yuzhou
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-06-04 10:28:01 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments (Description / Flags):
rhel7-3.3.0-0.12-dmesg.txt    none
xm-dmesg.txt    none

Description Qin Guan 2012-05-31 02:41:29 UTC
Description of problem:
A RHEL7 guest with kernel 3.3.0-0.12 prints a Call Trace during boot when its vCPUs are pinned to part of the physical CPUs.

Found on both Intel and AMD hosts (all with 4 physical CPUs):
 - Intel 5130
 - Intel W3520
 - AMD Phenom(tm) 9600B Quad-Core Processor

The problem happens when:
pinning more than 2 vCPUs (tested with 3, 4, and 8) to 2 physical CPUs.

No problem when:
pinning 2 vCPUs to 2 physical CPUs.
pinning 4 vCPUs to 3 physical CPUs.
pinning 8 vCPUs to 4 physical CPUs.

Part of the Call Trace:
-------------
[   28.162700] BUG: soft lockup - CPU#0 stuck for 23s! [migration/0:6]
[   28.162700] Modules linked in:
[   28.162700] irq event stamp: 216598
[   28.162700] hardirqs last  enabled at (216597): [<ffffffff816630f4>] restore_args+0x0/0x30
[   28.162700] hardirqs last disabled at (216598): [<ffffffff8166c4ae>] apic_timer_interrupt+0x6e/0x80
[   28.162700] softirqs last  enabled at (216596): [<ffffffff8106d1a4>] __do_softirq+0x154/0x380
[   28.162700] softirqs last disabled at (216591): [<ffffffff8166ceac>] call_softirq+0x1c/0x30
[   28.162700] CPU 0 
[   28.162700] Modules linked in:
[   28.162700] 
[   28.162700] Pid: 6, comm: migration/0 Not tainted 3.3.0-0.12.el7.x86_64 #1 Red Hat HVM domU
[   28.162700] RIP: 0010:[<ffffffff810f1d1a>]  [<ffffffff810f1d1a>] stop_machine_cpu_stop+0x7a/0x110
[   28.162700] RSP: 0018:ffff88003d327d80  EFLAGS: 00000293
[   28.162700] RAX: 0000000000000001 RBX: ffffffff816630f4 RCX: 0000000000000e20
[   28.162700] RDX: ffff88003d32a600 RSI: 0000000000000001 RDI: ffff88003d2a7d88
[   28.162700] RBP: ffff88003d327db0 R08: 0000000000000000 R09: 0000000000000000
[   28.162700] R10: ffffffff810f1ca0 R11: 0000000000000000 R12: ffff88003d327cf8
[   28.162700] R13: ffff88003d32a680 R14: ffff88003d326000 R15: ffff88003d327fd8
[   28.162700] FS:  0000000000000000(0000) GS:ffff88003de00000(0000) knlGS:0000000000000000
[   28.162700] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   28.162700] CR2: 0000000000000000 CR3: 0000000001937000 CR4: 00000000000006f0
[   28.162700] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   28.162700] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   28.162700] Process migration/0 (pid: 6, threadinfo ffff88003d326000, task ffff88003d32a680)
[   28.162700] Stack:
[   28.162700]  ffff88003d327d90 ffff88003d32a680 ffff88003dfcefa0 ffff88003d327fd8
[   28.162700]  ffff88003d327fd8 ffff88003d2a7ca8 ffff88003d327e90 ffffffff810f1e86
[   28.162700]  ffffffff810f1ca0 ffff88003d2a7d88 0000000000000000 0000000000000000
[   28.162700] Call Trace:
[   28.162700]  [<ffffffff810f1e86>] cpu_stopper_thread+0xd6/0x1b0
[   28.162700]  [<ffffffff810f1ca0>] ? cpu_stop_queue_work+0x80/0x80
[   28.162700]  [<ffffffff8166088b>] ? __schedule+0x42b/0x9a0
[   28.162700]  [<ffffffff810f1db0>] ? stop_machine_cpu_stop+0x110/0x110
[   28.162700]  [<ffffffff8108d787>] kthread+0xb7/0xc0
[   28.162700]  [<ffffffff8166cdb4>] kernel_thread_helper+0x4/0x10
[   28.162700]  [<ffffffff816630f4>] ? retint_restore_args+0x13/0x13
[   28.162700]  [<ffffffff8108d6d0>] ? kthread_worker_fn+0x1a0/0x1a0
[   28.162700]  [<ffffffff8166cdb0>] ? gs_change+0x13/0x13
[   28.162700] Code: 90 41 83 fc 03 74 4a f0 ff 4b 24 0f 94 c2 84 d2 44 89 e0 74 0f 8b 53 20 8b 4b 10 83 c2 01 89 4b 24 89 53 20 83 f8 04 74 38 f3 90 <44> 8b 63 20 41 39 c4 74 f0 41 83 fc 02 75 c7 fa 66 66 90 66 66 
---------

Version-Release number of selected component (if applicable):

Host: 2.6.18-318.el5xen, xen-3.0.3-139.el5
Guest: 3.3.0-0.12.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Check the physical CPU numbers:
# cat /proc/cpuinfo | grep processor | wc -l
4

2. Pin vCPUs to part of the physical CPUs (an equivalent config-file form is sketched after these steps):
# xm cr hvm-rhel7.0-x86_64 cpus="0-1" vcpus=4 -c
or
# xm cr hvm-rhel7.0-x86_64 cpus="2-3" vcpus=4 -c

3. Check the guest boot console
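
For reference, a minimal sketch of the same pinning from step 2 expressed in the guest's xm config file instead of on the xm command line (the file path and every setting other than vcpus/cpus are assumptions, not taken from this report):

 # excerpt from e.g. /etc/xen/hvm-rhel7.0-x86_64 (assumed path)
 vcpus = 4
 cpus = "0-1"    # 4 vCPUs restricted to 2 physical CPUs -> the combination that triggers the soft lockup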

Actual results:
A Call Trace message is printed while the guest boots up.

Expected results:
No Call Trace message is printed while the guest boots up.

Additional info:

Comment 1 Qin Guan 2012-05-31 02:43:32 UTC
Created attachment 587925 [details]
rhel7-3.3.0-0.12-dmesg.txt

Comment 2 Qin Guan 2012-05-31 02:43:58 UTC
Created attachment 587926 [details]
xm-dmesg.txt

Comment 3 Igor Mammedov 2012-05-31 09:33:55 UTC
Qin Guan,

can you provide access to the host where the problem reproduces?

Comment 5 Igor Mammedov 2012-06-04 09:29:59 UTC
I've checked the guest's core dump, and all of the guest's CPUs have a correctly initialized stopper task in a running state. The issue is that one of the VCPUs is scheduled by Xen either not often enough or not at all.

# xm vcpu-list hvm-rhel7u0-x86_64
Name                              ID VCPUs   CPU State   Time(s) CPU Affinity
hvm-rhel7u0-x86_64                 2     0     0   ---  198461.8 0-1
hvm-rhel7u0-x86_64                 2     1     1   r--  156968.5 0-1
hvm-rhel7u0-x86_64                 2     2     1   ---  118720.0 0-1
hvm-rhel7u0-x86_64                 2     3     1   ---       0.0 0-1

shows that VCPU 3 wasn't scheduled at all after it came online.
Taking a trace with the command:
 xentrace -c 3 -D -S 256 -e 0x2f00a trace.raw
and then counting the __enter_scheduler entries for VCPU3 yields 0, which confirms that VCPU3 isn't scheduled by Xen at all.
Rebinding the stuck VCPU to a PCPU not yet bound to the domain often fixes the problem.
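
A rough sketch of how that check and the workaround can be run from dom0, in the same shell style as above (the xentrace_format formats-file path is assumed for the xen-3.0.3 tools; the domain name and CPU list are the ones from this report):

 # xentrace_format /usr/share/xen/xentrace/formats < trace.raw > trace.txt   (decode the binary trace to text)
 # grep -c __enter_scheduler trace.txt                                       (count scheduler entries, then filter by hand for VCPU3)
 # xm vcpu-pin hvm-rhel7u0-x86_64 3 2-3                                      (workaround: move the stuck VCPU 3 onto PCPUs not yet bound to the domain)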

This happens only if the number of PCPUs bound to the domain is less than the number of VCPUs.
The issue affects only specific hosts (intel-5130-16-1.englab.nay.redhat.com) and doesn't reproduce on every system (I wasn't able to reproduce it on a Dell p610 or on a Lenovo T510).

Bug 570056 and bug 541840 considered the case of binding 1 PCPU to more than 1 VCPU an invalid configuration, so user-space was 'enhanced' to report an error for such an invalid config [i.e. a guest config with 1 PCPU bound to 2..n VCPUs].
This bug is practically the same case, where several VCPUs end up bound to a single PCPU, but with a guest config that binds N PCPUs to M VCPUs where N < M.

This late in the RHEL 5 life cycle it would be very risky to touch the Xen scheduler code to really fix the issue, and very time-consuming to debug it on a remote host to find out what's wrong. So I don't think it is worth fixing such a corner case, and I propose to close it as WONTFIX.