Bug 826811
| Field | Value |
|---|---|
| Summary | xen: RHEL7 guest Call Trace when pinning vCPUs to part of the physical CPUs |
| Product | Red Hat Enterprise Linux 5 |
| Reporter | Qin Guan <qguan> |
| Component | kernel-xen |
| Assignee | Xen Maintainance List <xen-maint> |
| Status | CLOSED WONTFIX |
| QA Contact | Virtualization Bugs <virt-bugs> |
| Severity | high |
| Priority | medium |
| Version | 5.9 |
| CC | drjones, imammedo, juzhang, knoel, leiwang, moli, qwan, xen-maint, yuzhou |
| Target Milestone | rc |
| Target Release | --- |
| Hardware | x86_64 |
| OS | Linux |
| Doc Type | Bug Fix |
| Story Points | --- |
| Last Closed | 2012-06-04 10:28:01 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| Category | --- |
| oVirt Team | --- |
| Cloudforms Team | --- |
Description
Qin Guan
2012-05-31 02:41:29 UTC
Created attachment 587925: rhel7-3.3.0-0.12-dmesg.txt
Created attachment 587926: xm-dmesg.txt
Qin Guan, can you provide access to the host where the problem reproduces? I've checked the guest's core dump, and all of the guest's CPUs have a correctly initialized stopper task in the running state. The issue is that one of the VCPUs is not scheduled often enough (or at all) by Xen:

```
# xm vcpu-list hvm-rhel7u0-x86_64
Name                 ID  VCPUs  CPU  State  Time(s)   CPU Affinity
hvm-rhel7u0-x86_64    2      0    0  ---    198461.8  0-1
hvm-rhel7u0-x86_64    2      1    1  r--    156968.5  0-1
hvm-rhel7u0-x86_64    2      2    1  ---    118720.0  0-1
hvm-rhel7u0-x86_64    2      3    1  ---         0.0  0-1
```

This shows that VCPU 3 wasn't scheduled at all after it came online. Taking a trace with

```
xentrace -c 3 -D -S 256 -e 0x2f00a trace.raw
```

and then counting the `__enter_scheduler` entries for VCPU3 yields 0, which confirms that Xen does not schedule VCPU3 at all.

Rebinding the stuck VCPU to a PCPU that is not yet bound to the domain often fixes the problem. The problem happens only if the number of PCPUs bound to the domain is less than the number of VCPUs. The issue affects only specific hosts (intel-5130-16-1.englab.nay.redhat.com) and does not reproduce on every system: I wasn't able to reproduce it on a Dell p610 or on a Lenovo T510.

Bug 570056 and bug 541840 considered binding 1 PCPU to more than 1 VCPU an invalid configuration, so user space was 'enhanced' to report an error for such an invalid config (i.e. a guest config with 1 PCPU bound to 2..n VCPUs). This bug is practically the same case, where several VCPUs end up bound to a single PCPU (in the listing above, VCPUs 1, 2 and 3 all sit on PCPU 1), except that the guest config binds N PCPUs to M VCPUs with N < M.

This late in the RHEL5 life cycle, it would be very risky to touch the Xen scheduler code to really fix the issue, and very time-consuming to debug it on a remote host to find out what's wrong. So I don't think such a corner case is worth fixing, and I propose closing it as WONTFIX.
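The diagnosis rests on two observations from `xm vcpu-list`: the domain's VCPUs are pinned to fewer PCPUs than there are VCPUs, and one VCPU has accumulated no CPU time. The following is a minimal Python sketch that checks both conditions against the output quoted in this report. It assumes the whitespace-separated column layout shown there, and the helper names `parse_affinity` and `check_domain` are hypothetical, not part of any Xen tool. A `Time(s)` of 0.0 is only a hint of starvation; the report confirms it separately with `xentrace`.

```python
# The `xm vcpu-list` output quoted in this report, reflowed into columns.
SAMPLE = """\
Name                 ID  VCPUs  CPU  State  Time(s)   CPU Affinity
hvm-rhel7u0-x86_64    2      0    0  ---    198461.8  0-1
hvm-rhel7u0-x86_64    2      1    1  r--    156968.5  0-1
hvm-rhel7u0-x86_64    2      2    1  ---    118720.0  0-1
hvm-rhel7u0-x86_64    2      3    1  ---         0.0  0-1
"""


def parse_affinity(spec):
    """Expand an affinity string such as '0-1' or '0,2-3' into a set of PCPU ids.

    Assumption: only comma-separated numbers and ranges are handled; the
    'any cpu' form printed for unpinned VCPUs raises ValueError here.
    """
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def check_domain(text):
    """Return (pinned PCPUs, VCPU count, VCPUs with zero accumulated CPU time)."""
    rows = []
    for line in text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 7:
            continue
        # Assumed columns: Name ID VCPU CPU State Time(s) Affinity
        vcpu, time_s, affinity = fields[2], fields[5], fields[6]
        rows.append((int(vcpu), float(time_s), parse_affinity(affinity)))
    # Union of every VCPU's affinity = the PCPUs the domain is pinned to.
    pinned = set().union(*(aff for _, _, aff in rows))
    starved = [vcpu for vcpu, t, _ in rows if t == 0.0]
    return pinned, len(rows), starved


if __name__ == "__main__":
    pinned, n_vcpus, starved = check_domain(SAMPLE)
    if len(pinned) < n_vcpus:  # the N < M condition described in the report
        # prints "4 VCPUs pinned to 2 PCPUs; zero-time VCPUs: [3]"
        print(f"{n_vcpus} VCPUs pinned to {len(pinned)} PCPUs; "
              f"zero-time VCPUs: {starved}")
```

Because N < M, at least two VCPUs must share a PCPU whenever all of them are runnable, which is exactly the overcommitted-pinning situation that bugs 570056 and 541840 treat as invalid for the N = 1 case.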