Created attachment 374174 [details]
xend.log

Description of problem:
When binding a VCPU to a specific CPU (via the cpus parameter, e.g. cpus = "0", in the config file), the PV guest hangs at boot time, and there is no output when a console is attached to the guest:

# xm cr /etc/xen/test_pv -c
Using config file "/etc/xen/test_pv".
file /root/pv.img
Started domain PvDomain

# xm li
Name        ID  Mem(MiB)  VCPUs  State   Time(s)
Domain-0     0      3409      4  r-----    163.3
PvDomain    16       512      4  ------   1156.6

# xm vcpu-list PvDomain
Name      ID  VCPUs  CPU  State  Time(s)  CPU Affinity
PvDomain  16      0    0  r--     1293.3  0
PvDomain  16      1    0  -b-        0.1  0
PvDomain  16      2    0  ---        0.0  0
PvDomain  16      3    0  ---        0.0  0

# virsh vcpuinfo PvDomain
VCPU:           0
CPU:            0
State:          running
CPU time:       1351.0s
CPU Affinity:   y---

VCPU:           1
CPU:            0
State:          idle
CPU time:       0.1s
CPU Affinity:   y---

VCPU:           2
CPU:            0
State:          no state
CPU time:       0.0s
CPU Affinity:   y---

VCPU:           3
CPU:            0
State:          no state
CPU Affinity:   y---

Version-Release number of selected component (if applicable):
xen-3.0.3-94.el5

How reproducible:
Always

Steps to Reproduce:
1. In the config file of the PV guest, add:
   cpus = "0"
   vcpus = 4
2. Create the PV guest with this config file.

Actual results:
The guest hangs at boot time.

Expected results:
The guest boots successfully.

Additional info:
xend.log uploaded.
Created attachment 374175 [details]
config file used to create the PV guest
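The attachment itself is not reproduced here; a minimal xm config of the shape described in comment 0 would look roughly like the sketch below (xm configs are plain Python assignments; the disk path and guest name are illustrative, not taken from the real attachment):

```python
# Illustrative PV guest config fragment; the combination of vcpus = 4
# with cpus = "0" is the one reported to trigger the boot hang.
name   = "PvDomain"
memory = 512
vcpus  = 4                               # four virtual cpus ...
cpus   = "0"                             # ... all pinned to physical cpu 0
disk   = ["file:/root/pv.img,xvda,w"]    # hypothetical disk path
```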
I can reproduce it, though for me it hangs here:

Grant table initialized
NET: Registered protocol family 16
Initializing CPU#1
migration_cost=14
migration_cost=14
Initializing CPU#2
migration_cost=14
Brought up 4 CPUs
PCI: setting up Xen PCI frontend stub
Initializing CPU#3
ACPI: Interpreter disabled.
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI: disabled
xen_mem: Initialising balloon driver.
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: System does not support PCI
PCI: System does not support PCI
NetLabel: Initializing
NetLabel:  domain hash size = 128
NetLabel:  protocols = UNLABELED CIPSOv4
NetLabel:  unlabeled traffic allowed by default
NET: Registered protocol family 2

What would come later is this:

IP route cache hash table entries: 32768 (order: 6, 262144 bytes)
TCP established hash table entries: 131072 (order: 9, 2097152 bytes)
TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
TCP reno registered
audit: initializing netlink socket (disabled)
type=2000 audit(1259679049.404:1): initialized
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
Initializing Cryptographic API
alg: No test for crc32c (crc32c-generic)
ksign: Installing public key data
Loading keyring
- Added public key 23B7022ABAB774D1
- User ID: Red Hat, Inc. (Kernel Module GPG key)
It's broken every time the number of fields in "cpus" does not match the number of vcpus. Moving it to kernel-xen, though if the configuration is declared bogus we may move it back to Xen and fix it in xm.
And this isn't a problem (from the boot log)?

PCI: System does not support PCI
PCI: System does not support PCI

That would suggest a broken BIOS callback in the guest...
Reproduced and grabbed a stack. I don't see anything unusual in the boot messages that were displayed before the hang. This occurs any time we attempt to bind multiple vcpus to a single cpu; e.g. 'cpus=0-1 vcpus=4' and 'cpus=0 vcpus=1' work.

# xenctx -s a/System.map-2.6.18-187.el5xen 2
rip: ffffffff80274476 __smp_call_function_many+0x94
rsp: ffff8800024c14e0
rax: 00000001          rbx: ffff8800024c1560  rcx: 00000000
rdx: ffffffffff578000  rsi: ffff8800024c1480  rdi: 00000000
rbp: 00000003          r8:  00000001          r9:  ffff8800024c1560
r10: ffffffff80736fe0  r11: 00000002          r12: 00000001
r13: ffff8800024c15e0  r14: ffffffff802d0ff4  r15: ffff8800024c1560
cs: 0000e033  ds: 00000000  fs: 00000000  gs: 00000000

Stack:
ffffffff802d0ff4 ffff8800024c15e0 0000000100000001 ffffffff00000001
0000000000000001 ffff88003fc46700 0000000000000001 ffff8800024c15e0
ffffffff802d0ff4 ffffffff802745a3 00000000000000ff 0000000000000001
0000000000000001 ffff8800024c15e0 ffffffff802d0ff4 ffffffff8027469d

Code:
89 de 48 8b 40 30 f3 a5 bf 01 00 00 00 ff d0 48 83 c4 20 eb 00 <8b> 44 24 10 39 e8 75 f8 45 85 e4

Call Trace:
  [<ffffffff80274476>] __smp_call_function_many+0x94  <--
  [<ffffffff802d0ff4>] do_drain+0x5a
  [<ffffffff802d0ff4>] do_drain+0x5a
  [<ffffffff802745a3>] smp_call_function_many+0x38
  [<ffffffff802d0ff4>] do_drain+0x5a
  [<ffffffff8027469d>] smp_call_function+0x4e
  [<ffffffff802d0ff4>] do_drain+0x5a
  [<ffffffff8028fe3d>] on_each_cpu+0x10
  [<ffffffff802d0907>] do_tune_cpucache+0xa5
  [<ffffffff80223717>] cache_estimate+0x89
  [<ffffffff802d0d28>] enable_cpucache+0x4f
  [<ffffffff8023abcb>] kmem_cache_create+0x3aa
  [<ffffffff802ffb58>] sysfs_make_dirent+0x1b
  [<ffffffff8065bf94>] utrace_init+0x22
  [<ffffffff8064c7eb>] init+0x1f9
  [<ffffffff80260b2c>] child_rip+0xa
  [<ffffffff8064c5f2>] do_early_param+0x57
  [<ffffffff80260b22>] kernel_thread+0xde
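For what it's worth, the trace is consistent with a cross-CPU call that never completes: __smp_call_function_many busy-waits until every other online CPU has acknowledged the IPI, and vcpus that never get scheduled (the '---' ones in comment 0) can never acknowledge. A toy Python model of that interpretation, purely as an assumption-laden illustration (not kernel code):

```python
# Toy model: a caller waits for an ack from every other "cpu"; if some
# vcpus never run, the wait can only end by timeout (the kernel has no
# timeout here, hence the hang).
import threading

def smp_call_function_model(n_cpus, runnable):
    """Return True iff all 'other' cpus acknowledge within the timeout."""
    acks = threading.Semaphore(0)

    def cpu(idx):
        if runnable[idx]:        # a vcpu only acks if it ever gets cpu time
            acks.release()

    threads = [threading.Thread(target=cpu, args=(i,))
               for i in range(1, n_cpus)]
    for t in threads:
        t.start()
    ok = all(acks.acquire(timeout=0.5) for _ in range(n_cpus - 1))
    for t in threads:
        t.join()
    return ok

# All vcpus get scheduled -> the cross-call completes.
print(smp_call_function_model(4, [True, True, True, True]))    # True
# vcpus 2 and 3 never run (state '---') -> the caller spins forever.
print(smp_call_function_model(4, [True, True, False, False]))  # False
```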
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: Xen guests will not boot with a configuration that binds multiple vcpus to a single cpu.
Note this is a similar, and possibly the same, issue as bug 570056
I would change this bug to the xen component and, for RHEL5.6, disable this configuration. Does anyone disagree?
Moving to userspace so that we can forbid this configuration.
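A minimal sketch of what such a userspace check could look like (hypothetical Python, not the actual xen-3.0.3-117.el5 patch; the function and parameter names are invented, and the error string echoes the message QA observed):

```python
# Hypothetical sketch of a config-time sanity check: refuse to create a
# guest whose config pins more than one vcpu onto a single physical cpu.

def parse_cpus(spec):
    """Expand a cpus string such as "0", "0-2" or "0,2-3" into a set of ints."""
    cpus = set()
    for field in spec.split(","):
        if "-" in field:
            lo, hi = field.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(field))
    return cpus

def validate_affinity(cpus_spec, vcpus):
    """Reject the broken case: multiple vcpus bound to one physical cpu."""
    if len(parse_cpus(cpus_spec)) == 1 and vcpus > 1:
        raise ValueError("Can't bind more vcpus to single cpu")

validate_affinity("0-1", 4)   # accepted (works per comment 6)
validate_affinity("0", 1)     # accepted (works per comment 6)
# validate_affinity("0", 4) would raise ValueError
```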
*** Bug 570056 has been marked as a duplicate of this bug. ***
Fix built into xen-3.0.3-117.el5
QA verified this bug on xen-3.0.3-117.el5.

Create a PV guest with cpus set to '0':

# xm cr /tmp/xm-test.cfg cpus=0
Using config file "/tmp/xm-test.cfg".
Using <class 'grub.GrubConf.GrubConfigFile'> to parse /grub/menu.lst
Error: Can't bind more vcpus to single cpu

So I am changing this bug to VERIFIED.
I've tested this on an HP DL165 G7 machine with two 12-core AMD processors and was able to replicate the problem. I tried to assign 4, 6, 8, and 12 cores to a VM.

On an HP ML350 G5 with a single quad-core processor, the problem does not exist: I successfully assigned 8 cores to one VM.

Is it possible that the problem is only related to AMD processors?
(In reply to comment #19)
> I've tested this on a HP DL165 G7 machine with 2x 12 core AMD processors and
> was able to replicate the problem. I tried to assign 4, 6, 8, and 12 cores to a
> VM.
>
> On a HP ML350 G5 with 1x quad core processor, the problem does not exist. I
> successfully assigned 8 cores to 1 vm.
>
> Is it possible that the problem is only related to AMD processors?

Unfortunately, no. I can reproduce this bug on an Intel Q9400 machine (Dell 760), so Intel processors also suffer from this problem :(
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0031.html