Created attachment 374174 [details]
xend.log

Description of problem:
When binding a VCPU to a specific CPU (via the cpus parameter, e.g. cpus = "0", in the config file), the PV guest hangs at boot time, and there is no output when a console is attached to the guest:

# xm cr /etc/xen/test_pv -c
Using config file "/etc/xen/test_pv".
file /root/pv.img
Started domain PvDomain

# xm li
Name        ID  Mem(MiB)  VCPUs  State   Time(s)
Domain-0     0      3409      4  r-----    163.3
PvDomain    16       512      4  ------   1156.6

# xm vcpu-list PvDomain
Name      ID  VCPUs  CPU  State  Time(s)  CPU Affinity
PvDomain  16      0    0  r--     1293.3  0
PvDomain  16      1    0  -b-        0.1  0
PvDomain  16      2    0  ---        0.0  0
PvDomain  16      3    0  ---        0.0  0

# virsh vcpuinfo PvDomain
VCPU:           0
CPU:            0
State:          running
CPU time:       1351.0s
CPU Affinity:   y---

VCPU:           1
CPU:            0
State:          idle
CPU time:       0.1s
CPU Affinity:   y---

VCPU:           2
CPU:            0
State:          no state
CPU time:       0.0s
CPU Affinity:   y---

VCPU:           3
CPU:            0
State:          no state
CPU Affinity:   y---

Version-Release number of selected component (if applicable):
xen-3.0.3-94.el5

How reproducible:
Always

Steps to Reproduce:
1. In the config file of the PV guest, add:
   cpus = "0"
   vcpus = 4
2. Create the PV guest with this config file.

Actual results:
The guest hangs at boot time.

Expected results:
The guest boots successfully.

Additional info:
xend.log uploaded.
Created attachment 374175 [details]
config file used to create the PV guest
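The attachment itself is not reproduced here; a minimal xm config of the shape described in comment 0 would look roughly like the sketch below (xm configs are plain Python assignments; the disk path and guest name are illustrative, not taken from the real attachment):

```python
# Illustrative PV guest config fragment; the combination of vcpus = 4
# with cpus = "0" is the one reported to trigger the boot hang.
name   = "PvDomain"
memory = 512
vcpus  = 4                               # four virtual cpus ...
cpus   = "0"                             # ... all pinned to physical cpu 0
disk   = ["file:/root/pv.img,xvda,w"]    # hypothetical disk path
```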
I can reproduce it, though for me it hangs here:

Grant table initialized
NET: Registered protocol family 16
Initializing CPU#1
migration_cost=14
migration_cost=14
Initializing CPU#2
migration_cost=14
Brought up 4 CPUs
PCI: setting up Xen PCI frontend stub
Initializing CPU#3
ACPI: Interpreter disabled.
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI: disabled
xen_mem: Initialising balloon driver.
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: System does not support PCI
PCI: System does not support PCI
NetLabel: Initializing
NetLabel:  domain hash size = 128
NetLabel:  protocols = UNLABELED CIPSOv4
NetLabel:  unlabeled traffic allowed by default
NET: Registered protocol family 2

What would come later is this:

IP route cache hash table entries: 32768 (order: 6, 262144 bytes)
TCP established hash table entries: 131072 (order: 9, 2097152 bytes)
TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
TCP reno registered
audit: initializing netlink socket (disabled)
type=2000 audit(1259679049.404:1): initialized
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
Initializing Cryptographic API
alg: No test for crc32c (crc32c-generic)
ksign: Installing public key data
Loading keyring
- Added public key 23B7022ABAB774D1
- User ID: Red Hat, Inc. (Kernel Module GPG key)
It's broken every time the number of fields in "cpus" does not match the number of vcpus. Moving it to kernel-xen, though if the configuration is declared bogus we may move it back to Xen and fix it in xm.
And this isn't a problem (from the boot log)?

PCI: System does not support PCI
PCI: System does not support PCI

That would suggest a broken BIOS callback in the guest...
Reproduced and grabbed a stack. I don't see anything unusual in the boot messages that were displayed before the hang. This occurs any time we attempt to bind multiple vcpus to a single cpu; e.g. 'cpus=0-1 vcpus=4' and 'cpus=0 vcpus=1' work.

# xenctx -s a/System.map-2.6.18-187.el5xen 2
rip: ffffffff80274476 __smp_call_function_many+0x94
rsp: ffff8800024c14e0
rax: 00000001          rbx: ffff8800024c1560  rcx: 00000000
rdx: ffffffffff578000  rsi: ffff8800024c1480  rdi: 00000000
rbp: 00000003          r8:  00000001          r9:  ffff8800024c1560
r10: ffffffff80736fe0  r11: 00000002          r12: 00000001
r13: ffff8800024c15e0  r14: ffffffff802d0ff4  r15: ffff8800024c1560
cs: 0000e033  ds: 00000000  fs: 00000000  gs: 00000000

Stack:
ffffffff802d0ff4 ffff8800024c15e0 0000000100000001 ffffffff00000001
0000000000000001 ffff88003fc46700 0000000000000001 ffff8800024c15e0
ffffffff802d0ff4 ffffffff802745a3 00000000000000ff 0000000000000001
0000000000000001 ffff8800024c15e0 ffffffff802d0ff4 ffffffff8027469d

Code:
89 de 48 8b 40 30 f3 a5 bf 01 00 00 00 ff d0 48 83 c4 20 eb 00 <8b> 44 24 10 39 e8 75 f8 45 85 e4

Call Trace:
  [<ffffffff80274476>] __smp_call_function_many+0x94  <--
  [<ffffffff802d0ff4>] do_drain+0x5a
  [<ffffffff802d0ff4>] do_drain+0x5a
  [<ffffffff802745a3>] smp_call_function_many+0x38
  [<ffffffff802d0ff4>] do_drain+0x5a
  [<ffffffff8027469d>] smp_call_function+0x4e
  [<ffffffff802d0ff4>] do_drain+0x5a
  [<ffffffff8028fe3d>] on_each_cpu+0x10
  [<ffffffff802d0907>] do_tune_cpucache+0xa5
  [<ffffffff80223717>] cache_estimate+0x89
  [<ffffffff802d0d28>] enable_cpucache+0x4f
  [<ffffffff8023abcb>] kmem_cache_create+0x3aa
  [<ffffffff802ffb58>] sysfs_make_dirent+0x1b
  [<ffffffff8065bf94>] utrace_init+0x22
  [<ffffffff8064c7eb>] init+0x1f9
  [<ffffffff80260b2c>] child_rip+0xa
  [<ffffffff8064c5f2>] do_early_param+0x57
  [<ffffffff80260b22>] kernel_thread+0xde
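For what it's worth, the trace is consistent with a cross-CPU call that never completes: __smp_call_function_many busy-waits until every other online CPU has acknowledged the IPI, and vcpus that never get scheduled (the '---' ones in comment 0) can never acknowledge. A toy Python model of that interpretation, purely as an assumption-laden illustration (not kernel code):

```python
# Toy model: a caller waits for an ack from every other "cpu"; if some
# vcpus never run, the wait can only end by timeout (the kernel has no
# timeout here, hence the hang).
import threading

def smp_call_function_model(n_cpus, runnable):
    """Return True iff all 'other' cpus acknowledge within the timeout."""
    acks = threading.Semaphore(0)

    def cpu(idx):
        if runnable[idx]:        # a vcpu only acks if it ever gets cpu time
            acks.release()

    threads = [threading.Thread(target=cpu, args=(i,))
               for i in range(1, n_cpus)]
    for t in threads:
        t.start()
    ok = all(acks.acquire(timeout=0.5) for _ in range(n_cpus - 1))
    for t in threads:
        t.join()
    return ok

# All vcpus get scheduled -> the cross-call completes.
print(smp_call_function_model(4, [True, True, True, True]))    # True
# vcpus 2 and 3 never run (state '---') -> the caller spins forever.
print(smp_call_function_model(4, [True, True, False, False]))  # False
```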
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: Xen guests will not boot with a configuration that binds multiple vcpus to a single cpu.
Note this is a similar, and possibly the same, issue as bug 570056
I would change this bug to the xen component and, for RHEL5.6, disable this configuration. Does anyone disagree?
Moving to userspace so that we can forbid this configuration.
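A minimal sketch of what such a userspace check could look like (hypothetical Python, not the actual xen-3.0.3-117.el5 patch; the function and parameter names are invented, and the error string echoes the message QA observed):

```python
# Hypothetical sketch of a config-time sanity check: refuse to create a
# guest whose config pins more than one vcpu onto a single physical cpu.

def parse_cpus(spec):
    """Expand a cpus string such as "0", "0-2" or "0,2-3" into a set of ints."""
    cpus = set()
    for field in spec.split(","):
        if "-" in field:
            lo, hi = field.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(field))
    return cpus

def validate_affinity(cpus_spec, vcpus):
    """Reject the broken case: multiple vcpus bound to one physical cpu."""
    if len(parse_cpus(cpus_spec)) == 1 and vcpus > 1:
        raise ValueError("Can't bind more vcpus to single cpu")

validate_affinity("0-1", 4)   # accepted (works per comment 6)
validate_affinity("0", 1)     # accepted (works per comment 6)
# validate_affinity("0", 4) would raise ValueError
```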
*** Bug 570056 has been marked as a duplicate of this bug. ***
Fix built into xen-3.0.3-117.el5
QA verified this bug on xen-3.0.3-117.el5.

Create a PV guest with cpus set to '0':

# xm cr /tmp/xm-test.cfg cpus=0
Using config file "/tmp/xm-test.cfg".
Using <class 'grub.GrubConf.GrubConfigFile'> to parse /grub/menu.lst
Error: Can't bind more vcpus to single cpu

So I am changing this bug to VERIFIED.
I've tested this on an HP DL165 G7 machine with two 12-core AMD processors and was able to replicate the problem. I tried to assign 4, 6, 8, and 12 cores to a VM.

On an HP ML350 G5 with a single quad-core processor, the problem does not exist: I successfully assigned 8 cores to one VM.

Is it possible that the problem is only related to AMD processors?
(In reply to comment #19)
> I've tested this on a HP DL165 G7 machine with 2x 12 core AMD processors and
> was able to replicate the problem. I tried to assign 4, 6, 8, and 12 cores to a
> VM.
>
> On a HP ML350 G5 with 1x quad core processor, the problem does not exist. I
> successfully assigned 8 cores to 1 vm.
>
> Is it possible that the problem is only related to AMD processors?

Unfortunately, no. I can reproduce this bug on an Intel Q9400 machine (Dell 760), so Intel processors also suffer from this problem :(
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0031.html