Bug 629523 - Got error when create guest on NUMA host
Summary: Got error when create guest on NUMA host
Keywords:
Status: CLOSED DUPLICATE of bug 669388
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xen
Version: 5.6
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Xen Maintainance List
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 514500
Reported: 2010-09-02 07:56 UTC by Yufang Zhang
Modified: 2011-01-31 03:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-01-31 03:07:58 UTC
Target Upstream Version:
Embargoed:


Attachments
xend.log (169.74 KB, text/plain)
2010-09-02 07:56 UTC, Yufang Zhang
dmesg log (16.01 KB, text/plain)
2010-09-02 07:59 UTC, Yufang Zhang
xend.log on Intel NUMA machine (29.32 KB, text/plain)
2010-09-16 09:49 UTC, Yufang Zhang
xm dmesg on Intel NUMA machine (16.01 KB, text/plain)
2010-09-16 09:52 UTC, Yufang Zhang

Description Yufang Zhang 2010-09-02 07:56:58 UTC
Created attachment 442563
xend.log

Description of problem:
xend.log shows errors when creating a guest (PV or HVM) on a 4-node AMD NUMA host. On the same machine with NUMA disabled, the issue does not occur.

Version-Release number of selected component (if applicable):
xen-libs-3.0.3-115.el5
xen-libs-3.0.3-105.el5
kernel-xen-devel-2.6.18-214.el5
xen-3.0.3-115.el5
xen-devel-3.0.3-115.el5
kernel-xen-2.6.18-214.el5
xen-debuginfo-3.0.3-115.el5


How reproducible:
Always.

Steps to Reproduce:
1. Enable NUMA by adding numa=on to the hypervisor's kernel command line.
2. Boot the host, check system info: 
# xm info 
host                   : amd-8356-128-4
release                : 2.6.18-214.el5xen
version                : #1 SMP Fri Aug 27 17:54:19 EDT 2010
machine                : x86_64
nr_cpus                : 16
nr_nodes               : 4
sockets_per_node       : 1
cores_per_socket       : 4
threads_per_core       : 1
cpu_mhz                : 2300
hw_caps                : 178bfbff:efd3fbff:00000000:00000110:00802009:00000000:000007ff
total_memory           : 114687
free_memory            : 78719
node_to_cpu            : node0:0,4,8,12
                         node1:1,5,9,13
                         node2:2,6,10,14
                         node3:3,7,11,15
xen_major              : 3
xen_minor              : 1
xen_extra              : .2-214.el5
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64 
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : unavailable
cc_compiler            : gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)
cc_compile_by          : mockbuild
cc_compile_domain      : redhat.com
cc_compile_date        : Fri Aug 27 17:44:12 EDT 2010
xend_config_format     : 2

3. Try to create a guest (either PV or HVM).
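
To confirm at step 2 that the host really came up with NUMA enabled, the node_to_cpu field of the `xm info` output can be read mechanically. The following is only a minimal sketch in Python 3; the helper names `expand_cpus` and `parse_node_to_cpu` are hypothetical and are not part of xen or xend. It assumes the field layout shown in the output above (continuation lines indented, CPU lists given as comma-separated numbers or ranges).

```python
def expand_cpus(spec):
    """Expand a CPU list such as '0,4,8,12' or '0-23' into a list of ints."""
    cpus = []
    for part in spec.split(","):
        low, dash, high = part.partition("-")
        # A single number is treated as a one-element range.
        cpus.extend(range(int(low), int(high if dash else low) + 1))
    return cpus


def parse_node_to_cpu(xm_info_text):
    """Return {node_name: [cpu, ...]} from the node_to_cpu field of `xm info` text."""
    nodes = {}
    in_field = False
    for line in xm_info_text.splitlines():
        if line.startswith("node_to_cpu"):
            # First entry shares the line with the field name.
            in_field = True
            entry = line.split(":", 1)[1]
        elif in_field and line[:1].isspace() and line.strip():
            # Indented continuation line holding another node entry.
            entry = line
        else:
            in_field = False
            continue
        name, _, spec = entry.strip().partition(":")
        nodes[name] = expand_cpus(spec)
    return nodes


# Excerpt copied from the xm info output in this report.
SAMPLE = """nr_nodes               : 4
node_to_cpu            : node0:0,4,8,12
                         node1:1,5,9,13
                         node2:2,6,10,14
                         node3:3,7,11,15
xen_major              : 3
"""

if __name__ == "__main__":
    topology = parse_node_to_cpu(SAMPLE)
    print(len(topology), "nodes:", topology)
```

A host booted without numa=on would be expected to report a single node here, which gives a quick way to tell the two configurations apart before creating the guest.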
  
Actual results:
At step 3, the guest is created successfully, but xend.log contains error messages like:

[2010-09-02 15:41:21 xend 5728] ERROR (XendDomain:222) Failed to recreate information for domain 1.  Destroying it in the hope of recovery.
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomain.py", line 216, in refresh
    self._add_domain(
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 265, in recreate
    vm = XendDomainInfo(xeninfo, domid, dompath, True, priv)
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 534, in __init__
    self.validateInfo()
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 836, in validateInfo
    raise VmError('Invalid memory size')
VmError: Invalid memory size
[2010-09-02 15:41:21 xend 5728] ERROR (XendDomain:228) Destruction of 1 failed.
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomain.py", line 225, in refresh
    do_FLR(d, doms[d]['hvm'])
NameError: global name 'do_FLR' is not defined
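
The two tracebacks above suggest a two-stage failure: recreating the domain raises VmError, and the recovery path then fails on its own because it calls a name that is not defined. The sketch below is a simplified Python 3 model of that pattern, not xend's actual code. The functions `recreate` and `refresh` are stand-ins whose structure is inferred only from the traceback, and the log wording is abbreviated.

```python
class VmError(Exception):
    """Stand-in for xend's VmError; defined here so the sketch is self-contained."""


def recreate(domid):
    # Models validateInfo() rejecting the domain's memory settings.
    raise VmError("Invalid memory size")


def refresh(domids, log):
    """Try to recreate each domain; on failure, attempt a (broken) cleanup."""
    for domid in domids:
        try:
            recreate(domid)
        except Exception as exc:
            log.append("Failed to recreate information for domain %d: %s" % (domid, exc))
            try:
                # Deliberately undefined name, mirroring the do_FLR call in the
                # traceback: the lookup fails at call time with NameError.
                do_FLR(domid)  # noqa: F821
            except Exception as cleanup_exc:
                log.append("Destruction of %d failed: %s: %s"
                           % (domid, type(cleanup_exc).__name__, cleanup_exc))
    return log


if __name__ == "__main__":
    for message in refresh([1], []):
        print(message)
```

The point of the model is that the second error is independent of the first: the NameError would appear whenever the cleanup branch runs, whatever caused the original failure.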


Expected results:
No errors in xend.log at step 3. 
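
The expected result can be checked by scanning the log for ERROR records. This is a minimal Python 3 sketch; `find_errors` is a hypothetical helper, and the regular expression assumes the record layout seen in the excerpts in this report (timestamp, logger name, and PID in brackets, followed by the level and the source location).

```python
import re

# Matches records such as:
# [2010-09-02 15:41:21 xend 5728] ERROR (XendDomain:222) Failed to recreate ...
ERROR_RECORD = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\S+) (\d+)\] ERROR \(([^)]+)\) (.*)$"
)


def find_errors(log_text):
    """Return a list of (timestamp, location, message) for each ERROR record."""
    errors = []
    for line in log_text.splitlines():
        match = ERROR_RECORD.match(line)
        if match:
            timestamp, _logger, _pid, location, message = match.groups()
            errors.append((timestamp, location, message))
    return errors


if __name__ == "__main__":
    import sys

    # The log path is passed as an argument rather than assumed.
    with open(sys.argv[1]) as handle:
        for timestamp, location, message in find_errors(handle.read()):
            print(timestamp, location, message)
```

Run against the xend.log attached to this report, a check like this should list the XendDomain:222 and XendDomain:228 records; an empty result after step 3 would correspond to the expected behaviour.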


Additional info:
On the same machine, the issue does not occur when NUMA is disabled on the kernel command line.

Comment 1 Yufang Zhang 2010-09-02 07:59:20 UTC
Created attachment 442565
dmesg log

Comment 2 Miroslav Rezanina 2010-09-10 08:04:24 UTC
Is this a problem only on AMD hosts, or is it reproducible on Intel too?

Comment 3 Yufang Zhang 2010-09-15 07:43:14 UTC
We will have access to an Intel NUMA machine tomorrow and will check then whether this bug can be reproduced on it.

Comment 4 Yufang Zhang 2010-09-16 09:47:45 UTC
This bug can be reproduced on an Intel NUMA machine too:

# xm info
host                   : intel-e7450-512-1
release                : 2.6.18-221.el5xen
version                : #1 SMP Mon Sep 13 22:23:30 EDT 2010
machine                : x86_64
nr_cpus                : 48
nr_nodes               : 2
sockets_per_node       : 6
cores_per_socket       : 4
threads_per_core       : 1
cpu_mhz                : 2398
hw_caps                : bfebfbff:20100800:00000000:00000940:000ce3bd:00000000:00000001
total_memory           : 523774
free_memory            : 480719
node_to_cpu            : node0:0-23
                         node1:24-47
xen_major              : 3
xen_minor              : 1
xen_extra              : .2-221.el5
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64 
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : unavailable
cc_compiler            : gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)
cc_compile_by          : mockbuild
cc_compile_domain      : redhat.com
cc_compile_date        : Mon Sep 13 22:00:53 EDT 2010
xend_config_format     : 2

After creating a guest, we see the same errors in xend.log:

[2010-09-16 17:41:56 xend.XendDomainInfo 8551] INFO (XendDomainInfo:265) Recreating domain 1, UUID 1efb30c3-86fd-9dd7-4934-9b72b6a833fc.
[2010-09-16 17:41:56 xend 8551] ERROR (XendDomain:222) Failed to recreate information for domain 1.  Destroying it in the hope of recovery.
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomain.py", line 216, in refresh
    self._add_domain(
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 289, in recreate
    vm = XendDomainInfo(xeninfo, domid, dompath, True, priv)
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 558, in __init__
    self.validateInfo()
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 860, in validateInfo
    raise VmError('Invalid memory size')
VmError: Invalid memory size
[2010-09-16 17:41:56 xend 8551] ERROR (XendDomain:228) Destruction of 1 failed.
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomain.py", line 225, in refresh
    do_FLR(d, doms[d]['hvm'])
NameError: global name 'do_FLR' is not defined

Comment 5 Yufang Zhang 2010-09-16 09:49:55 UTC
Created attachment 447702
xend.log on Intel NUMA machine

Comment 6 Yufang Zhang 2010-09-16 09:52:03 UTC
Created attachment 447703
xm dmesg on Intel NUMA machine

Comment 7 Miroslav Rezanina 2010-12-13 13:49:15 UTC
Hi Yufang,

Can you give me access to a NUMA machine? I tried to get the machines mentioned in this BZ but do not have rights to them.

Comment 13 Laszlo Ersek 2011-01-28 16:46:44 UTC
If the only problem captured in this bug is the do_FLR error message, then I strongly suggest closing it as a duplicate of bug 669388.

Comment 14 Qixiang Wan 2011-01-31 03:07:58 UTC
Yes, the only problem is the do_FLR error message, so I am closing this as a duplicate of bug 669388.

*** This bug has been marked as a duplicate of bug 669388 ***

