Created attachment 442563 [details]
xend.log

Description of problem:
xend.log shows errors when creating a guest (PV or HVM) on an AMD NUMA host (4 nodes). On the same machine with NUMA disabled, the issue does not occur.

Version-Release number of selected component (if applicable):
xen-libs-3.0.3-115.el5
xen-libs-3.0.3-105.el5
kernel-xen-devel-2.6.18-214.el5
xen-3.0.3-115.el5
xen-devel-3.0.3-115.el5
kernel-xen-2.6.18-214.el5
xen-debuginfo-3.0.3-115.el5

How reproducible:
Always.

Steps to Reproduce:
1. Enable NUMA on the hypervisor's kernel command line: numa=on
2. Boot the host and check the system info:
# xm info
host                   : amd-8356-128-4
release                : 2.6.18-214.el5xen
version                : #1 SMP Fri Aug 27 17:54:19 EDT 2010
machine                : x86_64
nr_cpus                : 16
nr_nodes               : 4
sockets_per_node       : 1
cores_per_socket       : 4
threads_per_core       : 1
cpu_mhz                : 2300
hw_caps                : 178bfbff:efd3fbff:00000000:00000110:00802009:00000000:000007ff
total_memory           : 114687
free_memory            : 78719
node_to_cpu            : node0:0,4,8,12 node1:1,5,9,13 node2:2,6,10,14 node3:3,7,11,15
xen_major              : 3
xen_minor              : 1
xen_extra              : .2-214.el5
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : unavailable
cc_compiler            : gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)
cc_compile_by          : mockbuild
cc_compile_domain      : redhat.com
cc_compile_date        : Fri Aug 27 17:44:12 EDT 2010
xend_config_format     : 2
3. Try to create a guest (either PV or HVM).

Actual results:
At step 3, the guest is created successfully, but xend.log contains error messages like:

[2010-09-02 15:41:21 xend 5728] ERROR (XendDomain:222) Failed to recreate information for domain 1. Destroying it in the hope of recovery.
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomain.py", line 216, in refresh
    self._add_domain(
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 265, in recreate
    vm = XendDomainInfo(xeninfo, domid, dompath, True, priv)
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 534, in __init__
    self.validateInfo()
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 836, in validateInfo
    raise VmError('Invalid memory size')
VmError: Invalid memory size
[2010-09-02 15:41:21 xend 5728] ERROR (XendDomain:228) Destruction of 1 failed.
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomain.py", line 225, in refresh
    do_FLR(d, doms[d]['hvm'])
NameError: global name 'do_FLR' is not defined

Expected results:
No errors in xend.log at step 3.

Additional info:
On the same machine, the issue does not occur when NUMA is disabled on the kernel command line.
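
Note on the two tracebacks: the second one is raised from the recovery path that runs after the first failure, which is why "Destruction of 1 failed" follows immediately. The sketch below only illustrates that pattern and is not the xend source. It is written for current Python 3 rather than the python2.4 shown in the log; `add_domain`, `refresh` and the local `VmError` class are hypothetical stand-ins, and `do_FLR` is left undefined on purpose.

```python
class VmError(Exception):
    """Local stand-in for xend's VmError so the sketch is self-contained."""


def add_domain(domid):
    # Hypothetical stand-in for the recreate path that fails validation.
    raise VmError("Invalid memory size")


def refresh(domids):
    """Return (domid, message) pairs describing what would be logged."""
    events = []
    for d in domids:
        try:
            add_domain(d)
        except Exception:
            events.append((d, "Failed to recreate information for domain %d." % d))
            try:
                # 'do_FLR' is deliberately undefined here, so this call
                # raises NameError inside the recovery branch.
                do_FLR(d, False)  # noqa: F821
            except Exception as exc:
                events.append(
                    (d, "Destruction of %d failed: %s" % (d, type(exc).__name__))
                )
    return events
```

Calling `refresh([1])` yields two entries for domain 1: the original recreate failure, then a destruction failure whose exception type is NameError rather than anything related to the guest itself.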
Created attachment 442565 [details] dmesg log
Is this only an AMD host problem, or is it reproducible on Intel too?
We can get access to an Intel NUMA machine tomorrow and will check then whether this bug can be reproduced on it.
This bug can be reproduced on an Intel NUMA machine too:

# xm info
host                   : intel-e7450-512-1
release                : 2.6.18-221.el5xen
version                : #1 SMP Mon Sep 13 22:23:30 EDT 2010
machine                : x86_64
nr_cpus                : 48
nr_nodes               : 2
sockets_per_node       : 6
cores_per_socket       : 4
threads_per_core       : 1
cpu_mhz                : 2398
hw_caps                : bfebfbff:20100800:00000000:00000940:000ce3bd:00000000:00000001
total_memory           : 523774
free_memory            : 480719
node_to_cpu            : node0:0-23 node1:24-47
xen_major              : 3
xen_minor              : 1
xen_extra              : .2-221.el5
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : unavailable
cc_compiler            : gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)
cc_compile_by          : mockbuild
cc_compile_domain      : redhat.com
cc_compile_date        : Mon Sep 13 22:00:53 EDT 2010
xend_config_format     : 2

After creating a guest, xend.log shows the same errors:

[2010-09-16 17:41:56 xend.XendDomainInfo 8551] INFO (XendDomainInfo:265) Recreating domain 1, UUID 1efb30c3-86fd-9dd7-4934-9b72b6a833fc.
[2010-09-16 17:41:56 xend 8551] ERROR (XendDomain:222) Failed to recreate information for domain 1. Destroying it in the hope of recovery.
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomain.py", line 216, in refresh
    self._add_domain(
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 289, in recreate
    vm = XendDomainInfo(xeninfo, domid, dompath, True, priv)
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 558, in __init__
    self.validateInfo()
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 860, in validateInfo
    raise VmError('Invalid memory size')
VmError: Invalid memory size
[2010-09-16 17:41:56 xend 8551] ERROR (XendDomain:228) Destruction of 1 failed.
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomain.py", line 225, in refresh
    do_FLR(d, doms[d]['hvm'])
NameError: global name 'do_FLR' is not defined
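
For anyone comparing hosts while reproducing this, a small helper can pull the NUMA fields out of saved `xm info` output. This is only a sketch: it assumes the output has one "key : value" pair per line and that the `node_to_cpu` value sits on a single line in the form quoted in this report (comma lists or ranges). The function names are made up for illustration and are not part of any Xen tool.

```python
def parse_xm_info(text):
    """Parse 'key : value' lines into a dict of strings."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only; values such as hw_caps or
        # node_to_cpu contain further colons that must be preserved.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def expand_cpus(spec):
    """Expand '0,4,8,12' or '0-23' into a list of CPU numbers."""
    cpus = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def node_to_cpu(info):
    """Map each node name to its CPU list, based on the node_to_cpu field."""
    mapping = {}
    for entry in info.get("node_to_cpu", "").split():
        node, _, spec = entry.partition(":")
        mapping[node] = expand_cpus(spec)
    return mapping
```

A host can then be treated as NUMA-enabled when `int(parse_xm_info(text)["nr_nodes"]) > 1`, which matches the two machines above (4 nodes and 2 nodes).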
Created attachment 447702 [details] xend.log on Intel NUMA machine
Created attachment 447703 [details] xm dmesg on Intel NUMA machine
Hi Yufang, can you give me access to a NUMA machine? I tried to get the machines from this bz but do not have rights to them.
If the only problem captured in this bug is the do_FLR error message, then I strongly suggest closing it as a dupe of bug 669388.
Yes, the problem is only the do_FLR error message, so closing this as a dup of bug 669388.

*** This bug has been marked as a duplicate of bug 669388 ***