Some little-tested recovery code in XendDomain.py uses an unqualified name to access XendDomainInfo.do_FLR. This fails because XendDomainInfo is imported with "import XendDomainInfo", which binds only the module name and not the functions inside it. The patch to fix this is trivial, so it should be included.
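The scoping problem described above can be reproduced outside xend. The following is only a minimal sketch: it registers a hypothetical stand-in module under the name XendDomainInfo, so that no xend installation is needed, and the one-argument do_FLR(domid) signature, the return string, and the cleanup_* helpers are assumptions made for illustration, not the actual xend code.

```python
import sys
import types


def _fake_do_FLR(domid):
    # Stand-in only: the real function resets devices, this one just reports.
    return "FLR requested for domain %d" % domid


# Hypothetical stand-in for xend's XendDomainInfo module (assumption).
_stub = types.ModuleType("XendDomainInfo")
_stub.do_FLR = _fake_do_FLR
sys.modules["XendDomainInfo"] = _stub

import XendDomainInfo  # binds only the module name, not its functions


def cleanup_unqualified(domid):
    # The bare name is searched in this module's globals and in builtins,
    # never inside the imported module, so this raises NameError.
    return do_FLR(domid)  # noqa: F821


def cleanup_qualified(domid):
    # Attribute lookup on the imported module finds the function.
    return XendDomainInfo.do_FLR(domid)


try:
    cleanup_unqualified(7)
    unqualified_error = None
except NameError as exc:
    unqualified_error = exc

qualified_result = cleanup_qualified(7)
print("unqualified:", repr(unqualified_error))
print("qualified:  ", qualified_result)
```

Because the bad name is only looked up when the function body runs, such a mistake can stay hidden until the rarely used recovery path is actually taken.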
Backport of upstream c/s 19067:a92ed09b4032 ("xend: Fix do_FLR() scope problem.") http://xenbits.xensource.com/xen-unstable.hg?rev/19067
Created attachment 473369 [details] Qualify do_FLR call with correct scope.
*** Bug 629523 has been marked as a duplicate of this bug. ***
There is a regression after applying this patch: on a NUMA machine, creating a guest fails when NUMA is enabled.

$ cat grub.conf
...
kernel /xen.gz-2.6.18-194.el5 bootscrub=0 numa=on loglvl=all guest_loglvl=all
...

$ cat rhel5-32pv.cfg
name = "rhel5-32pv"
maxmem = 1024
memory = 1024
vcpus = 1
bootloader = "/usr/bin/pygrub"
pae = 1
on_poweroff = "destroy"
on_reboot = "restart"
on_crash = "restart"
vfb = [ 'type=vnc,vncunused=1,keymap=en-us,vnclisten=0.0.0.0' ]
disk = [ "tap:aio:/root/RHEL-Server-5.5-32-pv.raw,xvda,w" ]
vif = [ "mac=00:16:36:63:05:48,bridge=xenbr4,script=vif-bridge" ]

$ xm create rhel5-32pv.cfg
Using config file "./rhel5-32pv.cfg".
Using <class 'grub.GrubConf.GrubConfigFile'> to parse /grub/menu.lst
Error: (3, 'No such process')

Additional info:
[1] the defect exists with both PV and HVM guests
[2] there is no problem when NUMA is disabled
Created attachment 476110 [details] xend log from creating a DomU on a NUMA machine after applying the patch. DomU memory becomes '0' while performing XendDomainInfo.recreate.
Hello Qixiang,

the error you see is not related to this patch, and thus it's most likely not a regression. This bug (669388) was reported by Paolo when I was working on bug 666908 (which is sensitive to the Xen numa setting btw.), and I asked for his help with interpreting the messages in xend.log. Please see bug 666908 comment 9.

The patch eliminates an exception in xend at a time when xend is on an error handling / recovery path anyway. Therefore xend's behavior may indeed change, because with the exception absent, a code path that was unreachable before may become reachable now. The error message

  NameError: global name 'do_FLR' is not defined

was previously masking the real error, which is

  VmError: Invalid memory size

With this patch, it may not be masked anymore.

So, the "32bit guest on 64bit host with lots of memory" question is unrelated to this scoping bug. To see that, please enable numa (so that the guest creation problem reappears), and then repeat it with a downgraded xend (which doesn't have this patch applied). The guest creation problem should persist, even though xend may react differently to it.
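The masking effect described above can be illustrated with a small sketch. The control flow here is an assumption made for illustration (the actual structure of XendDomain.py is not shown in this report): VmError is a local stand-in class, and create_domain, start, reported_error, and the cleanup_* helpers are hypothetical names.

```python
class VmError(Exception):
    """Local stand-in for xend's VmError (assumption)."""


def create_domain():
    # Simulates the real failure that starts the recovery path.
    raise VmError("Invalid memory size")


def cleanup_broken():
    # Unqualified name: raises NameError while the VmError is being handled.
    do_FLR(0)  # noqa: F821


def cleanup_fixed():
    # Stands in for a correctly qualified call that completes normally.
    pass


def start(cleanup):
    try:
        create_domain()
    except VmError:
        cleanup()  # if this raises, the new exception replaces VmError
        raise      # otherwise the original VmError propagates


def reported_error(cleanup):
    # Returns what a caller (or a log) would see as the failure.
    try:
        start(cleanup)
    except Exception as exc:
        return "%s: %s" % (type(exc).__name__, exc)


before_fix = reported_error(cleanup_broken)
after_fix = reported_error(cleanup_fixed)
print("before fix:", before_fix)
print("after fix: ", after_fix)
```

In this sketch the guest creation fails in both cases; only the reported error changes, which matches the point that fixing the scope exposes the underlying problem rather than causing it.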
VERIFIED with xen-3.0.3-127.el5. Reproduced with the xen 120 build. With the 127 build, no error message like "NameError: global name 'do_FLR' is not defined" is present in xend.log. The issue mentioned in comment 6 was handled in bug 688162.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: A part of the recovery code in the XendDomain.py source file used an unqualified name to access the XendDomainInfo.do_FLR() function. As a consequence, a 32-bit PV guest sometimes could not start properly on a 64-bit NUMA (Non-Uniform Memory Access) system. With this update, all do_FLR() calls use the correct scope, and this bug no longer occurs.
The NUMA failure was reported only during 5.7 development and does not occur in released versions. However, there is indeed a bug that was fixed by this patch, and I tried to describe it. If you need cause/consequence/fix/result, please put needinfo here.
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. Diffed Contents:
@@ -1 +1 @@
-A part of the recovery code in the XendDomain.py source file used an unqualified name to access the XendDomainInfo.do_FLR() function. As a consequence, a 32-bit PV guest sometimes could not start properly on a 64-bit NUMA (Non-Uniform Memory Access) system. With this update, all do_FLR() calls use the correct scope, and this bug no longer occurs.
+A part of the recovery code in the XendDomain.py source file used an unqualified name to access the XendDomainInfo.do_FLR() function. As a consequence, a network card's virtual functions were not properly reset before a fully-virtualized guest was started. With this update, all do_FLR() calls use the correct scope, and this bug no longer occurs.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2011-1070.html