Bug 669388

Summary: bogus do_FLR call in XendDomain.py
Product: Red Hat Enterprise Linux 5 Reporter: Paolo Bonzini <pbonzini>
Component: xen Assignee: Laszlo Ersek <lersek>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: low    
Version: 5.6 CC: mshao, qwan, tom, xen-maint, yuzhang
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: xen-3.0.3-122.el5 Doc Type: Bug Fix
Doc Text:
A part of the recovery code in the XendDomain.py source file used an unqualified name to access the XendDomainInfo.do_FLR() function. As a consequence, a network card's virtual functions were not properly reset before a fully-virtualized guest was started. With this update, all do_FLR() calls use the correct scope, and this bug no longer occurs.
Story Points: ---
Clone Of:
: 688162 (view as bug list) Environment:
Last Closed: 2011-07-21 09:16:15 UTC Type: ---
Bug Depends On:    
Bug Blocks: 514500, 688162    
Attachments:
Description Flags
Qualify do_FLR call with correct scope. none
xend log none

Description Paolo Bonzini 2011-01-13 14:48:42 UTC
Some little-tested recovery code in XendDomain.py is using an unqualified name to access XendDomainInfo.do_FLR.  This fails because XendDomainInfo is imported with "import XendDomainInfo".

The patch to fix this is trivial, so it should be included.
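
The scoping problem described above can be reproduced outside xend. The following is a minimal sketch, not the actual XendDomain.py code: the module registered here is a hypothetical stand-in for XendDomainInfo, and the do_FLR signature and return value are assumptions made for illustration.

```python
import sys
import types

# Hypothetical stand-in for the real XendDomainInfo module. The body and
# signature of do_FLR are invented here; only the name comes from the bug.
_module = types.ModuleType("XendDomainInfo")


def _do_FLR(domid):
    return "FLR requested for domain %d" % domid


_module.do_FLR = _do_FLR
sys.modules["XendDomainInfo"] = _module

# "import XendDomainInfo" binds only the module name in this namespace.
# It does not bind do_FLR itself.
import XendDomainInfo


def recover_unqualified(domid):
    # Buggy pattern: the bare name is looked up at call time and not found.
    return do_FLR(domid)


def recover_qualified(domid):
    # Fixed pattern: the call is qualified with the module name.
    return XendDomainInfo.do_FLR(domid)


try:
    recover_unqualified(1)
    unqualified_result = "no error"
except NameError as exc:
    unqualified_result = "NameError: %s" % exc

qualified_result = recover_qualified(1)

print(unqualified_result)
print(qualified_result)
```

Because the name is only resolved when the function runs, the module loads without complaint and the failure appears only when the recovery path is actually exercised. Current Python versions word the message slightly differently from the "global name ... is not defined" text seen in xend.log.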

Comment 1 Laszlo Ersek 2011-01-13 16:38:59 UTC
Backport of upstream c/s 19067:a92ed09b4032 ("xend: Fix do_FLR() scope problem.")

http://xenbits.xensource.com/xen-unstable.hg?rev/19067

Comment 2 Laszlo Ersek 2011-01-13 16:46:39 UTC
Created attachment 473369 [details]
Qualify do_FLR call with correct scope.

Comment 5 Qixiang Wan 2011-01-31 03:07:58 UTC
*** Bug 629523 has been marked as a duplicate of this bug. ***

Comment 6 Qixiang Wan 2011-01-31 03:28:38 UTC
There is a regression after applying this patch:

On a NUMA machine, guest creation fails when NUMA is enabled.

$ cat grub.conf
...
	kernel /xen.gz-2.6.18-194.el5 bootscrub=0 numa=on loglvl=all guest_loglvl=all
...

$ cat rhel5-32pv.cfg 
name = "rhel5-32pv"
maxmem = 1024
memory = 1024
vcpus = 1
bootloader = "/usr/bin/pygrub"
pae = 1
on_poweroff = "destroy"
on_reboot = "restart"
on_crash = "restart"
vfb = [ 'type=vnc,vncunused=1,keymap=en-us,vnclisten=0.0.0.0' ]
disk = [ "tap:aio:/root/RHEL-Server-5.5-32-pv.raw,xvda,w" ]
vif = [ "mac=00:16:36:63:05:48,bridge=xenbr4,script=vif-bridge" ]

$ xm create rhel5-32pv.cfg 
Using config file "./rhel5-32pv.cfg".
Using <class 'grub.GrubConf.GrubConfigFile'> to parse /grub/menu.lst
Error: (3, 'No such process')

additional info:
[1] the defect exists with both PV and HVM guests
[2] there is no problem when numa is disabled

Comment 7 Qixiang Wan 2011-01-31 03:30:29 UTC
Created attachment 476110 [details]
xend log

Created a DomU on a NUMA machine after applying the patch.

DomU memory becomes '0' while performing XendDomainInfo.recreate.

Comment 9 Laszlo Ersek 2011-01-31 08:47:12 UTC
Hello Qixiang,

the error you see is not related to this patch, and thus it's most likely not a regression.

This bug (669388) was reported by Paolo when I was working on bug 666908 (which is sensitive to the Xen numa setting btw.), and I asked for his help with interpreting the messages in xend.log. Please see bug 666908 comment 9.

The patch eliminates an exception in xend at a time when xend is on an error handling / recovery path anyway. Therefore xend's behavior may indeed change, because with the exception absent, a code path that was unreachable before may become reachable now. The error message

    NameError: global name 'do_FLR' is not defined

was previously masking the real error, which is

    VmError: Invalid memory size

With this patch, it may not be masked anymore.
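
The masking effect can be sketched as follows. This is an illustration under assumptions, not xend's real control flow: VmError here is a local stand-in class, and start_guest, buggy_cleanup, fixed_cleanup and report are hypothetical names.

```python
class VmError(Exception):
    """Local stand-in for xend's VmError (hypothetical definition)."""


def start_guest(cleanup):
    try:
        # The real failure, as in the xend log.
        raise VmError("Invalid memory size")
    except VmError:
        # Error handling / recovery path: clean up, then re-raise.
        cleanup()
        raise


def buggy_cleanup():
    # Unqualified name: raises NameError before the VmError is re-raised.
    do_FLR(0)


def fixed_cleanup():
    # Stands in for a correctly qualified XendDomainInfo.do_FLR(...) call.
    pass


def report(cleanup):
    # Return what a caller of start_guest would see as the error.
    try:
        start_guest(cleanup)
    except Exception as exc:
        return "%s: %s" % (type(exc).__name__, exc)
    return "no error"


masked = report(buggy_cleanup)
unmasked = report(fixed_cleanup)

print(masked)
print(unmasked)
```

With the buggy cleanup the caller sees only the NameError; with the fixed cleanup the original VmError reaches the caller. The underlying failure is the same in both runs, which is why the patch changes the reported symptom without introducing the guest creation problem.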

So, the "32bit guest on 64bit host with lots of memory" question is unrelated to this scoping bug. To see that, please enable numa (so that the guest creation problem reappears), and then repeat it with a downgraded xend (which doesn't have this patch applied). The guest creation problem should persist, even though xend may react differently to it.

Comment 15 Qixiang Wan 2011-04-01 12:16:42 UTC
VERIFIED with xen-3.0.3-127.el5.

Reproduced with the xen 120 build. With the 127 build, no error message like "NameError: global name 'do_FLR' is not defined" is present in xend.log. The issue mentioned in comment 6 was handled in bug 688162.

Comment 16 Tomas Capek 2011-07-13 13:30:22 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
A part of the recovery code in the XendDomain.py source file used an unqualified name to access the XendDomainInfo.do_FLR() function. As a consequence, a 32-bit PV guest sometimes could not start properly on a 64-bit NUMA (Non-Uniform Memory Access) system. With this update, all do_FLR() calls use the correct scope, and this bug no longer occurs.

Comment 17 Paolo Bonzini 2011-07-13 15:02:10 UTC
The NUMA failure was reported only during 5.7 development and does not occur in released versions.  However, there is indeed a bug that was fixed by this patch, and I tried to describe it.  If you need cause/consequence/fix/result, please put needinfo here.

Comment 18 Paolo Bonzini 2011-07-13 15:02:10 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-A part of the recovery code in the XendDomain.py source file used an unqualified name to access the XendDomainInfo.do_FLR() function. As a consequence, a 32-bit PV guest sometimes could not start properly on a 64-bit NUMA (Non-Uniform Memory Access) system. With this update, all do_FLR() calls use the correct scope, and this bug no longer occurs.
+A part of the recovery code in the XendDomain.py source file used an unqualified name to access the XendDomainInfo.do_FLR() function. As a consequence, a network card's virtual functions were not properly reset before a fully-virtualized guest was started. With this update, all do_FLR() calls use the correct scope, and this bug no longer occurs.

Comment 19 errata-xmlrpc 2011-07-21 09:16:15 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-1070.html

Comment 20 errata-xmlrpc 2011-07-21 12:01:14 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-1070.html