Bug 669388 - bogus do_FLR call in XendDomain.py
Summary: bogus do_FLR call in XendDomain.py
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xen
Version: 5.6
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Laszlo Ersek
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Duplicates: 629523
Depends On:
Blocks: 514500 688162
Reported: 2011-01-13 14:48 UTC by Paolo Bonzini
Modified: 2011-07-21 12:01 UTC (History)

Fixed In Version: xen-3.0.3-122.el5
Doc Type: Bug Fix
Doc Text:
A part of the recovery code in the XendDomain.py source file used an unqualified name to access the XendDomainInfo.do_FLR() function. As a consequence, a network card's virtual functions were not properly reset before a fully-virtualized guest was started. With this update, all do_FLR() calls use the correct scope, and this bug no longer occurs.
Clone Of:
: 688162 (view as bug list)
Environment:
Last Closed: 2011-07-21 09:16:15 UTC
Target Upstream Version:
Embargoed:


Attachments
Qualify do_FLR call with correct scope. (1.04 KB, patch)
2011-01-13 16:46 UTC, Laszlo Ersek
xend log (5.46 KB, text/plain)
2011-01-31 03:30 UTC, Qixiang Wan


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2011:1070 | 0 | normal | SHIPPED_LIVE | xen bug fix and enhancement update | 2011-07-21 09:12:56 UTC

Description Paolo Bonzini 2011-01-13 14:48:42 UTC
Some little-tested recovery code in XendDomain.py is using an unqualified name to access XendDomainInfo.do_FLR.  This fails because XendDomainInfo is imported with "import XendDomainInfo".

The patch to fix this is trivial, so it should be included.
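
The failure mode can be illustrated with a minimal sketch. It does not use the real xend code: the module object and the do_FLR body below are hypothetical stand-ins built with types.ModuleType, and cleanup_unqualified / cleanup_qualified are made-up names. It only shows that after "import XendDomainInfo" the function is reachable as XendDomainInfo.do_FLR, not as a bare do_FLR.

```python
import types

# Hypothetical stand-in for the XendDomainInfo module; in xend this name
# would be bound by "import XendDomainInfo".
XendDomainInfo = types.ModuleType("XendDomainInfo")


def _do_FLR(domid):
    # Placeholder body (assumption): the real function resets PCI devices.
    return "FLR requested for domain %d" % domid


XendDomainInfo.do_FLR = _do_FLR
del _do_FLR  # the function is now reachable only through the module


def cleanup_unqualified(domid):
    # Bare name: never bound in this namespace, so the call raises
    # NameError, but only when this line actually executes.
    return do_FLR(domid)


def cleanup_qualified(domid):
    # Qualified name: the attribute is looked up on the imported module.
    return XendDomainInfo.do_FLR(domid)


try:
    cleanup_unqualified(1)
except NameError as exc:
    unqualified_error = str(exc)

qualified_result = cleanup_qualified(1)
print(unqualified_error)
print(qualified_result)
```

Because Python resolves names at call time, the unqualified call is only detected when the recovery path runs, which is consistent with the code being little tested.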

Comment 1 Laszlo Ersek 2011-01-13 16:38:59 UTC
Backport of upstream c/s 19067:a92ed09b4032 ("xend: Fix do_FLR() scope problem.")

http://xenbits.xensource.com/xen-unstable.hg?rev/19067

Comment 2 Laszlo Ersek 2011-01-13 16:46:39 UTC
Created attachment 473369
Qualify do_FLR call with correct scope.

Comment 5 Qixiang Wan 2011-01-31 03:07:58 UTC
*** Bug 629523 has been marked as a duplicate of this bug. ***

Comment 6 Qixiang Wan 2011-01-31 03:28:38 UTC
There is a regression after applying this patch:

On a NUMA machine, creating a guest fails when NUMA is enabled.

$ cat grub.conf
...
	kernel /xen.gz-2.6.18-194.el5 bootscrub=0 numa=on loglvl=all guest_loglvl=all
...

$ cat rhel5-32pv.cfg 
name = "rhel5-32pv"
maxmem = 1024
memory = 1024
vcpus = 1
bootloader = "/usr/bin/pygrub"
pae = 1
on_poweroff = "destroy"
on_reboot = "restart"
on_crash = "restart"
vfb = [ 'type=vnc,vncunused=1,keymap=en-us,vnclisten=0.0.0.0' ]
disk = [ "tap:aio:/root/RHEL-Server-5.5-32-pv.raw,xvda,w" ]
vif = [ "mac=00:16:36:63:05:48,bridge=xenbr4,script=vif-bridge" ]

$ xm create rhel5-32pv.cfg 
Using config file "./rhel5-32pv.cfg".
Using <class 'grub.GrubConf.GrubConfigFile'> to parse /grub/menu.lst
Error: (3, 'No such process')

additional info:
[1] the defect exists with both PV and HVM guests
[2] there is no problem when NUMA is disabled

Comment 7 Qixiang Wan 2011-01-31 03:30:29 UTC
Created attachment 476110
xend log

Created a DomU on a NUMA machine after applying the patch.

DomU memory becomes '0' while performing XendDomainInfo.recreate.

Comment 9 Laszlo Ersek 2011-01-31 08:47:12 UTC
Hello Qixiang,

the error you see is not related to this patch, and thus it's most likely not a regression.

This bug (669388) was reported by Paolo when I was working on bug 666908 (which is sensitive to the Xen numa setting btw.), and I asked for his help with interpreting the messages in xend.log. Please see bug 666908 comment 9.

The patch eliminates an exception in xend at a time when xend is on an error handling / recovery path anyway. Therefore xend's behavior may indeed change, because with the exception absent, a code path that was unreachable before may become reachable now. The error message

    NameError: global name 'do_FLR' is not defined

was previously masking the real error, which is

    VmError: Invalid memory size

With this patch, it may not be masked anymore.

So, the "32bit guest on 64bit host with lots of memory" question is unrelated to this scoping bug. To see that, please enable numa (so that the guest creation problem reappears), and then repeat it with a downgraded xend (which doesn't have this patch applied). The guest creation problem should persist, even though xend may react differently to it.
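
The masking effect described in this comment can be sketched as follows. This is an illustration, not xend code: VmError here is a local stand-in class, and create_guest, recover_broken, recover_fixed, start and reported_error are hypothetical names. The point is only that an exception raised inside an error handler replaces the error that triggered the handler.

```python
class VmError(Exception):
    """Local stand-in for xend's VmError (assumption, not the real class)."""


def create_guest():
    # The real failure in this scenario.
    raise VmError("Invalid memory size")


def recover_broken():
    # Recovery code with the scoping bug: do_FLR is not bound here,
    # so this line raises NameError.
    do_FLR(0)


def recover_fixed():
    # Recovery code after the fix: cleanup completes without raising.
    pass


def start(recover):
    try:
        create_guest()
    except VmError:
        recover()  # if this raises, the new exception propagates instead
        raise      # otherwise the original VmError is re-raised


def reported_error(recover):
    # Return the name of the exception type the caller ends up seeing.
    try:
        start(recover)
    except Exception as exc:
        return type(exc).__name__


print(reported_error(recover_broken))  # NameError
print(reported_error(recover_fixed))   # VmError
```

In both runs guest creation fails; only the reported exception differs, which is why removing the NameError can look like a behavior change without being a regression.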

Comment 15 Qixiang Wan 2011-04-01 12:16:42 UTC
VERIFIED with xen-3.0.3-127.el5.

Reproduced with the xen 120 build. With the 127 build, no error message like "NameError: global name 'do_FLR' is not defined" appears in xend.log. The issue mentioned in comment 6 was handled in bug 688162.

Comment 16 Tomas Capek 2011-07-13 13:30:22 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
A part of the recovery code in the XendDomain.py source file used an unqualified name to access the XendDomainInfo.do_FLR() function. As a consequence, a 32-bit PV guest sometimes could not start properly on a 64-bit NUMA (Non-Uniform Memory Access) system. With this update, all do_FLR() calls use the correct scope, and this bug no longer occurs.

Comment 17 Paolo Bonzini 2011-07-13 15:02:10 UTC
The NUMA failure was reported only during 5.7 development and does not occur in released versions.  However, there is indeed a bug that was fixed by this patch, and I tried to describe it.  If you need cause/consequence/fix/result, please put needinfo here.

Comment 18 Paolo Bonzini 2011-07-13 15:02:10 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-A part of the recovery code in the XendDomain.py source file used an unqualified name to access the XendDomainInfo.do_FLR() function. As a consequence, a 32-bit PV guest sometimes could not start properly on a 64-bit NUMA (Non-Uniform Memory Access) system. With this update, all do_FLR() calls use the correct scope, and this bug no longer occurs.
+A part of the recovery code in the XendDomain.py source file used an unqualified name to access the XendDomainInfo.do_FLR() function. As a consequence, a network card's virtual functions were not properly reset before a fully-virtualized guest was started. With this update, all do_FLR() calls use the correct scope, and this bug no longer occurs.

Comment 19 errata-xmlrpc 2011-07-21 09:16:15 UTC
An advisory has been issued which should address the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-1070.html

Comment 20 errata-xmlrpc 2011-07-21 12:01:14 UTC
An advisory has been issued which should address the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-1070.html

