Bug 873134

Summary: setting current memory equal to max ends with domain started with current > max
Product: Red Hat Enterprise Linux 6 Reporter: Wayne Sun <gsun>
Component: libvirt Assignee: Laine Stump <laine>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.4CC: acathrow, dyasny, dyuan, honzhang, mzhan, rwu, zhpeng
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: libvirt-0.10.2-9.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-02-21 07:11:30 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 881827    

Description Wayne Sun 2012-11-05 08:42:08 UTC
Description of problem:
When a domain is configured with max == current memory (in KiB) in its XML, the domain starts with current memory > max memory. Destroying and restarting the domain works fine, but starting the domain after a managed save fails.

Version-Release number of selected component (if applicable):
libvirt-0.10.2-6.el6.x86_64
qemu-kvm-0.12.1.2-2.316.el6.x86_64
kernel-2.6.32-330.el6.x86_64

How reproducible:
100%
Steps to Reproduce:
1. Edit a domain:
# virsh edit libvirt_test_api
...
  <memory unit='KiB'>4000000</memory>
  <currentMemory unit='KiB'>4000000</currentMemory>
  <vcpu placement='static'>4</vcpu>
...

2. start domain and check xml
# virsh start libvirt_test_api
Domain started

# virsh list
 Id    Name                           State
----------------------------------------------------
 1     libvirt_test_api               running

# virsh dumpxml libvirt_test_api
...
  <memory unit='KiB'>4000768</memory>
  <currentMemory unit='KiB'>4001792</currentMemory>
  <vcpu placement='static'>4</vcpu>
...

Destroying and then starting the domain succeeds.

3. do managed save
# virsh managedsave libvirt_test_api

Domain libvirt_test_api state saved by libvirt

# virsh start libvirt_test_api
error: Failed to start domain libvirt_test_api
error: XML error: current memory '4001792k' exceeds maximum '4000768k'


Actual results:
Setting current == max results in the domain starting with current > max; the domain then fails to start after a managed save.

Expected results:
current should be less than or equal to the max value, so that managed save and restore are not blocked.

Additional info:

Comment 2 Laine Stump 2012-11-16 22:14:35 UTC
Fix has been pushed upstream:

commit 89204fca7f193c7cf48f941bf2917c1a0e71096c
Author: Laine Stump <laine>
Date:   Fri Nov 16 10:53:04 2012 -0500

    qemu: allow larger discrepency between memory & currentMemory in domain xml
    
    This resolves:
    
      https://bugzilla.redhat.com/show_bug.cgi?id=873134
    
    The reported problem is that an attempt to restore a saved domain that
    was configured with <currentMemory> and <memory> set to some (same for
    both) number that's not a multiple of 4096KiB results in an error like
    this:
    
      error: Failed to start domain libvirt_test_api
      error: XML error: current memory '4001792k' exceeds maximum '4000768k'
    
    (in this case, currentMemory was set to 4000000KiB).
    
    The reason for this failure is:
    
    1) a saved image contains the "live xml" of the domain at the time of
    the save.
    
    2) the live xml of a running domain gets its currentMemory
    (a.k.a. cur_balloon) directly from the qemu monitor rather than from
    the configuration of the domain.
    
    3) the value reported by qemu is (sometimes) not exactly what was
    originally given to qemu when the domain was started, but is rounded
    up to [some indeterminate granularity] - in some versions of qemu that
    granularity is apparently 1MiB, and in others it is 4MiB.
    
    4) When the XML is parsed to setup the state of the restored domain,
    the XML parser for <currentMemory> compares it to <memory> (which is
    the maximum allowed memory size for the domain) and if <currentMemory>
    is greater than the next 1024KiB boundary above <memory>, it spits out
    an error and fails.
    
    For example (from the BZ) if you start qemu on RHEL6 with both
    <currentMemory> and <memory> of 4000000 (this number is in KiB),
    libvirt's dominfo or dumpxml will report "4001792" back (rounded up to
    next 4MiB) for 10-20 seconds after the start, then revert to reporting
    "4000000". On Fedora 16 (which uses qemu-1.0), it will instead report
    "4000768" (rounded up to next 1MiB). On Fedora 17 (qemu-1.2), it seems
    to always report "4000000". ("4000000" is of course okay, and
    "4000768" is also okay since that's the next 1024KiB boundary above
    "4000000" and the parser was already allowing for that. But "4001792"
    is *not* okay and produces the error message.)
    
    This patch solves the problem by changing the allowed "fudge factor"
    when parsing from 1024KiB to 4096KiB to match the maximum up-rounding
    that could be done in qemu.
    
    (I had earlier thought to fix this by up-rounding <memory> in the
    dumpxml that's put into the saved image, but that wouldn't have fixed
    the case where the save image was produced by an "unfixed"
    libvirtd.)
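
The rounding behavior the commit message describes can be sketched numerically. This is a hypothetical illustration, not libvirt code; the granularities (1MiB and 4MiB) and the old/new parser fudge factors (1024KiB and 4096KiB) are the ones named in the commit message above:

```python
def round_up(value_kib, granularity_kib):
    """Round value_kib up to the next multiple of granularity_kib."""
    return -(-value_kib // granularity_kib) * granularity_kib

configured = 4000000  # KiB, as set in <memory> and <currentMemory>

# qemu may report cur_balloon rounded up to 1MiB or 4MiB:
print(round_up(configured, 1024))  # 4000768 (qemu-1.0, Fedora 16)
print(round_up(configured, 4096))  # 4001792 (RHEL6 qemu-kvm)

# The old parser check allowed <currentMemory> up to the next 1024KiB
# boundary above <memory>; the patch widens that to 4096KiB:
old_limit = round_up(configured, 1024)
new_limit = round_up(configured, 4096)
print(4001792 <= old_limit)  # False: the reported "exceeds maximum" error
print(4001792 <= new_limit)  # True: accepted after the patch
```

This matches the values in the error message: "4001792k" is 4000000KiB rounded up to the next 4MiB, while the old limit "4000768k" is only the next 1024KiB boundary.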

Comment 5 hongming 2012-11-21 02:57:18 UTC
Verified as follows. The result is as expected. Moving its status to VERIFIED.

# rpm -q libvirt qemu-kvm
libvirt-0.10.2-9.el6.x86_64
qemu-kvm-0.12.1.2-2.334.el6.x86_64

# virsh edit rhel6
.....
  <memory unit='KiB'>2000000</memory>
  <currentMemory unit='KiB'>2000000</currentMemory>
.....

Domain rhel6 XML configuration edited.

# virsh start rhel6
Domain rhel6 started

# virsh managedsave rhel6 

Domain rhel6 state saved by libvirt

# virsh start rhel6
Domain rhel6 started

# virsh dumpxml rhel6
<domain type='kvm' id='36'>
.....
  <memory unit='KiB'>2000896</memory>
  <currentMemory unit='KiB'>2000000</currentMemory>
.....
</domain>
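
The `<memory>` value in the dump above is consistent with a 1024KiB round-up of the configured 2000000KiB. A quick arithmetic check (an illustration, not libvirt code):

```python
configured = 2000000   # KiB, as set in <memory>/<currentMemory>
granularity = 1024     # KiB

# Round up to the next 1024KiB boundary:
rounded = -(-configured // granularity) * granularity
print(rounded)  # 2000896, matching <memory> in the dumpxml above
```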

Comment 6 errata-xmlrpc 2013-02-21 07:11:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0276.html