Bug 433307

Summary: Xen restore tries to allocate "maxmem" if specified in config file
Product: Red Hat Enterprise Linux 5 Reporter: monbernard <michel.monbernard>
Component: xen Assignee: Xen Maintainance List <xen-maint>
Status: CLOSED NOTABUG QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: low    
Version: 5.1 CC: clalance, minovotn, mrezanin, pbonzini, xen-maint, yuzhang
Target Milestone: rc   
Target Release: ---   
Hardware: ia64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-10-20 12:38:04 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 514498    
Attachments:
Description                              Flags
xm save/restore commands and xend.log    none

Description monbernard 2008-02-18 15:30:04 UTC
Description of problem: 
Save/restore of a domain with both memory and maxmem specified in the config file.
Save is OK, but restore can fail if there is not "maxmem" worth of free memory on
the machine (restore tries to allocate "maxmem" instead of "memory" for the domain).

Version-Release number of selected component (if applicable):
RHEL5.1 (2.6.18-53.el5xen)

How reproducible:


Steps to Reproduce:
1. A machine with 4 GB of memory.
2. Create a domain vt11u2 with memory=1024 maxmem=4096.
3. Save domain vt11u2.
4. Restore domain vt11u2.
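For reference, step 2 corresponds to a guest configuration along the following lines. xm domain configuration files are parsed as Python, so the fragment is itself valid Python. Only the memory and maxmem values come from this report; the file location and the name line are assumptions, since the actual config file is not attached.

```python
# Hypothetical excerpt of the vt11u2 guest config (location assumed, e.g. /etc/xen/vt11u2).
# Only memory and maxmem are taken from the report; all other settings are omitted.
name = "vt11u2"
memory = 1024   # MiB given to the guest when it is created
maxmem = 4096   # MiB the guest is allowed to grow to
```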
  
Actual results:
restore fails with memory allocation error


Expected results:
restore should be OK

Additional info:
See attachment console-and-xend-log.txt.
We can see that restore tries to allocate 4 GB instead of 1 GB.

Best regards

Comment 1 monbernard 2008-02-18 15:30:04 UTC
Created attachment 295174
xm save/restore commands and xend.log

Comment 3 Chris Lalancette 2008-02-28 03:32:13 UTC
I seem to remember having fixed this bug a while back, but either the fix didn't
work completely or this is ia64-specific.  Do you only see this problem on ia64,
or does it happen on i386/x86_64 as well?

Chris Lalancette

Comment 4 monbernard 2008-02-29 09:11:54 UTC
I see this problem only on ia64.
It works fine on i386/x86_64.

Best regards

Comment 5 Chris Lalancette 2008-02-29 14:12:14 UTC
Ah, OK, great.  Just making sure; thanks for the info.

Chris Lalancette

Comment 6 RHEL Program Management 2008-06-02 20:18:03 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 12 Michal Novotny 2010-03-17 14:35:43 UTC
(In reply to comment #4)
> I see this problem only on ia64.
> It works fine on i386/x86_64.
> 
> Best regards    

There is special HVM build handling for the ia64 platform. The function called here in the ia64 case is "xc_ia64_save_to_nvram", which appears to be failing according to the xend.log attached to this BZ. This function is called from "domain_destroy_hook" in the Python code when a domain is shut down or destroyed. Since the ia64 platform requires much more memory in many cases (compared with the other supported platforms), it's likely that the 450 MiB allocated to dom0 is not enough and that ia64 needs considerably more to work properly. "xc_ia64_save_to_nvram" fails on the "xc_domain_getinfo" call, which retrieves information about the domains, so I think dom0 *does not* have enough memory to obtain it, most likely because of ia64's much higher memory requirements.

Michal

Comment 13 Michal Novotny 2010-03-17 14:40:27 UTC
Jes,
could you please have a look and comment on this? Am I right in comment #12? If so, feel free to close this BZ.

Michal

Comment 14 Jes Sorensen 2010-03-18 08:14:40 UTC
Michal,

I don't think I am the right one to help with this one. It's been nearly three years since I last tried to compile Xen, and back then there was no HVM support for ia64.

450MB on ia64 is really not much, and probably too small for any OS to boot up; however, I don't understand where you got that number from. I thought it was supposed to be booted with 1GB? And 1GB itself is very low for ia64.

Cheers,
Jes

Comment 15 Michal Novotny 2010-03-18 09:44:33 UTC
Jes,
thanks for your input. What do you mean there was no HVM support for ia64? No guest firmware? Xen HVM guests on ia64 now require the xen-ia64-guest-firmware package to be installed on the system in order to install HVM guests. This guest firmware is closed-source (by Intel), so no real modifications to it are possible, and anything that is a firmware issue is something we can't fix.

Well, since all the operating systems running on ia64 require an enormous amount of memory, I think 450 MiB is not the exact value; the exact value here is 434 MiB, which is really very little, and the minimum memory for dom0 should be increased to at least 1 GiB. The value of 434 MiB can be seen in attachment 295174:

[root@virtu11 ~]# xm list
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0      434     1 r-----  16043.5
manu                                       4     1023     1 -b----   2442.0

There is an option (dom0-min-mem) in the xend configuration file, namely /etc/xen/xend-config.sxp, which is set to 256 by default; the value is in MiB. Could you please try setting it to 1024 (MiB, i.e. 1 GiB), rebooting the system, trying again, and providing us with the test results?

Thanks,
Michal

Comment 16 Paolo Bonzini 2010-06-30 14:47:34 UTC
I'm not sure why you think this bug should be related to the amount of memory given to dom0.  The xend.log file is clear:

[2008-02-18 17:54:50 xend 3324] INFO (XendCheckpoint:351) Failed allocation for dom 9: 262144 extents of order 0
[2008-02-18 17:54:50 xend 3324] INFO (XendCheckpoint:351) ERROR Internal error: Failed to allocate memory for 4194304 KB to dom 9.

If you still don't believe that, you can lower maxmem progressively from 4096 and see when the domain starts correctly.  If it starts around 4096 - dom0-mem, then the bug subject is correct.  If it starts around 2048 or so, then the bug subject is incorrect and dom0 memory might be a clue.

(The test can be worthwhile in any case, because it can help in understanding the behavior of Xen when restoring succeeds but memory != maxmem.)
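The two xend.log lines quoted in this comment can be cross-checked with simple arithmetic. An extent of order 0 is a single page, so dividing the requested size by the extent count gives the page size, and the requested total can be compared with the guest's memory and maxmem. The 16 KB page size below is derived from the log itself, not from the Xen sources; the memory and maxmem values are the ones from the original report.

```python
# Numbers copied from the xend.log lines quoted above.
extents = 262144          # "262144 extents of order 0" (order 0 = one page per extent)
requested_kb = 4194304    # "Failed to allocate memory for 4194304 KB"

page_kb = requested_kb // extents      # page size implied by the log
requested_mib = requested_kb // 1024   # total request in MiB

# Guest settings from the report (memory=1024, maxmem=4096).
memory_mib, maxmem_mib = 1024, 4096

print(f"implied page size: {page_kb} KB")
print(f"requested: {requested_mib} MiB")
print("request equals maxmem:", requested_mib == maxmem_mib)
print("request equals memory:", requested_mib == memory_mib)
```

The request works out to exactly maxmem (4096 MiB), not memory (1024 MiB), which is what the bug summary claims.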

Comment 17 Michal Novotny 2010-07-01 10:33:13 UTC
(In reply to comment #16)
> I'm not sure why you think this bug should be related to the amount of memory
> given to dom0.  The xend.log file is clear:
> 
> [2008-02-18 17:54:50 xend 3324] INFO (XendCheckpoint:351) Failed allocation for
> dom 9: 262144 extents of order 0
> [2008-02-18 17:54:50 xend 3324] INFO (XendCheckpoint:351) ERROR Internal error:
> Failed to allocate memory for 4194304 KB to dom 9.
> 
> If you still don't believe that, you can lower maxmem progressively from 4096
> and see when the domain starts correctly.  If it starts around 4096 - dom0-mem,
> then the bug subject is correct.  If it starts around 2048 or so, then the bug
> subject is incorrect and dom0 memory might be a clue.
> 
> (The test can be worthwhile in any case, because it can help understanding the
> behavior of Xen when restoring succeeds but memory != maxmem).    

Well, this issue occurs only on ia64, and honestly I don't know what's going on on the ia64 platform, since I have had a hard time reserving an ia64 box. Nevertheless, I was thinking along similar lines, but according to comment 4 the issue occurs only on ia64 and not on other platforms, so I think this is a bug in some platform-specific code.

Michal

Comment 18 Miroslav Rezanina 2010-10-14 12:23:35 UTC
Checking the code shows that restore on the ia64 platform is handled in the following way:

1. Reserve maxmem pages.
2. Restore pages from the stored image.
3. Free unused pages.

That's the reason we see this problem. It is not a bug; it is the defined behavior on ia64. I asked upstream why it is handled this way.
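The three steps above explain the failure: the maxmem reservation in step 1 must succeed before anything is restored, so free host memory has to cover maxmem even though the guest only ends up using memory. The toy model below contrasts that with a strategy that allocates only what the saved image holds (roughly what the reporter expected). All class and function names are hypothetical, this is not Xen code, and the free-memory figure assumes a 4096 MiB host with 434 MiB held by dom0 (from the xm list output), ignoring hypervisor overhead and other guests.

```python
# Toy model of the restore strategies discussed in this bug.
# Names and numbers are illustrative assumptions, not Xen code.

class OutOfMemory(Exception):
    pass


class Host:
    def __init__(self, free_mib):
        self.free_mib = free_mib

    def allocate(self, mib):
        # Fail without changing state if the request does not fit.
        if mib > self.free_mib:
            raise OutOfMemory(f"cannot allocate {mib} MiB, only {self.free_mib} MiB free")
        self.free_mib -= mib

    def release(self, mib):
        self.free_mib += mib


def restore_ia64_style(host, image_mib, maxmem_mib):
    """Reserve maxmem up front, restore the image, then free the unused part."""
    host.allocate(maxmem_mib)              # step 1: reserve maxmem pages
    # step 2: restore pages from the stored image (not modelled here)
    host.release(maxmem_mib - image_mib)   # step 3: free unused pages
    return image_mib


def restore_on_demand(host, image_mib):
    """Allocate only what the saved image actually contains."""
    host.allocate(image_mib)
    return image_mib


# Assumed host: 4096 MiB total, 434 MiB held by dom0.
FREE = 4096 - 434

try:
    restore_ia64_style(Host(FREE), image_mib=1024, maxmem_mib=4096)
except OutOfMemory as err:
    print("ia64-style restore failed:", err)

used = restore_on_demand(Host(FREE), image_mib=1024)
print("on-demand restore used", used, "MiB")
```

With enough free memory both strategies end with the guest holding the same 1024 MiB; they differ only in the transient peak demand during restore.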

Comment 19 Miroslav Rezanina 2010-10-20 12:38:04 UTC
The described behavior is the expected one and is handled the same way upstream. We are not going to change it; closing this BZ.