+++ This bug was initially created as a clone of Bug #622413; see also bug #504278 +++

Created attachment 437556 [details]
rhel5u5-ia64-pv

Description of problem:
On the ia64 platform, a rhel5.5 PV guest will log error messages in xend.log when its maxmem > memory. "xm save" can still complete successfully despite the errors.

Version-Release number of selected component (if applicable):
xen-3.0.3-86.el5
kernel-xen-2.6.18-211.el5

How reproducible:
Always

Steps to Reproduce:
1. Edit a rhel5.5 PV guest config to set maxmem > memory
2. Create the rhel5.5 PV guest:
   [host]# xm create rhel5u5-ia64-pv
3. Do "xm save" for this rhel5.5 PV guest:
   [host]# xm save rhel5u5-ia64-pv save_rhel

xend.log gets messages like this:

INFO (XendCheckpoint:375) cannot map mfn page 20000 gpfn 20000: Invalid argument

and 'xm dmesg' has corresponding messages like this:

(XEN) /builddir/build/BUILD/kernel-2.6.18/xen/include/asm/mm.h:180:d0 Error pfn 7c0fa: rd=f000000007c58080, od=0000000000000000, caf=0000000000000000, taf=0000000000000000

You get more or fewer of those messages depending on the 'memory' parameter (one message per unmapped page). For example:

memory=256  -> 9 messages
memory=512  -> 17 messages
memory=1024 -> 33 messages
memory=2048 -> 65 messages

Furthermore, the first mfn of the series of unmapped pages changes with 'memory':

memory=256  -> 4000
memory=512  -> 8000
memory=1024 -> 10000
memory=2048 -> 20000

So there's a hole in the page map that gets created when 'memory' < 'maxmem', and its size and location depend on the value of 'memory'. Still digging to try to find out why.
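Those first-mfn values line up with ia64's 16 KiB page size. A quick standalone sanity check of the arithmetic (illustrative only; this is not Xen or xend code, just the observed relationship between 'memory' and the first unmapped gpfn):

#include <stdio.h>

/* On ia64 the guest page size is 16 KiB (2^14). If the populated part of
 * the guest ends after 'memory' MiB, the first unmapped gpfn should be
 * memory_in_bytes / 16384; compare with the gpfns reported in xend.log. */
int main(void)
{
    unsigned long mem_mib[] = { 256, 512, 1024, 2048 };
    unsigned long page_size = 1UL << 14;   /* 16 KiB */

    for (unsigned i = 0; i < sizeof(mem_mib) / sizeof(mem_mib[0]); i++) {
        unsigned long first_hole_gpfn = (mem_mib[i] << 20) / page_size;
        printf("memory=%lu -> first unmapped gpfn %lx\n",
               mem_mib[i], first_hole_gpfn);
    }
    return 0;
}

This prints 4000, 8000, 10000 and 20000 (hex), matching the table above.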
--- Additional comment from drjones on 2010-08-13 04:08:17 EDT ---

(In reply to comment #10)
> Today I try rhel5.5 HVM guest on ia64 platform with xen 115 build, "xm save"
> still fail even if maxmem=memory.

I'm not sure if hvm guests have ever worked for save+restore on ia64, or if it's even supposed to be supported. We could look for old bugs or try again with older versions to see if that's a regression, but if so, that's a different bug. This bug will focus on the PV save+restore.

--- Additional comment from drjones on 2010-08-13 09:43:54 EDT ---

A little more data to end the week on.

--------------------------------------------------------------------------
maxmem=512, memory=256

(XEN) pte present from 3bf8 (3bf8) to 4009. 411 pages.
(XEN) invalid mfns from 4009 to 8000. 3ff7 pages.
(XEN) pte present total = 00003fcb
(XEN) invalid mfns total = 00004035
(XEN) total = 00008000 (total pages = 00008000)

and have these xend logs:

[2010-08-13 09:21:13 xend 2928] INFO (XendCheckpoint:375) cannot map mfn page 4000 gpfn 4000: Invalid argument
[2010-08-13 09:21:13 xend 2928] INFO (XendCheckpoint:375) cannot map mfn page 4001 gpfn 4001: Invalid argument
[2010-08-13 09:21:13 xend 2928] INFO (XendCheckpoint:375) cannot map mfn page 4002 gpfn 4002: Invalid argument
[2010-08-13 09:21:13 xend 2928] INFO (XendCheckpoint:375) cannot map mfn page 4003 gpfn 4003: Invalid argument
[2010-08-13 09:21:13 xend 2928] INFO (XendCheckpoint:375) cannot map mfn page 4004 gpfn 4004: Invalid argument
[2010-08-13 09:21:13 xend 2928] INFO (XendCheckpoint:375) cannot map mfn page 4005 gpfn 4005: Invalid argument
[2010-08-13 09:21:13 xend 2928] INFO (XendCheckpoint:375) cannot map mfn page 4006 gpfn 4006: Invalid argument
[2010-08-13 09:21:13 xend 2928] INFO (XendCheckpoint:375) cannot map mfn page 4007 gpfn 4007: Invalid argument
[2010-08-13 09:21:13 xend 2928] INFO (XendCheckpoint:375) cannot map mfn page 4008 gpfn 4008: Invalid argument

--------------------------------------------------------------------------
maxmem=256 == memory=256

(XEN) pte present from 3bf8 (3bf8) to 4000. 408 pages.
(XEN) pte present total = 00003f9f
(XEN) invalid mfns total = 00000061
(XEN) total = 00004000 (total pages = 00004000)

No xend logs.

--------------------------------------------------------------------------

So everything would work if the invalid mfns started at 4000, because there's already code that handles that in libxc. So the question is why do they start at 4009?

Note, the "memory=256 -> 4000" is clear to me now. Before I was thinking 4k pages and it didn't make as much sense, but the ia64 is using 16k pages (0x4000 * 16k = 256M).

--- Additional comment from drjones on 2010-08-13 10:32:47 EDT ---

This is probably due to the page directory. The number of "problematic" pages relative to the value of the 'memory' var adds up:

PAGE_SHIFT = 14
PTRS_PER_PGD = (1 << (PAGE_SHIFT - 3)) = 2k
2k * 16k = 32M
256M / 32M = 8

Still need to figure out why it's "problematic".
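A standalone check of that arithmetic against the observed message counts (illustrative only, not Xen code; the page-size and entries-per-page figures are the ones quoted above). Each page of page-table entries covers 2048 * 16 KiB = 32 MiB, and 'memory' / 32M plus one matches the 9/17/33/65 "cannot map" messages seen earlier:

#include <stdio.h>

/* PAGE_SHIFT = 14 on ia64, so PTRS_PER_PGD = 1 << (14 - 3) = 2048 entries,
 * and one page of entries spans 2048 * 16 KiB = 32 MiB of guest memory. */
int main(void)
{
    unsigned long page_shift = 14;
    unsigned long page_size = 1UL << page_shift;             /* 16 KiB */
    unsigned long ptrs_per_page = 1UL << (page_shift - 3);   /* 2048   */
    unsigned long span = ptrs_per_page * page_size;          /* 32 MiB */

    unsigned long mem_mib[] = { 256, 512, 1024, 2048 };
    for (unsigned i = 0; i < sizeof(mem_mib) / sizeof(mem_mib[0]); i++) {
        unsigned long chunks = (mem_mib[i] << 20) / span;
        /* the observed message counts were chunks + 1: 9, 17, 33, 65 */
        printf("memory=%lu -> %lu chunks of 32M, chunks+1 = %lu\n",
               mem_mib[i], chunks, chunks + 1);
    }
    return 0;
}

So the number of problematic pages past the end of 'memory' tracks the number of 32 MiB page-table chunks needed to cover 'memory', plus one, which is at least consistent with the page-directory theory above.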
--- Additional comment from bburns on 2010-08-31 09:08:48 EDT ---

Moved to consider for 5.7.

While this was an interesting problem that I'd like to get to the bottom of, we haven't seen any reports of it from outside of Red Hat QE, likely because as long as memory == maxmem there isn't a problem. I'm closing this as WONTFIX as part of our effort to eliminate low-priority issues. It can be reopened in the event the priority is increased.