+++ This bug was initially created as a clone of Bug #622413; see also bug #504278 +++

Created attachment 437556 [details]
rhel5u5-ia64-pv

Description of problem:
On the ia64 platform, a rhel5.5 PV guest will log error messages in xend.log when its maxmem > memory. "xm save" can still complete successfully despite the errors.

Version-Release number of selected component (if applicable):
xen-3.0.3-86.el5
kernel-xen-2.6.18-211.el5

How reproducible:
Always

Steps to Reproduce:
1. Edit a rhel5.5 PV guest config to set maxmem > memory
2. Create the rhel5.5 PV guest:
   [host]# xm create rhel5u5-ia64-pv
3. Do "xm save" for this rhel5.5 PV guest:
   [host]# xm save rhel5u5-ia64-pv save_rhel

xend.log gets messages like this:

INFO (XendCheckpoint:375) cannot map mfn page 20000 gpfn 20000: Invalid argument

and 'xm dmesg' has corresponding messages like this:

(XEN) /builddir/build/BUILD/kernel-2.6.18/xen/include/asm/mm.h:180:d0 Error pfn 7c0fa: rd=f000000007c58080, od=0000000000000000, caf=0000000000000000, taf=0000000000000000

You get more or fewer of those messages depending on the 'memory' parameter (one message per unmapped page). For example:

memory=256  -> 9 messages
memory=512  -> 17 messages
memory=1024 -> 33 messages
memory=2048 -> 65 messages

Furthermore, the first mfn of the series of unmapped pages changes with 'memory':

memory=256  -> 4000
memory=512  -> 8000
memory=1024 -> 10000
memory=2048 -> 20000

So there's a hole in the page map that gets created when 'memory' < 'maxmem', and its size and location depend on the value of 'memory'. Still digging to try to find out why.
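Those first-mfn values line up with ia64's 16 KiB page size. A quick standalone sanity check of the arithmetic (illustrative only; this is not Xen or xend code, just the observed relationship between 'memory' and the first unmapped gpfn):

#include <stdio.h>

/* On ia64 the guest page size is 16 KiB (2^14). If the populated part of
 * the guest ends after 'memory' MiB, the first unmapped gpfn should be
 * memory_in_bytes / 16384; compare with the gpfns reported in xend.log. */
int main(void)
{
    unsigned long mem_mib[] = { 256, 512, 1024, 2048 };
    unsigned long page_size = 1UL << 14;   /* 16 KiB */

    for (unsigned i = 0; i < sizeof(mem_mib) / sizeof(mem_mib[0]); i++) {
        unsigned long first_hole_gpfn = (mem_mib[i] << 20) / page_size;
        printf("memory=%lu -> first unmapped gpfn %lx\n",
               mem_mib[i], first_hole_gpfn);
    }
    return 0;
}

This prints 4000, 8000, 10000 and 20000 (hex), matching the table above.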
--- Additional comment from drjones on 2010-08-13 04:08:17 EDT ---

(In reply to comment #10)
> Today I try rhel5.5 HVM guest on ia64 platform with xen 115 build, "xm save"
> still fail even if maxmem=memory.

I'm not sure if hvm guests have ever worked for save+restore on ia64, or if it's even supposed to be supported. We could look for old bugs or try again with older versions to see if that's a regression, but if so, that's a different bug. This bug will focus on the PV save+restore.

--- Additional comment from drjones on 2010-08-13 09:43:54 EDT ---

A little more data to end the week on.

--------------------------------------------------------------------------
maxmem=512, memory=256

(XEN) pte present from 3bf8 (3bf8) to 4009. 411 pages.
(XEN) invalid mfns from 4009 to 8000. 3ff7 pages.
(XEN) pte present total = 00003fcb
(XEN) invalid mfns total = 00004035
(XEN) total = 00008000 (total pages = 00008000)

and have these xend logs:

[2010-08-13 09:21:13 xend 2928] INFO (XendCheckpoint:375) cannot map mfn page 4000 gpfn 4000: Invalid argument
[2010-08-13 09:21:13 xend 2928] INFO (XendCheckpoint:375) cannot map mfn page 4001 gpfn 4001: Invalid argument
[2010-08-13 09:21:13 xend 2928] INFO (XendCheckpoint:375) cannot map mfn page 4002 gpfn 4002: Invalid argument
[2010-08-13 09:21:13 xend 2928] INFO (XendCheckpoint:375) cannot map mfn page 4003 gpfn 4003: Invalid argument
[2010-08-13 09:21:13 xend 2928] INFO (XendCheckpoint:375) cannot map mfn page 4004 gpfn 4004: Invalid argument
[2010-08-13 09:21:13 xend 2928] INFO (XendCheckpoint:375) cannot map mfn page 4005 gpfn 4005: Invalid argument
[2010-08-13 09:21:13 xend 2928] INFO (XendCheckpoint:375) cannot map mfn page 4006 gpfn 4006: Invalid argument
[2010-08-13 09:21:13 xend 2928] INFO (XendCheckpoint:375) cannot map mfn page 4007 gpfn 4007: Invalid argument
[2010-08-13 09:21:13 xend 2928] INFO (XendCheckpoint:375) cannot map mfn page 4008 gpfn 4008: Invalid argument

--------------------------------------------------------------------------
maxmem=256 == memory=256

(XEN) pte present from 3bf8 (3bf8) to 4000. 408 pages.
(XEN) pte present total = 00003f9f
(XEN) invalid mfns total = 00000061
(XEN) total = 00004000 (total pages = 00004000)

No xend logs.

--------------------------------------------------------------------------

So everything would work if the invalid mfns started at 4000, because there's already code that handles that in libxc. So the question is why do they start at 4009?

Note, the "memory=256 -> 4000" is clear to me now. Before I was thinking 4k pages and it didn't make as much sense, but the ia64 is using 16k pages (0x4000 * 16k = 256M).

--- Additional comment from drjones on 2010-08-13 10:32:47 EDT ---

This is probably due to the page directory. The number of "problematic" pages relative to the value of the 'memory' var adds up:

PAGE_SHIFT = 14
PTRS_PER_PGD = (1 << (PAGE_SHIFT - 3)) = 2k
2k * 16k = 32M
256M / 32M = 8

Still need to figure out why it's "problematic".
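A standalone check of that arithmetic against the observed message counts (illustrative only, not Xen code; the page-size and entries-per-page figures are the ones quoted above). Each page of page-table entries covers 2048 * 16 KiB = 32 MiB, and 'memory' / 32M plus one matches the 9/17/33/65 "cannot map" messages seen earlier:

#include <stdio.h>

/* PAGE_SHIFT = 14 on ia64, so PTRS_PER_PGD = 1 << (14 - 3) = 2048 entries,
 * and one page of entries spans 2048 * 16 KiB = 32 MiB of guest memory. */
int main(void)
{
    unsigned long page_shift = 14;
    unsigned long page_size = 1UL << page_shift;             /* 16 KiB */
    unsigned long ptrs_per_page = 1UL << (page_shift - 3);   /* 2048   */
    unsigned long span = ptrs_per_page * page_size;          /* 32 MiB */

    unsigned long mem_mib[] = { 256, 512, 1024, 2048 };
    for (unsigned i = 0; i < sizeof(mem_mib) / sizeof(mem_mib[0]); i++) {
        unsigned long chunks = (mem_mib[i] << 20) / span;
        /* the observed message counts were chunks + 1: 9, 17, 33, 65 */
        printf("memory=%lu -> %lu chunks of 32M, chunks+1 = %lu\n",
               mem_mib[i], chunks, chunks + 1);
    }
    return 0;
}

So the number of problematic pages past the end of 'memory' tracks the number of 32 MiB page-table chunks needed to cover 'memory', plus one, which is at least consistent with the page-directory theory above.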
--- Additional comment from bburns on 2010-08-31 09:08:48 EDT ---

Moved to consider for 5.7.

While this was an interesting problem that I'd like to get to the bottom of, we haven't seen any reports of it from outside of Red Hat QE, likely because as long as memory == maxmem there isn't a problem. I'm closing this as WONTFIX as part of our effort to eliminate low-priority issues. It can be reopened in the event the priority is increased.