Bug 316371

Summary: 32-bit PAE HV hardware limitation > 4GB memory
Product: Red Hat Enterprise Linux 5
Reporter: Bhavna Sarathy <bnagendr>
Component: kernel-xen
Assignee: Bhavna Sarathy <bnagendr>
Status: CLOSED ERRATA
QA Contact: Martin Jenner <mjenner>
Severity: medium
Priority: high
Version: 5.1
CC: bburns, bstein, ddomingo, frank.arnold, poelstra, rdoty, thomas.woller, xen-maint
Target Milestone: ---
Keywords: OtherQA
Target Release: ---
Hardware: All
OS: Linux
Fixed In Version: RHBA-2008-0314
Doc Type: Bug Fix
Last Closed: 2008-05-21 14:57:00 UTC
Bug Blocks: 222082, 247190, 253746, 391221
Attachments (description, flags):
- patch that overcomes the 4GB guest PAE limitation (flags: none)
- PAE sanitize e820 below 4GB (flags: none)
- Fail attempts to add pages to guest pseudophys memory map above 4GB when running with AMD NPT on PAE host (flags: none)

Description Bhavna Sarathy 2007-10-03 02:31:31 UTC
Description of problem:
In the 32-bit PAE hypervisor, nested paging can only translate 32-bit guest
virtual addresses. If the guest is PAE with >4GB of memory, its page table
entries (virtual addresses) will index the >4GB space. That will crash the
guest because of the wrong translation. Keir Fraser will fix this in 3.2+ and
indicated that we can disclose it in the User Manual. A patch was pushed into
SuSE to fix the problem. We want to know Red Hat's position on this issue.

Anyway, it is not a hardware/software bug. It is just a hardware limitation
that needs special attention.

Comment 1 Bhavna Sarathy 2007-10-03 18:31:51 UTC
Release notes for RHEL5.1: 

RHEL 5.1 supports Rapid Virtualization Indexing (RVI) in 64-bit, 32-bit and
32-bit PAE kernels. There is a hardware limitation for the 32-bit PAE hypervisor
wherein RVI can only translate 32-bit guest virtual addresses. If a guest is
running a PAE kernel with >3840 MB of memory, a wrong address translation will
result, which can crash the guest.  Users are advised to use the 64-bit kernel if
they want to run guests with more than 4GB of physical memory under RVI.

End of release notes
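
The 3840 MB threshold in the note, rather than 4096 MB, is consistent with a guest layout that reserves the top 256 MB below 4GB for MMIO and relocates any remaining RAM above the 4GB boundary. A minimal Python sketch of that arithmetic, assuming such a 256 MB hole (the constant and function names are illustrative and not taken from the Xen source):

```python
MB = 1024 * 1024
FOUR_GB = 4096 * MB

# Assumption: the top 256 MB below 4 GB is reserved for MMIO, so guest RAM
# below 4 GB ends at 3840 MB (0xF0000000).
MMIO_HOLE = 256 * MB
LOW_RAM_END = FOUR_GB - MMIO_HOLE


def split_guest_ram(mem_mb):
    """Return (bytes placed below 4 GB, bytes relocated above 4 GB)."""
    total = mem_mb * MB
    low = min(total, LOW_RAM_END)
    high = total - low
    return low, high


def needs_addresses_above_4gb(mem_mb):
    """True if any guest RAM lands at guest-physical addresses >= 4 GB."""
    return split_guest_ram(mem_mb)[1] > 0
```

Under this assumption a 3840 MB guest stays entirely below 4GB, while a 3841 MB guest already has one megabyte placed above it.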

The patch that works in xen-unstable and 3.1.1 is having trouble working in the
RHEL5.1 Xen code base.

Comment 2 Bhavna Sarathy 2007-10-03 19:04:53 UTC
Let's use Nested paging in the release notes as RVI is marketing jargon and
technical folks may not have heard of it.  

Comment 3 Bhavna Sarathy 2007-10-23 18:14:14 UTC
Brian, does 5.1 contain the release notes?  This is still being debugged and can
be moved to R5.2, but the release notes are for 5.1.

Comment 4 Don Domingo 2007-10-24 02:35:09 UTC
added to RHEL5.1 release notes updates:

<quote>
Rapid Virtualization Indexing (RVI) is now supported on 64-bit, 32-bit, and
32-bit PAE kernels. However, RVI can only translate 32-bit guest virtual
addresses on the 32-bit PAE hypervisor.

As such, if a guest is running a PAE kernel with more than 3840MB of RAM, a
wrong address translation error will occur. This can crash the guest.

It is recommended that you use the 64-bit kernel if you intend to run guests
with more than 4GB of physical RAM under RVI.
</quote>

please advise if any further revisions are required. thanks!

Comment 5 Daniel Berrangé 2007-10-24 13:53:06 UTC
We definitely need to get the fix for this issue from Keir / Xen 3.2.0 into the
5.2 tree.  Without this fix we can't enable NPT by default. If someone has the
patch please attach it to this ticket.


Comment 6 Bhavna Sarathy 2007-10-25 19:52:00 UTC
Created attachment 237931 [details]
patch that overcomes the 4GB guest PAE limitation 

This patch works on HV versions 3.0.4 and above.   It doesn't work very
well with the 3.1 RHEL5.1 code base; I'm not sure yet whether the 3.1.1 upgrade
made a difference.   Comments regarding the patch are very welcome.

Comment 7 Bhavna Sarathy 2007-10-25 20:34:35 UTC
The patch basically removes e820 map entries beyond the 4GB boundary, so guests
will not see >4GB of physical space.   Is this acceptable and does this work with 3.1.1?

Comment 8 Chris Lalancette 2007-10-25 20:52:20 UTC
Hm.  Unfortunately this makes life pretty poor for users; either they have to
choose HAP, and have all of their guests truncated to 4GB, or they have to
not use HAP to get larger guests.  If it's a bug in the silicon, then there is
nothing  we can do about it; however, it would be nice if we could make life a
little bit better for users.  At one point I asked Tom if it was possible to
make HAP a per-guest configuration option; he seemed to think it was possible,
but didn't really have a use.  This problem with the silicon seems to argue for
that use; if you make HAP a per-guest option, then you can enable HAP by
default, and then let the users choose on a guest-by-guest basis whether they
want > 4GB or HAP in that guest.  Also, we shouldn't silently truncate the guest
map; at the very least we should have some sort of printk() saying as much.

Chris Lalancette

Comment 9 Bhavna Sarathy 2007-10-26 13:32:13 UTC
Created attachment 239131 [details]
PAE sanitize e820 below 4GB

Comment 10 Bhavna Sarathy 2007-10-26 13:36:32 UTC
The attached file is the patch for fixing >4GB issue under Xen. It is
directly applicable on RHEL 5.1 tree Snapshot 52. The following testing has been
done:

1. HAP ON
- bigsmp linux kernel with 5000MB memory
** boot well; guest saw ~3800MB physical memory (/proc/meminfo)

- bigsmp linux kernel with 3500MB memory
** boot well; guest saw 3500MB physical memory

- WinXP with 5000MB physical memory
** boot well; windows saw ~3800MB physical memory

- WinXP with 3000MB physical memory
** boot well; windows XP saw 3000MB physical memory

2. HAP OFF
- bigsmp linux kernel with 5000MB memory
** boot well; guest saw 5000MB physical memory

- bigsmp linux kernel with 3500MB memory
** boot well; guest saw 3500MB physical memory

- WinXP with 5000MB physical memory
** boot well; windows saw ~3800MB physical memory

- WinXP with 3000MB physical memory
** boot well; windows XP saw 3000MB physical memory

In summary, it works as expected.

So far, removing entries in e820 is the best solution we can think of
for a 3.1 HV.  Another possibility is to disable the PAE bit in the guest CPUID,
but this approach had issues with the bigsmp Linux guest.
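
The e820 approach described in this comment can be sketched as follows. This is a Python illustration of the idea only, not the attached patch: the `(start, size, type)` tuple layout and the function name are assumptions made for the example.

```python
FOUR_GB = 1 << 32


def sanitize_e820_below_4gb(entries):
    """Drop or truncate e820 entries so that nothing extends past 4 GB.

    `entries` is a list of (start, size, type) tuples with byte addresses.
    Entries that start at or above 4 GB are removed; an entry that straddles
    the boundary is shortened so that it ends exactly at 4 GB.
    """
    result = []
    for start, size, kind in entries:
        if start >= FOUR_GB:
            continue  # entirely above 4 GB: hide it from the guest
        end = min(start + size, FOUR_GB)
        result.append((start, end - start, kind))
    return result
```

For a guest whose RAM is split into a region below 4GB and a remainder above it, only the low region survives, which is in line with the 5000MB guests in the test list reporting roughly 3800MB with HAP on.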



Comment 11 Bhavna Sarathy 2007-10-26 13:39:12 UTC
(In reply to comment #8)
> Hm.  Unfortunately this makes life pretty poor for users; either they have to
> choose HAP, and have all of their guests truncated to 4GB, or they have to
> not use HAP to get larger guests.  If it's a bug in the silicon, then there is
> nothing  we can do about it; however, it would be nice if we could make life a
> little bit better for users.  At one point I asked Tom if it was possible to
> make HAP a per-guest configuration option; he seemed to think it was possible,
> but didn't really have a use.  This problem with the silicon seems to argue for
> that use; if you make HAP a per-guest option, then you can enable HAP by
> default, and then let the users choose on a guest-by-guest basis whether they
> want > 4GB or HAP in that guest.  Also, we shouldn't silently truncate the guest
> map; at the very least we should have some sort of printk() saying as much.
> 
> Chris Lalancette

NP is not per guest, we thought of it in the very beginning and suggested it to
Keir at XS.  But the per guest idea was killed by Keir.  He indicated that NP
would be enabled by default (which is the case right now) since it had many
advantages over shadow paging.  So he felt there was no need for a per-guest
configuration.

Comment 12 Russell Doty 2007-10-26 13:57:19 UTC
Should we also take the approach that the system will automatically change the
guest memory allocation down to 4GB and notify the user that this has been done?

Comment 13 Chris Lalancette 2007-10-26 14:53:51 UTC
(In reply to comment #11)
> NP is not per guest, we thought of it in the very beginning and suggested it to
> Keir at XS.  But the per guest idea was killed by Keir.  He indicated that NP
> would be enabled by default (which is the case right now) since it had many
> advantages over shadow paging.  So he felt there was no need for a per-guest
> configuration.

Understood, but at the time, Keir probably wasn't aware of this particular
limitation.  I think it makes sense to:

1)  Try to do per-guest HAP upstream.  That is, default to whatever is specified
on the HV command-line, but let guests override it.  That way customers can do
per-guest either HAP or 4GB on 32-bit.

2)  If upstream won't accept per-guest HAP, then fail domain creation with >
4GB, with an error message for the user.  That way we won't get support calls
saying "I gave my guest 6GB, but it only came up with 4GB".

Chris Lalancette

Comment 14 Bhavna Sarathy 2007-10-26 16:30:59 UTC
Keir should be aware of the silicon limitation.  Since customers can use the
64-bit hypervisor to run their >4GB PAE guests, it's not as big a deal.  Feel
free to talk to him to see if he is aware of the limitation and if he has
changed his mind.

I posted the email to virtuallist as you suggested; unfortunately it's showing
up mangled.   Are you representing the Red Hat virt team view?

Comment 15 Bhavna Sarathy 2007-11-01 15:38:23 UTC
I want to sum up all the discussions both internal and external.

Keir Fraser has said that NP is enabled by default in 64-bit and 32-bit 
HVs but hesitates to enable NP in 32-bit PAE as it reduces functionality.

AMD has decided that since this is a design limitation with 32-bit PAE and
customers would naturally want to use > 4GB memory with PAE, NP can be
disabled by default.  Customers will have the option of enabling it.

We have submitted patches to xen-unstable that do not create >4GB guests, which
Bill and Chris have already had a chance to review.   Keir did not want to add a
descriptive printk such as

"Guest creation failed while using hardware assisted paging, please ensure
 your guest physical memory is below 4GB, or switch over to 64-bit HV".

Red Hat will have to add the verbose printk and carry the patch.

Patches in xen-unstable: 
http://xenbits.xensource.com/xen-unstable.hg?rev/c7d5d229f191

http://xenbits.xensource.com/xen-unstable.hg?rev/2717128cbdd1

Bill, since you are working on rebasing to the bugfix release 3.1.2,
could you incorporate these patches as well?

This is a build fix that Keir put in:
http://xenbits.xensource.com/xen-unstable.hg?rev/e2d76fb12ae2

Please evaluate whether this is only a xen-unstable build fix or whether we
will need it in R5.2.


Comment 16 Daniel Berrangé 2007-11-01 15:49:39 UTC
I agree with Keir that adding a hypervisor printk is not much use, as this is
invisible to the end user.

Is there a way for us to determine from userspace that HAP is enabled?  If so,
then we should add a check in XenD for HAP and > 4 GB PAE guest. This would
allow XenD to return a nice error message directly to the application, which
would immediately be seen by the user.
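
A userspace check of the kind proposed here could look roughly like the Python sketch below. Everything in it is hypothetical: `check_guest_memory`, `GuestConfigError` and the two boolean inputs are illustrative names, and how XenD would actually learn whether HAP is enabled is exactly the open question in this comment.

```python
class GuestConfigError(Exception):
    """Raised when a guest configuration cannot work on this host."""


# The thread speaks of "> 4 GB"; the release note text uses 3840 MB instead.
DEFAULT_LIMIT_MB = 4096


def check_guest_memory(mem_mb, hap_enabled, host_is_32bit_pae,
                       limit_mb=DEFAULT_LIMIT_MB):
    """Reject guests that are too large for HAP on a 32-bit PAE hypervisor.

    Returns None when the configuration is acceptable and raises
    GuestConfigError with a user-visible message otherwise.
    """
    if hap_enabled and host_is_32bit_pae and mem_mb > limit_mb:
        raise GuestConfigError(
            "Guest creation failed while using hardware assisted paging: "
            "requested %d MB, but guest memory must not exceed %d MB on a "
            "32-bit PAE hypervisor. Reduce the guest memory, disable HAP, "
            "or switch to the 64-bit hypervisor." % (mem_mb, limit_mb)
        )
```

Raising an exception before domain creation starts is one way to make the failure visible to the application, instead of leaving it in the hypervisor log.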


Comment 17 Chris Lalancette 2007-11-01 15:55:40 UTC
Dan,
     Agreed that a user-space print is the absolute best way to go, if there is
a way to do it.  If not, a hypervisor printk can be picked up by "xm dmesg",
which at the very least allows support to review the logs and make
recommendations without bothering Engineering.

Chris Lalancette

Comment 18 Bhavna Sarathy 2007-11-02 14:39:47 UTC
Bill will incorporate the patches into the rebase or submit as discussed.

Comment 19 Bhavna Sarathy 2007-12-05 21:30:47 UTC
Created attachment 278811 [details]
Fail attempts to add pages to guest pseudophys memory map above
4GB when running with AMD NPT on PAE host

Changeset 16279 has the fix that we would want in R5.2.  Nested Paging will be
enabled by default for 64-bit.  If users choose to use NP as the default then
we want this patch to prevent a guest crash.
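
The behaviour named in the attachment title, failing attempts to add pages above 4GB, amounts to a bounds check on the guest frame number. The following Python sketch shows the idea under stated assumptions: 4 KB pages, and a hypothetical `PseudophysMap` class standing in for the hypervisor's p2m code, which is written in C and is not reproduced here.

```python
PAGE_SHIFT = 12  # assumption: 4 KB pages
FIRST_GFN_AT_4GB = (1 << 32) >> PAGE_SHIFT  # first guest frame at 4 GB


class PseudophysMap:
    """Toy guest-frame to machine-frame map with the 4 GB guard."""

    def __init__(self, hap_on_pae_host):
        self.hap_on_pae_host = hap_on_pae_host
        self.entries = {}

    def add_page(self, gfn, mfn):
        """Map guest frame `gfn` to machine frame `mfn`.

        Returns False without changing the map when the frame lies at or
        above 4 GB and HAP is in use on a 32-bit PAE host.
        """
        if self.hap_on_pae_host and gfn >= FIRST_GFN_AT_4GB:
            return False
        self.entries[gfn] = mfn
        return True
```

Refusing the page at population time means the domain build fails cleanly instead of the guest crashing later on a wrong translation.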

Comment 21 Don Zickus 2008-01-23 22:07:26 UTC
in 2.6.18-73.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 22 Don Domingo 2008-01-23 23:51:13 UTC
added to RHEL5.2 release notes under "Resolved Issues":

<quote>
A wrong address translation (which can lead to a crashed guest) no longer occurs
if a guest is running a PAE kernel with more than 3,840MB of RAM. As such, you
no longer need to use the 64-bit kernel if you intend to run guests with more
than 4GB of physical RAM under Rapid Virtualization Indexing (RVI).
</quote>
please advise if any revisions are required. thanks!

Comment 24 Tom Woller 2008-03-13 13:42:03 UTC
one note. unstable and 3.2.1 base now has "per domain HAP support", (although 
we are having issues with >4Gig guests for shadow paging).  we are NOT asking 
for a backport, :) just fyi.

Comment 25 John Poelstra 2008-03-21 03:54:25 UTC
Greetings Red Hat Partner,

A fix for this issue should be included in the latest packages contained in
RHEL5.2-Snapshot1--available now on partners.redhat.com.  

Please test and confirm that your issue is fixed.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to ASSIGNED.

If you are receiving this message in Issue Tracker, please reply with a message
to Issue Tracker about your results and I will update bugzilla for you.  If you
need assistance accessing ftp://partners.redhat.com, please contact your Partner
Manager.

Thank you

Comment 26 Don Domingo 2008-04-02 02:12:54 UTC
Hi,
the RHEL5.2 release notes will be dropped to translation on April 15, 2008, at
which point no further additions or revisions will be entertained.

a mockup of the RHEL5.2 release notes can be viewed at the following link:
http://intranet.corp.redhat.com/ic/intranet/RHEL5u2relnotesmockup.html

please use the aforementioned link to verify if your bugzilla is already in the
release notes (if it needs to be). each item in the release notes contains a
link to its original bug; as such, you can search through the release notes by
bug number.

Cheers,
Don

Comment 27 John Poelstra 2008-04-02 21:36:30 UTC
Greetings Red Hat Partner,

A fix for this issue should be included in the latest packages contained in
RHEL5.2-Snapshot3--available now on partners.redhat.com.  



Comment 28 John Poelstra 2008-04-09 22:43:10 UTC
Greetings Red Hat Partner,

A fix for this issue should be included in the latest packages contained in
RHEL5.2-Snapshot4--available now on partners.redhat.com.  



Comment 29 Frank Arnold 2008-04-14 18:47:16 UTC
Trying to start a guest with hap enabled and giving it more than 4GB of memory
fails, as expected.

xm create fails and prints...
Error: (1, 'Internal error', 'Could not allocate memory for HVM guest.\n (16 =
Device or resource busy)')

xm dmesg log states...
(XEN) p2m.c:675: Dom1 failed to populate memory beyond 4GB: remove \047hap\047
Xen boot parameter.
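
The `\047` sequences in the log line above read as octal escapes for the apostrophe (octal 047 is ASCII 39). A small Python sketch for decoding such escapes and picking the failing domain ID out of saved `xm dmesg` text; the helper names are made up for this example and the message pattern is copied from the log line quoted in this comment:

```python
import re

OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")
POPULATE_FAILURE = re.compile(r"Dom(\d+) failed to populate memory beyond 4GB")


def decode_octal_escapes(text):
    """Replace backslash-plus-three-octal-digit sequences with the character."""
    return OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), text)


def find_populate_failures(dmesg_text):
    """Return the domain IDs that hit the 4GB populate failure, in order."""
    return [int(m.group(1)) for m in POPULATE_FAILURE.finditer(dmesg_text)]
```

Applied to the message above, the decoded text ends with "remove 'hap' Xen boot parameter." and the failing domain ID is 1.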

Comment 30 Don Domingo 2008-04-15 01:22:12 UTC
as per Comment#29, please advise if we need to retract "Issue Resolved" release
note for this bug (quoted in Comment#22). Deadline for RHEL5.2 release notes is
close of business hours today. thanks!

Comment 31 Bhavna Sarathy 2008-04-15 01:31:54 UTC
Please retract, no need for release notes.

Comment 32 Don Domingo 2008-04-15 02:08:24 UTC
thanks Bhavna. reinstating old "Known Issue" text, appearing in RHEL5.2 under
Feature Updates => Virtualization => Known Issues:

<quote>
Rapid Virtualization Indexing (RVI) is supported on 64-bit, 32-bit, and
32-bit PAE kernels. However, RVI can only translate 32-bit guest virtual
addresses on the 32-bit PAE hypervisor.

As such, if a guest is running a PAE kernel with more than 3840MB of RAM, a
wrong address translation error will occur. This can crash the guest.

It is recommended that you use the 64-bit kernel if you intend to run guests
with more than 4GB of physical RAM under RVI.
</quote>

please advise if any further revisions are required. thanks!

Comment 33 Bhavna Sarathy 2008-04-16 13:57:55 UTC
Since we have sorted out the 4GB PAE issue and fixed the crash that's mentioned
above, this should not be under "Known Issues".  The 4GB PAE is a hardware
limitation and is not a bug.  If you must add release notes, then this would be
more appropriate.

<quote>
Rapid Virtualization Indexing (RVI) is supported on 64-bit, 32-bit, and
32-bit PAE kernels. However, RVI can only translate 32-bit guest virtual
addresses on the 32-bit PAE hypervisor.

As such, if a guest is running a PAE kernel with more than 3840MB of RAM, 
the host will print out an error message "Dom x failed to populate memory 
beyond 4GB: remove hap Xen boot parameter."

It is recommended that you use the 64-bit kernel if you intend to run guests
with more than 4GB of physical RAM under RVI.
</quote>



Comment 34 Don Domingo 2008-04-16 22:26:48 UTC
thanks Bhavna, that clears it up for me. as such, i'm removing this from the
release notes; i intend to push this to a kbase instead. 

clearing all RHEL5.2 relnotes flags. 

Comment 36 errata-xmlrpc 2008-05-21 14:57:00 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html