Bug 504187 - RHEL3 FV guests with >4 GB of memory panic when running x86_64 kernel (em32t runs OK)
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xen
Version: 5.3
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 5.5
Assigned To: Xen Maintainance List
QA Contact: Virtualization Bugs
Depends On:
Blocks: 499522 489024 5.4/TechnicalNotes
Reported: 2009-06-04 12:45 EDT by Ian McLeod
Modified: 2010-10-23 05:58 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
The Red Hat Enterprise Linux 3 kernel does not include SWIOTLB support. SWIOTLB support is required for Red Hat Enterprise Linux 3 guests to support more than 4GB of memory on AMD Opteron and Athlon-64 processors. Consequently, Red Hat Enterprise Linux 3 guests are limited to 4GB of memory on AMD processors.
Last Closed: 2009-11-19 09:16:14 EST


Attachments
boot messages (8.27 KB, text/plain)
2009-06-04 12:56 EDT, Ian McLeod
/proc/cpuinfo when running the x86_64 (AMD) kernel (1.01 KB, text/plain)
2009-06-08 08:16 EDT, Ian McLeod
/proc/cpuinfo when running the em32t (Intel) kernel (1.04 KB, text/plain)
2009-06-08 08:17 EDT, Ian McLeod
lspci -vvv from the RHEL 5.3 DOM0 (40.68 KB, application/octet-stream)
2009-06-08 10:25 EDT, Ian McLeod

Description Ian McLeod 2009-06-04 12:45:12 EDT
Description of problem:

A RHEL3 FV x86_64 guest running on AMD hardware panics during boot when the guest is given more than ~4 GB of memory.  This does not happen if the underlying CPU is Intel.

The panic also does not occur if the Intel-specific 64 bit RHEL3 kernel is used inside of the FV guest.

The error seen at boot time when running the AMD RHEL3 kernel is:

Freeing unused kernel memory: 232k freed
INIT: version 2.85 booting
Kernel panic: pci_map_single: high address but no IOMMU.

Will attach the full boot messages as a file.


Version-Release number of selected component (if applicable):

5.3 Xen components
RHEL3 U9 with latest errata kernel in guest.

No paravirt drivers loaded at time of crash.
Comment 1 Ian McLeod 2009-06-04 12:56:02 EDT
Created attachment 346569 [details]
boot messages
Comment 4 Daniel Berrange 2009-06-05 10:41:27 EDT
One idea we had is that perhaps the AMD64 guest kernel is seeing a CPU flag it shouldn't, causing it to trip up later.  Could you capture /proc/cpuinfo from the guest when running the x86_64 kernel, and again when running the EM64T kernel, both while on the AMD x86_64 host? Please also attach the host's /proc/cpuinfo.
Comment 5 Ian McLeod 2009-06-08 08:16:54 EDT
Created attachment 346858 [details]
/proc/cpuinfo when running the x86_64 (AMD) kernel
Comment 6 Ian McLeod 2009-06-08 08:17:47 EDT
Created attachment 346859 [details]
/proc/cpuinfo when running the em32t (Intel) kernel
Comment 7 Ian McLeod 2009-06-08 08:18:25 EDT
Dan,

requested cpuinfo attached
Comment 8 Bhavna Sarathy 2009-06-08 10:00:58 EDT
What system was this issue seen on? From the panic it appears to be a non-IOMMU system. Please attach lspci output.
thx
Comment 9 Daniel Berrange 2009-06-08 10:16:50 EDT
The only difference in CPU flags between those two files was the extra 'sse3' flag, so that idea doesn't seem to be relevant here after all.
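For anyone repeating this comparison, here is a minimal sketch of diffing the "flags" lines of two saved /proc/cpuinfo captures. The function names are made up for illustration, and it assumes the captures are analyzed on a machine with Python 3 (not necessarily inside the RHEL3 guest). It only looks at the first "flags" line in each capture.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found


def flag_diff(text_a, text_b):
    """Return (flags only in a, flags only in b), each as a sorted list."""
    a, b = cpu_flags(text_a), cpu_flags(text_b)
    return sorted(a - b), sorted(b - a)
```

Reading the two attachments into strings and passing them to flag_diff() would list any flag present in only one of the captures.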
Comment 10 Ian McLeod 2009-06-08 10:25:58 EDT
Created attachment 346885 [details]
lspci -vvv from the RHEL 5.3 DOM0
Comment 11 Bhavna Sarathy 2009-06-19 13:04:20 EDT
This panic is caused by the agp-gart code in Linux kernel 2.4.24. It seems
that VMware has this issue as well as Xen. Please refer to
(http://kb.vmware.com/selfservice/viewContent.do?externalId=2269&sliceId=1)

Try iommu=off or iommu=soft on the guest kernel command line in GRUB. If neither works, update the guest kernel to one that enables swiotlb. To force swiotlb, the kernel must have been built with swiotlb support.
Comment 12 Bhavna Sarathy 2009-06-26 13:45:11 EDT
Any updates here??

Mount the RHEL3 image and install a 2.4.37 Linux kernel that has swiotlb support compiled in.  With the new kernel we were able to boot the guest with 5 GB of guest memory.  To check whether swiotlb is enabled, look for the following message in the guest's dmesg output:
"PCI-DMA: Using SWIOTLB"
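The dmesg check can be scripted. This is only a sketch: it assumes the guest's dmesg output has been saved to text and is inspected with Python 3 somewhere convenient, and the function name is hypothetical. The marker string is the one quoted above.

```python
SWIOTLB_MARKER = "PCI-DMA: Using SWIOTLB"  # message quoted in this comment


def swiotlb_enabled(dmesg_text):
    """Return True if any line of the saved dmesg output contains the marker."""
    return any(SWIOTLB_MARKER in line for line in dmesg_text.splitlines())
```

For example, save the output with "dmesg > dmesg.txt" in the guest, then call swiotlb_enabled(open("dmesg.txt").read()).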
Comment 14 Ian McLeod 2009-07-20 13:19:11 EDT
No combination of the suggested iommu options prevents the hang.  Putting a 2.4.37 kernel into a RHEL3 userspace is not a viable option in this case.
Comment 15 Ryan Lerch 2009-07-23 19:25:36 EDT
This Bug blocks the Technical Notes 5.4 tracker (bug 513501)
It will be documented as a Known Issue in the 5.4 Release Documentation (in the Technical Notes document)
Comment 16 Ryan Lerch 2009-08-18 00:19:46 EDT
This Bug has been marked as requiring a Known Issue note in the Technical Notes
for 5.4.

Would it be possible to get a draft for this in the release notes field above?

cheers,
ryanlerch
Comment 17 Bhavna Sarathy 2009-08-19 14:43:59 EDT
RHEL3 guest does not have SWIOTLB support; limit guest memory <= 4GB while running on AMD processors.
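As a rough illustration of why the limit falls at 4GB, here is a simplified model, not the kernel's actual code. It assumes the device involved can only address 32 bits of physical memory (the bug does not say which device triggers the panic), and all names are invented for the sketch. A buffer below 4 GiB can be mapped directly; one above it needs a hardware IOMMU or SWIOTLB bounce buffers, and without either the mapping fails.

```python
GIB = 1 << 30
DMA_MASK_32BIT = 0xFFFFFFFF  # assumption: device addresses only the low 4 GiB


def needs_remap(phys_addr, size, dma_mask=DMA_MASK_32BIT):
    """True if the last byte of the buffer lies above what the device can address."""
    return phys_addr + size - 1 > dma_mask


def map_single(phys_addr, size, have_iommu_or_swiotlb):
    """Toy model of a DMA mapping decision; returns how the buffer is mapped."""
    if not needs_remap(phys_addr, size):
        return "direct"
    if have_iommu_or_swiotlb:
        return "remapped"  # hardware IOMMU or software bounce buffer
    # Mirrors the spirit of the panic message reported in this bug.
    raise RuntimeError("high address but no IOMMU")
```

In this model a guest whose memory all sits below 4 GiB never reaches the failing branch, which matches the workaround of limiting guest memory.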
Comment 18 Russell Doty 2009-08-19 15:08:52 EDT
I created a draft release note based on comment 17; please review.
Comment 19 Russell Doty 2009-08-19 15:08:52 EDT
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
The RHEL3 kernel does not include SWIOTLB support, which is necessary to allow RHEL 3 guests to support more than 4GB of memory on AMD Opteron and Athlon-64 processors. RHEL 3 guests are limited to  4GB or less of memory while running on AMD processors.
Comment 21 Ryan Lerch 2009-08-20 18:46:55 EDT
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1 @@
-The RHEL3 kernel does not include SWIOTLB support, which is necessary to allow RHEL 3 guests to support more than 4GB of memory on AMD Opteron and Athlon-64 processors. RHEL 3 guests are limited to  4GB or less of memory while running on AMD processors.
+The Red Hat Enterprise Linux 3 kernel does not include SWIOTLB support. SWIOTLB support is required for Red Hat Enterprise Linux 3 guests to support more than 4GB of memory on AMD Opteron and Athlon-64 processors. Consequently, Red Hat Enterprise Linux 3 guests are limited to 4GB of memory on AMD processors.
Comment 22 Russell Doty 2009-08-21 14:15:31 EDT
Oops, I think we are overlooking a critical element!

Since this is a PAE issue, it only impacts RHEL 3 32 bit guests, not 64 bit guests. RHEL 3 64 bit guests support more memory, 128GB according to the system limits.

AMD, please confirm that this is only an issue for 32 bit guests.

Recommend changing wording in release note to "RHEL 3 X86 32 bit guests".
Comment 23 Russell Doty 2009-08-21 16:31:54 EDT
AMD has corrected my assertions in comment 22; this is _not_ a 32 bit or PAE issue. Please disregard comment #22 and make no changes to the release note.
Comment 27 Ronald Pacheco 2009-11-19 09:16:14 EST
After extensive engineering analysis by Red Hat and AMD, the root cause of this issue is a known limitation of RHEL 3, which was documented in the RHEL 5.4 release notes.

Here is the text we published:

The Red Hat Enterprise Linux 3 kernel does not include SWIOTLB support. SWIOTLB support is required for Red Hat Enterprise Linux 3 guests to support more than 4GB of memory on AMD Opteron and Athlon-64 processors. Consequently, Red Hat Enterprise Linux 3 guests are limited to 4GB of memory on AMD processors.


The ultimate resolution to this bug is a product enhancement to RHEL3.  Unfortunately, we do not deliver product enhancements when a product is in Production Phase 3 (see http://www.redhat.com/security/updates/errata/).
Comment 29 Paolo Bonzini 2010-04-08 11:50:40 EDT
This bug was closed during 5.5 development and it's being removed from the internal tracking bugs (which are now for 5.6).
Comment 30 Chris Lalancette 2010-07-19 09:37:38 EDT
Clearing out old flags for reporting purposes.

Chris Lalancette
