Bug 233543
| Summary: | Random panics running as a paravirtualized guest of RHEL 5.0 | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 4 | Reporter: | Mark Plaksin <happy> |
| Component: | kernel-xen | Assignee: | Chris Lalancette <clalance> |
| Status: | CLOSED ERRATA | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.5 | CC: | clalance, daniel.fosselius, ddutile, larsaj, xen-maint |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | RHBA-2007-0791 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2007-11-15 16:22:45 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 234251 | | |
| Attachments: | | | |
Description
Mark Plaksin
2007-03-23 01:12:53 UTC
Please provide the following info:
-- xen config file (from /etc/xen/)
-- /var/log/xen/xend.log
-- /var/log/xen/xend-debug.log
32-bit guest on 32-bit hypervisor/kernel or 64-bit guest on 64-bit hypervisor/kernel
total memory in system (config file should show guest mem allocation)
cat /proc/cpuinfo
TIA... Don

We moved on long ago. When this happened I talked to Red Hat support and they said "probably fixed in the soon-to-be-released 4.5 but you can't have that to test it out." So I gave up. I'd resolve the bug but I'm not sure what status is appropriate.

Please leave the bug open; I think I now have a fix, and I will need it for tracking. Thanks!
Chris Lalancette

I believe we need a combination of this c/s:
http://xenbits.xensource.com/xen-unstable.hg?cs=c6efd6c2feaa
along with fixing up "xen_pfn_to_cr3" in drivers/xen/core/smpboot.c to fix this properly.
Chris Lalancette

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

OK. I was able to figure out how to reliably reproduce it here:
1) 16GB box
2) Create one guest that is large (say, 7200MB)
3) "xm info | grep free_memory"
4) Create a second guest that is exactly the size from the last command
5) OOPs!
Now that I can do it reliably, I'll try out a few things to see whether this is truly fixed already.
Chris Lalancette

Tested so far on RHEL 5.0 dom0, xm info reports 15359MB total memory, dom0 clamped to 512MB of memory:
a) 2 RHEL 4.5 guests, one 7200MB, the second 7524MB to make sure free_memory == 0.
Result: first domain starts properly, second panics with the stack trace from earlier in this BZ.
b) 1 RHEL 4.5, 1 4.6 guest, same sizes as above.
Result: first one starts properly, second panics when trying to execve() init.
c) 2 RHEL 5 guests, same sizes as above.
Result: both boot OK.
So it seems that while we are coming up towards limits in the HV, this may still be a problem with the RHEL-4 kernel. I'm tracking down the latest failure in the 4.6 kernel; I'll update when I have more.
Chris Lalancette

OK. I've narrowed this one down to some of the start-of-day code for the guest. In particular, it's not always telling the HV the correct address of startup_32; I think this manifests itself on large memory because of some wraparound or something like that, but I haven't confirmed 100% yet. Regardless, even with a fix like 5.0 has, I'm still having minor problems. The patch should end up being fairly simple; I just have to work through the remainder of the problem.
Chris Lalancette

Created attachment 159394 [details]
Fix for the > 4GB issue
My last update was roughly correct, but I now have a much better idea of
what is going on. Basically there are two bugs here:
1) We are not telling the hypervisor that it is allowed to put our pagetable
stuff over 4GB. I believe this is restricting the amount of low memory it has
available for this.
2) We are not correctly saving and restoring the entire cr3 value on task
switch. This causes some bits to be lost and bad things to happen.
Both of these problems should be fixed by the attached patch.
Chris Lalancette
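The attached patch itself is not reproduced in this report, so the following is only a minimal sketch of how bug 2 could lose bits. It assumes the rotate-style cr3 encoding used by the 32-bit PAE `xen_pfn_to_cr3` macro in Xen's public headers, where frame-number bits 20 and above are packed into the low 12 bits of the 32-bit cr3 value. The helper `save_cr3_truncated` is hypothetical and stands in for any save path that treats cr3 as a plain page-aligned address; it is not the actual RHEL 4 code.

```c
#include <stdint.h>

/* Assumed encoding (modeled on Xen's 32-bit PAE public header macros):
 * rotate the frame number left by 12 bits, so that pfn bits 20 and up
 * land in the low 12 bits of the 32-bit cr3 value. */
static inline uint32_t pfn_to_cr3(uint32_t pfn)
{
    return (pfn << 12) | (pfn >> 20);
}

/* Inverse rotation: recover the frame number from the encoded cr3. */
static inline uint32_t cr3_to_pfn(uint32_t cr3)
{
    return (cr3 >> 12) | (cr3 << 20);
}

/* Hypothetical buggy save path: masks cr3 as if it were a native,
 * page-aligned physical address, discarding the low 12 bits. */
static inline uint32_t save_cr3_truncated(uint32_t cr3)
{
    return cr3 & ~UINT32_C(0xFFF);
}

/* Returns 1 if the frame number survives a save/restore that goes
 * through the truncating path, 0 if bits were lost. */
static inline int survives_truncated_save(uint32_t pfn)
{
    return cr3_to_pfn(save_cr3_truncated(pfn_to_cr3(pfn))) == pfn;
}
```

Under these assumptions, a page directory frame below 4GB (for example pfn 0x000ABCDE) survives the truncating path, while one above 4GB (for example pfn 0x00123456) comes back as 0x00023456. That would fit the observation that only guests whose pagetables land in high memory panic, though the report does not confirm this exact mechanism.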
*** Bug 247545 has been marked as a duplicate of this bug. ***

committed in stream U6 build 55.23. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/

(In reply to comment #17)
> committed in stream U6 build 55.23. A test kernel with this patch is available
> from http://people.redhat.com/~jbaron/rhel4/

That solved the problem! Thanks a million!

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2007-0791.html

*** Bug 246702 has been marked as a duplicate of this bug. ***