vmalloc_sync_all() needs to iterate over PMDs, not PGDs, in the Xen+PAE case. The upstream fix is http://hg.uk.xensource.com/linux-2.6.18-xen.hg?cs=34ebf92ad28d. As far as I can tell, this only impacts RHEL5, not RHEL4u5.
change QA contact
Created attachment 288921 linux-2.6.18-xen 136:34ebf92ad28d ported to 2.6.18-53.el5
Can you provide a test case that shows a failure, so the fix can be verified? TIA.
Sorry, but I don't really know much about the specifics of this one. The mailing list thread which caused the fix to be made starts at http://marc.info/?l=xen-devel&m=118122881300249. I suspect that to reproduce it you just have to be unlucky at boot time.
The function vmalloc_sync_all in arch/i386/mm/fault-xen.c is under an "#ifndef CONFIG_X86_PAE", meaning that the function is never used in the RHEL5 code base, which only has Xen kernels with PAE. One reason for this is that the kernel PMD is shared between all PGDs in an i386 PAE configuration, so there is nothing to sync. include/asm-i386/mach-xen/asm/pgtable-3level.h has this define:

    #define vmalloc_sync_all() ((void)0)

This means that the changeset is not needed for RHEL5 (which explains why we never ran into vmalloc_sync_all related bugs).
You are right, sorry for not noticing this sooner. http://xenbits.xensource.com/xen-unstable.hg?rev/c5ff7671b9f2 made vmalloc_sync_all mean something for PAE, but that was only required because of http://xenbits.xensource.com/xen-unstable.hg?rev/c4ed5b740a8d, and you have neither patch in RHEL 5. Since c4ed5b740a8d was just a cleanup (I think), all this can be ignored.