233151 – crash fails to read RHEL-5 FV core dump files collected from xm dump-core

Summary: crash fails to read RHEL-5 FV core dump files collected from xm dump-core

Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	crash
Version:	5.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Dave Anderson

Reported:	2007-03-20 17:59 UTC by Chris Lalancette
Modified:	2009-06-11 12:06 UTC
CC List:	3 users
Fixed In Version:	RHBA-2007-0553
Doc Type:	Bug Fix
Last Closed:	2007-11-07 19:09:59 UTC

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2007:0553	0	normal	SHIPPED_LIVE	crash bug fix update and enhancement	2007-10-30 15:13:31 UTC

Description Chris Lalancette 2007-03-20 17:59:37 UTC

Description of problem:

The summary pretty much says it all.  I have a RHEL-5 x86_64 dom0 running a
RHEL-5 x86_64 fully virtualized guest.  When I run "xm dump-core -C <dom>", it
properly dumps a core to /var/lib/xen/dump.  However, trying to view this core
with crash with the command:

crash /usr/lib/debug/lib/modules/2.6.18-8.el5/vmlinux 2007-0320-1343.48-rhel5fv.11.core

Gives the following output:

crash 4.0-3.20
Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006  Fujitsu Limited
Copyright (C) 2006  VA Linux Systems Japan K.K.
Copyright (C) 2005  NEC Corporation
Copyright (C) 1999, 2002  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

crash: /usr/lib/debug/lib/modules/2.6.18-8.el5/vmlinux and
2007-0320-1343.48-rhel5fv.11.core do not match!

Usage:
  crash [-h [opt]][-v][-s][-i file][-d num] [-S] [mapfile] [namelist] [dumpfile]
Enter "crash -h" for details.

Running with "-d3" on the crash command line shows gobbledygook where the
linux_banner string should be.

Comment 1 Chris Lalancette 2007-03-20 18:02:36 UTC

Oh, and as mentioned by Dave, running crash like the following:

# crash --machdep phys_base=0x200000 /usr/lib/debug/lib/modules/2.6.18-8.el5/vmlinux 2007-0320-1343.48-rhel5fv.11.core

Will actually make it work.  This has to do with the relocatable kernel, and
crash not realizing it is relocatable in the xen dump-core case.

Chris Lalancette

Comment 2 Dave Anderson 2007-03-20 18:09:20 UTC

> Will actually make it work.  This has to do with the relocatable kernel, and
> crash not realizing it is relocatable in the xen dump-core case.

Right -- example:

# /usr/bin/crash --machdep phys_base=0x200000 vm*

crash 4.0-3.21
Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005  NEC Corporation
Copyright (C) 1999, 2002  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

NOTE: setting phys_base to: 0x200000

GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

      KERNEL: vmlinux
    DUMPFILE: vmcore
        CPUS: 1
        DATE: Tue Mar 20 13:43:46 2007
      UPTIME: 00:01:18
LOAD AVERAGE: 1.18, 0.51, 0.18
       TASKS: 79
    NODENAME: dhcp83-123.boston.redhat.com
     RELEASE: 2.6.18-8.el5
     VERSION: #1 SMP Fri Jan 26 14:15:14 EST 2007
     MACHINE: x86_64  (2600 Mhz)
      MEMORY: 488 MB
       PANIC: ""
         PID: 0
     COMMAND: "swapper"
        TASK: ffffffff802d1ae0  [THREAD_INFO: ffffffff803ba000]
         CPU: 0
       STATE: TASK_RUNNING (ACTIVE)
     WARNING: panic task not found

crash>

Comment 3 Dave Anderson 2007-03-20 18:29:55 UTC

The actual physical address base of a relocatable x86_64 kernel is currently
determined in one of the following ways, depending on the source:

1. live kernels -> determine from /proc/iomem "Kernel code:" value 
2. compressed diskdump format -> physical base is contained in dumpfile header
3. kdump vmcore format -> the kernel text's PT_LOAD segment contains the actual
   physical base address of the unity-mapped text segment.

There is no support for xendump dumpfiles, so it defaults to using the old
non-relocatable standard base physical address of zero.  By using the
"--machdep phys_base=0x200000" command line option, that value is used as
an override.
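
A minimal sketch of that selection order, for illustration only.  The enum,
the function name pick_phys_base(), and its parameters are hypothetical and
are not taken from the crash sources.  The only behavior assumed is what is
described above: a command-line override wins, the three supported sources
supply a discovered value, and a xendump falls back to zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tags for where the memory image comes from. */
enum dump_source {
    SRC_LIVE_KERNEL,   /* value derived from /proc/iomem "Kernel code:" */
    SRC_DISKDUMP,      /* value stored in the compressed diskdump header */
    SRC_KDUMP_VMCORE,  /* value derived from the kernel text PT_LOAD segment */
    SRC_XENDUMP        /* nothing in the header to derive a value from */
};

/*
 * Choose the phys_base to use.  "discovered" is whatever the source-specific
 * probe produced; it is ignored for xendumps because no probe exists.
 */
uint64_t pick_phys_base(enum dump_source src, uint64_t discovered,
                        bool have_override, uint64_t override_value)
{
    if (have_override)          /* e.g. --machdep phys_base=0x200000 */
        return override_value;

    switch (src) {
    case SRC_LIVE_KERNEL:
    case SRC_DISKDUMP:
    case SRC_KDUMP_VMCORE:
        return discovered;
    case SRC_XENDUMP:
    default:
        return 0;               /* old non-relocatable default */
    }
}
```

With no override, a xendump yields 0, which is the failing case in this
report; passing an override of 0x200000 yields 0x200000.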

As it turns out, there is nothing in the xendump header that indicates that
the kernel has been relocated.  The vmlinux file does have clues that the
kernel is a relocatable kernel, but it's not entirely clear to me how the
kernel decides where to relocate itself physically.  The 2MB offset is
"typical", and it may be that if the memory exists there -- as it would in a
fully-virtualized xen environment -- the 2MB offset could be hardwired.

TBD...

Comment 4 Dave Anderson 2007-03-20 18:52:53 UTC

Our relocatable RHEL5 x86_64 kernels have a .text start virtual address of
__START_KERNEL_map (ffffffff80000000).  If it's a RHEL5 xen PV kernel, the
address is ffffffff80200000, because of this:

/* XEN x86_64 don't work with relocations yet quintela */
#ifdef CONFIG_X86_64_XEN
  . = __START_KERNEL_map + 0x200000;
#else
  . = __START_KERNEL_map;
#endif
  phys_startup_64 = startup_64 - LOAD_OFFSET;
  _text = .;                    /* Text and read-only data */

So in this case, it's a xendump of a non-xen, relocatable kernel -- with a .text
starting address of ffffffff80000000.  So even though we can narrow it down,
and hardwire the phys_base to 2MB, I still don't know where it's determined
that it magically offsets to 2MB...

Comment 5 Dave Anderson 2007-03-20 19:20:54 UTC

Sent a query off to Vivek Goyal to find out how and where the 2MB 
physical base is determined (plus adding him to the cc list).

Comment 6 Dave Anderson 2007-03-20 20:05:01 UTC

> There is no support for xendump dumpfiles, so it defaults to using the old
> non-relocatable standard base physical address of zero.

Just to be clear, crash defaults to using the old "phys_base" offset value of
zero, which for "old" non-relocatable kernels, means that the physical address
associated with a unity-mapped __START_KERNEL_map can be determined directly
without having to further manipulate it with a "phys_base" offset.  So that
the address translation code can be "common", it ends up harmlessly adding an
offset of zero (phys_base) to the stripped unity-mapped address.

In other words, the "old" non-relocatable kernels would actually be loaded at
physical address 2MB, but in those kernels, the __START_KERNEL_map value
of 0xffffffff80200000 was unity-mapped, and therefore reflected the hardwired
base physical address of 0x200000.  (And that is still done in RHEL5 PV xen
kernels).

But the (FV) relocatable kernel has a __START_KERNEL_map value that is no
longer unity-mapped, and so a "phys_base" value needs to be (1) determined,
and (2) applied to the offset part of the __START_KERNEL_map identifier.
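
To make the arithmetic concrete, here is a small sketch of the translation
described above.  The macro and function names are hypothetical (the macro
only mirrors the kernel's __START_KERNEL_map value of ffffffff80000000), and
this is not the crash utility's actual translation code.

```c
#include <stdint.h>

/* Mirrors the x86_64 __START_KERNEL_map value; the name is illustrative. */
#define START_KERNEL_MAP 0xffffffff80000000ULL

/*
 * Translate a kernel text/data virtual address to a physical address:
 * strip the map base, then add the phys_base offset.  For non-relocatable
 * kernels phys_base is 0 and the addition is a harmless no-op.
 */
uint64_t kernel_text_vtop(uint64_t vaddr, uint64_t phys_base)
{
    return (vaddr - START_KERNEL_MAP) + phys_base;
}
```

Taking the TASK address ffffffff802d1ae0 from comment #1's session:
kernel_text_vtop(0xffffffff802d1ae0, 0x200000) gives 0x4d1ae0, whereas a
phys_base of 0 gives 0x2d1ae0, which is 2MB short of where the data really
sits in the FV guest's dump.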

Comment 7 Dave Anderson 2007-03-21 14:32:10 UTC

Here is my response to the query I sent to Vivek:

Subject: Re: relocatable x86_64 kernel question
   Date: Wed, 21 Mar 2007 09:20:51 -0500
   From: Dave Anderson <anderson>
     To: vgoyal.com, clalance

Vivek Goyal wrote: 

> On Tue, Mar 20, 2007 at 02:16:54PM -0500, Dave Anderson wrote: 
> > 
> > Hi Vivek, 
> > 
> > I've got an interesting question... 
> > 
> > For the first time, we've got a customer who has decided to 
> > run a standard RHEL5 relocatable x86_64 kernel as a 
> > fully-virtualized xen guest.  (I don't know why they 
> > don't run the supplied para-virtualized RHEL5 kernel as 
> > the guest, since the performance would be much better). 
> > 
> > In any case, since it's such an odd-ball use of xen, we 
> > never tested xendumps of fully-virtualized RHEL5 kernels, 
> > i.e., only RHEL3 and RHEL4 fully-virtualized kernels were 
> > tested.  And as it turns out, the crash utility fails to 
> > determine what the "phys_base" value is, and defaults 
> > to zero.  That's because, during initialization, the crash 
> > x86_64_calc_phys_base() function only has code to 
> > determine the "phys_base" value from: 
> > 
> >  1. /proc/iomem for live systems. 
> >  2. compressed diskdump files have the phys_base 
> >     value in the header. 
> >  3. kdump vmcore's have the PT_LOAD segment p_paddr 
> >     field for the __START_KERNEL_MAP section. 
> > 
> > But there's no code for xendumps (since we never expected 
> > to ever see such a thing), so it defaults to zero.  And BTW, 
> > there's nothing in the xendump header that yields any clues. 
> > 
> > Anyway, it's easy enough to work around by passing the 
> > physical base address on the crash command line like so: 
> > 
> > # crash --machdep phys_base=0x200000 vmlinux xendump 
> > 
> > And I used 2MB because that's typical.  But therein 
> > lies my question... 
> > 
> > How does the "typical" 2MB phys_base get selected?  Both 
> > in a "normal" environment, and here in a xen environment? 
> > 

> Hi Dave, 

> I remember that Don had disabled relocatable kernel support for 
> Xen kernels as somehow they were not able to boot xen kernels 
> with relocatable kernel patches. So he did not apply relocatable 
> kernel patch while building xen kernels. Has it been enabled now? 
>  

No -- the RHEL5 vmlinux.lds.S file has this: 

/* XEN x86_64 don't work with relocations yet quintela */ 
#ifdef CONFIG_X86_64_XEN 
  . = __START_KERNEL_map + 0x200000; 
#else 
  . = __START_KERNEL_map; 
#endif 
  phys_startup_64 = startup_64 - LOAD_OFFSET; 
  _text = .;                    /* Text and read-only data */ 

So, for xen kernels, "_text" is ffffffff80200000, and the
load location appears to be hardwired to 2MB, as was always
the case before relocatable kernels came into existence.

I see in the kernel code that physical page 0 cannot be 
used for a kernel load location because, from head.S: 

 * Page 0 is deliberately kept safe, since System Management Mode code in 
 * laptops may need to access the BIOS data stored there.  This is also 
 * useful for future device drivers that either access the BIOS via VM86 
 * mode. 

And therefore the first usable 2MB "big-page" mapping for the 
kernel text/data would obviously be at physical address 2MB. 

From my limited understanding, the RHEL5 relocatable kernel code
does not know in advance where it's loaded, so the very first thing
it does is determine where it is, calculating the "phys_base" value
from that.  The kernel finds itself running in "startup_64", which
is the label of the first address in the text section, and
immediately does this:

        /* Compute the delta between the address I am compiled to run at and the
         * address I am actually running at.
         */
        leaq    _text(%rip), %rbp 
        subq    $_text - __START_KERNEL_map, %rbp 

and %rbp is left containing the delta.  Later on, this is 
done to (possibly) modify "phys_base", which was initialized 
to zero: 

        /* Fixup phys_base */ 
        addq    %rbp, phys_base(%rip) 

However, the xen kernel has a different "startup_64" function, 
and "phys_base" does not exist as a symbol.  So, it runs at 
the address that it was compiled to run at, i.e., unity-mapped
at __START_KERNEL_map. 

But this is not a xen kernel issue, but rather where does 
the relocatable kernel get loaded in a xen environment. 
I know it's at 2MB, but I'm curious as to where that 
decision is made. 

> > Note that in a xen environment, the RHEL5 kernel is unaware 
> > of the real physical memory environment, since the xen
> > hypervisor simply supplies it a "pseudo-physical" flat-memory 
> > model of a configured size that starts at a pseudo-physical 
> > address of 0.  And it sets phys_base to 2MB.  Where and how 
> > does that get done? 
> > 
>
> For native kernels, in RHEL5, we are building the kernel for 
> physical address 0 and we generally load that kernel at physical 
> address 2MB. That's why phys_base gets filled with 2MB value. 

Right... 

> I am not sure about Xen kernels. I have never looked into those. But 
> by going through the same logic, standard RHEL5 kernel has been built 
> for 0 MB address and probably Xen loader loads it at 2MB address (in 
> pseudo physical address space) and that's why phys_base is 2MB.

Right -- the loader makes the determination.  But I 
was wondering if you knew where/how that was done. 

It appears that 2MB used to be the hardwired default 
for x86_64 kernels (and still is for xen kernels), but 
relocatable kernels allow the loader to put it wherever 
it wants, and the kernel itself immediately determines 
the phys_base offset.  But, the *primary* kernel still 
seems to get loaded by default at 2MB -- although the kernel 
now has to "figure that out".  It's only the secondary kexec 
kernel that ever gets loaded somewhere other than 2MB,
i.e., at the "crashkernel" address. 

Thinking about it that way, it seems that the crash utility
wouldn't really need to figure out the phys_base offset, 
if it's a given that the primary kernel will *always* be 
loaded at 2MB.  But that's the root of my question, is 
it even possible that the primary kernel could *ever* be 
loaded at a phys_base offset other than 2MB? 

> Frankly speaking, I don't have too much of idea about how all this 
> pseudo physical address to actual physical address thing work. 
>
> How does the ELF core header of the non-paravirtualized guest look 
> like? In this case also, shouldn't PT_LOAD program header give hints 
> about phys_base, similar to native ones? 

Well -- that's the issue at hand -- it's not an ELF 
dumpfile, but a xendump dumpfile, and it doesn't yield 
any clues about the load address.  (They are changing 
the xendump format to ELF upstream, but they are not 
using PT_LOAD program headers to describe memory, but 
rather a unique usage of section headers...) 

In any case, I believe that it's safe to assume that 
2MB will be the load location for fully-virtualized 
RHEL5 kernels running in a xen environment. 

Thanks, 
  Dave 

> Thanks 
> Vivek

Comment 8 Dave Anderson 2007-03-21 15:12:01 UTC

Further clarification/correction w/respect to "traditional" x86_64
load locations.  In all x86_64 kernels, __START_KERNEL_map is
equal to ffffffff80000000.  That is the virtual address of the
first 2MB big-page that contains static kernel text/data.

In RHEL3 and RHEL4 kernels, the kernel was loaded at 1 MB,
and the first physical "big-page" at physical address 0 was
mapped at ffffffff80000000.  However, the physical address load
location of the kernel text was hardwired into the unity-mapped
virtual address space, i.e., "_text" symbol was equal to ffffffff80100000.

In RHEL5 xen kernels, a similar thing is done, but the kernel
is loaded at 2MB, the physical address load location is hardwired
into the unity-mapped virtual address space, and so the xen "_text"
value is ffffffff80200000.

In RHEL5 relocatable kernels, the "_text" symbol is ffffffff80000000,
but it does not take the "phys_base" offset into account.  So the
kernel has to immediately determine where it's running, and calculate
the "phys_base" offset.
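
The three cases above can be checked with a little arithmetic.  The sketch
below is illustrative only: the function and macro names are hypothetical,
and it models the offset as "actual load address minus the physical address
implied by _text", which is my reading of the description, not code lifted
from the kernel or from crash.

```c
#include <stdint.h>

/* Mirrors the x86_64 __START_KERNEL_map value; the name is illustrative. */
#define START_KERNEL_MAP 0xffffffff80000000ULL

/* Physical address the text was linked for: _text with the map base stripped. */
uint64_t compiled_text_paddr(uint64_t text_vaddr)
{
    return text_vaddr - START_KERNEL_MAP;
}

/* Offset between where the text was actually loaded and where it was linked. */
uint64_t phys_base_delta(uint64_t text_vaddr, uint64_t load_paddr)
{
    return load_paddr - compiled_text_paddr(text_vaddr);
}
```

For RHEL3/RHEL4 (_text ffffffff80100000, loaded at 1MB) and RHEL5 xen
(_text ffffffff80200000, loaded at 2MB) the delta is 0, because the load
location is already baked into "_text".  For a RHEL5 relocatable kernel
(_text ffffffff80000000, loaded at 2MB) the delta is 0x200000.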

This all being the case, with respect to the crash utility, I am
proceeding with this assumption:

  If it's a xendump file, and the "phys_base" symbol exists, and the
  "_text" symbol is equal to __START_KERNEL_map (no hardwired offset),
  then we know that it's a relocatable kernel running as a xen guest.
  In that case, apply a crash utility "phys_base" of 2MB.
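
Expressed as a standalone predicate, the assumption looks roughly like this.
The struct, its field names, and the function name are hypothetical and exist
only for this sketch; the three conditions and the 2MB result are the ones
stated above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the x86_64 __START_KERNEL_map value; the name is illustrative. */
#define START_KERNEL_MAP         0xffffffff80000000ULL
#define XEN_FV_DEFAULT_PHYS_BASE 0x200000ULL   /* 2MB */

/* Hypothetical summary of what is known about the dumpfile and vmlinux. */
struct kernel_facts {
    bool     is_xendump;            /* dumpfile is in xendump format */
    bool     has_phys_base_symbol;  /* vmlinux defines "phys_base" */
    uint64_t text_start;            /* value of the "_text" symbol */
};

/*
 * Return the phys_base to assume.  Only a relocatable kernel captured in a
 * xendump gets the 2MB default; every other combination keeps zero.
 */
uint64_t guess_xendump_phys_base(const struct kernel_facts *k)
{
    if (k->is_xendump && k->has_phys_base_symbol &&
        k->text_start == START_KERNEL_MAP)
        return XEN_FV_DEFAULT_PHYS_BASE;
    return 0;
}
```

A RHEL5 PV xen kernel fails the test twice over (no "phys_base" symbol, and
"_text" is ffffffff80200000), so its xendumps keep a phys_base of zero.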

Comment 9 Dave Anderson 2007-03-21 15:55:33 UTC

I'm going with this crash patch:

--- x86_64.c.orig       2007-03-20 12:37:12.000000000 -0400
+++ x86_64.c    2007-03-21 10:35:01.000000000 -0400
@@ -4250,6 +4250,21 @@ x86_64_calc_phys_base(void)

                return;
        }
+
+       if (XENDUMP_DUMPFILE() && (text_start == __START_KERNEL_map)) {
+               /*
+                *  Xen kernels are not relocable (yet) and don't have the
+                *  "phys_base" entry point, so this must be a xendump of a
+                *  fully-virtualized relocatable kernel.  No clues exist in
+                *  the xendump header, so hardwire phys_base to 2MB and hope
+                *  for the best.
+                */
+               machdep->machspec->phys_base = 0x200000;
+               if (CRASHDEBUG(1))
+                       fprintf(fp,
+                           "default relocatable default phys_base: %lx\n",
+                               machdep->machspec->phys_base);
+       }
 }

Comment 10 RHEL Program Management 2007-03-21 21:44:04 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 11 Dave Anderson 2007-04-11 18:23:49 UTC

Fix for RHEL5 x86_64 FV xendumps is in upstream version 4.0-3.23:

  http://people.redhat.com/anderson

Comment 12 Dave Anderson 2007-04-11 19:08:49 UTC

(In reply to comment #11)
> Fix for RHEL5 x86_64 FV xendumps is in upstream version 4.0-3.23:
> 
>   http://people.redhat.com/anderson
> 
> 

Make that 4.0-3.22...

Comment 17 errata-xmlrpc 2007-11-07 19:09:59 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0553.html
