Bug 678308 - kexec kernel crashes due to use of reserved memory range
Summary: kexec kernel crashes due to use of reserved memory range
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kexec-tools
Version: 5.3
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Assignee: Cong Wang
QA Contact: Han Pingtian
URL:
Whiteboard:
Depends On:
Blocks: 682085
 
Reported: 2011-02-17 14:37 UTC by Takuma Umeya
Modified: 2018-11-14 14:31 UTC
CC List: 8 users

Fixed In Version: kexec-tools-1.102pre-131.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-02-21 03:17:29 UTC
Target Upstream Version:
Embargoed:


Attachments
the panic message (2.19 KB, text/plain)
2011-02-17 14:37 UTC, Takuma Umeya
no flags Details
Patch against latest upstream (8.16 KB, patch)
2011-03-01 06:08 UTC, Cong Wang
no flags Details | Diff
Patch against latest RHEL5 (13.90 KB, patch)
2011-03-01 09:08 UTC, Cong Wang
no flags Details | Diff
Proposed patch to fix the issue on PAE (3.27 KB, patch)
2011-04-19 15:34 UTC, Cong Wang
no flags Details | Diff


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2012:0152 0 normal SHIPPED_LIVE Moderate: kexec-tools security, bug fix, and enhancement update 2012-02-21 07:24:50 UTC

Description Takuma Umeya 2011-02-17 14:37:44 UTC
Created attachment 479340 [details]
the panic message

Description of problem:
The kexec kernel crashes during its boot. It treats a range that was marked as reserved during the normal boot as "usable", and this apparently crashes the kernel. The customer hit this issue with 5.3, but further investigation by the vendor shows the symptom is not resolved in RHEL 5.6. The hardware is BladeSymphony 320A5, which is certified with a kbase exception for 3rd party devices.

Version-Release number of selected component (if applicable):
- OS: RHEL5.4.6 (x86_64), kernel 2.6.18-164.15.1.el5 (also present with 5.6).
- Related package: kexec-tools-1.102pre-77.el5.3 

How reproducible:
Always

Steps to Reproduce:
1. Setup kdump
2. Run "service kdump start"
3. Run "echo c > /proc/sysrq-trigger" 
  
Actual results:
kexec kernel crashes. 

Expected results:
Should succeed in capturing vmcore. 

Additional info:

Comment 2 Neil Horman 2011-02-17 15:52:45 UTC
That range gets added unilaterally because some BIOSes erroneously report it as reserved but still require it to be usable for booting to work.  I recommend we add a command line switch to enable/disable unilateral adding of this range.  Amerigo, can you implement that?
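
For illustration, a rough sketch of what such a switch could look like (the
option name below is made up, not a proposed final interface):

/*
 * Illustration only: gate the unconditional addition of the 0-640K range
 * behind a command line switch.  Option name and variable are placeholders.
 */
#include <getopt.h>
#include <stdio.h>

static int add_default_640k = 1;   /* current behaviour: always add it */

static const struct option opts[] = {
        { "no-default-640k", no_argument, NULL, 'n' },
        { NULL, 0, NULL, 0 },
};

int main(int argc, char **argv)
{
        int c;

        while ((c = getopt_long(argc, argv, "", opts, NULL)) != -1) {
                if (c == 'n')
                        add_default_640k = 0;   /* trust the BIOS map as-is */
        }

        if (add_default_640k)
                printf("adding 0x00000000-0x0009ffff to the crash memory ranges\n");
        else
                printf("leaving the sub-640K map exactly as the BIOS reported it\n");

        return 0;
}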

Comment 8 Vivek Goyal 2011-02-22 14:34:57 UTC
(In reply to comment #2)
> That range gets added unilaterally because some BIOSes erroneously report it
> as reserved but still require it to be usable for booting to work.  I
> recommend we add a command line switch to enable/disable unilateral adding of
> this range.  Amerigo, can you implement that?

If BIOSes erroneously report it as reserved and the kernel requires the first 640KB to boot, then wouldn't the first kernel's boot also fail? Or does the kernel have a mechanism to determine that the BIOS is wrong and override it selectively? If the kernel has such a mechanism then we can use it in kexec-tools as well.

I had added the 640K block because different BIOSes were reporting the first 640K or 1MB in different formats.

To me it makes sense to trust the BIOS information and use that memory map. If the BIOS says that the first 640KB is reserved, so be it. A command line switch to change this behavior would be more useful when the BIOS is giving us wrong information and we want to override it. But that leads us back to the question of how the first kernel itself manages to boot if the BIOS is buggy on those machines.

Adding a command line option also makes the whole operation more manual: somebody still has to parse the memory map and pass the option accordingly. So why not let kexec-tools do that job.

And we already trust the BIOS for the rest of the memory map and pass it as-is to the second kernel. So why not trust the BIOS for the first 640KB.
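
Roughly what I mean by letting kexec-tools do that job (just a sketch to show
the idea, not the actual crashdump code):

/*
 * Sketch only: walk /proc/iomem and collect the "System RAM" ranges that
 * fall inside the first 640K, i.e. the low memory that could be passed to
 * the second kernel instead of hardcoding 0-640K.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
        FILE *fp = fopen("/proc/iomem", "r");
        char line[128];
        unsigned long long start, end;

        if (!fp)
                return 1;

        while (fgets(line, sizeof(line), fp)) {
                if (!strstr(line, "System RAM"))
                        continue;
                if (sscanf(line, "%llx-%llx", &start, &end) != 2)
                        continue;
                if (start >= 0xa0000)   /* only the first 640K matters here */
                        continue;
                if (end > 0x9ffff)
                        end = 0x9ffff;
                printf("usable low range: %#llx-%#llx\n", start, end);
        }

        fclose(fp);
        return 0;
}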

Comment 9 Cong Wang 2011-02-22 15:23:13 UTC
Vivek, I don't think we have any way to know if the BIOS is wrong; the kernel is responsible for this, not userspace.

To me, it is entirely the kernel's job to provide a correct memory map for kexec-tools to use; otherwise it is a kernel bug. So I am fine with the patch in comment #1.

I am wondering why this problem wasn't exposed before; it seems the current code works fine in most cases. Since you wrote the original code, you would know better than me here. :)

Comment 11 Vivek Goyal 2011-02-22 15:55:38 UTC
(In reply to comment #9)
> Vivek, I don't think we have any way to know if the BIOS is wrong; the kernel
> is responsible for this, not userspace.

I think in general we should be able to trust /proc/iomem. In the past there
was a discussion that we should export the raw BIOS-provided memory map to user
space, with /proc/iomem being the BIOS map plus kernel modifications. I think the
raw BIOS memory map moved to debugfs along with some firmware entries, but I don't remember exactly.
> 
> To me, it is entirely the kernel's job to provide a correct memory map for
> kexec-tools to use; otherwise it is a kernel bug. So I am fine with the
> patch in comment #1.


I do not think the patch in comment #1 will work as-is in all situations.

    - We required the first 640K for the kernel to boot (at least in the past). So
      unless we have information that this is no longer required, we should not
      get rid of it by default. IIUC, the above patch will not provide the first
      640K of memory to the second kernel on all machines.

    - So we need to modify the patch so that all the non-reserved memory in the
      first 640K in /proc/iomem is given to the second kernel for use. In the
      process we need to make sure we generate the right vmcore ELF headers and
      copy the right backup area in purgatory.

> 
> I am wondering why this problem wasn't exposed before; it seems the current
> code works fine in most cases. Since you wrote the original code, you would
> know better than me here. :)


I think it is because this is the first machine we have encountered where accessing
the first 640KB is a problem and it is supposed to be a reserved area. On my laptop
/proc/iomem looks as follows.

00000000-00000fff : reserved
00001000-0009efff : System RAM
0009f000-0009ffff : reserved

So most of the first 640KB is usable, except some memory at the beginning and
some at the end.

Having said that, I think fixing all this in a generic manner will require a few
more changes, which should first get soaked upstream and then be pulled into
RHEL5.

So I would say let's put a generic fix in kexec-tools upstream, and if it turns out to be big and risky, then for RHEL5 you can push a command-line kind of shortcut where the first 640K is bypassed if the user passes some option.

So this command line fix can only be a workaround for RHEL5, not a generic fix for upstream.

Comment 17 Cong Wang 2011-03-01 06:08:04 UTC
Created attachment 481527 [details]
Patch against latest upstream

Comment 21 Cong Wang 2011-03-01 09:08:51 UTC
Created attachment 481566 [details]
Patch against latest RHEL5

 kexec/arch/i386/crashdump-x86.c          |   81 ++++++++++++++++++++++---------
 kexec/arch/i386/include/arch/options.h   |    4 +
 kexec/arch/i386/kexec-x86.h              |    1 
 kexec/arch/x86_64/crashdump-x86_64.c     |   68 ++++++++++++++++++++++----
 kexec/arch/x86_64/include/arch/options.h |    5 +
 kexec/arch/x86_64/kexec-x86_64.c         |   14 ++++-
 kexec/kexec.h                            |    2 
 purgatory/arch/i386/crashdump_backup.c   |   17 +++++-
 8 files changed, 157 insertions(+), 35 deletions(-)

Comment 45 Dave Anderson 2011-04-18 12:29:56 UTC
> RHEL6 doesn't have this bug on the same machine.

I'm not sure I agree.  If the PMD page used for translating vmalloc
addresses is not one of the suspect physical page(s), then crash
wouldn't complain during initialization.  There would just be
a "corruption" lurking, and depending upon how the page(s) get
used, you might never see a problem. 

Again, it was by luck that the crash utility happened to bump into
that page, because vmalloc address translations happened to use
physical address 12000.  

If you do a "vtop" on a RHEL6 vmalloc address in a RHEL6 dumpfile,
what does it show?

Comment 46 Vivek Goyal 2011-04-18 13:03:02 UTC
Amerigo,

It might be a good idea to introduce some debug capability in kexec so that
one can easily print out the ELF headers generated for the second kernel. I am
specifically interested in knowing that we set up the backup ELF header properly
to point to the right backup region in the reserved area.

Also we need to debug vmcore to make sure it is parsing the headers properly and
getting contents from the backup area properly. Maybe run crash on the live system
and on the vmcore and compare the contents of some pages in the backup area.
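
Something as simple as this would do for the header side (a sketch only, 64-bit
case; the 32-bit case would use the Elf32 types). It just dumps the PT_LOAD
entries of the generated core header so the backup region mapping can be
checked by eye:

/*
 * Sketch: dump the PT_LOAD program headers of the ELF core header buffer
 * that kexec-tools builds for the second kernel, so the p_paddr -> p_offset
 * mapping of the backup region can be eyeballed.  "buf" is assumed to point
 * at that generated header.
 */
#include <elf.h>
#include <stdio.h>

void dump_core_phdrs(const void *buf)
{
        const Elf64_Ehdr *ehdr = buf;
        const Elf64_Phdr *phdr =
                (const Elf64_Phdr *)((const char *)buf + ehdr->e_phoff);
        int i;

        for (i = 0; i < ehdr->e_phnum; i++, phdr++) {
                if (phdr->p_type != PT_LOAD)
                        continue;
                printf("PT_LOAD: p_paddr=%#llx p_offset=%#llx p_memsz=%#llx\n",
                       (unsigned long long)phdr->p_paddr,
                       (unsigned long long)phdr->p_offset,
                       (unsigned long long)phdr->p_memsz);
        }
}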

Comment 47 Dave Anderson 2011-04-18 13:25:41 UTC
(In reply to comment #46)
> Amerigo,
> 
> It might be a good idea to introduce some debug capability in kexec so that
> one can easily print out the ELF headers generated for the second kernel. I am
> specifically interested in knowing that we set up the backup ELF header properly
> to point to the right backup region in the reserved area.
> 
> Also we need to debug vmcore to make sure it is parsing the headers properly and
> getting contents from the backup area properly. Maybe run crash on the live system
> and on the vmcore and compare the contents of some pages in the backup area.

I just reserved an i386 RHEL6 machine, and was planning to do just that.

I'll write a little crash command to copy specific parts of memory to a
file just before crashing the system, and compare them to what gets
copied to the dumpfile.

Note that I *did* do that with physical page 12000 on a non-PAE RHEL5
i386 machine, and I did see a slight change in the page contents.
It wasn't perceived as a problem however, because that page is not
used as a PMD page in a non-PAE kernel.  And that is why I don't believe
that it's "not a bug" on RHEL6. 

However, the problem is that the pages may get used by the live system
immediately prior to the forced crash (say, during the live-copy-file-creation
activity), because it is not clear what the page(s) were being used for.  But
it would still be interesting to see what happens.

What memory range should I compare?  Is this the only range that
is of interest:
 
  00010000-0009afff : System RAM

Comment 48 Vivek Goyal 2011-04-18 14:17:28 UTC
Dave,

Yes we are interested only in first 640K. So on your system 00010000-0009afff will be of interest.

I understand that contents of this range may change. Still if there are some pages which match the contents then we know that we are not completely off.

Comment 49 Dave Anderson 2011-04-18 14:43:22 UTC
(In reply to comment #48)
> Dave,
> 
> Yes we are interested only in first 640K. So on your system 00010000-0009afff
> will be of interest.
> 
> I understand that contents of this range may change. Still if there are some
> pages which match the contents then we know that we are not completely off.

Ok, I've initially just written a "test" command that calculates and displays
a checksum of the contents of each page in that range.  Doing it twice in a
row on a live system shows that none of the pages changed, even if I stop and
start the crash utility.  I just created a crash input file that does this:

 test > one
 test > two
 !echo c > /proc/sysrq-trigger

I'll compare "one" against "two", and then "two" against the contents of the
dumpfile.  Should be interesting...

Comment 50 Dave Anderson 2011-04-18 18:13:29 UTC
Running RHEL5 PAE kernel, I did the test above, where each page in the
range was read and a checksum of the page's contents calculated.  I did
the test twice in a row, and then immediately crashed the system.  And
then I did the same test on the subsequent vmcore.

The contents of the pages were identical when reading them twice on the
live system: 

 # diff one two
 # 

But some of the pages were different in the dumpfile:

 # diff two dump
 5,6c5,6
 < 11000: 34a14b53  12000: 3f3e4330  13000: 4dd2159c  14000: 7c680459  
 < 15000: d2dd22fe  16000: 4d834b52  17000: 6b0304a1  18000: 204e4649  
 ---
 > 11000: 00000000  12000: 3e67e6bc  13000: edb20429  14000: fc6945bc  
 > 15000: 00000000  16000: 4d834b52  17000: 6b0304a1  18000: 204e4649  
 39c39
 < 99000: 8421966b  9a000: 00000000  9b000: 00000000  9c000: 00000000  
 ---
 > 99000: 8421966b  9a000: 00000000  9b000: 00000000  9c000: 88128a6f  
 #

So these pages were different:

  11000 12000 13000 14000 15000 and 9c000
 
Since there was no difference between "one" and "two" files, there
is no obvious explanation for them changing in the dumpfile.

Comment 51 Dave Anderson 2011-04-18 18:37:59 UTC
I tested the RHEL5 non-PAE kernel the same way.  In that kernel, the
pages did not change on the live system on two subsequent reads.  And
while physical page 12000 is not crucial to the crash utility (not a
PMD page) in that kernel, several pages did change in the dump:

 # diff one two
 # diff two dump
 5c5
 < 11000: 34a15b53  12000: 4a3a19f5  13000: 4b3a596e  14000: d157ae0c  
 ---
 > 11000: 00000000  12000: 6a1b29e5  13000: 00000000  14000: d157ae0c  
 39c39
 < 99000: 8421966b  9a000: 00000000  9b000: 00000000  9c000: 00000000  
 ---
 > 99000: 8421966b  9a000: 00000000  9b000: 00000000  9c000: 88128a6f  
 #

changed pages: 11000 12000 13000 9c000

Comment 52 Vivek Goyal 2011-04-18 19:05:27 UTC
Because some of the pages have matching checksums, maybe we are not setting the backup region pointer properly in the ELF header, maybe vmcore is reading the backup region pages from the original memory (which now belongs to the second kernel), and maybe the second kernel has modified some of the pages.

Amerigo, a few printks in vmcore and kexec-tools will be handy to figure this out.

Comment 53 Dave Anderson 2011-04-18 19:28:09 UTC
With respect to RHEL6, the results are bizarre...

The machine has kexec-tools-2.0.0-186.el6.i686, the 2.6.32-125.el6.i686 
(non-PAE) kernel, and /etc/kdump.conf configured with:
 
  core_collector makedumpfile -c

so that all pages would get dumped, and has this in /proc/iomem:

  00001000-0009ffff : System RAM

While running live, the memory checksums were identical, and there is
data in most of the pages:

# diff one two
# head one
01000: 59e700eb  02000: 00000000  03000: 00000000  04000: 00000000  
05000: 00000000  06000: 047fdefc  07000: 3132f33a  08000: e070817b  
09000: 00000000  0a000: fc681471  0b000: ffe1a2bb  0c000: fffffd00  
0d000: 00000000  0e000: 00000000  0f000: 00000000  10000: 00000000  
11000: dc994904  12000: 9e375fc4  13000: aebb8a81  14000: d38cc304  
15000: 35cebdc8  16000: 4c80ed74  17000: e61c0a14  18000: 8b84872c  
19000: 277165a2  1a000: 918d9732  1b000: 856b743a  1c000: 1227e188  
1d000: e07901e8  1e000: 4000065c  1f000: afe21057  20000: f24329e0  
21000: 845a40ba  22000: 7f031102  23000: ac127978  24000: 88d66c5c  
25000: a6fa36ba  26000: cbe13841  27000: a0212f95  28000: 00000000  
# head two
01000: 59e700eb  02000: 00000000  03000: 00000000  04000: 00000000  
05000: 00000000  06000: 047fdefc  07000: 3132f33a  08000: e070817b  
09000: 00000000  0a000: fc681471  0b000: ffe1a2bb  0c000: fffffd00  
0d000: 00000000  0e000: 00000000  0f000: 00000000  10000: 00000000  
11000: dc994904  12000: 9e375fc4  13000: aebb8a81  14000: d38cc304  
15000: 35cebdc8  16000: 4c80ed74  17000: e61c0a14  18000: 8b84872c  
19000: 277165a2  1a000: 918d9732  1b000: 856b743a  1c000: 1227e188  
1d000: e07901e8  1e000: 4000065c  1f000: afe21057  20000: f24329e0  
21000: 845a40ba  22000: 7f031102  23000: ac127978  24000: 88d66c5c  
25000: a6fa36ba  26000: cbe13841  27000: a0212f95  28000: 00000000  
#

But after crashing the system, the whole region is zero-filled:

crash> test
01000: 00000000  02000: 00000000  03000: 00000000  04000: 00000000  
05000: 00000000  06000: 00000000  07000: 00000000  08000: 00000000  
09000: 00000000  0a000: 00000000  0b000: 00000000  0c000: 00000000  
0d000: 00000000  0e000: 00000000  0f000: 00000000  10000: 00000000  
11000: 00000000  12000: 00000000  13000: 00000000  14000: 00000000  
15000: 00000000  16000: 00000000  17000: 00000000  18000: 00000000  
19000: 00000000  1a000: 00000000  1b000: 00000000  1c000: 00000000  
1d000: 00000000  1e000: 00000000  1f000: 00000000  20000: 00000000  
21000: 00000000  22000: 00000000  23000: 00000000  24000: 00000000  
25000: 00000000  26000: 00000000  27000: 00000000  28000: 00000000  
29000: 00000000  2a000: 00000000  2b000: 00000000  2c000: 00000000  
2d000: 00000000  2e000: 00000000  2f000: 00000000  30000: 00000000  
31000: 00000000  32000: 00000000  33000: 00000000  34000: 00000000  
35000: 00000000  36000: 00000000  37000: 00000000  38000: 00000000  
39000: 00000000  3a000: 00000000  3b000: 00000000  3c000: 00000000  
3d000: 00000000  3e000: 00000000  3f000: 00000000  40000: 00000000  
41000: 00000000  42000: 00000000  43000: 00000000  44000: 00000000  
45000: 00000000  46000: 00000000  47000: 00000000  48000: 00000000  
49000: 00000000  4a000: 00000000  4b000: 00000000  4c000: 00000000  
4d000: 00000000  4e000: 00000000  4f000: 00000000  50000: 00000000  
51000: 00000000  52000: 00000000  53000: 00000000  54000: 00000000  
55000: 00000000  56000: 00000000  57000: 00000000  58000: 00000000  
59000: 00000000  5a000: 00000000  5b000: 00000000  5c000: 00000000  
5d000: 00000000  5e000: 00000000  5f000: 00000000  60000: 00000000  
61000: 00000000  62000: 00000000  63000: 00000000  64000: 00000000  
65000: 00000000  66000: 00000000  67000: 00000000  68000: 00000000  
69000: 00000000  6a000: 00000000  6b000: 00000000  6c000: 00000000  
6d000: 00000000  6e000: 00000000  6f000: 00000000  70000: 00000000  
71000: 00000000  72000: 00000000  73000: 00000000  74000: 00000000  
75000: 00000000  76000: 00000000  77000: 00000000  78000: 00000000  
79000: 00000000  7a000: 00000000  7b000: 00000000  7c000: 00000000  
7d000: 00000000  7e000: 00000000  7f000: 00000000  80000: 00000000  
81000: 00000000  82000: 00000000  83000: 00000000  84000: 00000000  
85000: 00000000  86000: 00000000  87000: 00000000  88000: 00000000  
89000: 00000000  8a000: 00000000  8b000: 00000000  8c000: 00000000  
8d000: 00000000  8e000: 00000000  8f000: 00000000  90000: 00000000  
91000: 00000000  92000: 00000000  93000: 00000000  94000: 00000000  
95000: 00000000  96000: 00000000  97000: 00000000  98000: 00000000  
99000: 00000000  9a000: 00000000  9b000: 00000000  9c000: 00000000  
9d000: 00000000  9e000: 00000000  9f000: 00000000  
crash> 

How can that be?

Comment 54 Vivek Goyal 2011-04-18 19:38:10 UTC
I suspect that for some reason no copying has taken place into the backup region on i386, hence all zeros. That's the bug I had pointed to in previous comments and had fixed in your kexec-tools, Dave. If you use that version of kexec-tools, it might help a bit.

Comment 55 Dave Anderson 2011-04-18 19:52:53 UTC
(In reply to comment #54)
> I suspect that for some reason no copying has taken place into the backup region
> on i386, hence all zeros. That's the bug I had pointed to in previous comments
> and had fixed in your kexec-tools, Dave. If you use that version of kexec-tools,
> it might help a bit.

Kind of strange that there would be a difference in behaviour between the
RHEL5 and RHEL6 versions of kexec-tools, no?

Comment 56 Vivek Goyal 2011-04-18 20:16:15 UTC
Yes, it is strange. I think we are looking at more than one bug, and different bugs are hitting RHEL5 and RHEL6.

Comment 57 Dave Anderson 2011-04-18 20:29:57 UTC
It gets stranger...

I just ran the same test on a RHEL5 x86_64 machine:

  kernel-2.6.18-238.el5
  kexec-tools-1.102pre-131.el5

with these two entries at the top of /proc/iomem:

  00010000-0009d3ff : System RAM
  0009d400-0009ffff : reserved

On the live system, there were two pages that changed
while crash was doing the reads (14000 and 15000):

# diff one two
5,6c5,6
< 11000: 1f4f8580  12000: e003c600  13000: eed07500  14000: 8d923aa8  
< 15000: 17604046  16000: 0fa1c70e  17000: 4003d600  18000: 4003d600  
---
> 11000: 1f4f8580  12000: e003c600  13000: eed07500  14000: 8d9239fe  
> 15000: 1736911c  16000: 0fa1c70e  17000: 4003d600  18000: 4003d600  
#

But comparing them with the subsequent dumpfile showed huge
differences.  Note that the "deadbeef" entries in the "two"
file were because those pages are not RAM, and therefore
were not readable on the live system (see the /proc/iomem
above).  So I'm not entirely sure why they even show up
and are readable in the vmcore?

But more questionable are all of the differences with the dozens
of other "modified" pages:

# diff two dump
1,9c1,9
< 01000: deadbeef  02000: deadbeef  03000: deadbeef  04000: deadbeef  
< 05000: deadbeef  06000: deadbeef  07000: deadbeef  08000: deadbeef  
< 09000: deadbeef  0a000: deadbeef  0b000: deadbeef  0c000: deadbeef  
< 0d000: deadbeef  0e000: deadbeef  0f000: deadbeef  10000: 00036129  
< 11000: 1f4f8580  12000: e003c600  13000: eed07500  14000: 8d9239fe  
< 15000: 1736911c  16000: 0fa1c70e  17000: 4003d600  18000: 4003d600  
< 19000: 00f3f2fd  1a000: ffffff20  1b000: fffffe00  1c000: ffffff00  
< 1d000: 00000000  1e000: 00000000  1f000: 00000000  20000: fffbfff3  
< 21000: 00000000  22000: 00000000  23000: 00000000  24000: 3965e7be  
---
> 01000: 1f4f8580  02000: e003c600  03000: eed07500  04000: 8d923a5a  
> 05000: 17547805  06000: 0fa1c70e  07000: 4003d600  08000: 4003d600  
> 09000: 00f3f2fd  0a000: ffffff20  0b000: fffffe00  0c000: ffffff00  
> 0d000: 00000000  0e000: 00000000  0f000: 00000000  10000: fffbfff3  
> 11000: 00000000  12000: 00000000  13000: 00000000  14000: 3965e7be  
> 15000: 00000000  16000: 00000000  17000: 00000000  18000: 00000000  
> 19000: 00000000  1a000: 00000000  1b000: 00000000  1c000: a4a05682  
> 1d000: 00000000  1e000: 00000000  1f000: 00000000  20000: 00000000  
> 21000: 00000000  22000: 00000000  23000: 00000000  24000: 00000000  
11c11
< 29000: 00000000  2a000: 00000000  2b000: 00000000  2c000: a4a05682  
---
> 29000: 00000000  2a000: 00000000  2b000: 00000000  2c000: 00000000  
17c17
< 41000: 00000000  42000: 00000000  43000: 00000000  44000: 00000000  
---
> 41000: 00000000  42000: 00000000  43000: 96e10a82  44000: 00000000  
21,22c21,22
< 51000: 00000000  52000: 00000000  53000: 96e10a82  54000: 00000000  
< 55000: 00000000  56000: 00000000  57000: 00000000  58000: 00000000  
---
> 51000: 00000000  52000: 00000000  53000: 00000000  54000: 00000000  
> 55000: 781f3f6c  56000: a4e103b5  57000: 324bf759  58000: 746fcb85  
24,40c24,40
< 5d000: 00000000  5e000: 00000000  5f000: 00000000  60000: 00000000  
< 61000: 00000000  62000: 00000000  63000: 00000000  64000: 00000000  
< 65000: 781f3f6c  66000: a4e103b5  67000: 324bf759  68000: 746fcb85  
< 69000: 00000000  6a000: 00000000  6b000: 00000000  6c000: 00000000  
< 6d000: 00000000  6e000: 00000000  6f000: 00000000  70000: 61827eee  
< 71000: 2bea141a  72000: 73fced95  73000: 37a88a30  74000: 2fcf1ab7  
< 75000: 4ff9ab91  76000: 78a806d5  77000: 04ec04f8  78000: 881a47c4  
< 79000: 1fc4ddc2  7a000: 857bc1ad  7b000: 562203d5  7c000: 97d0ab30  
< 7d000: 419df3dd  7e000: 24d3e9bd  7f000: e1c36deb  80000: 00000000  
< 81000: 00000000  82000: 00000000  83000: 00000000  84000: 00000000  
< 85000: 00000000  86000: 00000000  87000: 00000000  88000: 00000000  
< 89000: 00000000  8a000: 00000000  8b000: 9d31dbd1  8c000: 00000000  
< 8d000: 00000000  8e000: 00000000  8f000: 00000000  90000: 9881149e  
< 91000: 6cb15c7e  92000: 00000000  93000: 00000000  94000: 00000000  
< 95000: 00000000  96000: 00000000  97000: 00000000  98000: a3a20fea  
< 99000: 3fa13608  9a000: 00000000  9b000: 00000000  9c000: 00000000  
< 9d000: deadbeef  9e000: deadbeef  9f000: deadbeef  
---
> 5d000: 00000000  5e000: 00000000  5f000: 00000000  60000: 61827eee  
> 61000: 2bea141a  62000: 73fced95  63000: 37a88a30  64000: 2fcf1ab7  
> 65000: 4ff9ab91  66000: 78a806d5  67000: 04ec04f8  68000: 881a47c4  
> 69000: 1fc4ddc2  6a000: 857bc1ad  6b000: 562203d5  6c000: 97d0ab30  
> 6d000: 419df3dd  6e000: 24d3e9bd  6f000: e1c36deb  70000: 00000000  
> 71000: 00000000  72000: 00000000  73000: 00000000  74000: 00000000  
> 75000: 00000000  76000: 00000000  77000: 00000000  78000: 00000000  
> 79000: 00000000  7a000: 00000000  7b000: 9d31dbd1  7c000: 00000000  
> 7d000: 00000000  7e000: 00000000  7f000: 00000000  80000: 9881149e  
> 81000: 6cb15c7e  82000: 00000000  83000: 00000000  84000: 00000000  
> 85000: 00000000  86000: 00000000  87000: 00000000  88000: a3a20fea  
> 89000: 3fa13608  8a000: 00000000  8b000: 00000000  8c000: 00000000  
> 8d000: 00000000  8e000: 64aa1d1a  8f000: e7c10a27  90000: 24328a51  
> 91000: 8367eb29  92000: 83710699  93000: 87e26201  94000: 87eb9d61  
> 95000: 838c9cd9  96000: 8395944a  97000: 839eafb9  98000: 83a7cb29  
> 99000: 83b0e699  9a000: 8822b201  9b000: 882bed61  9c000: 83cc7cd9  
> 9d000: 83d5744a  9e000: 83de8fb9  9f000: 83e7ab29  
# 

In any case, I'm going to step back and let Amerigo look into this.

FYI, the change to the crash utility's test.c file to make the "test"
command show the above is this:

diff -r1.4 test.c
44a45,68
> {
>       int i;
>       ulonglong paddr;
>       ulong *p, *start, *end;
>       uint32_t chksum;
>       char buf[4096];
> 
>       for (i = 0, paddr = 0x1000; paddr < (0x9ffff+1); paddr += 4096) {
>                 if (!readmem(paddr, PHYSADDR, buf, PAGESIZE(),
>                     "check page", QUIET|RETURN_ON_ERROR)) {
>                       chksum = 0xdeadbeef;
>               } else {
>                       start = (ulong *)&buf[0];
>                       end = (ulong *)&buf[4096];
>                       chksum = 0;
> 
>                       for (p = start; p < end; p++)
>                               chksum += *p;
>               }
>               fprintf(fp, "%05llx: %08lx  ", paddr, chksum);
>               if ((++i % 4) == 0)
>                       fprintf(fp, "\n");
>       }
> }

Comment 58 Dave Anderson 2011-04-18 21:02:05 UTC
> In any case, I'm going to step back and let Amerigo look into this.

But before I do, for my own sanity, I reverted the kexec-tools back 
to kexec-tools-1.102pre-126.el5, and ran the test again on the same
RHEL5 x86_64 machine.  

Similar to the previous run, pages 14000 and 15000 changed during
the live system reads.  So those two pages are quite active, and 
continually undergoing modification:
 
# diff one two
5,6c5,6
< 11000: 1f507580  12000: e003c600  13000: ed64d500  14000: 8d91fde5  
< 15000: 07a537f1  16000: 0fa1c70e  17000: 4003d600  18000: 4003d600  
---
> 11000: 1f507580  12000: e003c600  13000: ed64d500  14000: 8d91fda7  
> 15000: 0ced3a56  16000: 0fa1c70e  17000: 4003d600  18000: 4003d600  
#

But comparing the live system against the dumpfile -- ignoring the "deadbeef"
entries -- the only differences were with those same two pages at 14000
and 15000:

# diff two dump
1,6c1,6
< 01000: deadbeef  02000: deadbeef  03000: deadbeef  04000: deadbeef  
< 05000: deadbeef  06000: deadbeef  07000: deadbeef  08000: deadbeef  
< 09000: deadbeef  0a000: deadbeef  0b000: deadbeef  0c000: deadbeef  
< 0d000: deadbeef  0e000: deadbeef  0f000: deadbeef  10000: 00036129  
< 11000: 1f507580  12000: e003c600  13000: ed64d500  14000: 8d91fda7  
< 15000: 0ced3a56  16000: 0fa1c70e  17000: 4003d600  18000: 4003d600  
---
> 01000: 74780c21  02000: a4d62ea7  03000: 55874745  04000: 3e96a858  
> 05000: fe64aad0  06000: 9ede145c  07000: 004050c6  08000: a9419b6c  
> 09000: bb432d72  0a000: 0350aedc  0b000: 9d4fe8aa  0c000: 47c84b9e  
> 0d000: 19f5acf8  0e000: 01a7faf4  0f000: c6048f06  10000: 00036129  
> 11000: 1f507580  12000: e003c600  13000: ed64d500  14000: 8d91fddf  
> 15000: 14cb5590  16000: 0fa1c70e  17000: 4003d600  18000: 4003d600  
40c40
< 9d000: deadbeef  9e000: deadbeef  9f000: deadbeef  
---
> 9d000: 9c3e9946  9e000: c45712ee  9f000: 8372a59e  
#

This patch is clearly a disaster...

Comment 59 Dave Anderson 2011-04-19 14:31:45 UTC
> Because some of the pages have matching checksums, maybe we are not setting
> the backup region pointer properly in the ELF header, maybe vmcore is reading
> the backup region pages from the original memory (which now belongs to the
> second kernel), and maybe the second kernel has modified some of the pages.
>
> Amerigo, a few printks in vmcore and kexec-tools will be handy to figure this
> out.

I retested the x86_64 version, and like before, the 0x1000-0x9ffff region
in the dumpfile is drastically different than the pre-crash memory, where
well over 50 pages are different.  I can understand that one or two pages
may be modified just prior to the crash occurring, but that is ridiculous.

To be completely confident of a fix, I would suggest doing something
like this in a test kernel:

 sys_kexec_load()
   allocate a 640K region (it would have to be an order-8 alloc_pages()
   request), and keep a symbolic pointer to the region.

 crash_kexec()
   just prior to calling machine_kexec(), copy the region memory into
   the allocated buffer

And then compare the two regions in the subsequent dumpfile, which
should be identical.

I'll try building/testing a kernel patch as a proof of concept.
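
In rough pseudo-patch form (untested sketch; the buffer name is made up):

/*
 * Untested sketch of the proof of concept: snapshot the first 640K at crash
 * time so the dumpfile contents can later be compared against a known-good
 * copy.
 */
static void *low_640k_snapshot;

/* in sys_kexec_load(): set aside an order-8 buffer (1MB, enough for 640K) */
low_640k_snapshot = (void *)__get_free_pages(GFP_KERNEL, 8);

/* in crash_kexec(), just before machine_kexec(): copy 0x1000-0x9ffff */
if (low_640k_snapshot)
        memcpy(low_640k_snapshot, __va(0x1000), 0x9f000);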

Comment 60 Cong Wang 2011-04-19 15:34:21 UTC
Created attachment 493217 [details]
Proposed patch to fix the issue on PAE

Comment 61 Vivek Goyal 2011-04-19 15:47:19 UTC
Amerigo,

Can you explain a bit what the problem was and how it got solved by replacing these local variables with so many elf32, elf64 and elf variables?

vivek

Comment 62 Cong Wang 2011-04-19 15:54:39 UTC
(In reply to comment #61)
Yeah, because in kexec/crashdump-elf.c we do:

                if (mstart == elf_info->backup_src_start
                    && mend == elf_info->backup_src_end)
                        phdr->p_offset  = info->backup_start;

so if elf_info->backup_src_start is not correctly initialized, the ->p_offset will not be translated to our backup area, that is, ->backup_start.

Comment 63 Vivek Goyal 2011-04-19 16:05:32 UTC
Ok, so elf_info was not being filled in properly.

Why are we storing this info in two places, kexec_info and crash_elf_info? Can't we store it in kexec_info and use it everywhere?

Comment 64 Dave Anderson 2011-04-19 18:57:52 UTC
> Proposed patch to fix the issue on PAE

I have rebuilt kexec-tools with this patch, and tested it on 
an i386 PAE kernel, and it works fine.  That's good...

However, I did the same thing with an x86_64 kernel, applied
the patch to its kexec-tools, and I still see unexplainable
differences between the live system and dumpfile.  

Note that this x86_64 machine shows:

 # head /proc/iomem
 00010000-0009d3ff : System RAM
 0009d400-0009ffff : reserved
 ...

so the first 16 pages are not even RAM.  Therefore, on the live
system, I cannot read them, so "deadbeef" has been filled
in as a marker in the "two.fix" file.  But I can read them
from the dumpfile (?) -- which should raise a red flag -- but
not only that, there are huge differences in the other pages
in the 640K region: 

# diff two.fix dump.fix
1,9c1,9
< 01000: deadbeef  02000: deadbeef  03000: deadbeef  04000: deadbeef  
< 05000: deadbeef  06000: deadbeef  07000: deadbeef  08000: deadbeef  
< 09000: deadbeef  0a000: deadbeef  0b000: deadbeef  0c000: deadbeef  
< 0d000: deadbeef  0e000: deadbeef  0f000: deadbeef  10000: 00036129  
< 11000: 5f6a2580  12000: e003c600  13000: ed6bd500  14000: 8d91a577  
< 15000: 114b5956  16000: 0fa1c70e  17000: 4003d600  18000: 4003d600  
< 19000: 01160bdd  1a000: ffffff20  1b000: fffffe00  1c000: ffffff00  
< 1d000: 00000000  1e000: 00000000  1f000: 00000000  20000: fffbfff3  
< 21000: 00000000  22000: 00000000  23000: 00000000  24000: 80fb2fbe  
---
> 01000: 5f6a2580  02000: e003c600  03000: ed6bd500  04000: 8d91a582  
> 05000: 147acb8e  06000: 0fa1c70e  07000: 4003d600  08000: 4003d600  
> 09000: 01160bdd  0a000: ffffff20  0b000: fffffe00  0c000: ffffff00  
> 0d000: 00000000  0e000: 00000000  0f000: 00000000  10000: fffbfff3  
> 11000: 00000000  12000: 00000000  13000: 00000000  14000: 80fb2fbe  
> 15000: 00000000  16000: 00000000  17000: 00000000  18000: 00000000  
> 19000: 00000000  1a000: 00000000  1b000: 00000000  1c000: a4a05682  
> 1d000: 00000000  1e000: 00000000  1f000: 00000000  20000: 00000000  
> 21000: 00000000  22000: 00000000  23000: 00000000  24000: 00000000  
11c11
< 29000: 00000000  2a000: 00000000  2b000: 00000000  2c000: a4a05682  
---
> 29000: 00000000  2a000: 00000000  2b000: 00000000  2c000: 00000000  
17c17
< 41000: 00000000  42000: 00000000  43000: 00000000  44000: 00000000  
---
> 41000: 00000000  42000: 00000000  43000: 96e10a82  44000: 00000000  
21,22c21,22
< 51000: 00000000  52000: 00000000  53000: 96e10a82  54000: 00000000  
< 55000: 00000000  56000: 00000000  57000: 00000000  58000: 00000000  
---
> 51000: 00000000  52000: 00000000  53000: 00000000  54000: 00000000  
> 55000: 781f3f6c  56000: a4e103b5  57000: 324bf759  58000: 7470b818  
24,40c24,40
< 5d000: 00000000  5e000: 00000000  5f000: 00000000  60000: 00000000  
< 61000: 00000000  62000: 00000000  63000: 00000000  64000: 00000000  
< 65000: 781f3f6c  66000: a4e103b5  67000: 324bf759  68000: 7470b818  
< 69000: 00000000  6a000: 00000000  6b000: 00000000  6c000: 00000000  
< 6d000: 00000000  6e000: 00000000  6f000: 00000000  70000: 61827eee  
< 71000: 2bea141a  72000: 73fced95  73000: 37a88a30  74000: 2fcf1ab7  
< 75000: 4ff9ab91  76000: 78a806d5  77000: f4a08f3c  78000: 881a47c4  
< 79000: 1fc4ddc2  7a000: 857bc1ad  7b000: 562203d5  7c000: 97d0ab30  
< 7d000: 419df3dd  7e000: 24d3e9bd  7f000: e1c36deb  80000: 00000000  
< 81000: 00000000  82000: 00000000  83000: 00000000  84000: 00000000  
< 85000: 00000000  86000: 00000000  87000: 00000000  88000: 00000000  
< 89000: 00000000  8a000: 00000000  8b000: 9d31dbd1  8c000: 00000000  
< 8d000: 00000000  8e000: 00000000  8f000: 00000000  90000: 9881149e  
< 91000: 6cb15c7e  92000: 00000000  93000: 00000000  94000: 00000000  
< 95000: 00000000  96000: 00000000  97000: 00000000  98000: 287e0fea  
< 99000: 3fa13608  9a000: 00000000  9b000: 00000000  9c000: 00000000  
< 9d000: deadbeef  9e000: deadbeef  9f000: deadbeef  
---
> 5d000: 00000000  5e000: 00000000  5f000: 00000000  60000: 61827eee  
> 61000: 2bea141a  62000: 73fced95  63000: 37a88a30  64000: 2fcf1ab7  
> 65000: 4ff9ab91  66000: 78a806d5  67000: f4a08f3c  68000: 881a47c4  
> 69000: 1fc4ddc2  6a000: 857bc1ad  6b000: 562203d5  6c000: 97d0ab30  
> 6d000: 419df3dd  6e000: 24d3e9bd  6f000: e1c36deb  70000: 00000000  
> 71000: 00000000  72000: 00000000  73000: 00000000  74000: 00000000  
> 75000: 00000000  76000: 00000000  77000: 00000000  78000: 00000000  
> 79000: 00000000  7a000: 00000000  7b000: 9d31dbd1  7c000: 00000000  
> 7d000: 00000000  7e000: 00000000  7f000: 00000000  80000: 9881149e  
> 81000: 6cb15c7e  82000: 00000000  83000: 00000000  84000: 00000000  
> 85000: 00000000  86000: 00000000  87000: 00000000  88000: 287e0fea  
> 89000: 3fa13608  8a000: 00000000  8b000: 00000000  8c000: 00000000  
> 8d000: 00000000  8e000: 64c2bd1a  8f000: e7c10a27  90000: 24328a51  
> 91000: 8367eb29  92000: 83710699  93000: 87e26201  94000: 87eb9d61  
> 95000: 838c9cd9  96000: 8395944a  97000: 839eafb9  98000: 83a7cb29  
> 99000: 83b0e699  9a000: 8822b201  9b000: 882bed61  9c000: 83cc7cd9  
> 9d000: 83d5744a  9e000: 83de8fb9  9f000: 83e7ab29  
# 

I'm going to retest that x86_64 machine with 
kexec-tools-1.102pre-126.el5 and see what the
region looks like before-and-after.

Comment 65 Dave Anderson 2011-04-19 19:16:19 UTC
> I'm going to retest that x86_64 machine with 
> kexec-tools-1.102pre-126.el5 and see what the
> region looks like before-and-after.

OK, with kexec-tools-1.102pre-126.el5, I see the following:

# diff two dump
1,6c1,6
< 01000: deadbeef  02000: deadbeef  03000: deadbeef  04000: deadbeef  
< 05000: deadbeef  06000: deadbeef  07000: deadbeef  08000: deadbeef  
< 09000: deadbeef  0a000: deadbeef  0b000: deadbeef  0c000: deadbeef  
< 0d000: deadbeef  0e000: deadbeef  0f000: deadbeef  10000: 00036129  
< 11000: 1f4f4580  12000: e003c600  13000: ed6bd500  14000: 8d916668  
< 15000: 10c429f0  16000: 0fa1c70e  17000: 4003d600  18000: 4003d600  
---
> 01000: b759da3b  02000: a4d62ea7  03000: 55874745  04000: 3e96a858  
> 05000: fe64aad0  06000: 9ede145c  07000: 004050c6  08000: a9419b6c  
> 09000: bb432d72  0a000: 0350aedc  0b000: 9d4fe8aa  0c000: 47c84b9e  
> 0d000: 19f5acf8  0e000: 01a7faf4  0f000: c6048f06  10000: 00036129  
> 11000: 1f4f4580  12000: e003c600  13000: ed6bd500  14000: 8d916666  
> 15000: 12ec9bd2  16000: 0fa1c70e  17000: 4003d600  18000: 4003d600  
40c40
< 9d000: deadbeef  9e000: deadbeef  9f000: deadbeef  
---
> 9d000: 9c3e9946  9e000: c45712ee  9f000: 8372a59e  
#

So with the "old" kexec-tools, we still see the non-RAM (deadbeef)
pages somehow being made available (with some kind of data) in the
vmcore.  So at least that issue is not regressing...

The other differences above also show up when taking two subsequent
readings on the live system, where the pages at 14000 and 15000 are
constantly churning:

# diff one two
6c6
< 15000: 10bdbcb6  16000: 0fa1c70e  17000: 4003d600  18000: 4003d600  
---
> 15000: 10c429f0  16000: 0fa1c70e  17000: 4003d600  18000: 4003d600  
#

So AFAICT -- with 131.el5 plus the newest patch -- the memory in 
the x86_64 640K region is very different in the dumpfile than it
was live.

Comment 66 Dave Anderson 2011-04-19 19:55:21 UTC
As far as RHEL6 i386 is concerned, I don't know how it's possible to
create an ELF vmcore with RHEL6?  I've set this in /etc/kdump.conf: 

  ext4 /dev/mapper/vg_dellpe295002-lv_root
  core_collector cp --sparse=always

But it still creates a compressed and filtered "-c -d 31" dumpfile.

Anyway, with kexec-tools-2.0.0-186.el6.i686 and this:

  # head /proc/iomem
  00000000-00000fff : reserved
  00001000-0009ffff : System RAM
  ...

the 640K range does not change on the live system between two
consecutive reads:

 # diff one two
 #

But when I look at the dumpfile, I see the following differences,
where in the live system there is data in some of the pages, but
in the dumpfile they are either returning all zeroes, or are "deadbeef",
which means that the page is marked as not available in the dumpfile:

# diff two dump
1,40c1,40
< 01000: 69a4c39d  02000: 00000000  03000: 00000000  04000: 00000000  
< 05000: 00000000  06000: 047fdefc  07000: 3132f33a  08000: e070817b  
< 09000: 00000000  0a000: fc681471  0b000: 7fdda29f  0c000: fffffd00  
< 0d000: 00000000  0e000: 00000000  0f000: 00000000  10000: 00000000  
< 11000: dc994904  12000: 9e375fc4  13000: aebb8a81  14000: d38cc304  
< 15000: 35cebdc8  16000: 4c80ed74  17000: e61c0a14  18000: 8b84872c  
< 19000: 277165a2  1a000: 918d9732  1b000: 856b743a  1c000: 1227e188  
< 1d000: e07901e8  1e000: 4000065c  1f000: afe21057  20000: f24329e0  
< 21000: 845a40ba  22000: 7f031102  23000: ac127978  24000: 88d66c5c  
< 25000: a6fa36ba  26000: cbe13841  27000: a0212f95  28000: 00000000  
< 29000: 00000000  2a000: 00000000  2b000: 00000000  2c000: 00000000  
< 2d000: 00000000  2e000: 00000000  2f000: 00000000  30000: 00000000  
< 31000: 1b4910ad  32000: 8b31f46d  33000: 00000000  34000: 00000000  
< 35000: 00000000  36000: 00000000  37000: 00000000  38000: 00000000  
< 39000: 00000000  3a000: 00000000  3b000: 00000000  3c000: 00000000  
< 3d000: 00000000  3e000: 00000000  3f000: 00000000  40000: 00000000  
< 41000: 00000000  42000: 00000000  43000: 00000000  44000: 00000000  
< 45000: 00000000  46000: 00000000  47000: 00000000  48000: 00000000  
< 49000: 00000000  4a000: 00000000  4b000: 00000000  4c000: 00000000  
< 4d000: 00000000  4e000: 00000000  4f000: 00000000  50000: 00000000  
< 51000: 00000000  52000: 00000000  53000: 00000000  54000: 00000000  
< 55000: 00000000  56000: 00000000  57000: 00000000  58000: 9e0724c0  
< 59000: 00000000  5a000: 00000000  5b000: 00000000  5c000: 00000000  
< 5d000: 00000000  5e000: 00000000  5f000: 00000000  60000: 00000000  
< 61000: 00000000  62000: 00000000  63000: 00000000  64000: 03bf159b  
< 65000: dc1d3011  66000: f5edc5f1  67000: 6ee897d0  68000: 637597f5  
< 69000: 00000000  6a000: 00000000  6b000: 00000000  6c000: 00000000  
< 6d000: 00000000  6e000: 00000000  6f000: 00000000  70000: 2972a1fe  
< 71000: 8bc254fc  72000: ec3c695a  73000: cdec06d3  74000: 36906ac5  
< 75000: c993dd62  76000: aef525c6  77000: 9aaf5c1c  78000: 2ca6bbdb  
< 79000: 845a40ba  7a000: 7f031102  7b000: 0b35592d  7c000: 88d66c5c  
< 7d000: a6fa36ba  7e000: 0997e3bc  7f000: a81ce446  80000: 00000000  
< 81000: 00000000  82000: 00000000  83000: 00000000  84000: 00000000  
< 85000: 00000000  86000: 00000000  87000: 595b041b  88000: 00000000  
< 89000: 00000000  8a000: 00000000  8b000: 00000000  8c000: 00000000  
< 8d000: 00000000  8e000: 00000000  8f000: 00000000  90000: 128abf4e  
< 91000: 890c948f  92000: f31f6236  93000: 27005a2a  94000: 6e2cfe18  
< 95000: c5c152d4  96000: 00000000  97000: 00000000  98000: 2a90a338  
< 99000: 8dbdf809  9a000: 00000000  9b000: 00000000  9c000: 00000000  
< 9d000: 00000000  9e000: 00000000  9f000: db524b1d  
---
> 01000: 00000000  02000: 00000000  03000: 00000000  04000: 00000000  
> 05000: 00000000  06000: 00000000  07000: 00000000  08000: 00000000  
> 09000: 00000000  0a000: 00000000  0b000: deadbeef  0c000: deadbeef  
> 0d000: deadbeef  0e000: deadbeef  0f000: deadbeef  10000: deadbeef  
> 11000: deadbeef  12000: deadbeef  13000: deadbeef  14000: deadbeef  
> 15000: deadbeef  16000: deadbeef  17000: deadbeef  18000: deadbeef  
> 19000: deadbeef  1a000: deadbeef  1b000: deadbeef  1c000: deadbeef  
> 1d000: deadbeef  1e000: deadbeef  1f000: deadbeef  20000: deadbeef  
> 21000: deadbeef  22000: deadbeef  23000: deadbeef  24000: deadbeef  
> 25000: deadbeef  26000: deadbeef  27000: deadbeef  28000: deadbeef  
> 29000: deadbeef  2a000: deadbeef  2b000: deadbeef  2c000: deadbeef  
> 2d000: deadbeef  2e000: deadbeef  2f000: deadbeef  30000: deadbeef  
> 31000: deadbeef  32000: deadbeef  33000: deadbeef  34000: deadbeef  
> 35000: deadbeef  36000: deadbeef  37000: deadbeef  38000: deadbeef  
> 39000: deadbeef  3a000: deadbeef  3b000: deadbeef  3c000: deadbeef  
> 3d000: deadbeef  3e000: deadbeef  3f000: deadbeef  40000: deadbeef  
> 41000: deadbeef  42000: deadbeef  43000: deadbeef  44000: deadbeef  
> 45000: deadbeef  46000: deadbeef  47000: deadbeef  48000: deadbeef  
> 49000: deadbeef  4a000: deadbeef  4b000: deadbeef  4c000: deadbeef  
> 4d000: deadbeef  4e000: deadbeef  4f000: deadbeef  50000: deadbeef  
> 51000: deadbeef  52000: deadbeef  53000: deadbeef  54000: deadbeef  
> 55000: deadbeef  56000: deadbeef  57000: deadbeef  58000: deadbeef  
> 59000: deadbeef  5a000: deadbeef  5b000: deadbeef  5c000: deadbeef  
> 5d000: deadbeef  5e000: deadbeef  5f000: deadbeef  60000: deadbeef  
> 61000: deadbeef  62000: deadbeef  63000: deadbeef  64000: deadbeef  
> 65000: deadbeef  66000: deadbeef  67000: deadbeef  68000: deadbeef  
> 69000: deadbeef  6a000: deadbeef  6b000: deadbeef  6c000: deadbeef  
> 6d000: deadbeef  6e000: deadbeef  6f000: deadbeef  70000: deadbeef  
> 71000: deadbeef  72000: deadbeef  73000: deadbeef  74000: deadbeef  
> 75000: deadbeef  76000: deadbeef  77000: deadbeef  78000: deadbeef  
> 79000: deadbeef  7a000: deadbeef  7b000: deadbeef  7c000: deadbeef  
> 7d000: deadbeef  7e000: deadbeef  7f000: deadbeef  80000: deadbeef  
> 81000: deadbeef  82000: deadbeef  83000: deadbeef  84000: deadbeef  
> 85000: deadbeef  86000: deadbeef  87000: deadbeef  88000: deadbeef  
> 89000: deadbeef  8a000: deadbeef  8b000: deadbeef  8c000: deadbeef  
> 8d000: deadbeef  8e000: deadbeef  8f000: deadbeef  90000: deadbeef  
> 91000: deadbeef  92000: deadbeef  93000: deadbeef  94000: deadbeef  
> 95000: deadbeef  96000: deadbeef  97000: deadbeef  98000: deadbeef  
> 99000: deadbeef  9a000: deadbeef  9b000: deadbeef  9c000: deadbeef  
> 9d000: deadbeef  9e000: deadbeef  9f000: 00000000  
# 

So that's why I tried to configure an ELF vmcore with no filtering,
to determine whether the memory would be passed through.

On the other hand, the first few pages are of interest:

# diff two dump
1,40c1,40
< 01000: 69a4c39d  02000: 00000000  03000: 00000000  04000: 00000000  
< 05000: 00000000  06000: 047fdefc  07000: 3132f33a  08000: e070817b  
< 09000: 00000000  0a000: fc681471  0b000: 7fdda29f  0c000: fffffd00
...
> 01000: 00000000  02000: 00000000  03000: 00000000  04000: 00000000  
> 05000: 00000000  06000: 00000000  07000: 00000000  08000: 00000000  
> 09000: 00000000  0a000: 00000000  0b000: deadbeef  0c000: deadbeef

If the vmcore were being created "correctly", why doesn't the dumpfile
have the same data at 01000, 06000, 07000, 08000 and 0a000?  And for
that matter why aren't the other 00000000 pages being filtered out,
given that DUMP_EXCLUDE_ZERO is being used?

In any case, I'm happy that this configuration works with the
new patch:

  (1) RHEL5 i386
 
But I'm still not convinced that these configurations work:

  (1) RHEL5 x86_64  (even with the new patch)
  (2) RHEL6 i386 

Note that I haven't tested RHEL6 x86_64.

Comment 74 errata-xmlrpc 2012-02-21 03:17:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0152.html

