Created attachment 479340 [details]
the panic message

Description of problem:
The kexec (capture) kernel crashes during its boot. It treats a range that was marked as reserved during the normal boot as "usable", and this apparently crashes the kernel. The customer hit this issue with 5.3, and further investigation by the vendor shows the symptom is still present in RHEL 5.6. The hardware is a BladeSymphony 320A5, which is certified with a kbase exception for 3rd-party devices.

Version-Release number of selected component (if applicable):
- OS: RHEL5.4.6 (x86_64), 2.6.18-164.15.1.el5 (also present with 5.6).
- Related package: kexec-tools-1.102pre-77.el5.3

How reproducible:
Always

Steps to Reproduce:
1. Set up kdump
2. Run "service kdump start"
3. Run "echo c > /proc/sysrq-trigger"

Actual results:
The kexec kernel crashes.

Expected results:
Capturing the vmcore should succeed.

Additional info:
That range gets added unilaterally because some BIOSes erroneously report it as reserved, but still require it to be usable for booting to work. I recommend we add a command-line switch to enable/disable the unilateral adding of this range. Amerigo, can you implement that?
(In reply to comment #2)
> That range gets added unilaterally because some BIOSes erroneously report it
> as reserved, but still require it to be usable for booting to work. I
> recommend we add a command-line switch to enable/disable the unilateral
> adding of this range. Amerigo, can you implement that?

If BIOSes erroneously report it as reserved, and the kernel requires the first 640KB to boot, then wouldn't the first kernel's boot also fail? Or does the kernel have a mechanism to determine that the BIOS is wrong and override it selectively? If the kernel has such a mechanism, then we can use it in kexec-tools as well.

I added the 640K block because different BIOSes were reporting the first 640K or 1MB in different formats. To me it makes sense to trust the BIOS information and use that memory map. If the BIOS says that the first 640KB is reserved, so be it.

A command-line switch to change this behavior would be more useful when the BIOS is giving us wrong information and we want to override it. But that leads us back to the question of how the first kernel itself manages to boot if the BIOS is buggy on those machines.

A command-line option makes the whole operation more manual, and somebody still has to do the job of parsing the memory map and pass the option accordingly, so why not let kexec-tools do that? We already trust the BIOS for the rest of the memory map and pass it as-is to the second kernel, so why not trust the BIOS for the first 640KB.
Vivek, I don't think we have any way to know whether the BIOS is wrong; the kernel is responsible for this, not userspace.

To me, it is entirely the kernel's job to provide a correct memory map for kexec-tools to use; otherwise it would be a kernel bug. So I am fine with the patch in comment #1.

I am wondering why this problem wasn't exposed before; the current code seems to work fine for most cases. Since you wrote the original code, you should know better than me here. :)
(In reply to comment #9)
> Vivek, I don't think we have any way to know whether the BIOS is wrong; the
> kernel is responsible for this, not userspace.

I think in general we should be able to trust /proc/iomem. In the past there was a discussion that we should export the raw BIOS-provided memory map to user space, with /proc/iomem being the BIOS map plus kernel modifications. I think the raw BIOS memory map moved to debugfs along with some firmware entries, but I don't remember exactly.

> To me, it is entirely the kernel's job to provide a correct memory map for
> kexec-tools to use; otherwise it would be a kernel bug. So I am fine with the
> patch in comment #1.

I do not think the patch in comment #1 will work as-is for all situations.

- We required the first 640K for the kernel to boot (at least in the past). So unless we have information that this is no longer required, we should not drop it by default. IIUC, the above patch will not provide the first 640K of memory to the second kernel on all machines.

- So we need to modify the patch so that all the non-reserved memory in the first 640K in /proc/iomem is given to the second kernel for use (see the sketch below). In the process we need to make sure we generate the right vmcore ELF headers and copy the right backup area in purgatory.

> I am wondering why this problem wasn't exposed before; the current code seems
> to work fine for most cases. Since you wrote the original code, you should
> know better than me here. :)

I think it's because this is the first machine we have encountered where accessing the first 640KB is a problem and it is supposed to be a reserved area. On my laptop /proc/iomem looks as follows:

00000000-00000fff : reserved
00001000-0009efff : System RAM
0009f000-0009ffff : reserved

So most of the first 640KB is usable, except some memory at the beginning and some at the end.

Having said that, I think fixing all this in a generic manner will require a few more changes, which should first get soaked upstream and then be pulled into RHEL5. So I would say: let's put the generic fix in kexec-tools upstream, and if it turns out to be big and risky, then for RHEL5 you can push a command-line shortcut where the first 640K is bypassed if the user passes some option. That command-line fix would only be a workaround for RHEL5, not a generic fix for upstream.
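To make the second bullet above concrete, here is a minimal, illustrative sketch -- not the actual kexec-tools code -- of how a tool could walk /proc/iomem and collect only the "System RAM" ranges that fall below 640K, instead of unconditionally assuming 0-640K is usable:

/* Illustrative only: not the actual kexec-tools parser. */
#include <stdio.h>
#include <string.h>

int main(void)
{
	char line[256];
	unsigned long long start, end;
	FILE *f = fopen("/proc/iomem", "r");

	if (!f)
		return 1;
	while (fgets(line, sizeof(line), f)) {
		if (sscanf(line, "%llx-%llx", &start, &end) != 2)
			continue;
		if (!strstr(line, "System RAM") || start >= 0xa0000)
			continue;
		if (end >= 0xa0000)
			end = 0xa0000 - 1;	/* clamp to the 640K boundary */
		printf("usable low range: %#llx-%#llx\n", start, end);
	}
	fclose(f);
	return 0;
}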
Created attachment 481527 [details]
Patch against latest upstream
Created attachment 481566 [details]
Patch against latest RHEL5

 kexec/arch/i386/crashdump-x86.c          |   81 ++++++++++++++++++++++---------
 kexec/arch/i386/include/arch/options.h   |    4 +
 kexec/arch/i386/kexec-x86.h              |    1
 kexec/arch/x86_64/crashdump-x86_64.c     |   68 ++++++++++++++++++++++----
 kexec/arch/x86_64/include/arch/options.h |    5 +
 kexec/arch/x86_64/kexec-x86_64.c         |   14 ++++-
 kexec/kexec.h                            |    2
 purgatory/arch/i386/crashdump_backup.c   |   17 +++++-
 8 files changed, 157 insertions(+), 35 deletions(-)
> RHEL6 doesn't have this bug on the same machine.

I'm not sure I agree. If the PMD page used for translating vmalloc addresses is not one of the suspect physical page(s), then crash wouldn't complain during initialization. There would just be a "corruption" lurking, and depending upon how the page(s) get used, you might never see a problem. Again, it was by luck that the crash utility happened to bump into that page, because vmalloc address translations happened to use physical address 12000.

If you do a "vtop" on a RHEL6 vmalloc address in a RHEL6 dumpfile, what does it show?
Amerigo,

It might be a good idea to introduce some debug capability in kexec so that one can easily print out the ELF headers generated for the second kernel. I am specifically interested in knowing that we set up the backup ELF header properly, pointing to the right backup region in the reserved area.

We also need to debug vmcore to make sure it is parsing the headers properly and getting the contents from the backup area properly. Maybe run crash on the live system and on the vmcore and compare the contents of some page in the backup area.
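For example (illustrative only, not part of kexec-tools or the crash utility), a small standalone program can print the PT_LOAD program headers of an ELF vmcore -- e.g. /proc/vmcore in the capture kernel, or a saved dump -- so the backup-region mapping can be eyeballed; it assumes a 64-bit ELF core:

#include <elf.h>
#include <stdio.h>

int main(int argc, char **argv)
{
	Elf64_Ehdr ehdr;
	Elf64_Phdr phdr;
	int i;
	FILE *f = fopen(argc > 1 ? argv[1] : "/proc/vmcore", "r");

	if (!f || fread(&ehdr, sizeof(ehdr), 1, f) != 1)
		return 1;
	for (i = 0; i < ehdr.e_phnum; i++) {
		fseek(f, ehdr.e_phoff + (long)i * ehdr.e_phentsize, SEEK_SET);
		if (fread(&phdr, sizeof(phdr), 1, f) != 1)
			break;
		/* the backup segment should have a low p_paddr but a redirected p_offset */
		if (phdr.p_type == PT_LOAD)
			printf("PT_LOAD paddr=%#llx memsz=%#llx offset=%#llx\n",
			       (unsigned long long)phdr.p_paddr,
			       (unsigned long long)phdr.p_memsz,
			       (unsigned long long)phdr.p_offset);
	}
	fclose(f);
	return 0;
}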
(In reply to comment #46)
> Amerigo,
>
> It might be a good idea to introduce some debug capability in kexec so that
> one can easily print out the ELF headers generated for the second kernel. I
> am specifically interested in knowing that we set up the backup ELF header
> properly, pointing to the right backup region in the reserved area.
>
> We also need to debug vmcore to make sure it is parsing the headers properly
> and getting the contents from the backup area properly. Maybe run crash on
> the live system and on the vmcore and compare the contents of some page in
> the backup area.

I just reserved an i386 RHEL6 machine, and was planning to do just that. I'll write a little crash command to copy specific parts of memory to a file just before crashing the system, and compare them to what gets copied to the dumpfile.

Note that I *did* do that with physical page 12000 on a non-PAE RHEL5 i386 machine, and I did see a slight change in the page contents. It wasn't perceived as a problem, however, because that page is not used as a PMD page in a non-PAE kernel. And that is why I don't believe that it's "not a bug" on RHEL6.

However, the problem is that the pages may get used by the live system immediately prior to the forced crash (say, during the live-copy-file-creation activity), because it is not clear what the page(s) were being used for. But it would still be interesting to see what happens.

What memory range should I compare? Is this the only range that is of interest:

00010000-0009afff : System RAM
Dave,

Yes, we are interested only in the first 640K, so on your system 00010000-0009afff is the range of interest.

I understand that the contents of this range may change. Still, if some pages match, then we know that we are not completely off.
(In reply to comment #48)
> Dave,
>
> Yes, we are interested only in the first 640K, so on your system
> 00010000-0009afff is the range of interest.
>
> I understand that the contents of this range may change. Still, if some
> pages match, then we know that we are not completely off.

OK, I've initially just written a "test" command that calculates and displays a checksum of the contents of each page in that range. Doing it twice in a row on a live system shows that none of the pages change, even if I stop and restart the crash utility.

I just created a crash input file that does this:

test > one
test > two
!echo c > /proc/sysrq-trigger

I'll compare "one" against "two", and then "two" against the contents of the dumpfile. Should be interesting...
Running the RHEL5 PAE kernel, I did the test above, where each page in the range was read and a checksum of the page's contents calculated. I did the test twice in a row, and then immediately crashed the system. Then I did the same test on the subsequent vmcore.

The contents of the pages were identical when reading them twice on the live system:

# diff one two
#

But some of the pages were different in the dumpfile:

# diff two dump
5,6c5,6
< 11000: 34a14b53 12000: 3f3e4330 13000: 4dd2159c 14000: 7c680459
< 15000: d2dd22fe 16000: 4d834b52 17000: 6b0304a1 18000: 204e4649
---
> 11000: 00000000 12000: 3e67e6bc 13000: edb20429 14000: fc6945bc
> 15000: 00000000 16000: 4d834b52 17000: 6b0304a1 18000: 204e4649
39c39
< 99000: 8421966b 9a000: 00000000 9b000: 00000000 9c000: 00000000
---
> 99000: 8421966b 9a000: 00000000 9b000: 00000000 9c000: 88128a6f
#

So these pages were different: 11000 12000 13000 14000 15000 and 9c000

Since there was no difference between the "one" and "two" files, there is no obvious explanation for them changing in the dumpfile.
I tested the RHEL5 non-PAE kernel the same way. In that kernel, the pages did not change on the live system on two subsequent reads. And while physical page 12000 is not crucial to the crash utility (not a PMD page) in that kernel, several pages did change in the dump:

# diff one two
# diff two dump
5c5
< 11000: 34a15b53 12000: 4a3a19f5 13000: 4b3a596e 14000: d157ae0c
---
> 11000: 00000000 12000: 6a1b29e5 13000: 00000000 14000: d157ae0c
39c39
< 99000: 8421966b 9a000: 00000000 9b000: 00000000 9c000: 00000000
---
> 99000: 8421966b 9a000: 00000000 9b000: 00000000 9c000: 88128a6f
#

Changed pages: 11000 12000 13000 9c000
Because some of the pages have matching checksums, maybe we are not setting the backup region pointer properly in the ELF header, and vmcore is reading the backup-region pages from the original memory (which now belongs to the second kernel), and the second kernel has modified some of those pages.

Amerigo, a few printks in vmcore and kexec-tools will be handy to figure this out.
With respect to RHEL6, the results are bizarre...

The machine has kexec-tools-2.0.0-186.el6.i686, the 2.6.32-125.el6.i686 (non-PAE) kernel, and /etc/kdump.conf configured with:

core_collector makedumpfile -c

so that all pages would get dumped, and has this in /proc/iomem:

00001000-0009ffff : System RAM

While running live, the memory checksums were identical, and there is data in most of the pages:

# diff one two
# head one
01000: 59e700eb 02000: 00000000 03000: 00000000 04000: 00000000
05000: 00000000 06000: 047fdefc 07000: 3132f33a 08000: e070817b
09000: 00000000 0a000: fc681471 0b000: ffe1a2bb 0c000: fffffd00
0d000: 00000000 0e000: 00000000 0f000: 00000000 10000: 00000000
11000: dc994904 12000: 9e375fc4 13000: aebb8a81 14000: d38cc304
15000: 35cebdc8 16000: 4c80ed74 17000: e61c0a14 18000: 8b84872c
19000: 277165a2 1a000: 918d9732 1b000: 856b743a 1c000: 1227e188
1d000: e07901e8 1e000: 4000065c 1f000: afe21057 20000: f24329e0
21000: 845a40ba 22000: 7f031102 23000: ac127978 24000: 88d66c5c
25000: a6fa36ba 26000: cbe13841 27000: a0212f95 28000: 00000000
# head two
01000: 59e700eb 02000: 00000000 03000: 00000000 04000: 00000000
05000: 00000000 06000: 047fdefc 07000: 3132f33a 08000: e070817b
09000: 00000000 0a000: fc681471 0b000: ffe1a2bb 0c000: fffffd00
0d000: 00000000 0e000: 00000000 0f000: 00000000 10000: 00000000
11000: dc994904 12000: 9e375fc4 13000: aebb8a81 14000: d38cc304
15000: 35cebdc8 16000: 4c80ed74 17000: e61c0a14 18000: 8b84872c
19000: 277165a2 1a000: 918d9732 1b000: 856b743a 1c000: 1227e188
1d000: e07901e8 1e000: 4000065c 1f000: afe21057 20000: f24329e0
21000: 845a40ba 22000: 7f031102 23000: ac127978 24000: 88d66c5c
25000: a6fa36ba 26000: cbe13841 27000: a0212f95 28000: 00000000
#

But after crashing the system, the whole region is zero-filled:

crash> test
01000: 00000000 02000: 00000000 03000: 00000000 04000: 00000000
05000: 00000000 06000: 00000000 07000: 00000000 08000: 00000000
09000: 00000000 0a000: 00000000 0b000: 00000000 0c000: 00000000
0d000: 00000000 0e000: 00000000 0f000: 00000000 10000: 00000000
11000: 00000000 12000: 00000000 13000: 00000000 14000: 00000000
15000: 00000000 16000: 00000000 17000: 00000000 18000: 00000000
19000: 00000000 1a000: 00000000 1b000: 00000000 1c000: 00000000
1d000: 00000000 1e000: 00000000 1f000: 00000000 20000: 00000000
21000: 00000000 22000: 00000000 23000: 00000000 24000: 00000000
25000: 00000000 26000: 00000000 27000: 00000000 28000: 00000000
29000: 00000000 2a000: 00000000 2b000: 00000000 2c000: 00000000
2d000: 00000000 2e000: 00000000 2f000: 00000000 30000: 00000000
31000: 00000000 32000: 00000000 33000: 00000000 34000: 00000000
35000: 00000000 36000: 00000000 37000: 00000000 38000: 00000000
39000: 00000000 3a000: 00000000 3b000: 00000000 3c000: 00000000
3d000: 00000000 3e000: 00000000 3f000: 00000000 40000: 00000000
41000: 00000000 42000: 00000000 43000: 00000000 44000: 00000000
45000: 00000000 46000: 00000000 47000: 00000000 48000: 00000000
49000: 00000000 4a000: 00000000 4b000: 00000000 4c000: 00000000
4d000: 00000000 4e000: 00000000 4f000: 00000000 50000: 00000000
51000: 00000000 52000: 00000000 53000: 00000000 54000: 00000000
55000: 00000000 56000: 00000000 57000: 00000000 58000: 00000000
59000: 00000000 5a000: 00000000 5b000: 00000000 5c000: 00000000
5d000: 00000000 5e000: 00000000 5f000: 00000000 60000: 00000000
61000: 00000000 62000: 00000000 63000: 00000000 64000: 00000000
65000: 00000000 66000: 00000000 67000: 00000000 68000: 00000000
69000: 00000000 6a000: 00000000 6b000: 00000000 6c000: 00000000
6d000: 00000000 6e000: 00000000 6f000: 00000000 70000: 00000000
71000: 00000000 72000: 00000000 73000: 00000000 74000: 00000000
75000: 00000000 76000: 00000000 77000: 00000000 78000: 00000000
79000: 00000000 7a000: 00000000 7b000: 00000000 7c000: 00000000
7d000: 00000000 7e000: 00000000 7f000: 00000000 80000: 00000000
81000: 00000000 82000: 00000000 83000: 00000000 84000: 00000000
85000: 00000000 86000: 00000000 87000: 00000000 88000: 00000000
89000: 00000000 8a000: 00000000 8b000: 00000000 8c000: 00000000
8d000: 00000000 8e000: 00000000 8f000: 00000000 90000: 00000000
91000: 00000000 92000: 00000000 93000: 00000000 94000: 00000000
95000: 00000000 96000: 00000000 97000: 00000000 98000: 00000000
99000: 00000000 9a000: 00000000 9b000: 00000000 9c000: 00000000
9d000: 00000000 9e000: 00000000 9f000: 00000000
crash>

How can that be?
I suspect that for some reason no copying has taken place into the backup region on i386, hence all zeros. That's the bug I had pointed to in previous comments and had fixed in your kexec-tools, Dave. If you use that version of kexec-tools, it might help a bit.
(In reply to comment #54)
> I suspect that for some reason no copying has taken place into the backup
> region on i386, hence all zeros. That's the bug I had pointed to in previous
> comments and had fixed in your kexec-tools, Dave. If you use that version of
> kexec-tools, it might help a bit.

Kind of strange that there would be a difference in behaviour between the RHEL5 and RHEL6 versions of kexec-tools, no?
Yes, it is strange. I think we are looking at more than one bug, and different bugs are hitting RHEL5 and RHEL6.
It gets stranger... I just ran the same test on a RHEL5 x86_64 machine: kernel-2.6.18-238.el5 kexec-tools-1.102pre-131.el5 with these two entries at the top of /proc/iomem: 00010000-0009d3ff : System RAM 0009d400-0009ffff : reserved On the live system, there were two pages that changed while crash was doing the reads (14000 and 15000): # diff one two 5,6c5,6 < 11000: 1f4f8580 12000: e003c600 13000: eed07500 14000: 8d923aa8 < 15000: 17604046 16000: 0fa1c70e 17000: 4003d600 18000: 4003d600 --- > 11000: 1f4f8580 12000: e003c600 13000: eed07500 14000: 8d9239fe > 15000: 1736911c 16000: 0fa1c70e 17000: 4003d600 18000: 4003d600 # But comparing them with the subsequent dumpfile showed huge differences. Note that the "deadbeef" entries in the "two" file were because those pages are not RAM, and therefore were not readable on the live system (see the /proc/iomem above). So I'm not entirely sure why they even show up and are readable in the vmcore? But more questionable are all of the differences with the dozens of other "modified" pages: # diff two dump 1,9c1,9 < 01000: deadbeef 02000: deadbeef 03000: deadbeef 04000: deadbeef < 05000: deadbeef 06000: deadbeef 07000: deadbeef 08000: deadbeef < 09000: deadbeef 0a000: deadbeef 0b000: deadbeef 0c000: deadbeef < 0d000: deadbeef 0e000: deadbeef 0f000: deadbeef 10000: 00036129 < 11000: 1f4f8580 12000: e003c600 13000: eed07500 14000: 8d9239fe < 15000: 1736911c 16000: 0fa1c70e 17000: 4003d600 18000: 4003d600 < 19000: 00f3f2fd 1a000: ffffff20 1b000: fffffe00 1c000: ffffff00 < 1d000: 00000000 1e000: 00000000 1f000: 00000000 20000: fffbfff3 < 21000: 00000000 22000: 00000000 23000: 00000000 24000: 3965e7be --- > 01000: 1f4f8580 02000: e003c600 03000: eed07500 04000: 8d923a5a > 05000: 17547805 06000: 0fa1c70e 07000: 4003d600 08000: 4003d600 > 09000: 00f3f2fd 0a000: ffffff20 0b000: fffffe00 0c000: ffffff00 > 0d000: 00000000 0e000: 00000000 0f000: 00000000 10000: fffbfff3 > 11000: 00000000 12000: 00000000 13000: 00000000 14000: 3965e7be > 15000: 00000000 16000: 00000000 17000: 00000000 18000: 00000000 > 19000: 00000000 1a000: 00000000 1b000: 00000000 1c000: a4a05682 > 1d000: 00000000 1e000: 00000000 1f000: 00000000 20000: 00000000 > 21000: 00000000 22000: 00000000 23000: 00000000 24000: 00000000 11c11 < 29000: 00000000 2a000: 00000000 2b000: 00000000 2c000: a4a05682 --- > 29000: 00000000 2a000: 00000000 2b000: 00000000 2c000: 00000000 17c17 < 41000: 00000000 42000: 00000000 43000: 00000000 44000: 00000000 --- > 41000: 00000000 42000: 00000000 43000: 96e10a82 44000: 00000000 21,22c21,22 < 51000: 00000000 52000: 00000000 53000: 96e10a82 54000: 00000000 < 55000: 00000000 56000: 00000000 57000: 00000000 58000: 00000000 --- > 51000: 00000000 52000: 00000000 53000: 00000000 54000: 00000000 > 55000: 781f3f6c 56000: a4e103b5 57000: 324bf759 58000: 746fcb85 24,40c24,40 < 5d000: 00000000 5e000: 00000000 5f000: 00000000 60000: 00000000 < 61000: 00000000 62000: 00000000 63000: 00000000 64000: 00000000 < 65000: 781f3f6c 66000: a4e103b5 67000: 324bf759 68000: 746fcb85 < 69000: 00000000 6a000: 00000000 6b000: 00000000 6c000: 00000000 < 6d000: 00000000 6e000: 00000000 6f000: 00000000 70000: 61827eee < 71000: 2bea141a 72000: 73fced95 73000: 37a88a30 74000: 2fcf1ab7 < 75000: 4ff9ab91 76000: 78a806d5 77000: 04ec04f8 78000: 881a47c4 < 79000: 1fc4ddc2 7a000: 857bc1ad 7b000: 562203d5 7c000: 97d0ab30 < 7d000: 419df3dd 7e000: 24d3e9bd 7f000: e1c36deb 80000: 00000000 < 81000: 00000000 82000: 00000000 83000: 00000000 84000: 00000000 < 85000: 00000000 86000: 00000000 87000: 00000000 88000: 
00000000
< 89000: 00000000 8a000: 00000000 8b000: 9d31dbd1 8c000: 00000000
< 8d000: 00000000 8e000: 00000000 8f000: 00000000 90000: 9881149e
< 91000: 6cb15c7e 92000: 00000000 93000: 00000000 94000: 00000000
< 95000: 00000000 96000: 00000000 97000: 00000000 98000: a3a20fea
< 99000: 3fa13608 9a000: 00000000 9b000: 00000000 9c000: 00000000
< 9d000: deadbeef 9e000: deadbeef 9f000: deadbeef
---
> 5d000: 00000000 5e000: 00000000 5f000: 00000000 60000: 61827eee
> 61000: 2bea141a 62000: 73fced95 63000: 37a88a30 64000: 2fcf1ab7
> 65000: 4ff9ab91 66000: 78a806d5 67000: 04ec04f8 68000: 881a47c4
> 69000: 1fc4ddc2 6a000: 857bc1ad 6b000: 562203d5 6c000: 97d0ab30
> 6d000: 419df3dd 6e000: 24d3e9bd 6f000: e1c36deb 70000: 00000000
> 71000: 00000000 72000: 00000000 73000: 00000000 74000: 00000000
> 75000: 00000000 76000: 00000000 77000: 00000000 78000: 00000000
> 79000: 00000000 7a000: 00000000 7b000: 9d31dbd1 7c000: 00000000
> 7d000: 00000000 7e000: 00000000 7f000: 00000000 80000: 9881149e
> 81000: 6cb15c7e 82000: 00000000 83000: 00000000 84000: 00000000
> 85000: 00000000 86000: 00000000 87000: 00000000 88000: a3a20fea
> 89000: 3fa13608 8a000: 00000000 8b000: 00000000 8c000: 00000000
> 8d000: 00000000 8e000: 64aa1d1a 8f000: e7c10a27 90000: 24328a51
> 91000: 8367eb29 92000: 83710699 93000: 87e26201 94000: 87eb9d61
> 95000: 838c9cd9 96000: 8395944a 97000: 839eafb9 98000: 83a7cb29
> 99000: 83b0e699 9a000: 8822b201 9b000: 882bed61 9c000: 83cc7cd9
> 9d000: 83d5744a 9e000: 83de8fb9 9f000: 83e7ab29
#

In any case, I'm going to step back and let Amerigo look into this.

FYI, the change to the crash utility's test.c file to make the "test" command show the above is this:

diff -r1.4 test.c
44a45,68
> 	{
> 		int i;
> 		ulonglong paddr;
> 		ulong *p, *start, *end;
> 		uint32_t chksum;
> 		char buf[4096];
>
> 		for (i = 0, paddr = 0x1000; paddr < (0x9ffff+1); paddr += 4096) {
> 			if (!readmem(paddr, PHYSADDR, buf, PAGESIZE(),
> 			    "check page", QUIET|RETURN_ON_ERROR)) {
> 				chksum = 0xdeadbeef;
> 			} else {
> 				start = (ulong *)&buf[0];
> 				end = (ulong *)&buf[4096];
> 				chksum = 0;
>
> 				for (p = start; p < end; p++)
> 					chksum += *p;
> 			}
> 			fprintf(fp, "%05llx: %08lx ", paddr, chksum);
> 			if ((++i % 4) == 0)
> 				fprintf(fp, "\n");
> 		}
> 	}
> In any case, I'm going to step back and let Amerigo look into this.

But before I do, for my own sanity, I reverted kexec-tools back to kexec-tools-1.102pre-126.el5 and ran the test again on the same RHEL5 x86_64 machine.

Similar to the previous run, pages 14000 and 15000 changed during the live-system reads. So those two pages are quite active, and continually undergoing modification:

# diff one two
5,6c5,6
< 11000: 1f507580 12000: e003c600 13000: ed64d500 14000: 8d91fde5
< 15000: 07a537f1 16000: 0fa1c70e 17000: 4003d600 18000: 4003d600
---
> 11000: 1f507580 12000: e003c600 13000: ed64d500 14000: 8d91fda7
> 15000: 0ced3a56 16000: 0fa1c70e 17000: 4003d600 18000: 4003d600
#

But between the live system and the dumpfile -- ignoring the "deadbeef" entries -- the only differences were those same two pages at 14000 and 15000:

# diff two dump
1,6c1,6
< 01000: deadbeef 02000: deadbeef 03000: deadbeef 04000: deadbeef
< 05000: deadbeef 06000: deadbeef 07000: deadbeef 08000: deadbeef
< 09000: deadbeef 0a000: deadbeef 0b000: deadbeef 0c000: deadbeef
< 0d000: deadbeef 0e000: deadbeef 0f000: deadbeef 10000: 00036129
< 11000: 1f507580 12000: e003c600 13000: ed64d500 14000: 8d91fda7
< 15000: 0ced3a56 16000: 0fa1c70e 17000: 4003d600 18000: 4003d600
---
> 01000: 74780c21 02000: a4d62ea7 03000: 55874745 04000: 3e96a858
> 05000: fe64aad0 06000: 9ede145c 07000: 004050c6 08000: a9419b6c
> 09000: bb432d72 0a000: 0350aedc 0b000: 9d4fe8aa 0c000: 47c84b9e
> 0d000: 19f5acf8 0e000: 01a7faf4 0f000: c6048f06 10000: 00036129
> 11000: 1f507580 12000: e003c600 13000: ed64d500 14000: 8d91fddf
> 15000: 14cb5590 16000: 0fa1c70e 17000: 4003d600 18000: 4003d600
40c40
< 9d000: deadbeef 9e000: deadbeef 9f000: deadbeef
---
> 9d000: 9c3e9946 9e000: c45712ee 9f000: 8372a59e
#

This patch is clearly a disaster...
> Because some of the pages have matching checksums, maybe we are not setting
> the backup region pointer properly in the ELF header, and vmcore is reading
> the backup-region pages from the original memory (which now belongs to the
> second kernel), and the second kernel has modified some of those pages.
>
> Amerigo, a few printks in vmcore and kexec-tools will be handy to figure
> this out.

I retested the x86_64 version, and like before, the 0x1000-0x9ffff region in the dumpfile is drastically different from the pre-crash memory, with well over 50 pages differing. I can understand that one or two pages may be modified just prior to the crash occurring, but that is ridiculous.

To be completely confident of a fix, I would suggest doing something like this in a test kernel:

sys_kexec_load()
  allocate a 640K region (it would have to be an order-8 alloc_pages() request), and keep a symbolic pointer to the region.

crash_kexec()
  just prior to calling machine_kexec(), copy the region memory into the allocated buffer.

And then compare the two regions in the subsequent dumpfile, which should be identical.

I'll try building/testing a kernel patch as a proof of concept.
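For reference, a rough sketch of that proof-of-concept idea, with assumed hook names and call sites (this is not the actual test patch):

/* Illustrative kernel-side sketch; the function names are invented. */
#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/string.h>
#include <linux/errno.h>

#define LOW_START	0x1000UL		/* skip the first page */
#define LOW_END		0xa0000UL		/* 640K */
#define LOW_SIZE	(LOW_END - LOW_START)

static void *low_region_copy;			/* symbolic pointer, easy to find from crash */

/* call once while loading the crash kernel, e.g. from sys_kexec_load() */
static int low_region_alloc(void)
{
	low_region_copy = (void *)__get_free_pages(GFP_KERNEL, get_order(LOW_SIZE));
	return low_region_copy ? 0 : -ENOMEM;
}

/* call from crash_kexec(), just before machine_kexec() */
static void low_region_snapshot(void)
{
	if (low_region_copy)
		memcpy(low_region_copy, __va(LOW_START), LOW_SIZE);
}

In the dumpfile, the contents at low_region_copy could then be compared page by page with what the dump reports for 0x1000-0x9ffff.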
Created attachment 493217 [details]
Proposed patch to fix the issue on PAE
Amerigo,

Can you explain a bit what the problem was, and how it gets solved by replacing these local variables with so many elf32, elf64 and elf variables?

Vivek
(In reply to comment #61)

Yeah, because in kexec/crashdump-elf.c we do:

if (mstart == elf_info->backup_src_start && mend == elf_info->backup_src_end)
        phdr->p_offset = info->backup_start;

so if elf_info->backup_src_start is not correctly initialized, ->p_offset will not be translated to our backup area, that is, ->backup_start.
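In other words (an illustrative sketch of the generic PT_LOAD lookup an ELF dump reader performs, not code from crash or makedumpfile): a physical address is mapped to a file offset relative to p_paddr and p_offset, so if the low-memory segment's p_offset is not redirected to the backup area, readers end up reading whatever now sits in memory owned by the second kernel:

#include <elf.h>
#include <stdint.h>
#include <sys/types.h>

/* Returns the file offset holding physical address paddr,
 * or -1 if this segment does not cover it. */
static off_t paddr_to_file_offset(const Elf64_Phdr *phdr, uint64_t paddr)
{
	if (paddr < phdr->p_paddr || paddr >= phdr->p_paddr + phdr->p_memsz)
		return -1;
	return (off_t)(phdr->p_offset + (paddr - phdr->p_paddr));
}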
OK, so elf_info was not being filled in properly. Why are we storing this info in two places, kexec_info and crash_elf_info? Can't we store it in kexec_info and use it everywhere?
> Proposed patch to fix the issue on PAE I have rebuilt kexec-tools with this patch, and tested it on an i386 PAE kernel, and it works fine. That's good... However, I did the same thing with an x86_64 kernel, applied the patch to its kexec-tools, and I still see unexplainable differences between the live system and dumpfile. Note that this x86_64 machine shows: # head /proc/iomem 00010000-0009d3ff : System RAM 0009d400-0009ffff : reserved ... so the first 16 pages are not even RAM. Therefore, on the live system, I cannot read them, so "deadbeef" has been filled in as a marker in the "two.fix" file. But I can read them from the dumpfile (?) -- which should raise a red flag -- but not only that, there are huge differences in the other pages in the 640K region: # diff two.fix dump.fix 1,9c1,9 < 01000: deadbeef 02000: deadbeef 03000: deadbeef 04000: deadbeef < 05000: deadbeef 06000: deadbeef 07000: deadbeef 08000: deadbeef < 09000: deadbeef 0a000: deadbeef 0b000: deadbeef 0c000: deadbeef < 0d000: deadbeef 0e000: deadbeef 0f000: deadbeef 10000: 00036129 < 11000: 5f6a2580 12000: e003c600 13000: ed6bd500 14000: 8d91a577 < 15000: 114b5956 16000: 0fa1c70e 17000: 4003d600 18000: 4003d600 < 19000: 01160bdd 1a000: ffffff20 1b000: fffffe00 1c000: ffffff00 < 1d000: 00000000 1e000: 00000000 1f000: 00000000 20000: fffbfff3 < 21000: 00000000 22000: 00000000 23000: 00000000 24000: 80fb2fbe --- > 01000: 5f6a2580 02000: e003c600 03000: ed6bd500 04000: 8d91a582 > 05000: 147acb8e 06000: 0fa1c70e 07000: 4003d600 08000: 4003d600 > 09000: 01160bdd 0a000: ffffff20 0b000: fffffe00 0c000: ffffff00 > 0d000: 00000000 0e000: 00000000 0f000: 00000000 10000: fffbfff3 > 11000: 00000000 12000: 00000000 13000: 00000000 14000: 80fb2fbe > 15000: 00000000 16000: 00000000 17000: 00000000 18000: 00000000 > 19000: 00000000 1a000: 00000000 1b000: 00000000 1c000: a4a05682 > 1d000: 00000000 1e000: 00000000 1f000: 00000000 20000: 00000000 > 21000: 00000000 22000: 00000000 23000: 00000000 24000: 00000000 11c11 < 29000: 00000000 2a000: 00000000 2b000: 00000000 2c000: a4a05682 --- > 29000: 00000000 2a000: 00000000 2b000: 00000000 2c000: 00000000 17c17 < 41000: 00000000 42000: 00000000 43000: 00000000 44000: 00000000 --- > 41000: 00000000 42000: 00000000 43000: 96e10a82 44000: 00000000 21,22c21,22 < 51000: 00000000 52000: 00000000 53000: 96e10a82 54000: 00000000 < 55000: 00000000 56000: 00000000 57000: 00000000 58000: 00000000 --- > 51000: 00000000 52000: 00000000 53000: 00000000 54000: 00000000 > 55000: 781f3f6c 56000: a4e103b5 57000: 324bf759 58000: 7470b818 24,40c24,40 < 5d000: 00000000 5e000: 00000000 5f000: 00000000 60000: 00000000 < 61000: 00000000 62000: 00000000 63000: 00000000 64000: 00000000 < 65000: 781f3f6c 66000: a4e103b5 67000: 324bf759 68000: 7470b818 < 69000: 00000000 6a000: 00000000 6b000: 00000000 6c000: 00000000 < 6d000: 00000000 6e000: 00000000 6f000: 00000000 70000: 61827eee < 71000: 2bea141a 72000: 73fced95 73000: 37a88a30 74000: 2fcf1ab7 < 75000: 4ff9ab91 76000: 78a806d5 77000: f4a08f3c 78000: 881a47c4 < 79000: 1fc4ddc2 7a000: 857bc1ad 7b000: 562203d5 7c000: 97d0ab30 < 7d000: 419df3dd 7e000: 24d3e9bd 7f000: e1c36deb 80000: 00000000 < 81000: 00000000 82000: 00000000 83000: 00000000 84000: 00000000 < 85000: 00000000 86000: 00000000 87000: 00000000 88000: 00000000 < 89000: 00000000 8a000: 00000000 8b000: 9d31dbd1 8c000: 00000000 < 8d000: 00000000 8e000: 00000000 8f000: 00000000 90000: 9881149e < 91000: 6cb15c7e 92000: 00000000 93000: 00000000 94000: 00000000 < 95000: 00000000 96000: 00000000 97000: 00000000 98000: 
287e0fea < 99000: 3fa13608 9a000: 00000000 9b000: 00000000 9c000: 00000000 < 9d000: deadbeef 9e000: deadbeef 9f000: deadbeef --- > 5d000: 00000000 5e000: 00000000 5f000: 00000000 60000: 61827eee > 61000: 2bea141a 62000: 73fced95 63000: 37a88a30 64000: 2fcf1ab7 > 65000: 4ff9ab91 66000: 78a806d5 67000: f4a08f3c 68000: 881a47c4 > 69000: 1fc4ddc2 6a000: 857bc1ad 6b000: 562203d5 6c000: 97d0ab30 > 6d000: 419df3dd 6e000: 24d3e9bd 6f000: e1c36deb 70000: 00000000 > 71000: 00000000 72000: 00000000 73000: 00000000 74000: 00000000 > 75000: 00000000 76000: 00000000 77000: 00000000 78000: 00000000 > 79000: 00000000 7a000: 00000000 7b000: 9d31dbd1 7c000: 00000000 > 7d000: 00000000 7e000: 00000000 7f000: 00000000 80000: 9881149e > 81000: 6cb15c7e 82000: 00000000 83000: 00000000 84000: 00000000 > 85000: 00000000 86000: 00000000 87000: 00000000 88000: 287e0fea > 89000: 3fa13608 8a000: 00000000 8b000: 00000000 8c000: 00000000 > 8d000: 00000000 8e000: 64c2bd1a 8f000: e7c10a27 90000: 24328a51 > 91000: 8367eb29 92000: 83710699 93000: 87e26201 94000: 87eb9d61 > 95000: 838c9cd9 96000: 8395944a 97000: 839eafb9 98000: 83a7cb29 > 99000: 83b0e699 9a000: 8822b201 9b000: 882bed61 9c000: 83cc7cd9 > 9d000: 83d5744a 9e000: 83de8fb9 9f000: 83e7ab29 # I'm going to retest that x86_64 machine with kexec-tools-1.102pre-126.el5 and see what the region looks like before-and-after.
> I'm going to retest that x86_64 machine with kexec-tools-1.102pre-126.el5
> and see what the region looks like before-and-after.

OK, with kexec-tools-1.102pre-126.el5, I see the following:

# diff two dump
1,6c1,6
< 01000: deadbeef 02000: deadbeef 03000: deadbeef 04000: deadbeef
< 05000: deadbeef 06000: deadbeef 07000: deadbeef 08000: deadbeef
< 09000: deadbeef 0a000: deadbeef 0b000: deadbeef 0c000: deadbeef
< 0d000: deadbeef 0e000: deadbeef 0f000: deadbeef 10000: 00036129
< 11000: 1f4f4580 12000: e003c600 13000: ed6bd500 14000: 8d916668
< 15000: 10c429f0 16000: 0fa1c70e 17000: 4003d600 18000: 4003d600
---
> 01000: b759da3b 02000: a4d62ea7 03000: 55874745 04000: 3e96a858
> 05000: fe64aad0 06000: 9ede145c 07000: 004050c6 08000: a9419b6c
> 09000: bb432d72 0a000: 0350aedc 0b000: 9d4fe8aa 0c000: 47c84b9e
> 0d000: 19f5acf8 0e000: 01a7faf4 0f000: c6048f06 10000: 00036129
> 11000: 1f4f4580 12000: e003c600 13000: ed6bd500 14000: 8d916666
> 15000: 12ec9bd2 16000: 0fa1c70e 17000: 4003d600 18000: 4003d600
40c40
< 9d000: deadbeef 9e000: deadbeef 9f000: deadbeef
---
> 9d000: 9c3e9946 9e000: c45712ee 9f000: 8372a59e
#

So with the "old" kexec-tools, we still see the non-RAM (deadbeef) pages somehow being made available (with some kind of data) in the vmcore. So at least that issue is not regressing...

And the other differences above also appear when taking two subsequent readings on the live system, where the pages at 14000 and 15000 are constantly churning:

# diff one two
6c6
< 15000: 10bdbcb6 16000: 0fa1c70e 17000: 4003d600 18000: 4003d600
---
> 15000: 10c429f0 16000: 0fa1c70e 17000: 4003d600 18000: 4003d600
#

So AFAICT -- with 131.el5 plus the newest patch -- the memory in the x86_64 640K region is very different in the dumpfile than it was live.
As far as RHEL6 i386 is concerned, I don't know how it's possible to create an ELF vmcore with RHEL6? I've set this in /etc/kdump.conf: ext4 /dev/mapper/vg_dellpe295002-lv_root core_collector cp --sparse=always But it still creates a compressed and filtered "-c -d 31" dumpfile. Anyway, with kexec-tools-2.0.0-186.el6.i686 and this: # head /proc/iomem 00000000-00000fff : reserved 00001000-0009ffff : System RAM ... the 640K range does not change on a live system between two consecutive reads of the 640K range: # diff one two # But when I look at the dumpfile, I see the following differences, where in the live system there is data in some of the pages, but in the dumpfile they are either returning all zeroes, or are "deadbeef", which means that the page is marked as not available in the dumpfile: # diff two dump 1,40c1,40 < 01000: 69a4c39d 02000: 00000000 03000: 00000000 04000: 00000000 < 05000: 00000000 06000: 047fdefc 07000: 3132f33a 08000: e070817b < 09000: 00000000 0a000: fc681471 0b000: 7fdda29f 0c000: fffffd00 < 0d000: 00000000 0e000: 00000000 0f000: 00000000 10000: 00000000 < 11000: dc994904 12000: 9e375fc4 13000: aebb8a81 14000: d38cc304 < 15000: 35cebdc8 16000: 4c80ed74 17000: e61c0a14 18000: 8b84872c < 19000: 277165a2 1a000: 918d9732 1b000: 856b743a 1c000: 1227e188 < 1d000: e07901e8 1e000: 4000065c 1f000: afe21057 20000: f24329e0 < 21000: 845a40ba 22000: 7f031102 23000: ac127978 24000: 88d66c5c < 25000: a6fa36ba 26000: cbe13841 27000: a0212f95 28000: 00000000 < 29000: 00000000 2a000: 00000000 2b000: 00000000 2c000: 00000000 < 2d000: 00000000 2e000: 00000000 2f000: 00000000 30000: 00000000 < 31000: 1b4910ad 32000: 8b31f46d 33000: 00000000 34000: 00000000 < 35000: 00000000 36000: 00000000 37000: 00000000 38000: 00000000 < 39000: 00000000 3a000: 00000000 3b000: 00000000 3c000: 00000000 < 3d000: 00000000 3e000: 00000000 3f000: 00000000 40000: 00000000 < 41000: 00000000 42000: 00000000 43000: 00000000 44000: 00000000 < 45000: 00000000 46000: 00000000 47000: 00000000 48000: 00000000 < 49000: 00000000 4a000: 00000000 4b000: 00000000 4c000: 00000000 < 4d000: 00000000 4e000: 00000000 4f000: 00000000 50000: 00000000 < 51000: 00000000 52000: 00000000 53000: 00000000 54000: 00000000 < 55000: 00000000 56000: 00000000 57000: 00000000 58000: 9e0724c0 < 59000: 00000000 5a000: 00000000 5b000: 00000000 5c000: 00000000 < 5d000: 00000000 5e000: 00000000 5f000: 00000000 60000: 00000000 < 61000: 00000000 62000: 00000000 63000: 00000000 64000: 03bf159b < 65000: dc1d3011 66000: f5edc5f1 67000: 6ee897d0 68000: 637597f5 < 69000: 00000000 6a000: 00000000 6b000: 00000000 6c000: 00000000 < 6d000: 00000000 6e000: 00000000 6f000: 00000000 70000: 2972a1fe < 71000: 8bc254fc 72000: ec3c695a 73000: cdec06d3 74000: 36906ac5 < 75000: c993dd62 76000: aef525c6 77000: 9aaf5c1c 78000: 2ca6bbdb < 79000: 845a40ba 7a000: 7f031102 7b000: 0b35592d 7c000: 88d66c5c < 7d000: a6fa36ba 7e000: 0997e3bc 7f000: a81ce446 80000: 00000000 < 81000: 00000000 82000: 00000000 83000: 00000000 84000: 00000000 < 85000: 00000000 86000: 00000000 87000: 595b041b 88000: 00000000 < 89000: 00000000 8a000: 00000000 8b000: 00000000 8c000: 00000000 < 8d000: 00000000 8e000: 00000000 8f000: 00000000 90000: 128abf4e < 91000: 890c948f 92000: f31f6236 93000: 27005a2a 94000: 6e2cfe18 < 95000: c5c152d4 96000: 00000000 97000: 00000000 98000: 2a90a338 < 99000: 8dbdf809 9a000: 00000000 9b000: 00000000 9c000: 00000000 < 9d000: 00000000 9e000: 00000000 9f000: db524b1d --- > 01000: 00000000 02000: 00000000 03000: 00000000 04000: 00000000 > 05000: 00000000 06000: 
00000000 07000: 00000000 08000: 00000000 > 09000: 00000000 0a000: 00000000 0b000: deadbeef 0c000: deadbeef > 0d000: deadbeef 0e000: deadbeef 0f000: deadbeef 10000: deadbeef > 11000: deadbeef 12000: deadbeef 13000: deadbeef 14000: deadbeef > 15000: deadbeef 16000: deadbeef 17000: deadbeef 18000: deadbeef > 19000: deadbeef 1a000: deadbeef 1b000: deadbeef 1c000: deadbeef > 1d000: deadbeef 1e000: deadbeef 1f000: deadbeef 20000: deadbeef > 21000: deadbeef 22000: deadbeef 23000: deadbeef 24000: deadbeef > 25000: deadbeef 26000: deadbeef 27000: deadbeef 28000: deadbeef > 29000: deadbeef 2a000: deadbeef 2b000: deadbeef 2c000: deadbeef > 2d000: deadbeef 2e000: deadbeef 2f000: deadbeef 30000: deadbeef > 31000: deadbeef 32000: deadbeef 33000: deadbeef 34000: deadbeef > 35000: deadbeef 36000: deadbeef 37000: deadbeef 38000: deadbeef > 39000: deadbeef 3a000: deadbeef 3b000: deadbeef 3c000: deadbeef > 3d000: deadbeef 3e000: deadbeef 3f000: deadbeef 40000: deadbeef > 41000: deadbeef 42000: deadbeef 43000: deadbeef 44000: deadbeef > 45000: deadbeef 46000: deadbeef 47000: deadbeef 48000: deadbeef > 49000: deadbeef 4a000: deadbeef 4b000: deadbeef 4c000: deadbeef > 4d000: deadbeef 4e000: deadbeef 4f000: deadbeef 50000: deadbeef > 51000: deadbeef 52000: deadbeef 53000: deadbeef 54000: deadbeef > 55000: deadbeef 56000: deadbeef 57000: deadbeef 58000: deadbeef > 59000: deadbeef 5a000: deadbeef 5b000: deadbeef 5c000: deadbeef > 5d000: deadbeef 5e000: deadbeef 5f000: deadbeef 60000: deadbeef > 61000: deadbeef 62000: deadbeef 63000: deadbeef 64000: deadbeef > 65000: deadbeef 66000: deadbeef 67000: deadbeef 68000: deadbeef > 69000: deadbeef 6a000: deadbeef 6b000: deadbeef 6c000: deadbeef > 6d000: deadbeef 6e000: deadbeef 6f000: deadbeef 70000: deadbeef > 71000: deadbeef 72000: deadbeef 73000: deadbeef 74000: deadbeef > 75000: deadbeef 76000: deadbeef 77000: deadbeef 78000: deadbeef > 79000: deadbeef 7a000: deadbeef 7b000: deadbeef 7c000: deadbeef > 7d000: deadbeef 7e000: deadbeef 7f000: deadbeef 80000: deadbeef > 81000: deadbeef 82000: deadbeef 83000: deadbeef 84000: deadbeef > 85000: deadbeef 86000: deadbeef 87000: deadbeef 88000: deadbeef > 89000: deadbeef 8a000: deadbeef 8b000: deadbeef 8c000: deadbeef > 8d000: deadbeef 8e000: deadbeef 8f000: deadbeef 90000: deadbeef > 91000: deadbeef 92000: deadbeef 93000: deadbeef 94000: deadbeef > 95000: deadbeef 96000: deadbeef 97000: deadbeef 98000: deadbeef > 99000: deadbeef 9a000: deadbeef 9b000: deadbeef 9c000: deadbeef > 9d000: deadbeef 9e000: deadbeef 9f000: 00000000 # So that's why I tried to configure an ELF vmcore with no filtering, to determine whether the memory would be passed through. On the other hand, the first few pages are of interest: # diff two dump 1,40c1,40 < 01000: 69a4c39d 02000: 00000000 03000: 00000000 04000: 00000000 < 05000: 00000000 06000: 047fdefc 07000: 3132f33a 08000: e070817b < 09000: 00000000 0a000: fc681471 0b000: 7fdda29f 0c000: fffffd00 ... > 01000: 00000000 02000: 00000000 03000: 00000000 04000: 00000000 > 05000: 00000000 06000: 00000000 07000: 00000000 08000: 00000000 > 09000: 00000000 0a000: 00000000 0b000: deadbeef 0c000: deadbeef If the vmcore were being created "correctly", why doesn't the dumpfile have the same data at 01000, 06000, 07000, 08000 and 0a000? And for that matter why aren't the other 00000000 pages being filtered out, given that DUMP_EXCLUDE_ZERO is being used? 
In any case, I'm happy that this configuration works with the new patch:

(1) RHEL5 i386

But I'm still not convinced that these configurations work:

(1) RHEL5 x86_64 (even with the new patch)
(2) RHEL6 i386

Note that I haven't tested RHEL6 x86_64.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0152.html