Created attachment 409205 [details]
kexec-phys40bit-fix.patch

Description of problem:
The RHEL5 kernel supports up to 40 bits of physical address and discards pages above that boundary. /proc/vmcore refuses access to areas above 40 bits. However, /sbin/kexec generates an ELF header that includes the area above 40 bits as physical memory, because it takes its ranges from /proc/iomem. As a result, kdump tries to read /proc/vmcore beyond the 40-bit boundary, gets -EINVAL and fails.

The proposed patch fixes the problem by applying the same limit to kexec as the kernel uses.

Version-Release number of selected component (if applicable):
kernel-2.6.18-194.el5

How reproducible:
100%

Steps to Reproduce:
1. Set up kdump
2. Trigger kdump
   # echo c > /proc/sysrq-trigger

Actual results:
Saving vmcore aborts near the end.

Expected results:
Saving vmcore succeeds.

Additional info:
I fixed this a few weeks back already, but thank you dave! 1.102pre-97 should have the fix you need. *** This bug has been marked as a duplicate of bug 559928 ***
Hi Neil - These bugs certainly appear to be similar, however this one is x86_64. The patch (provided by NEC) appears to do essentially the same thing as the patch from bug 559928, but at a different boundary.

-----
Fix kdump on a machine with 1TB memory.

The RHEL5 kernel supports up to 40 bits of physical address and discards pages above that boundary. /proc/vmcore refuses access to areas above 40 bits. However, /sbin/kexec generates an ELF header that includes the area above 40 bits as physical memory, because it takes its ranges from /proc/iomem. As a result, kdump tries to read /proc/vmcore beyond the 40-bit boundary, gets -EINVAL and fails.

This patch fixes the problem by applying the same limit to kexec as the kernel uses.

--- kexec-tools-testing-20070330/kexec/arch/x86_64/crashdump-x86_64.c	2010-04-20 11:38:56.000000000 +0900
+++ kexec-tools-testing-20070330.1TB/kexec/arch/x86_64/crashdump-x86_64.c	2010-04-21 16:38:18.000000000 +0900
@@ -203,6 +203,11 @@ static int get_crash_memory_ranges(struc
 		/* Only Dumping memory of type System RAM. */
 		if (memcmp(str, "System RAM\n", 11) == 0) {
 			type = RANGE_RAM;
+#define MAX_PHYSMEM_40BIT ((1UL << 40) - 1)
+			if (start > MAX_PHYSMEM_40BIT)
+				continue;
+			else if (end > MAX_PHYSMEM_40BIT)
+				end = MAX_PHYSMEM_40BIT;
 		} else if (memcmp(str, "Crash kernel\n", 13) == 0) {
 				/* Reserved memory region. New kernel can
 				 * use this region to boot into. */
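For readers who want to see the clamping rule from the patch in isolation, here is a minimal standalone sketch. The struct and function names (`mem_range`, `clamp_to_40bit`) are hypothetical and do not come from kexec-tools; only `MAX_PHYSMEM_40BIT` and the skip/trim logic mirror the patch. It assumes ranges are inclusive, as they appear in /proc/iomem.

```c
#include <stdint.h>

/* Same limit as in the patch: the highest address reachable with 40 bits. */
#define MAX_PHYSMEM_40BIT ((UINT64_C(1) << 40) - 1)

/* Hypothetical type for illustration; not the kexec-tools structure. */
struct mem_range {
    uint64_t start;
    uint64_t end;   /* inclusive */
};

/*
 * Apply the 40-bit limit to one "System RAM" range.
 * Returns 0 if the range lies entirely above the limit (caller skips it),
 * or 1 if the range is kept, with r->end trimmed to the limit if needed.
 */
static int clamp_to_40bit(struct mem_range *r)
{
    if (r->start > MAX_PHYSMEM_40BIT)
        return 0;
    if (r->end > MAX_PHYSMEM_40BIT)
        r->end = MAX_PHYSMEM_40BIT;
    return 1;
}
```

With this rule, a range that starts at 4GB and ends slightly above 1TB is kept but trimmed to end at 0xffffffffff, while a range that starts at or above 1TB is dropped from the ELF header entirely.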
ah, I see what they're doing now. Ok, I can take this, once all the flags get set
Neil -- We happen to have a 1TB system in Westford *today*. I have added Shyam so he can test the patch. Could you please upload or provide a link to a test rpm that he could try? Thanks!
The 1TB system I refer to below is x86_64.
Neil, does the suggested patch cover the case where the BIOS remaps memory (creates holes in the physical address space) so kdump would be physically trying to reference addresses over the 1TB boundary?
fixed, thanks!
Would all crashkernel parameters work here? crashkernel=128M@16M would not work for me, whereas crashkernel=128M@32M would work when I was debugging another issue on this system.
So, the kernel still crashes if I pass crashkernel=128M@16M for the kdump configuration. crashkernel=128M@32M always passes. This was the workaround used to pass certification. In my environment I don't see kdump itself failing; instead, a driver that needs memory already reserved by kdump panics. It is either the storage (boot controller) driver or the USB driver. Tested with kexec-tools-1.102pre-96.el5_5.2.x86_64.rpm
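To make the two settings easier to compare, here is a small sketch that turns a crashkernel=size@offset value into the physical range it asks the kernel to reserve. The helper names (`parse_size`, `parse_crashkernel`) are hypothetical, this is not the kernel's own parser, and it only handles K, M and G suffixes. It shows which addresses each option covers; it does not by itself explain the panic.

```c
#include <stdint.h>
#include <stdlib.h>

/* Parse a decimal size such as "128M"; only K, M and G suffixes are handled. */
static uint64_t parse_size(const char *s, const char **rest)
{
    char *end;
    uint64_t v = strtoull(s, &end, 10);

    switch (*end) {
    case 'G': case 'g': v <<= 10; /* fall through */
    case 'M': case 'm': v <<= 10; /* fall through */
    case 'K': case 'k': v <<= 10; end++; break;
    default: break;
    }
    *rest = end;
    return v;
}

/*
 * Hypothetical helper: parse "size@offset" into an inclusive range.
 * Returns 0 on success, -1 if the string is not of that form.
 */
static int parse_crashkernel(const char *arg, uint64_t *start, uint64_t *end)
{
    const char *p;
    uint64_t size = parse_size(arg, &p);
    uint64_t offset;

    if (size == 0 || *p != '@')
        return -1;
    offset = parse_size(p + 1, &p);
    if (*p != '\0')
        return -1;
    *start = offset;
    *end = offset + size - 1;
    return 0;
}
```

Under these assumptions, 128M@16M covers 0x1000000 through 0x8ffffff and 128M@32M covers 0x2000000 through 0x9ffffff, so the two reservations differ only in the 16M at each end.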
Created attachment 412985 [details] panic trace
(In reply to comment #11)
> Would all crashkernel parameters work here ?
>
> crashkernel=128M@16M would not work for me whereas crashkernel=128M@32M would
> work
> when I was debugging another issue on this system.

Hi Shyam, I think this may be a different bug. Is there any chance you could verify this 1TB bug with kexec-tools-1.102pre-96.el5_5? Thanks.
(In reply to comment #16)
> (In reply to comment #11)
> > Would all crashkernel parameters work here ?
> >
> > crashkernel=128M@16M would not work for me whereas crashkernel=128M&32M would
> > work
> > when I was debugging another issue on this system.
>
> hi Shyam, this is maybe another bug I think. And any chance to verify this 1tb
> bug with kexec-tools-1.102pre-96.el5_5? Thanks.

Sorry, should be 1.102pre-96.el5_5.2. Thanks.
Partners, Please grab the latest available bits here to test whether the new kdump can save vmcore to local disk. http://people.redhat.com/qcai/kexec-tools/
We no longer have in Westford the 1TB Dell system I referred to in comments 5, 6 (it had to be sent back and it's currently not available to RH). Shyam had kindly volunteered to assist with the testing, but given that the system is no longer available, he won't be able to provide any additional testing feedback.
Event posted on 05-24-2010 01:54pm JST by jnomura File uploaded: kexec-1.102pre-96.el5_5.2.log This event sent from IssueTracker by mfuruta issue 795343 it_file 694393
Event posted on 05-24-2010 01:54pm JST by jnomura

Furuta-san,

With kexec-1.102pre-96.el5_5.2 and "crashkernel=128M@16M" boot option,
- vmcore was saved to the local disk without error
- crash can open the vmcore without error
on our 1TB-memory machine. Attached is a short log.

Internal Status set to 'Waiting on Support'
Status set to: Waiting on Tech

This event sent from IssueTracker by mfuruta
issue 795343
Hi,

(In reply to comment #20)
> Partners,
>
> Please grab the latest available bits here to test whether the new kdump can
> save vmcore to local disk.
>
> http://people.redhat.com/qcai/kexec-tools/

NEC has verified this with kexec-1.102pre-96.el5_5.2, and I've forwarded their result from IT#795343. Could you please check their last comment?

Thank you in advance.

Best Regards,
Masaki Furuta
Thank you Masaki-san.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2011-0061.html