Bug 607400
| Summary: | UV support: kexec command: extend for large cpu count and memory | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | George Beshers <gbeshers> | ||||
| Component: | kexec-tools | Assignee: | Cong Wang <amwang> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Chao Ye <cye> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | 6.0 | CC: | cpw, cye, dwa, gbeshers, martinez, phan, qcai, rkhan, syeghiay, tee | ||||
| Target Milestone: | rc | ||||||
| Target Release: | 6.1 | ||||||
| Hardware: | All | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | kexec-tools-2_0_0-172_el6 | Doc Type: | Bug Fix | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2011-05-19 14:15:15 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | 619426, 650298 | ||||||
| Bug Blocks: | 580566, 645474 | ||||||
| Attachments: |
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux major release. This request is not yet committed for inclusion.

George -- Per your comment in the description, please verify and update this BZ accordingly. Thanks!

George, please either provide the patch as an attachment or give me the upstream commit IDs. Please don't inline the patch in BZ, it is unusable. Also, have you tested it?

Amerigo, Sorry, our internal bug system doesn't have the attachment capability and I did a cut-and-paste. We found another problem in the kernel with kdump. I am planning on testing this on a 5 TB system tomorrow (7/21). George

Amerigo, I ran across a couple of completely different bugs testing this. Also, we are making a large (1024-core) system available to Red Hat on Tuesdays. It did not happen this last Tuesday because of a problem booting the system. George

commit 4b4b2a533e218e287ab4aed25678434ad938309e
Author: Cliff Wickman <cpw>
Date: Wed Jun 16 08:36:09 2010 -0500
kexec: extend for large cpu count and memory
-----------
commit 26ed909df48ea3db3f7395713a9c68c94d091032
Author: Cliff Wickman <cpw>
Date: Thu Jun 17 11:37:06 2010 -0500
kexec: Unusable memory range type
-----------
Are the above two commits all that we need? It seems I am still missing some other commit.
Hi Amerigo, I believe that those are the only two patches we need, although to actually do a dump we can't really dump a full 5 TB. Our suggestion is to set the debug level to 31, which should provide a great deal of useful information if there is a problem in the field with RHEL 6. In any case, SGI is making a large system available to Red Hat this evening until early Wed morning. I am hoping to find time in that period to test kdump. George

George, Okay, we already use '-d 31' by default now. I am waiting for your testing result. Thanks!

I built a test package: https://brewweb.devel.redhat.com/taskinfo?taskID=2674836

The makedumpfile command worked with our modified kexec based on 2.0.1. However, the modified kexec did not work. I am currently on my third patch to try to fix the problem. George

To clarify the situation: I asked another SGI engineer for help with this patch. The patch does work, but against a later version of the kexec-tools from upstream. It was my mistake to pass the patch along without personally testing it. I worked this last weekend to try to fix the patch. George

Thank you for testing the package, and for the clarification. So, which patch(es) from upstream kexec-tools are missing other than the two patches listed in comment #11 above?

I have requested help from another SGI engineer with this and will be careful to test the patched rpm on the 1024-core 5 TB machine that we make available to Red Hat on a weekly basis. George

George -- I believe Linda's question in comment #20 is still outstanding. Could you please update this BZ with the specific patches the upstream version has vs. RH's? Thanks!

We finally found the problem with kexec-tools and the e820 table -- it manifested itself as a memory corruption in the running kernel. I am currently cleaning up the patchset -- the last patch is upstream. George

Created attachment 483023: Tar bz2 file of patches and a series file.
Up to a few comment cleanups this is what was built: http://brewweb.devel.redhat.com/brew/taskinfo?taskID=3164707 I have verified that this works on a number of UV systems. The file is a bzip2 tar file of a quilt patches directory. George

Ok, finally I got the tarball. One question: are all these patches in upstream? And I do appreciate that your attached patches are against the latest RHEL-6 kexec-tools, this will save me much time handling conflicts. Anyway, I will try to see if this is true. :) Thanks.

Hi Amerigo, I added Cliff Wickman to the CC list. He indicated that they all were, and I found most of them. A few had been partially applied and IIRC one I was unsure about because some of the code had been rewritten and moved. Let me know when you are ready to test and I will grab a big system. George

Thanks, George.
There are some problems as I see it:
1. Not all commits match your patchset descriptions, e.g. in kexec_segs_ranges,
Backport of commit 563ee341d950f2fae0ba6608d70c19eb647ff943
and commit 7b325f8528d230e50a0c3841a3ac587dea2200e2
just for crashdump-x86_64.c which doesn't exist upstream.
Neither of them matches that patch.
2. For 100823.kcore_header_patch, probably we need to backport my patch
commit 1100580b05e3fdfe648d9be8617d962b11f4b88b
Author: Amerigo Wang <amwang>
Date: Thu Mar 3 00:10:43 2011 +0800
get the backup area dynamically
Anyway, I will build a kexec-tools package with all of your patches except 100823.kcore_header_patch, plus the backport of 1100580b05e3fdfe648d9be8617d962b11f4b88b for you to test.
George, please help to test this one: https://brewweb.devel.redhat.com/taskinfo?taskID=3181998 Thanks!

Hmm, please use this one instead: https://brewweb.devel.redhat.com/taskinfo?taskID=3182054

Hi Amerigo, Interestingly enough, the x86_64 rpm fails as built, but if I rebuild the source rpm on the system I am testing (I was trying to locate the problem) then it does work. Possibly a problem with the Brew root? George

Oh, maybe. I made the srpm locally and sent it to brew to build. Anyway, I took all the patches. Please try https://brewweb.devel.redhat.com/buildinfo?buildID=159954 to see if this rpm is okay. Thanks.

Seems to be OK, but I haven't tested on a 2 rack system yet. George

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2011-0736.html
David, I didn't actually check as these went upstream very recently, so they might be in the package already. George

Description of problem:
A couple fixes are needed to the kexec command to make dumps work on UV.

The MAX_MEMORY_RANGES of 64 is too small for a very large NUMA machine. (A 512 processor SGI UV, for example.)

And fix a temporary workaround (hack) in load_crashdump_segments() that assumes that 16k is sufficient for the size of the crashdump elf header. This is too small for a machine with a large cpu count. A PT_NOTE is created in the elf header for each cpu.

This first patch looks like this:

Index: kexec-tools-2.0.1/kexec/arch/i386/kexec-x86.h
===================================================================
--- kexec-tools-2.0.1.orig/kexec/arch/i386/kexec-x86.h
+++ kexec-tools-2.0.1/kexec/arch/i386/kexec-x86.h
@@ -1,7 +1,7 @@
 #ifndef KEXEC_X86_H
 #define KEXEC_X86_H
 
-#define MAX_MEMORY_RANGES 64
+#define MAX_MEMORY_RANGES 1024
 
 enum coretype {
 	CORE_TYPE_UNDEF = 0,
Index: kexec-tools-2.0.1/kexec/arch/x86_64/crashdump-x86_64.c
===================================================================
--- kexec-tools-2.0.1.orig/kexec/arch/x86_64/crashdump-x86_64.c
+++ kexec-tools-2.0.1/kexec/arch/x86_64/crashdump-x86_64.c
@@ -268,6 +268,9 @@ static int exclude_region(int *nr_ranges
 {
 	int i, j, tidx = -1;
 	struct memory_range temp_region;
+	temp_region.start = 0;
+	temp_region.end = 0;
+	temp_region.type = 0;
 
 	for (i = 0; i < (*nr_ranges); i++) {
 		unsigned long long mstart, mend;
@@ -403,6 +406,7 @@ static int delete_memmap(struct memory_r
 			memmap_p[i].end = addr - 1;
 			temp_region.start = addr + size;
 			temp_region.end = mend;
+			temp_region.type = memmap_p[i].type;
 			operation = 1;
 			tidx = i;
 			break;
@@ -580,7 +584,7 @@ int load_crashdump_segments(struct kexec
 				unsigned long max_addr, unsigned long min_base)
 {
 	void *tmp;
-	unsigned long sz, elfcorehdr;
+	unsigned long sz, bufsz, memsz, elfcorehdr;
 	int nr_ranges, align = 1024, i;
 	struct memory_range *mem_range, *memmap_p;
 
@@ -613,9 +617,10 @@ int load_crashdump_segments(struct kexec
 	/* Create elf header segment and store crash image data. */
 	if (crash_create_elf64_headers(info, &elf_info,
 				       crash_memory_range, nr_ranges,
-				       &tmp, &sz,
+				       &tmp, &bufsz,
 				       ELF_CORE_HEADER_ALIGN) < 0)
 		return -1;
+	/* the size of the elf headers allocated is returned in 'bufsz' */
 
 	/* Hack: With some ld versions (GNU ld version 2.14.90.0.4 20030523),
 	 * vmlinux program headers show a gap of two pages between bss segment
@@ -624,9 +629,15 @@ int load_crashdump_segments(struct kexec
 	 * elf core header segment to 16K to avoid being placed in such gaps.
 	 * This is a makeshift solution until it is fixed in kernel.
 	 */
-	elfcorehdr = add_buffer(info, tmp, sz, 16*1024, align, min_base,
+	if (bufsz < (16*1024))
+		/* bufsize is big enough for all the PT_NOTE's and PT_LOAD's */
+		memsz = 16*1024;
+		/* memsz will be the size of the memory hole we look for */
+	else
+		memsz = bufsz;
+	elfcorehdr = add_buffer(info, tmp, bufsz, memsz, align, min_base,
 							max_addr, -1);
-	if (delete_memmap(memmap_p, elfcorehdr, sz) < 0)
+	if (delete_memmap(memmap_p, elfcorehdr, memsz) < 0)
 		return -1;
 	cmdline_add_memmap(mod_cmdline, memmap_p);
 	cmdline_add_elfcorehdr(mod_cmdline, elfcorehdr);

and the other to prevent some rather verbose kexec grumbling:

Index: kexec-tools/kexec/firmware_memmap.c
===================================================================
--- kexec-tools.orig/kexec/firmware_memmap.c
+++ kexec-tools/kexec/firmware_memmap.c
@@ -161,6 +161,8 @@ static int parse_memmap_entry(const char
 		range->type = RANGE_RAM;
 	else if (strcmp(type, "ACPI Tables") == 0)
 		range->type = RANGE_ACPI;
+	else if (strcmp(type, "Unusable memory") == 0)
+		range->type = RANGE_RESERVED;
 	else if (strcmp(type, "reserved") == 0)
 		range->type = RANGE_RESERVED;
 	else if (strcmp(type, "Unusable memory") == 0)

Both have been applied upstream.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
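
The description's claim that 16k is too small for a large cpu count can be checked with a little arithmetic. The following is a minimal sketch in C, not code from kexec-tools: the names est_elfcorehdr_size, choose_memsz and LEGACY_HDR_HOLE are hypothetical, and the estimate is simplified to one program header per cpu (PT_NOTE) plus one per memory range (PT_LOAD); the real tool may emit a few more. The sizes 64 and 56 are sizeof(Elf64_Ehdr) and sizeof(Elf64_Phdr) for ELF64.

```c
#include <assert.h>
#include <stddef.h>

/* ELF64 structure sizes, i.e. sizeof(Elf64_Ehdr) and sizeof(Elf64_Phdr). */
#define ELF64_EHDR_SIZE 64UL
#define ELF64_PHDR_SIZE 56UL

/* The 16K that the old add_buffer() call hard-coded (hypothetical name). */
#define LEGACY_HDR_HOLE (16UL * 1024)

/*
 * Simplified estimate of the crash dump ELF header size: one ELF header,
 * one PT_NOTE program header per cpu, one PT_LOAD per memory range.
 * Assumption: any extra program headers the real tool adds are ignored.
 */
size_t est_elfcorehdr_size(size_t nr_cpus, size_t nr_ranges)
{
	return ELF64_EHDR_SIZE + (nr_cpus + nr_ranges) * ELF64_PHDR_SIZE;
}

/*
 * Mirrors the choice made in the patch: keep the 16K minimum from the old
 * workaround, but never reserve a hole smaller than the header buffer.
 */
size_t choose_memsz(size_t bufsz)
{
	size_t memsz = bufsz < LEGACY_HDR_HOLE ? LEGACY_HDR_HOLE : bufsz;

	assert(memsz >= bufsz); /* the hole must hold the whole buffer */
	return memsz;
}
```

Under these assumptions, 512 cpus and 64 memory ranges give 64 + 576 * 56 = 32320 bytes, roughly twice the 16K the old code reserved, so choose_memsz(32320) returns 32320. A small header such as choose_memsz(4096) still gets the 16K minimum.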