Bug 607400

Summary: UV support: kexec command: extend for large cpu count and memory
Product: Red Hat Enterprise Linux 6 Reporter: George Beshers <gbeshers>
Component: kexec-tools Assignee: Cong Wang <amwang>
Status: CLOSED ERRATA QA Contact: Chao Ye <cye>
Severity: high Docs Contact:
Priority: high    
Version: 6.0 CC: cpw, cye, dwa, gbeshers, martinez, phan, qcai, rkhan, syeghiay, tee
Target Milestone: rc   
Target Release: 6.1   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: kexec-tools-2.0.0-172.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-05-19 14:15:15 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 619426, 650298    
Bug Blocks: 580566, 645474    
Attachments:
Description Flags
Tar bz2 file of patches and a series file. none

Description George Beshers 2010-06-24 02:14:10 UTC
David,

I didn't actually check as these went upstream very recently,
so they might be in the package already.

George


Description of problem:
A couple of fixes are needed in the kexec command to make dumps work on UV.

The MAX_MEMORY_RANGES limit of 64 is too small for a very large NUMA machine
(a 512-processor SGI UV, for example).
In addition, a temporary workaround (hack) in load_crashdump_segments()
assumes that 16K is sufficient for the size of the crashdump ELF header.
This is too small for a machine with a large cpu count, since a PT_NOTE is
created in the ELF header for each cpu.

The first patch looks like this:

Index: kexec-tools-2.0.1/kexec/arch/i386/kexec-x86.h
===================================================================
--- kexec-tools-2.0.1.orig/kexec/arch/i386/kexec-x86.h
+++ kexec-tools-2.0.1/kexec/arch/i386/kexec-x86.h
@@ -1,7 +1,7 @@
 #ifndef KEXEC_X86_H
 #define KEXEC_X86_H

-#define MAX_MEMORY_RANGES 64
+#define MAX_MEMORY_RANGES 1024

 enum coretype {
        CORE_TYPE_UNDEF = 0,
Index: kexec-tools-2.0.1/kexec/arch/x86_64/crashdump-x86_64.c
===================================================================
--- kexec-tools-2.0.1.orig/kexec/arch/x86_64/crashdump-x86_64.c
+++ kexec-tools-2.0.1/kexec/arch/x86_64/crashdump-x86_64.c
@@ -268,6 +268,9 @@ static int exclude_region(int *nr_ranges
 {
        int i, j, tidx = -1;
        struct memory_range temp_region;
+       temp_region.start = 0;
+       temp_region.end = 0;
+       temp_region.type = 0;

        for (i = 0; i < (*nr_ranges); i++) {
                unsigned long long mstart, mend;
@@ -403,6 +406,7 @@ static int delete_memmap(struct memory_r
                                memmap_p[i].end = addr - 1;
                                temp_region.start = addr + size;
                                temp_region.end = mend;
+                               temp_region.type = memmap_p[i].type;
                                operation = 1;
                                tidx = i;
                                break;
@@ -580,7 +584,7 @@ int load_crashdump_segments(struct kexec
                                unsigned long max_addr, unsigned long min_base)
 {
        void *tmp;
-       unsigned long sz, elfcorehdr;
+       unsigned long sz, bufsz, memsz, elfcorehdr;
        int nr_ranges, align = 1024, i;
        struct memory_range *mem_range, *memmap_p;

@@ -613,9 +617,10 @@ int load_crashdump_segments(struct kexec
        /* Create elf header segment and store crash image data. */
        if (crash_create_elf64_headers(info, &elf_info,
                                       crash_memory_range, nr_ranges,
-                                      &tmp, &sz,
+                                      &tmp, &bufsz,
                                       ELF_CORE_HEADER_ALIGN) < 0)
                return -1;
+       /* the size of the elf headers allocated is returned in 'bufsz' */

        /* Hack: With some ld versions (GNU ld version 2.14.90.0.4 20030523),
         * vmlinux program headers show a gap of two pages between bss segment
@@ -624,9 +629,15 @@ int load_crashdump_segments(struct kexec
         * elf core header segment to 16K to avoid being placed in such gaps.
         * This is a makeshift solution until it is fixed in kernel.
         */
-       elfcorehdr = add_buffer(info, tmp, sz, 16*1024, align, min_base,
+       if (bufsz < (16*1024))
+               /* bufsize is big enough for all the PT_NOTE's and PT_LOAD's */
+               memsz = 16*1024;
+               /* memsz will be the size of the memory hole we look for */
+       else
+               memsz = bufsz;
+       elfcorehdr = add_buffer(info, tmp, bufsz, memsz, align, min_base,
                                                        max_addr, -1);
-       if (delete_memmap(memmap_p, elfcorehdr, sz) < 0)
+       if (delete_memmap(memmap_p, elfcorehdr, memsz) < 0)
                return -1;
        cmdline_add_memmap(mod_cmdline, memmap_p);
        cmdline_add_elfcorehdr(mod_cmdline, elfcorehdr);


and the other, which prevents some rather verbose kexec grumbling:

Index: kexec-tools/kexec/firmware_memmap.c
===================================================================
--- kexec-tools.orig/kexec/firmware_memmap.c
+++ kexec-tools/kexec/firmware_memmap.c
@@ -161,6 +161,8 @@ static int parse_memmap_entry(const char
                range->type = RANGE_RAM;
        else if (strcmp(type, "ACPI Tables") == 0)
                range->type = RANGE_ACPI;
+       else if (strcmp(type, "Unusable memory") == 0)
+               range->type = RANGE_RESERVED;
        else if (strcmp(type, "reserved") == 0)
                range->type = RANGE_RESERVED;
        else if (strcmp(type, "Unusable memory") == 0)

Both have been applied upstream.



Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 2 RHEL Program Management 2010-06-24 02:32:51 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 4 Marizol Martinez 2010-07-08 15:31:02 UTC
George -- Per your comment in the description, please verify and update this BZ accordingly. Thanks!

Comment 5 Cong Wang 2010-07-09 08:05:24 UTC
George, please either provide the patch as an attachment or give me the upstream commit IDs; please don't inline the patch in the BZ, it is unusable.

Also, have you tested it?

Comment 6 George Beshers 2010-07-20 16:30:53 UTC
Amerigo,

Sorry, our internal bug system doesn't have the attachment capability
and I did a cut-and-paste.

We found another problem in the kernel with kdump.
I am planning on testing this on a 5 TB system tomorrow (7/21).

George

Comment 8 George Beshers 2010-07-29 13:42:57 UTC
Amerigo,

I ran across a couple of completely different bugs testing this.

Also, we are making a large (1024-core) system available
to Red Hat on Tuesdays.  It did not happen this last Tuesday
because of a problem booting the system.

George

Comment 11 Cong Wang 2010-08-09 04:32:30 UTC
commit 4b4b2a533e218e287ab4aed25678434ad938309e
Author: Cliff Wickman <cpw>
Date:   Wed Jun 16 08:36:09 2010 -0500

    kexec: extend for large cpu count and memory
    
-----------
commit 26ed909df48ea3db3f7395713a9c68c94d091032
Author: Cliff Wickman <cpw>
Date:   Thu Jun 17 11:37:06 2010 -0500

    kexec: Unusable memory range type
    
-----------

Are the above two commits all what we need? It seems I am still missing some other commit?

Comment 12 George Beshers 2010-08-09 16:31:48 UTC
Hi Amerigo,

I believe that those are the only two patches we need,
although we can't really dump a full 5 TB.  Our suggestion is to
set the debug level to 31, which should provide a great deal of
useful information if there is a problem in the field with rhel6.

In any case, SGI is making a large system available to Red Hat
this evening until early Wed morning.  I am hoping to find
time in that period to test kdump.

George
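The "debug level" George mentions is makedumpfile's dump level: -d takes a bitmask, and 31 (1+2+4+8+16) excludes zero pages, non-private cache pages, private cache pages, user-process data, and free pages. A kdump.conf fragment along these lines would select it; the fragment is illustrative, not taken from this BZ:

```shell
# /etc/kdump.conf (illustrative fragment, not from this BZ)
# -c     compress the dump data
# -d 31  dump level 31 = exclude zero, cache, private-cache,
#        user-process, and free pages (bits 1+2+4+8+16)
core_collector makedumpfile -c -d 31
```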

Comment 13 Cong Wang 2010-08-10 10:27:58 UTC
George, Okay, we already use '-d 31' by default now.
I am waiting for your testing result. Thanks!

Comment 14 Cong Wang 2010-08-11 10:16:41 UTC
I built a test package:
https://brewweb.devel.redhat.com/taskinfo?taskID=2674836

Comment 17 George Beshers 2010-08-18 14:17:37 UTC
The makedumpfile command worked with our modified kexec based on 2.0.1.
However, the modified kexec did not work.

I am currently on my third patch to try to fix the problem.

George

Comment 18 George Beshers 2010-08-19 18:38:47 UTC
To clarify the situation.

I asked another SGI engineer for help with this patch.
The patch does work, but against a later version of the
kexec-tools from upstream.

It was my mistake to pass the patch along without
personally testing it.  I worked this last weekend
to try to fix the patch.

George

Comment 19 Linda Wang 2010-08-20 02:28:41 UTC
thank you for testing the package, and for the clarification.

so, which patch(es) from upstream kexec-tools are missing
other than the two patches listed in comment #11 above?

Comment 20 George Beshers 2010-08-20 12:37:49 UTC
I have requested help from another SGI engineer with this
and will be careful to test the patched rpm on the 1024-core
5 TB machine that we make available to Red Hat on a weekly basis.

George

Comment 21 Marizol Martinez 2010-08-20 13:13:53 UTC
George -- I believe Linda's question in comment #19 is still outstanding. Could you please update this BZ with the specific patches the upstream version has vs. RH's? Thanks!

Comment 22 George Beshers 2011-02-25 21:37:55 UTC

We finally found the problem with kexec-tools and the e820
table -- it manifested itself as a memory corruption in
the running kernel.

I am currently cleaning up the patchset -- the last patch
is upstream.

George

Comment 24 George Beshers 2011-03-08 21:18:23 UTC
Created attachment 483023 [details]
Tar bz2 file of patches and a series file.

Up to a few comment cleanups, this is what was built:

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=3164707

I have verified that this works on a number of UV systems.

The file is a bzip2 tar file of a quilt patches directory.

George

Comment 25 Cong Wang 2011-03-15 09:02:32 UTC
Ok, I finally got the tarball. One question: are all of these patches upstream?

And I would appreciate it if your attached patches are against the latest RHEL-6 kexec-tools; that would save me much time handling conflicts. Anyway, I will try to see if this is true. :)

Thanks.

Comment 26 gbeshers 2011-03-15 17:15:21 UTC
Hi Amerigo,

I added Cliff Wickman to the CC list.  He indicated that
they all were and I found most of them.  A few had been
partially applied and IIRC one I was unsure about because
some of the code had been rewritten and moved.

Let me know when you are ready to test and I will grab
a big system.

George

Comment 28 Cong Wang 2011-03-16 12:23:22 UTC
Thanks, George.

There are some problems, as I see them:

1. Not all commits match your patchset descriptions, e.g. in kexec_segs_ranges:

A backport of commit 563ee341d950f2fae0ba6608d70c19eb647ff943
and commit 7b325f8528d230e50a0c3841a3ac587dea2200e2,
adapted for crashdump-x86_64.c, which doesn't exist upstream.

Neither of them matches that patch.

2. For 100823.kcore_header_patch, we probably need to backport my patch

commit 1100580b05e3fdfe648d9be8617d962b11f4b88b
Author: Amerigo Wang <amwang>
Date:   Thu Mar 3 00:10:43 2011 +0800

    get the backup area dynamically

Anyway, I will build a kexec-tools package with all of your patches except 100823.kcore_header_patch, plus the backport of 1100580b05e3fdfe648d9be8617d962b11f4b88b for you to test.

Comment 29 Cong Wang 2011-03-16 13:04:00 UTC
George, please help to test this one:
https://brewweb.devel.redhat.com/taskinfo?taskID=3181998

Thanks!

Comment 30 Cong Wang 2011-03-16 13:20:50 UTC
Hmm, please use this one instead:
https://brewweb.devel.redhat.com/taskinfo?taskID=3182054

Comment 31 George Beshers 2011-03-18 19:52:07 UTC
Hi Amerigo,

Interestingly enough, the x86_64 rpm as built fails, but if I
rebuild the source rpm on the system I am testing
(I was trying to locate the problem) then it does work.

Possibly a problem with the Brew root?

George

Comment 32 Cong Wang 2011-03-21 05:47:37 UTC
Oh, maybe; I made the srpm locally and sent it to brew to build. Anyway, I have taken all the patches. Please try

https://brewweb.devel.redhat.com/buildinfo?buildID=159954

to see if this rpm is okay.

Thanks.

Comment 35 gbeshers 2011-03-23 13:35:15 UTC
Seems to be OK, but I haven't tested on
a 2-rack system yet.

George

Comment 36 errata-xmlrpc 2011-05-19 14:15:15 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0736.html