Bug 705142

Summary: [RHEL6.1] crash: compressed kdump: invalid nr_cpus value: <cpus>
Product: Red Hat Enterprise Linux 6
Reporter: Jeff Burke <jburke>
Component: crash
Assignee: Dave Anderson <anderson>
Status: CLOSED ERRATA
QA Contact: Kernel Dump QE <kernel-dump-qe>
Severity: high
Docs Contact:
Priority: high
Version: 6.1
CC: anderson, dwa, gasmith, gbeshers, martinez, pbunyan, phan, randerso, rja
Target Milestone: rc   
Target Release: 6.2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: crash-5.1.7-1.el6
Doc Type: Bug Fix
Doc Text:
Previously, compressed kdump dump files were handled incorrectly on AMD64 and Intel 64 architectures if a system contained more than 454 CPUs. In such a case, the crash session terminated during initialization with the "crash: compressed kdump: invalid nr_cpus value: [cpus]" error message. A patch has been provided to address this issue, and the compressed dump files are now handled properly, thus fixing this bug.
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-12-06 16:30:00 UTC Type: ---
Bug Depends On:    
Bug Blocks: 688933    

Description Jeff Burke 2011-05-16 18:53:56 UTC
If an x86_64 compressed kdump generated with "makedumpfile -c" comes
from a system with more than 454 cpus, a crash session using RHEL6
crash-5.1.1-2.el6 fails immediately with these two error messages:

  crash: compressed kdump: invalid nr_cpus value: <cpus>
  crash: vmcore: not a supported file format

This is a vestige of the old diskdump dumpfile format, upon which the
compressed kdump format is based.  The dumpfile header in diskdump
dumpfiles imposed such a limit, but no such limit exists in compressed
kdump dumpfile headers.

The upstream crash utility version 5.1.5 has this fix, which restricts
the error to only diskdump dumpfiles:

  http://people.redhat.com/anderson/crash.changelog.html

         - Fix for the handling of x86_64 compressed kdump dumpfiles where
           the crashing system contained more than 454 cpus.  Without the
           patch, the crash session fails during initialization with the error
           message "crash: compressed kdump: invalid nr_cpus value: <cpus>"
           followed by "crash: vmcore: not a supported file format".
           (tindoh, tachibana.nec.co.jp)
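
The effect of that change, applying the cpu-count limit only to diskdump
dumpfiles, can be sketched roughly as follows.  This is an illustration
and not the crash utility's code: the function names are hypothetical,
and the three size constants are assumptions picked so that the
arithmetic reproduces the 454-cpu figure quoted above.  This report does
not state how the limit is derived.

```python
# Illustrative sketch only; not taken from the crash sources.
# All three constants are ASSUMPTIONS chosen to reproduce the 454 limit.
BLOCK_SIZE = 4096          # assumed size of the block holding the header
FIXED_HEADER_BYTES = 464   # assumed size of the fixed header fields
PER_CPU_BYTES = 8          # assumed per-cpu entry (one 64-bit pointer)


def max_diskdump_cpus(block_size=BLOCK_SIZE,
                      fixed=FIXED_HEADER_BYTES,
                      per_cpu=PER_CPU_BYTES):
    """Number of per-cpu entries that fit in the block after the header."""
    return (block_size - fixed) // per_cpu


def nr_cpus_is_valid(nr_cpus, is_diskdump):
    """Enforce the header-derived limit only for diskdump dumpfiles."""
    if nr_cpus <= 0:
        # A non-positive count is treated as invalid in this sketch.
        return False
    if is_diskdump:
        return nr_cpus <= max_diskdump_cpus()
    # Compressed kdump headers carry no such limit.
    return True
```

Under these assumptions, a 455-cpu compressed kdump passes the check,
while a 455-cpu diskdump dumpfile is still rejected.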

This problem is only applicable to compressed kdump dumpfiles, and
is not a problem with ELF-style kdump dumpfiles.  So the only way to
work around this problem with a crash utility version earlier than
crash-5.1.5 is to avoid compressing the dumpfile.  This can be done
by *not* specifying the "makedumpfile -c" argument to the "core_collector"
directive in /etc/kdump.conf.  However, it is still acceptable to use
"makedumpfile -d<option>" to filter out pages, because the output will
still be an ELF kdump dumpfile.
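
To see whether an existing configuration is affected, a small helper
along these lines could scan a "core_collector" line for the -c option.
It is a hypothetical sketch and not part of kdump or crash.  The set of
makedumpfile options that take a value is an assumption used only to
parse clustered short options such as "-cd31", and should be checked
against the installed makedumpfile man page.

```python
import shlex

# ASSUMPTION: makedumpfile short options that take a value, so any letters
# following them in the same token belong to the value (e.g. "-d31").
OPTIONS_WITH_VALUE = set("dxig")


def uses_page_compression(conf_line):
    """Return True if a core_collector line passes -c to makedumpfile."""
    # comments=True drops everything after '#', so commented-out
    # directives yield no tokens.
    tokens = shlex.split(conf_line, comments=True)
    if len(tokens) < 2 or tokens[0] != "core_collector":
        return False
    if not tokens[1].endswith("makedumpfile"):
        return False
    for tok in tokens[2:]:
        # Skip option values and long options such as --message-level.
        if not tok.startswith("-") or tok.startswith("--"):
            continue
        for letter in tok[1:]:
            if letter == "c":
                return True
            if letter in OPTIONS_WITH_VALUE:
                break  # the rest of this token is the option's value
    return False
```

For example, uses_page_compression("core_collector makedumpfile -c -d 31")
returns True, while the same line without -c returns False.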

Comment 1 Dave Anderson 2011-05-16 19:22:24 UTC
FYI -- this issue was originally discovered and reported by Cliff Wickman
of SGI (cpw) in this crash-utility mailing list thread:
  
  [Crash-utility] x86_64 limit of 454 cpu's?
  https://www.redhat.com/archives/crash-utility/2011-April/msg00020.html

Comment 4 Tomas Capek 2011-10-18 15:01:05 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Previously, compressed kdump dump files were handled incorrectly on AMD64 and Intel 64 architectures if a system contained more than 454 CPUs. In such a case, the crash session terminated during initialization with the "crash: compressed kdump: invalid nr_cpus value: [cpus]" error message. A patch has been provided to address this issue, and the compressed dump files are now handled properly, thus fixing this bug.

Comment 5 errata-xmlrpc 2011-12-06 16:30:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1648.html