Bug 227365 - diskdumpmsg fails with dumps from large memory systems
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: diskdumputils
Hardware: All
OS: Linux
Priority: high
Severity: medium
Assigned To: Takao Indoh
Depends On:
Blocks: 222397 234251
Reported: 2007-02-05 11:47 EST by Bryn M. Reeves
Modified: 2010-10-22 08:51 EDT (History)
Fixed In Version: RHBA-2007-0717
Doc Type: Bug Fix
Last Closed: 2007-11-15 10:59:02 EST

Attachments
diskdumpmsg.diff (1.81 KB, patch)
2007-02-14 11:51 EST, Nobuhiro Tachino
Description Bryn M. Reeves 2007-02-05 11:47:45 EST
Description of problem:
diskdumpmsg emits a backtrace when run on a vmcore from a system with a large
amount of RAM (reported on an ia64 machine with ~2TB):

Traceback (most recent call last):
 File "/sbin/diskdumpmsg", line 916, in ?
   vmcore = Vmcore.generate(vmcorefile)
 File "/sbin/diskdumpmsg", line 306, in generate
   return subclass(vmcore, map)
 File "/sbin/diskdumpmsg", line 621, in __init__
 File "/sbin/diskdumpmsg", line 678, in memory_dump
   page_desc_raw = self.fd.read(pd_size * self.header.max_mapnr)
OverflowError: requested number of bytes is more than a Python string can hold

Version-Release number of selected component (if applicable):

That self.fd.read call appears to slurp up the entire set of page
descriptors in one read. When max_mapnr exceeds 178956970, this amounts to
more than 4GB and fails with "requested number of bytes is more than a Python
string can hold".
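
The threshold quoted above can be checked with a few lines of arithmetic. This is only a sketch: it assumes the 24-byte descriptor size mentioned later in this report and a 2^32-byte limit, and the names PD_SIZE, LIMIT and read_size are made up for illustration, not taken from diskdumpmsg.

```python
# Assumption: each page descriptor occupies 24 bytes (figure quoted later
# in this report) and the read fails once the request exceeds 2**32 bytes.
PD_SIZE = 24
LIMIT = 2 ** 32


def read_size(max_mapnr, pd_size=PD_SIZE):
    """Bytes requested when all page descriptors are read in one call."""
    return pd_size * max_mapnr


# Largest max_mapnr whose descriptor array still fits under the limit.
largest_ok = LIMIT // PD_SIZE
print(largest_ok)                          # 178956970
print(read_size(largest_ok))               # 4294967280, just under 2**32
print(read_size(largest_ok + 1) > LIMIT)   # True
```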

How reproducible:

Steps to Reproduce:
1. Generate a vmcore from a machine with several TB of memory
2. Attempt to process the core with diskdumpmsg
Actual results:
Backtrace listed above.

Expected results:
diskdumpmsg reads the core correctly.

Additional info:

Looks like that one big read should be split up to pull the descriptors in one
at a time or in small groups.
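
A minimal sketch of that suggestion, not the patch that was eventually attached: the function name read_records and the records_per_chunk parameter are hypothetical, and the code is written for current Python rather than the interpreter shipped with RHEL 4. It reads fixed-size records in bounded groups so that no single read request grows with the size of the machine.

```python
import io


def read_records(fd, record_size, count, records_per_chunk=4096):
    """Yield `count` fixed-size records from `fd`, one bounded chunk at a time."""
    remaining = count
    while remaining > 0:
        n = min(remaining, records_per_chunk)
        want = record_size * n
        buf = fd.read(want)
        if len(buf) != want:
            # The file ended before all descriptors were read.
            raise EOFError("short read: expected %d bytes, got %d"
                           % (want, len(buf)))
        # Slice the chunk back into individual records.
        for off in range(0, want, record_size):
            yield buf[off:off + record_size]
        remaining -= n


# Usage with an in-memory stand-in for the vmcore file:
# two 24-byte records, read one record per chunk.
fd = io.BytesIO(bytes(range(48)))
records = list(read_records(fd, 24, 2, records_per_chunk=1))
```

With this shape the largest single request is record_size * records_per_chunk bytes, regardless of how many descriptors the dump contains.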
Comment 1 Nobuhiro Tachino 2007-02-14 11:51:09 EST
Created attachment 148066: diskdumpmsg.diff

This patch fixes the problem.
Comment 2 Nobuhiro Tachino 2007-02-21 11:03:43 EST
The patch was merged in diskdumputils v1.3.27.
Comment 3 RHEL Product and Program Management 2007-05-09 03:56:01 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 4 Dave Anderson 2007-07-13 15:02:41 EDT
MODIFIED -- CVS Tag: diskdumputils-1_4_0-1

Comment 6 Dave Anderson 2007-07-17 14:05:55 EDT
The diskdumputils package has been re-spun -- CVS Tag: diskdumputils-1_4_1-1

Please post QA results here.  I will transfer the test results
to the errata's QA report, and then set this bugzilla to VERIFIED
via the errata interface.
Comment 7 RHEL Product and Program Management 2007-09-13 15:50:49 EDT
This request was previously evaluated by Red Hat Product Management
for inclusion in the current Red Hat Enterprise Linux release, but
Red Hat was unable to resolve it in time.  This request will be
reviewed for a future Red Hat Enterprise Linux release.
Comment 10 Dave Anderson 2007-09-28 15:48:37 EDT
Reminder -- we are still awaiting QA results from Fujitsu for this bugzilla.
Comment 11 Dave Anderson 2007-10-01 11:12:39 EDT
The diskdumputils package has been re-spun -- CVS Tag: diskdumputils-1_4_1-2

Please post QA results here.  I will transfer the test results
to the errata's QA report, and then set this bugzilla to VERIFIED
via the errata interface.
Comment 12 Dave Anderson 2007-10-03 11:57:51 EDT
Nobuhiro Tachino (ntachino@redhat.com) and Akira Imamura 
(aimamura@redhat.com) are no longer here at the Westford 
facility working as embedded engineers for Fujitsu, and 
therefore cannot complete the QA for this bugzilla's 
RHEL4-6 errata.

For that reason, it is essential that the Issue Tracker
REPORTER test this issue, and report back the results
to this bugzilla.  

If the QA is successful, I (as the proxy maintainer) will
transfer the test results to the errata's QA report, and 
then set this bugzilla to VERIFIED via the errata interface.  

  Dave Anderson
Comment 15 Dave Anderson 2007-10-04 11:54:09 EDT
Given that mmatsuya@redhat.com has requested in IT #112576 that Fujitsu
perform the QA for this bugzilla:

> Event posted 10-03-2007 09:13pm by mmatsuya 	
> Hi Indo-san,
> Have you already been in Westford?
> Can anyone in Fujitsu re-test with the new packages diskdumputils-1.4.1-2?

I have set this bugzilla's NEEDINFO to our remaining in-house Fujitsu
representative, ktokunag@redhat.com, as he has offered off-line to
help move things along.

Comment 18 Dave Anderson 2007-10-08 10:07:09 EDT
Based upon the last two comments in IT #112576, I'm changing
the NEEDINFO from ktokunag to mmatsuya:

> Event posted 10-05-2007 03:22am by L3support_kernel 	
> Hi matsuya-san,
> I take charge of this issue instead of Indoh.
> I will confirm diskdumputils-1.4.1-2 by Oct 12th.
> Kazuhiro Yoshida

> Event posted 10-05-2007 02:47am by mmatsuya 	
> Please discuss who in Fujitsu will test this issue with diskdumputils-1.4.1-2
> without confusion. onsite team in Westford or anyone in Fujitsu Japan.
Comment 20 Larry Troan 2007-10-11 13:59:52 EDT
Yoshida-san, have you confirmed that this is fixed?
Comment 24 Dave Anderson 2007-10-12 09:15:21 EDT
I'm not entirely clear on this, but as I understand it, the diskdumpmsg python
script would fail only if the system had enough contiguous physical memory
to require a contiguous array of more than 178956970 page_desc_t structures
(at 24 bytes each), which would make the total array size greater than 4GB.
That could not occur on a 64GB machine, given that it would only have
4194304 total pages.
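
As a rough cross-check of the 64GB figure: the page count quoted implies 16KB pages (64GB / 4194304), which is treated here as an assumption, as are the helper names. The sketch computes how large the descriptor array would be for a given amount of memory.

```python
PD_SIZE = 24            # bytes per page_desc_t, as stated in this comment
PAGE_SIZE = 16 * 1024   # assumption: 16KB pages, implied by 64GB -> 4194304 pages
GIB = 2 ** 30


def desc_array_bytes(mem_bytes, page_size=PAGE_SIZE, pd_size=PD_SIZE):
    """Size of a contiguous descriptor array covering `mem_bytes` of memory."""
    return (mem_bytes // page_size) * pd_size


pages_64g = (64 * GIB) // PAGE_SIZE
print(pages_64g)                    # 4194304
print(desc_array_bytes(64 * GIB))   # 100663296 bytes, i.e. 96MB
```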

That being said, the code has been restructured so that the page_desc_t
reads are broken up, so it would be impossible to see the same failure
with the new diskdumpmsg.

So I'm not sure what is the best way to continue.

I've tried reassigning this bugzilla to our new embedded Fujitsu engineer,
Takao Indoh, who will be the diskdump maintainer in the future, and he is
familiar with this issue, but his email address (tindoh@redhat.com) is not
"known" by the bugzilla system yet.

Kei, can you check with Takao and ask him for his suggestion on how
best to proceed?

Comment 25 Keiichiro Tokunaga 2007-10-12 14:27:47 EDT
Per the discussion with Takao, it should occur on a machine with 256GB or
more of memory.  We have a PRIMEQUEST in the lab, which has 512GB of memory
and is waiting for power, so we will try to find a way to use it for the QA.
Comment 26 Keiichiro Tokunaga 2007-10-15 16:36:45 EDT
I finally set up the machine with 512GB of memory, installed the latest
RHEL4.6 (re20071011.0) on it, and first confirmed that the issue is fixed in
diskdumputils-1.4.1-2.

Then I installed the old version of diskdumputils (1.3.25), which does not
have the fix, and confirmed that the issue reproduces on it.
Comment 31 errata-xmlrpc 2007-11-15 10:59:02 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

