Description of problem:
diskdumpmsg emits a backtrace when run on a vmcore from a system with large amounts of RAM (reported on an ia64 with ~2TB):

Traceback (most recent call last):
  File "/sbin/diskdumpmsg", line 916, in ?
    vmcore = Vmcore.generate(vmcorefile)
  File "/sbin/diskdumpmsg", line 306, in generate
    return subclass(vmcore, map)
  File "/sbin/diskdumpmsg", line 621, in __init__
    self.memory_dump(self.datafilename)
  File "/sbin/diskdumpmsg", line 678, in memory_dump
    page_desc_raw = self.fd.read(pd_size * self.header.max_mapnr)
OverflowError: requested number of bytes is more than a Python string can hold

Version-Release number of selected component (if applicable):

That self.fd.read seems to be trying to slurp up the entire set of page descriptors in one read. When max_mapnr exceeds 178956970, this amounts to >4GB and fails with "requested number of bytes is more than a Python string can hold".

How reproducible:
100%

Steps to Reproduce:
1. Generate a vmcore from a machine with several TB of memory
2. Attempt to process the core with diskdumpmsg

Actual results:
Backtrace listed above.

Expected results:
diskdumpmsg reads the core correctly.

Additional info:
It looks like that one big read should be split up to pull the descriptors in one at a time or in small groups.
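For illustration, a minimal sketch of the split-up read suggested above (a hypothetical helper, not the actual patch; pd_size and max_mapnr come from the traceback, while read_page_descs(), handle_desc(), and the chunk size are illustrative names):

    # Hypothetical sketch: pull the page descriptors in bounded chunks
    # so no single read() exceeds the Python string limit.
    # handle_desc() stands in for whatever the script does with each
    # page_desc_t record.
    def read_page_descs(fd, pd_size, max_mapnr, handle_desc, chunk=4096):
        done = 0
        while done < max_mapnr:
            count = min(chunk, max_mapnr - done)
            raw = fd.read(pd_size * count)
            if len(raw) != pd_size * count:
                raise IOError("short read in page descriptor table")
            for i in xrange(count):
                handle_desc(raw[i * pd_size:(i + 1) * pd_size])
            done += count

In memory_dump() something like this would take the place of the single self.fd.read(pd_size * self.header.max_mapnr) call, keeping each read() bounded regardless of max_mapnr.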
Created attachment 148066 [details]
diskdumpmsg.diff

This patch fixes the problem.
The patch was merged in diskdumputils v1.3.27.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
MODIFIED -- CVS Tag: diskdumputils-1_4_0-1
The diskdumputils package has been re-spun -- CVS Tag: diskdumputils-1_4_1-1

Please post QA results here. I will transfer the test results to the errata's QA report, and then set this bugzilla to VERIFIED via the errata interface.
This request was previously evaluated by Red Hat Product Management for inclusion in the current Red Hat Enterprise Linux release, but Red Hat was unable to resolve it in time. This request will be reviewed for a future Red Hat Enterprise Linux release.
Reminder -- we are still awaiting QA results from Fujitsu for this bugzilla.
The diskdumputils package has been re-spun -- CVS Tag: diskdumputils-1_4_1-2

Please post QA results here. I will transfer the test results to the errata's QA report, and then set this bugzilla to VERIFIED via the errata interface.
Nobuhiro Tachino (ntachino) and Akira Imamura (aimamura) are no longer here at the Westford facility working as embedded engineers for Fujitsu, and therefore cannot complete the QA for this bugzilla's RHEL4.6 errata. For that reason, it is essential that the Issue Tracker REPORTER test this issue and report the results back to this bugzilla. If the QA is successful, I (as the proxy maintainer) will transfer the test results to the errata's QA report, and then set this bugzilla to VERIFIED via the errata interface.

Thanks,
Dave Anderson
Given that mmatsuya has requested in IT #112576 that Fujitsu perform the QA for this bugzilla:

> Event posted 10-03-2007 09:13pm by mmatsuya
>
> Hi Indo-san,
>
> Have you already been in Westford?
> Can anyone in Fujitsu re-test with the new packages diskdumputils-1.4.1-2?

I have set this bugzilla's NEEDINFO to our remaining in-house Fujitsu representative, ktokunag, as he has offered off-line to help move things along.
Based upon the last two comments in IT #112576, I'm changing the NEEDINFO from ktokunag to mmatsuya:

> Event posted 10-05-2007 03:22am by L3support_kernel
>
> Hi matsuya-san,
>
> I take charge of this issue instead of Indoh.
> I will confirm diskdumputils-1.4.1-2 by Oct 12th.
>
> Kazuhiro Yoshida

> Event posted 10-05-2007 02:47am by mmatsuya
>
> Please discuss who in Fujitsu will test this issue with diskdumputils-1.4.1-2
> without confusion. onsite team in Westford or anyone in Fujitsu Japan.
Yoshida-san, have you confirmed that this is fixed?
I'm not entirely clear on this, but as I understand it: if the system's memory layout produced a contiguous array of 178956970 or more page_desc_t structures (at 24 bytes each), the total array size would exceed 4GB and the diskdumpmsg python script would fail. That could not occur on a 64GB machine, which would only have 4194304 total pages.

That said, the code has been restructured so that the page_desc_t reads are broken up, so it would be impossible to see the same failure with the new diskdumpmsg. So I'm not sure what is the best way to continue. I've tried reassigning this bugzilla to our new embedded Fujitsu engineer, Takao Indoh, who will be the diskdump maintainer in the future and who is familiar with this issue, but his email address (tindoh) is not yet "known" to the bugzilla system.

Kei, can you check with Takao and ask him for his suggestion on how best to proceed?
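For reference, the arithmetic behind those figures (a quick sketch; the 16KB page size is my assumption about the ia64 configuration, the other numbers are from the comment above):

    # Back-of-the-envelope check of the figures above.
    pd_size = 24                           # sizeof(page_desc_t)
    threshold = (2 ** 32) // pd_size       # 178956970 descriptors
    print threshold * pd_size              # ~4GB in a single read()

    page_size = 16 * 1024                  # assumed ia64 page size
    print (64 * 2 ** 30) // page_size      # 4194304 pages on a 64GB box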
Per the discussion with Takao, the problem should occur on a machine with 256GB or more of memory. We have a PRIMEQUEST with 512GB of memory in the lab, currently waiting for power, so we will try to find a way to use it for the QA.
I finally set up the machine with 512GB of memory, installed the latest RHEL4.6 (re20071011.0) on it, and confirmed first that the issue was fixed in diskdumputils-1.4.1-2. Then I installed the old version of diskdumputils (1.3.25), which did not have the fix patch, and confirmed that the issue reproduced on it.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0717.html