Description of problem:
By default, retrace-server runs makedumpfile on every incoming vmcore, controlled by the VmcoreDumpLevel variable in retrace-server.conf. The idea is to strip out unneeded pages (for example, zero pages) to save space and improve performance. Note that on our server we have this inside /etc/retrace-server.conf:
# Run makedumpfile with specified dumplevel; <= 0 or >= 32 means disabled
VmcoreDumpLevel = 1
However, many of the vmcores which come in already have zero pages stripped, so running makedumpfile on them again by default is extra overhead with no benefit.
We should be able to detect this with a simple heuristic, as outlined here (from Marc Milgram):
I tested on a RHEL5.7 and RHEL6.2 vmcore and both worked great:
[dwysocha@optimus expect]$ ln -s /cores/retrace/tasks/644405833/crash/vmcore test/rhel5.7-vmcore
[dwysocha@optimus expect]$ makedumpfile -D --dump-dmesg test/rhel5.7-vmcore /dev/null 2>/dev/null | grep dump_level
dump_level : 1
[dwysocha@optimus expect]$ ln -s /cores/retrace/tasks/645955026/crash/vmcore test/rhel6.2-vmcore
[dwysocha@optimus expect]$ makedumpfile -D --dump-dmesg test/rhel6.2-vmcore /dev/null 2>/dev/null | grep dump_level
dump_level : 31
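The same check could be done from Python. This is only a rough sketch: parse_dump_level and get_vmcore_dump_level are hypothetical names (nothing like them exists in retrace.py as far as I know), and it assumes the "dump_level : N" line shows up on stdout exactly as in the transcript above.

```python
import re
import subprocess

# Matches lines such as "dump_level : 1" in makedumpfile's debug output.
_DUMP_LEVEL_RE = re.compile(r"^\s*dump_level\s*:\s*(\d+)\s*$", re.MULTILINE)


def parse_dump_level(output):
    """Return the dump level found in makedumpfile output, or None."""
    match = _DUMP_LEVEL_RE.search(output)
    return int(match.group(1)) if match else None


def get_vmcore_dump_level(vmcore):
    """Hypothetical helper: ask makedumpfile which dump level a vmcore has.

    Runs the same command as the shell transcript above and discards
    the dmesg output by writing it to /dev/null.
    """
    proc = subprocess.Popen(
        ["makedumpfile", "-D", "--dump-dmesg", vmcore, "/dev/null"],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, _ = proc.communicate()
    return parse_dump_level(stdout.decode("utf-8", "replace"))
```

If the line is missing from the output, the helper returns None, and the caller would have to decide what to do with a vmcore of unknown dump level.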
Version-Release number of selected component (if applicable):
makedumpfile is run on every vmcore based on setting in retrace-server.conf
There should be an option to configure retrace server to not run makedumpfile with redundant options, though I'm not entirely sure of the design.
We probably just need to patch the code in download_remote, which calls strip_vmcore, in /usr/lib/python2.6/site-packages/retrace/retrace.py.
Maybe add a new function to determine the dump level of the vmcore? Then it looks like we'll need to do something with VmcoreDumpLevel from the config file, or perhaps add a new config variable.
def strip_vmcore(vmcore, kernelver=None):
    try:
        vmlinux = prepare_debuginfo(vmcore, kernelver=kernelver)
    except Exception as ex:
        log_warn("prepare_debuginfo failed: %s" % ex)
        # [...]
    newvmcore = "%s.stripped" % vmcore
    retcode = call(["makedumpfile", "-c", "-d", "%d" % CONFIG["VmcoreDumpLevel"],
                    "-x", vmlinux, "--message-level", "0", vmcore, newvmcore])
    if retcode:
        log_warn("makedumpfile exited with %d" % retcode)
    # [...]
def download_remote(self, unpack=True, timeout=0, kernelver=None):
    """Downloads all remote resources and returns a list of errors."""
    # [...]
    oldsize = os.path.getsize(vmcore)
    log_info("Vmcore size: %s" % human_readable_size(oldsize))
    if CONFIG["VmcoreDumpLevel"] > 0 and CONFIG["VmcoreDumpLevel"] < 32:
        start = time.time()
        strip_vmcore(vmcore, kernelver)
        dur = int(time.time() - start)
        st = os.stat(vmcore)
        os.chmod(vmcore, st.st_mode | stat.S_IRGRP)
        log_info("Stripped size: %s" % human_readable_size(st.st_size))
        log_info("Makedumpfile took %d seconds and saved %s" % (dur, human_readable_size(oldsize - st.st_size)))
    # [...]
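One possible way to decide whether the makedumpfile run is redundant is sketched below. This is not the upstream fix, just an illustration: needs_stripping is a hypothetical name, and it assumes the dump level is a bitmask (as described in the makedumpfile man page), so a second run only helps when the configured level asks for a page type the vmcore has not already had filtered out.

```python
def needs_stripping(current_level, configured_level):
    """Hypothetical check: is running makedumpfile worth it for this vmcore?

    current_level    -- dump level already applied to the vmcore, or None
                        if it could not be determined
    configured_level -- VmcoreDumpLevel from retrace-server.conf
    """
    # Same rule as the config comment: <= 0 or >= 32 means disabled.
    if configured_level <= 0 or configured_level >= 32:
        return False
    # Unknown level: keep the current behaviour and strip anyway.
    if current_level is None:
        return True
    # Assumption: each bit selects one page type to exclude, so a run is
    # only useful if the config sets a bit the vmcore does not have yet.
    return (configured_level & ~current_level) != 0
```

With VmcoreDumpLevel = 1, both test vmcores above (dump_level 1 and 31) would be skipped. Note that strip_vmcore also passes -c, so skipping the run skips that compression step too, which may or may not be acceptable.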
Fixed in upstream
Author: Michal Toman <firstname.lastname@example.org>
Date: Thu Feb 27 13:05:59 2014 +0100
do not run makedumpfile when not necessary
Signed-off-by: Michal Toman <email@example.com>
retrace-server-1.11-1.el6 has been submitted as an update for Fedora EPEL 6.
* should fix your issue,
* was pushed to the Fedora EPEL 6 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=epel-testing retrace-server-1.11-1.el6'
as soon as you are able to.
I think this one is fixed, but the fix may have introduced a side effect of not running makedumpfile on any vmcores. I'm testing whether this is the case or not.
Turns out my test was invalid and it looks like this is fixed with no side-effects that I can tell.
retrace-server-1.12-2.el6 has been submitted as an update for Fedora EPEL 6.
retrace-server-1.12-2.el6 has been pushed to the Fedora EPEL 6 stable repository. If problems still persist, please make note of it in this bug report.