Description of problem:
By default retrace-server runs makedumpfile on every incoming vmcore, based on the VmcoreDumpLevel variable in retrace-server.conf. The idea is to strip out pages (for example, zero pages) to save space and improve performance. Note that on our server we have this inside /etc/retrace-server.conf:

# Run makedumpfile with specified dumplevel; <= 0 or >= 32 means disabled
VmcoreDumpLevel = 1

However, a lot of the vmcores which come in already have zero pages stripped, so running makedumpfile again by default is extra overhead with no benefit.

We should be able to detect this with a simple heuristic as outlined here (from Marc Milgram):
https://access.redhat.com/site/solutions/696803

I tested on a RHEL5.7 and a RHEL6.2 vmcore and both worked great:

[dwysocha@optimus expect]$ ln -s /cores/retrace/tasks/644405833/crash/vmcore test/rhel5.7-vmcore
[dwysocha@optimus expect]$ makedumpfile -D --dump-dmesg test/rhel5.7-vmcore /dev/null 2>/dev/null | grep dump_level
dump_level : 1
[dwysocha@optimus expect]$ ln -s /cores/retrace/tasks/645955026/crash/vmcore test/rhel6.2-vmcore
[dwysocha@optimus expect]$ makedumpfile -D --dump-dmesg test/rhel6.2-vmcore /dev/null 2>/dev/null | grep dump_level
dump_level : 31

Version-Release number of selected component (if applicable):
retrace-server-1.10-1.el6.noarch
kexec-tools-2.0.0-258.el6_4.2.x86_64

Actual results:
makedumpfile is run on every vmcore, based on the setting in retrace-server.conf.

Expected results:
There should be an option to configure retrace-server not to run makedumpfile with redundant options, though I'm not entirely sure of the design.

Additional info:
We probably just need to patch the code in download_remote, which calls strip_vmcore, in /usr/lib/python2.6/site-packages/retrace/retrace.py. Maybe add a new function to determine the dump level of the vmcore? Then it looks like we'll need to do something with VmcoreDumpLevel from the config file, or perhaps add a new config variable.
def strip_vmcore(vmcore, kernelver=None):
    try:
        vmlinux = prepare_debuginfo(vmcore, kernelver=kernelver)
    except Exception as ex:
        log_warn("prepare_debuginfo failed: %s" % ex)
        return

    newvmcore = "%s.stripped" % vmcore
    retcode = call(["makedumpfile", "-c", "-d", "%d" % CONFIG["VmcoreDumpLevel"],
                    "-x", vmlinux, "--message-level", "0", vmcore, newvmcore])
    if retcode:
        log_warn("makedumpfile exited with %d" % retcode)
        if os.path.isfile(newvmcore):
            os.unlink(newvmcore)
    else:
        os.rename(newvmcore, vmcore)

    def download_remote(self, unpack=True, timeout=0, kernelver=None):
        """Downloads all remote resources and returns a list of errors."""
        ...
        if os.path.isfile(vmcore):
            oldsize = os.path.getsize(vmcore)
            log_info("Vmcore size: %s" % human_readable_size(oldsize))
            if CONFIG["VmcoreDumpLevel"] > 0 and CONFIG["VmcoreDumpLevel"] < 32:
                log_debug("Executing makedumpfile")
                start = time.time()
                strip_vmcore(vmcore, kernelver)
                dur = int(time.time() - start)
                st = os.stat(vmcore)
                os.chmod(vmcore, st.st_mode | stat.S_IRGRP)
                log_info("Stripped size: %s" % human_readable_size(st.st_size))
                log_info("Makedumpfile took %d seconds and saved %s"
                         % (dur, human_readable_size(oldsize - st.st_size)))
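
A rough sketch of what the "determine dump level" function could look like, reusing the same makedumpfile -D --dump-dmesg invocation shown above. The names parse_dump_level, get_vmcore_dump_level and strip_is_redundant are made up for illustration and are not part of retrace-server. The redundancy check assumes that dump_level is a bitmask (1 = zero pages, and so on up to 31), so stripping would be pointless when every bit of the configured level is already set in the vmcore. Treat it as a starting point, not as the actual fix.

```python
import os
import re
import subprocess

# Matches a line such as "dump_level : 1" in the makedumpfile -D output.
DUMP_LEVEL_RE = re.compile(r"^\s*dump_level\s*:\s*(\d+)", re.MULTILINE)


def parse_dump_level(output):
    """Return the dump level found in makedumpfile -D output, or None."""
    match = DUMP_LEVEL_RE.search(output)
    if match is None:
        return None
    return int(match.group(1))


def get_vmcore_dump_level(vmcore):
    """Ask makedumpfile for the dump level of an existing vmcore.

    Returns None when makedumpfile cannot be executed or prints no
    dump_level line (hypothetical helper, not retrace-server API).
    """
    devnull = open(os.devnull, "w")
    try:
        child = subprocess.Popen(
            ["makedumpfile", "-D", "--dump-dmesg", vmcore, "/dev/null"],
            stdout=subprocess.PIPE, stderr=devnull,
            universal_newlines=True)
        stdout, _ = child.communicate()
    except OSError:
        return None
    finally:
        devnull.close()
    return parse_dump_level(stdout)


def strip_is_redundant(current_level, wanted_level):
    """True if the vmcore already has every page type in wanted_level stripped.

    Assumption: dump levels are bitmasks, so the comparison is bitwise.
    An unknown level (None) is never treated as redundant.
    """
    if current_level is None:
        return False
    return (current_level & wanted_level) == wanted_level
```

With something like this, download_remote could skip strip_vmcore whenever strip_is_redundant(get_vmcore_dump_level(vmcore), CONFIG["VmcoreDumpLevel"]) is true, and fall back to the current behaviour when the level cannot be determined.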
Fixed in upstream

commit 1457d77d250ca5f0570e1077da569a9a1d131d81
Author: Michal Toman <mtoman>
Date:   Thu Feb 27 13:05:59 2014 +0100

    do not run makedumpfile when not necessary

    Signed-off-by: Michal Toman <mtoman>
retrace-server-1.11-1.el6 has been submitted as an update for Fedora EPEL 6. https://admin.fedoraproject.org/updates/retrace-server-1.11-1.el6
Package retrace-server-1.11-1.el6:
* should fix your issue,
* was pushed to the Fedora EPEL 6 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=epel-testing retrace-server-1.11-1.el6'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-EPEL-2014-0687/retrace-server-1.11-1.el6
then log in and leave karma (feedback).
I think this one is fixed, but the fix may have introduced a side effect of not running makedumpfile on any vmcores. I'm testing whether this is the case.
Turns out my test was invalid. It looks like this is fixed, with no side effects that I can tell.
retrace-server-1.12-2.el6 has been submitted as an update for Fedora EPEL 6. https://admin.fedoraproject.org/updates/retrace-server-1.12-2.el6
retrace-server-1.12-2.el6 has been pushed to the Fedora EPEL 6 stable repository. If problems still persist, please make note of it in this bug report.