Bug 1067188 - retrace-server should only run makedumpfile with dump_level which is different from the server generating the vmcore
Status: CLOSED ERRATA
Alias: None
Product: Fedora EPEL
Classification: Fedora
Component: retrace-server
Version: el6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: ---
Assignee: Michal Toman
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Keywords: TestCaseProvided
Depends On:
Blocks:
Reported: 2014-02-19 22:28 UTC by Dave Wysochanski
Modified: 2015-03-23 00:42 UTC (History)
Clone Of:
Last Closed: 2014-08-15 18:58:15 UTC


Description Dave Wysochanski 2014-02-19 22:28:54 UTC
Description of problem:
By default, retrace-server runs makedumpfile on every incoming vmcore, using the dump level set by the VmcoreDumpLevel variable in retrace-server.conf.  The idea is to strip out unneeded pages (for example, zero pages) to save space and improve performance.  Note that on our server we have this inside /etc/retrace-server.conf:
# Run makedumpfile with specified dumplevel; <= 0 or >= 32 means disabled
VmcoreDumpLevel = 1

However, many of the vmcores that come in already have zero pages stripped, so running makedumpfile on them again is extra overhead with no benefit.

We should be able to detect this with a simple heuristic, as outlined here (from Marc Milgram):
https://access.redhat.com/site/solutions/696803

I tested the heuristic on a RHEL 5.7 vmcore and a RHEL 6.2 vmcore, and it worked on both:

[dwysocha@optimus expect]$ ln -s /cores/retrace/tasks/644405833/crash/vmcore test/rhel5.7-vmcore
[dwysocha@optimus expect]$ makedumpfile -D --dump-dmesg test/rhel5.7-vmcore /dev/null 2>/dev/null | grep dump_level
  dump_level       : 1
[dwysocha@optimus expect]$ ln -s /cores/retrace/tasks/645955026/crash/vmcore test/rhel6.2-vmcore
[dwysocha@optimus expect]$ makedumpfile -D --dump-dmesg test/rhel6.2-vmcore /dev/null 2>/dev/null | grep dump_level
  dump_level       : 31
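
The check in the transcript above could be scripted roughly as follows. This is only a sketch: parse_dump_level and get_dump_level are hypothetical names, not part of retrace-server, and the code assumes that makedumpfile prints a "dump_level : N" line for the vmcore, as it did for the two cores tested here. When no such line is found, the level is reported as unknown (None).

```python
import os
import re
import subprocess

# Matches a line such as "  dump_level       : 31" in the makedumpfile -D output.
_DUMP_LEVEL_RE = re.compile(r"^\s*dump_level\s*:\s*(\d+)\s*$", re.MULTILINE)


def parse_dump_level(output):
    """Return the dump_level found in the given text, or None if absent."""
    match = _DUMP_LEVEL_RE.search(output)
    return int(match.group(1)) if match else None


def get_dump_level(vmcore):
    """Hypothetical helper: run the command from the transcript on vmcore."""
    with open(os.devnull, "w") as devnull:
        proc = subprocess.Popen(
            ["makedumpfile", "-D", "--dump-dmesg", vmcore, "/dev/null"],
            stdout=subprocess.PIPE, stderr=devnull)
        stdout, _ = proc.communicate()
    # Decode leniently; only the ASCII dump_level line matters here.
    return parse_dump_level(stdout.decode("utf-8", "replace"))
```

Keeping the parsing separate from the subprocess call means the parsing can be checked against captured output without a real vmcore.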

Version-Release number of selected component (if applicable):
retrace-server-1.10-1.el6.noarch
kexec-tools-2.0.0-258.el6_4.2.x86_64

Actual results:
makedumpfile is run on every vmcore, based on the setting in retrace-server.conf

Expected results:
There should be an option to configure retrace server to not run makedumpfile with redundant options, though I'm not entirely sure of the design.


Additional info:
We probably just need to patch the code in download_remote, which calls strip_vmcore, in /usr/lib/python2.6/site-packages/retrace/retrace.py

Maybe add a new function to determine the dump level of the vmcore?  Then it looks like we'll need to do something with VmcoreDumpLevel from the config file, or perhaps add a new config variable.


def strip_vmcore(vmcore, kernelver=None):
    try:
        vmlinux = prepare_debuginfo(vmcore, kernelver=kernelver)
    except Exception as ex:
        log_warn("prepare_debuginfo failed: %s" % ex)
        return

    newvmcore = "%s.stripped" % vmcore
    retcode = call(["makedumpfile", "-c", "-d", "%d" % CONFIG["VmcoreDumpLevel"],
                    "-x", vmlinux, "--message-level", "0", vmcore, newvmcore])
    if retcode:
        log_warn("makedumpfile exited with %d" % retcode)
        if os.path.isfile(newvmcore):
            os.unlink(newvmcore)
    else:
        os.rename(newvmcore, vmcore)



    def download_remote(self, unpack=True, timeout=0, kernelver=None):
        """Downloads all remote resources and returns a list of errors."""
...
            if os.path.isfile(vmcore):
                oldsize = os.path.getsize(vmcore)
                log_info("Vmcore size: %s" % human_readable_size(oldsize))
                if CONFIG["VmcoreDumpLevel"] > 0 and CONFIG["VmcoreDumpLevel"] < 32:
                    log_debug("Executing makedumpfile")
                    start = time.time()
                    strip_vmcore(vmcore, kernelver)
                    dur = int(time.time() - start)
                    st = os.stat(vmcore)
                    os.chmod(vmcore, st.st_mode | stat.S_IRGRP)
                    log_info("Stripped size: %s" % human_readable_size(st.st_size))
                    log_info("Makedumpfile took %d seconds and saved %s" % (dur, human_readable_size(oldsize - st.st_size)))
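
One way the decision suggested under "Additional info" might look is sketched below. It is not the upstream fix, and needs_strip is a hypothetical name. The sketch assumes the dump level is the bitmask described in the makedumpfile man page (1 = zero pages, 2 and 4 = cache pages, 8 = user data, 16 = free pages), so another pass is only useful when the configured level would exclude a page type that the vmcore's existing level did not. It also keeps the current rule that a VmcoreDumpLevel <= 0 or >= 32 disables stripping.

```python
def needs_strip(existing_level, configured_level):
    """Return True if running makedumpfile could still remove pages.

    existing_level: dump_level already applied to the vmcore, or None
                    if it could not be determined.
    configured_level: the VmcoreDumpLevel value from the config file.
    """
    # Same range check as download_remote: out-of-range means disabled.
    if configured_level <= 0 or configured_level >= 32:
        return False
    # Unknown existing level: fall back to the current behaviour and strip.
    if existing_level is None:
        return True
    # Strip only if the config asks for a page type not yet excluded.
    return (configured_level & ~existing_level) != 0
```

With the two vmcores from the description and VmcoreDumpLevel = 1, both the level 1 core and the level 31 core would be skipped, while a core with no detectable level would still be stripped.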

Comment 1 Michal Toman 2014-02-27 12:06:59 UTC
Fixed in upstream

commit 1457d77d250ca5f0570e1077da569a9a1d131d81
Author: Michal Toman <mtoman@redhat.com>
Date:   Thu Feb 27 13:05:59 2014 +0100

    do not run makedumpfile when not necessary
    
    Signed-off-by: Michal Toman <mtoman@redhat.com>

Comment 2 Fedora Update System 2014-02-27 13:47:22 UTC
retrace-server-1.11-1.el6 has been submitted as an update for Fedora EPEL 6.
https://admin.fedoraproject.org/updates/retrace-server-1.11-1.el6

Comment 3 Fedora Update System 2014-03-01 07:12:18 UTC
Package retrace-server-1.11-1.el6:
* should fix your issue,
* was pushed to the Fedora EPEL 6 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=epel-testing retrace-server-1.11-1.el6'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-EPEL-2014-0687/retrace-server-1.11-1.el6
then log in and leave karma (feedback).

Comment 6 Dave Wysochanski 2014-03-07 21:56:10 UTC
I think this one is fixed, but it may have introduced a side effect of not running makedumpfile on any vmcores.  I'm testing whether this is the case.

Comment 8 Dave Wysochanski 2014-03-07 22:35:30 UTC
It turns out my test was invalid, and it looks like this is fixed with no side effects that I can tell.

Comment 11 Fedora Update System 2014-07-31 11:52:55 UTC
retrace-server-1.12-2.el6 has been submitted as an update for Fedora EPEL 6.
https://admin.fedoraproject.org/updates/retrace-server-1.12-2.el6

Comment 13 Fedora Update System 2014-08-15 18:58:15 UTC
retrace-server-1.12-2.el6 has been pushed to the Fedora EPEL 6 stable repository.  If problems still persist, please make note of it in this bug report.

