Bug 1067188

Summary: retrace-server should run makedumpfile only when the configured dump_level differs from the one already applied by the server that generated the vmcore
Product: [Fedora] Fedora EPEL
Reporter: Dave Wysochanski <dwysocha>
Component: retrace-server
Assignee: Michal Toman <mtoman>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: low
Priority: medium
Version: el6
CC: mmilgram, mtoman, pknirsch, rvokal
Target Milestone: ---
Keywords: TestCaseProvided
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: retrace-server-1.12-2.el6
Doc Type: Bug Fix
Last Closed: 2014-08-15 18:58:15 UTC
Type: Bug

Description Dave Wysochanski 2014-02-19 22:28:54 UTC
Description of problem:
By default, retrace-server runs makedumpfile on every incoming vmcore, controlled by the VmcoreDumpLevel variable in retrace-server.conf.  The idea is to strip out unneeded pages (for example, zero pages) to save space and improve performance.  On our server, /etc/retrace-server.conf contains:
# Run makedumpfile with specified dumplevel; <= 0 or >= 32 means disabled
VmcoreDumpLevel = 1

However, many incoming vmcores already have zero pages stripped, so running makedumpfile on them again is extra overhead with no benefit.

We should be able to detect this with a simple heuristic as outlined here (from Marc Milgram)
https://access.redhat.com/site/solutions/696803

I tested on a RHEL5.7 and RHEL6.2 vmcore and both worked great:

[dwysocha@optimus expect]$ ln -s /cores/retrace/tasks/644405833/crash/vmcore test/rhel5.7-vmcore
[dwysocha@optimus expect]$ makedumpfile -D --dump-dmesg test/rhel5.7-vmcore /dev/null 2>/dev/null | grep dump_level
  dump_level       : 1
[dwysocha@optimus expect]$ ln -s /cores/retrace/tasks/645955026/crash/vmcore test/rhel6.2-vmcore
[dwysocha@optimus expect]$ makedumpfile -D --dump-dmesg test/rhel6.2-vmcore /dev/null 2>/dev/null | grep dump_level
  dump_level       : 31
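
The heuristic above could be wrapped in a small helper along these lines. This is only a sketch, not the upstream fix: parse_dump_level and get_vmcore_dump_level are hypothetical names, the code assumes Python 3 (retrace-server itself targets Python 2.6), and it assumes the "dump_level : N" line appears on stdout exactly as in the transcript.

```python
import re
import subprocess

# Matches a line such as "  dump_level       : 31" in makedumpfile -D output.
DUMP_LEVEL_RE = re.compile(r"^\s*dump_level\s*:\s*(\d+)\s*$", re.MULTILINE)


def parse_dump_level(output):
    """Return the dump_level found in the given text, or None if absent."""
    match = DUMP_LEVEL_RE.search(output)
    return int(match.group(1)) if match else None


def get_vmcore_dump_level(vmcore):
    """Hypothetical helper: ask makedumpfile which dump_level a vmcore has.

    Runs the same command as the transcript above and returns None when
    makedumpfile cannot be started or prints no dump_level line.
    """
    try:
        proc = subprocess.Popen(
            ["makedumpfile", "-D", "--dump-dmesg", vmcore, "/dev/null"],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,  # same as 2>/dev/null in the shell
            universal_newlines=True,
        )
        out, _ = proc.communicate()
    except OSError:
        return None
    return parse_dump_level(out)
```

Keeping the parsing separate from the subprocess call makes it possible to check the parser against captured output without a real vmcore.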

Version-Release number of selected component (if applicable):
retrace-server-1.10-1.el6.noarch
kexec-tools-2.0.0-258.el6_4.2.x86_64

Actual results:
makedumpfile is run on every vmcore based on setting in retrace-server.conf

Expected results:
There should be an option to configure retrace server to not run makedumpfile with redundant options, though I'm not entirely sure of the design.


Additional info:
We probably just need to patch this code in download_remote which calls strip_vmcore in /usr/lib/python2.6/site-packages/retrace/retrace.py

Maybe add a new function to determine the dump level of the vmcore?  Then it looks like we'll need to do something with VmcoreDumpLevel from the config file, or perhaps add a new config variable.


def strip_vmcore(vmcore, kernelver=None):
    try:
        vmlinux = prepare_debuginfo(vmcore, kernelver=kernelver)
    except Exception as ex:
        log_warn("prepare_debuginfo failed: %s" % ex)
        return

    newvmcore = "%s.stripped" % vmcore
    retcode = call(["makedumpfile", "-c", "-d", "%d" % CONFIG["VmcoreDumpLevel"],
                    "-x", vmlinux, "--message-level", "0", vmcore, newvmcore])
    if retcode:
        log_warn("makedumpfile exited with %d" % retcode)
        if os.path.isfile(newvmcore):
            os.unlink(newvmcore)
    else:
        os.rename(newvmcore, vmcore)



    def download_remote(self, unpack=True, timeout=0, kernelver=None):
        """Downloads all remote resources and returns a list of errors."""
...
            if os.path.isfile(vmcore):
                oldsize = os.path.getsize(vmcore)
                log_info("Vmcore size: %s" % human_readable_size(oldsize))
                if CONFIG["VmcoreDumpLevel"] > 0 and CONFIG["VmcoreDumpLevel"] < 32:
                    log_debug("Executing makedumpfile")
                    start = time.time()
                    strip_vmcore(vmcore, kernelver)
                    dur = int(time.time() - start)
                    st = os.stat(vmcore)
                    os.chmod(vmcore, st.st_mode | stat.S_IRGRP)
                    log_info("Stripped size: %s" % human_readable_size(st.st_size))
                    log_info("Makedumpfile took %d seconds and saved %s" % (dur, human_readable_size(oldsize - st.st_size)))
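
One possible shape for the extra check in download_remote, given a dump level already read from the vmcore (for example with the makedumpfile -D heuristic shown earlier). This is a sketch under assumptions, not the actual upstream change: makedumpfile_needed is a hypothetical name, and it relies on dump_level being a bitmask (1 = zero pages, 2 = cache pages, 4 = private cache pages, 8 = user data, 16 = free pages), so stripping is treated as redundant when every configured bit is already set in the vmcore's level.

```python
def makedumpfile_needed(configured_level, current_level):
    """Decide whether running makedumpfile could remove any more pages.

    configured_level: VmcoreDumpLevel from retrace-server.conf.
    current_level: dump_level already applied to the vmcore, or None
                   when it could not be determined.
    """
    # Same "disabled" rule as the config comment: <= 0 or >= 32.
    if configured_level <= 0 or configured_level >= 32:
        return False
    # Unknown current level: stay conservative and run makedumpfile.
    if current_level is None:
        return True
    # Run only if the config asks for a page type not yet stripped.
    return (configured_level & ~current_level) != 0
```

With VmcoreDumpLevel = 1, both vmcores from the description (dump_level 1 and 31) would be skipped, while an unfiltered vmcore (dump_level 0) would still be stripped.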

Comment 1 Michal Toman 2014-02-27 12:06:59 UTC
Fixed in upstream

commit 1457d77d250ca5f0570e1077da569a9a1d131d81
Author: Michal Toman <mtoman>
Date:   Thu Feb 27 13:05:59 2014 +0100

    do not run makedumpfile when not necessary
    
    Signed-off-by: Michal Toman <mtoman>

Comment 2 Fedora Update System 2014-02-27 13:47:22 UTC
retrace-server-1.11-1.el6 has been submitted as an update for Fedora EPEL 6.
https://admin.fedoraproject.org/updates/retrace-server-1.11-1.el6

Comment 3 Fedora Update System 2014-03-01 07:12:18 UTC
Package retrace-server-1.11-1.el6:
* should fix your issue,
* was pushed to the Fedora EPEL 6 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=epel-testing retrace-server-1.11-1.el6'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-EPEL-2014-0687/retrace-server-1.11-1.el6
then log in and leave karma (feedback).

Comment 6 Dave Wysochanski 2014-03-07 21:56:10 UTC
I think this one is fixed, but it may have introduced a side effect of not running makedumpfile on any vmcores.  I'm testing whether this is the case.

Comment 8 Dave Wysochanski 2014-03-07 22:35:30 UTC
Turns out my test was invalid and it looks like this is fixed with no side-effects that I can tell.

Comment 11 Fedora Update System 2014-07-31 11:52:55 UTC
retrace-server-1.12-2.el6 has been submitted as an update for Fedora EPEL 6.
https://admin.fedoraproject.org/updates/retrace-server-1.12-2.el6

Comment 13 Fedora Update System 2014-08-15 18:58:15 UTC
retrace-server-1.12-2.el6 has been pushed to the Fedora EPEL 6 stable repository.  If problems still persist, please make note of it in this bug report.