Bug 1076833 - On vmcore tarballs with no group read perms, retrace-server may create crash/vmcore without group read perms set so 'retrace-server-interact <taskid> crash' fails with permission denied
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora EPEL
Classification: Fedora
Component: retrace-server
Version: el6
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Assignee: Dave Wysochanski
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-03-15 15:04 UTC by Dave Wysochanski
Modified: 2017-04-11 22:26 UTC (History)

Fixed In Version: retrace-server-1.11-4.el6.noarch
Clone Of:
Environment:
Last Closed: 2017-04-11 22:26:46 UTC
Type: Bug
Embargoed:


Attachments
Patch to fix this bug, v1 (2.39 KB, text/plain)
2014-03-17 18:09 UTC, Dave Wysochanski
Patch to fix this bug on top of previous patch. Move 'stat' and 'chmod' to the very end after all extraction and makedumpfile processing. Fix bug introduced where if makedumpfile ran it would always report 'saved 0 bytes' when it may have saved signfica (2.88 KB, text/plain)
2014-04-21 13:51 UTC, Dave Wysochanski

Description Dave Wysochanski 2014-03-15 15:04:04 UTC
Description of problem:
It looks like with some vmcores, retrace-server may process them ok but not add group read permissions on the vmcore.  So anyone running 'retrace-server-interact <taskid> crash' won't be able to read the core and crash will fail.  I am not sure if this is a regression, or something that has been there for a long time.  I think it just shows up on certain vmcores.

Example:
$ retrace-server-interact 556682995 crash
WARNING:root: 2014-03-15 10:39:13 Unable to list modules: crash exited with 1:
crash: /cores/retrace/tasks/556682995/crash/vmcore: Permission denied

Usage:

  crash [OPTION]... NAMELIST MEMORY-IMAGE  (dumpfile form)
  crash [OPTION]... [NAMELIST]             (live system form)

Enter "crash -h" for details.

If you want to execute the command manually, you can run
$ crash -i /cores/retrace/tasks/556682995/crashrc /cores/retrace/tasks/556682995/crash/vmcore /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/2.6.32-358.6.2.el6.x86_64/vmlinux


crash 7.0.1
Copyright (C) 2002-2013  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.
 
crash: /cores/retrace/tasks/556682995/crash/vmcore: Permission denied

Usage:

  crash [OPTION]... NAMELIST MEMORY-IMAGE  (dumpfile form)
  crash [OPTION]... [NAMELIST]             (live system form)

Enter "crash -h" for details.
$ ls -lh /cores/retrace/tasks/556682995/crash/vmcore
-rw-------. 1 retrace gss-eng-collab 2.9G Mar 14 09:30 /cores/retrace/tasks/556682995/crash/vmcore


Version-Release number of selected component (if applicable):
retrace-server-1.11-1.el6.noarch


How reproducible:
Unclear.  I think it is only certain vmcores.  Perhaps ones where the vmcore is contained in a tar or gz file, and the final perms are more restrictive than they need to be, but this is just a guess.


Steps to Reproduce:
1. Submit vmcore to retrace-server.
2. vmcore completes processing ok
3. 'retrace-server-interact <taskid> crash' or crash tool fails due to crash being unable to read the file.

Actual results:
crash: /cores/retrace/tasks/556682995/crash/vmcore: Permission denied

Expected results:
The vmcore in 'crash/vmcore' should have 'group' and 'other' read perms after extraction so crash can read it.

Additional info:
I have an example of one vmcore which fails and one that succeeds.

Comment 3 Dave Wysochanski 2014-03-15 23:19:35 UTC
Ok, I'm more convinced this is a regression now.  I've been successfully getting vmcores from this one customer with no problems.  Now the latest ones I submitted tonight are unreadable with the same problem.
$ for t in 370500522 373515740 154200319 347574662; do ls -lh /cores/retrace/tasks/$t/crash/vmcore; done
-rw-------. 1 retrace gss-eng-collab 474M Mar 15 04:13 /cores/retrace/tasks/370500522/crash/vmcore
-rw-------. 1 retrace gss-eng-collab 555M Mar 15 12:23 /cores/retrace/tasks/373515740/crash/vmcore
-rw-------. 1 retrace gss-eng-collab 531M Mar 15 07:55 /cores/retrace/tasks/154200319/crash/vmcore
-rw-------. 1 retrace gss-eng-collab 372M Mar 15 13:23 /cores/retrace/tasks/347574662/crash/vmcore

Comment 4 Dave Wysochanski 2014-03-17 15:28:54 UTC
NOTE: For now, we've got a workaround in place on our production system: a cronjob that looks for vmcore files at /cores/retrace/tasks/<taskid>/crash/vmcore and does a 'chmod go+r' on them.  Once this is fixed we'll remove the workaround.
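
For illustration only, here is a minimal sketch of what such a cron-driven workaround could look like in Python.  The actual cronjob is not shown in this bug; the function name add_group_other_read is hypothetical, and the only things taken from the report are the /cores/retrace/tasks/<taskid>/crash/vmcore layout and the 'go+r' mode change.

```python
import glob
import os
import stat


def add_group_other_read(tasks_dir):
    """Apply the equivalent of 'chmod go+r' to every <tasks_dir>/<taskid>/crash/vmcore.

    Hypothetical helper, not part of retrace-server. Returns the paths it changed.
    """
    wanted = stat.S_IRGRP | stat.S_IROTH
    changed = []
    for path in sorted(glob.glob(os.path.join(tasks_dir, "*", "crash", "vmcore"))):
        mode = stat.S_IMODE(os.stat(path).st_mode)
        # Only touch files that are missing group or other read.
        if mode & wanted != wanted:
            os.chmod(path, mode | wanted)
            changed.append(path)
    return changed


if __name__ == "__main__":
    # Path taken from the report; adjust for the local installation.
    for fixed in add_group_other_read("/cores/retrace/tasks"):
        print("fixed", fixed)
```

A second run over the same tree changes nothing, so it is safe to schedule repeatedly.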

Comment 5 Dave Wysochanski 2014-03-17 15:50:21 UTC
Ah, I think I found it.  I think this is a problem only if we skip the makedumpfile step.  This may have been introduced with the fix to skip over makedumpfile, https://bugzilla.redhat.com/show_bug.cgi?id=1067188

Latest code: /usr/lib/python2.6/site-packages/retrace/retrace.py

                skip_makedumpfile = CONFIG["VmcoreDumpLevel"] <= 0 or CONFIG["VmcoreDumpLevel"] >= 32
                if (dump_level is not None and
                    (dump_level & CONFIG["VmcoreDumpLevel"]) == CONFIG["VmcoreDumpLevel"]):
                    log_info("Stripping to %d would have no effect" % CONFIG["VmcoreDumpLevel"])
                    skip_makedumpfile = True  <--------------------------------------------------- we don't do a chmod in this case

                if not skip_makedumpfile:
                    log_debug("Executing makedumpfile")
                    start = time.time()
                    strip_vmcore(vmcore, kernelver)
                    dur = int(time.time() - start)
                    st = os.stat(vmcore)
                    if (st.st_mode & stat.S_IRGRP) == 0:  <------------ probably needs moved outside and below the 'if' statements so it always gets executed.
                        try:
                            os.chmod(vmcore, st.st_mode | stat.S_IRGRP)  <---------------------- here is the chmod; only done underneath 'if not skip_makedumpfile'
                        except Exception as ex:
                            log_warn("File '%s' is not group readable and chmod"
                                     " failed. The process will continue but if"
                                     " it fails this is the likely cause."
                                     % vmcore)

                    log_info("Stripped size: %s" % human_readable_size(st.st_size))
                    log_info("Makedumpfile took %d seconds and saved %s" % (dur, human_readable_size(oldsize - st.st_size)))  


Now comparing with earlier code from https://bugzilla.redhat.com/show_bug.cgi?id=1067188#c0

    def download_remote(self, unpack=True, timeout=0, kernelver=None):
        """Downloads all remote resources and returns a list of errors."""
...
            if os.path.isfile(vmcore):
                oldsize = os.path.getsize(vmcore)
                log_info("Vmcore size: %s" % human_readable_size(oldsize))
                if CONFIG["VmcoreDumpLevel"] > 0 and CONFIG["VmcoreDumpLevel"] < 32:
                    log_debug("Executing makedumpfile")
                    start = time.time()
                    strip_vmcore(vmcore, kernelver)
                    dur = int(time.time() - start)
                    st = os.stat(vmcore)
                    os.chmod(vmcore, st.st_mode | stat.S_IRGRP)   <----------------------- used to do a chmod here unconditionally
                    log_info("Stripped size: %s" % human_readable_size(st.st_size))
                    log_info("Makedumpfile took %d seconds and saved %s" % (dur, human_readable_size(oldsize - st.st_size)))


It looks like download_remote has been refactored significantly though, perhaps for multiple bug fixes.  Actually, it looks like we always had a form of the bug; it just went unnoticed because our config file was set such that we always did a makedumpfile stripping, and the chmod came after that.  Now that we have logic to skip makedumpfile, a vmcore that was packed into the tarball without group read perms stays that way after extraction.
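
The claim that restrictive perms inside the tarball survive extraction can be checked in isolation.  The sketch below uses Python's standard tarfile module, which is an assumption: it does not reproduce retrace-server's own unpack code, and the helper name extracted_mode is hypothetical.  It packs a file with a given mode, extracts it into a fresh directory, and reports the mode of the extracted copy.

```python
import os
import stat
import tarfile
import tempfile


def extracted_mode(member_mode):
    """Pack a dummy 'vmcore' with member_mode, extract it, return the extracted file's mode."""
    with tempfile.TemporaryDirectory() as work:
        src = os.path.join(work, "vmcore")
        with open(src, "wb") as f:
            f.write(b"not a real vmcore")
        os.chmod(src, member_mode)

        tarball = os.path.join(work, "vmcore.tar.gz")
        with tarfile.open(tarball, "w:gz") as tar:
            tar.add(src, arcname="vmcore")

        out = os.path.join(work, "out")
        os.mkdir(out)
        with tarfile.open(tarball, "r:gz") as tar:
            # The archive was created just above, so extracting it is safe here.
            tar.extractall(out)

        return stat.S_IMODE(os.stat(os.path.join(out, "vmcore")).st_mode)


if __name__ == "__main__":
    print(oct(extracted_mode(0o600)))
```

If the extracted mode comes back without the group read bit, nothing later in processing will add it unless a chmod runs on that path.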

I'm not sure what a good fix is right now.  We may just want to add similar code to the non-stripped case, or perhaps better put the chmod below the 'if' conditionals.
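
As a rough sketch of the second option (chmod below the 'if' conditionals), and not the attached patch itself: the names ensure_group_readable and finish_vmcore are hypothetical, and strip stands in for whatever performs the makedumpfile step.

```python
import os
import stat


def ensure_group_readable(path):
    """Add the group read bit if it is missing. Returns False only if chmod failed."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & stat.S_IRGRP:
        return True
    try:
        os.chmod(path, mode | stat.S_IRGRP)
    except OSError:
        return False
    return True


def finish_vmcore(vmcore, skip_makedumpfile, strip=None):
    """Optionally strip the vmcore, then fix perms regardless of which branch ran."""
    if not skip_makedumpfile and strip is not None:
        strip(vmcore)  # hypothetical stand-in for the makedumpfile step
    # Outside the conditional: runs for both the stripped and the skipped case.
    return ensure_group_readable(vmcore)
```

Keeping the permission check outside the branch means a new skip condition cannot bypass it again.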

Comment 6 Dave Wysochanski 2014-03-17 18:09:50 UTC
Created attachment 875615 [details]
Patch to fix this bug, v1

Comment 7 Dave Wysochanski 2014-03-17 18:13:47 UTC
(In reply to Dave Wysochanski from comment #6)
> Created attachment 875615 [details]
> Patch to fix this bug, v1
NOTE: This is completely untested but it's a first stab.

Comment 8 Dave Wysochanski 2014-03-17 19:48:33 UTC
I just verified the patch in comment #6 fixes the bug.

Comment 16 Dave Wysochanski 2014-04-11 01:39:58 UTC
I guess this is not fully fixed.  Today someone produced another vmcore that had another permissions issue.  This one ran makedumpfile but makedumpfile saved 0 bytes.  I tested the original vmcores in this bug and those work.  But there must be another subtlety I missed when makedumpfile is run.

Comment 23 Dave Wysochanski 2014-04-21 13:51:11 UTC
Created attachment 888095 [details]
Patch to fix this bug on top of previous patch.  Move 'stat' and 'chmod' to the very end after all extraction and makedumpfile processing.  Fix bug introduced where if makedumpfile ran it would always report 'saved 0 bytes' when it may have saved signfica

Comment 26 Dave Wysochanski 2014-05-06 17:44:41 UTC
This looks fixed in retrace-server-1.11-4.el6.noarch

