Bug 1076833
| Summary: | On vmcore tarballs with no group read perms, retrace-server may create crash/vmcore without group read perms set so `retrace-server-interact <taskid> crash` fails with permission denied | ||
|---|---|---|---|
| Product: | [Fedora] Fedora EPEL | Reporter: | Dave Wysochanski <dwysocha> |
| Component: | retrace-server | Assignee: | Dave Wysochanski <dwysocha> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | ||
| Version: | el6 | CC: | abrt-devel-list, harshula, rvokal |
| Target Milestone: | --- | Keywords: | Patch, Regression, TestCaseProvided |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | retrace-server-1.11-4.el6.noarch | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-04-11 22:26:46 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Attachments: | |||
Description
Dave Wysochanski
2014-03-15 15:04:04 UTC
Ok, I'm more convinced this is a regression now. I've been successfully getting vmcores from this one customer with no problems, but the latest ones I submitted tonight are unreadable with the same problem:

```
for t in 370500522 373515740 154200319 347574662; do ls -lh /cores/retrace/tasks/$t/crash/vmcore; done
-rw-------. 1 retrace gss-eng-collab 474M Mar 15 04:13 /cores/retrace/tasks/370500522/crash/vmcore
-rw-------. 1 retrace gss-eng-collab 555M Mar 15 12:23 /cores/retrace/tasks/373515740/crash/vmcore
-rw-------. 1 retrace gss-eng-collab 531M Mar 15 07:55 /cores/retrace/tasks/154200319/crash/vmcore
-rw-------. 1 retrace gss-eng-collab 372M Mar 15 13:23 /cores/retrace/tasks/347574662/crash/vmcore
```

NOTE: For now, we've got a workaround in place on our production system: a cronjob that looks for vmcore files at `/cores/retrace/tasks/<taskid>/crash/vmcore` and runs `chmod go+r` on them. Once this is fixed we'll remove the workaround.

Ah, I think I found it. I think this is a problem only if we skip makedumpfile. It may have been introduced by the fix to skip makedumpfile, https://bugzilla.redhat.com/show_bug.cgi?id=1067188

Latest code (/usr/lib/python2.6/site-packages/retrace/retrace.py):

```python
skip_makedumpfile = CONFIG["VmcoreDumpLevel"] <= 0 or CONFIG["VmcoreDumpLevel"] >= 32
if (dump_level is not None and
        (dump_level & CONFIG["VmcoreDumpLevel"]) == CONFIG["VmcoreDumpLevel"]):
    log_info("Stripping to %d would have no effect" % CONFIG["VmcoreDumpLevel"])
    skip_makedumpfile = True  # <-- we don't do a chmod in this case

if not skip_makedumpfile:
    log_debug("Executing makedumpfile")
    start = time.time()
    strip_vmcore(vmcore, kernelver)
    dur = int(time.time() - start)

    st = os.stat(vmcore)
    # <-- this check probably needs to be moved outside and below the 'if'
    #     statements so it always gets executed
    if (st.st_mode & stat.S_IRGRP) == 0:
        try:
            # <-- here is the chmod; only done underneath 'if not skip_makedumpfile'
            os.chmod(vmcore, st.st_mode | stat.S_IRGRP)
        except Exception as ex:
            log_warn("File '%s' is not group readable and chmod"
                     " failed. The process will continue but if"
                     " it fails this is the likely cause." % vmcore)

    log_info("Stripped size: %s" % human_readable_size(st.st_size))
    log_info("Makedumpfile took %d seconds and saved %s"
             % (dur, human_readable_size(oldsize - st.st_size)))
```

Now compare with the earlier code from https://bugzilla.redhat.com/show_bug.cgi?id=1067188#c0:

```python
def download_remote(self, unpack=True, timeout=0, kernelver=None):
    """Downloads all remote resources and returns a list of errors."""
    ...
    if os.path.isfile(vmcore):
        oldsize = os.path.getsize(vmcore)
        log_info("Vmcore size: %s" % human_readable_size(oldsize))
        if CONFIG["VmcoreDumpLevel"] > 0 and CONFIG["VmcoreDumpLevel"] < 32:
            log_debug("Executing makedumpfile")
            start = time.time()
            strip_vmcore(vmcore, kernelver)
            dur = int(time.time() - start)

            st = os.stat(vmcore)
            # <-- used to do a chmod here unconditionally
            os.chmod(vmcore, st.st_mode | stat.S_IRGRP)
            log_info("Stripped size: %s" % human_readable_size(st.st_size))
            log_info("Makedumpfile took %d seconds and saved %s"
                     % (dur, human_readable_size(oldsize - st.st_size)))
```

It looks like download_remote has been refactored significantly, though, perhaps for multiple bug fixes. Actually, it looks like we always had a form of the bug; it just went unnoticed because our config file was set so that we always did makedumpfile stripping, and the chmod came after that. When we added the logic to skip makedumpfile, we created a situation where, if the tarball was created with a vmcore without group read perms, the vmcore stays that way.

I'm not sure what a good fix is right now. We may just want to add similar code to the non-stripped case, or, perhaps better, put the chmod below the 'if' conditionals.

Created attachment 875615
Patch to fix this bug, v1
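The "put the chmod below the 'if' conditionals" option discussed above can be sketched in plain Python 3. This is not the attached patch (its contents are not reproduced in this report): `should_skip_makedumpfile`, `ensure_group_readable` and `process_vmcore` are hypothetical names, and the `strip` callable is only a stand-in for `strip_vmcore(vmcore, kernelver)`. The skip rule copies the `VmcoreDumpLevel` logic from the retrace.py excerpt; the point of the sketch is that the group-read fix-up runs on every path, including the skipped one.

```python
import os
import stat
import tempfile


def should_skip_makedumpfile(config_level, dump_level=None):
    """Same skip rule as the retrace.py excerpt (VmcoreDumpLevel semantics).

    Skip when the configured level is outside 1..31, or when the vmcore is
    already stripped to at least that level (all configured bits are set).
    """
    if config_level <= 0 or config_level >= 32:
        return True
    return dump_level is not None and (dump_level & config_level) == config_level


def ensure_group_readable(path):
    """Add S_IRGRP if missing; return True if the file ends up group-readable."""
    mode = os.stat(path).st_mode
    if mode & stat.S_IRGRP:
        return True
    try:
        # S_IMODE keeps only the permission bits of st_mode.
        os.chmod(path, stat.S_IMODE(mode) | stat.S_IRGRP)
    except OSError:
        return False
    return True


def process_vmcore(path, config_level, dump_level=None, strip=None):
    """Hypothetical outline: optionally strip, then ALWAYS fix permissions."""
    if strip is not None and not should_skip_makedumpfile(config_level, dump_level):
        strip(path)  # stand-in for strip_vmcore(vmcore, kernelver)
    # Below every conditional, so it also runs when makedumpfile is skipped.
    return ensure_group_readable(path)


if __name__ == "__main__":
    fd, demo = tempfile.mkstemp()
    os.close(fd)
    os.chmod(demo, 0o600)  # like the -rw------- vmcores listed above
    process_vmcore(demo, config_level=0)  # makedumpfile is skipped here
    print(oct(stat.S_IMODE(os.stat(demo).st_mode)))  # 0o640
    os.remove(demo)
```

With `config_level=0` makedumpfile is skipped, which is exactly the case the excerpt misses, yet the file still gains group read.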
(In reply to Dave Wysochanski from comment #6)
> Created attachment 875615
> Patch to fix this bug, v1

NOTE: This is completely untested, but it's a first stab.

I just verified that the patch in comment #7 fixes the bug.

I guess this is not fully fixed. Today someone produced another vmcore with another permissions issue. This one ran makedumpfile, but makedumpfile saved 0 bytes. I tested the original vmcores in this bug and those work, so there must be another subtlety I missed when makedumpfile is run.

Created attachment 888095
Patch to fix this bug on top of previous patch. Move 'stat' and 'chmod' to the very end after all extraction and makedumpfile processing. Fix bug introduced where if makedumpfile ran it would always report 'saved 0 bytes' when it may have saved signfica
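The second patch description says `stat` and `chmod` now happen at the very end, after all extraction and makedumpfile processing. A minimal Python 3 sketch of that ordering follows; `strip_then_finalize` and `fake_strip` are hypothetical names. `fake_strip` assumes, for illustration only (this report does not confirm it), that stripping can replace the vmcore with a brand-new file created without group read. In that scenario a chmod done before stripping would be lost, and a size taken at the wrong point would skew the "saved" figure; stat-ing once at the end avoids both.

```python
import os
import stat
import tempfile
import time


def strip_then_finalize(path, strip):
    """Hypothetical outline of the ordering described in the second patch.

    `strip` stands in for retrace-server's strip_vmcore(). Returns
    (seconds_taken, bytes_saved).
    """
    oldsize = os.path.getsize(path)  # size before any processing
    start = time.time()
    strip(path)
    dur = int(time.time() - start)
    # Stat only at the very end: st_size is the stripped size, and the mode
    # we extend belongs to whatever file now lives at `path`.
    st = os.stat(path)
    if not st.st_mode & stat.S_IRGRP:
        os.chmod(path, stat.S_IMODE(st.st_mode) | stat.S_IRGRP)
    return dur, oldsize - st.st_size


def fake_strip(path, keep=400):
    """Hypothetical stand-in: write a smaller 0600 file and rename it over `path`."""
    tmp = path + ".stripped"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * keep)
    os.rename(tmp, path)  # replaces the original file (new inode, new mode)


if __name__ == "__main__":
    fd, demo = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * 1000)
    dur, saved = strip_then_finalize(demo, fake_strip)
    print(saved, oct(stat.S_IMODE(os.stat(demo).st_mode)))
    os.remove(demo)
```

In the demo a 1000-byte file is "stripped" to 400 bytes, so `saved` is 600, and the replacement file gains group read because the chmod runs after the rename.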
This looks fixed in retrace-server-1.11-4.el6.noarch.