Bug 1314973

Summary: Refactor code to avoid calling prepare_debuginfo in a redundant way, store path to vmlinux file
Product: [Fedora] Fedora EPEL Reporter: Dave Wysochanski <dwysocha>
Component: retrace-serverAssignee: Dave Wysochanski <dwysocha>
Status: CLOSED CURRENTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: unspecified    
Version: el6CC: jberan, michal.toman
Target Milestone: ---Keywords: Reopened
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: retrace-server-1.15-2.fc23 retrace-server-1.15-2.el6 retrace-server-1.15-2.el7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-04-11 20:27:23 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
patch1 to fix this bug
none
patch2 to fix this bug
none
patch 1/3 to fix this bug
none
patch2/3 to fix this bug
none
patch 3/3 to fix this bug
none
patch 1/4 to fix this bug
none
patch 2/4 to fix this bug
none
patch 3/4 to fix this bug
none
patch 4/4 to fix this bug none

Description Dave Wysochanski 2016-03-05 11:16:42 UTC
Description of problem:
Kernel version detection works, but it's not broken out into a small testable step for each vmcore.  Furthermore, we call 'prepare_debuginfo' from too many places, and do not store the path to the vmlinux file anywhere.  Ideally we should:
1. Make kernel version detection one easily testable step from the commandline
2. Store path to the vmlinux file inside a RetraceTask file

Specifically, whenever a user runs 'retrace-server-interact <task> crash' we:
1. call rpm -ql on the kernel-debuginfo file
2. call 'crash -s' on the vmcore + vmlinux file
These steps are totally unnecessary and just overhead, since the kernel version has been detected and the kernel-debuginfo has been extracted prior to this.

Version-Release number of selected component (if applicable):
retrace-server-1.14-2.el6.noarch

Actual results:
prepare_debuginfo / kernel detection called from too many places and in a redundant way

Expected results:
kernel version detection should be done in one place, and code like prepare_debuginfo should not be called in a redundant way

Additional info:
This has been a problem for a while but it's becoming more of an issue for proper testing and maintenance so it's time to do this refactoring.  There is some risk here that kernel version detection could break but in the end the cleanup will be worth it.  Also once the refactoring is done we should be able to test this critical step much better.

I may just shoot for getting rid of the redundancy and storing the vmlinux file path.  I'm not sure about where to put the detection as a separate step but if I can work it out it may be part of this patchset.

Comment 1 Dave Wysochanski 2016-03-05 12:58:16 UTC
Created attachment 1133315 [details]
patch1 to fix this bug

Comment 2 Dave Wysochanski 2016-03-05 12:58:19 UTC
Created attachment 1133316 [details]
patch2 to fix this bug

Comment 3 Dave Wysochanski 2016-03-05 13:01:29 UTC
Ok I think I got it and it was not as hard as I feared.  

I measured the perf difference on an idle system and it was anywhere from 5 to 10 seconds on a typical vmcore to skip prepare_debuginfo in retrace-server-interact.

I'm testing these patches on top of patches for bug 1313011 and bug 1314897.  Assuming they test out ok, I'll put in another pull request on github.

Comment 4 Dave Wysochanski 2016-03-05 13:25:49 UTC
All patches testing out ok and pull request in.

Comment 5 Dave Wysochanski 2016-03-07 15:42:24 UTC
I am going to re-do these patches but keep the same content.
The second patch has too much in it - the hunks which remove the VMLINUX file should go in the first patch.  The hunks which move the prepare_debuginfo and strip_vmcore should be in their own patch.  Finally the code which fixes this bug and uses the vmlinux file should be in a 3rd patch to make it clear.

Comment 6 Dave Wysochanski 2016-03-07 18:54:58 UTC
Actually I found one small bug in the original patchset.  In retrace-server-interact, I omitted the call to task.set_crash_cmd(' '.join(crash_cmd)) after the call to prepare_debuginfo in the case when the vmlinux file does not exist.

I was also debating about the vmlinux file itself.  On one hand it would be more flexible if the vmlinux file was a symlink to the actual file, rather than a file containing the path.  Unfortunately that would imply that more people may call crash directly on the vmcore and vmlinux file, and this would have implications for our cleanup task which relies on the 'mtime' of the task directory:

retrace.py:

RetraceTask
...
    def get_age(self):
        """Returns the age of the task in hours."""
        return int(time.time() - os.path.getmtime(self._savedir)) / 3600


retrace-server-cleanup:

              if task.get_age() >= CONFIG["ArchiveTaskAfter"]:
                    log.write("Archiving task %s\n" % filename)
 ...
               if task.get_age() >= CONFIG["DeleteTaskAfter"]:
                    log.write("Deleting old task %s\n" % filename)
                    task.create_worker().remove_task()


I have a bug open to change the cleanup based on atime of the vmcore but that wasn't unanimous and is stalled.  For now will leave the vmlinux file containing the path, rather than making it a symlink.  Hopefully this will avoid problems with people running crash directly and fooling the cleanup script.  If we fix that other bug or people complain we can always change it to a symlinked file.

Comment 7 Dave Wysochanski 2016-03-07 19:28:56 UTC
Created attachment 1133895 [details]
patch 1/3 to fix this bug

Comment 8 Dave Wysochanski 2016-03-07 19:28:58 UTC
Created attachment 1133896 [details]
patch2/3 to fix this bug

Comment 9 Dave Wysochanski 2016-03-07 19:29:00 UTC
Created attachment 1133897 [details]
patch 3/3 to fix this bug

Comment 10 Dave Wysochanski 2016-03-07 19:40:21 UTC
All patches tested out ok, pull request in.

Comment 11 Dave Wysochanski 2016-03-11 15:32:30 UTC
I actually had to do a 3rd round of these patches.  Patrik found one small but significant omission in the 'create' method and after I did some testing, I found a second issue in strip_vmcore which was easily fixable.  Patches are posted now on github and pull request in.

Comment 12 Dave Wysochanski 2016-03-11 15:36:10 UTC
Created attachment 1135264 [details]
patch 1/4 to fix this bug

Comment 13 Dave Wysochanski 2016-03-11 15:36:12 UTC
Created attachment 1135265 [details]
patch 2/4 to fix this bug

Comment 14 Dave Wysochanski 2016-03-11 15:36:14 UTC
Created attachment 1135266 [details]
patch 3/4 to fix this bug

Comment 15 Dave Wysochanski 2016-03-11 15:36:16 UTC
Created attachment 1135267 [details]
patch 4/4 to fix this bug

Comment 16 Fedora Update System 2016-03-21 12:23:38 UTC
retrace-server-1.15-1.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2016-88382c37cc

Comment 17 Fedora Update System 2016-03-21 12:23:51 UTC
retrace-server-1.15-1.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-3fd6c52337

Comment 18 Fedora Update System 2016-03-21 12:24:00 UTC
retrace-server-1.15-1.el6 has been submitted as an update to Fedora EPEL 6. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-1082131b97

Comment 19 Fedora Update System 2016-03-21 12:24:08 UTC
retrace-server-1.15-1.el7 has been submitted as an update to Fedora EPEL 7. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-505f2e5c59

Comment 20 Fedora Update System 2016-03-21 22:31:19 UTC
retrace-server-1.15-1.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-3fd6c52337

Comment 21 Fedora Update System 2016-03-22 00:26:29 UTC
retrace-server-1.15-1.el7 has been pushed to the Fedora EPEL 7 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-505f2e5c59

Comment 22 Fedora Update System 2016-03-22 06:49:17 UTC
retrace-server-1.15-1.el6 has been pushed to the Fedora EPEL 6 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-1082131b97

Comment 23 Fedora Update System 2016-03-22 15:22:58 UTC
retrace-server-1.15-1.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-88382c37cc

Comment 24 Fedora Update System 2016-03-22 16:43:53 UTC
retrace-server-1.15-2.el6 has been submitted as an update to Fedora EPEL 6. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-1082131b97

Comment 25 Fedora Update System 2016-03-23 08:57:01 UTC
retrace-server-1.15-2.el7 has been submitted as an update to Fedora EPEL 7. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-505f2e5c59

Comment 26 Fedora Update System 2016-03-23 08:57:11 UTC
retrace-server-1.15-2.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-3fd6c52337

Comment 27 Fedora Update System 2016-03-23 08:58:40 UTC
retrace-server-1.15-2.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2016-88382c37cc

Comment 28 Fedora Update System 2016-03-23 19:56:58 UTC
retrace-server-1.15-2.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-3fd6c52337

Comment 29 Fedora Update System 2016-03-24 01:53:39 UTC
retrace-server-1.15-2.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-88382c37cc

Comment 30 Fedora Update System 2016-03-24 15:49:41 UTC
retrace-server-1.15-2.el6 has been pushed to the Fedora EPEL 6 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-1082131b97

Comment 31 Fedora Update System 2016-03-24 15:50:22 UTC
retrace-server-1.15-2.el7 has been pushed to the Fedora EPEL 7 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-505f2e5c59

Comment 32 Fedora Update System 2016-04-05 13:54:06 UTC
retrace-server-1.15-2.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.

Comment 33 Fedora Update System 2016-04-05 16:21:12 UTC
retrace-server-1.15-2.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.

Comment 34 Fedora Update System 2016-04-08 21:25:10 UTC
retrace-server-1.15-2.el6 has been pushed to the Fedora EPEL 6 stable repository. If problems still persist, please make note of it in this bug report.

Comment 35 Fedora Update System 2016-04-08 21:31:12 UTC
retrace-server-1.15-2.el7 has been pushed to the Fedora EPEL 7 stable repository. If problems still persist, please make note of it in this bug report.

Comment 36 Fedora Update System 2016-06-03 09:46:21 UTC
retrace-server-1.16-1.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-81eb6f786a

Comment 37 Fedora Update System 2016-06-04 18:26:30 UTC
retrace-server-1.16-1.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-81eb6f786a

Comment 38 Dave Wysochanski 2016-06-30 19:05:12 UTC
I verified this has been fixed in retrace-server-1.15-1.el6

Comment 39 Fedora Update System 2016-07-01 06:19:55 UTC
retrace-server-1.16-2.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-81eb6f786a

Comment 40 Fedora Update System 2016-07-02 20:31:27 UTC
retrace-server-1.16-2.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-81eb6f786a