Bug 1368807
| Summary: | Avoid looking for kernel-debuginfo and possibly failing the task, if extracted vmlinux and kernel modules exist in local cache |
|---|---|
| Product: | [Fedora] Fedora EPEL |
| Component: | retrace-server |
| Status: | CLOSED ERRATA |
| Severity: | high |
| Priority: | high |
| Version: | el6 |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Reporter: | Dave Wysochanski <dwysocha> |
| Assignee: | Dave Wysochanski <dwysocha> |
| QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| CC: | hmadhava, jberan, michal.toman, mmarusak, phelia |
| Fixed In Version: | retrace-server-1.17.0-1.fc26 retrace-server-1.17.0-1.el7 |
| Last Closed: | 2017-04-03 16:09:48 UTC |
| Type: | Bug |

Description
Dave Wysochanski
2016-08-21 14:14:53 UTC
Looking a bit more, I think this bug will definitely require refactoring of prepare_debuginfo().

Not surprisingly, now that I look at it, this might not be very easy to fix. Unfortunately, retrace-server uses the path inside the kernel-debuginfo file to store the vmlinux file locally. This means that even validating the existence of the vmlinux file depends on the kernel-debuginfo file existing.

    def prepare_debuginfo(self, vmcore, chroot=None, kernelver=None, crash_cmd=["crash"]):
        log_info("Calling prepare_debuginfo with crash_cmd = " + str(crash_cmd))
        if kernelver is None:
            kernelver = get_kernel_release(vmcore, crash_cmd)

        if kernelver is None:
            raise Exception, "Unable to determine kernel version"

        debuginfo = find_kernel_debuginfo(kernelver)  # <--------------- ideally we want to avoid this. We should be able to since we have the kernel version.
        if not debuginfo:
            raise Exception, "Unable to find debuginfo package"

        if "EL" in kernelver.release:
            if kernelver.flavour is None:
                pattern = "EL/vmlinux"
            else:
                pattern = "EL%s/vmlinux" % kernelver.flavour
        else:
            pattern = "/vmlinux"

        vmlinux_path = None
        debugfiles = {}
        child = Popen(["rpm", "-qpl", debuginfo], stdout=PIPE)
        lines = child.communicate()[0].splitlines()
        for line in lines:
            if line.endswith(pattern):
                vmlinux_path = line  # <------------- vmlinux_path depends on output from 'rpm -qlp' above
                continue

            match = KO_DEBUG_PARSER.match(line)
            if not match:
                continue

            # only pick the correct flavour for el4
            if "EL" in kernelver.release:
                if kernelver.flavour is None:
                    pattern2 = "EL/"
                else:
                    pattern2 = "EL%s/" % kernelver.flavour

                if not pattern2 in os.path.dirname(line):
                    continue

            # '-' in file name is transformed to '_' in module name
            debugfiles[match.group(1).replace("-", "_")] = line

        debugdir_base = os.path.join(CONFIG["RepoDir"], "kernel", kernelver.arch)
        if not os.path.isdir(debugdir_base):
            os.makedirs(debugdir_base)

        vmlinux = os.path.join(debugdir_base, vmlinux_path.lstrip("/"))  # <---- vmlinux_path depends on kernel-debuginfo
        if not os.path.isfile(vmlinux):  # <------------------ if vmlinux did not depend on vmlinux_path, we could move this check higher
            cache_files_from_debuginfo(debuginfo, debugdir_base, [vmlinux_path])
            if not os.path.isfile(vmlinux):
                raise Exception, "Caching vmlinux failed"

The more I thought about this, the more it seems possible that earlier versions of crash required the path to the vmlinux or modules to match what was in the kernel-debuginfo file. Other than this, I'm not sure why retrace-server's local cache of the extracted kernel-debuginfo depends on the path format of the files in kernel-debuginfo.

FWIW, this has the potential to impact us now, since eng-ops (IT) is moving the share where all the kernel-debuginfos are stored. As a result, I think our production retrace-server may be offline during their outage even though most of the debuginfos used for incoming vmcores are already extracted.

Created attachment 1230977 [details]
0001-Avoid-circular-dependency-on-kernel-debuginfo-for-vm.patch
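
The idea discussed above, checking the local cache before searching for kernel-debuginfo, can be sketched roughly as follows. This is not the attached patch: the function names `expected_vmlinux_path` and `find_cached_vmlinux` are hypothetical, and the sketch assumes the common layout `<RepoDir>/kernel/<arch>/usr/lib/debug/lib/modules/<version>-<release>.<arch>/vmlinux`. Older kernels use different directory names, as later comments in this report show.

```python
import os


def expected_vmlinux_path(repo_dir, version, release, arch):
    """Build the cache path for vmlinux from the kernel version alone.

    Hypothetical helper. Assumes the common layout only:
    <repo_dir>/kernel/<arch>/usr/lib/debug/lib/modules/<version>-<release>.<arch>/vmlinux
    """
    kernel_dir = "%s-%s.%s" % (version, release, arch)
    return os.path.join(repo_dir, "kernel", arch,
                        "usr", "lib", "debug", "lib", "modules",
                        kernel_dir, "vmlinux")


def find_cached_vmlinux(repo_dir, version, release, arch):
    """Return the cached vmlinux path, or None if it has not been extracted yet."""
    path = expected_vmlinux_path(repo_dir, version, release, arch)
    return path if os.path.isfile(path) else None
```

A caller could try `find_cached_vmlinux()` first and fall back to `find_kernel_debuginfo()` only when it returns `None`, so an unavailable debuginfo share would not fail tasks whose vmlinux is already cached.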
A patch has been merged upstream which fixes the most common cases. Unfortunately, older kernels (RHEL5, etc.) are not found in the cache due to differences in kernel-debuginfo. I am looking into fixing this, but right now I don't have another patch that fixes all cases.

OK, so here are the remaining problems I see with the existing patch / code. I should have another patch soon to address all known cases. This code is a good example of why the detection code needs to be pulled out and tested separately.

1. With RHEL5, the 'Arch' is not present in the kernel-debuginfo path to vmlinux. Example:

Where we look:

    2017-02-22 04:22:11 Version: '2.6.18'; Release: '412.el5'; Arch: 'x86_64'; Flavour: 'None'; Realtime: False
    2017-02-22 04:22:11 Unable to find cached vmlinux at path: /retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/2.6.18-412.el5.x86_64/vmlinux - searching for kernel-debuginfo package

The correct path should be:

    /retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/2.6.18-412.el5/vmlinux

2. On RHEL5, we need to add the 'Flavour' for some unusual vmcores:

    2017-02-22 04:22:15 Version: '2.6.18'; Release: '194.el5'; Arch: 'i386'; Flavour: 'PAE'; Realtime: False
    2017-02-22 04:22:15 Unable to find cached vmlinux at path: /retrace/repos/kernel/i386/usr/lib/debug/lib/modules/2.6.18-194.el5.i386.PAE/vmlinux - searching for kernel-debuginfo package

The correct path should be:

    /retrace/repos/kernel/i386/usr/lib/debug/lib/modules/2.6.18-194.el5PAE/vmlinux

3. The code does some 'fixup' with the 'pattern' variable and the 'Flavour'. I'm not sure this is correct, since on RHEL4 we end up with:

    2017-02-22 04:22:11 Version: '2.6.9'; Release: '89.0.7.EL'; Arch: 'i386'; Flavour: 'hugemem'; Realtime: False
    2017-02-22 04:22:11 Unable to find cached vmlinux at path: /retrace/repos/kernel/i386/usr/lib/debug/lib/modules/2.6.9-89.0.7.EL.i386.hugememELhugemem/vmlinux - searching for kernel-debuginfo package

The correct path should be:

    /retrace/repos/kernel/i386/usr/lib/debug/lib/modules/2.6.9-89.0.7.ELhugemem/vmlinux

Again, we omit the 'Arch' on RHEL4, but we need the 'Flavour'.

Here's the odd piece of code inside prepare_debuginfo:

        if "EL" in kernelver.release:
            if kernelver.flavour is None:
                pattern = "EL/vmlinux"  # <--- this looks wrong; won't we have "ELEL/vmlinux" for some vmcores?
            else:
                pattern = "EL%s/vmlinux" % kernelver.flavour
        else:
            pattern = "/vmlinux"

OK, I posted the latest patch, which handles all kernel-debuginfo variants:
https://github.com/abrt/retrace-server/pull/145

retrace-server-1.17.0-1.el7 has been submitted as an update to Fedora EPEL 7. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2017-3d55370e77

retrace-server-1.17.0-1.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-ffb8a84c9c

retrace-server-1.17.0-1.el6 has been submitted as an update to Fedora EPEL 6. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2017-9390d60e0d

retrace-server-1.17.0-1.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-ffb8a84c9c

This is not fixed, or at least it can cause regressions where kernel modules fail to get extracted. I'm not sure why I missed this earlier; it is fairly obvious.

If you have two vmcores from the same kernel, but they have a different set of modules loaded:

    vmcoreA: module1, module2
    vmcoreB: module1, module2, module3, module4

If vmcoreA gets submitted first, both the vmlinux and the kernel modules will get extracted. When vmcoreB gets submitted, it will find the vmlinux file but incorrectly assume all modules exist, and it returns early from prepare_debuginfo. In this example, the debug files for module3 and module4 are therefore never extracted.

The fix will be to avoid returning early, and to look inside the cache area for any kernel modules, similar to the vmlinux file. I'll have to refactor the code for this. There is also a second problem I saw with 32-bit vmcores on a 64-bit machine, but it may be a subset of this problem. For more info, see https://bugzilla.redhat.com/show_bug.cgi?id=1437637

*** Bug 1437637 has been marked as a duplicate of this bug. ***

retrace-server-1.17.0-1.el6 has been pushed to the Fedora EPEL 6 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2017-9390d60e0d

retrace-server-1.17.0-1.el7 has been pushed to the Fedora EPEL 7 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2017-3d55370e77

retrace-server-1.17.0-1.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.

retrace-server-1.17.0-1.el7 has been pushed to the Fedora EPEL 7 stable repository. If problems still persist, please make note of it in this bug report.
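
Two points from the comments above lend themselves to a short sketch: the cache directory name differs between kernel generations, and a cached vmlinux does not imply that every loaded module has been extracted. The code below is only an illustration under stated assumptions, not the code from the upstream pull request. All names (`kernel_dir_name`, `modules_root`, `cached_module_names`, `missing_modules`, `KO_DEBUG_RE`) are hypothetical, the directory rule is inferred solely from the three paths quoted in this report, and the file-name pattern is an assumption that may differ from retrace-server's `KO_DEBUG_PARSER`.

```python
import os
import re

# Assumed file-name pattern; retrace-server's real KO_DEBUG_PARSER may differ.
KO_DEBUG_RE = re.compile(r"^([A-Za-z0-9_\-]+)\.ko\.debug$")


def kernel_dir_name(version, release, arch, flavour=None):
    """Directory name under usr/lib/debug/lib/modules for one kernel.

    Inferred only from the paths quoted in this report: RHEL4 ("EL") and
    RHEL5 (".el5") releases omit the arch and append the flavour; anything
    else uses <version>-<release>.<arch>. Flavoured kernels on newer
    releases are not covered by this sketch.
    """
    if "EL" in release or ".el5" in release:
        return "%s-%s%s" % (version, release, flavour or "")
    return "%s-%s.%s" % (version, release, arch)


def modules_root(repo_dir, version, release, arch, flavour=None):
    """Cache directory that would hold vmlinux and module debug files."""
    return os.path.join(repo_dir, "kernel", arch,
                        "usr", "lib", "debug", "lib", "modules",
                        kernel_dir_name(version, release, arch, flavour))


def cached_module_names(root):
    """Names of all modules with a .ko.debug file anywhere below root."""
    found = set()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            match = KO_DEBUG_RE.match(filename)
            if match:
                # '-' in the file name is transformed to '_' in the module name
                found.add(match.group(1).replace("-", "_"))
    return found


def missing_modules(loaded_modules, root):
    """Loaded modules that have no cached debug file, sorted by name."""
    cached = cached_module_names(root)
    return sorted(m for m in loaded_modules if m.replace("-", "_") not in cached)
```

In the vmcoreA / vmcoreB scenario, `missing_modules()` would return an empty list for vmcoreA once its modules are cached and would report module3 and module4 for vmcoreB, so only a non-empty result would require access to the kernel-debuginfo package.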