Bug 1566422
Summary: | stap ERROR: Build-id mismatch with (re)inserted/removed kernel module probe | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Mark Wielaard <mjw> | ||||
Component: | systemtap | Assignee: | Frank Ch. Eigler <fche> | ||||
Status: | CLOSED ERRATA | QA Contact: | Martin Cermak <mcermak> | ||||
Severity: | unspecified | Docs Contact: | |||||
Priority: | high | ||||||
Version: | 7.4 | CC: | alanm, jbainbri, jistone, lberk, mcermak, mjw, smakarov | ||||
Target Milestone: | rc | Keywords: | ZStream | ||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | systemtap-3.2-8.el7.src.rpm | Doc Type: | If docs needed, set a value | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | |||||||
: | 1571845 (view as bug list) | Environment: | |||||
Last Closed: | 2018-10-30 10:46:00 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 1571845 | ||||||
Attachments: |
|
Sequence of events appears to be as follows: When module has already been loaded: - modprobe igb - systemtap module loaded - notes_sect calculated to be at 0xffffffffc0792000 by _stp_kmodule_update_address (called via _stp_do_relocation) - _stp_module_check is called and correctly reads notes_sect, 'theory' and 'practice' build-ids match - rmmod igb; modprobe igb - second time, _stp_kmodule_update_address is not called before _stp_kmodule_check - notes_sect is still recorded as being at 0xffffffffc0792000 - _stp_kmodule_check performs invalid memory access to that location, sets 'practice' to 0x00, comparison vs. 'theory' (0x8f) fails When module has not been loaded: - systemtap module loaded - since igb is not loaded, notes_sect for igb is set to 0x0 - modprobe igb - _stp_kmodule_update_address is not called - _stp_[k]module_check are called (via if(MODULE_STATE_COMING) branch in _stp_module_notifier), but since notes_sect is 0x0, _stp_build_id_check is never called (not sure if this behaviour is correct or not) - can continue to load and unload igb without incident, notes_sect remains set to 0x0 Created attachment 1422702 [details]
suggested patch
Suggested patch for fixing this problem. Will push upstream once I'm more certain that clearing the section addresses in _stp_module_notifier is the right thing to do (since these addresses were originally populated by a call from _stp_do_relocation/_stp_ctl_write_cmd rather than by _stp_module_notifier).
This patch looks good. It makes everything work out of the box for me. I have added it to the fedora 28 systemtap build: http://koji.fedoraproject.org/koji/buildinfo?buildID=1073377 There is also a copr build I did for another unwinding bug: https://copr.fedorainfracloud.org/coprs/mjw/systemtap-fallback-unwind/ With EPEL yum-plugin-copr it can be enabled and installed with: $ yum install yum-plugin-copr $ yum copr enable mjw/systemtap-fallback-unwind $ yum upgrade systemtap The result is of course totally unsupported, untested and might totally break your system. But it might be a simple way to test this fix. systemtap-3.2-11.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-8b5ad21c11 systemtap-3.2-11.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-b6ed279f75 systemtap-3.2-11.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-b6ed279f75 systemtap-3.2-11.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-8b5ad21c11 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:3168 |
Description of problem: Setting a probe on an loaded kernel module and then reinserting it (to trigger a init or exit probe) will ERROR out because of a Build-id mismatch. Version-Release number of selected component (if applicable): On 7.4: systemtap-3.1-5.el7_4.x86_64 kernel-3.10.0-693.21.1.el7.x86_64 kernel-debuginfo-3.10.0-693.21.1.el7.x86_64 And also replicated against 7.5: systemtap-3.2-4.el7.x86_64 kernel-3.10.0-862.el7.x86_64 kernel-debuginfo-3.10.0-862.el7.x86_64 How reproducible: Always, if module was already inserted. Never, if module was not inserted yet. Steps to Reproduce: 1. sudo modprobe igb 2. stap -e 'probe module("igb").function("igb_*_module") { printf("%s: %s.\n", ctime(gettimeofday_s()), ppfunc()); } 3. sudo rmmod igb 4. sudo modprobe igb Actual results: Tue Apr 10 09:34:20 2018: igb_exit_module. ERROR: Build-id mismatch [man error::buildid]: "/usr/lib/modules/3.10.0-693.21.1.el7.x86_64/kernel/drivers/net/ethernet/intel/igb/igb.ko.xz" byte 0 (0x9c vs 0x00) address 0xffffffffc0a97010 rc -14 Tue Apr 10 09:34:22 2018: igb_init_module. Note that the probe still hits... But the error triggers and takes out the stap script. Expected results: When switching step 1 and 2. Or explicitly doing a sudo rmmod igb before the stap command to make sure the igb kernel module isn't loaded before the stap script starts. > sudo modprobe igb > Tue Apr 10 09:32:34 2018: igb_init_module. > sudo rmmod igb > Tue Apr 10 09:32:37 2018: igb_exit_module. > sudo modprobe igb > Tue Apr 10 09:32:39 2018: igb_init_module. > sudo rmmod igb > Tue Apr 10 09:32:41 2018: igb_exit_module. Additional info: So for some reason if the kernel module was already inserted and get reinserted when the stap script runs a (bogus) build-id mismatch error occurs.