Bug 1566422

Summary: stap ERROR: Build-id mismatch with (re)inserted/removed kernel module probe
Product: Red Hat Enterprise Linux 7 Reporter: Mark Wielaard <mjw>
Component: systemtapAssignee: Frank Ch. Eigler <fche>
Status: CLOSED ERRATA QA Contact: Martin Cermak <mcermak>
Severity: unspecified Docs Contact:
Priority: high    
Version: 7.4CC: alanm, jbainbri, jistone, lberk, mcermak, mjw, smakarov
Target Milestone: rcKeywords: ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: systemtap-3.2-8.el7.src.rpm Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1571845 (view as bug list) Environment:
Last Closed: 2018-10-30 10:46:00 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1571845    
Attachments:
Description Flags
suggested patch none

Description Mark Wielaard 2018-04-12 09:50:22 UTC
Description of problem:

Setting a probe on an loaded kernel module and then reinserting it (to trigger a init or exit probe) will ERROR out because of a Build-id mismatch.

Version-Release number of selected component (if applicable):

On 7.4:
systemtap-3.1-5.el7_4.x86_64
kernel-3.10.0-693.21.1.el7.x86_64
kernel-debuginfo-3.10.0-693.21.1.el7.x86_64

And also replicated against 7.5:

systemtap-3.2-4.el7.x86_64
kernel-3.10.0-862.el7.x86_64
kernel-debuginfo-3.10.0-862.el7.x86_64

How reproducible:

Always, if module was already inserted.
Never, if module was not inserted yet.

Steps to Reproduce:
1. sudo modprobe igb
2. stap -e 'probe module("igb").function("igb_*_module") { printf("%s: %s.\n", ctime(gettimeofday_s()), ppfunc()); }
3. sudo rmmod igb
4. sudo modprobe igb

Actual results:

Tue Apr 10 09:34:20 2018: igb_exit_module.
ERROR: Build-id mismatch [man error::buildid]: "/usr/lib/modules/3.10.0-693.21.1.el7.x86_64/kernel/drivers/net/ethernet/intel/igb/igb.ko.xz" byte 0 (0x9c vs 0x00) address 0xffffffffc0a97010 rc -14
Tue Apr 10 09:34:22 2018: igb_init_module.

Note that the probe still hits... But the error triggers and takes out
the stap script.

Expected results:

When switching step 1 and 2. Or explicitly doing a sudo rmmod igb before the stap command to make sure the igb kernel module isn't loaded before the stap script starts.

> sudo modprobe igb
> Tue Apr 10 09:32:34 2018: igb_init_module.
> sudo rmmod igb
> Tue Apr 10 09:32:37 2018: igb_exit_module.
> sudo modprobe igb
> Tue Apr 10 09:32:39 2018: igb_init_module.
> sudo rmmod igb
> Tue Apr 10 09:32:41 2018: igb_exit_module.

Additional info:

So for some reason if the kernel module was already inserted and get reinserted when the stap script runs a (bogus) build-id mismatch error occurs.

Comment 3 Serguei Makarov 2018-04-16 17:29:36 UTC
Sequence of events appears to be as follows:

When module has already been loaded:
- modprobe igb
- systemtap module loaded
- notes_sect calculated to be at 0xffffffffc0792000 by _stp_kmodule_update_address (called via _stp_do_relocation)
- _stp_module_check is called and correctly reads notes_sect, 'theory' and 'practice' build-ids match
- rmmod igb; modprobe igb
- second time, _stp_kmodule_update_address is not called before _stp_kmodule_check
- notes_sect is still recorded as being at 0xffffffffc0792000
- _stp_kmodule_check performs invalid memory access to that location, sets 'practice' to 0x00, comparison vs. 'theory' (0x8f) fails

When module has not been loaded:
- systemtap module loaded
- since igb is not loaded, notes_sect for igb is set to 0x0
- modprobe igb
- _stp_kmodule_update_address is not called
- _stp_[k]module_check are called (via if(MODULE_STATE_COMING) branch in _stp_module_notifier), but since notes_sect is 0x0, _stp_build_id_check is never called (not sure if this behaviour is correct or not)
- can continue to load and unload igb without incident, notes_sect remains set to 0x0

Comment 4 Serguei Makarov 2018-04-16 19:19:03 UTC
Created attachment 1422702 [details]
suggested patch

Suggested patch for fixing this problem. Will push upstream once I'm more certain that clearing the section addresses in _stp_module_notifier is the right thing to do (since these addresses were originally populated by a call from _stp_do_relocation/_stp_ctl_write_cmd rather than by _stp_module_notifier).

Comment 5 Mark Wielaard 2018-04-18 15:46:27 UTC
This patch looks good. It makes everything work out of the box for me.
I have added it to the fedora 28 systemtap build:
http://koji.fedoraproject.org/koji/buildinfo?buildID=1073377

There is also a copr build I did for another unwinding bug:
https://copr.fedorainfracloud.org/coprs/mjw/systemtap-fallback-unwind/

With EPEL yum-plugin-copr it can be enabled and installed with:
$ yum install yum-plugin-copr
$ yum copr enable mjw/systemtap-fallback-unwind
$ yum upgrade systemtap

The result is of course totally unsupported, untested and might totally break your system. But it might be a simple way to test this fix.

Comment 6 Fedora Update System 2018-04-20 15:18:34 UTC
systemtap-3.2-11.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-8b5ad21c11

Comment 7 Fedora Update System 2018-04-20 16:29:15 UTC
systemtap-3.2-11.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-b6ed279f75

Comment 8 Fedora Update System 2018-04-21 05:03:35 UTC
systemtap-3.2-11.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-b6ed279f75

Comment 9 Fedora Update System 2018-04-21 18:38:51 UTC
systemtap-3.2-11.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-8b5ad21c11

Comment 16 errata-xmlrpc 2018-10-30 10:46:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:3168