Bug 1087623 - kernel tracepoints have become unregisterable from modules as of 3.15-rc1
Summary: kernel tracepoints have become unregisterable from modules as of 3.15-rc1
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: systemtap
Version: rawhide
Hardware: Unspecified
OS: Unspecified
unspecified
urgent
Target Milestone: ---
Assignee: Josh Stone
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2014-04-14 20:48 UTC by Frank Ch. Eigler
Modified: 2014-05-03 19:30 UTC (History)
17 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-05-01 20:52:24 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)

Description Frank Ch. Eigler 2014-04-14 20:48:34 UTC
The following commit has changed the tracepoint registration API
to require link-time references to the __tracepoint_foo_bar symbols.
But these are not SYMBOL_EXPORT'd, so modules such as stap's can't
use them.  Please add a SYMBOL_EXPORT_GPL or equivalent for all of
them.  (stap is particularly crippled by sys_* and sched_process_*
effectively going away.)


commit de7b2973903c6cc50b31ee5682a69b2219b9919d
Author: Mathieu Desnoyers <mathieu.desnoyers>
Date:   Tue Apr 8 17:26:21 2014 -0400

    tracepoint: Use struct pointer instead of name hash for reg/unreg tracepoints
    
    Register/unregister tracepoint probes with struct tracepoint pointer
    rather than tracepoint name.

Comment 1 Mathieu Desnoyers 2014-04-15 03:35:30 UTC
This is why there is a new

for_each_kernel_tracepoint()

interface, exported with EXPORT_SYMBOL_GPL. You can use it to iterate on each tracepoint defined within the core kernel, without having to rely on their symbol being exported.

The module tracepoints can be monitored using:

register_tracepoint_module_notifier()
unregister_tracepoint_module_notifier()

Please let me know if those interfaces do not fulfill your needs. FWIW, they are sufficient for LTTng to perform the same things it was doing in prior versions of the Linux kernel.

Thanks,

Mathieu

Comment 2 Josh Boyer 2014-04-15 13:47:26 UTC
Mathieu, if I'm understanding the situation correctly the change in question took previously working userspace and broke it.  While providing a new for_each_kernel_tracepoint might be a solution going forward, I don't see how userspace that was working with the old API would continue to work.  It seems that a userspace code change would be required, and that userspace would then need to check the kernel version to figure out which API to use.

Is that an accurate assessment of the situation?  If so, that would seem to run afoul of the "don't break userspace" mantra that Linus so clearly pounds into people quite often...

Comment 3 Frank Ch. Eigler 2014-04-15 13:54:32 UTC
Josh, it's not a userspace API being broken here, but kernel-space one.

OTOH, this aspect of the change was not noted in the LKML thread, so
came as a surprise.  There are no in-tree users for that iteration
function Mathieu wrote, and indeed git lttng-modules doesn't use
it either (yet), so it's hard to justify relying on it.

(Note that the EXPORT_TRACEPOINT_SYMBOL* mechanism is completely
mooted by the iteration function; it was just as moot before with
the named tracepoint registration.)

Comment 4 Josh Boyer 2014-04-15 14:08:56 UTC
Ah, my mistake.  Thanks for the clarification.

Comment 5 Mathieu Desnoyers 2014-04-15 16:26:53 UTC
Hi Frank,

I had the commit waiting in a dev branch until the tracepoint API change would make its way into the mainline kernel. I just pushed it to lttng-modules, see

commit 20591cf76d5dfd42d61a18bd494b40a70a04dd59
Author: Mathieu Desnoyers <mathieu.desnoyers>
Date:   Wed Mar 19 23:12:32 2014 -0400

    Adapt to 3.15 tracepoint API
    
    Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers>

I had tested those functions against this commit before submitting the tracepoint API change for inclusion.

Thanks,

Mathieu

Comment 6 Josh Boyer 2014-04-23 15:05:31 UTC
After someone else posted patches to export the symbols upstream, the consensus seems to be that they don't need to be exported because there are no in-tree users and that external modules should use the new API.

Frank, do you want me to reassign this to systemtap to track the changes needed, or should I just close it out?

Comment 7 Frank Ch. Eigler 2014-04-23 15:07:53 UTC
Thanks for checking, Josh.  Systemtap will adapt to these changes.
The other Josh has a prototype based on the lttng code Mathieu referred to above.

Comment 8 Josh Stone 2014-05-01 20:52:24 UTC
Mathieu, thanks for the shared code!  SystemTap 2.5 is fine now.

(I find it a bit silly that basically all tracepoint clients will need this kind of bookkeeping, but at least we're few in number...)

Comment 9 Mathieu Desnoyers 2014-05-03 19:30:53 UTC
You're welcome!

The reasoning behind this change is that perf and ftrace (both in kernel) can use symbols to interact with tracepoints. It's easy for those, since they are built with the kernel and can modify the TRACE_EVENT macros to generate what they need.

So it's pretty much in the use-case where tracer modules (both in-tree and out of tree) want to interact with tracepoints in a way that allow enabling events even if the modules containing those events are not yet loaded that we need those hash tables. However, perf and ftrace don't care about this use-case.

Therefore, it seems like a good tradeoff to shrink the size of the kernel image for the common case (upstream perf and ftrace), and take a small space hit within tracer modules (but only when they are loaded).

Thanks,

Mathieu


Note You need to log in before you can comment on or make changes to this bug.