Description of problem: Consider the following set of source files: $ cat main.c #include <stdio.h> #include <dlfcn.h> int main(int argc, char **argv) { void *mod; void (*modf)(void); fprintf(stderr, "Hello world\n"); if (!(mod = dlopen("./mod.so", RTLD_LAZY | RTLD_LOCAL))) { fprintf(stderr, "dlopen(mod.so): %s\n", dlerror()); return 1; } modf = dlsym(mod, "func"); modf(); fprintf(stderr, "Goodbye\n"); return 0; } $ cat mod.c #include <stdio.h> #include "trace.h" void func(void) { fprintf(stderr, "In mod\n"); MAIN_MOD_EVENT(); fprintf(stderr, "Out mod\n"); } $ cat trace.dtrace provider main { probe mod_event(); }; Build them with the following set of commands $ dtrace -o trace.o -G -s trace.dtrace $ dtrace -o trace.h -h -s trace.dtrace $ gcc -shared -o mod.so mod.c -fPIC $ gcc -o main main.c -ldl trace.o -Wl,--export-dynamic Notice we're using the probe in mod.so, but we're linking the trace.o into the main binary instead of mod.so Historically this worked, *PROVIDED*, mod.so never used an "if (MAIN_MOD_EVENT_ENABLED())" conditional check. $ ./main Hello world In mod Out mod Goodbye Bulding with systemtap 4.4 this now fails $ ./main Hello world dlopen(mod.so): ./mod.so: undefined symbol: main_mod_event_semaphore IIUC, this behaviour change was caused by this upstream commit: commit 82b8e1a07a31bf37ed05d6ebc5162b054c0be9fd Author: William Cohen <wcohen> Date: Wed Aug 5 09:43:17 2020 -0400 Make dtrace generated code work with LTO (take 2) As now the SDT_PROBE() macro will always reference the semaphore, even if the caller never used the MAIN_MOD_EVENT_ENABLED() check. I note that '-Wl,--export-dynamic' doesn't have any effect on the probes in the dtrace generated .o file which is the root problem that prevents use of the _ENABLED() macros from modules historically. I don't know if there's any way to fix this with the way system creates the .o files in the separate .probes ELF section. I'm not sure whether I'd call this a bug in systemtap though, rather than a bug in the demonstration code. Arguably the trace.o really should have been linked to the mod.so instead of 'main' ELF binary, since it was always broken if using the _ENABLED() macros. The reason I raise this bug though, is that QEMU uses dlopen'd modules and always links all probes to its main binary not the modules. This is because we've not isolated probes related to modules from probes related to the main binary - they're all in the same .dtrace input file. If the only practical answer is that QEMU needs fixing, we'll accept that, but if you think systemtap should support this scenario explicitly that'd be great. Version-Release number of selected component (if applicable): systemtap-sdt-devel-4.4-0.20200805git82b8e1a07.fc33.x86_64
For reference, our original QEMU bug report was https://bugzilla.redhat.com/show_bug.cgi?id=1869216
Each semaphore variable definition has a hidden attribute. According to https://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Function-Attributes.html that allows different objects in the same shared object to be linked together, but not be visible outside the shared object. The way that the reproducer (and Qemu) is linking the trace.o outside the shared object looks like it violates that. The probe infrastructure is trying to keep those things hidden, so don't want to make them globally visible. I verified that moving the trace.o linking to the shared mod.so generation the reproducer works. However, I am not sure if everything is going to work if the same trace.o is linked into multiple shared libraries.
Can you please confirm that the 4.4-1 build in -testing or -stable corrects this for you?
I don't see any difference in behaviour from the demo program above with 4.4-1
I just tried to reproducer on x86_64 fc33 with systemtap-4.4-1 rpms installed. The one from koji built Mon 09 Nov 2020 10:29:24 PM EST. It ran without the "dlopen(mod.so): ./mod.so: undefined symbol: main_mod_event_semaphore" error: $ ./main Hello world In mod Out mod Goodbye Comparing the old copies of mod.so which had the problem and the mod.so generated by the current systemtap-4.4-1 func the current (working one) doesn't have a reference to main_mod_event_semaphore reference in func. Does the mod.so generated on your machine still have that reference in the "objdump -dl mod.so"? Could you provide more details about the environment? -Will
Yeah, the objdump output still shows the semaphore reference. What was the commit in systemtap upstream that's supposed to fix it ? I dont't see any updates in systemtap.git upstream for include/sys/sdt.h since the original commit that introduced the problem (82b8e1a07a31bf37ed05d6ebc5162b054c0be9fd) and the generated trace.h seems unchanged too. I'm on a F33 host with following present binutils-2.35-14.fc33.x86_64 gcc-10.2.1-6.fc33.x86_64 systemtap-4.4-1.fc33.x86_64 systemtap-client-4.4-1.fc33.x86_64 systemtap-devel-4.4-1.fc33.x86_64 systemtap-runtime-4.4-1.fc33.x86_64 systemtap-sdt-devel-4.4-1.fc33.x86_64
Okay, found that the reason things were suceeding on the one f33 machine was due to sdt.h and sdt-config.h in /usr/local/include/sys/ being used.
As mentioned in comment #2 QEMU is violating the visibility rules by linking in the .o with the actual semaphore variables outside of the shared library. This isn't fixable in systemtap. QEMU needs to change how it is doing the linking so that the semaphore variables are in the shared libraries.