Bug 1869642 - Probes always reference the semaphore even if caller doesn't use it, breaking QEMU modules.
Summary: Probes always reference the semaphore even if caller doesn't use it, breaking...
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: systemtap
Version: 33
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Frank Ch. Eigler
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-08-18 12:30 UTC by Daniel Berrangé
Modified: 2020-11-17 19:48 UTC (History)
10 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-17 19:48:47 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)

Description Daniel Berrangé 2020-08-18 12:30:43 UTC
Description of problem:

Consider the following set of source files:

$ cat main.c
#include <stdio.h>
#include <dlfcn.h>

int main(int argc, char **argv) {
  void *mod;
  void (*modf)(void);
  
  fprintf(stderr, "Hello world\n");

  if (!(mod = dlopen("./mod.so", RTLD_LAZY | RTLD_LOCAL))) {
    fprintf(stderr, "dlopen(mod.so): %s\n", dlerror());
    return 1;
  }

  modf = dlsym(mod, "func");

  modf();

  fprintf(stderr, "Goodbye\n");
  return 0;
}


$ cat mod.c
#include <stdio.h>
#include "trace.h"
void func(void) {
  fprintf(stderr, "In mod\n");
  MAIN_MOD_EVENT();
  fprintf(stderr, "Out mod\n");
}



$ cat trace.dtrace 
provider main {
    probe mod_event();
};


Build them with the following set of commands


$ dtrace -o trace.o -G -s trace.dtrace
$ dtrace -o trace.h -h -s trace.dtrace
$ gcc -shared -o mod.so mod.c -fPIC
$ gcc -o main main.c -ldl trace.o -Wl,--export-dynamic


Notice we're using the probe in mod.so, but we're linking the trace.o into the main binary instead of mod.so

Historically this worked, *PROVIDED*, mod.so never used an "if (MAIN_MOD_EVENT_ENABLED())"  conditional check.

$ ./main 
Hello world
In mod
Out mod
Goodbye


Bulding with systemtap 4.4 this now fails

$ ./main 
Hello world
dlopen(mod.so): ./mod.so: undefined symbol: main_mod_event_semaphore


IIUC, this behaviour change was caused by this upstream commit:

commit 82b8e1a07a31bf37ed05d6ebc5162b054c0be9fd
Author: William Cohen <wcohen>
Date:   Wed Aug 5 09:43:17 2020 -0400

    Make dtrace generated code work with LTO (take 2)


As now the SDT_PROBE() macro will always reference the semaphore, even if the caller never used the MAIN_MOD_EVENT_ENABLED() check.

I note that  '-Wl,--export-dynamic' doesn't have any effect on the probes in the dtrace generated .o file which is the root problem that prevents use of the _ENABLED() macros from modules historically. I don't know if there's any way to fix this with the way system creates the .o files in the separate .probes ELF section.

I'm not sure whether I'd call this a bug in systemtap though, rather than a bug in the demonstration code.

Arguably the trace.o really should have been linked to the mod.so instead of 'main' ELF binary, since it was always broken if using the _ENABLED() macros.

The reason I raise this bug though, is that QEMU uses dlopen'd modules and always links all probes to its main binary not the modules. This is because we've not isolated probes related to modules from probes related to the main binary - they're all in the same .dtrace input file.


If the only practical answer is that QEMU needs fixing, we'll accept that, but if you think systemtap should support this scenario explicitly that'd be great.


Version-Release number of selected component (if applicable):
systemtap-sdt-devel-4.4-0.20200805git82b8e1a07.fc33.x86_64

Comment 1 Daniel Berrangé 2020-08-18 12:32:22 UTC
For reference, our original QEMU bug report was https://bugzilla.redhat.com/show_bug.cgi?id=1869216

Comment 2 William Cohen 2020-08-18 18:40:54 UTC
Each semaphore variable definition has a hidden attribute.  According to https://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Function-Attributes.html  that allows different objects in the same shared object to be linked together, but not be visible outside the shared object.  The way that the reproducer (and Qemu) is linking the trace.o outside the shared object looks like it violates that.  The probe infrastructure is trying to keep those things hidden, so don't want to make them globally visible.

I verified that moving the trace.o linking to the shared mod.so generation the reproducer works.  However, I am not sure if everything is going to work if the same trace.o is linked into multiple shared libraries.

Comment 3 Frank Ch. Eigler 2020-11-16 23:51:12 UTC
Can you please confirm that the 4.4-1 build in -testing or -stable corrects this for you?

Comment 4 Daniel Berrangé 2020-11-17 09:40:57 UTC
I don't see any difference in behaviour from the demo program above with 4.4-1

Comment 5 William Cohen 2020-11-17 17:11:33 UTC
I just tried to reproducer on x86_64 fc33 with systemtap-4.4-1 rpms installed.  The one from koji built Mon 09 Nov 2020 10:29:24 PM EST.  It ran without the "dlopen(mod.so): ./mod.so: undefined symbol: main_mod_event_semaphore" error:

$ ./main 
Hello world
In mod
Out mod
Goodbye

Comparing the old copies of mod.so which had the problem and the mod.so generated by the current systemtap-4.4-1 func the current (working one) doesn't have a reference to main_mod_event_semaphore reference in func.  Does the mod.so generated on your machine still have that reference in the "objdump -dl mod.so"?  Could you provide more details about the environment?

-Will

Comment 6 Daniel Berrangé 2020-11-17 17:36:11 UTC
Yeah, the objdump output still shows the semaphore reference.

What was the commit in systemtap upstream that's supposed to fix it ? I dont't see any updates in systemtap.git upstream for include/sys/sdt.h since the original commit that introduced the problem (82b8e1a07a31bf37ed05d6ebc5162b054c0be9fd)  and the generated trace.h seems unchanged too.

I'm on a F33 host with following present

binutils-2.35-14.fc33.x86_64
gcc-10.2.1-6.fc33.x86_64
systemtap-4.4-1.fc33.x86_64
systemtap-client-4.4-1.fc33.x86_64
systemtap-devel-4.4-1.fc33.x86_64
systemtap-runtime-4.4-1.fc33.x86_64
systemtap-sdt-devel-4.4-1.fc33.x86_64

Comment 7 William Cohen 2020-11-17 19:29:32 UTC
Okay, found that the reason things were suceeding on the one f33 machine was due to sdt.h and sdt-config.h in /usr/local/include/sys/ being used.

Comment 8 William Cohen 2020-11-17 19:48:47 UTC
As mentioned in comment #2 QEMU is violating the visibility rules by linking in the .o with the actual semaphore variables outside of the shared library.  This isn't fixable in systemtap.  QEMU needs to change how it is doing the linking so that the semaphore variables are in the shared libraries.


Note You need to log in before you can comment on or make changes to this bug.