Bug 2230079 - systemtap scripts fail to compile with conflicting types error on kallsyms_on_each_symbol()
Keywords:
Status: MODIFIED
Alias: None
Product: Fedora
Classification: Fedora
Component: systemtap
Version: 38
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Frank Ch. Eigler
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-08-08 16:24 UTC by Chung
Modified: 2023-08-17 15:51 UTC (History)

Fixed In Version: systemtap-4.9-2.fc38
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: ---
Embargoed:


Attachments
kernel oops triggered by systemtap-4.9-2.fc38.x86_64 (7.69 KB, text/plain)
2023-08-11 04:04 UTC, David Gibson

Description Chung 2023-08-08 16:24:03 UTC
Systemtap failed to run with error:

/usr/share/systemtap/runtime/sym.c:1159:5: error: conflicting types for ‘kallsyms_on_each_symbol’; have ‘int(int (*)(void *, const char *, struct module *, long unsigned int), void *)’
 1159 | int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *,
      |     ^~~~~~~~~~~~~~~~~~~~~~~


Reproducible: Always

Steps to Reproduce:
1. Get an F38 system.

2.  Create a file with the following content:

cat > simple-test.stp
# Script to test basic System Tap functionality.

global tickCounter = 0;
global vmallocCounter = 0;

function sayHello()
%{
    printk("systemtap script says hello\n");
%}

probe begin
{
    sayHello();
    printf("hello\n");
}

probe timer.ms(100)
{
    tickCounter++;
}

function sayGoodbye()
%{
    printk("systemtap script says goodbye\n");
%}

// Force use of some basic debug info.
probe kernel.function("vmalloc")
{
    vmallocCounter++;
}

probe end
{
    sayGoodbye();
    printf("counter = %d\nvmalloc = %d\nbye!\n", tickCounter, vmallocCounter);
}


3. Run the following command:

# stap -v -F -o out.log -g --disable-cache  ./simple-test.stp

Actual Results:  
Pass 1: parsed user script and 486 library scripts using 135860virt/107056res/15488shr/90928data kb, in 170usr/40sys/444real ms.
Pass 2: analyzed script: 4 probes, 2 functions, 0 embeds, 2 globals using 299308virt/201360res/20784shr/188804data kb, in 6030usr/1270sys/160938real ms.
Pass 3: translated to C into "/tmp/stapA4qqSi/stap_723_src.c" using 299328virt/201552res/20976shr/188824data kb, in 120usr/60sys/191real ms.
In file included from /usr/share/systemtap/runtime/linux/runtime.h:288,
                 from /usr/share/systemtap/runtime/runtime.h:26,
                 from /tmp/stapA4qqSi/stap_723_src.c:21:
/usr/share/systemtap/runtime/sym.c:1159:5: error: conflicting types for ‘kallsyms_on_each_symbol’; have ‘int(int (*)(void *, const char *, struct module *, long unsigned int), void *)’
 1159 | int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *,
      |     ^~~~~~~~~~~~~~~~~~~~~~~
In file included from ./include/linux/ftrace.h:13,
                 from ./include/linux/kprobes.h:28,
                 from /usr/share/systemtap/runtime/linux/runtime.h:21:
./include/linux/kallsyms.h:70:5: note: previous declaration of ‘kallsyms_on_each_symbol’ with type ‘int(int (*)(void *, const char *, long unsigned int), void *)’
   70 | int kallsyms_on_each_symbol(int (*fn)(void *, const char *, unsigned long),
      |     ^~~~~~~~~~~~~~~~~~~~~~~
/usr/share/systemtap/runtime/sym.c: In function ‘kallsyms_on_each_symbol’:
/usr/share/systemtap/runtime/sym.c:1166:85: error: passing argument 1 of ‘(int (*)(int (*)(void *, const char *, long unsigned int), void *))_stp_kallsyms_on_each_symbol’ from incompatible pointer type [-Werror=incompatible-pointer-types]
 1166 |                 return (* (kallsyms_on_each_symbol_fn)_stp_kallsyms_on_each_symbol)(fn, data);
      |                                                                                     ^~
      |                                                                                     |
      |                                                                                     int (*)(void *, const char *, struct module *, long unsigned int)
/usr/share/systemtap/runtime/sym.c:1166:85: note: expected ‘int (*)(void *, const char *, long unsigned int)’ but argument is of type ‘int (*)(void *, const char *, struct module *, long unsigned int)’
In file included from ./include/linux/kernel.h:30,
                 from ./arch/x86/include/asm/percpu.h:27,
                 from ./arch/x86/include/asm/preempt.h:6,
                 from ./include/linux/preempt.h:78,
                 from ./include/linux/spinlock.h:56,
                 from ./include/linux/mmzone.h:8,
                 from ./include/linux/gfp.h:7,
                 from /usr/share/systemtap/runtime/linux/runtime_defines.h:20,
                 from /usr/share/systemtap/runtime/runtime_defines.h:8,
                 from /tmp/stapA4qqSi/stap_723_src.c:12:
/usr/share/systemtap/runtime/linux/print.c: In function ‘_stp_print_kernel_info’:
/usr/share/systemtap/runtime/linux/print.c:365:43: error: ‘struct module’ has no member named ‘module_core’
  365 |                (unsigned long) THIS_MODULE->module_core,
      |                                           ^~
./include/linux/printk.h:427:33: note: in definition of macro ‘printk_index_wrap’
  427 |                 _p_func(_fmt, ##__VA_ARGS__);                           \
      |                                 ^~~~~~~~~~~
/usr/share/systemtap/runtime/linux/print.c:348:9: note: in expansion of macro ‘printk’
  348 |         printk(KERN_DEBUG
      |         ^~~~~~
/usr/share/systemtap/runtime/linux/print.c:366:44: error: ‘struct module’ has no member named ‘core_size’
  366 |                (unsigned long) (THIS_MODULE->core_size - THIS_MODULE->core_text_size)/1024,
      |                                            ^~
./include/linux/printk.h:427:33: note: in definition of macro ‘printk_index_wrap’
  427 |                 _p_func(_fmt, ##__VA_ARGS__);                           \
      |                                 ^~~~~~~~~~~
/usr/share/systemtap/runtime/linux/print.c:348:9: note: in expansion of macro ‘printk’
  348 |         printk(KERN_DEBUG
      |         ^~~~~~
/usr/share/systemtap/runtime/linux/print.c:366:71: error: ‘struct module’ has no member named ‘core_text_size’; did you mean ‘kprobes_text_size’?
  366 |                (unsigned long) (THIS_MODULE->core_size - THIS_MODULE->core_text_size)/1024,
      |                                                                       ^~~~~~~~~~~~~~
./include/linux/printk.h:427:33: note: in definition of macro ‘printk_index_wrap’
  427 |                 _p_func(_fmt, ##__VA_ARGS__);                           \
      |                                 ^~~~~~~~~~~
/usr/share/systemtap/runtime/linux/print.c:348:9: note: in expansion of macro ‘printk’
  348 |         printk(KERN_DEBUG
      |         ^~~~~~
/usr/share/systemtap/runtime/linux/print.c:367:46: error: ‘struct module’ has no member named ‘core_text_size’; did you mean ‘kprobes_text_size’?
  367 |                (unsigned long) (THIS_MODULE->core_text_size)/1024,
      |                                              ^~~~~~~~~~~~~~
./include/linux/printk.h:427:33: note: in definition of macro ‘printk_index_wrap’
  427 |                 _p_func(_fmt, ##__VA_ARGS__);                           \
      |                                 ^~~~~~~~~~~
/usr/share/systemtap/runtime/linux/print.c:348:9: note: in expansion of macro ‘printk’
  348 |         printk(KERN_DEBUG
      |         ^~~~~~
cc1: all warnings being treated as errors
make[1]: *** [scripts/Makefile.build:252: /tmp/stapA4qqSi/stap_723_src.o] Error 1


Expected Results:  
No error.
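
Reading the two gcc diagnostics side by side shows where the prototypes disagree: the definition in systemtap's runtime/sym.c passes a callback that takes a struct module * argument, while the declaration in the kernel's include/linux/kallsyms.h does not. The following is only a minimal sketch that extracts the callback parameter lists from the two type strings quoted in the log above; the helper names are made up for illustration and are not part of systemtap or gcc.

```python
import re

# Type strings copied from the compiler output above.
SYSTEMTAP_DECL = ("int(int (*)(void *, const char *, struct module *, "
                  "long unsigned int), void *)")
KERNEL_DECL = "int(int (*)(void *, const char *, long unsigned int), void *)"


def callback_params(decl):
    """Return the parameter types of the function-pointer argument in decl."""
    match = re.search(r"\(\*\)\(([^)]*)\)", decl)
    if match is None:
        raise ValueError("no function-pointer parameter found")
    return [param.strip() for param in match.group(1).split(",")]


def extra_params(ours, theirs):
    """Return the parameter types present in ours but missing from theirs."""
    return [param for param in ours if param not in theirs]


if __name__ == "__main__":
    ours = callback_params(SYSTEMTAP_DECL)
    theirs = callback_params(KERNEL_DECL)
    print("systemtap callback:", ours)
    print("kernel callback:   ", theirs)
    print("only in systemtap: ", extra_params(ours, theirs))
```

The only difference reported is the struct module * parameter, which is consistent with the second error about the incompatible pointer type at sym.c:1166.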

Comment 1 Frank Ch. Eigler 2023-08-08 18:24:04 UTC
Please share your version of stap, and run dnf update if you haven't already.

Comment 2 William Cohen 2023-08-08 19:06:34 UTC
I can reproduce this with systemtap-4.9-1.fc38.x86_64 and kernel-6.4.7-200.fc38.x86_64.  It looks like some of the fixes for the 6.4 kernels need to be backported to the Fedora 38 (and 37) systemtap packages.  Looking through the git commits, the following should be needed:

33fae2d0107fb6166b4eac3fdffd277829849ab0
5251b3060790faafa9f94c14801baaa76a2bf8ea
fc6519089d3f9366470ce442b648d69ed9b56f53
56054abb4efb3ef95808306b2f22339ab5c96352

The following might also be needed if the compiler complains about "zero length arrays":
788c58ced532537b87f596355d3e9b6dec30e61a

As a quick test, I did a local rpm build of the current systemtap git checkout (systemtap-5.0-0.1.202308081455.fc38.x86_64), installed it, and ran the reproducer.  It ran without issue:

[wcohen@fedora38 bz2230079]$ rpm -q systemtap kernel
systemtap-5.0-0.1.202308081455.fc38.x86_64
kernel-6.3.11-200.fc38.x86_64
kernel-6.4.7-200.fc38.x86_64
[wcohen@fedora38 bz2230079]$ sudo stap -v -F -o out.log -g --disable-cache  ./simple-test.stp
Pass 1: parsed user script and 535 library scripts using 532936virt/289256res/14848shr/273804data kb, in 610usr/80sys/692real ms.
Pass 2: analyzed script: 4 probes, 2 functions, 0 embeds, 2 globals using 632092virt/389468res/16476shr/372960data kb, in 1290usr/30sys/1340real ms.
Pass 3: translated to C into "/tmp/stapFxAjgw/stap_40645_src.c" using 632092virt/389596res/16604shr/372960data kb, in 10usr/10sys/12real ms.
Pass 4: compiled C into "stap_40645.ko" in 18950usr/2470sys/11794real ms.
Pass 5: starting run.
Pass 5: run completed in 10usr/20sys/29real ms.
41504

Comment 3 Chung 2023-08-08 19:19:38 UTC
The systemtap version is 

systemtap.x86_64                        4.9-1.fc38 

I made sure the kernel and debuginfo versions are the same: 6.4.7.

chuung

Comment 4 David Gibson 2023-08-10 03:28:25 UTC
I have encountered the same problem with:
    systemtap-4.9-1.fc38.x86_64
    kernel-6.4.8-200.fc38.x86_64
    kernel-devel-6.4.8-200.fc38.x86_64
    kernel-debuginfo-6.4.8-200.fc38.x86_64

I think it will affect all scripts, since I see it even with this trivial one:

probe begin {
  printf("BEGIN\n");
}

probe end {
  printf("END\n");
}


Raising severity to "high", since AFAICT this makes systemtap unusable on the current kernel.
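
Since the reports above rely on kernel, kernel-devel and kernel-debuginfo being the same build, here is a small sketch of how that could be checked from rpm -q style output. The package strings are the ones listed in this comment; the function names are illustrative only, and the parsing assumes the usual name-version-release.arch layout.

```python
def split_nvra(package):
    """Split an rpm 'name-version-release.arch' string into its four parts."""
    nvr, arch = package.rsplit(".", 1)
    name, version, release = nvr.rsplit("-", 2)
    return name, version, release, arch


def common_kernel_builds(packages,
                         names=("kernel", "kernel-devel", "kernel-debuginfo")):
    """Return the version-release strings installed for every package in names."""
    builds = {name: set() for name in names}
    for package in packages:
        name, version, release, _arch = split_nvra(package)
        if name in builds:
            builds[name].add(f"{version}-{release}")
    return set.intersection(*builds.values())


PACKAGES = [
    "systemtap-4.9-1.fc38.x86_64",
    "kernel-6.4.8-200.fc38.x86_64",
    "kernel-devel-6.4.8-200.fc38.x86_64",
    "kernel-debuginfo-6.4.8-200.fc38.x86_64",
]

if __name__ == "__main__":
    common = common_kernel_builds(PACKAGES)
    print("builds with kernel, devel and debuginfo all installed:", sorted(common))
```

An empty result would mean that no kernel build has all three packages installed, which is a different failure from the compile error reported here.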

Comment 5 William Cohen 2023-08-10 15:17:51 UTC
I am working on getting f37/f38 systemtap rpms with patches mentioned for linux-6.4 support.

Comment 6 William Cohen 2023-08-10 17:05:04 UTC
Bodhi updates filed for systemtap builds with the fixes:
f38 https://bodhi.fedoraproject.org/updates/FEDORA-2023-4617bf01d3
f37 https://bodhi.fedoraproject.org/updates/FEDORA-2023-1fb197c4ff

Comment 7 David Gibson 2023-08-11 04:03:28 UTC
Alas, the version from comment 6 does not appear to work correctly either.  Indeed, it's arguably worse, with problems now showing up on the kernel side.

I no longer get a compile error, but running 'sudo stap trivial.stp' hangs before printing the "BEGIN" message.  Weirdly, strace indicates that stap is in wait4() waiting on a specific PID, and ps shows that process (stapio) is already in a zombie state.

It also triggers a kernel oops; I'll attach the trace.

This occurs with:
    systemtap-runtime-4.9-2.fc38.x86_64
    systemtap-client-4.9-2.fc38.x86_64
    systemtap-devel-4.9-2.fc38.x86_64
    systemtap-4.9-2.fc38.x86_64
    kernel-modules-core-6.4.9-200.fc38.x86_64
    kernel-core-6.4.9-200.fc38.x86_64
    kernel-modules-6.4.9-200.fc38.x86_64
    kernel-debuginfo-common-x86_64-6.4.9-200.fc38.x86_64
    kernel-debuginfo-6.4.9-200.fc38.x86_64
    kernel-6.4.9-200.fc38.x86_64
    kernel-modules-extra-6.4.9-200.fc38.x86_64
    kernel-devel-6.4.9-200.fc38.x86_64

$ uname -a
Linux zatzit 6.4.9-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Aug  8 21:21:11 UTC 2023 x86_64 GNU/Linux

Comment 8 David Gibson 2023-08-11 04:04:53 UTC
Created attachment 1982902
kernel oops triggered by systemtap-4.9-2.fc38.x86_64

Oops log as described.

Comment 9 William Cohen 2023-08-11 15:01:21 UTC
@dgibson 

The kernel oops indicates this is related to Intel CET support being enabled:

[  684.201414] traps: Missing ENDBR: kallsyms_lookup_name+0x0/0xd0


[  684.201461] RIP: 0010:kallsyms_lookup_name+0x0/0xd0
[  684.201463] Code: 79 0a 48 f7 d0 48 03 05 56 41 5b 01 c3 cc cc cc cc 66 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <66> 0f 1f 00 0f 1f 44 00 00 53 48 83 ec 10 65 48 8b 04 25 28 00 00
[  684.201464] RSP: 0018:ffffa1d0853a7db8 EFLAGS: 00010282
[  684.201465] RAX: ffffffff9e206980 RBX: 00007ffc71cedf54 RCX: 0000000000000000
[  684.201465] RDX: 0000000080000000 RSI: ffff902a4040fb50 RDI: ffffffffc1b4c3b5
[  684.201466] RBP: 0000000000000008 R08: ffff902a4040fb78 R09: ffffffffa005d6a0
[  684.201467] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  684.201467] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  684.201468]  ? __pfx_kallsyms_lookup_name+0x10/0x10
[  684.201472]  _stp_ctl_write_cmd+0x46b/0xbe0 [stap_6f8266ac9ff80bbafe256ed5ed9b11a_2890]
[  684.201478]  ? inode_security+0x22/0x60

Would it be possible for you to disable Intel CET on the machine and verify that?


Looking at the disassembled /usr/lib/debug/lib/modules/6.4.9-200.fc38.x86_64/vmlinux, there appears to be an endbr64 instruction there, so I am not sure why it would trip on that:

ffffffff811dda40 <module_kallsyms_lookup_name>:
ffffffff811dda40:	f3 0f 1e fa          	endbr64
ffffffff811dda44:	e8 c7 6a ea ff       	call   ffffffff81084510 <__fentry__>
ffffffff811dda49:	55                   	push   %rbp
ffffffff811dda4a:	48 89 fd             	mov    %rdi,%rbp
ffffffff811dda4d:	53                   	push   %rbx

Comment 10 William Cohen 2023-08-11 15:09:53 UTC
Ah, it looks like systemtap is using __pfx_kallsyms_lookup_name rather than kallsyms_lookup_name.  From /proc/kallsyms:

ffffffffa7206970 T __pfx_kallsyms_lookup_name
ffffffffa7206980 T kallsyms_lookup_name

You can see that it is calling __pfx_kallsyms_lookup_name rather than kallsyms_lookup_name:

ffffffff81206970 <__pfx_kallsyms_lookup_name>:
ffffffff81206970:	90                   	nop
ffffffff81206971:	90                   	nop
ffffffff81206972:	90                   	nop
ffffffff81206973:	90                   	nop
ffffffff81206974:	90                   	nop
ffffffff81206975:	90                   	nop
ffffffff81206976:	90                   	nop
ffffffff81206977:	90                   	nop
ffffffff81206978:	90                   	nop
ffffffff81206979:	90                   	nop
ffffffff8120697a:	90                   	nop
ffffffff8120697b:	90                   	nop
ffffffff8120697c:	90                   	nop
ffffffff8120697d:	90                   	nop
ffffffff8120697e:	90                   	nop
ffffffff8120697f:	90                   	nop

ffffffff81206980 <kallsyms_lookup_name>:
ffffffff81206980:	f3 0f 1e fa          	endbr64
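
The relationship between the two symbols can be checked numerically. The sketch below parses /proc/kallsyms style lines and computes the distance between a function and its __pfx_ companion, using the two lines quoted above as sample input; the helper names are illustrative. On a live system the text would come from /proc/kallsyms, where addresses typically read as zero unless the reader is root.

```python
def parse_kallsyms(text):
    """Map symbol name to address for 'address type name' formatted lines."""
    symbols = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        address, _kind, name = fields[0], fields[1], fields[2]
        symbols.setdefault(name, int(address, 16))
    return symbols


def prefix_gap(symbols, name):
    """Return how many bytes the __pfx_ symbol sits before the function itself."""
    return symbols[name] - symbols["__pfx_" + name]


# The two lines quoted from /proc/kallsyms in this comment.
SAMPLE = """\
ffffffffa7206970 T __pfx_kallsyms_lookup_name
ffffffffa7206980 T kallsyms_lookup_name
"""

symbols = parse_kallsyms(SAMPLE)
gap = prefix_gap(symbols, "kallsyms_lookup_name")

if __name__ == "__main__":
    print(f"__pfx_ symbol precedes the function by {gap} bytes")
```

For the sample lines the gap is 16 bytes, matching the 16 nop instructions in the disassembly above, so a lookup that resolves to the __pfx_ symbol lands in the padding rather than on the endbr64 at the function entry.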

Comment 11 William Cohen 2023-08-11 20:38:46 UTC
I have a machine with CET-IBT support, and I have verified that turning off X86_FEATURE_IBT by adding the following to the kernel boot parameters allows the systemtap instrumentation to run correctly:

 clearcpuid=596

That doesn't address the basic problem that systemtap is using __pfx_kallsyms_lookup_name rather than kallsyms_lookup_name, but it does allow one to use systemtap on Intel systems with CET-IBT support (those that show "CET detected: Indirect Branch Tracking enabled" in the boot-time dmesg output and ibt in the /proc/cpuinfo flags).
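
For reference, the number given to clearcpuid encodes a CPU feature as word*32+bit, so 596 can be decoded with simple arithmetic. The sketch below does that and also looks for the ibt flag in /proc/cpuinfo style text. The mapping of word 18, bit 20 to X86_FEATURE_IBT is my reading of arch/x86/include/asm/cpufeatures.h for kernels of this era and should be treated as an assumption; the sample cpuinfo line is abbreviated and hypothetical.

```python
def decode_feature(number):
    """Split a clearcpuid feature number into (capability word, bit)."""
    return divmod(number, 32)


def has_flag(cpuinfo_text, flag):
    """Return True if flag appears on the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return flag in value.split()
    return False


# Abbreviated, hypothetical excerpt; real input would be read from /proc/cpuinfo.
SAMPLE_CPUINFO = "model name\t: example CPU\nflags\t\t: fpu vme de pse ibt\n"

if __name__ == "__main__":
    word, bit = decode_feature(596)
    print(f"clearcpuid=596 clears word {word}, bit {bit}")
    print("ibt advertised:", has_flag(SAMPLE_CPUINFO, "ibt"))
```

Here 596 = 18*32 + 20, so the parameter clears bit 20 of capability word 18.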

Comment 12 David Gibson 2023-08-15 04:16:01 UTC
I can confirm systemtap seems to work again (at least with the trivial script) when I add 'clearcpuid=596' to the kernel command line.

I also updated to kernel-6.4.10-200.fc38.x86_64 and verified that it still fails with that kernel but without the change to the command line.

Comment 13 William Cohen 2023-08-17 15:51:14 UTC
The issue with the Intel IBT support is not related to this particular bug, which is about compiling code for the Linux 6.4 kernels.  An upstream systemtap bug has been filed about systemtap not working with Intel IBT support: https://sourceware.org/bugzilla/show_bug.cgi?id=30777

