Bug 2238787 - stap fails compilation
Summary: stap fails compilation
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: systemtap
Version: rawhide
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: William Cohen
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 2239421
Reported: 2023-09-13 15:48 UTC by Vít Ondruch
Modified: 2023-09-27 20:51 UTC

Fixed In Version: systemtap-5.0~pre16958465gca71442b-1.fc40
Clone Of:
: 2239421 (view as bug list)
Environment:
Last Closed: 2023-09-27 20:51:22 UTC
Type: ---
Embargoed:



Description Vít Ondruch 2023-09-13 15:48:09 UTC
Testing SystemTap on Rawhide ppc64le, the kernel module compilation fails:

~~~
# stap -v /usr/share/doc/ruby-doc/ruby-exercise.stp
Pass 1: parsed user script and 507 library scripts using 150208virt/102400res/22528shr/79488data kb, in 500usr/0sys/671real ms.
Pass 2: analyzed script: 17 probes, 7 functions, 1 embed, 0 globals using 152512virt/110592res/26624shr/81792data kb, in 30usr/0sys/61real ms.
Pass 3: translated to C into "/tmp/stapXv6Nnh/stap_974f5ad397b8d50f228b589575aa24c6_11778_src.c" using 161920virt/110592res/26624shr/91200data kb, in 60usr/280sys/1008real ms.
In file included from /usr/share/systemtap/runtime/stp_utrace.c:28,
                 from /usr/share/systemtap/runtime/linux/task_finder2.c:4,
                 from /usr/share/systemtap/runtime/linux/task_finder.c:17,
                 from /usr/share/systemtap/runtime/linux/runtime.h:284,
                 from /usr/share/systemtap/runtime/runtime.h:26,
                 from /tmp/stapXv6Nnh/stap_974f5ad397b8d50f228b589575aa24c6_11778_src.c:21:
/usr/share/systemtap/runtime/stp_task_work.c: In function ‘stp_task_work_add’:
/usr/share/systemtap/runtime/stp_task_work.c:92:41: error: implicit conversion from ‘enum <anonymous>’ to ‘enum task_work_notify_mode’ [-Werror=enum-conversion]
   92 |         rc = task_work_add(task, twork, true);
      |                                         ^~~~
/usr/share/systemtap/runtime/linux/task_finder2.c: In function ‘stap_start_task_finder’:
/usr/share/systemtap/runtime/linux/task_finder2.c:1721:9: error: implicit declaration of function ‘do_each_thread’; did you mean ‘for_each_thread’? [-Werror=implicit-function-declaration]
 1721 |         do_each_thread(grp, tsk) {
      |         ^~~~~~~~~~~~~~
      |         for_each_thread
/usr/share/systemtap/runtime/linux/task_finder2.c:1721:33: error: expected ‘;’ before ‘{’ token
 1721 |         do_each_thread(grp, tsk) {
      |                                 ^~
      |                                 ;
/usr/share/systemtap/runtime/linux/task_finder2.c: In function ‘stap_task_finder_post_init’:
/usr/share/systemtap/runtime/linux/task_finder2.c:1868:33: error: expected ‘;’ before ‘{’ token
 1868 |         do_each_thread(grp, tsk) {
      |                                 ^~
      |                                 ;
In file included from /usr/share/systemtap/runtime/namespaces.h:17,
                 from /tmp/stapXv6Nnh/stap_974f5ad397b8d50f228b589575aa24c6_11778_src.c:728:
/usr/share/systemtap/runtime/linux/namespaces.h: In function ‘_stp_task_struct_valid’:
/usr/share/systemtap/runtime/linux/namespaces.h:107:27: error: expected ‘;’ before ‘{’ token
  107 |   do_each_thread(grp, tsk) {
      |                           ^~
      |                           ;
cc1: all warnings being treated as errors
make[1]: *** [scripts/Makefile.build:243: /tmp/stapXv6Nnh/stap_974f5ad397b8d50f228b589575aa24c6_11778_src.o] Error 1
make: *** [Makefile:1931: /tmp/stapXv6Nnh] Error 2
WARNING: kbuild exited with status: 2
Pass 4: compiled C into "stap_974f5ad397b8d50f228b589575aa24c6_11778.ko" in 105080usr/4950sys/22790real ms.
Pass 4: compilation failed.  [man error::pass4]
~~~
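
The failures in the log above fall into a few classes, which is easier to see once the diagnostics are grouped by the `-Werror` flag that triggered them. The following is a minimal sketch of such a grouping; it is not part of SystemTap, the helper name `summarize_errors` is hypothetical, and the regular expression only assumes the `path:line:col: error: message [-Werror=flag]` layout visible in this report.

```python
import re
from collections import Counter

# Matches GCC diagnostics shaped like the ones in this report:
#   path:line:col: error: message [-Werror=flag]
# The trailing [-Werror=...] tag is optional, since plain syntax errors have none.
DIAG_RE = re.compile(
    r"^(?P<path>[^\s:]+):(?P<line>\d+):(?P<col>\d+): error: "
    r"(?P<msg>.*?)(?: \[-Werror=(?P<flag>[\w-]+)\])?$"
)


def summarize_errors(log: str) -> Counter:
    """Count error diagnostics per -Werror flag ("syntax" when no flag is given)."""
    counts = Counter()
    for raw in log.splitlines():
        match = DIAG_RE.match(raw.strip())
        if match:
            counts[match.group("flag") or "syntax"] += 1
    return counts


# Three diagnostics copied from the Pass 4 output above.
sample = """\
/usr/share/systemtap/runtime/stp_task_work.c:92:41: error: implicit conversion from ‘enum <anonymous>’ to ‘enum task_work_notify_mode’ [-Werror=enum-conversion]
/usr/share/systemtap/runtime/linux/task_finder2.c:1721:9: error: implicit declaration of function ‘do_each_thread’; did you mean ‘for_each_thread’? [-Werror=implicit-function-declaration]
/usr/share/systemtap/runtime/linux/task_finder2.c:1721:33: error: expected ‘;’ before ‘{’ token
"""

print(summarize_errors(sample))
```

Grouped this way, the "expected ‘;’" errors read as follow-ups of the missing `do_each_thread` macro rather than as independent problems.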

Reproducible: Always

Actual Results:  
kernel module compilation fails

Expected Results:  
kernel module compilation succeeds and stap works

~~~
# rpm -qf `which stap`
systemtap-devel-5.0~pre16891249ge891a37e-0.2.fc39.ppc64le
systemtap-client-5.0~pre16891249ge891a37e-0.2.fc39.ppc64le

# rpm -q ruby-doc
ruby-doc-3.3.0~20230905git7c8932365f-182.fc40.noarch

# rpm -qa kernel*
kernel-modules-core-6.6.0-0.rc1.14.fc40.ppc64le
kernel-core-6.6.0-0.rc1.14.fc40.ppc64le
kernel-modules-6.6.0-0.rc1.14.fc40.ppc64le
kernel-6.6.0-0.rc1.14.fc40.ppc64le
kernel-headers-6.6.0-0.rc1.git0.1.fc40.ppc64le
kernel-devel-6.6.0-0.rc1.14.fc40.ppc64le
kernel-debuginfo-common-ppc64le-6.6.0-0.rc1.14.fc40.ppc64le
kernel-debuginfo-6.6.0-0.rc1.14.fc40.ppc64le
~~~

The ruby-doc comes from this scratch build:

https://koji.fedoraproject.org/koji/taskinfo?taskID=106136976

But I don't think it really matters. The build in Rawhide does not differ.

Comment 1 Vít Ondruch 2023-09-13 16:13:49 UTC
I tried some downgrades, but no luck :(

~~~
# rpm -q systemtap* -a
systemtap-devel-4.9-3.fc39.ppc64le
systemtap-runtime-4.9-3.fc39.ppc64le
systemtap-client-4.9-3.fc39.ppc64le
systemtap-4.9-3.fc39.ppc64le

# rpm -q kernel* -a
kernel-modules-core-6.6.0-0.rc1.14.fc40.ppc64le
kernel-core-6.6.0-0.rc1.14.fc40.ppc64le
kernel-modules-6.6.0-0.rc1.14.fc40.ppc64le
kernel-6.6.0-0.rc1.14.fc40.ppc64le
kernel-devel-6.6.0-0.rc1.14.fc40.ppc64le
kernel-debuginfo-common-ppc64le-6.6.0-0.rc1.14.fc40.ppc64le
kernel-debuginfo-6.6.0-0.rc1.14.fc40.ppc64le
kernel-modules-core-6.5.3-300.fc39.ppc64le
kernel-core-6.5.3-300.fc39.ppc64le
kernel-modules-6.5.3-300.fc39.ppc64le
kernel-debuginfo-common-ppc64le-6.5.3-300.fc39.ppc64le
kernel-debuginfo-6.5.3-300.fc39.ppc64le
kernel-6.5.3-300.fc39.ppc64le
kernel-devel-6.5.3-300.fc39.ppc64le
kernel-headers-6.5.0-1.fc39.ppc64le
~~~

Comment 2 William Cohen 2023-09-13 16:44:04 UTC
The upstream git repo version of Systemtap works with the 6.5 kernels, but as seen in this bug report it does not work with Linux 6.6.  There is an upstream bug tracking this problem, https://sourceware.org/bugzilla/show_bug.cgi?id=30831, and we are working to address the problem.

Comment 3 Vít Ondruch 2023-09-14 09:10:07 UTC
IOW, I should downgrade even further, right? But even backporting the 6.5 kernel would be helpful. Of course, I have no idea how feasible that is ...

Comment 4 Vít Ondruch 2023-09-14 09:58:52 UTC
Hm, I am already at 6.4.4 kernel and still no luck :/

However, it seems that updating back to systemtap-5.0~pre16891249ge891a37e-0.2.fc39.ppc64le makes the compilation succeed \o/

Comment 5 William Cohen 2023-09-14 13:34:11 UTC
Yes, on my Fedora rawhide x86_64 system I have the following verified as working:
kernel-6.5.0-57.fc40.x86_64
systemtap-5.0~pre16891249ge891a37e-0.2.fc39.x86_64

However, when the kernel-6.6.0-0.rc1.14.fc40.x86_64 is used I get the same warning/error messages as the initial bugzilla report.

Systemtap should work with the 6.4.4 kernel. What was the particular setup that caused problems with the 6.4.4?

Comment 6 Vít Ondruch 2023-09-15 09:42:37 UTC
(In reply to William Cohen from comment #5)
> Systemtap should work with the 6.4.4 kernel. What was the particular setup
> that caused problems with the 6.4.4?

The problem was that, as one of the first steps in trying to get it working, I downgraded to systemtap-4.9 (see comment #1), and I stayed with that version while downgrading kernels. Eventually I realized that I should probably go back to systemtap-5.0~pre16891249ge891a37e, which might support more recent kernels; presumably that is why the pre-release exists. Now I am running systemtap-5.0 together with kernel 6.4.15. But as you said, kernel 6.5 would probably also work.

Thinking about my experience once again, would there be some chance to improve SystemTap somehow? The issue was that I tried SystemTap and it failed. Not being an expert, and seeing a SystemTap pre-release installed, my first idea was to downgrade SystemTap. But if I had downgraded the kernel instead, I would have had a working setup much sooner. Therefore I wonder: is there any chance that SystemTap could recommend downgrading the kernel when there is a build failure? Or maybe just report something such as "Tested up to Kernel 6.5"? Or maybe even have some versioned Requires/Provides in the package, which would prevent me from installing SystemTap with an incompatible kernel?

Comment 7 Frank Ch. Eigler 2023-09-15 12:48:05 UTC
When there is a build failure, stap reports 

#  Pass 4: compilation failed.  [man error::pass4]

If you run "man error::pass4", it'll explain what likely happened.
If you run "stap -V" (version), it'll explain this:

Systemtap translator/driver (version 4.9/0.189, rpm 4.9-2.fc38)
Copyright (C) 2005-2023 Red Hat, Inc. and others
This is free software; see the source for copying conditions.
tested kernel versions: 2.6.32 ... 6.3.0-rc1
enabled features: AVAHI BOOST_STRING_REF DYNINST BPF JAVA PYTHON3 LIBRPM LIBSQLITE3 LIBVIRT LIBXML2 NLS NSS READLINE MONITOR_LIBS JSON_C

I suppose we could connect the former and the latter a little tighter?
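
As an illustration of what connecting the two could mean, here is a rough sketch that reads the "tested kernel versions" line from `stap -V` output and checks a kernel release against it. It is not how stap implements anything; the function names are hypothetical, and the comparison deliberately uses only the leading numeric `major.minor.patch` part, ignoring suffixes such as `-rc1` (a simplifying assumption).

```python
import re


def parse_tested_range(stap_version_output: str):
    """Return (low, high) from a 'tested kernel versions: A ... B' line."""
    match = re.search(
        r"tested kernel versions:\s*(\S+)\s*\.\.\.\s*(\S+)", stap_version_output
    )
    if match is None:
        raise ValueError("no 'tested kernel versions' line found")
    return match.group(1), match.group(2)


def numeric_prefix(version: str):
    """Return (major, minor, patch) from the leading dotted digits.

    Assumption: anything after the numeric prefix ('-rc1',
    '-0.rc1.14.fc40.ppc64le', ...) is ignored for the comparison.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return int(match.group(1)), int(match.group(2)), int(match.group(3) or 0)


def kernel_in_tested_range(kernel_release: str, stap_version_output: str) -> bool:
    """True when the kernel's numeric version lies within the tested range."""
    low, high = parse_tested_range(stap_version_output)
    return numeric_prefix(low) <= numeric_prefix(kernel_release) <= numeric_prefix(high)


# The relevant lines of the `stap -V` output quoted above.
SAMPLE = """\
Systemtap translator/driver (version 4.9/0.189, rpm 4.9-2.fc38)
tested kernel versions: 2.6.32 ... 6.3.0-rc1
"""

for release in ("6.6.0-0.rc1.14.fc40.ppc64le", "6.4.4"):
    print(release, kernel_in_tested_range(release, SAMPLE))
```

In practice the inputs would be the captured output of `stap -V` and the running kernel release (for example from `platform.release()`); a pass 4 failure outside the tested range could then point the user at the kernel rather than at SystemTap.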

Comment 8 Vít Ondruch 2023-09-15 13:07:15 UTC
It would never occur to me that kernel versions might be listed in the version output. If I need version information, `rpm -q systemtap` would be my choice.

I had "man error::pass4" open. But the suggestion "build systemtap from git for use with very young kernels" did not feel compelling. If there had been a remark about tested kernels (or even a reference to `stap -V`), I would probably have noticed.

Still, in the context of Fedora, a Requires on a specific kernel version would probably be the better choice. Looking at systemtap.spec, I am a bit surprised that there is no reference to kernel-headers. But I am probably missing something.

Comment 9 William Cohen 2023-09-15 13:31:55 UTC
When a release is made, the announcement email includes some information about which kernels the release has been tested on.  For example, the 4.9 announcement lists the kernels towards the bottom:

https://sourceware.org/pipermail/systemtap/2023q2/027641.html

However, this might not be quite as clear as it could be.

kernel-headers are not always needed to use systemtap.  If one is using the dyninst back end and probing userspace, there is no need for kernel-headers.  Also, someone could be instrumenting a locally built kernel, for which no kernel-headers package exists.

Comment 10 Vít Ondruch 2023-09-15 14:04:07 UTC
I can only offer my POV as a packager of Ruby who uses SystemTap just occasionally, to make sure the probes in Ruby work. From this point of view:

1) SystemTap release announcements are completely invisible to me.
2) I am a heavy distribution user, and I think that packages should work best primarily in the context of the distribution and provide the most convenience there; therefore a custom kernel build is not a scenario that deserves too much focus.
3) I cannot judge whether my use case is just a niche scenario. However, it seems to me that `Recommends: kernel-headers >= 2.6.32 with kernel-headers <= 6.3.0-rc1` would provide a nice default, while still keeping the flexibility of installing SystemTap without kernel-headers, if required.

But this is mostly me just thinking out loud, and it is partly OT for this ticket, so my apologies. After all, I am happy I have found a working setup ;)

Comment 11 Frank Ch. Eigler 2023-09-15 15:00:20 UTC
Making suggestions at the RPM level is an interesting idea.   On the other hand, an rpm level version designation that would prevent a kernel from being upgraded to a version that is not designated as tested with systemtap could also lead to complaints.  Not sure RPM can make it a default-warning that is still useful.

In the meantime, we just pushed https://sourceware.org/PR30858 to git master, which should improve the diagnostic for this case.

Comment 12 Vít Ondruch 2023-09-18 09:47:20 UTC
(In reply to Frank Ch. Eigler from comment #11)
> Making suggestions at the RPM level is an interesting idea.   On the other
> hand, an rpm level version designation that would prevent a kernel from
> being upgraded to a version that is not designated as tested with systemtap
> could also lead to complaints.  Not sure RPM can make it a default-warning
> that is still useful.

I have extracted this proposal into a separate bug 2239421, because it will likely need more tinkering and is out of scope for this one.

> In the meantime, we just pushed https://sourceware.org/PR30858 to git
> master, which should improve the diagnostic for this case.

I love it. Thx a lot.

Comment 13 William Cohen 2023-09-27 20:51:22 UTC
Did some work on the upstream systemtap pr30831.  The git repo version of systemtap now works with linux-6.6 kernels. A rawhide koji build of that has been made, systemtap-5.0~pre16958465gca71442b-1.fc40.  It should address this issue.
