Bug 1891509

Summary: stap's debuginfod lookup fails in some cases on aarch64 and s390x (non-ELF images)
Product: Red Hat Enterprise Linux 8
Component: systemtap
Sub Component: system-version
Version: 8.4
Hardware: Unspecified
OS: Linux
Status: CLOSED CANTFIX
Severity: medium
Priority: medium
Keywords: FutureFeature, Triaged
Target Milestone: rc
Target Release: 8.0
Reporter: Martin Cermak <mcermak>
Assignee: Frank Ch. Eigler <fche>
QA Contact: qe-baseos-tools-bugs
CC: fche, lberk, mcermak, mjw
Last Closed: 2022-03-28 10:49:18 UTC Type: Bug

Description Martin Cermak 2020-10-26 13:50:47 UTC
I found a stap/debuginfod use case that works fine on x86_64 and ppc64le, but fails on aarch64 and s390x.  Here is how it works fine on x86_64:

=======
shlvl1   8.4 Server x86_64 # uname -r
4.18.0-240.7.el8.x86_64
shlvl1   8.4 Server x86_64 # export DEBUGINFOD_URLS=http://debuginfod.usersys.redhat.com:3632/
shlvl1   8.4 Server x86_64 # export DEBUGINFOD_PROGRESS=1
shlvl1   8.4 Server x86_64 # rm -rf ~/.cache/debuginfod_client;  stap -ve 'probe kernel.function("vfs_read"){ log("hey!"); exit() }'
Pass 1: parsed user script and 479 library scripts using 239240virt/81856res/12832shr/68568data kb, in 190usr/40sys/224real ms.
Downloading from http://debuginfod.usersys.redhat.com:3632/ 820958080/820958080
Pass 2: analyzed script: 2 probes, 2 functions, 0 embeds, 0 globals using 373476virt/139236res/13992shr/133232data kb, in 2480usr/3750sys/22752real ms.
Pass 3: using cached /root/.systemtap/cache/60/stap_601b59d3ae5fc33ff330055fa6674810_1413.c
Pass 4: using cached /root/.systemtap/cache/60/stap_601b59d3ae5fc33ff330055fa6674810_1413.ko
Pass 5: starting run.
hey!
Pass 5: run completed in 10usr/30sys/379real ms.
shlvl1   8.4 Server x86_64 # 
=======

Here is how the same thing fails on aarch64:

=======
shlvl1   8.4 Server aarch64 # uname -r
4.18.0-240.7.el8.aarch64
shlvl1   8.4 Server aarch64 # export DEBUGINFOD_URLS=http://debuginfod.usersys.redhat.com:3632/
shlvl1   8.4 Server aarch64 # export DEBUGINFOD_PROGRESS=1
shlvl1   8.4 Server aarch64 # rm -rf ~/.cache/debuginfod_client;  stap -ve 'probe kernel.function("vfs_read"){ log("hey!"); exit() }'
Pass 1: parsed user script and 461 library scripts using 105344virt/80704res/13056shr/67840data kb, in 490usr/50sys/540real ms.
semantic error: resolution failed in DWARF builder

semantic error: resolution failed in DWARF builder

semantic error: while resolving probe point: identifier 'kernel' at <input>:1:7
        source: probe kernel.function("vfs_read"){ log("hey!"); exit() }
                      ^

semantic error: no match

Pass 2: analyzed script: 0 probes, 0 functions, 0 embeds, 0 globals using 107200virt/87232res/13824shr/69696data kb, in 280usr/20sys/296real ms.
Pass 2: analysis failed.  [man error::pass2]
(1) shlvl1   8.4 Server aarch64 # 
=======

Comment 1 Mark Wielaard 2020-10-29 16:53:04 UTC
I assume that on both arches the same versions of systemtap and elfutils are used?
Could you provide the rpm nvrs of each package?

Asking because there have been no updates for 8.4 yet.

So I assume the same behavior can be seen on 8.3?

Comment 2 Mark Wielaard 2020-11-02 11:56:09 UTC
Martin, see comment #1. Answers would help me set up a reproducer.

Comment 3 Martin Cermak 2020-11-09 17:44:32 UTC
Reproduced again with elfutils-0.182-2.el8 and systemtap-4.3-4.el8.

I've only indexed the kernel-`uname -r` by going to /mnt/redhat/brewroot/packages/kernel/4.18.0/240.10.el8/`arch` and from there running `debuginfod -R -vvv .`.  Then `export DEBUGINFOD_URLS=http://localhost:8002/`, `export DEBUGINFOD_PROGRESS=1` and finally:

rm -rf ~/.cache/debuginfod_client;  stap -ve 'probe kernel.function("vfs_read"){ log("hey!"); exit() }'

This works as expected on ppc64le and x86_64 (stap shows Downloading \ ... and finally succeeds in running the script).  But on aarch64 and s390x, stap doesn't show the expected 'Downloading \', and eventually fails to run the script because of "semantic error: resolution failed in DWARF builder", just as originally reported.

Comment 4 Mark Wielaard 2020-11-11 15:21:40 UTC
The issue is the vmlinuz format.
It is simplest to check with eu-readelf -h:

eu-readelf -h /boot/vmlinuz-4.18.0-240.10.el8.x86_64
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Ident Version:                     1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           AMD x86-64
  Version:                           1 (current)
  Entry point address:               0x1000000
  Start of program headers:          64 (bytes into file)
  Start of section headers:          56623512 (bytes into file)
  Flags:                             
  Size of this header:               64 (bytes)
  Size of program header entries:    56 (bytes)
  Number of program headers entries: 5
  Size of section header entries:    64 (bytes)
  Number of section headers entries: 37
  Section header string table index: 36

eu-readelf -h /boot/vmlinuz-4.18.0-240.10.el8.s390x  
eu-readelf: failed reading '/boot/vmlinuz-4.18.0-240.10.el8.s390x': not a valid ELF file

eu-readelf -h /boot/vmlinuz-4.18.0-240.10.el8.ppc64le 
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Ident Version:                     1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           PowerPC64
  Version:                           1 (current)
  Entry point address:               0xc000000000000000
  Start of program headers:          64 (bytes into file)
  Start of section headers:          28741512 (bytes into file)
  Flags:                             0x2
  Size of this header:               64 (bytes)
  Size of program header entries:    56 (bytes)
  Number of program headers entries: 3
  Size of section header entries:    64 (bytes)
  Number of section headers entries: 50
  Section header string table index: 49

eu-readelf -h /boot/vmlinuz-4.18.0-240.10.el8.aarch64 
eu-readelf: failed reading '/boot/vmlinuz-4.18.0-240.10.el8.aarch64': not a valid ELF file
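
The same check can be approximated without elfutils by comparing the first four bytes of an image against the ELF magic (7f 45 4c 46) visible in the headers above. This is only a minimal sketch, assuming a POSIX shell and od; the function name is_elf is made up for illustration and is not part of elfutils or systemtap:

```shell
#!/bin/sh
# Hypothetical helper: succeed (return 0) only if the file starts with
# the ELF magic bytes 7f 45 4c 46.
is_elf() {
    # Dump the first four bytes as hex and strip the whitespace od adds.
    magic=$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d ' \n')
    [ "$magic" = "7f454c46" ]
}

# Check the image of the running kernel, as in the eu-readelf runs above.
image="/boot/vmlinuz-$(uname -r)"
if is_elf "$image"; then
    echo "$image: starts with ELF magic"
else
    echo "$image: no ELF magic (or file not readable)"
fi
```

A missing or unreadable file is reported the same way as a non-ELF one, so this only tells you whether the image could be an ELF file, not what format it actually is.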

Comment 7 Martin Cermak 2021-05-21 16:53:18 UTC
*** Bug 1963037 has been marked as a duplicate of this bug. ***

Comment 8 Martin Cermak 2021-05-21 16:53:21 UTC
*** Bug 1962735 has been marked as a duplicate of this bug. ***

Comment 9 Martin Cermak 2021-05-21 16:54:17 UTC
*** Bug 1963042 has been marked as a duplicate of this bug. ***

Comment 10 Frank Ch. Eigler 2021-05-23 18:27:46 UTC
Upstream systemtap commit 294b7a53ec2f forces stap-prep to fail on these
platforms, so as to avoid giving users false confidence.
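
As a rough illustration of that kind of early failure, a guard keyed on the two architectures named in this bug might look like the sketch below. This is not the code from the upstream commit, which is not reproduced here; the function name vmlinuz_known_non_elf and the warning text are hypothetical.

```shell
#!/bin/sh
# Hypothetical guard, NOT taken from systemtap commit 294b7a53ec2f.
# Succeeds (returns 0) for the architectures this bug reports as
# shipping a non-ELF /boot/vmlinuz.
vmlinuz_known_non_elf() {
    case "$1" in
        aarch64|s390x) return 0 ;;
        *) return 1 ;;
    esac
}

arch=$(uname -m)
if vmlinuz_known_non_elf "$arch"; then
    # A real setup script would stop here with a non-zero exit status.
    echo "warning: kernel debuginfo lookup via debuginfod is known to fail on $arch" >&2
fi
```

Failing loudly at setup time matches the stated intent of the commit: the user learns up front that the debuginfod path will not work, instead of hitting "resolution failed in DWARF builder" later.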

Comment 11 Mark Wielaard 2021-11-08 10:06:07 UTC
No progress on this on the elfutils side. But see comment #10 for a systemtap "workaround". Unclear what a good solution would be given that these images aren't ELF.

Comment 13 Mark Wielaard 2022-03-28 10:43:40 UTC
I think this cannot be fixed on the elfutils side since we don't have an ELF image for these architectures. Maybe the systemtap workaround mentioned in comment #10 is enough?