Bug 1539664 - [ppc64le] Setting breakpoint on function name doesn't work
Summary: [ppc64le] Setting breakpoint on function name doesn't work
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: annobin
Version: rawhide
Hardware: ppc64le
OS: Linux
unspecified
unspecified
Target Milestone: ---
Assignee: Nick Clifton
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-01-29 12:07 UTC by Severin Gehwolf
Modified: 2018-02-09 21:25 UTC (History)
6 users (show)

Fixed In Version: annobin-3.4-1.fc28
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-02-09 21:25:47 UTC


Attachments (Terms of Use)
java-1.8.0-openjdk rawhide build fix. (3.48 KB, patch)
2018-01-29 12:07 UTC, Severin Gehwolf
no flags Details | Diff

Description Severin Gehwolf 2018-01-29 12:07:49 UTC
Created attachment 1387714 [details]
java-1.8.0-openjdk rawhide build fix.

Description of problem:
This bug was originally observed when building OpenJDK. The java-1.8.0-openjdk spec file contains this sequence in the spec in its %check section. The intention is to verify basic backtraces work based on source-code line numbers:

gdb -q "$JAVA_HOME/bin/java" <<EOF | tee gdb.out
handle SIGSEGV pass nostop noprint
handle SIGILL pass nostop noprint
set breakpoint pending on
break javaCalls.cpp:1
commands 1
backtrace
quit
end
run -version
EOF
grep 'JavaCallWrapper::JavaCallWrapper' gdb.out

Version-Release number of selected component (if applicable):
$ rpm -q gdb
gdb-8.0.90.20180109-2.fc28.ppc64le

How reproducible:
100%

Some failed tasks:
https://koji.fedoraproject.org/koji/taskinfo?taskID=24471849
https://koji.fedoraproject.org/koji/taskinfo?taskID=24530050
https://koji.fedoraproject.org/koji/taskinfo?taskID=24467529 (passed on other arches)

Steps to Reproduce:
1. $ fedpkg clone -a java-1.8.0-openjdk
2. $ cd java-1.8.0-openjdk
3. $ git am 0001-Fix-FTBFS-due-to-link-failure-of-libfontmanager.so.patch
   (see attachment)
4. $ fedpkg build --srpm java-1.8.0-openjdk-1.8.0.161-1.b14.fc28 \
     --scratch --arches ppc64le

Actual results:
Build of package fails, due to broken gdb check on ppc64le.

+ gdb -q /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.161-1.b14.fc28.ppc64le/openjdk/build/jdk8.build/images/j2sdk-image/bin/java
Reading symbols from /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.161-1.b14.fc28.ppc64le/openjdk/build/jdk8.build/images/j2sdk-image/bin/java...done.
(gdb) Signal        Stop	Print	Pass to program	Description
SIGSEGV       No	No	Yes		Segmentation fault
(gdb) Signal        Stop	Print	Pass to program	Description
SIGILL        No	No	Yes		Illegal instruction
(gdb) (gdb) No source file named javaCalls.cpp.
Breakpoint 1 (javaCalls.cpp:1) pending.
(gdb) >>>(gdb) Starting program: /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.161-1.b14.fc28.ppc64le/openjdk/build/jdk8.build/images/j2sdk-image/bin/java -version
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.26.9000-48.fc28.ppc64le
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff6c1f180 (LWP 19333)]
[New Thread 0x7ffff635f180 (LWP 19334)]
[New Thread 0x7ffff613f180 (LWP 19335)]
[New Thread 0x7ffff5f1f180 (LWP 19336)]
[New Thread 0x7ffff5cff180 (LWP 19337)]
[New Thread 0x7fffddbbf180 (LWP 19338)]
[New Thread 0x7fffdd99f180 (LWP 19339)]
[New Thread 0x7fffdd77f180 (LWP 19340)]
[New Thread 0x7fffdd55f180 (LWP 19341)]
[New Thread 0x7fffdd33f180 (LWP 19342)]
[New Thread 0x7fffdcf1f180 (LWP 19343)]
[New Thread 0x7fffdcaff180 (LWP 19344)]
[New Thread 0x7fffdc8df180 (LWP 19345)]
openjdk version "1.8.0_161"
OpenJDK Runtime Environment (build 1.8.0_161-b14)
OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode)
[Thread 0x7fffdc8df180 (LWP 19345) exited]
[Thread 0x7fffddbbf180 (LWP 19338) exited]
[Thread 0x7ffff6c1f180 (LWP 19333) exited]
[Thread 0x7fffdcaff180 (LWP 19344) exited]
[Thread 0x7fffdcf1f180 (LWP 19343) exited]
[Thread 0x7fffdd33f180 (LWP 19342) exited]
[Thread 0x7fffdd77f180 (LWP 19340) exited]
[Thread 0x7fffdd99f180 (LWP 19339) exited]
[Thread 0x7ffff5cff180 (LWP 19337) exited]
[Thread 0x7ffff5f1f180 (LWP 19336) exited]
[Thread 0x7ffff613f180 (LWP 19335) exited]
[Thread 0x7ffff635f180 (LWP 19334) exited]
[Thread 0x7ffff7ff4760 (LWP 19330) exited]
[Inferior 1 (process 19330) exited normally]
Missing separate debuginfos, use: dnf debuginfo-install libgcc-7.3.1-1.fc28.ppc64le libstdc++-7.3.1-1.fc28.ppc64le zlib-1.2.11-4.fc27.ppc64le
(gdb) quit
+ grep JavaCallWrapper::JavaCallWrapper gdb.out
RPM build errors:
error: Bad exit status from /var/tmp/rpm-tmp.5y8x8L (%check)
    Bad exit status from /var/tmp/rpm-tmp.5y8x8L (%check)
Child return code was: 1

Expected results:
gdb is expected to set the breakpoint in javaCalls.cpp:1, function 'JavaCallWrapper::JavaCallWrapper' and thus produces proper backtrace.

Additional info:
I've done some debugging on a ppc64le machine with current Fedora rawhide. Setting the breakpoint as follows works:

break share/vm/runtime/javaCalls.cpp:58

These don't:

break JavaCallWrapper::JavaCallWrapper
break javaCalls.cpp:47
break javaCalls.cpp:1

However, JavaCallWrapper::JavaCallWrapper must have been entered. The following is using javaCalls.cpp:58 syntax:

(gdb) >>>(gdb) Starting program: /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.161-1.b14.fc28.ppc64le/openjdk/build/jdk8.build/images/j2sdk-image/bin/java -version
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.26.9000-48.fc28.ppc64le
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff6c1f180 (LWP 391)]
[New Thread 0x7ffff635f180 (LWP 392)]
[New Thread 0x7ffff613f180 (LWP 393)]
[New Thread 0x7ffff5f1f180 (LWP 394)]
[New Thread 0x7ffff5cff180 (LWP 395)]
[New Thread 0x7ffff5adf180 (LWP 396)]
[New Thread 0x7fffd571f180 (LWP 397)]
[Switching to Thread 0x7ffff6c1f180 (LWP 391)]

Thread 2 "java" hit Breakpoint 1, JavaCallWrapper::JavaCallWrapper (this=0x7ffff6c1dcb0, callee_method=..., receiver=..., result=0x7ffff6c1de40, __the_thread__=0x7ffff000a000)
    at /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.161-1.b14.fc28.ppc64le/openjdk/hotspot/src/share/vm/runtime/javaCalls.cpp:58
58	  JNIHandleBlock* new_handles = JNIHandleBlock::allocate_block(thread);
#0  JavaCallWrapper::JavaCallWrapper (this=0x7ffff6c1dcb0, callee_method=..., receiver=..., result=0x7ffff6c1de40, __the_thread__=0x7ffff000a000)
    at /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.161-1.b14.fc28.ppc64le/openjdk/hotspot/src/share/vm/runtime/javaCalls.cpp:58
#1  0x00007ffff75e071c in JavaCalls::call_helper (result=0x7ffff6c1de40, m=<optimized out>, args=0x7ffff6c1de60, __the_thread__=0x7ffff000a000)
    at /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.161-1.b14.fc28.ppc64le/openjdk/hotspot/src/share/vm/runtime/javaCalls.cpp:394
#2  0x00007ffff78e036c in os::os_exception_wrapper (f=<optimized out>, value=<optimized out>, method=<optimized out>, args=<optimized out>, thread=<optimized out>)
    at /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.161-1.b14.fc28.ppc64le/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:5296
#3  0x00007ffff75dfb88 in JavaCalls::call (result=<optimized out>, method=<error reading variable: value has been optimized out>, args=<optimized out>, __the_thread__=<optimized out>)
    at /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.161-1.b14.fc28.ppc64le/openjdk/hotspot/src/share/vm/runtime/javaCalls.cpp:307
#4  0x00007ffff757ea08 in InstanceKlass::call_class_initializer_impl (this_oop=..., __the_thread__=0x7ffff000a000)
    at /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.161-1.b14.fc28.ppc64le/openjdk/hotspot/src/share/vm/oops/instanceKlass.cpp:1239
#5  0x00007ffff757ee68 in InstanceKlass::call_class_initializer (__the_thread__=0x7ffff000a000, this=<optimized out>)
    at /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.161-1.b14.fc28.ppc64le/openjdk/hotspot/src/share/vm/oops/instanceKlass.cpp:1207
#6  InstanceKlass::initialize_impl (this_oop=..., __the_thread__=0x7ffff000a000) at /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.161-1.b14.fc28.ppc64le/openjdk/hotspot/src/share/vm/oops/instanceKlass.cpp:932
#7  0x00007ffff757f370 in InstanceKlass::initialize (this=0x7fff88000ea8, __the_thread__=0x7ffff000a000)
    at /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.161-1.b14.fc28.ppc64le/openjdk/hotspot/src/share/vm/oops/instanceKlass.cpp:569
#8  0x00007ffff757f0a4 in InstanceKlass::initialize_impl (this_oop=..., __the_thread__=0x7ffff000a000)
    at /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.161-1.b14.fc28.ppc64le/openjdk/hotspot/src/share/vm/oops/instanceKlass.cpp:894
#9  0x00007ffff757f370 in InstanceKlass::initialize (this=0x7fff88001610, __the_thread__=0x7ffff000a000)
    at /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.161-1.b14.fc28.ppc64le/openjdk/hotspot/src/share/vm/oops/instanceKlass.cpp:569
#10 0x00007ffff7a95d00 in initialize_class (__the_thread__=0x7ffff000a000, class_name=<optimized out>)
    at /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.161-1.b14.fc28.ppc64le/openjdk/hotspot/src/share/vm/runtime/thread.cpp:993
#11 Threads::create_vm (args=<optimized out>, canTryAgain=<optimized out>) at /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.161-1.b14.fc28.ppc64le/openjdk/hotspot/src/share/vm/runtime/thread.cpp:3491
#12 0x00007ffff761a3d4 in JNI_CreateJavaVM (vm=0x7ffff6c1e680, penv=0x7ffff6c1e688, args=<optimized out>)
    at /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.161-1.b14.fc28.ppc64le/openjdk/hotspot/src/share/vm/prims/jni.cpp:5221
#13 0x00007ffff7eb3434 in InitializeJVM (ifn=<synthetic pointer>, penv=<optimized out>, pvm=<optimized out>)
    at /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.161-1.b14.fc28.ppc64le/openjdk/jdk/src/share/bin/java.c:1215
#14 JavaMain (_args=<optimized out>) at /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.161-1.b14.fc28.ppc64le/openjdk/jdk/src/share/bin/java.c:376
#15 0x00007ffff7f386e4 in start_thread () from /lib64/libpthread.so.0
#16 0x00007ffff7db50d8 in clone () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install libgcc-7.3.1-1.fc28.ppc64le libstdc++-7.3.1-1.fc28.ppc64le zlib-1.2.11-4.fc27.ppc64le
(gdb) quit
A debugging session is active.

	Inferior 1 [process 388] will be killed.

Quit anyway? (y or n) [answered Y; input not from terminal]

The source code of javaCalls.cpp looks as follows:
#include "precompiled.hpp"
#include "classfile/systemDictionary.hpp"
#include "classfile/vmSymbols.hpp"
#include "code/nmethod.hpp"
#include "compiler/compileBroker.hpp"
#include "interpreter/interpreter.hpp"
#include "interpreter/linkResolver.hpp"
#include "memory/universe.inline.hpp"
#include "oops/oop.inline.hpp"
#include "prims/jniCheck.hpp"
#include "runtime/compilationPolicy.hpp"
#include "runtime/handles.inline.hpp"
#include "runtime/interfaceSupport.hpp"
#include "runtime/javaCalls.hpp"
#include "runtime/mutexLocker.hpp"
#include "runtime/signature.hpp"
#include "runtime/stubRoutines.hpp"
#include "runtime/thread.inline.hpp"

// -----------------------------------------------------
// Implementation of JavaCallWrapper

JavaCallWrapper::JavaCallWrapper(methodHandle callee_method, Handle receiver, JavaValue* result, TRAPS) {
  JavaThread* thread = (JavaThread *)THREAD;
  bool clear_pending_exception = true;

  guarantee(thread->is_Java_thread(), "crucial check - the VM thread cannot and must not escape to Java code");
  assert(!thread->owns_locks(), "must release all locks when leaving VM");
  guarantee(!thread->is_Compiler_thread(), "cannot make java calls from the compiler");
  _result   = result;

  // Allocate handle block for Java code. This must be done before we change thread_state to _thread_in_Java_or_stub,
  // since it can potentially block.
  JNIHandleBlock* new_handles = JNIHandleBlock::allocate_block(thread);

'JNIHandleBlock* new_handles = ...' is on line 58.

Comment 1 Severin Gehwolf 2018-01-29 12:09:51 UTC
# nm -Cl /builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.161-1.b14.fc28.ppc64le/openjdk/build/jdk8.build/images/j2sdk-image/jre/lib/ppc64le/server/libjvm.so | grep JavaCallWrapper
000000000060cfe0 t JavaCallWrapper::oops_do(OopClosure*)	/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.161-1.b14.fc28.ppc64le/openjdk/hotspot/src/share/vm/runtime/javaCalls.cpp:144
000000000060cc30 t JavaCallWrapper::JavaCallWrapper(methodHandle, Handle, JavaValue*, Thread*)	/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.161-1.b14.fc28.ppc64le/openjdk/hotspot/src/share/vm/runtime/javaCalls.cpp:47
000000000060cc30 t JavaCallWrapper::JavaCallWrapper(methodHandle, Handle, JavaValue*, Thread*)
000000000060cf30 t JavaCallWrapper::~JavaCallWrapper()	/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.161-1.b14.fc28.ppc64le/openjdk/hotspot/src/share/vm/runtime/javaCalls.cpp:111
000000000060cf30 t JavaCallWrapper::~JavaCallWrapper()

Thus, one of:

break JavaCallWrapper::JavaCallWrapper
break javaCalls.cpp:47

should work for setting a break point, but doesn't.

Comment 2 Severin Gehwolf 2018-01-29 13:30:53 UTC
To be honest, I'm not sure whether this is a gdb bug as this used to work with 8.0.50.20171204-33.fc28[1], but downgrading that in a local mock doesn't fix the problem at hand. Any pointers as to what the issue might be are appreciated. Sorry for the noise. I'll close as not-a-bug.

[1] https://koji.fedoraproject.org/koji/buildinfo?buildID=1008667

Comment 3 Jan Kratochvil 2018-01-29 19:04:41 UTC
So could you attach here a binary which fails for GDB?

Comment 5 Severin Gehwolf 2018-01-30 09:18:13 UTC
Provided a link to an image privately.

Comment 6 Jan Kratochvil 2018-01-31 10:41:31 UTC
FYI with gcc-7.3.1-1.fc28.ppc64le I got some missing functions linkage error.
After upgrade to Rawhide gcc-8.0.1-0.6.fc28.ppc64le I get now:
  cc1: warning: command line option '-std=gnu++98' is valid for C++/ObjC++ but not for C

Comment 7 Severin Gehwolf 2018-01-31 11:09:00 UTC
(In reply to Jan Kratochvil from comment #6)
> FYI with gcc-7.3.1-1.fc28.ppc64le I got some missing functions linkage error.
> After upgrade to Rawhide gcc-8.0.1-0.6.fc28.ppc64le I get now:
>   cc1: warning: command line option '-std=gnu++98' is valid for C++/ObjC++
> but not for C

Not sure what you are trying to do, but if you attempt to build java-1.8.0-openjdk on ppc64le on rawhide, you'll run into bug 1539812.

The build warning should be benign.

Comment 8 Jan Kratochvil 2018-01-31 12:17:44 UTC
It is a valid bug.

<JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+2296>:
   bl <JavaCallWrapper::JavaCallWrapper+8>
=> ld r4,96(r31)

<JavaCallWrapper::JavaCallWrapper(methodHandle, Handle, JavaValue*, Thread*)>:
<JavaCallWrapper::JavaCallWrapper>:   addis r2,r12,101
<JavaCallWrapper::JavaCallWrapper+4>: addi  r2,r2,-20016
<JavaCallWrapper::JavaCallWrapper+8>: mflr  r0

ppc64 sometimes calls skipping some first instructions. GDB should handle that but in this case it does not - it places the breakpoint on the very first instruction.

Probably something in rs6000-tdep.c skip_prologue().

It does affect also upstream trunk: GNU gdb (GDB) 8.1.50.20180131-git

Comment 9 Sergio Durigan Junior 2018-01-31 18:54:58 UTC
https://sourceware.org/ml/gdb-patches/2018-01/msg00669.html

This may be related to the bug.

Comment 10 Jan Kratochvil 2018-01-31 18:57:24 UTC
(In reply to Sergio Durigan Junior from comment #9)
> https://sourceware.org/ml/gdb-patches/2018-01/msg00669.html

It does not fix it.  And I believe it is a different issue.

Comment 11 Jan Kratochvil 2018-01-31 19:23:51 UTC
I have asked IBM upstream:
PowerPC: Call of a function+8 address
https://sourceware.org/ml/gdb/2018-01/msg00028.html

Comment 12 Jan Kratochvil 2018-02-07 18:22:47 UTC
It is because annobin started to create some new unusual symbols which confuse GDB:

PASS:
echo 'void f(void){}' >filename.c;/usr/bin/gcc -o filename.o -c filename.c ;readelf -Ws filename.o|grep filename.c
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS filename.c

FAIL:
echo 'void f(void){}' >filename.c;/usr/bin/gcc -o filename.o -c filename.c -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1;readelf -Ws filename.o|grep filename.c
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS filename.c
     5: 0000000000000000     2 OBJECT  LOCAL  DEFAULT    1 filename.c
     6: 0000000000000002     0 NOTYPE  LOCAL  DEFAULT    1 filename.c_end

The problem is that first function in a CU has the same address as "filename.c" data symbol (which has its address and it has its non-zero size).  This disables ppc64le global vs. local entrypoint detection for first function in a file:
  https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commitdiff;h=591a12a1d4c8843343eb999145d8bcc1efedf408

Couldn't be the symbol of other type than a regular runtime-type STT_OBJECT?
I am not aware how is it useful for some static analysis tools using annobin info but GDB is not happy from it.

Comment 13 Nick Clifton 2018-02-08 10:00:30 UTC
(In reply to Jan Kratochvil from comment #12)

Hi Jan,
 
>      1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS filename.c
>      5: 0000000000000000     2 OBJECT  LOCAL  DEFAULT    1 filename.c
>      6: 0000000000000002     0 NOTYPE  LOCAL  DEFAULT    1 filename.c_end
> 
> The problem is that first function in a CU has the same address as
> "filename.c" data symbol (which has its address and it has its non-zero
> size).  This disables ppc64le global vs. local entrypoint detection for
> first function in a file:
>  
> https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commitdiff;
> h=591a12a1d4c8843343eb999145d8bcc1efedf408

Is the problem that ppc_elfv2_elf_make_msymbol_special () only expects
to find *one* symbol at a given PC address, and does not check to see if
there are any others ?  If so, would it be possible for the function to
be extended to keep searching until it finds one with a non-zero at_other 
field ?  (I am not sure how to do this though.  I am not a GDB expert).


> Couldn't be the symbol of other type than a regular runtime-type STT_OBJECT?

It could, but isn't STT_OBJECT the correct type for this symbol ?  The symbol is being used to mark that start of the code for a given object file (or CU if you
prefer) inside a larger binary.  Hence the STT_OBJECT type.

The symbol that should be used to determine the global-vs-local entry point
status is the function symbol for the address though, not the object symbool.  So shouldn't lookup_minimal_symbol_by_pc_section_1() prefer symbols of type STT_FUNC over symbols of type STT_OBJECT ?  (Can it do this ?)

Cheers
  Nick

Comment 14 Jan Kratochvil 2018-02-08 12:35:09 UTC
Hi Nick,

(In reply to Nick Clifton from comment #13)
> Is the problem that ppc_elfv2_elf_make_msymbol_special () only expects
> to find *one* symbol at a given PC address, and does not check to see if
> there are any others ?

GDB creates the ELF symbols correctly (correctly means that STT_OBJECT for the filename is NOT marked by MSYMBOL_TARGET_FLAG_1.

The problem is that ppc_elfv2_skip_entrypoint() finds the STT_OBJECT symbol instead of the expected STT_FUNC symbol.


> If so, would it be possible for the function to be extended to keep searching
> until it finds one with a non-zero at_other field ?

Sure it is a GDB bug.  First I do not know why lookup_minimal_symbol_by_pc() ever returns data symbols when it is named "*_by_pc".  Besides that ppc_elfv2_skip_entrypoint() could also iterate through all the symbols at that PC address if any of them does not have MSYMBOL_TARGET_FLAG_1 (but there isn't available such underlying symbol interating interface to do that).

But touching such fragile GDB code always opens a can of worms so I wanted to clarify first if it is what we want to do.  As I do not think there exists any real data variable spanning over all the text=code of the whole file.  Besides that even if it did the text can be discontiguous which cannot be expressed by ELF symbols.  And then there is already DW_AT_{{low,high}_pc,ranges} so I do not understand the purpose of these new ELF symbols myself which is why I am asking. DWARF is present the *-debuginfo.rpm the same way as this .symtab.

As I do not think this new annobin extension will affect only GDB, various binary tools always had problems with overlapping address ranges, including elfutils for example.  I am not happy with another incompatibility from mainstream Linux binary formats.  For example current DWZ being used only by Red Hat is making Red Hat OSes incompatible to other binary tools (incl. LLVM).


> It could, but isn't STT_OBJECT the correct type for this symbol ?

No, there does not exist any runtime data entity covering the whole PC address range of a file.  Maybe STT_FILE could be used for it or at least STT_NOTYPE or even best something new from STT_LOOS..STT_HIOS.


> The symbol is being used to mark that start of the code for a given object
> file (or CU if you prefer) inside a larger binary.  Hence the STT_OBJECT type.

STT_OBJECT: "a data object, such as a variable, an array, and so on."
does not correspond to me for "start of the code"


> The symbol that should be used to determine the global-vs-local entry point
> status is the function symbol for the address though, not the object
> symbool.  So shouldn't lookup_minimal_symbol_by_pc_section_1() prefer
> symbols of type STT_FUNC over symbols of type STT_OBJECT ?  (Can it do this
> ?)

This is what I wrote above.  Just I did not want to open a can of worms in GDB before a justification the new annobin symbol is really the thing we want to do as personally I do not think so.  Maybe it could be discussed more on some list?

Comment 15 Nick Clifton 2018-02-08 13:21:06 UTC
Hi Jan,

(In reply to Jan Kratochvil from comment #14)

> Just I did not want to open a can of worms in
> GDB before a justification the new annobin symbol is really the thing we
> want to do as personally I do not think so.  Maybe it could be discussed
> more on some list?

We could - but there is no annobin developers mailing list, and there are
only two annobin developers (myself and Florian), so here would be just as 
good as anywhere else.


> But touching such fragile GDB code always opens a can of worms so I wanted
> to clarify first if it is what we want to do.

OK, that makes sense.


> STT_OBJECT: "a data object, such as a variable, an array, and so on."
> does not correspond to me for "start of the code"

*sigh*  True,  I was going by the name of the type, not its definition.

OK - so if I change the symbol type, to say STT_LOOS + 1, would that
actually help GDB ?  There would still be multiple symbols at the same 
address.  Do you know if GDB's mimisymbol reading code will skip OS 
specific symbols ?

Cheers
  Nick

Comment 16 Nick Clifton 2018-02-08 14:03:49 UTC
(In reply to Nick Clifton from comment #15)
 
> OK - so if I change the symbol type, to say STT_LOOS + 1, would that
> actually help GDB ?   

Make that STT_NOTYPE.  I have just discovered that GAS does not support setting symbols to arbitrary numeric values.  (It does allow some values, but only ones
that it recognises).

Comment 17 Jan Kratochvil 2018-02-08 15:56:42 UTC
Hi Nick,

(In reply to Nick Clifton from comment #15)
> OK - so if I change the symbol type, to say STT_LOOS + 1, would that
> actually help GDB ?

GDB does not read ELF symbols directly, it uses bfd/ for that. From bfd/ elf_slurp_symbol_table() I do not see it would ignore such symbols, they just do not get any special BSF_* flags.


(In reply to Nick Clifton from comment #16)
> Make that STT_NOTYPE.

That should be the same for bfd/ ELF symbol reader.  GDB will still consider it as mst_file_text due to 'sym->section->flags & SEC_CODE' in elf_symtab_read().


I would like to reuse existing STT_FILE - is it really a no go?
Otherwise I can just post for GDB upstream something dealing with the current state of annobin.  Still I agree STT_NOTYPE is more correct than STT_OBJECT.

Comment 18 Nick Clifton 2018-02-08 16:07:14 UTC
Hi Jan,

> That should be the same for bfd/ ELF symbol reader.  GDB will still consider
> it as mst_file_text due to 'sym->section->flags & SEC_CODE' in
> elf_symtab_read().

So basically changing the type of the symbols generated by annobin is
not really going to help GDB, but it would help reduce confusion over
the meaning/use of the symbol.  Right ?


> I would like to reuse existing STT_FILE - is it really a no go?

Sadly yes.  STT_FILE symbols have special semantics, and they have to be
absolute.  I need a symbol that is attached to a specific section and which
will not confuse tools like GDB because they do not behave like normal file
symbols.


> Otherwise I can just post for GDB upstream something dealing with the
> current state of annobin.  Still I agree STT_NOTYPE is more correct than
> STT_OBJECT.

OK, I will make that change and update the annobin rpm.

Cheers
  Nick

Comment 19 Florian Weimer 2018-02-08 16:08:44 UTC
Nick, could use a truly local symbol (.L… or whatever syntax they use)?  I don't think these symbols have to go into .symtab at all.

Comment 20 Jan Kratochvil 2018-02-08 16:13:06 UTC
Hi Nick,

(In reply to Nick Clifton from comment #18)
> So basically changing the type of the symbols generated by annobin is
> not really going to help GDB, but it would help reduce confusion over
> the meaning/use of the symbol.  Right ?

Right.


(In reply to Florian Weimer from comment #19)
> Nick, could use a truly local symbol (.L… or whatever syntax they use)?  I
> don't think these symbols have to go into .symtab at all.

That would be nice, GDB already ignores such symbols:
                       || ((sym->flags & BSF_LOCAL)
                           && sym->name[0] == '$'
                           && sym->name[1] == 'L'))

I expected the ELF symbol name should really match the filename.  But such a prefix would fix that even for existing GDBs.

Comment 21 Nick Clifton 2018-02-08 16:33:16 UTC
Hi Guys,

(In reply to Florian Weimer from comment #19)
> Nick, could use a truly local symbol (.L… or whatever syntax they use)?  I
> don't think these symbols have to go into .symtab at all.

Hmm, investigating....

I need symbols that do go into the symbol table, so that they can be extracted by the tools that parse the annobin notes.  But it should not matter if they are local.  I will run some tests.

Cheers
  Nick

Comment 22 Mark Wielaard 2018-02-08 17:14:19 UTC
I haven't tried with elfutils yet. But various elfutils based tools use a variant of dwfl_module_addrsym/dwfl_module_addrinfo/dwfl_module_addrname to match an address to the "closest" symbol/name. The rules to pick the closest symbol as follows:

- Ignore symbols that:
  - have no name
  - have no section associated (SHN_UNDEF)
  - have an address/value larger than the one we are looking for
  - that are STT_SECTION, STT_FILE or STT_TLS.
- A sized symbol is preferred to a sizeless symbol, if the address falls in the
  symbol range.
- A symbol with address, smaller, but closer to the given address is preferred
  over a symbol with a lower address, or a more global symbol is preferred
  over a more local one (STB_GLOBAL > STB_WEAK > STB_LOCAL).
- For sizeless symbol prefer the closest one that is in the same section as the
  address if we can determine that.
- For a sized symbol, if the beginning of its range is no closer,
  prefer the one with the smallest range (that still contains the address).
- If everything above is equal, prefer the first symbol found in the table.

Comment 23 Jan Kratochvil 2018-02-08 17:21:17 UTC
(In reply to Mark Wielaard from comment #22)
> - A sized symbol is preferred to a sizeless symbol, if the address falls in
>   the symbol range.

GDB also has this preference.

   Num:    Value          Size Type    Bind   Vis      Ndx Name
    35: 0000000000400536     2 OBJECT  LOCAL  DEFAULT   12 filename.c
    36: 0000000000400538     0 NOTYPE  LOCAL  DEFAULT   12 filename.c_end

Why is there both filename.c_end and the size of filename.c?
Maybe if filename.c was sizeless we can still get the size from filename.c_end and it would not confuse the consumers so much.

Comment 24 Nick Clifton 2018-02-08 17:24:48 UTC
[Just read comment #23 - it came in whilst I was typing this.
 The end symbol is there so that the size of the start symbol can be calculated.
 But it would be possible to move the size to the end symbol and always have
 a zero size for the start symbol.  I will run some tests locally].


Right.  .L prefixed symbols do not work because they are too local, and end
being completely stripped from the symbol table. :-(

But I do have another idea... Jan would GDB be happy if the address of the start symbol was increased by one ?  So for example in your dump from comment #13 if
the STT_OBJECT symbol filename.c was now:

      5: 0000000000000001     2 NOTYPE  LOCAL  DEFAULT    1 filename.c

(ie assigned to address 0x1 not address 0x0) would that work ?  Or would the symbol increment have to be larger than an instruction size ?

What I could do is tweak the annobin plugin to change the address of these
symbols so that they reference just after the start of the file.  It also
needs a tweak to the consumer of the annobin information (ie readelf) so that
it knows to look for symbols that are 1 byte beyond the target address, but
this can be done too.

Would this work for you ?

Comment 25 Jan Kratochvil 2018-02-08 17:52:44 UTC
(In reply to Nick Clifton from comment #24)
>  But it would be possible to move the size to the end symbol and always have
>  a zero size for the start symbol.

I expected both symbols would be sizeless and the size would be calculated by annobin-consumer (obviously as a difference between the two symbols).


> Jan would GDB be happy if the address of the
> start symbol was increased by one ?

One can have 'func1:retq; func2:retq' (with -Os), that is ambiguous again.
And I do not think it is too clean design.

I am not trying to just deal with this bugreport - I could workaround it some way in GDB.  I am just trying to prevent problems with binary tools in general in the future.  But you have significantly more experience with all of this so I am not sure I am doing the right thing.

Comment 26 Nick Clifton 2018-02-09 08:22:23 UTC
(In reply to Jan Kratochvil from comment #25)

> I expected both symbols would be sizeless and the size would be calculated
> by annobin-consumer (obviously as a difference between the two symbols).

Originally I only planned on having one symbol, and so that size was needed to calculate the end of the region it covered.  Obviously now that I have two symbols this is no longer necessary, and my tests show that zero-sized symbols
work for the decoding of the annotations, so I will check this change in.

Comment 27 Jan Kratochvil 2018-02-09 21:25:47 UTC
I expected Nick will CLOSE the Bug but I find the Comment 0 fixed now:
https://koji.fedoraproject.org/koji/taskinfo?taskID=24898920
ppc64le:
https://kojipkgs.fedoraproject.org//work/tasks/8920/24898920/build.log
...
(gdb) (gdb) No source file named javaCalls.cpp.
Breakpoint 1 (javaCalls.cpp:1) pending.
...
+ grep JavaCallWrapper::JavaCallWrapper gdb.out
Thread 2 "java" hit Breakpoint 1, JavaCallWrapper::JavaCallWrapper (
47      JavaCallWrapper::JavaCallWrapper(methodHandle callee_method, Handle receiver, JavaValue* result, TRAPS) {
#0  JavaCallWrapper::JavaCallWrapper (this=0x7ffff660d9b8, callee_method=...,
+ grep sun.misc.Unsafe


Note You need to log in before you can comment on or make changes to this bug.