Bug 1196181 - glibc: STAP_PROBE generates unparsable stap annotations on armhfp, related to "nor" constraint handling in GCC
Summary: glibc: STAP_PROBE generates unparsable stap annotations on armhfp, related to...
Status: CLOSED NEXTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc   
Version: 26
Hardware: armhfp
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: Florian Weimer
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Keywords: Reopened
Depends On: 1674280
Blocks: ARMTracker 1239574
Reported: 2015-02-25 12:47 UTC by Matej Stuchlik
Modified: 2019-04-04 17:32 UTC (History)
CC List: 30 users

Fixed In Version: glibc-2.29-4.fc30
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-04-04 17:32:03 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker ID Priority Status Summary Last Updated
Sourceware 24164 None None None 2019-04-02 11:40 UTC
GNU Compiler Collection 89146 None None None 2019-04-02 11:40 UTC
Red Hat Bugzilla 1259132 None CLOSED Make the probes-based dynamic linker interface more robust to errors 2019-04-02 11:40 UTC

Internal Trackers: 1259132

Description Matej Stuchlik 2015-02-25 12:47:07 UTC
Description of problem:
Python test suite's test_gdb started failing several days ago [1] on arm with

FAIL: test_modern_class (test.test_gdb.PrettyPrintTests)
Verify the pretty-printing of new-style class instances
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/Python-2.7.9/Lib/test/test_gdb.py", line 408, in test_modern_class
    print foo''')
  File "/builddir/build/BUILD/Python-2.7.9/Lib/test/test_gdb.py", line 233, in get_gdb_repr
    import_site=import_site)
  File "/builddir/build/BUILD/Python-2.7.9/Lib/test/test_gdb.py", line 218, in get_stack_trace
    self.assertEqual(unexpected_errlines, [])
AssertionError: Lists differ: ['warning: Probes-based dynami... != []
First list contains 4 additional elements.
First extra element 0:
warning: Probes-based dynamic linker interface failed.
+ []
- ['warning: Probes-based dynamic linker interface failed.',
-  'Reverting to original interface.',
-  '',
-  "Cannot parse expression `.L1057 4@r4'."]

I initially thought it was Python-specific, but I get the same error by just trying to run any executable in gdb. E.g. 'gdb vim', followed by 'run'.

[1] http://koji.fedoraproject.org/koji/taskinfo?taskID=9018657


Version-Release number of selected component (if applicable):
gdb-7.9-10.fc23.armv7hl


Steps to Reproduce:
1. gdb vim
2. run

Comment 1 Sergio Durigan Junior 2015-02-25 22:07:40 UTC
I'm looking into this problem.

Comment 2 Sergio Durigan Junior 2015-02-26 05:57:25 UTC
This expression is strange; GCC should not generate a label there in the beginning of the instruction AFAIK.

Anyway, I think I have a patch for the issue, but I still could not test it because I'm still waiting for an ARM machine to be available here...

I expect to have some results by tomorrow.

Comment 3 Sergio Durigan Junior 2015-02-26 05:58:08 UTC
When I said "in the beginning of the instruction", I meant "in the beginning of the expression".

Comment 4 Sergio Durigan Junior 2015-02-26 23:00:14 UTC
This is not GDB's fault.  What happens is that glibc's ld contains:

  provide: rtld
  name: init_start

And the following arguments:

  -4@.L1052 4@[fp,#-96]

The second argument is correct, but the first one is incomplete/wrong/incorrect.  The problem is that .L1052 is a local label in the asm file, and when gas processes this file, it removes the local labels.  There is also the fact that the probe's arguments are represented as strings in the asm file, so gas does not convert this label into a proper address.
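
To make the failure concrete, here is a minimal sketch of how a consumer might split one probe argument of the form N@OP (a negative N marks a signed value) and notice that the operand is an assembler-local label.  This is not GDB's actual parser; the struct and the demo_* names are made up for illustration, and the sketch assumes the caller has already split the argument string on spaces.

```c
#include <stdlib.h>
#include <string.h>

/* One probe argument of the form "N@OP", e.g. "-4@.L1052". */
struct demo_stap_arg {
  int size;            /* absolute size in bytes */
  int is_signed;       /* nonzero when the size field was negative */
  const char *operand; /* points into the input string, just after '@' */
};

/* Parse a single, already-separated argument.
   Returns 0 on success and -1 on malformed input. */
static int demo_parse_stap_arg(const char *text, struct demo_stap_arg *out)
{
  char *end;
  long n = strtol(text, &end, 10);

  if (end == text || *end != '@' || n == 0)
    return -1;
  out->is_signed = n < 0;
  out->size = (int) (n < 0 ? -n : n);
  out->operand = end + 1;
  return 0;
}

/* True when the operand is a local label such as ".L1052", which has
   no symbol table entry for a debugger to look up. */
static int demo_operand_is_local_label(const struct demo_stap_arg *arg)
{
  return strncmp(arg->operand, ".L", 2) == 0;
}
```

With the two arguments above, "-4@.L1052" parses as a signed 4-byte value whose operand is a local label, while "4@[fp,#-96]" parses as an unsigned 4-byte value with an ordinary memory operand.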

I talked to Frank Eigler (from SystemTap), and he mentioned this problem has already happened with them.  The easy solution, according to him, was to provide different parameters to the probe, or move the probe's definition slightly in the code.  I am not really sure which one would be better in this case.  Frank also mentioned that another possible solution would be to pass flags to preserve those local labels, though this would require changes in the GDB code as well.  I strongly prefer the first solution.

I am marking this bug as urgent because it prevents GDB from working optimally on ARM, if the user is using glibc-2.21.90-4.fc23.armv7hl (which contains the problem).

Comment 5 Sergio Durigan Junior 2015-02-26 23:02:06 UTC
Forgot to mention: you can see the probe and its arguments by doing:

readelf -x .note.stapsdt /lib/ld-linux-armhf.so.3

Comment 6 Carlos O'Donell 2015-02-27 04:52:01 UTC
The probe can't move, it's where it is to indicate the start of objects being loaded by the dynamic loader. The arguments are also part of a documented interface, which we might be able to change, but that requires serious discussion.

All that glibc is doing is following systemtap recommendations for how to use stap probe points. Why isn't the STAP_PROBE macro working correctly? If STAP_PROBE can't work correctly what workarounds are useful and what can we do in the future to make it work?

What exact things do the SystemTap developers recommend? Could you please provide concrete examples of solutions?

Switching to systemtap component.

Comment 7 Sergio Durigan Junior 2015-02-27 05:56:01 UTC
(In reply to Carlos O'Donell from comment #6)
> The probe can't move, it's where it is to indicate the start of objects
> being loaded by the dynamic loader. The arguments are also part of a
> documented interface, which we might be able to change, but that requires
> serious discussion.
> 
> All that glibc is doing is following systemtap recommendations for how to
> use stap probe points. Why isn't the STAP_PROBE macro working correctly? If
> STAP_PROBE can't work correctly what workarounds are useful and what can we
> do in the future to make it work?

That's fair enough.  I was initially thinking to assign this bug to SystemTap indeed, because ultimately it should be <sys/sdt.h>'s role to make sure that STAP_PROBE* macros work correctly.

> What exact things do the SystemTap developers recommend? Could you please
> provide concrete examples of solutions?
> 
> Switching to systemtap component.

Thanks for doing that.

Comment 8 Frank Ch. Eigler 2015-02-27 12:25:19 UTC
Carlos, the sdt.h stuff is working correctly in the sense that there is no bug, just a system-compositional accident.  GCC emits whatever it does in response to the inline-asm code, and the rest of the toolchain just can't consume it as is.

Various workarounds are possible, including the ones listed above.  (Folks may have misunderstood my suggestion as to changing the parameters - I was not talking about the value, but about the operand/expression used to compute it, for example making something volatile, copying it into a local/register first, those sorts of things.)

Anyway, another possible workaround is to use a different STAP_SDT_ARG_CONSTRAINT macro, as per https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation, basically forcing gcc to put the parameter somewhere handy.  For example,

#define STAP_SDT_ARG_CONSTRAINT "r"
#include <sys/sdt.h>

ought to force the value into a register.  We don't have that as a default in the sys/sdt.h header file, because that is usually unnecessarily restrictive.

The gas local-label-preserving option is another possible workaround.  (I don't quite see why gdb would have to change -- if it can resolve symbols at all in these operand expressions, the .L* stuff should just work.)

Both of the above changes are on the glibc source side.

Comment 9 Severin Gehwolf 2015-03-09 14:53:49 UTC
This problem prevents us from debugging the ARM OpenJDK 8 build failures in koji on f22/rawhide:

http://koji.fedoraproject.org/koji/taskinfo?taskID=9123210
http://koji.fedoraproject.org/koji/taskinfo?taskID=9131279

$ gdb --args java --version
GNU gdb (GDB) Fedora 7.9-10.fc22
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "armv7hl-redhat-linux-gnueabi".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from java...Reading symbols from /home/sgehwolf/java-1.8.0-openjdk/java-1.8.0-openjdk-1.8.0.40-21.b25.fc22.arm/jdk8/java...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Missing separate debuginfos, use: debuginfo-install java-1.8.0-openjdk-headless-1.8.0.40-19.b12.fc22.armv7hl
(gdb) break JLI_Launch
Breakpoint 1 at 0x104bc
(gdb) run
Starting program: /usr/bin/java --version
warning: Probes-based dynamic linker interface failed.
Reverting to original interface.

Cannot parse expression `.L976 4@r4'.
(gdb) where
#0  dl_main (phdr=<optimized out>, phnum=<optimized out>, user_entry=<optimized out>, auxv=<optimized out>) at rtld.c:2167
#1  0xb6fe7b90 in _dl_sysdep_start (start_argptr=start_argptr@entry=0xbefff530, dl_main=0xb6fd13c8 <dl_main>) at ../elf/dl-sysdep.c:249
#2  0xb6fd1338 in _dl_start_final (arg=0xbefff530, arg@entry=0x0, info=0xbefff2b0, info@entry=0xbefff2a8) at rtld.c:306
#3  0xb6fd4d0c in _dl_start (arg=0x0) at rtld.c:414
#4  0xb6fd0ad0 in _start () from /lib/ld-linux-armhf.so.3
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

$ rpm -q glibc
glibc-2.21-5.fc22.armv7hl

Comment 10 Orion Poplawski 2015-04-02 03:53:52 UTC
I'm running into this trying to debug openmpi on arm.

Comment 11 Milan Bouchet-Valat 2015-04-05 16:01:23 UTC
I suffer from this too when trying to debug Julia on arm.

Comment 12 Dennis Gilmore 2015-04-14 13:58:15 UTC
hitting this trying to debug why firefox is segfaulting

Comment 13 Robert Kuska 2015-04-15 11:52:02 UTC
Python tests fail on arm because of this.

https://kojipkgs.fedoraproject.org//work/tasks/5375/9485375/build.log

Comment 14 Jaroslav Škarvada 2015-05-25 13:59:18 UTC
Blocked by this when debugging an rrdtool failure on ARM.

Comment 15 Mark Wielaard 2015-06-01 12:21:17 UTC
Note that this seems to have slipped into the f22 release and it is a pretty annoying issue trying to debug anything on arm/f22.

Comment 16 Peter Robinson 2015-06-12 14:00:58 UTC
Hey Carlos, any update on this?

Comment 17 Carlos O'Donell 2015-08-18 03:50:52 UTC
(In reply to Peter Robinson from comment #16)
> Hey Carlos, any update on this?

No update yet, sorry. This issue has resurfaced in a few other contexts and so I'm back to look at this again to see if I can fix this easily without a kludge, but maybe a kludge is all we can do e.g. limit the change to aarch64, add constraints, and worsen performance for those paths (should measure).

Comment 18 Sergio Durigan Junior 2015-08-19 07:27:31 UTC
I think that GDB can be more resilient to such breakages/bad arguments.  The debugger cannot really fix this, but it can (for example) not require the user to issue a "continue" command when it fails to parse some malformed argument.  I will work on a patch for this.

Comment 19 Sergio Durigan Junior 2015-08-22 00:34:53 UTC
GDB patch to improve the error handling: https://sourceware.org/ml/gdb-patches/2015-08/msg00629.html

Comment 20 Sergio Durigan Junior 2015-09-02 04:26:49 UTC
The GDB patches have just been pushed:

<https://sourceware.org/ml/gdb-cvs/2015-09/msg00001.html>

<https://sourceware.org/ml/gdb-cvs/2015-09/msg00002.html>

With them, GDB should be able to continue executing the inferior even when there is a problem with the probes-based dynamic linker interface.

I'll backport these patches to Fedora GDB tomorrow.

Comment 21 Sergio Durigan Junior 2015-09-02 22:30:37 UTC
So, I've just pushed two updates to Fedora GDB that should work around this problem:

gdb-7.9.1-18.fc22
gdb-7.10-17.fc23

Please try them and report (on Bodhi) if things got better.

This is not a fix for this bug, but a workaround that should allow GDB to work better when there is an error on the probes-based dynamic linker interface.

Comment 23 Fedora End Of Life 2016-07-19 20:17:55 UTC
Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 24 Carlos O'Donell 2016-07-19 23:21:35 UTC
Reopening since we want this fixed or verified.

Comment 25 Jan Kurik 2016-07-26 04:09:41 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 25 development cycle.
Changing version to '25'.

Comment 26 Fedora End Of Life 2017-02-28 09:41:23 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 26 development cycle.
Changing version to '26'.

Comment 27 Florian Weimer 2018-01-12 10:48:25 UTC
This is either a Systemtap or glibc issue.  glibc simply uses the STAP_PROBE macro, but the macros' inline assembler constraints could be considered problematic (see comment 8).

Comment 28 Sergio Durigan Junior 2019-01-31 20:34:43 UTC
(In reply to Florian Weimer from comment #27)
> This is either a Systemtap or glibc issue.  glibc simply uses the STAP_PROBE
> macro, but the macros' inline assembler constraints could be considered
> problematic (see comment 8).

Any reason why this bug was closed as NOTABUG?  I've just seen another manifestation of it on armv7hl...

Comment 29 Florian Weimer 2019-01-31 20:41:04 UTC
(In reply to Sergio Durigan Junior from comment #28)
> (In reply to Florian Weimer from comment #27)
> > This is either a Systemtap or glibc issue.  glibc simply uses the STAP_PROBE
> > macro, but the macros' inline assembler constraints could be considered
> > problematic (see comment 8).
> 
> Any reason why this bug was closed as NOTABUG?  I've just seen another
> manifestation of it on armv7hl...

It is not a glibc bug.  We have not investigated whether this is a bug in GDB, binutils, or Systemtap.  I assumed it had been fixed?

Do you see this with current glibc?

Comment 30 Frank Ch. Eigler 2019-01-31 20:42:59 UTC
The problem here is not assembler constraints per se, but that the compiler is emitting references to local symbols (".Lfoobar") that the assembler resolves, and has no existence within the object file or the finished executable.  gdb/stap have no chance to resolve this operand.

If the operand constraints were changed to force operands into registers, then this particular problem would not occur, but then we'd have increased register pressure (and outright failures sometimes).

If the assembler temporary symbols were passed through into the symbol tables (AFLAGS -L / --keep-locals), this could work.

Comment 31 Sergio Durigan Junior 2019-01-31 20:47:24 UTC
(In reply to Florian Weimer from comment #29)
> (In reply to Sergio Durigan Junior from comment #28)
> > Any reason why this bug was closed as NOTABUG?  I've just seen another
> > manifestation of it on armv7hl...
> 
> It is not a glibc bug.  We have not investigated whether this is a bug in
> GDB, binutils, or Systemtap.  I assumed it had been fixed?
> 
> Do you see this with current glibc?

Yes, the bug is happening with current glibc on armv7hl.  You can take a look at the logs here:

https://kojipkgs.fedoraproject.org//work/tasks/8666/32358666/build.log

I'm still investigating this problem, but it looks exactly like the problem first described in this bug.

I'm taking the liberty to reopen this issue.

Comment 32 Florian Weimer 2019-01-31 21:10:28 UTC
Is this still about the init_start probe, or something else?

Comment 33 Sergio Durigan Junior 2019-01-31 21:13:13 UTC
I'm in the process of reserving an ARM machine to test this out.  I'll confirm when I know.

Comment 34 Florian Weimer 2019-01-31 21:24:17 UTC
Looking at glibc-2.28.9000-37.fc30.armv7hl, it still looks like it's about the same probes:

Note section [22] '.note.stapsdt' of 836 bytes at offset 0x21868:
  Owner          Data size  Type
  stapsdt               43  Version: 3
    PC: 0x35dc, Base: 0x203dc, Semaphore: 0
    Provider: rtld, Name: init_start, Args: '-4@.L1209 4@r4'
  stapsdt               46  Version: 3
    PC: 0x3ef0, Base: 0x203dc, Semaphore: 0
    Provider: rtld, Name: init_complete, Args: '-4@.L1212 4@r4'

We have this in the glibc sources:

elf/rtld.c:1597:  LIBC_PROBE (init_start, 2, LM_ID_BASE, r);
elf/rtld.c:2296:  LIBC_PROBE (init_complete, 2, LM_ID_BASE, r);

And we've got:

dlfcn/dlfcn.h:47:# define LM_ID_BASE    0       /* Initial namespace.  */

include/stap-probe.h:43:# define LIBC_PROBE(name, n, ...)       \
include/stap-probe.h-44-  LIBC_PROBE_1 (MODULE_NAME, name, n, ## __VA_ARGS__)
include/stap-probe.h-45-
include/stap-probe.h:46:# define LIBC_PROBE_1(lib, name, n, ...) \
include/stap-probe.h-47-  STAP_PROBE##n (lib, name, ## __VA_ARGS__)

So this should turn into:

  STAP_PROBE2(rtld, init_start, LM_ID_BASE, r);
  STAP_PROBE2(rtld, init_complete, LM_ID_BASE, r);
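
That expansion can be checked with a small sketch.  The two LIBC_PROBE macros are copied from the glibc excerpt above; everything else is an assumption for illustration: MODULE_NAME is taken to be rtld (matching the provider shown in the notes), and STAP_PROBE2 is replaced by a hypothetical stand-in that records its arguments as text instead of emitting inline asm.  The demo_* and DEMO_* names are made up.

```c
#define LM_ID_BASE 0     /* as quoted from dlfcn/dlfcn.h */
#define MODULE_NAME rtld /* assumption: value used when building ld.so */

/* Two-level stringification so that macro arguments are expanded first. */
#define DEMO_STR_1(x) #x
#define DEMO_STR(x) DEMO_STR_1 (x)

struct demo_probe_call {
  const char *provider, *name, *arg1, *arg2;
};

static struct demo_probe_call demo_last_call;

static void demo_record(const char *provider, const char *name,
                        const char *arg1, const char *arg2)
{
  demo_last_call.provider = provider;
  demo_last_call.name = name;
  demo_last_call.arg1 = arg1;
  demo_last_call.arg2 = arg2;
}

/* Hypothetical stand-in for the real STAP_PROBE2 from <sys/sdt.h>. */
#define STAP_PROBE2(provider, name, a1, a2) \
  demo_record (DEMO_STR (provider), DEMO_STR (name), \
               DEMO_STR (a1), DEMO_STR (a2))

/* Copied from include/stap-probe.h as quoted above (GNU ", ##" extension). */
#define LIBC_PROBE(name, n, ...) \
  LIBC_PROBE_1 (MODULE_NAME, name, n, ## __VA_ARGS__)
#define LIBC_PROBE_1(lib, name, n, ...) \
  STAP_PROBE##n (lib, name, ## __VA_ARGS__)

/* Mirrors the call at elf/rtld.c:1597; "r" is only stringified here. */
static void demo_fire_init_start(void)
{
  LIBC_PROBE (init_start, 2, LM_ID_BASE, r);
}
```

After calling demo_fire_init_start(), the recorded provider is "rtld" and the first argument is the text "0", i.e. by the time the probe macro sees it, LM_ID_BASE is a plain integer constant.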

The question now is why GCC emits 0 in this way.  On x86-64, we get this instead:

  stapsdt               53  Version: 3
    PC: 0x3bfb, Base: 0x25760, Semaphore: 0
    Provider: rtld, Name: init_start, Args: '-4@$0 8@%rbx'
  stapsdt               56  Version: 3
    PC: 0x43d7, Base: 0x25760, Semaphore: 0
    Provider: rtld, Name: init_complete, Args: '-4@$0 8@%rbx'

Comment 35 Jakub Jelinek 2019-01-31 21:29:43 UTC
If you want an answer about GCC, you need to include the exact STAP_PROBE* macros too, so that it is clear what the constraint is, what the operand modifiers are, and what exactly it is referring to.

Comment 36 Sergio Durigan Junior 2019-01-31 21:34:04 UTC
(In reply to Florian Weimer from comment #34)
> Looking at glibc-2.28.9000-37.fc30.armv7hl, it still looks like it's about the
> same probes:

Ah, thanks for looking into this.  It's really hard to get an ARM machine on Beaker now.

> Note section [22] '.note.stapsdt' of 836 bytes at offset 0x21868:
>   Owner          Data size  Type
>   stapsdt               43  Version: 3
>     PC: 0x35dc, Base: 0x203dc, Semaphore: 0
>     Provider: rtld, Name: init_start, Args: '-4@.L1209 4@r4'
>   stapsdt               46  Version: 3
>     PC: 0x3ef0, Base: 0x203dc, Semaphore: 0
>     Provider: rtld, Name: init_complete, Args: '-4@.L1212 4@r4'

Yeah, this is the same problem originally reported.

> We have this in the glibc sources:
> 
> elf/rtld.c:1597:  LIBC_PROBE (init_start, 2, LM_ID_BASE, r);
> elf/rtld.c:2296:  LIBC_PROBE (init_complete, 2, LM_ID_BASE, r);
> 
> And we've got:
> 
> dlfcn/dlfcn.h:47:# define LM_ID_BASE    0       /* Initial namespace.  */
> 
> include/stap-probe.h:43:# define LIBC_PROBE(name, n, ...)       \
> include/stap-probe.h-44-  LIBC_PROBE_1 (MODULE_NAME, name, n, ## __VA_ARGS__)
> include/stap-probe.h-45-
> include/stap-probe.h:46:# define LIBC_PROBE_1(lib, name, n, ...) \
> include/stap-probe.h-47-  STAP_PROBE##n (lib, name, ## __VA_ARGS__)
> 
> So this should turn into:
> 
>   STAP_PROBE2(rtld, init_start, LM_ID_BASE, r);
>   STAP_PROBE2(rtld, init_complete, LM_ID_BASE, r);

Is glibc also using the STAP_SDT_ARG_CONSTRAINT macro?

Comment 37 Frank Ch. Eigler 2019-01-31 21:36:25 UTC
I expect this default for non-ppc is in effect here:

# define STAP_SDT_ARG_CONSTRAINT        nor

and the usual /usr/include/sys/sdt.h - which hardly ever is modified.

Is the problem that the ARM platform can't encode the (int)0 literal any other way than hiding it inside the text/code segment as data, or at least in this context?

Comment 38 Florian Weimer 2019-01-31 21:51:50 UTC
(In reply to Frank Ch. Eigler from comment #37)
> I expect this default for non-ppc is in effect here:
> 
> # define STAP_SDT_ARG_CONSTRAINT        nor
> 
> and the usual /usr/include/sys/sdt.h - which hardly ever is modified.

glibc does not override, either.

> Is the problem that the ARM platform can't encode the (int)0 literal any
> other way than hiding it inside the text/code segment as data, or at least
> in this context?

No, looking at cross-compiler output, the problem is the offsetable memory operand, constraint "o".  I suspect that due to the way the architecture works, the machine description does not prefer constant operands over memory operands (because a constraint such as "no" can never appear for a real instruction, so the alternative simply does not arise in GCC itself).  I'm not actually familiar with Arm, though.

I wonder if Systemtap should just default to "nr" on Arm.

Comment 39 Jakub Jelinek 2019-01-31 21:59:56 UTC
The pushing of the constants into minipool happens in arm_reorg -> note_invalid_constants:
      /* Things we need to fix can only occur in inputs.  */
      if (recog_data.operand_type[opno] != OP_IN)
        continue;

      /* If this alternative is a memory reference, then any mention
         of constants in this alternative is really to fool reload
         into allowing us to accept one there.  We need to fix them up
         now so that we output the right code.  */
      if (op_alt[opno].memory_ok)
        {
          rtx op = recog_data.operand[opno];

          if (CONSTANT_P (op))
            {
              if (do_pushes)
                push_minipool_fix (insn, address, recog_data.operand_loc[opno],
                                   recog_data.operand_mode[opno], op);
            }

The rule it uses is simple: if the constraint allows memory, then it pushes the constant into memory, no matter whether a constant is also allowed or not.
Perhaps a PR should be filed upstream to see what the rationale for that is, and whether inline asm where constants are allowed directly in addition to memory could avoid it, but I guess you need a workaround for older GCC versions anyway.
As Florian said, if the constraints don't allow memory, nothing is changed, so "n" or "nr" etc. works.
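
As a rough illustration of that rule, here is a toy model of where a constant operand ends up on 32-bit Arm for a given constraint string.  It is a sketch of the behaviour described in this comment, not GCC code: it only looks at the letters m, o, n, i and r, it assumes an immediate is preferred over a register when both are allowed, and the demo_* and DEMO_* names are hypothetical.

```c
#include <string.h>

enum demo_placement {
  DEMO_IMMEDIATE,    /* printed as a literal constant */
  DEMO_REGISTER,     /* loaded into a register first */
  DEMO_MINIPOOL,     /* pushed into memory, printed as a ".L..." label */
  DEMO_UNSATISFIABLE /* none of the modelled letters accepts a constant */
};

/* Toy model: any memory alternative wins, regardless of whether the
   constraint would also accept the constant directly. */
static enum demo_placement demo_place_constant_on_arm(const char *constraint)
{
  if (strpbrk(constraint, "mo") != NULL)
    return DEMO_MINIPOOL;
  if (strpbrk(constraint, "ni") != NULL)
    return DEMO_IMMEDIATE;
  if (strchr(constraint, 'r') != NULL)
    return DEMO_REGISTER;
  return DEMO_UNSATISFIABLE;
}
```

Under this model "nor" yields the minipool placement seen in the broken notes, while "nr" and "n" keep the constant as an immediate and "r" forces it into a register.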

Comment 40 Florian Weimer 2019-01-31 22:23:48 UTC
(In reply to Jakub Jelinek from comment #39)
> Perhaps a PR should be filed upstream and see what the rationale is for that
> and if inline asm where the constants are allowed directly in addition to
> memory could avoid that, but I guess you need a workaround for older GCC
> versions anyway.

Thanks, I filed an upstream bug.

Comment 41 Frank Ch. Eigler 2019-01-31 23:10:53 UTC
(In reply to Florian Weimer from comment #38)

> I wonder if Systemtap should just default to "nr" on Arm.

That's possible, but would pessimize all code that uses this header.

Another option is to have glibc put that 0 into a static global var
thing, and pass a reference to that to the stap probe.

Comment 42 Florian Weimer 2019-02-01 10:43:29 UTC
Frank, do you think we can disable glibc Systemtap probes on 32-bit Arm?

Comment 43 Florian Weimer 2019-02-01 10:48:16 UTC
Sergio, what exactly does GDB need here?  I'm surprised by the strong dependency GDB has here.  The Systemtap probes are not supposed to be part of the ABI.

Can we rename the init_start and init_complete probes and drop the constant LM_ID_BASE argument?

Comment 44 Frank Ch. Eigler 2019-02-01 11:30:52 UTC
systemtap per se doesn't require any of these probes.  They are for our users, not ourselves.

GDB may benefit from some of them for shared library event tracking; Sergio etc. may advise.  (These should probably be specially marked in the glibc source code, and perhaps made as simple & portable as possible.)

Just turning off sdt.h markers in arm glibc would be a last resort IMHO.

Comment 45 Sergio Durigan Junior 2019-02-01 18:47:42 UTC
(In reply to Florian Weimer from comment #43)
> Sergio, what exactly does GDB need here?  I'm surprised by the strong
> dependency GDB has here.  The Systemtap probes are not supposed to be part
> of the ABI.

GDB uses these probes for the probes-based dynamic linker interface.  There's a very thorough message describing the problem being tackled here: <https://sourceware.org/ml/gdb-patches/2013-05/msg00624.html>.

It is important to notice that if the probes are not present or if they're not usable due to some error (as in this case), GDB will continue working normally.  The only thing that should be noticeable by the user is a longer time to load information about shared libraries.

> Can we rename the init_start and init_complete probes and drop the constant
> LM_ID_BASE argument?

If you rename the probes, GDB should be adjusted accordingly.

Comment 46 Florian Weimer 2019-02-04 11:49:10 UTC
I filed a glibc upstream bug and will try to fix it there.

Comment 47 Jakub Jelinek 2019-02-04 11:54:36 UTC
As a workaround, if you know that the specific arguments are integral constants, you could use a modified stap macro that has just "nr" for those arguments.

Comment 48 Florian Weimer 2019-02-04 12:00:40 UTC
(In reply to Jakub Jelinek from comment #47)
> As a workaround, if you know for the specific arguments they are integral
> constants, you could use modified stap macro that has just "nr" for those
> arguments.

We don't know which constants the compiler can propagate.  Given that this only affects 32-bit Arm, I would like to fix it once and forget about it.

Comment 49 Gary Benson 2019-02-04 16:38:02 UTC
(In reply to Florian Weimer from comment #43)
> Sergio, what exactly does GDB need here?  I'm surprised by the strong
> dependency GDB has here.  The Systemtap probes are not supposed to be part
> of the ABI.
> 
> Can we rename the init_start and init_complete probes and drop the constant
> LM_ID_BASE argument?

Is the problem here that LM_ID_BASE == 0, and 32-bit arm cannot easily pass a literal zero as a probe argument?  If so then it should be fixed in SDT, I don't think glibc's especially pushing the envelope here.

As for changing the probes, I'd rather you didn't, they're part of a documented interface (see elf/rtld-debugger-interface.txt).

You could kludge it with some arm-specific hack?

Comment 50 Frank Ch. Eigler 2019-02-04 17:38:54 UTC
Possible glibc-side kludges:

 - add a "#define STAP_SDT_ARG_CONSTRAINT nr" to a glibc config header - or the subject .c file
 - copy LM_ID_BASE into a static int and pass that symbol to the sys/sdt.h macro
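
The first kludge works because the override is seen before the header supplies its default.  A minimal sketch of that mechanism follows, with a stand-in for the header: the #ifndef guard is assumed to match what sys/sdt.h does, the default value is the one quoted in comment 37, and the DEMO_* and demo_* names are hypothetical.

```c
/* The override; it has to come before the header is included. */
#define STAP_SDT_ARG_CONSTRAINT nr

/* Stand-in for the relevant part of <sys/sdt.h>: the default only
   applies when nothing was defined earlier (assumed guard). */
#ifndef STAP_SDT_ARG_CONSTRAINT
# define STAP_SDT_ARG_CONSTRAINT nor
#endif

/* Two-level stringification, so the macro's value is what gets quoted. */
#define DEMO_STRINGIFY_1(x) #x
#define DEMO_STRINGIFY(x) DEMO_STRINGIFY_1 (x)

/* Returns the constraint string that the probe operands would get. */
static const char *demo_effective_constraint(void)
{
  return DEMO_STRINGIFY (STAP_SDT_ARG_CONSTRAINT);
}
```

Here demo_effective_constraint() returns "nr"; removing the first #define makes it fall back to "nor".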

Comment 51 Florian Weimer 2019-02-04 18:08:54 UTC
(In reply to Gary Benson from comment #49)
> Is the problem here that LM_ID_BASE == 0, and 32-bit arm cannot easily pass
> a literal zero as a probe argument?  If so then it should be fixed in SDT, I
> don't think glibc's especially pushing the envelope here.

GCC will never generate a literal constant when the "o" constraint is present.  What real Arm instructions can do in practice only matters insofar as this problem can never happen with a real instruction, because there is nothing that would fit a "no" alternative.

> As for changing the probes, I'd rather you didn't, they're part of a
> documented interface (see elf/rtld-debugger-interface.txt).

We document elsewhere that the Systemtap probes are not considered ABI, so I find this a bit surprising.

(In reply to Frank Ch. Eigler from comment #50)
> Possible glibc-side kludges:
> 
>  - add a "#define STAP_SDT_ARG_CONSTRAINT nr" to a glibc config header - or
> the subject .c file
>  - copy LM_ID_BASE into a static int and pass that symbol to the sys/sdt.h
> macro

I posted a real patch: https://sourceware.org/ml/libc-alpha/2019-02/msg00082.html

Comment 52 Florian Weimer 2019-02-06 11:45:42 UTC
I believe this bug will be fixed in glibc-2.29-3.fc30.  (The initial attempt to build it failed with an unrelated SIGKILL on ppc64le, though.)

Comment 53 Florian Weimer 2019-02-06 13:52:50 UTC
(In reply to Florian Weimer from comment #52)
> I believe this bug will be fixed in glibc-2.29-3.fc30.  (The initial attempt
> to build it failed with an unrelated SIGKILL on ppc64le, though.)

Sorry, rebuild of glibc in rawhide is currently blocked by bug 1673018.

Comment 54 Florian Weimer 2019-02-07 18:45:30 UTC
I managed to work around the ppc64le bug, and the armhfp fix for this bug here should be in glibc-2.29-4.fc30.  Sergio, could you give this a try, please?

If it fixes the issue, I will backport the change to all active Fedora releases.

Comment 55 Florian Weimer 2019-02-11 20:54:07 UTC
Note: Need to fix bug 1674280 before backporting this, otherwise we'll have a regression on Arm.

Comment 56 Sergio Durigan Junior 2019-02-20 19:23:33 UTC
(In reply to Florian Weimer from comment #54)
> I managed to work around the ppc64le bug, and the armhfp fix for this bug
> here should be in glibc-2.29-4.fc30.  Sergio, could you give this a try,
> please?
> 
> If it fixes the issue, I will backport the change to all active Fedora
> releases.

So far, I haven't managed to reserve an ARM machine to test this.  I tried using one of Fedora's Test Machines, and I also tried generating a scratch build and submitting it to Koji.  In both attempts, the "test_gdb" test failed, but for other reasons, which suggests that the probe issue has been fixed.  I'm trying to contact the Python 3 guys and ask them to run some tests on their end, since they seem to have easy access to a setup that triggers this problem.  I'll let you know when I hear back from them.

Comment 57 Victor Stinner 2019-04-03 09:59:56 UTC
The test_gdb test of the python3 package was skipped on armv7hl because of this bug. I wrote a PR to reenable the test on armv7hl:
https://src.fedoraproject.org/rpms/python3/pull-request/104

Comment 58 Sergio Durigan Junior 2019-04-04 16:07:44 UTC
Victor and I have been trying to get an ARM machine to test this, but it seems to be nearly impossible.  I'm afraid we will have to move forward with this without doing a proper test.  Python's "test_gdb" still fails when we build on ARM, but, *apparently*, it is not the same failure that was happening before.  Which means that, *apparently*, the bug has been fixed there.  I believe we can close this bug.  We can always reopen it if needed.

Comment 60 Florian Weimer 2019-04-04 17:32:03 UTC
Fedora 30 and later will have the upstream change.  Given the testing situation outlined in comment 58, I will not attempt to backport this change.

