132315 – Problems with libthread_db startup when no-threads

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	glibc
Sub Component:
Version:	3.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Roland McGrath
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	108897

Reported:	2004-09-10 20:43 UTC by Andrew Cagney
Modified:	2007-11-30 22:07 UTC (History)
CC List:	1 user
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2005-07-25 22:42:34 UTC
Target Upstream Version:
Embargoed:

Description Andrew Cagney 2004-09-10 20:43:03 UTC

Roland writes:

PTRACE_GET_THREAD_AREA should never fail unless its argument is bogus (or
the thread is not stopped or such conditions that make all ptrace calls
fail).  Have you caught the failing ptrace call and seen its arguments and
error code?  If it's not an EINVAL failure due to a bogus argument, then
the ptrace failure may be some new kernel bug I'm not aware of.
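
A minimal sketch of how a debugger-side caller might capture the error code asked about above. The wrapper name checked_ptrace and its out-parameter are hypothetical and are not part of gdb or glibc; only ptrace and errno are real. Clearing errno first matters because PEEK-style requests can legitimately return -1.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Hypothetical helper: issue one ptrace request and report the errno it
   left behind.  *err_out is 0 when the call did not set errno.  */
static long
checked_ptrace (int request, pid_t pid, void *addr, void *data, int *err_out)
{
  long ret;

  assert (err_out != NULL);
  errno = 0;                 /* so a stale value cannot mislead us */
  ret = ptrace (request, pid, addr, data);
  *err_out = errno;          /* capture before any other libc call */
  return ret;
}
```

For the failing call in this report, the value captured this way would be EINVAL, matching the strace output quoted in comment 2.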

Hmm, I think I might have a clue as to the problem.  td_ta_thr_iter has a
special case for the early-startup situation, which uses td_ta_map_lwp2thr
(whereas normally it will just work by reading the inferior's data
structures).  In a dynamically linked program, the thread register setup
is done in the dynamic linker early on, before it ever loads libpthread.
So I bet what happens is that gdb never tries to use these libthread_db
functions so early on in that case (td_ta_new won't work until libpthread
is loaded in the inferior).  By the time libpthread has been loaded and
libthread_db starts being used, the dynamic linker has set up the thread
registers and all is hunky dory.  In the static case, gdb is trying to
figure out life via libthread_db before the setup has happened.

In early startup, the thread ID (pthread_t value, delivered in th_unique)
has not actually been assigned.  So there is no way to have
td_ta_map_lwp2thr and td_ta_thr_iter deliver as a special case at startup
the same th_unique value that will be seen later on for the initial LWP.
On machines that use a normal register instead of the thread_area magic,
there is no call here to fail, and in th_unique you will just get the
value of that register before it's been set up as the thread register
(possibly garbage, but I'm pretty sure those are always going to be zero).

Can you file a bug on glibc and assign to me for this?

I think the solution will have to be making libthread_db fail gracefully
at this point.  If gdb can cope with a routine like td_ta_thr_iter or
td_ta_map_lwp2thr failing here, we can just have them do so with a more
distinct error code.  To be consistent with what gdb sees from
libthread_db in the dynamic case, td_ta_new would fail with TD_NOLIBTHREAD
until the inferior has done enough initialization for these calls to work.
That might be a little more difficult to make happen, but is most likely
doable if it is best for gdb.

Comment 1 Roland McGrath 2004-09-11 03:08:07 UTC

Does the obvious test case for this work with some available gdb?
Or else, please provide a test case (pointing me at some hacked gdb is
fine).

Comment 2 Andrew Cagney 2004-09-13 22:25:36 UTC

> Please report what arguments are passed to ptrace in the call that
> fails.

> Please use strace on gdb to see what it says ptrace is returning.
> Then please verify thrice that in fact ptrace fails to set errno
> nonzero.
> If verified, that is a kernel or libc bug for failing to report the
> right value.

Here's an strace extract:

ptrace(PTRACE_GETFPXREGS, 8295, 0, 0xbfff8ef8) = 0
ptrace(0x19 /* PTRACE_??? */, 8295, 0, 0xbfff9218) = -1 EINVAL (Invalid argument)
open("/usr/share/locale/en_US.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en_US.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
write(2, "ptrace get thread area: Invalid "..., 41ptrace get thread area: Invalid argument
) = 41

It is returning EINVAL (disregard my other comment).

Comment 3 Andrew Cagney 2004-09-13 23:18:56 UTC

> Does the obvious test case for this work with some available gdb?
> Or else, please provide a test case (pointing me at some hacked gdb is
> fine).

The obvious testcase doesn't work - GDB needs to be hacked to force an
attempted thread-db load at process startup.  Try:

cagney@tomago$ ~cagney/PENDING/rh-th-static/N-tomago-i686-pc-linux-gnu/gdb/gdb ~cagney/PENDING/rh-th-static/N-tomago-i686-pc-linux-gnu/gdb/a.out
GNU gdb 2004-09-08-cvs
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
(gdb) set debug lin-lwp 1
(gdb) b main
Breakpoint 1 at 0x8048274: file ../../src/gdb/testsuite/gdb.threads/staticthreads.c, line 53.
(gdb) run
Starting program: /home/cagney/PENDING/rh-th-static/N-tomago-i686-pc-linux-gnu/gdb/a.out
CW:  waitpid 10730 received Trace/breakpoint trap (stopped)
CW:  waitpid 10730 received Child exited (stopped)
CW:  waitpid 10730 received Trace/breakpoint trap (stopped)
Using host libthread_db library "/lib/tls/libthread_db.so.1".
[Thread debugging using libthread_db enabled]
ptrace get thread area: Invalid argument
warning: Cannot find new threads: generic error
LLR: PTRACE_SINGLESTEP process 10730, 0 (resume event thread)
LLW: waitpid 10730 received Trace/breakpoint trap (stopped)

At this point GDB has locked up, apparently confused because there are
no threads and yet it is trying to wait on threads.

The ptrace fails due to a zero ADDR.
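
To illustrate why a zero ADDR is rejected, here is a small sketch. It assumes the i386 case, where the ADDR argument of PTRACE_GET_THREAD_AREA is a GDT entry index obtained from the %gs selector by dropping its low three bits, and where the kernel accepts only the TLS slots (assumed here to be entries 6 through 8, as in the i386 Linux GDT layout). The constants and helper names are illustrative and are not taken from gdb or glibc.

```c
#include <assert.h>
#include <errno.h>

/* Assumed values from the i386 Linux GDT layout; other architectures and
   kernel versions may differ.  */
enum
{
  TLS_GDT_MIN = 6,
  TLS_GDT_ENTRIES = 3,
  TLS_GDT_MAX = TLS_GDT_MIN + TLS_GDT_ENTRIES - 1
};

/* An x86 segment selector is 16 bits wide and keeps the table index in
   bits 3..15; the low three bits are the table indicator and the
   requested privilege level.  */
static unsigned int
selector_to_index (unsigned int selector)
{
  assert (selector <= 0xffffu);
  return selector >> 3;
}

/* Hypothetical model of the kernel-side range check: returns 0 if the
   index names a TLS slot, otherwise EINVAL.  */
static int
thread_area_index_error (unsigned int idx)
{
  return (idx >= TLS_GDT_MIN && idx <= TLS_GDT_MAX) ? 0 : EINVAL;
}
```

Under these assumptions a %gs of 0, as seen before the thread register is set up, yields index 0 and therefore EINVAL, while a selector such as 0x33 yields index 6 and passes the check.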

What do you mean by:

> So there is no way to have td_ta_map_lwp2thr and td_ta_thr_iter
> deliver as a special case at startup the same th_unique value
> that will be seen later on for the initial LWP.

You write:

> To be consistent with what gdb sees from libthread_db in the dynamic
> case, td_ta_new would fail with TD_NOLIBTHREAD until the inferior has
> done enough initialization for these calls to work.  That might be a
> little more difficult to make happen, but is most likely doable if it
> is best for gdb.

For the static case, if td_ta_new returned TD_NOLIBTHREAD, GDB would
have to assume that there's no libthread-db at all and hence would not
set up any event trap that could lead to it seeing that libthread-db
could be loaded.  Oops.

That would mean an additional state ``libthread db loaded but not
active'' and a corresponding event to indicate that it is active.

--

I'm left wondering if GDB can instead trigger on a ptrace event and/or
just always assume there are threads.

Comment 4 Roland McGrath 2004-09-14 23:42:41 UTC

To clarify my earlier remark, basically for an early part of the
execution of the program, the "thread" abstraction simply does not
exist yet--only the LWP layer exists (with just the initial LWP alive
so far).  The "thread ID" that the libthread_db interface deals with
is determined dynamically by code in the inferior that has not run
yet.  There is no meaningful value to tell you, and in fact the data
structures that things like td_thr_event_enable modify do not even
exist yet.

I think the thing to do is leave td_ta_new as it is, so it will work
right away (even before the process has run if the libpthread symbols
and text are available in the static binary).  

I will fix td_ta_map_lwp2thr so that it fails characteristically in
the situation of an LWP that is not yet initialized to correspond to
any thread ID, so it returns TD_NOTHR instead of TD_ERR (and won't
attempt the bogus ps_get_thread_area callback).

This will make td_ta_thr_iter also fail with TD_NOTHR when there are
no threads at all to find.  If gdb attempts this and just handles it
failing during early phases, meaning there is just the known LWP
state, then that should be fine.
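
A rough model of the behaviour proposed here. It is not glibc's code, only a sketch under stated assumptions: the lookup checks whether the thread register has been set up before invoking the debugger's thread-area callback, and reports "no such thread" otherwise. All type and function names are hypothetical, the error codes are local stand-ins for TD_OK, TD_ERR and TD_NOTHR, and a zero selector is assumed to mean "not yet initialized".

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the td_err_e values discussed above.  */
typedef enum { SK_OK, SK_ERR, SK_NOTHR } sk_err;

/* Hypothetical callback type modelled on ps_get_thread_area: maps a
   descriptor index to the thread area base address, 0 on success.  */
typedef int (*sk_get_area_fn) (int lwpid, unsigned int idx,
                               unsigned long *base);

/* Map an LWP to its thread identifier.  A zero selector is taken to mean
   that startup has not yet initialized the thread register, so there is
   no thread to report and the callback is never attempted.  */
static sk_err
sk_map_lwp2thr (int lwpid, unsigned int selector, sk_get_area_fn get_area,
                unsigned long *th_unique)
{
  assert (get_area != NULL && th_unique != NULL);

  if (selector == 0)
    return SK_NOTHR;

  if (get_area (lwpid, selector >> 3, th_unique) != 0)
    return SK_ERR;

  return SK_OK;
}

/* Demo callback for illustration only: counts its calls and hands back
   a fixed base address.  */
static int sk_demo_calls;

static int
sk_demo_get_area (int lwpid, unsigned int idx, unsigned long *base)
{
  (void) lwpid;
  (void) idx;
  sk_demo_calls++;
  *base = 0x1000;
  return 0;
}
```

With the demo callback, a lookup on a zero selector returns SK_NOTHR and leaves sk_demo_calls at 0, which is the property that avoids the bogus ptrace request.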

The issue remains of how to make sure gdb stops to make this attempt
before the first time a new thread is created.  Until startup has
progressed far enough that map_lwp2thr/thr_iter work on the main LWP,
there is no way to enable the event reporting so that pthread_create
will hit the event breakpoint.

My first idea is to abuse the event reporting interface a bit so that
there is a certain event address that the initial LWP will always hit
sometime in startup before thread creation is possible, and it will do
so even without event reporting being enabled.  TD_SWITCHTO actually
has an appropriate meaning.  I could make td_ta_event_addr on
TD_SWITCHTO return an address in the program that will be run before
any user code that might create threads.

Comment 5 Andrew Cagney 2004-09-15 23:29:33 UTC

Let's see if I can remember the tools meeting discussion.

We can:

- have td_ta_new succeed when the thread-db is valid
So the staticthreads call would fail.

- have gdb try to load libthread-db when:
--- it's just attached (or started an inferior or ...)
--- it sees a CLONE event
--- for compatibility, it sees an shlib load

This should give us a fairly high level of forward / backward
compatibility.

Comment 6 Roland McGrath 2004-09-16 01:12:37 UTC

In fact, I think it is best not to change td_ta_new.  It will continue
to do as it does now, which means it will fail with TD_NOLIBTHREAD
when the libpthread symbols are not found.  In a static binary, they
will always be found and so it will work the first time.  In a dynamic
binary, it will work as soon as the DSO is loaded.

So, at startup/attach try td_ta_new.  Until td_ta_new has returned
something other than TD_NOLIBTHREAD, try again on a shlib load.
In a static binary where td_ta_new failed the first time, it will
never work later (the binary just doesn't have libpthread in it),
so there is no need to try it at stops other than shlib load.

However, td_ta_thr_iter and td_ta_map_lwp2thr will fail with TD_NOTHR
during the period when the initial LWP has not yet initialized its
thread register.  After td_ta_new works, then try td_ta_thr_iter.
If td_ta_thr_iter fails with TD_NOTHR, try it again at the first CLONE
event stop.  
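
The retry policy described above could be sketched as a small state machine on the debugger side. This is only an illustration under stated assumptions and not gdb's implementation: the state names, event names and hook structure are hypothetical, and the result codes are local stand-ins for the libthread_db values named in this comment.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the libthread_db results named above.  */
typedef enum { R_OK, R_NOLIBTHREAD, R_NOTHR, R_ERR } result;

typedef enum { EV_STARTUP, EV_SHLIB_LOAD, EV_CLONE } stop_event;

typedef enum
{
  ST_NO_AGENT,    /* td_ta_new has not succeeded yet */
  ST_AGENT_IDLE,  /* agent exists, but no thread is visible yet */
  ST_ACTIVE,      /* thread iteration works */
  ST_FAILED       /* unexpected error */
} tdb_state;

/* Hypothetical hooks standing in for td_ta_new and td_ta_thr_iter.  */
typedef struct
{
  result (*new_agent) (void);
  result (*iterate) (void);
} tdb_hooks;

/* Advance the state for one stop of the inferior.  */
static tdb_state
tdb_on_stop (tdb_state st, stop_event ev, const tdb_hooks *hooks)
{
  int try_iter = 0;
  result r;

  assert (hooks != NULL);

  if (st == ST_NO_AGENT && (ev == EV_STARTUP || ev == EV_SHLIB_LOAD))
    {
      /* Try td_ta_new at startup/attach, then again at each shlib load.  */
      r = hooks->new_agent ();
      if (r == R_NOLIBTHREAD)
        return ST_NO_AGENT;
      if (r != R_OK)
        return ST_FAILED;
      st = ST_AGENT_IDLE;
      try_iter = 1;           /* agent just created: iterate right away */
    }
  else if (st == ST_AGENT_IDLE && ev == EV_CLONE)
    try_iter = 1;             /* retry iteration at a CLONE event stop */

  if (try_iter)
    {
      r = hooks->iterate ();
      if (r == R_OK)
        return ST_ACTIVE;
      if (r != R_NOTHR)
        return ST_FAILED;
    }

  return st;
}

/* Demo hooks with scripted results, for illustration only.  */
static result demo_new_result = R_NOLIBTHREAD;
static result demo_iter_result = R_NOTHR;
static result demo_new (void) { return demo_new_result; }
static result demo_iter (void) { return demo_iter_result; }
static const tdb_hooks demo_hooks = { demo_new, demo_iter };
```

In this sketch a static binary would normally leave ST_NO_AGENT on the very first stop, sit in ST_AGENT_IDLE while iteration reports the stand-in for TD_NOTHR, and become ST_ACTIVE at the first CLONE stop where iteration succeeds.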

Btw, both 2.6 and RHEL3 do have the CLONE tracing facilities.
Vanilla 2.4 does not.  However, vanilla 2.4 only supports linuxthreads
anyway, and linuxthreads doesn't have this issue: libthread_db will
fake the correct results at any time during startup, since they don't
depend on the same kind of dynamic initialization.

Comment 8 Ulrich Drepper 2005-07-25 22:42:34 UTC

Closing it since there has been no reaction.