Red Hat Bugzilla – Full Text Bug Listing
|Summary:||Problems with libthread_db startup when no-threads|
|Product:||Red Hat Enterprise Linux 3||Reporter:||Andrew Cagney <cagney>|
|Component:||glibc||Assignee:||Roland McGrath <roland>|
|Status:||CLOSED WONTFIX||QA Contact:|
|Fixed In Version:||Doc Type:||Bug Fix|
|Doc Text:||Story Points:||---|
|Last Closed:||2005-07-25 18:42:34 EDT||Type:||---|
Description Andrew Cagney 2004-09-10 16:43:03 EDT
Roland writes:

PTRACE_GET_THREAD_AREA should never fail unless its argument is bogus (or the thread is not stopped or such conditions that make all ptrace calls fail). Have you caught the failing ptrace call and seen its arguments and error code? If it's not an EINVAL failure due to a bogus argument, then the ptrace failure may be some new kernel bug I'm not aware of.

Hmm, I think I might have a clue as to the problem. td_ta_thr_iter has a special case for the early-startup situation, which uses td_ta_map_lwp2thr (whereas normally it will just work by reading the inferior's data structures). In a dynamically linked program, the thread register setup is done in the dynamic linker early on, before it ever loads libpthread. So I bet what happens is that gdb never tries to use these libthread_db functions so early on in that case (td_ta_new won't work until libpthread is loaded in the inferior). By the time libpthread has been loaded and libthread_db starts being used, the dynamic linker has set up the thread registers and all is hunky dory. In the static case, gdb is trying to figure out life via libthread_db before the setup has happened.

In early startup, the thread ID (pthread_t value, delivered in th_unique) has not actually been assigned. So there is no way to have td_ta_map_lwp2thr and td_ta_thr_iter deliver as a special case at startup the same th_unique value that will be seen later on for the initial LWP. On machines that use a normal register instead of the thread_area magic, there is no call here to fail, and in th_unique you will just get the value of that register before it's been set up as the thread register (possibly garbage, but I'm pretty sure those are always going to be zero).

Can you file a bug on glibc and assign to me for this? I think the solution will have to be making libthread_db fail gracefully at this point. If gdb can cope with a routine like td_ta_thr_iter or td_ta_map_lwp2thr failing here, we can just have them do so with a more distinct error code. To be consistent with what gdb sees from libthread_db in the dynamic case, td_ta_new would fail with TD_NOLIBTHREAD until the inferior has done enough initialization for these calls to work. That might be a little more difficult to make happen, but is most likely doable if it is best for gdb.
Comment 1 Roland McGrath 2004-09-10 23:08:07 EDT
Does the obvious test case for this work with some available gdb? Or else, please provide a test case (pointing me at some hacked gdb is fine).
Comment 2 Andrew Cagney 2004-09-13 18:25:36 EDT
> Please report what arguments are passed to ptrace in the call that fails.
> Please use strace on gdb to see what it says ptrace is returning.
> Then please verify thrice that in fact ptrace fails to set errno nonzero.
> If verified, that is a kernel or libc bug for failing to report the right value.

here's a strace extract:

ptrace(PTRACE_GETFPXREGS, 8295, 0, 0xbfff8ef8) = 0
ptrace(0x19 /* PTRACE_??? */, 8295, 0, 0xbfff9218) = -1 EINVAL (Invalid argument)
open("/usr/share/locale/en_US.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en_US.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
write(2, "ptrace get thread area: Invalid "..., 41ptrace get thread area: Invalid argument
) = 41

it is returning EINVAL (disregard my other comment)
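For anyone decoding the extract: the request strace prints as 0x19 /* PTRACE_??? */ is decimal 25, which on i386 is PTRACE_GET_THREAD_AREA, consistent with the "ptrace get thread area" message gdb writes right afterwards. A minimal C sketch of that lookup follows. The table and the helper name ptrace_request_name are hypothetical, made up for illustration, and the numeric values are assumed to match the i386 kernel headers.

```c
#include <assert.h>
#include <stddef.h>

/* A few i386 ptrace request numbers (assumed values, i386 only). */
struct req_name {
    int request;
    const char *name;
};

static const struct req_name i386_requests[] = {
    { 18, "PTRACE_GETFPXREGS" },
    { 25, "PTRACE_GET_THREAD_AREA" },
    { 26, "PTRACE_SET_THREAD_AREA" },
};

/* Hypothetical helper: name a request number, falling back to the
   same "PTRACE_???" placeholder that strace prints for unknown ones. */
const char *ptrace_request_name(int request)
{
    size_t i;

    assert(request >= 0);
    for (i = 0; i < sizeof i386_requests / sizeof i386_requests[0]; i++) {
        if (i386_requests[i].request == request)
            return i386_requests[i].name;
    }
    return "PTRACE_???";
}
```

Calling ptrace_request_name(0x19) with this table yields the PTRACE_GET_THREAD_AREA entry, so the failing call in the extract is the thread-area lookup with a zero third argument.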
Comment 3 Andrew Cagney 2004-09-13 19:18:56 EDT
> Does the obvious test case for this work with some available gdb?
> Or else, please provide a test case (pointing me at some hacked gdb is
> fine).

The obvious testcase doesn't work - GDB needs to be hacked to force an attempted thread-db load at process startup. Try:

cagney@tomago$ ~cagney/PENDING/rh-th-static/N-tomago-i686-pc-linux-gnu/gdb/gdb ~cagney/PENDING/rh-th-static/N-tomago-i686-pc-linux-gnu/gdb/a.out
GNU gdb 2004-09-08-cvs
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
(gdb) set debug lin-lwp 1
(gdb) b main
Breakpoint 1 at 0x8048274: file ../../src/gdb/testsuite/gdb.threads/staticthreads.c, line 53.
(gdb) run
Starting program: /home/cagney/PENDING/rh-th-static/N-tomago-i686-pc-linux-gnu/gdb/a.out
CW: waitpid 10730 received Trace/breakpoint trap (stopped)
CW: waitpid 10730 received Child exited (stopped)
CW: waitpid 10730 received Trace/breakpoint trap (stopped)
Using host libthread_db library "/lib/tls/libthread_db.so.1".
[Thread debugging using libthread_db enabled]
ptrace get thread area: Invalid argument
warning: Cannot find new threads: generic error
LLR: PTRACE_SINGLESTEP process 10730, 0 (resume event thread)
LLW: waitpid 10730 received Trace/breakpoint trap (stopped)

At this point GDB has locked up due to confusion over there being no threads yet it is trying to wait on threads. The ptrace fails due to a zero ADDR.

What do you mean by:

> So there is no way to have td_ta_map_lwp2thr and td_ta_thr_iter deliver as a special case at startup the same th_unique value that will be seen later on for the initial LWP.

You write:

> To be consistent with what gdb sees from libthread_db in the dynamic case, td_ta_new would fail with TD_NOLIBTHREAD until the inferior has done enough initialization for these calls to work. That might be a little more difficult to make happen, but is most likely doable if it is best for gdb.

For the static case, if td_ta_new returned TD_NOLIBTHREAD, GDB would have to assume that there's no libthread-db at all and hence would not set up any event trap that could lead to it seeing that libthread-db could be loaded. Oops. That would mean an additional state ``libthread db loaded but not active'' and a corresponding event to indicate that it is active.

--
I'm left wondering if GDB can instead trigger on a ptrace event and/or just always assume there are threads.
Comment 4 Roland McGrath 2004-09-14 19:42:41 EDT
To clarify my earlier remark, basically for an early part of the execution of the program, the "thread" abstraction simply does not exist yet--only the LWP layer exists (with just the initial LWP alive so far). The "thread ID" that the libthread_db interface deals with is determined dynamically by code in the inferior that has not run yet. There is no meaningful value to tell you, and in fact the data structures that things like td_thr_event_enable modify do not even exist yet.

I think the thing to do is leave td_ta_new as it is, so it will work right away (even before the process has run if the libpthread symbols and text are available in the static binary). I will fix td_ta_map_lwp2thr so that it fails characteristically in the situation of an LWP that is not yet initialized to correspond to any thread ID, so it returns TD_NOTHR instead of TD_ERR (and won't attempt the bogus ps_get_thread_area callback). This will make td_ta_thr_iter also fail with TD_NOTHR when there are no threads at all to find. If gdb attempts this and just handles it failing during early phases, meaning there is just the known LWP state, then that should be fine.

The issue remains of how to make sure gdb stops to make this attempt before the first time a new thread is created. Until startup has progressed far enough that map_lwp2thr/thr_iter work on the main LWP, there is no way to enable the event reporting so that pthread_create will hit the event breakpoint.

My first idea is to abuse the event reporting interface a bit so that there is a certain event address that the initial LWP will always hit sometime in startup before thread creation is possible, and it will do so even without event reporting being enabled. TD_SWITCHTO actually has an appropriate meaning. I could make td_ta_event_addr on TD_SWITCHTO return an address in the program that will be run before any user code that might create threads.
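The proposed td_ta_map_lwp2thr behaviour can be sketched in C roughly as below. This is not the libthread_db source: the sk_ names, the callback type, and the fake callback are hypothetical stand-ins, and the sketch assumes that an uninitialized thread register reads as zero (which the description above only calls likely). It shows the lookup returning a distinct "no thread yet" code without ever invoking the thread-area callback, and a debugger-side check that treats only the generic error as fatal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the TD_OK, TD_ERR and TD_NOTHR codes discussed. */
typedef enum { SK_OK, SK_ERR, SK_NOTHR } sk_err;

/* Hypothetical stand-in for the ps_get_thread_area callback:
   returns 0 on success and stores the thread area base in *base. */
typedef int (*sk_get_area_fn)(int lwpid, uintptr_t idx, uintptr_t *base);

int sk_area_calls = 0; /* counts callback invocations in this sketch */

/* Fake callback used only to make the sketch self-contained. */
int sk_fake_get_area(int lwpid, uintptr_t idx, uintptr_t *base)
{
    (void)lwpid;
    sk_area_calls++;
    *base = 0x1000u + idx;
    return 0;
}

/* Map an LWP to its th_unique value. thread_reg is the raw thread
   register value; zero is ASSUMED to mean "not set up yet". */
sk_err sk_map_lwp2thr(int lwpid, uintptr_t thread_reg,
                      sk_get_area_fn get_area, uintptr_t *th_unique)
{
    assert(th_unique != NULL);
    if (thread_reg == 0)
        return SK_NOTHR;         /* too early: skip the callback entirely */
    if (get_area == NULL) {      /* machines using a plain register */
        *th_unique = thread_reg;
        return SK_OK;
    }
    return get_area(lwpid, thread_reg, th_unique) == 0 ? SK_OK : SK_ERR;
}

/* Debugger side: "no thread yet" means only the known LWP exists,
   so only the generic error is treated as a real failure. */
int sk_lookup_is_fatal(sk_err rc)
{
    return rc == SK_ERR;
}
```

With this shape, a debugger that sees the "no thread yet" code can keep tracking the single known LWP and retry later instead of reporting "Cannot find new threads: generic error".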
Comment 5 Andrew Cagney 2004-09-15 19:29:33 EDT
Let's see if I can remember the tools meeting discussion. We can:

- have td_ta_new succeed when the thread-db is valid
  So the staticthreads call would fail.
- have gdb try to load libthread-db when:
--- it's just attached (or started an inferior or ...)
--- it sees a CLONE event
--- for compatibility, it sees an shlib load

This should give us a fairly high level of forward / backward compatibility.
Comment 6 Roland McGrath 2004-09-15 21:12:37 EDT
In fact, I think it is best not to change td_ta_new. It will continue to do as it does now, which means it will fail with TD_NOLIBTHREAD when the libpthread symbols are not found. In a static binary, they will always be found and so it will work the first time. In a dynamic binary, it will work as soon as the DSO is loaded.

So, at startup/attach try td_ta_new. Until td_ta_new has returned something other than TD_NOLIBTHREAD, try again on a shlib load. In a static binary where td_ta_new failed the first time, it will never work later (the binary just doesn't have libpthread in it), so there is no need to try it at stops other than shlib load.

However, td_ta_thr_iter and td_ta_map_lwp2thr will fail with TD_NOTHR during the period when the initial LWP has not yet initialized its thread register. After td_ta_new works, then try td_ta_thr_iter. If td_ta_thr_iter fails with TD_NOTHR, try it again at the first CLONE event stop.

Btw, both 2.6 and RHEL3 do have the CLONE tracing facilities. Vanilla 2.4 does not. However, vanilla 2.4 only supports linuxthreads anyway, and linuxthreads doesn't have this issue: libthread_db will fake the correct results at any time during startup, because they don't depend on the same kind of dynamic initialization.
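The retry sequence described above can be written down as a small state machine. The C sketch below is illustrative only and is not gdb or libthread_db code: every sk_ name is hypothetical, and the fake inferior simply returns preset results so the ordering of attempts is visible. The middle state corresponds to the "libthread db loaded but not active" condition raised earlier in this report.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for TD_OK, TD_NOTHR and TD_NOLIBTHREAD. */
typedef enum { SK_OK, SK_NOTHR, SK_NOLIBTHREAD } sk_err;

/* Kinds of stop the debugger might see (hypothetical names). */
typedef enum {
    SK_EV_STARTUP_OR_ATTACH, SK_EV_SHLIB_LOAD, SK_EV_CLONE, SK_EV_OTHER
} sk_event;

typedef enum {
    SK_NO_AGENT,          /* td_ta_new has not succeeded yet */
    SK_AGENT_NO_THREADS,  /* td_ta_new worked, thread layer not ready */
    SK_THREADS_ACTIVE     /* td_ta_thr_iter worked */
} sk_state;

/* Fake inferior: preset results plus call counters. */
typedef struct {
    sk_err ta_new_result;
    sk_err thr_iter_result;
    int ta_new_calls;
    int thr_iter_calls;
} sk_fake_inferior;

sk_err sk_ta_new(sk_fake_inferior *inf)
{
    inf->ta_new_calls++;
    return inf->ta_new_result;
}

sk_err sk_thr_iter(sk_fake_inferior *inf)
{
    inf->thr_iter_calls++;
    return inf->thr_iter_result;
}

/* Decide what to attempt at one stop and return the new state. */
sk_state sk_on_stop(sk_state st, sk_event ev, sk_fake_inferior *inf)
{
    int agent_is_new = 0;

    assert(inf != NULL);
    /* Try td_ta_new only at startup/attach and on shlib loads. */
    if (st == SK_NO_AGENT &&
        (ev == SK_EV_STARTUP_OR_ATTACH || ev == SK_EV_SHLIB_LOAD)) {
        if (sk_ta_new(inf) != SK_OK)
            return SK_NO_AGENT;
        st = SK_AGENT_NO_THREADS;
        agent_is_new = 1;
    }
    /* Try td_ta_thr_iter right after td_ta_new works, then again
       only at a CLONE event stop. */
    if (st == SK_AGENT_NO_THREADS && (agent_is_new || ev == SK_EV_CLONE)) {
        if (sk_thr_iter(inf) == SK_OK)
            st = SK_THREADS_ACTIVE;
    }
    return st;
}
```

A debugger loop would call sk_on_stop once per stop, carrying the returned state forward. In a static binary the first call usually moves straight to the middle state, and the CLONE stop then completes the transition.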
Comment 8 Ulrich Drepper 2005-07-25 18:42:34 EDT
Closing it since there has been no reaction.