Description of problem:
This is still broken in GDB mainline. If you run GDB's testsuite's print-threads program within GDB and single-step main until some thread enters zombie state, and at that point run info threads, GDB will crash because prune_threads() in the beginning of info_threads_command() deletes the thread from the thread_list because it's no longer active, but iterating over threads still sees them, and then crashes as described below.

Version-Release number of selected component (if applicable):
gdb-6.3.0.0-1.130, rawhide, GDB trunk in upstream CVS

How reproducible:
Sometimes, but not always

Steps to Reproduce:
1. Start print-threads within GDB
2. Single-step in main
3. Run `info threads' after every pthread_create

Alternate reproducer (works every time, at least in rawhide):
1. Start print-threads within GDB
2. Set a breakpoint in siglongjmp or in __pthread_unwind
3. Run the program
4. When the program stops at the breakpoint, run `info threads'

Actual results:
Most often, GDB will crash at `info threads', when switch_to_thread() tries to read the PC from the deleted thread, which ultimately calls thread_db_fetch_registers() with an inferior_ptid that is no longer in thread_list, such that find_thread_pid() returns NULL, and then thread_db_map_id2thr(thread_info, 1) crashes when it dereferences thread_info.

Sometimes GDB will print errors such as:

Cannot insert breakpoint -16.
Error accessing memory address 0x1170650b: Input/output error.

with nonsensical memory addresses. Such addresses are extracted from a jmpbuf that is longjmp()ed to as part of thread unwinding started by pthread_exit(). This might be related to the crash, or it might just be that GDB extracts the wrong address for some yet-to-be-determined reason.

Expected results:
No such crash

Additional info:
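The crash path described in this report (a thread is pruned from thread_list, a later lookup returns NULL, and the result is dereferenced) can be illustrated with a small standalone model. This is only a sketch under stated assumptions: the struct layout, the integer thread ids, and the helpers add_thread(), mark_thread_exited() and fetch_registers_checked() are hypothetical stand-ins, and find_thread_pid() and prune_threads() here merely borrow the names of the GDB functions without reproducing their real signatures or behavior. It is not the fix that was applied upstream.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for GDB's per-thread record (hypothetical layout). */
struct thread_info {
    int ptid;                  /* simplified integer thread id */
    int alive;                 /* cleared once the thread has exited */
    struct thread_info *next;
};

struct thread_info *thread_list = NULL;

/* Hypothetical helper: prepend a live thread to the list. */
void add_thread(int ptid)
{
    struct thread_info *tp = malloc(sizeof *tp);
    if (tp == NULL)
        abort();
    tp->ptid = ptid;
    tp->alive = 1;
    tp->next = thread_list;
    thread_list = tp;
}

/* Model of the lookup: returns NULL when the id is not in the list. */
struct thread_info *find_thread_pid(int ptid)
{
    struct thread_info *tp;
    for (tp = thread_list; tp != NULL; tp = tp->next)
        if (tp->ptid == ptid)
            return tp;
    return NULL;
}

/* Hypothetical helper: mark a thread as exited (zombie). */
void mark_thread_exited(int ptid)
{
    struct thread_info *tp = find_thread_pid(ptid);
    if (tp != NULL)
        tp->alive = 0;
}

/* Model of pruning: unlink and free every thread that is no longer alive. */
void prune_threads(void)
{
    struct thread_info **link = &thread_list;
    while (*link != NULL) {
        struct thread_info *tp = *link;
        if (!tp->alive) {
            *link = tp->next;
            free(tp);
        } else {
            link = &tp->next;
        }
    }
}

/* Hypothetical guarded register fetch: checks the lookup result before
   using it.  Returns 0 on success, -1 if the thread has been pruned.
   The value stored in *pc_out is a placeholder, not a real register. */
int fetch_registers_checked(int ptid, int *pc_out)
{
    struct thread_info *tp = find_thread_pid(ptid);
    if (tp == NULL)
        return -1;             /* stale id: report it, do not dereference */
    *pc_out = 0x1000 + tp->ptid;
    return 0;
}
```

In this model, calling fetch_registers_checked() for a thread id after mark_thread_exited() and prune_threads() returns -1, whereas an unguarded version that used the lookup result directly would dereference NULL, which corresponds to the crash reported here.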
longjmp problem split into bug 195449.
Created attachment 131139 [details]
Draft patch (for GDB-CVS) proposal for handling+tracing behind pthread_exit(3)

The patch disables handling of the TD_DEATH notification from __nptl_death_event() and tries to keep the thread in the TD_THR_ZOMBIE state for as long as possible before its LWP ceases to exist. Regressions were expected, but none have been found so far; statistically it improves some testcases (+24 passes, -14 fails) with no negatives.
(The 'longjmp' part is not resolved; it is not reproducible with GDB-CVS.)
It has been fixed in a different way in the upstream CVS, and I have tested that fix successfully. Should this be closed as CLOSED-NEXTRELEASE? It will not make it into RHEL5, though.

2006-07-12  Daniel Jacobowitz  <dan>

	* linux-thread-db.c (td_thr_getfpregs_p, td_thr_getgregs_p)
	(td_thr_setfpregs_p, td_thr_setgregs_p, thread_db_get_info)
	(thread_db_fetch_registers, thread_db_store_registers)
	(thread_db_thread_alive): Delete.
	(thread_db_load): Don't look up regset functions.
	(thread_db_pid_to_str): Simplify.
	(init_thread_db_ops): Do not set to_fetch_registers,
	to_store_registers, or to_thread_alive.
Can we backport the fixes, please?
Created attachment 134272 [details]
Trivial backport of the CVS patch

The patch was backported and successfully tested on RHEL4U3 (x86_64-4as.lab.boston.redhat.com):
kernel-smp-2.6.9-34.0.2.EL.x86_64
on top of gdb-6.3.0.0-1.132.EL4

Still, the patch should not be needed for setting the RHEL-4.5 flags, if I understand the process correctly.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
* Tue Oct 5 2006 Jan Kratochvil <jan.kratochvil> - 6.3.0.0-1.135
- Avoid crash of 'info threads' if stale threads exist (BZ 195429).
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2007-0229.html