From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3b) Gecko/20030210
Description of problem:
When a program crashes, I'd like a text file containing a stack trace of the program to be dumped in addition to the core file. A textual backtrace obviously wouldn't contain as much information as a core file, but it would contain *some* information and be much easier to handle: e-mailing somebody a kilobyte or so of backtrace is easy, while mailing somebody tens of megabytes of core dump may not be. Many developers today get absolutely *no* real information about crashes on users' systems, because, for one reason or another, they are unable to cope with core files. If this were implemented, they would get at least some information. In more buzzword-compliant terms, this would improve manageability for software deployed on Linux.
How reproducible:
Always
Steps to Reproduce:
1. Write a program that crashes.
2. Run it.
Actual Results:
A binary core file was created.
Expected Results:
A text file with a backtrace of the crashed program should have been created. Depending on the amount of debug information present in the binary, the file should also include the local variables in every frame.
This is what Bug Buddy is for.
According to the Bug Buddy page at http://www.gnu.org/directory/All_Packages_in_Directory/bugbuddy.html: "The program can be started both from gmc and from the crash dialog." That doesn't help much on a server. "It supports GNOME, KDE, Debian, and Ximian bug tracking systems." That doesn't help much if your project isn't GNOME, KDE, Debian, or Ximian.
> Assigning to Dave Anderson to consider whether this is easy and useful.
This is not a kernel component. The kernel has no concept of what goes on in user space other than the register state at the point the process last entered the kernel via an exception or system call. If the kernel recognizes that a problem is serious enough that the task cannot be allowed to continue, it writes out the core dump file (if ulimit allows it) and kills the process. The kernel is responsible for copying things like the process register state, individual thread states, the crashing thread's task_struct, and the contents of writable VM areas (stack, data) into the core dump file. It is up to a user-space tool to take the core dump, along with the executable file, and reconstruct a backtrace from them. Doing so is no small task: check out the gdb code for doing backtraces on all its supported processor types, it's incredibly complex! In fact, the kernel doesn't even have the smarts to create its own backtraces when it prints an "oops" trace; it simply looks for all kernel addresses left on the stack and prints them, regardless of whether they were left there by earlier entries into the kernel. If the problem is e-mailing core files around, then users should just run gdb on the core file, get a backtrace, and send that to the developer. I'm sorry, but this doesn't belong in the kernel.
I'm not in any way married to the idea of putting this in the kernel; I chose "kernel" as the component only because I didn't know where else to put it. So you are probably right that this doesn't belong in the kernel, and as you say, all (most?) of the work must take place in user space. The issue is, just as you say, mailing core files around. This is about lowering the threshold for producing useful bug reports. At least Sun's Java Virtual Machine, BEA's JRockit, and GNOME (mentioned above) all contain custom crash-dumping code, and I'd guess other large products (Oracle?) do the same. The reason is that using GDB is simply too hard for most people (yes, even for some people deploying server applications). AFAIK, no generic tool for doing this currently exists for server applications. Instead of everyone having to write this code themselves, I think it would be better if a generic solution shipped with RHEL.
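For illustration, here is a minimal sketch of the kind of in-process crash dumper described above, assuming Linux with glibc. It uses glibc's backtrace() and backtrace_symbols_fd() from <execinfo.h> to write a plain-text list of frames (with symbol names where available) to a file when a fatal signal arrives, then re-raises the signal so the normal core file is still produced. The names write_backtrace(), install_crash_handler(), and the default file name crash-backtrace.txt are hypothetical; this is not how the JVM, JRockit, or GNOME actually implement their crash dumpers. It also falls short of the original request: it cannot show local variables, and frames in static functions appear only as addresses.

```c
/* Sketch only: Linux + glibc, since <execinfo.h> is a glibc extension. */
#define _GNU_SOURCE
#include <execinfo.h>  /* backtrace(), backtrace_symbols_fd() */
#include <fcntl.h>     /* open() */
#include <signal.h>    /* sigaction(), raise() */
#include <stddef.h>    /* size_t, NULL */
#include <string.h>    /* memset() */
#include <unistd.h>    /* write(), close(), ssize_t */

#define MAX_FRAMES 64

/* Hypothetical default location of the text report. */
static const char *crash_report_path = "crash-backtrace.txt";

/* Write the calling thread's backtrace to fd, one frame per line.
 * Returns the number of frames captured. */
int write_backtrace(int fd)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);

    /* Unlike backtrace_symbols(), backtrace_symbols_fd() does not
     * malloc() a result array, so it is usable from a signal handler. */
    backtrace_symbols_fd(frames, n, fd);
    return n;
}

/* Runs on a fatal signal: dump a text backtrace, then let the
 * default action (normally a core dump) proceed. */
static void crash_handler(int sig)
{
    static const char header[] = "Fatal signal caught; backtrace follows:\n";
    int fd = open(crash_report_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd >= 0) {
        /* Write errors are ignored: there is little to do mid-crash. */
        ssize_t ignored = write(fd, header, sizeof header - 1);
        (void)ignored;
        write_backtrace(fd);
        close(fd);
    }
    /* SA_RESETHAND restored SIG_DFL on entry, so re-raising delivers the
     * signal with its default action (a core file if ulimit allows). */
    raise(sig);
}

/* Install crash_handler() for the usual fatal signals.
 * path may be NULL to keep the default report file name. */
void install_crash_handler(const char *path)
{
    static const int fatal[] = { SIGSEGV, SIGBUS, SIGILL, SIGFPE, SIGABRT };
    struct sigaction sa;
    void *warmup[1];

    if (path != NULL)
        crash_report_path = path;

    /* glibc loads libgcc lazily on the first backtrace() call, which may
     * allocate memory; do that now rather than inside the handler. */
    backtrace(warmup, 1);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = crash_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;
    for (size_t i = 0; i < sizeof fatal / sizeof fatal[0]; i++)
        sigaction(fatal[i], &sa, NULL);
}
```

A program would call install_crash_handler("/tmp/myprog-backtrace.txt") (a hypothetical path) early in main(). Linking with -rdynamic lets backtrace_symbols_fd() print names for non-static functions; remaining hex addresses can often be mapped to source lines afterwards with addr2line -e ./myprog, provided the binary has debug information. For local variables, gdb on the core file is still needed.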
Tim, can you reassign this to some other user-space component/owner for further investigation? As Johan mentions, it's hard to pin down exactly where it should go.