Description of problem:

My debugger reads the /proc/PID/maps file immediately following an exec of a new program so it can find where the program was actually loaded, in order to successfully debug things like PIE executables that could wind up loaded anywhere.

With the newest Fedora 11 kernel, 2.6.29.6-217.2.16.fc11.x86_64, the debugger hangs attempting to read the /proc maps file (for that matter, a "cat" command hangs as well if it tries to read the same debugged process's maps file). This also happens in the 2.6.31 rawhide kernels.

I'm not sure exactly when it started; it was working fine not too long ago. I'm guessing something somewhere is waiting for the process to reach a point it is never going to reach as long as the debugger has it stopped?

Version-Release number of selected component (if applicable):
kernel-2.6.29.6-217.2.16.fc11.x86_64

How reproducible:
Every time

Steps to Reproduce:
1. start my debugger
2. watch it hang
3.

Actual results:
hung on read()

Expected results:
/proc/pid/maps file data showing up at read()

Additional info:
I'm going to try and produce a sample test program and will attach it if I succeed.
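For readers unfamiliar with why a debugger wants the maps file, here is a minimal sketch in C of pulling a load address out of it. It is not the reporter's debugger code; `struct map_entry`, `parse_maps_line` and `find_load_base` are hypothetical names, and the sketch assumes the usual field layout of a maps line (address range, permissions, offset, device, inode, optional path) and that the file lists mappings in ascending address order.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* One parsed line of /proc/PID/maps (hypothetical helper type). */
struct map_entry {
    unsigned long start;   /* first address of the mapping */
    unsigned long end;     /* one past the last address */
    char perms[5];         /* e.g. "r-xp" */
    unsigned long offset;  /* offset into the backing file */
    char path[4096];       /* backing file, or "" for anonymous mappings */
};

/* Parse a single maps line. Returns 0 on success, -1 on malformed input. */
static int parse_maps_line(const char *line, struct map_entry *e)
{
    int consumed = 0;

    e->path[0] = '\0';
    /* range, perms, offset, then device and inode (both skipped) */
    if (sscanf(line, "%lx-%lx %4s %lx %*s %*s%n",
               &e->start, &e->end, e->perms, &e->offset, &consumed) < 4)
        return -1;
    if (consumed == 0)          /* device/inode fields were missing */
        return -1;

    /* Whatever follows the inode, minus padding and newline, is the path. */
    const char *p = line + consumed;
    while (*p == ' ' || *p == '\t')
        p++;
    size_t n = strcspn(p, "\n");
    if (n >= sizeof e->path)
        n = sizeof e->path - 1;
    memcpy(e->path, p, n);
    e->path[n] = '\0';
    return 0;
}

/* Find the first (assumed lowest) mapping backed by exe_path.
 * Returns 0 and stores the address in *base, or -1 if not found. */
static int find_load_base(pid_t pid, const char *exe_path, unsigned long *base)
{
    char name[64], line[4352];
    struct map_entry e;
    int found = -1;

    snprintf(name, sizeof name, "/proc/%d/maps", (int)pid);
    FILE *f = fopen(name, "r");   /* this open/read is what hangs in the bug */
    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        if (parse_maps_line(line, &e) == 0 && strcmp(e.path, exe_path) == 0) {
            *base = e.start;
            found = 0;
            break;
        }
    }
    fclose(f);
    return found;
}
```

A debugger would call something like `find_load_base(pid, "/path/to/program", &base)` right after the exec stop, which is exactly the point where the read blocks on the affected kernels.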
I just tried booting the 2.6.29.6-217.2.8.fc11.x86_64 kernel and the problem does not exist there, so going from 2.8.fc11 to 2.16.fc11 broke it.
Created attachment 359134 [details]
test-map-read.c program to demo bug

g++ -o test-map-read -g test-map-read.c
./test-map-read

This test program demonstrates the hang on kernel-2.6.29.6-217.2.16.fc11.x86_64, but on kernel-2.6.29.6-217.2.8.fc11.x86_64 it runs with no hang.

It forks and execs a copy of /bin/bash -i and arranges to follow all the activity of all the children by turning on all the PTRACE_SETOPTIONS flags for all new processes. Every time it gets a SIGTRAP stop with the exec event set, it attempts to open and read the /proc/pid/maps file for the pid that just execed.

Since bash forks and execs a bunch of little programs like id and tputs right away, this hangs right away, printing something like:

pid 4161 status: stopped with SIGTRAP (exec event)
Begin map for pid 4161

but never managing to actually read and print the maps file after that.

Apparently the setoptions stuff is necessary to cause the bug. A simpler program that just does a traceme and an exec of a child has no problem reading the maps file for that child at the first status following the exec.

On the older kernel, it runs fine, echoing all the maps files to stdout on each exec event, and you can type in things like "/bin/true" and watch them be execed by the bash under debug (or type in exit and get out of the test).
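The attachment itself is not reproduced in this report. As a rough illustration of the sequence described above, here is a much reduced sketch in C. It is not the attached test-map-read.c: the function names are hypothetical, only PTRACE_O_TRACEEXEC is set rather than every option flag, it traces a single exec instead of following bash's children, and it kills the tracee at the first exec stop.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Kill a stopped tracee and collect its exit status. */
static void kill_and_reap(pid_t pid)
{
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
}

/* Trace one exec of prog, read /proc/PID/maps at the exec-event stop,
 * echo it to stdout, and return the number of bytes read (-1 on error).
 * On an affected kernel the fread() below is where the hang would occur. */
static long maps_bytes_at_exec_stop(const char *prog)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(126);
        raise(SIGSTOP);               /* let the parent set options first */
        execl(prog, prog, (char *)NULL);
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status))
        return -1;                    /* child exited before stopping */
    if (ptrace(PTRACE_SETOPTIONS, pid, NULL,
               (void *)(long)PTRACE_O_TRACEEXEC) < 0 ||
        ptrace(PTRACE_CONT, pid, NULL, NULL) < 0) {
        kill_and_reap(pid);
        return -1;
    }

    if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status))
        return -1;                    /* exec failed and the child exited */
    if ((status >> 8) != (SIGTRAP | (PTRACE_EVENT_EXEC << 8))) {
        kill_and_reap(pid);           /* some other stop: give up */
        return -1;
    }
    printf("pid %d status: stopped with SIGTRAP (exec event)\n", (int)pid);

    char path[64], buf[4096];
    long total = -1;
    snprintf(path, sizeof path, "/proc/%d/maps", (int)pid);
    FILE *f = fopen(path, "r");
    if (f) {
        size_t n;
        total = 0;
        printf("Begin map for pid %d\n", (int)pid);
        while ((n = fread(buf, 1, sizeof buf, f)) > 0) {
            fwrite(buf, 1, n, stdout);
            total += (long)n;
        }
        fclose(f);
    }
    kill_and_reap(pid);
    return total;
}
```

Calling `maps_bytes_at_exec_stop("/bin/sh")` on an unaffected kernel should print the two status lines followed by the maps contents; whether this reduced version is enough to trigger the hang on the affected kernels has not been verified here.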
If I hack the test program to PTRACE_SINGLESTEP one instruction before I try to read the maps file, the hang goes away, so I guess reading the maps file joins PTRACE_KILL on the list of actions that don't work immediately following an exec.
Created attachment 359296 [details]
2.6.31-rc6 patch which causes the problem

I have bisected the problem down to this 2.6.31-rc6 patch:

commit 704b836cbf19e885f8366bccb2e4b0474346c02d
Author: Oleg Nesterov <oleg>
Date: Fri Jul 10 03:27:40 2009 +0200

mm_for_maps: take ->cred_guard_mutex to fix the race with exec
(In reply to comment #4)
> Created an attachment (id=359296) [details]
> 2.6.31-rc6 patch which causes the problem
>
> I have bisected the problem down to this 2.6.31-rc6 patch:
>
> commit 704b836cbf19e885f8366bccb2e4b0474346c02d
> Author: Oleg Nesterov <oleg>
> Date: Fri Jul 10 03:27:40 2009 +0200
>
> mm_for_maps: take ->cred_guard_mutex to fix the race with exec

And that was backported to the Fedora 2.6.29.6-217.2.16.fc11 kernel.
Yes. The tracee reports PTRACE_EVENT_EXEC and stops while still holding that mutex, so anything that reads the maps file blocks waiting for it.

I hope that the process which hangs (cat or whatever) can be killed?

Not sure what we can do; I will try to think about it tomorrow.

_Imho_, the real problem is that do_execve() holds this mutex throughout, while it is only needed to make sure PTRACE_ATTACH sees the right creds.

Oh. I always disliked ->cred_guard_mutex very much ;)
> I hope that the process which hangs (cat or whatever) can be killed?

It doesn't appear to be a hard hang. I've been able to kill stuff off (I may have to kill the debugged child first, I'm not sure; my kill-kids script just works up the tree from the leaf processes, so that's always the way I do it).
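Since the blocked reader can reportedly be killed, one defensive option for a tool that must not hang is to do the read in a throwaway helper process and kill it on timeout. The following C sketch illustrates that idea only; it is not something proposed in this report, `read_maps_with_timeout` is a hypothetical name, and it assumes a SIGKILL really does release the blocked helper, as the observation above suggests.

```c
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Read /proc/TARGET/maps via a helper process. Returns the number of bytes
 * stored in buf (possibly truncated to cap), or -1 if no data arrived within
 * timeout_ms (the timeout applies to each chunk, not the whole read). */
static ssize_t read_maps_with_timeout(pid_t target, int timeout_ms,
                                      char *buf, size_t cap)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t helper = fork();
    if (helper < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (helper == 0) {
        /* Helper: copy the maps file into the pipe. If the kernel blocks
         * the read, only this process is stuck. */
        char path[64], chunk[4096];
        ssize_t n;
        close(fds[0]);
        snprintf(path, sizeof path, "/proc/%d/maps", (int)target);
        int in = open(path, O_RDONLY);
        if (in < 0)
            _exit(1);
        while ((n = read(in, chunk, sizeof chunk)) > 0)
            if (write(fds[1], chunk, (size_t)n) != n)
                _exit(1);
        _exit(n < 0);
    }

    close(fds[1]);
    size_t total = 0;
    int ok = 0;
    for (;;) {
        struct pollfd p = { .fd = fds[0], .events = POLLIN };
        if (poll(&p, 1, timeout_ms) <= 0)
            break;                        /* timeout or poll error */
        ssize_t n = read(fds[0], buf + total, cap - total);
        if (n < 0)
            break;
        if (n == 0) {                     /* EOF: helper finished */
            ok = 1;
            break;
        }
        total += (size_t)n;
        if (total == cap) {               /* buffer full: stop, truncated */
            ok = 1;
            break;
        }
    }
    close(fds[0]);
    kill(helper, SIGKILL);                /* harmless if it already exited */
    waitpid(helper, NULL, 0);
    return ok ? (ssize_t)total : -1;
}
```

For example, `read_maps_with_timeout(pid, 2000, buf, sizeof buf)` would give up after two seconds without data instead of leaving the debugger itself stuck in read().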
Created attachment 359690 [details]
exec: drop ->cred_guard_mutex earlier

So. Let me repeat. I do not think

704b836cbf19e885f8366bccb2e4b0474346c02d
"mm_for_maps: take ->cred_guard_mutex to fix the race with exec"

is buggy. I strongly believe the usage of ->cred_guard_mutex in the do_execve() paths was wrong from the very beginning:

a6f76f23d297f70e2a6b3ec607f7aeeea9e37e8d
"CRED: Make execve() take advantage of copy-on-write credentials"

I think we need something like this patch. And I think it would be great to find a way to kill this mutex in ->task_struct.

I have no idea how to test this patch, and I don't really understand the new creds management, but I am going to send this patch to lkml.
FYI, this patch was committed in the 2.6.31 tree:

exec: do not sleep in TASK_TRACED under ->cred_guard_mutex
a2a8474c3fff88d8dd52d05cb450563fb26fd26c
This message is a reminder that Fedora 11 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 11. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '11'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 11's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 11 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

The process we are following is described here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Fedora 11 changed to end-of-life (EOL) status on 2010-06-25. Fedora 11 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.