Bug 520206 - ptrace and /proc/pid/maps file deadlock
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 11
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2009-08-28 22:20 UTC by Tom Horsley
Modified: 2010-06-28 14:21 UTC (History)

Last Closed: 2010-06-28 14:21:59 UTC

Attachments
test-map-read.c program to demo bug (7.08 KB, text/plain)
2009-08-29 02:09 UTC, Tom Horsley
2.6.31-rc6 patch which causes the problem (1.65 KB, patch)
2009-08-31 17:59 UTC, Joe Korty
exec: drop ->cred_guard_mutex earlier (3.42 KB, patch)
2009-09-03 14:59 UTC, Oleg Nesterov

Description Tom Horsley 2009-08-28 22:20:57 UTC
Description of problem:

My debugger reads the /proc/PID/maps file immediately following an exec
of a new program so it can find where the program was actually loaded
in order to successfully debug things like PIE executables that could
wind up loaded anywhere.
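
For illustration, here is a minimal sketch of that kind of lookup. The helper name load_address and its signature are hypothetical and are not taken from the debugger in question. It assumes the usual maps line layout (address range, permissions, offset, device, inode, pathname) and treats the mapping of the executable at file offset 0 as its load address.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: find where the file `path` is mapped at offset 0 in
 * process `pid` by scanning /proc/<pid>/maps.
 * Returns 0 and stores the start address on success, -1 otherwise.
 * Pathnames containing spaces are not handled by this sketch. */
static int load_address(pid_t pid, const char *path, unsigned long *start)
{
    char name[64];
    char line[4352];
    int found = -1;

    assert(path != NULL && start != NULL);
    snprintf(name, sizeof name, "/proc/%d/maps", (int)pid);
    FILE *f = fopen(name, "r");
    if (f == NULL)
        return -1;

    while (found != 0 && fgets(line, sizeof line, f) != NULL) {
        unsigned long lo, hi, offset;
        char perms[5];
        char file[4096];

        /* Fields: start-end perms offset dev inode pathname */
        if (sscanf(line, "%lx-%lx %4s %lx %*s %*s %4095s",
                   &lo, &hi, perms, &offset, file) == 5 &&
            offset == 0 && strcmp(file, path) == 0) {
            *start = lo;
            found = 0;
        }
    }
    fclose(f);
    return found;
}
```

A debugger would call something like this with the tracee's pid and the path it just execed. On an affected kernel, the fopen() or first read on the maps file is where the hang would show up.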

With the newest Fedora 11 kernel 2.6.29.6-217.2.16.fc11.x86_64, the
debugger hangs attempting to read the /proc maps file (for that matter
a "cat" command hangs as well if it tries to read the same debugged
process maps file).

This also happens in the 2.6.31 rawhide kernels.

I'm not sure exactly when it started, but it was working fine not too
long ago.

I'm guessing something somewhere is waiting for the process to get
somewhere it is never gonna get as long as the debugger has it stopped?

Version-Release number of selected component (if applicable):
kernel-2.6.29.6-217.2.16.fc11.x86_64

How reproducible:
Every time

Steps to Reproduce:
1. start my debugger
2. watch it hang
  
Actual results:
hung on read()

Expected results:
/proc/pid/maps file data showing up at read()

Additional info:
I'm going to try and produce a sample test program and will attach it
if I succeed.

Comment 1 Tom Horsley 2009-08-28 22:27:27 UTC
I just tried booting the 2.6.29.6-217.2.8.fc11.x86_64 kernel and the
problem does not exist there, so going from 2.8.fc11 to 2.16.fc11 broke
it.

Comment 2 Tom Horsley 2009-08-29 02:09:55 UTC
Created attachment 359134 [details]
test-map-read.c program to demo bug

g++ -o test-map-read -g test-map-read.c
./test-map-read

This test program demonstrates the hang on kernel-2.6.29.6-217.2.16.fc11.x86_64
but on kernel-2.6.29.6-217.2.8.fc11.x86_64 it runs with no hang.

It forks and execs a copy of /bin/bash -i and arranges to follow all the
activity of all the children by turning on all the PTRACE_SETOPTIONS flags
for all new processes.

Every time it gets a SIGTRAP stop with the exec event set, it attempts to
open and read the /proc/pid/maps file for the pid that just execed.

Since bash forks and execs a bunch of little programs like id and tput
at startup, this hangs right away, printing something like:

pid 4161 status: stopped with SIGTRAP (exec event)
Begin map for pid 4161

but never managing to actually read and print the maps file after that.

Apparently the setoptions stuff is necessary to cause the bug. A simpler
program that just does a traceme and an exec of a child has no problem reading
the maps file for that child at the first status following the exec.

On the older kernel, it runs fine, echoing all the maps files to stdout on
each exec event and you can type in things like "/bin/true" and watch them
be execed by the bash under debug (or type in exit and get out of the test).
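
The attached test-map-read.c is not reproduced here. The following is only a reduced sketch of the sequence described above, limited to a single child and the PTRACE_O_TRACEEXEC option. The helper names (is_exec_event, count_maps_lines, maps_lines_at_exec) are hypothetical. On a kernel with the problem, the maps read inside count_maps_lines would be expected to block; on an unaffected kernel it should return a positive line count.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* True if a waitpid() status is the ptrace stop reported for an exec event. */
static int is_exec_event(int status)
{
    return WIFSTOPPED(status) &&
           (status >> 8) == (SIGTRAP | (PTRACE_EVENT_EXEC << 8));
}

/* Hypothetical helper: count the lines in /proc/<pid>/maps, or -1 on error. */
static int count_maps_lines(pid_t pid)
{
    char name[64];
    int c, lines = 0;

    snprintf(name, sizeof name, "/proc/%d/maps", (int)pid);
    FILE *f = fopen(name, "r");
    if (f == NULL)
        return -1;
    while ((c = fgetc(f)) != EOF)
        if (c == '\n')
            lines++;
    fclose(f);
    return lines;
}

/* Hypothetical helper: trace one exec of `path` and read the tracee's maps
 * file while it is stopped at the exec event. Returns the number of lines
 * read, or -1 if anything went wrong. */
static int maps_lines_at_exec(const char *path)
{
    int status, lines = -1;
    pid_t pid;

    assert(path != NULL);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(126);
        raise(SIGSTOP);              /* let the parent set options first */
        execl(path, path, (char *)NULL);
        _exit(127);                  /* exec failed */
    }

    if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status))
        return -1;                   /* child exited before stopping */

    if (ptrace(PTRACE_SETOPTIONS, pid, NULL,
               (void *)(long)PTRACE_O_TRACEEXEC) == 0 &&
        ptrace(PTRACE_CONT, pid, NULL, NULL) == 0 &&
        waitpid(pid, &status, 0) == pid) {
        if (!WIFSTOPPED(status))
            return -1;               /* child already exited and was reaped */
        if (is_exec_event(status))
            lines = count_maps_lines(pid);
    }

    kill(pid, SIGKILL);              /* clean up the stopped tracee */
    waitpid(pid, &status, 0);
    return lines;
}
```

Calling maps_lines_at_exec("/bin/sh") exercises one exec event. The real test program goes further and follows forks so that it sees the execs done by bash's children.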

Comment 3 Tom Horsley 2009-08-29 02:27:35 UTC
If I hack the test program to PTRACE_SINGLESTEP one instruction before I
try to read the maps file, the hang goes away, so I guess reading the maps
file joins using PTRACE_KILL as an action that doesn't work immediately
following an exec.
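
A sketch of that workaround, assuming the tracee is already in a ptrace stop. The name step_once is hypothetical and this is not the code from the hacked test program.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: let a stopped tracee execute one instruction and wait
 * for it to stop again. Returns 0 if it stopped with SIGTRAP, -1 otherwise. */
static int step_once(pid_t pid)
{
    int status;

    assert(pid > 0);
    if (ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL) < 0)
        return -1;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return (WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP) ? 0 : -1;
}
```

The idea is to call step_once(pid) at the exec event stop and only then open /proc/pid/maps, so the tracee is no longer sitting at the stop inside exec when the read happens.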

Comment 4 Joe Korty 2009-08-31 17:59:29 UTC
Created attachment 359296 [details]
2.6.31-rc6 patch which causes the problem

I have bisected the problem down to this 2.6.31-rc6 patch:

commit 704b836cbf19e885f8366bccb2e4b0474346c02d
Author: Oleg Nesterov <oleg>
Date:   Fri Jul 10 03:27:40 2009 +0200

    mm_for_maps: take ->cred_guard_mutex to fix the race with exec

Comment 5 Chuck Ebbert 2009-09-02 23:51:03 UTC
(In reply to comment #4)
> Created an attachment (id=359296) [details]
> 2.6.31-rc6 patch which causes the problem
> 
> I have bisected the problem down to this 2.6.31-rc6 patch:
> 
> commit 704b836cbf19e885f8366bccb2e4b0474346c02d
> Author: Oleg Nesterov <oleg>
> Date:   Fri Jul 10 03:27:40 2009 +0200
> 
>     mm_for_maps: take ->cred_guard_mutex to fix the race with exec  

And that was backported to the Fedora 2.6.29.6-217.2.16.fc11 kernel.

Comment 6 Oleg Nesterov 2009-09-03 00:43:02 UTC
Yes. The tracee reports PTRACE_EVENT_EXEC and stops while still holding that mutex.

I hope that the process which hangs (cat or whatever) can be killed?

Not sure what we can do, will try to think tomorrow. _Imho_, the real
problem is that do_execve() holds this mutex throughout, while it is
only needed to make sure PTRACE_ATTACH sees the right creds.
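
As a userspace analogy only (this is not kernel code, and all names are hypothetical), the situation can be modelled with a pthread mutex standing in for ->cred_guard_mutex. The main thread plays the tracee that took the lock in exec and is now stopped; the second thread plays the maps reader. The reader uses trylock so the sketch reports contention instead of hanging the way the real read does.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Stands in for ->cred_guard_mutex in this analogy. */
static pthread_mutex_t guard = PTHREAD_MUTEX_INITIALIZER;

/* Reader side: stands in for the maps read, which needs the mutex.
 * Stores the trylock result (0 or EBUSY) through `result`. */
static void *reader(void *result)
{
    assert(result != NULL);
    int rc = pthread_mutex_trylock(&guard);
    if (rc == 0)
        pthread_mutex_unlock(&guard);
    *(int *)result = rc;
    return NULL;
}

/* Returns what the reader sees while the "tracee" is stopped with the lock
 * held, or -1 if the reader thread could not be started. */
static int read_while_tracee_stopped(void)
{
    int rc = -1;
    pthread_t t;

    pthread_mutex_lock(&guard);      /* "exec" takes the mutex ...           */
                                     /* ... and the tracee stops, holding it */
    int err = pthread_create(&t, NULL, reader, &rc);
    if (err == 0)
        pthread_join(t, NULL);
    pthread_mutex_unlock(&guard);    /* the tracer finally resumes the tracee */
    return err == 0 ? rc : -1;
}
```

Here the reader gets EBUSY. With a plain blocking lock it would wait until the holder is resumed, which matches the behaviour reported for cat and the debugger. Older toolchains may need -pthread when building this.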

Oh. I always disliked ->cred_guard_mutex very much ;)

Comment 7 Tom Horsley 2009-09-03 00:55:11 UTC
>I hope that the process which hangs (cat or whatever) can be killed?

It doesn't appear to be a hard hang. I've been able to kill stuff off
(may have to kill the debugged child first, I'm not sure - my kill-kids
script just works up the tree from the leaf processes, so that's always
the way I do it).

Comment 8 Oleg Nesterov 2009-09-03 14:59:47 UTC
Created attachment 359690 [details]
exec: drop ->cred_guard_mutex earlier

So. Let me repeat. I do not think 704b836cbf19e885f8366bccb2e4b0474346c02d
"mm_for_maps: take ->cred_guard_mutex to fix the race with exec" is buggy.

I strongly believe the usage of ->cred_guard_mutex in do_execve() paths
was wrong from the very beginning: a6f76f23d297f70e2a6b3ec607f7aeeea9e37e8d
"CRED: Make execve() take advantage of copy-on-write credentials"

I think we need something like this patch. And I think it would be great
to find a way to kill this mutex in ->task_struct.

I have no idea how to test this patch, and I don't really understand
the new creds management, but I am going to send this patch to lkml.

Comment 9 Oleg Nesterov 2009-09-07 16:28:49 UTC
fyi, this patch was committed to the 2.6.31 tree:

    exec: do not sleep in TASK_TRACED under ->cred_guard_mutex
    a2a8474c3fff88d8dd52d05cb450563fb26fd26c

Comment 10 Bug Zapper 2010-04-28 10:03:03 UTC
This message is a reminder that Fedora 11 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 11.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '11'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 11's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 11 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 11 Bug Zapper 2010-06-28 14:21:59 UTC
Fedora 11 changed to end-of-life (EOL) status on 2010-06-25. Fedora 11 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

