Bug 566460

Summary: kernel 2.6.33 strips coredump when using pipe in core_pattern
Product: [Fedora] Fedora
Reporter: Jiri Moskovcak <jmoskovc>
Component: kernel
Assignee: Neil Horman <nhorman>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: 13
CC: anton, dfediuck, dougsland, gansalmon, itamar, jlaska, jonathan, kernel-maint, kklic, kmcmartin, kparal, nhorman
Target Milestone: ---
Keywords: Reopened
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: kernel-2.6.33-0.47.rc8.git1.fc13
Doc Type: Bug Fix
Last Closed: 2010-02-24 19:14:31 UTC
Bug Blocks: 538273    
Attachments:
patch to skip uid check in do_coredump (flags: none)

Description Jiri Moskovcak 2010-02-18 14:41:16 UTC
Description of problem:
The new kernel strips the coredump when there is a pipe in core_pattern.

Version-Release number of selected component (if applicable):
2.6.33

How reproducible:
100%

Steps to Reproduce:
1. install a simple hook to core_pattern
2. set core_pipe_limit to a non-zero value (I use 4)
3. kill some app with SEGV
4. the hook is invoked, but the saved coredump has 0 size
  
Actual results:
empty coredump

Expected results:
non-empty coredump

Additional info:
I'm in the middle of testing various kernel versions (plus patches) and will post the results shortly to help narrow this down.
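
The steps above can be scripted roughly as follows. This is only a sketch under stated assumptions: the hook path /usr/local/bin/core-hook, the output file /tmp/hook-core, and every function name are hypothetical and not taken from this report. setup writes to /proc/sys/kernel and so needs root; Comment 8 below reports that the empty dump only shows up for non-root crashers, so crash_and_check is meant to be run from an ordinary user's shell.

```shell
#!/bin/sh
# Sketch of the reproduction steps. HOOK, OUT and all function names are
# hypothetical; they are not taken from the report.
HOOK=${HOOK:-/usr/local/bin/core-hook}
OUT=${OUT:-/tmp/hook-core}

# Step 1: install a trivial hook that saves whatever arrives on stdin.
install_hook() {
    printf '#!/bin/sh\ncat > "%s"\n' "$OUT" > "$HOOK" && chmod 755 "$HOOK"
}

# Step 4: report whether the saved dump has any content.
core_status() {
    if [ -s "$1" ]; then
        echo "non-empty coredump"
    else
        echo "empty coredump"
    fi
}

# Steps 1-2 (run as root): install the hook, point core_pattern at it,
# and set core_pipe_limit to 4.
setup() {
    install_hook || return 1
    printf '|%s\n' "$HOOK" > /proc/sys/kernel/core_pattern || return 1
    echo 4 > /proc/sys/kernel/core_pipe_limit
}

# Steps 3-4 (run as an ordinary user): kill a throwaway process with
# SIGSEGV, then inspect what the hook saved.
crash_and_check() {
    ( ulimit -c unlimited; exec sleep 60 ) &
    victim=$!
    sleep 1
    kill -s SEGV "$victim"
    wait "$victim" 2>/dev/null
    sleep 1
    core_status "$OUT"
}
```

Source the file, call setup once as root, then call crash_and_check as a regular user. On an affected kernel the expectation from this report is "empty coredump".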

Comment 1 Jiri Moskovcak 2010-02-18 15:13:40 UTC
My test results: packages were taken from Fedora CVS and built in koji (.32 in
brew).

== 2.6.32 - without umh-refactor patch: ==

$ cat /proc/sys/kernel/core_pipe_limit 
4

$ cat /proc/sys/kernel/core_pattern 
|/usr/libexec/abrt-hook-ccpp /var/cache/abrt %p %s %u %c

result: hook was able to write the coredump

== 2.6.32 - with umh-refactor patch: ==
$ cat /proc/sys/kernel/core_pipe_limit 
4

$ cat /proc/sys/kernel/core_pattern 
|/usr/libexec/abrt-hook-ccpp /var/cache/abrt %p %s %u %c

- no coredump gets to helper
- setting ulimit -c doesn't help

== kernel-2.6.33-0.47.rc8.git1 ==
0 size coredump, ulimit -c doesn't help

== kernel-2.6.33-0.47.rc8.git1 without-umh-refactor ==
works fine when ulimit -c is set

Comment 2 James Laska 2010-02-18 17:04:23 UTC
Adding to the F13Alpha blocker list for review at the next blocker review meeting.  Jiri noted on IRC that this affects only C/C++ program failures.  Python and kerneloops failures will still be caught.

The Alpha release criteria [1] do not explicitly call out that ABRT must be able to capture and report failures to Bugzilla, but a similar criterion exists for the installer.

[1] https://fedoraproject.org/wiki/Fedora_13_Alpha_Release_Criteria

Comment 3 Neil Horman 2010-02-18 18:19:59 UTC
I'm pretty sure this was introduced with Andi Kleen's work that I sucked in with that umh-refactor patch.  I just tried it on the latest -mm and get the same results.

Comment 4 Karel Klíč 2010-02-19 20:23:56 UTC
Looked at the umh-refactor patch.
Just guessing how it might work. 
Please ignore if I am completely wrong.

umh_pipe_setup() in exec.c calls create_write_pipe() and create_read_pipe(). Those calls were previously done in "main" thread, but now umh_pipe_setup() is called in ____call_usermodehelper thread (the number of underscores in the name is important here).

__call_usermodehelper() in kmod.c calls kernel_thread(____call_usermodehelper), and in the case of pipes it is called _without_ CLONE_FS and CLONE_FILES flags. Previously that worked, because the pipes were created in the main thread, and the child process inherited a copy of them. Now that does not work, because the pipes are created in the child thread, and that does NOT affect the main thread, which dumps the core. The core is not written to the write side of the pipe.

So I would try to add CLONE_FILES and CLONE_FS flags to the second kernel_thread() call in the __call_usermodehelper() function in kmod.c.
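
The descriptor-table point in this analysis can be illustrated with a userspace analogy. This is not kernel code and says nothing about whether the analysis applies to this bug: a subshell is a forked child with its own copy of the descriptor table, which is comparable to a thread started without CLONE_FILES. The function names and the choice of fd 9 are arbitrary.

```shell
#!/bin/sh
# Userspace analogy only: a subshell gets its own copy of the descriptor
# table, much like a thread started without CLONE_FILES.

# Succeeds if descriptor $1 is open in the calling shell.
fd_is_open() {
    ( : >&"$1" ) 2>/dev/null
}

# Runs entirely in a subshell so the caller's descriptors are untouched.
demo_unshared_fds() (
    exec 9>&-                   # make sure fd 9 starts out closed
    (
        exec 9>/dev/null        # the "child" opens fd 9
        fd_is_open 9 && echo "child: fd 9 open"
    )
    # Back in the "parent": the child's descriptor never appeared here.
    if fd_is_open 9; then
        echo "parent: fd 9 open"
    else
        echo "parent: fd 9 closed"
    fi
)
```

Calling demo_unshared_fds prints "child: fd 9 open" followed by "parent: fd 9 closed": a descriptor created in the child is invisible to the parent unless the two share a table.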

Comment 5 Neil Horman 2010-02-19 20:33:10 UTC
Good analysis, but I'm not sure it's accurate, given that the whole setup works properly as long as we don't set core_pipe_limit to a non-zero value.  I'm not sure what the interaction there is.

Comment 6 Neil Horman 2010-02-19 21:01:39 UTC
Hmm, this is odd. I thought I had re-created the problem upstream, but now that I try it with the latest -mm the problem seems gone.  I'm going to re-install with the latest rawhide kernel and debug from there.

Comment 7 Neil Horman 2010-02-21 18:25:37 UTC
So, I figured out how I reproduced this previously: I was testing with abrt specifically.  I just tried the latest upstream -mm tree and rawhide with a simplified core collector, and everything is working fine:

cat /usr/bin/catch_core
#!/bin/sh
/usr/bin/logger -s "SLEEPING"
sleep 10

/usr/bin/logger -s "CATCHING CORE"

cat >> /tmp/newcore
####End /usr/bin/catch_core

echo "|/usr/bin/catch_core" > /proc/sys/kernel/core_pattern
echo 4 > /proc/sys/kernel/core_pipe_limit

If I crash a process with this setup, I consistently get a core file in /tmp/newcore that is full-sized and recognizable to crash.

This leads me to believe that the problem is in ABRT.

Comment 8 Jiri Moskovcak 2010-02-21 22:55:06 UTC
I tried your script with these results: if I crash something as root I get a full coredump, but if I try it as non-root, I get a zero-size coredump. The same applies to ABRT's hook.

J.

Comment 9 Neil Horman 2010-02-22 14:10:55 UTC
Dang it, apparently yes, you're supposed to need to run the core_collector as root (i.e. suid), but apparently that's not working now either.

Comment 10 Neil Horman 2010-02-22 17:53:49 UTC
Created attachment 395526
patch to skip uid check in do_coredump

Found the problem.  An additional check in do_coredump tests the value of the process uid against the fsuid to make sure they match.  That's relevant for files (to prevent ownership hacks and stealing of information out of cores), but irrelevant for pipes.  This patch fixes the issue.
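
The effect of the change can be modelled with a toy decision function. This is a sketch of the logic as described in this comment, not the kernel code or the attached patch. All function names are hypothetical, and the example values (uid 0 on the pipe side, fsuid 500 for the crashing user) are assumptions chosen to mirror the root versus non-root behaviour from Comment 8.

```shell
#!/bin/sh
# Toy model of the decision described above; not the kernel code.
# Arguments: target kind ("file" or "pipe"), the uid the check compares
# against, and the fsuid of the crashing process. All names are hypothetical.

# Before the patch: the uid comparison is applied to every target.
dump_allowed_before() {
    [ "$2" -eq "$3" ]
}

# After the patch: the comparison is skipped when the target is a pipe.
dump_allowed_after() {
    [ "$1" = pipe ] || [ "$2" -eq "$3" ]
}

# Print the outcome of both variants for one case.
show_case() {
    if dump_allowed_before "$@"; then before=dump; else before=skip; fi
    if dump_allowed_after "$@"; then after=dump; else after=skip; fi
    echo "$1 uid=$2 fsuid=$3: before=$before after=$after"
}
```

Under these assumptions, show_case pipe 0 500 prints "pipe uid=0 fsuid=500: before=skip after=dump", while show_case file 0 500 is skipped in both variants, so the protection for regular files is kept.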

Comment 11 Neil Horman 2010-02-22 17:57:03 UTC
Committed to rawhide.  I'll need to send this to -mm as well.