Description of problem:
The new kernel strips the coredump when there is a pipe in core_pattern.

Version-Release number of selected component (if applicable):
2.6.33

How reproducible:
100%

Steps to Reproduce:
1. Install a simple hook to core_pattern
2. Set core_pipe_limit to != 0 (I use 4)
3. Kill some app with SEGV
4. The hook is invoked, but the saved coredump has 0 size

Actual results:
empty coredump

Expected results:
non-empty coredump

Additional info:
I'm in the middle of testing various kernel versions (+patches) and will post the results in a while to help narrow this down.
My test results: packages were taken from Fedora CVS and built in koji (.32 in brew).

== 2.6.32 without umh-refactor patch ==
$ cat /proc/sys/kernel/core_pipe_limit
4
$ cat /proc/sys/kernel/core_pattern
|/usr/libexec/abrt-hook-ccpp /var/cache/abrt %p %s %u %c
result: hook was able to write the coredump

== 2.6.32 with the patch ==
$ cat /proc/sys/kernel/core_pipe_limit
4
$ cat /proc/sys/kernel/core_pattern
|/usr/libexec/abrt-hook-ccpp /var/cache/abrt %p %s %u %c
- no coredump gets to helper
- setting ulimit -c doesn't help

== kernel-2.6.33-0.47.rc8.git1 ==
0 size coredump, ulimit -c doesn't help

== kernel-2.6.33-0.47.rc8.git1 without umh-refactor ==
works fine when ulimit -c is set
Adding to the F13Alpha blocker list for review at the next blocker review meeting. Jiri noted on IRC that this affects only C/C++ program failures. Python and kerneloops failures will still be caught. The Alpha release criteria [1] do not explicitly call out that ABRT must be able to capture and report failures to Bugzilla, but a similar criterion exists for the installer.

[1] https://fedoraproject.org/wiki/Fedora_13_Alpha_Release_Criteria
I'm pretty sure this was introduced with Andi Kleen's work that I sucked in with that umh-refactor. I just tried it on the latest -mm and get the same results.
Looked at the umh-refactor.patch. Just guessing how it might work. Please ignore if I am completely wrong.

umh_pipe_setup() in exec.c calls create_write_pipe() and create_read_pipe(). Those calls were previously done in the "main" thread, but now umh_pipe_setup() is called in the ____call_usermodehelper thread (the number of underscores in the name is important here). __call_usermodehelper() in kmod.c calls kernel_thread(____call_usermodehelper), and in the case of pipes it is called _without_ the CLONE_FS and CLONE_FILES flags.

Previously that worked, because the pipes were created in the main thread, and the child process inherited a copy of them. Now that does not work, because the pipes are created in the child thread, and that does NOT affect the main thread, which dumps the core. The core is not written to the write side of the pipe.

So I would try adding the CLONE_FILES and CLONE_FS flags to the second kernel_thread() call in the __call_usermodehelper() function in kmod.c.
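For readers who want to see the inheritance argument in isolation, here is a userspace analogy in shell. It is only a sketch and not kernel code: a subshell stands in for a thread created without CLONE_FILES, since it gets its own copy of the descriptor table. The function names and the choice of descriptor 9 are made up for the illustration.

```shell
#!/bin/sh
# Userspace analogy only. A subshell is a forked child with its own copy of
# the descriptor table, much like a thread created without CLONE_FILES.
# All function names here are hypothetical.

# Succeeds if descriptor 9 is open in the calling shell.
fd9_is_open() {
    ( : >&9 ) 2>/dev/null
}

# Case 1: the parent opens fd 9 first, so the child inherits a copy
# and can write through it.
opened_in_parent() {
    exec 9>"$1"
    ( echo "core data" >&9 )
    exec 9>&-
}

# Case 2: the child opens fd 9 itself. The descriptor lives only in the
# child's table, so the parent never sees it.
opened_in_child() {
    exec 9>&-                 # make sure fd 9 starts out closed
    ( exec 9>"$1" )
    if fd9_is_open; then
        echo "parent sees fd 9"
    else
        echo "parent does not see fd 9"
    fi
}
```

Calling opened_in_parent with a scratch file leaves "core data" in it, while opened_in_child on the same file reports that the parent does not see the descriptor. That is the situation the comment above suspects for the dumping thread and the pipe.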
Good analysis, but I'm not sure it's accurate, given that the whole setup works properly as long as we don't set core_pipe_limit to a non-zero value. I'm not sure what the interaction there is.
Hmm, this is odd. I thought I had re-created the problem upstream, but now that I try it with the latest -mm the problem seems to be gone. I'm going to re-install with the latest rawhide kernel and debug from there.
So, I figured out how I reproduced this previously. I was testing with abrt specifically. I just tried the latest upstream -mm tree and rawhide with a simplified core collector, and everything is working fine:

cat /usr/bin/catch_core
#!/bin/sh
/usr/bin/logger -s "SLEEPING"
sleep 10
/usr/bin/logger -s "CATCHING CORE"
cat >> /tmp/newcore
####End /usr/bin/catch_core

echo "|/usr/bin/catch_core" > /proc/sys/kernel/core_pattern
echo 4 > /proc/sys/kernel/core_pipe_limit

If I crash a process with this setup, I can get a core file in /tmp/newcore that is full sized and recognizable to crash consistently. This leads me to believe that the problem is in ABRT.
I tried your script with these results: if I crash something as root I get a full coredump, but if I try it as non-root, I get a zero size coredump. The same applies to ABRT's hook. J.
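The root versus non-root comparison can be scripted roughly as follows. This is a sketch that assumes the catch_core collector from the earlier comment is already installed by root and appends to /tmp/newcore. The helper names (file_size, crash_child, check_core) are made up for this example, and the wait time is a guess that has to exceed the collector's 10 second sleep.

```shell
#!/bin/sh
# Hypothetical helpers. Assumes core_pattern already points at the
# catch_core collector, which appends each dump to /tmp/newcore.

# Print the size of a file in bytes, or 0 if it does not exist.
file_size() {
    if [ -f "$1" ]; then
        echo $(( $(wc -c < "$1") ))
    else
        echo 0
    fi
}

# Start a throwaway process, kill it with SIGSEGV, print its wait status.
crash_child() {
    sleep 60 &
    kill -SEGV "$!"
    wait "$!"
    echo "$?"
}

# Report whether the collector appended anything to the core file.
# $1 = core file path, $2 = seconds to give the collector to finish
check_core() {
    before=$(file_size "$1")
    crash_child >/dev/null
    sleep "$2"
    after=$(file_size "$1")
    if [ "$after" -gt "$before" ]; then
        echo "core written: $((after - before)) bytes"
    else
        echo "empty core"
    fi
}
```

Run check_core /tmp/newcore 15 once as root and once as an ordinary user. With the behaviour described above, the first run should report a written core and the second an empty one. Some of the builds tested earlier only dumped with ulimit -c set, so setting ulimit -c unlimited first may be needed.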
Dang it, apparently yes. You're supposed to need to run the core_collector as root (i.e. suid), but apparently that's not working now either.
Created attachment 395526
patch to skip uid check in do_coredump

Found the problem. An additional check in do_coredump tests the value of the process uid against the fsuid to make sure they match. That's relevant for files (to prevent ownership hacks and stealing of information out of cores), but irrelevant for pipes. This patch fixes the issue.
Committed to rawhide. I'll need to send this to -mm as well.