Description of problem: When two or more crashes happen at the same time (within one second) their core dumps are truncated when handled with custom hook. $ cat /proc/sys/kernel/core_pattern |/tmp/zzz/simple.sh %s %c %p %u %g %t e simple.sh: #!/bin/sh logger "$0: $3" exec cat > /tmp/zzz/mycore.$3 Version-Release number of selected component (if applicable): 3.1.1-1.fc16.x86_64 How reproducible: Always Steps to Reproduce: 1. install postgresql-9.1rc1 2. patch it to produce concurrent crashes 3. use simple.sh as core_pattern 4. run make installcheck-parallel Actual results: BFD: Warning: /tmp/zzz/new/mycore.12516 is truncated: expected core file size >= 35057664, found: 15433728. Expected results: No truncated cores Additional info: - This doesn't happen when using core.%p as core_pattern - I'm unable to reproduce the same thing with smaller binaries (sleep, top)
Created attachment 539296 [details] ls showing truncated core dumps
Created attachment 539297 [details] ls showing truncated core dumps
Created attachment 539298 [details] ls when using core.%p
Created attachment 539299 [details] patch postgres to cause problem at issue
Created attachment 539300 [details] patch postgres to cause problem at issue
Further steps for reproducing with postgresql: /usr/local/pgsql/bin/initdb -D $PGDATA /usr/local/pgsql/bin/postgres -D $PGDATA > logfile 2>&1 make installcheck-parallel
That script seems overly simple. And I'm not sure why one would desire to use something like that anyway, but it's fairly immaterial to the problem I guess. If you take a look here: http://www.mail-archive.com/xorg-devel@lists.x.org/msg04573.html you'll see that signals delivered to the helper process during dump can cause truncated core dumps. Since the script isn't doing anything with signals at all, it's certainly plausible that a signal is pending and causing this. You might try adding: trap '' SIGHUP SIGALRM SIGSEGV SIGABRT SIGINT etc to the top of the script to ignore the signals that can be ignored and see if that helps.
Oleg and Neil, both of you have worked in the coredump and coredump helper code before (at least according to git). Do either of you have any comments on the potential issues with using a pipe helper and getting truncated core files? The "interrupted by signal" issue is pretty clear in the kernel code, but I don't know if there are other potential causes.
(In reply to comment #8) > ... signals delivered to the helper process during dump can cause > truncated core dumps. Since the script isn't doing anything with signals at > all, it's certainly plausible that a signal is pending and causing this. Well, that's plausible I guess, but then the question becomes what's signaling the helper process and why. It's entirely likely that Postgres would be attempting to send signals to the now-crashed process. Is it possible that the kernel is delivering those to the coredump helper? If so, I'd argue that that's a bug.
I can't be certain, but I think there is a bug in the wait_for_dump_helpers code in the do_coredump path. We're checking signal_pending(current), but IIRC the intent is to make sure that the core dump helper is making forward progress but making sure the the SIGIO we send to it every loop iteration is getting handled. The problem is that current points to the crashing process, not the reading process, so if the crashing process gets a signal delivered from somewhere, we drop out of that loop before the pipe reader is done doing its work. I'll write a patch for this in the AM for you to test.
Created attachment 539460 [details] patch to point signal_pending to reader task Here you go, untested as of yet, but I'll build it in the morning.
(In reply to comment #8) > You might try adding: > > trap '' SIGHUP SIGALRM SIGSEGV SIGABRT SIGINT > > etc to the top of the script to ignore the signals that can be ignored and see > if that helps. Trapping the signals doesn't help. Same results.
Created attachment 539650 [details] updated patch updated patch that fixes a missing hunk in the last version. Build is running here: http://koji.fedoraproject.org/koji/taskinfo?taskID=3557610 Please test and confirm if this fixes the problem for you.
(In reply to comment #14) > > Created attachment 539650 [details] > updated patch Doesn't look correct. signal_pending(cp->reader_task) is not what we want. This is the known problem, should be fixed upstream. It is not that trivial. See http://marc.info/?l=linux-kernel&m=131989970411759 and the whole thread.
(In reply to comment #15) > (In reply to comment #14) > > > > Created attachment 539650 [details] > > updated patch > > Doesn't look correct. signal_pending(cp->reader_task) is not > what we want. > > This is the known problem, should be fixed upstream. It is not > that trivial. See http://marc.info/?l=linux-kernel&m=131989970411759 > and the whole thread. By "should be fixed upstream" do you mean "it still needs to be fixed upstream"? I looked upstream but don't see anything that stands out as having fixed this issue. Oh, and the thread is rather short basically ending with the message you point out. Was there anything further in another thread?
Oleg, why doesn't it look correct? IIRC the signal_pending check was there to ensure that the SIGIO we send in the loop gets handled. The only problem is that we're sending the signal to the reader task, but we're checking for pending on the writer task.
(In reply to comment #16) > > > This is the known problem, should be fixed upstream. It is not > > that trivial. See http://marc.info/?l=linux-kernel&m=131989970411759 > > and the whole thread. > > By "should be fixed upstream" do you mean "it still needs to be fixed > upstream"? Yes. I'll try to fix this in 3.3. > Was there anything further in another thread? There was another short thread http://marc.info/?t=131983286300002 Plus this was discussed before... But I that message I point accurately summarises the problems and the possible solutions. I hope.
(In reply to comment #17) > IIRC the signal_pending check was there to > ensure that the SIGIO we send in the loop gets handled. No, no. From http://marc.info/?l=linux-kernel&m=131989970411759 here are 2 reasons. if signal_pending() == T then: - pipe_wait() is pointless, it won't block. We do not want a busywait loop. - And probably even wait_for_dump_helpers() is pointless, it is quite possible that pipe_write() already failed before and the reader doesn't know this. > The only problem is > that we're sending the signal to the reader task, but we're checking for > pending on the writer task. The reader should take care of signals itself. How (and why?) the kernel should help? I'll try to make the debugging patch today...
Forgot to mention... (In reply to comment #17) > > ensure that the SIGIO we send in the loop gets handled. Please note that we do not actually send SIGIO. Of course, unless the reader does F_SETOWN/F_SETSIG.
Created attachment 539731 [details] [patch] suppress the signals during the coredump Suppress the signals during the coredump. This is the wrong/incomplete hack, only for testing.
Created attachment 539733 [details] [patch v2] suppress the signals during the coredump Suppress the signals during the coredump. This is the wrong/incomplete hack, only for testing. v2: check signal_pending() later, after wait_for...
(In reply to comment #22) > Created attachment 539733 [details] > [patch v2] suppress the signals during the coredump > > Suppress the signals during the coredump. > > This is the wrong/incomplete hack, only for testing. > > v2: check signal_pending() later, after wait_for... Richard, do you need a scratch kernel built to test with this, or can you build that yourself?
(In reply to comment #23) > Richard, do you need a scratch kernel built to test with this, or can you build > that yourself? Please build it for me. Thanks.
(In reply to comment #24) > (In reply to comment #23) > > Richard, do you need a scratch kernel built to test with this, or can you build > > that yourself? > > Please build it for me. Thanks. http://koji.fedoraproject.org/koji/taskinfo?taskID=3564158 When the above completes, please test.
(In reply to comment #25) > http://koji.fedoraproject.org/koji/taskinfo?taskID=3564158 > > When the above completes, please test. This build fixes the issue.
Ok, Oleg, it fixes the problem, how do you want to handle it from here? Since you've been participating more on this upstream, I'll just reassign this to you.
(In reply to comment #27) > Ok, Oleg, it fixes the problem, how do you want to handle it from here? As I said, I am going to send the patch(es) upstream first. Unfortunately, not immediately, I am a bit busy now. I hope I can do this before the next 3.3 merge window. > Since > you've been participating more on this upstream, I'll just reassign this to > you. OK.
Since this is still broken upstream (and has been for a while), let's move this bug to rawhide for now. The fix will show up there first and we can look at backporting to various stable releases after that.
(In reply to comment #28) > (In reply to comment #27) > > Ok, Oleg, it fixes the problem, how do you want to handle it from here? > > As I said, I am going to send the patch(es) upstream first. Unfortunately, > not immediately, I am a bit busy now. I hope I can do this before the > next 3.3 merge window. Out of curiosity, did this get fixed for 3.3?
*** Bug 678807 has been marked as a duplicate of this bug. ***
(In reply to comment #30) > (In reply to comment #28) > > (In reply to comment #27) > > > Ok, Oleg, it fixes the problem, how do you want to handle it from here? > > > > As I said, I am going to send the patch(es) upstream first. Unfortunately, > > not immediately, I am a bit busy now. I hope I can do this before the > > next 3.3 merge window. > > Out of curiosity, did this get fixed for 3.3? No, sorry :/ still on my TODO list.
Moving this to ASSIGNED so it doesn't keep showing up as NEW.
Has there been any work on this? We're running into it. Thanks, J
Oleg, I think this should be fixed with 3.10-rc1, correct? With the following commits? dc7ee2aac830e5423f41de87d50441f138f648da coredump: change wait_for_dump_helpers() to use wait_event_interruptible() 079148b919d0c58b796f9ae98bdb53028dbcd5e7 coredump: factor out the setting of PF_DUMPCORE 528f827ee0bb508af5c9466562f474675540473e coredump: introduce dump_interrupted() acdedd99b0f3bff9b4bb2103a6b1268c03d1f963 coredump: sanitize the setting of signal->group_exit_code 6cd8f0acae3420afce37bf51a9ff8c2c20342af5 coredump: ensure that SIGKILL always kills the dumping thread 403bad72b67d8b3f5a0240af5023adfa48132a65 coredump: only SIGKILL should interrupt the coredumping task
Yes. See also https://bugzilla.redhat.com/show_bug.cgi?id=773215#c10 We do not need all these commits if we only want to protect the coredumping from non-SIGKILL signals. Probably this bug should bug can be closed? (I'll backport these changes for rhel).
(In reply to comment #39) > Yes. > > See also https://bugzilla.redhat.com/show_bug.cgi?id=773215#c10 > > We do not need all these commits if we only want to protect the > coredumping from non-SIGKILL signals. We have them all in rawhide because we just follow Linus' tree ;). > Probably this bug should bug can be closed? (I'll backport these > changes for rhel). Yep, agreed. Thanks!