Bug 168392
Summary: kill -6 of multi-threaded application takes 30 minutes to finish

Product: Red Hat Enterprise Linux 3
Component: kernel
Version: 3.0
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Target Milestone: ---
Target Release: ---
Reporter: Issue Tracker <tao>
Assignee: Dave Anderson <anderson>
QA Contact: Brian Brock <bbrock>
CC: lwang, peterm, petrides, tao
Doc Type: Bug Fix
Fixed In Version: RHSA-2006-0144
Last Closed: 2006-03-15 16:36:42 UTC
Bug Blocks: 168424
Attachments: 119682 (ps -l output from vmcore), 119683 (remove yield())
Description
Issue Tracker
2005-09-15 16:41:09 UTC
Does a "ps -l" on the vmcore file show that the PID 2357 (the core-dumping yield() caller) has not run for many minutes based upon its "last_run" jiffies value? Would the customer be willing to test a kernel with the yield() call removed? Dave, crash> ps -l 2357 [131154] PID: 2357 TASK: f2d2c000 CPU: 1 COMMAND: "mta" I believe Neil already tried a yield()-less kernel. See IT# 75584: 07-18-2005 10:35am by nhorman Ok, so I would call that a non-starter on the simple removal of the yield call then. > crash> ps -l 2357 > [131154] PID: 2357 TASK: f2d2c000 CPU: 1 COMMAND: "mta" Sorry, I should have made myself clearer. If you do "ps -l" alone, you will get the last_run stats for all processes, with the task with the largest value (jiffies snapshot) at the top of the list, which would be the most recently-run. Then for RHEL3 with an HZ of 100, if you took the difference in jiffies values between the top-most task and the "mta" task above (131154), divided that by 100, you would get the number of seconds since the PID 2357 last ran. Given that they were driven to the point of doing an alt-sysrq-c, and the backtrace *still* showed that the core-dumping task was still in the yield() call, that would seemingly point to the problem. In other words, the yield() call would result in exactly one call to schedule(), and I'm very curious whether it just never came back. That *seems* to be what happened -- correct me if I'm wrong. > I believe Neil already tried a yield()-less kernel. See IT# 75584: That would have been interesting info to put in the bugzilla when it was escalated. In any case, looking at the history of the 75584, I see this: > I tried the test with debug kernel (remove yield() from coredump_wait() ) > four times. On all case, core dump finished without being locked up soon. But > first time, core file was not created. After that , core file was created on > three tries. The core file seems to be normal. ( we can examine them by gdb) > As a result, removing yield() from coredump_wait() is effective for this > issue, but it does not seem to be perfect. Please discuss this measure with > RedHat ...and, therefore I'm sure I don't know what goose to chase here. Who the hell knows what happened the 4th time? If they ran without the yield(), which to me looks like a reasonable thing, even Ingo seems to agree in the upstream discussion, although it doesn't seem to have ever been posted/accepted (?), and then they were able to get a core dump, then what's the problem? If the system still hung without the yield(), try getting another vmcore (and the vmlinux and vmlinux.debug) while it's in that state. The information so far is way too nebulous. > Ok, so I would call that a non-starter on the simple removal of the yield call > then. Actually it seems that the removal of the yield() call is a pre-requisite. So -- anything more from me is also a non-starter... OK, given that this bugzilla was put on RHEL3U7Proposed, let's try this yet again... We need two pieces of information: 1. From the vmcore file referenced in this bugzilla, please give me the crash utility output of the command: "ps -l" (Do *not* add the PID argument as done in comment #7) I want to see indisputable evidence that the yield() call in coredump_wait() is the culprit behind the "30 minutes to finish" problem that this bugzilla is purportedly complaining about. 2. 
Running Neil Horman's kernel, which by looking into our CVS archives, which I'm presuming is "kernel-smp-2.4.21-36.nhorman.IT75584.i686.rpm", please reproduce the problem, and after a signficant number of minutes with the core-dumping task hung, enter "alt-sysrq-c", and make the resultant vmcore available. Either that, or declare that the patch addresses the problem. Now, in the quagmire that this IT degenerated into, the reporter states that the when running nhorman's debug kernel the core dump operations completed without delay: > I tried the test with debug kernel (remove yield() from coredump_wait() ) > four times. On all case, core dump finished without being locked up soon. But > first time, core file was not created. After that , core file was created on > three tries. The core file seems to be normal. ( we can examine them by gdb) > As a result, removing yield() from coredump_wait() is effective for this > issue, but it does not seem to be perfect. Please discuss this measure with > RedHat I find this hard to believe. But let's take it on its face value. What exactly happened the 1 time when the "core file was not created"? If the core-dumping thread ran to completion, then the only conceivable reasons for no resultant core dump file (that I can think of) would be if: - the prerequisite "ulimit -c unlimited" had *not* been done first - the filesystem was full - the filesystem was unwriteable - the dumping application had done a chdir(), and the core file is in some other directory. But if none of those "ifs" were true, then it makes no sense that the core file was not created. That being the case, what "measure" is there to discuss? Again, even if it were true that the core file was magically not created -- despite all logical reasoning -- then at this point in time, and without being there to confirm it, how on earth are we supposed to know what happened? Please -- enlighten me... In any case, getting back to the reason behind this bugzilla, if the problem cannot be reproduced with nhorman's kernel, then let's consider the yield()-less patch effective, and put it up in front of rhkernel-list for discussion. As far as the core file not being created, we're going to have to have some solid evidence as opposed to some nebulous statement in the IT. Created attachment 119682 [details]
ps -l output from vmcore
Created attachment 119683 [details]
remove yield()
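For orientation, here is a rough sketch of the coredump_wait() path that the attached "remove yield()" patch presumably touches. This is paraphrased and simplified from the early-2.6 upstream fs/exec.c, not the actual RHEL3 2.4.21 backport, so treat the surrounding bookkeeping as an assumption; the relevant part is the single yield() call, which goes through schedule() exactly once with no guarantee of when the dumping task runs again.

```c
/*
 * Illustrative sketch only: paraphrased and simplified from the early-2.6
 * upstream coredump_wait() in fs/exec.c.  The RHEL3 2.4.21 backport is
 * assumed to be similar; the attached patch presumably drops the yield().
 */
static void coredump_wait(struct mm_struct *mm)
{
	DECLARE_COMPLETION(startup_done);

	mm->core_startup_done = &startup_done;

	/*
	 * "Give other threads a chance to run" -- but this is exactly one
	 * trip through schedule(), and on a loaded system the dumping task
	 * may not be selected to run again for a very long time.
	 */
	yield();

	zap_threads(mm);                /* force the sibling threads to exit */
	if (mm->core_waiters) {
		up_write(&mm->mmap_sem);
		wait_for_completion(&startup_done);  /* last exiter wakes us */
	} else
		up_write(&mm->mmap_sem);
}
```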
Chris, thanks for the info. Looking at the relevant parts of the "ps -l" output, the yield()'ing "mta" task (PID 2357) has been blocked for ~11 minutes:

crash> ps -l
[196314]  PID: 2798    TASK: f17f4000  CPU: 0   COMMAND: "mta"
[196208]  PID: 3198    TASK: f150c000  CPU: 1   COMMAND: "mta"
[196208]  PID: 3602    TASK: f1458000  CPU: 1   COMMAND: "bash"
[196208]  PID: 20835   TASK: f22ac000  CPU: 1   COMMAND: "in.telnetd"
...
[131154]  PID: 0       TASK: c5d78000  CPU: 1   COMMAND: "swapper"
[131154]  PID: 2357    TASK: f2d2c000  CPU: 1   COMMAND: "mta"
[131154]  PID: 2376    TASK: f1db8000  CPU: 0   COMMAND: "mta"
[131154]  PID: 10256   TASK: f3bcc000  CPU: 0   COMMAND: "spam_flt"
...
crash>

The current jiffies counter is at least 196314, whereas the "mta" task last ran when it was 131154. That's a delta of 65160 clock ticks, at 100 ticks/second, or roughly 651 seconds, which would be 10:51. 537 of the 539 "mta" threads have run since the core-dumping one last ran.

It's not "30 minutes" as the bugzilla complains, but I'm guessing that they may have previously seen a ~30 minute timespan, and in this particular attempt, as registered in the vmcore, they had just entered alt-sysrq-c after hanging ~11 minutes. In your tests, Chris, perhaps the typical 8-minute interval was smaller than theirs because you weren't simultaneously running a whole bunch of other stuff, as seen in the vmcore?

In any case, it's definitely worth posting the patch to rhkernel-list, and we'll see how much flame it will generate...

A fix for this problem has just been committed to the RHEL3 U7 patch pool this evening (in kernel version 2.4.21-37.6.EL). An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0144.html
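For anyone who wants to reproduce the last_run arithmetic used in the analysis above, a minimal user-space sketch follows. It assumes HZ=100 (as stated earlier for RHEL3) and uses the two jiffies snapshots taken from the "ps -l" output; the helper name is purely illustrative.

```c
#include <stdio.h>

/* HZ on RHEL3 x86 kernels, as stated in the comments above. */
#define HZ 100

/* Convert the gap between two "last_run" jiffies snapshots into seconds. */
static double last_run_delta_seconds(unsigned long newest, unsigned long stale)
{
	return (double)(newest - stale) / HZ;
}

int main(void)
{
	/* Jiffies values taken from the "ps -l" output in this bug. */
	unsigned long newest = 196314;  /* most recently run task            */
	unsigned long stale  = 131154;  /* core-dumping "mta" task, PID 2357 */
	double secs = last_run_delta_seconds(newest, stale);

	/* Prints: 65160 ticks -> 651.6 seconds (~10:51) */
	printf("%lu ticks -> %.1f seconds (~%d:%02d)\n",
	       newest - stale, secs, (int)secs / 60, (int)secs % 60);
	return 0;
}
```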