Bug 168392

Summary: kill -6 of multi-threaded application takes 30 minutes to finish
Product: Red Hat Enterprise Linux 3
Reporter: Issue Tracker <tao>
Component: kernel
Assignee: Dave Anderson <anderson>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 3.0
CC: lwang, peterm, petrides, tao
Hardware: All   
OS: Linux   
Fixed In Version: RHSA-2006-0144
Doc Type: Bug Fix
Last Closed: 2006-03-15 16:36:42 UTC
Bug Blocks: 168424    
Attachments:
 - ps -l output from vmcore (flags: none)
 - remove yield() (flags: none)

Description Issue Tracker 2005-09-15 16:41:09 UTC
Escalated to Bugzilla from IssueTracker

Comment 6 Dave Anderson 2005-09-27 18:27:37 UTC
Does a "ps -l" on the vmcore file show that the PID 2357 (the core-dumping
yield() caller) has not run for many minutes based upon its "last_run" jiffies
value?
 
Would the customer be willing to test a kernel with the yield() call
removed?

Comment 7 Chris Williams 2005-09-28 18:25:24 UTC
Dave,

crash> ps -l 2357
[131154]  PID: 2357   TASK: f2d2c000  CPU: 1   COMMAND: "mta"

I believe Neil already tried a yield()-less kernel. See IT# 75584:

07-18-2005 10:35am by nhorman
Ok, so I would call that a non-starter on the simple removal of the yield call then.

Comment 8 Dave Anderson 2005-09-28 19:44:18 UTC
> crash> ps -l 2357
> [131154]  PID: 2357   TASK: f2d2c000  CPU: 1   COMMAND: "mta"

Sorry, I should have made myself clearer.  If you run "ps -l" alone,
you will get the last_run stats for all processes, with the task
having the largest value (jiffies snapshot) at the top of the list,
which would be the most recently run.  Then, for RHEL3 with an HZ
of 100, if you take the difference in jiffies values between the
top-most task and the "mta" task above (131154) and divide it by 100,
you get the number of seconds since PID 2357 last ran.
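
For illustration, here is that arithmetic in C, plugging in the jiffies
values that show up in the "ps -l" output later in this bugzilla (a toy
sketch, not kernel code):

  /* Toy sketch: seconds since a task last ran, computed from the
   * "last_run" jiffies snapshots reported by crash's "ps -l".
   * RHEL3 runs with HZ == 100, i.e. 100 clock ticks per second. */
  #include <stdio.h>

  #define HZ 100

  int main(void)
  {
          unsigned long newest  = 196314;  /* top-most task in "ps -l" */
          unsigned long stalled = 131154;  /* the core-dumping "mta" task */
          unsigned long delta   = newest - stalled;

          printf("%lu ticks = %lu seconds (%lu:%02lu)\n",
                 delta, delta / HZ, delta / HZ / 60, (delta / HZ) % 60);
          return 0;
  }

  /* prints: 65160 ticks = 651 seconds (10:51) */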

Given that they were driven to the point of doing an alt-sysrq-c,
and the backtrace *still* showed the core-dumping task sitting in
the yield() call, that would seemingly point to the problem.  In
other words, the yield() call would result in exactly one call to
schedule(), and I'm very curious whether it just never came back.
That *seems* to be what happened -- correct me if I'm wrong.
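
For context on why that single schedule() might not come back for a
long time, yield() in kernels of that vintage is roughly the following
(a paraphrased sketch, not the literal RHEL3 source):

  /* Paraphrased sketch of 2.4-era yield(); not the literal RHEL3 code. */
  void yield(void)
  {
          set_current_state(TASK_RUNNING);
          sys_sched_yield();      /* with the O(1) scheduler backport, this
                                   * can queue the task behind every other
                                   * runnable task */
          schedule();             /* ...and this may not return until all
                                   * of them have had their turn */
  }

With hundreds of runnable "mta" threads in the mix, that wait could
plausibly stretch into minutes.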


> I believe Neil already tried a yield()-less kernel. See IT# 75584:

That would have been interesting info to put in the bugzilla when it was
escalated.  In any case, looking at the history of IT# 75584, I see this:

> I tried the test with debug kernel (remove  yield() from coredump_wait() )
> four times. On all case, core dump finished without being locked up soon.  But
> first time, core file was not created.  After that , core file was created  on
> three tries. The core file seems to be normal. ( we can examine them by gdb)
> As a result, removing yield() from coredump_wait()  is effective for this
> issue, but it does not seem to be perfect.  Please discuss  this measure with
> RedHat

...and, therefore I'm sure I don't know what goose to chase here.  Who the
hell knows what happened the 4th time? 

If they ran without the yield() -- which looks reasonable to me, and
even Ingo seems to agree in the upstream discussion, although the patch
never seems to have been posted/accepted (?) -- and they were then able
to get a core dump, then what's the problem?

If the system still hung without the yield(), try getting another vmcore
(and the vmlinux and vmlinux.debug) while it's in that state.  The information
so far is way too nebulous.

> Ok, so I would call that a non-starter on the simple removal of the yield call
> then.

Actually it seems that the removal of the yield() call is a pre-requisite.

So -- anything more from me is also a non-starter...  

Comment 10 Dave Anderson 2005-10-06 18:30:26 UTC
OK, given that this bugzilla was put on RHEL3U7Proposed, let's try this yet
again...

We need two pieces of information:

1. From the vmcore file referenced in this bugzilla, please give me the
   crash utility output of the command: "ps -l"

   (Do *not* add the PID argument as done in comment #7; a sample crash
   session follows this list.)

   I want to see indisputable evidence that the yield() call in coredump_wait()
   is the culprit behind the "30 minutes to finish" problem that this bugzilla
   is purportedly complaining about.

2. Running Neil Horman's kernel -- which, judging from our CVS archives,
   I'm presuming is "kernel-smp-2.4.21-36.nhorman.IT75584.i686.rpm" --
   please reproduce the problem, and after a significant number of minutes
   with the core-dumping task hung, enter "alt-sysrq-c" and make the
   resultant vmcore available.  Either that, or declare that the patch
   addresses the problem.
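
For reference, a typical crash session against such a dump looks like
this (file names illustrative; RHEL3 needs both the stripped vmlinux
and its vmlinux.debug companion):

  $ crash vmlinux vmlinux.debug vmcore
  crash> ps -l
  ...
  crash> bt 2357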

Now, in the quagmire that this IT degenerated into, the reporter states
that when running nhorman's debug kernel the core dump operations
completed without delay:

> I tried the test with debug kernel (remove  yield() from coredump_wait() )
> four times. On all case, core dump finished without being locked up soon.  But
> first time, core file was not created.  After that , core file was created  on
> three tries. The core file seems to be normal. ( we can examine them by gdb)
> As a result, removing yield() from coredump_wait()  is effective for this
> issue, but it does not seem to be perfect.  Please discuss  this measure with
> RedHat

I find this hard to believe.  But let's take it at face value.

What exactly happened the one time when the "core file was not created"?  If
the core-dumping thread ran to completion, then the only conceivable reasons
for no resultant core dump file (that I can think of) would be if:

 - the prerequisite "ulimit -c unlimited" had *not* been done first
 - the filesystem was full
 - the filesystem was unwriteable
 - the dumping application had done a chdir(), and the core file ended up
   in some other directory.

But if none of those "ifs" were true, then it makes no sense that the core file
was not created.  That being the case, what "measure" is there to discuss?  
Again, even if it were true that the core file was magically not created --
despite all logical reasoning -- then at this point in time, and without being
there to confirm it, how on earth are we supposed to know what happened?  

Please -- enlighten me...
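
As an aside, the first of those conditions is trivial to rule out from
within the application itself; a minimal check (hypothetical code, not
something from the IT):

  /* Hypothetical check: confirm core dumps aren't disabled by the
   * process's RLIMIT_CORE before expecting a core file to appear. */
  #include <stdio.h>
  #include <sys/resource.h>

  int main(void)
  {
          struct rlimit rl;

          if (getrlimit(RLIMIT_CORE, &rl) == 0 && rl.rlim_cur == 0)
                  printf("core dumps disabled; run \"ulimit -c unlimited\"\n");
          return 0;
  }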

In any case, getting back to the reason behind this bugzilla, if the problem
cannot be reproduced with nhorman's kernel, then let's consider the yield()-less
patch effective, and put it up in front of rhkernel-list for discussion.

As far as the core file not being created, we're going to have to have some
solid evidence as opposed to some nebulous statement in the IT.

Comment 12 Chris Williams 2005-10-06 19:27:11 UTC
Created attachment 119682 [details]
ps -l output from vmcore

Comment 14 Chris Williams 2005-10-06 19:59:06 UTC
Created attachment 119683 [details]
remove yield()
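
For readers without attachment access, the change is presumably along
these lines (a sketch against the upstream coredump_wait() of that era,
not the literal attachment):

  --- fs/exec.c
  +++ fs/exec.c
  @@ static void coredump_wait(struct mm_struct *mm)
            mm->core_startup_done = &startup_done;

  -         /* give other threads a chance to run: */
  -         yield();
  -
            zap_threads(mm);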

Comment 15 Dave Anderson 2005-10-06 20:37:02 UTC
Chris, thanks for the info.

Looking at the relevant parts of the "ps -l" output, the yield()'ing "mta"
task (PID 2357) has been blocked for ~11 minutes:

crash> ps -l
[196314]  PID: 2798   TASK: f17f4000  CPU: 0   COMMAND: "mta"
[196208]  PID: 3198   TASK: f150c000  CPU: 1   COMMAND: "mta"
[196208]  PID: 3602   TASK: f1458000  CPU: 1   COMMAND: "bash"
[196208]  PID: 20835  TASK: f22ac000  CPU: 1   COMMAND: "in.telnetd"
...
[131154]  PID: 0      TASK: c5d78000  CPU: 1   COMMAND: "swapper"
[131154]  PID: 2357   TASK: f2d2c000  CPU: 1   COMMAND: "mta"
[131154]  PID: 2376   TASK: f1db8000  CPU: 0   COMMAND: "mta"
[131154]  PID: 10256  TASK: f3bcc000  CPU: 0   COMMAND: "spam_flt"
...
crash>

The current jiffies counter is at least 196314, whereas the "mta" task
last ran when it was 131154.  That's a delta of 65160 clock ticks, which
at 100 ticks/second is 651 seconds, or 10:51.  537 of the 539
"mta" threads have run since the core-dumping one last ran.
It's not "30 minutes" as the bugzilla complains, but I'm guessing that
they may have previously seen a ~30 minute timespan, and in this particular
attempt as registered in the vmcore, they had just entered alt-sysrq-c after
hanging ~11 minutes.  In your tests, Chris, perhaps the typical 8-minute
interval was smaller than theirs because you weren't simultaneously running
a whole bunch of other stuff as seen in the vmcore?

In any case, it's definitely worth posting the patch to rhkernel-list, and
we'll see how much flame it will generate...


Comment 18 Ernie Petrides 2005-10-20 05:43:49 UTC
A fix for this problem has just been committed to the RHEL3 U7
patch pool this evening (in kernel version 2.4.21-37.6.EL).


Comment 21 Red Hat Bugzilla 2006-03-15 16:36:42 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0144.html