Bug 107900 - nptl scheduling lockout triggered by page swaps
Status: CLOSED NEXTRELEASE
Product: Red Hat Linux
Classification: Retired
Component: glibc
Version: 9
Hardware: athlon
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact: Brian Brock
Reported: 2003-10-24 09:30 UTC by gregrwm
Modified: 2016-11-24 14:56 UTC

Doc Type: Bug Fix
Last Closed: 2004-01-03 09:35:10 UTC

Description gregrwm 2003-10-24 09:30:58 UTC
i'm getting frequent hangs recently..  but get this..  i attach to the
process with gdb, then just quit gdb, and it unhangs!

this system is nothing but latest stable rpm software.  these hangs have
happened (repeatedly) so far in evolution(1.4.4), galeon(1.2.7),
gthumb(2.0.1), and gnucash(1.8.7), in either kde or gnome, under redhat
9, all up2date.  i've tried various kernels, 2.4.20-18, 2.4.20-19,
2.4.20-20, no difference.

before 3-6 weeks or so ago it wasn't happening.  dunno if i should blame
up2dates or what.  could be i just wasn't yet putting quite so heavy a
load on the machine back then?

my best guess is it's some sort of latent scheduling bug, brought out
under load, that is, when i've got enough running to push the machine
heavily into swap.

Comment 1 gregrwm 2003-10-24 10:43:19 UTC
sometimes when something (eg galeon) is clearly stuck, all that is required is
to bring forward another window, then return to the "stuck" window, and presto,
it's fine again.  then again sometimes the gdb hack above seems needed.  more
rarely, even that doesn't help, and i end up killing and relaunching the app.

Comment 2 gregrwm 2003-10-25 09:09:00 UTC
the summary is a self diagnosis that needs confirmation.  how would i go about
confirming it?  i'm getting evolution and galeon lockups several times daily.

for an evolution example see http://bugzilla.ximian.com/show_bug.cgi?id=49373

Comment 3 Ulrich Drepper 2003-11-10 21:59:42 UTC
Can you try the test version of the RHL9 errata at

  ftp://people.redhat.com/jakub/glibc/errata/2.3.2-27.9.4/

and let us know whether it works?  This code should have backports of
the fixes for the NPTL problems we know of.

Comment 4 gregrwm 2003-11-12 06:57:55 UTC
ok, after downloading and installing, neither kde nor gnome will
launch anymore.  the versions i have are from up2date.  do i need
something even more recent?

Comment 5 Ulrich Drepper 2003-11-12 07:34:03 UTC
There shouldn't be any problems at all.  Did you download the i686
version (I assume that is what you used before)?  What problems are
reported?

Comment 6 gregrwm 2003-11-12 19:44:52 UTC
ah, yes, glibc 686 is much better, thank you.

but alas, the hang problem is still here.  several hours of light duty
computing with no problem, but a few extra apps running, swap space
more active, and evolution hung again.

i've learned a new trick for getting out of the hangs.  just STOP the
process, and CONT again.  usually works, tho not always.
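
The STOP/CONT trick above can be sketched as a small shell function. This is
only an illustration: the function name `nudge` and the one-second pause are
assumptions, not something from this report, and it does nothing more than
send SIGSTOP followed by SIGCONT to one process ID.

```shell
#!/bin/sh
# Sketch only: "nudge" is a hypothetical helper name.
# Stop a process and then let it continue, as described above.
nudge() {
    pid=$1
    # SIGSTOP suspends the whole process; give up if the PID is not valid.
    kill -STOP "$pid" || return 1
    # Assumed pause; the report does not say how long to wait.
    sleep 1
    # SIGCONT resumes the process.
    kill -CONT "$pid"
}
```

Usage would look like `nudge 1234`, where 1234 stands for the process ID of
the hung application. As noted above, this usually works but not always.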

<sigh>.

Comment 7 Ulrich Drepper 2003-11-12 21:31:42 UTC
If you can try the glibc in Fedora Core 1 (which you should only try
with a complete installation) this would help.  I very much doubt that
there is any problem in FC1 and ordinarily I'd say the backport to
RHL9 has the important pieces.  But who knows, there have been tons of
changes.  I'm not going to try hunting down the bug in RHL9.  If
somebody identifies it and it indeed is a libc problem, we can look
into fixing it.  But the really up-to-date code is in FC1 and RHEL3.

Comment 8 gregrwm 2004-01-03 09:35:10 UTC
successful workaround:  invoke as follows:

$  LD_ASSUME_KERNEL=2.2.5 evolution &
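
For context, and as far as is generally documented for Red Hat Linux 9:
setting LD_ASSUME_KERNEL=2.2.5 makes the dynamic linker pick the older
LinuxThreads libraries instead of NPTL, which would explain why the hangs go
away. A minimal sketch of a wrapper that applies the setting to one command
only follows; the function name `with_old_threads` is hypothetical.

```shell
#!/bin/sh
# Sketch only: "with_old_threads" is a hypothetical helper name.
# Run the given command with LD_ASSUME_KERNEL set for that command alone,
# so the rest of the session keeps using the default thread library.
with_old_threads() {
    LD_ASSUME_KERNEL=2.2.5 "$@"
}

# Example (commented out so that sourcing this file starts nothing):
# with_old_threads evolution &
```

On glibc versions that support it, `getconf GNU_LIBPTHREAD_VERSION` reports
which thread library is in use, which may help confirm that the setting took
effect.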

