Red Hat Bugzilla – Bug 107900
nptl scheduling lockout triggered by page swaps
Last modified: 2007-04-18 12:58:44 EDT
i'm getting frequent hangs recently.. but get this.. i attach to the
process with gdb, then just quit gdb, and it unhangs!
this system is nothing but latest stable rpm software. these hangs have
happened (repeatedly) so far in evolution(1.4.4), galeon(1.2.7),
gthumb(2.0.1), and gnucash(1.8.7), in either kde or gnome, under redhat
9, all up2date. i've tried various kernels, 2.4.20-18, 2.4.20-19,
2.4.20-20, no difference.
until 3-6 weeks or so ago it wasn't happening. dunno if i should blame
up2dates or what. could be i just wasn't yet putting quite so heavy a
load on the machine back then?
my best guess is it's some sort of latent scheduling bug, brought out
under load, that is, when i've got enough running to push the machine
significantly into swap.
sometimes when something (eg galeon) is clearly stuck, all that is required is
to bring forward another window, then return to the "stuck" window, and presto,
it's fine again. then again sometimes the gdb hack above seems needed. more
rarely, even that doesn't help, and i end up killing and relaunching the app.
the bug summary is a self-diagnosis that needs confirmation. how would i go about
confirming it? i'm getting evolution and galeon lockups several times daily.
for an evolution example see http://bugzilla.ximian.com/show_bug.cgi?id=49373
Can you try the test version of the RHL9 errata at
and let us know whether it works? This code should have backports of
the fixes for the NPTL problems we know of.
ok, after downloading and installing, neither kde nor gnome will
launch anymore. the versions i have are from up2date. do i need
something even more recent?
There shouldn't be any problems at all. Did you download the i686
version (I assume that is what you used before)? What problems are
ah, yes, glibc 686 is much better, thank you.
but alas, the hang problem is still here. several hours of light duty
computing with no problem, but a few extra apps running, swap space
more active, and evolution hung again.
i've learned a new trick for getting out of the hangs. just STOP the
process, and CONT again. usually works, tho not always.
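
For anyone who wants to repeat the STOP/CONT trick without typing both signals by hand, here is a minimal sketch as a shell function. The name `nudge` and the one-second pause are assumptions made for illustration; nothing in this report says a pause is needed.

```shell
# nudge: hypothetical helper that stops a process and then continues it.
# The 1-second pause is an arbitrary choice for this sketch.
nudge() {
    pid="$1"
    kill -STOP "$pid" || return 1   # suspend the process
    sleep 1
    kill -CONT "$pid"               # resume it
}
```

Usage would be something like `nudge 12345`, with the pid of the hung app taken from ps. As noted above, this usually works but not always.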
If you can try the glibc in Fedora Core 1 (which you should only try
with a complete installation), this would help. I very much doubt that
there is any problem in FC1 and ordinarily I'd say the backport to
RHL9 has the important pieces. But who knows, there have been tons of
changes. I'm not going to try hunting down the bug in RHL9. If
somebody identifies it and it indeed is a libc problem, we can look
into fixing it. But the really up-to-date code is in FC1 and RHEL3.
successful workaround: invoke as follows:
$ LD_ASSUME_KERNEL=2.2.5 evolution &
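
The same workaround can be wrapped in a small shell function so it is easy to apply to the other affected apps. This is only a sketch: the function name `with_linuxthreads` is made up here, and it rests on the assumption that on Red Hat 9 this LD_ASSUME_KERNEL value makes the dynamic loader pick the older LinuxThreads libraries instead of NPTL.

```shell
# with_linuxthreads: hypothetical wrapper, not part of any package.
# Runs the given command with LD_ASSUME_KERNEL set for that command only,
# using the same value as the workaround above.
with_linuxthreads() {
    LD_ASSUME_KERNEL=2.2.5 "$@"
}
```

For example, `with_linuxthreads galeon &` would launch galeon the same way evolution is launched above. On systems whose glibc no longer ships LinuxThreads, the setting may stop programs from starting at all, so this applies to RHL9-era installs.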