Bug 5866

Summary:	High load average under smp kernel when using gnome or E
Product:	[Retired] Red Hat Linux	Reporter:	Gordon Messmer <gordon.messmer>
Component:	kernel	Assignee:	Alan Cox <alan>
Status:	CLOSED NOTABUG	QA Contact:
Severity:	medium	Docs Contact:
Priority:	medium
Version:	6.1	CC:	alan, juanco
Target Milestone:	---
Target Release:	---
Hardware:	i386
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2000-02-05 23:51:25 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Gordon Messmer 1999-10-12 05:51:56 UTC

When using GNOME or E (alone), I experience a high load
average.  This is a stock installation of Red Hat 6.1 (I've
updated E, but that hasn't changed the symptoms)

My system consists of an ABIT BP-6 motherboard, (with either
Intel Multiprocessor specification 1.1 or 1.4, results are
the same) 128MB PC100 RAM, two Celeron 333Mhz processors,
and an AGP Matrox G400.

Applets in the panel seem to recieve an undue number of X
events.  I am running multiload_app, mixer_applet,
gnomexmms, and asclock_applet.  Among these, mixer_applet is
consistently using more cpu time than the others.  These
shift, of course, but top reports at one instant
mixer_applet:12% gnomexmms:8.1% enlightenment:7.7% X:7.3%
multiload_app:1.9%  Load avg:0.21

If I use E by itself, it continues to use about 10% of the
CPU time.  GNOME with wmaker or FVWM2 continues to use lots
of processor time.

This occurs with the smp kernel distributed with Red Hat
6.1, and with 2.2.12 release kernel compiled for SMP.  It
does not occur with the 2.2.12 uniprocessor kernel as
distributed with Red Hat 6.1.  Also, it occurs with the smp
kernel when either one or both processors are present, so it
doesn't seem to be related to more than one processor, just
with the SMP kernel.

Other X software, such as wmaker alone or KDE does not seem
to suffer this problem.

If I left out anything important, email me for more info.

Comment 1 Gordon Messmer 1999-10-12 21:28:59 UTC

On further investigation, all of the software mentioned have
exceedingly short timeouts specified in calls to select() or to
poll().  They redraw often even under the uniprocessor kernel, but
don't take nearly as much processor time.  The problem probably stems
from a kernel lock when reading/writing to the UNIX socket to the X
server.

Comment 2 Gordon Messmer 1999-10-13 00:54:59 UTC

It seems like the kernel locks introduced in 2.2.11 have really hurt
the performance of UNIX sockets.  Kernels prior to this don't exhibit
this behavior.

Comment 3 Alan Cox 1999-10-15 21:26:59 UTC

2.2.11 doesnt change the AF_UNIX locking at all. In fact we've been
reducing locking. It looks like its just applications ticking over
it may also be artificially high as the sample rate for load average
is only 100HZ

Comment 4 Gordon Messmer 1999-10-18 02:29:59 UTC

Based on alan's comments, I've investigated further.  I wrote a simple
program that would emulate the behavior that I observed in GNOME's
panel apps.  The program forks, the parent creates a UNIX socket and
reads 10000 packets of a user-specified size from that socket after
the child connects.  The child writes 10000 packets of data, with a
100 nanosecond delay in between writes, on exit it displays the total
amount of time spent in writes and the average write time.  This can
be had from ftp://duke.eburg.com/pub/linux/test_socket.c if you like.

OK, under the uniprocessor kernel, under all tested circumstances,
this program never takes more than a fraction of a percentage of the
CPU time.  This is what I expected to see, as it's how the X apps
behave.

Now, I tested it under the 2.2.12 smp kernel on the console, and
observed the same results.  No high load, no unusual CPU utilization.
That had me for a second.  Then, I went into X, with GNOME running.
GNOME was behaving as described, taking 35-45 percent of the available
CPU time.  Then, I ran the test_socket program, and watched both the
parent and child processes take ~10% of the available CPU time, unlike
what I'd observed on the console.  The load may have been due to
something else (incidental), as it fell to normal during tests, but
the CPU utilization was still very high.  This leads me to believe
that when many AF_UNIX sockets are in use concurrently, performance
suffers.

2.2.11 is the first 2.2.x series kernel that I can find that had the
file linux/smp_lock.h included.  The patch for 2.2.11 did contain some
smp locking changes to net/unix/af_unix.c (by davem??) as seen on
http://www.kernelnotes.org/v22patch/patch-2.2.11/linux_net_unix_af_unix.c.html

And, I tested the same program under a 2.2.5 smp kernel with no
abnormal effects.  If you would like me to test any kernel in between,
please let me know.

Comment 5 Alan Cox 1999-10-20 17:47:59 UTC

Can you run your test combined with something eating CPU. The uptime
load data is a 100Hz sample. It isnt real accurate informtion when
you have timing synchronizations involved. Without data on slow down
of a CPU intensive task in both cases its iffy to assume the cpu
load data is accurate.

Comment 6 Gordon Messmer 1999-10-21 05:39:59 UTC

I wasn't quite sure what you wanted, so what I did was this:
In the uniprocessor kernel, I first created a set of ten directories
and spawned terminals within each one.  Then, I ran the test app in
each of these directories concurrently.  There was no additional load
or processor time used significant enought to show up in 'top' or
'xosview'.  Good enough, that's what I expect.

Then, I rebooted to the smp kernel, got into X and again spawned ten
xterms.  In each one of the terminals I spawned the test app, so there
were ten processes communicating in addition to the X applications
already running.  'multiload_app' showed about 10% "user" processor
use and about 90% "nice" processor use.  The average transport time
increased only from about 11ns to 12ns.  Hardly a bother.  I noticed
that each of the 'socket_test' were using about 4.1% processor time in
'top', very consistently.  So was X and a couple other applications.
So, does this 100Hz sample also apply to processor use, as shown by
'top' and 'xosview'?

What had me curious in the first place, was that af_unix uses
(un)lock_kernel, while the ipv4 sources do not.  I appreciate your
time.  (and I know how hard it is to find)