When using GNOME or E (alone), I experience a high load average. This is a stock installation of Red Hat 6.1 (I've updated E, but that hasn't changed the symptoms). My system consists of an ABIT BP-6 motherboard (with either Intel Multiprocessor Specification 1.1 or 1.4; the results are the same), 128MB of PC100 RAM, two Celeron 333MHz processors, and an AGP Matrox G400. Applets in the panel seem to receive an undue number of X events. I am running multiload_app, mixer_applet, gnomexmms, and asclock_applet. Among these, mixer_applet consistently uses more CPU time than the others. The numbers shift, of course, but at one instant top reported: mixer_applet 12%, gnomexmms 8.1%, enlightenment 7.7%, X 7.3%, multiload_app 1.9%, load average 0.21. If I use E by itself, it continues to use about 10% of the CPU time. GNOME with wmaker or FVWM2 also continues to use lots of processor time. This occurs with the SMP kernel distributed with Red Hat 6.1, and with the 2.2.12 release kernel compiled for SMP. It does not occur with the 2.2.12 uniprocessor kernel as distributed with Red Hat 6.1. It also occurs with the SMP kernel whether one or both processors are present, so it doesn't seem to be related to having more than one processor, just to the SMP kernel. Other X software, such as wmaker alone or KDE, does not seem to suffer from this problem. If I left out anything important, email me for more info.
On further investigation, all of the software mentioned has exceedingly short timeouts specified in its calls to select() or poll(). These programs redraw often even under the uniprocessor kernel, but they don't take nearly as much processor time there. The problem probably stems from a kernel lock taken when reading from or writing to the UNIX socket to the X server.
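For anyone who wants to see how often a short timeout wakes a process up, here is a minimal sketch. It is not taken from any of the applets; the helper name count_wakeups and the timeout values are my own assumptions. It watches a descriptor with select() for a fixed wall-clock interval and counts how many times select() returns.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper (not from GNOME or E): watch `fd` with select() using
 * a timeout of `timeout_usec` microseconds for roughly `run_ms` milliseconds
 * and return how many times select() returned. Returns -1 on error. */
long count_wakeups(int fd, long timeout_usec, long run_ms)
{
    struct timeval start, now, tv;
    long wakeups = 0;

    assert(fd >= 0 && fd < FD_SETSIZE);
    gettimeofday(&start, NULL);
    for (;;) {
        fd_set rfds;
        long elapsed_ms;

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        /* select() may modify the timeout, so reset it on every pass. */
        tv.tv_sec = timeout_usec / 1000000;
        tv.tv_usec = timeout_usec % 1000000;
        if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0)
            return -1;
        wakeups++;

        gettimeofday(&now, NULL);
        elapsed_ms = (now.tv_sec - start.tv_sec) * 1000L
                   + (now.tv_usec - start.tv_usec) / 1000L;
        if (elapsed_ms >= run_ms)
            return wakeups;
    }
}
```

Called on the read end of an idle pipe, a 1000-microsecond timeout should give many more wakeups per second than a 50000-microsecond one. Each wakeup is a trip through the kernel, which is where any extra per-call cost would add up.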
It seems like the kernel locks introduced in 2.2.11 have really hurt the performance of UNIX sockets. Kernels prior to this don't exhibit this behavior.
2.2.11 doesn't change the AF_UNIX locking at all. In fact, we've been reducing locking. It looks like it's just applications ticking over. The figure may also be artificially high, as the sample rate for load average is only 100Hz.
Based on Alan's comments, I've investigated further. I wrote a simple program to emulate the behavior that I observed in GNOME's panel apps. The program forks; the parent creates a UNIX socket and, after the child connects, reads 10000 packets of a user-specified size from that socket. The child writes 10000 packets of data, with a 100 nanosecond delay between writes, and on exit it displays the total amount of time spent in writes and the average write time. This can be had from ftp://duke.eburg.com/pub/linux/test_socket.c if you like. OK, under the uniprocessor kernel, under all tested circumstances, this program never takes more than a fraction of a percent of the CPU time. This is what I expected to see, as it's how the X apps behave. Next, I tested it under the 2.2.12 SMP kernel on the console, and observed the same results: no high load, no unusual CPU utilization. That had me stumped for a second. Then I went into X, with GNOME running. GNOME was behaving as described, taking 35-45 percent of the available CPU time. I ran the test_socket program and watched both the parent and child processes take ~10% of the available CPU time, unlike what I'd observed on the console. The load may have been due to something else (incidental), as it fell to normal during the tests, but the CPU utilization was still very high. This leads me to believe that when many AF_UNIX sockets are in use concurrently, performance suffers. 2.2.11 is the first 2.2.x series kernel that I can find that had the file linux/smp_lock.h included. The patch for 2.2.11 did contain some SMP locking changes to net/unix/af_unix.c (by davem??), as seen at http://www.kernelnotes.org/v22patch/patch-2.2.11/linux_net_unix_af_unix.c.html. I also tested the same program under a 2.2.5 SMP kernel, with no abnormal effects. If you would like me to test any kernel in between, please let me know.
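For readers who can't reach the FTP site, the following is a rough sketch of a test along the lines described above. It is not the original test_socket.c. The function names, the use of SOCK_STREAM, and the choice to bind and listen before forking (so the child can never connect too early) are my own assumptions. Note that a requested 100 ns nanosleep() will sleep for considerably longer in practice.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/un.h>
#include <sys/wait.h>

/* Fill in an AF_UNIX address for the given filesystem path. */
static void make_addr(struct sockaddr_un *addr, const char *path)
{
    memset(addr, 0, sizeof *addr);
    addr->sun_family = AF_UNIX;
    strncpy(addr->sun_path, path, sizeof addr->sun_path - 1);
}

/* Child: connect, write `count` buffers of `size` bytes with a short pause
 * between writes, then report total and average time spent in write(). */
static void child_writer(const char *path, size_t size, int count)
{
    struct sockaddr_un addr;
    struct timespec pause = { 0, 100 };  /* 100 ns requested */
    struct timeval t0, t1;
    double total_us = 0.0;
    char *buf = calloc(1, size);
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    make_addr(&addr, path);
    if (!buf || fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0)
        _exit(1);
    for (int i = 0; i < count; i++) {
        size_t off = 0;
        gettimeofday(&t0, NULL);
        while (off < size) {             /* handle short writes */
            ssize_t n = write(fd, buf + off, size - off);
            if (n <= 0)
                _exit(1);
            off += (size_t)n;
        }
        gettimeofday(&t1, NULL);
        total_us += (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec);
        nanosleep(&pause, NULL);
    }
    fprintf(stderr, "total write time: %.0f us, average: %.3f us\n",
            total_us, total_us / count);
    close(fd);
    _exit(0);
}

/* Parent: listen on `path`, fork the writer, and read until EOF.
 * Returns the number of bytes received, or -1 on error. */
long run_socket_test(const char *path, size_t size, int count)
{
    struct sockaddr_un addr;
    char *buf = malloc(size);
    long total = 0;
    ssize_t n;
    int lfd = socket(AF_UNIX, SOCK_STREAM, 0);

    assert(size > 0 && count > 0);
    make_addr(&addr, path);
    unlink(path);                        /* remove a stale socket file */
    if (!buf || lfd < 0 ||
        bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lfd, 1) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        close(lfd);
        child_writer(path, size, count); /* does not return */
    }

    int cfd = accept(lfd, NULL, NULL);
    if (cfd < 0)
        return -1;
    while ((n = read(cfd, buf, size)) > 0)
        total += n;
    close(cfd);
    close(lfd);
    unlink(path);
    free(buf);
    waitpid(pid, NULL, 0);
    return total;
}
```

Calling run_socket_test() with a socket path of your choice, a packet size, and a count of 10000 should return size times count bytes while the child prints its timing to stderr. Since a stream socket does not preserve packet boundaries, the parent here counts bytes rather than packets.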
Can you run your test combined with something eating CPU? The uptime load data is a 100Hz sample. It isn't really accurate information when you have timing synchronizations involved. Without data on the slowdown of a CPU-intensive task in both cases, it's iffy to assume the CPU load data is accurate.
I wasn't quite sure what you wanted, so what I did was this: in the uniprocessor kernel, I first created a set of ten directories and spawned a terminal within each one. Then I ran the test app in each of these directories concurrently. There was no additional load or processor time significant enough to show up in 'top' or 'xosview'. Good enough; that's what I expect. Then I rebooted to the SMP kernel, got into X, and again spawned ten xterms. In each one of the terminals I started the test app, so there were ten processes communicating in addition to the X applications already running. 'multiload_app' showed about 10% "user" processor use and about 90% "nice" processor use. The average transport time increased only from about 11ns to 12ns. Hardly a bother. I noticed that each of the 'socket_test' processes was using about 4.1% processor time in 'top', very consistently. So were X and a couple of other applications. So, does this 100Hz sample also apply to processor use, as shown by 'top' and 'xosview'? What had me curious in the first place was that af_unix uses (un)lock_kernel, while the ipv4 sources do not. I appreciate your time (and I know how hard it is to find).