Description of problem:

Using flock() over NFS under the 2.6.x SMP kernels causes bizarre system-wide performance degradation. Non-SMP kernels and 2.4.x kernels do not exhibit this problem.

This is what I see: I was debugging a performance degradation of our compute-intensive application after upgrading from RHL8.0 to FC3. On our dual-cpu (Athlon and P4) machines, when running two of these applications at the same time, the machine feels very sluggish (interactive response is slow, NFS mounts sometimes time out), "top" reports 50% "user" and 50% "system" CPU usage (oprofile, vmstat and "/usr/bin/time" show the same), and "strace" shows that neither application makes any system calls (they just compute). What?!? No system calls but 50% "system" CPU usage?!?

I traced the problem to the use of flock() over NFS early in the application. As we know, flock() over NFS "does not work", but this application worked fine until the 2.6.x kernels in FC2 and FC3. (The locking code in this application probably predates Linux. It is useless and I am presently removing it.)

The problem is with closing the lockfile before releasing the lock. "man flock" says that it should "just release the lock" and I guess it works in the 2.4 and the non-SMP 2.6 kernels.
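For reference, here is a minimal sketch of the ordering that avoids the problem in my tests: release the lock explicitly with LOCK_UN, then close the file. The helper name lock_and_release() is hypothetical and is not taken from our application; it only illustrates the call order and is not a statement about what the kernel should require.

```cpp
// Sketch only: take an flock() lock and release it explicitly before close().
#include <cstdio>
#include <fcntl.h>      // open(), O_RDWR
#include <sys/file.h>   // flock(), LOCK_EX, LOCK_NB, LOCK_UN
#include <unistd.h>     // close()

// Hypothetical helper (name is an assumption, not from the application).
// Returns 0 on success, -1 if any step fails.
int lock_and_release(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd == -1) {
        perror("open");
        return -1;
    }

    // Non-blocking exclusive lock, same flags as in the application.
    if (flock(fd, LOCK_EX | LOCK_NB) == -1) {
        perror("flock");
        close(fd);
        return -1;
    }

    /* ... work that needs the lock would go here ... */

    int rc = 0;

    // Release the lock explicitly instead of relying on close() to drop it.
    if (flock(fd, LOCK_UN) == -1) {
        perror("flock(LOCK_UN)");
        rc = -1;
    }

    // Only now close the descriptor.
    if (close(fd) == -1) {
        perror("close");
        rc = -1;
    }
    return rc;
}
```

Calling lock_and_release(argv[1]) before entering the compute loop corresponds to the "add the flock-unlock statement" variant of the test program.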
I reduced our application to this example that exhibits the performance problem:

// flock.cc

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>      // open(), O_RDWR
#include <sys/file.h>   // flock()

int main(int argc,char*argv[])
{
   char *file = argv[1];

   int fd = open(file,O_RDWR);
   if (fd == -1)
   {
      perror("open");
      return -1;
   }

   if (flock(fd, LOCK_EX | LOCK_NB ))
   {
      perror("flock");
      return -1;
   }

   //flock(fd, LOCK_UN); // add this line to fix the problem
   close(fd); // or remove this line to fix the problem

   while (1)
   {
      /* compute stuff */
   }
}

Compile and run on an SMP machine:

[olchansk@tw04 flock]$ g++ flock.cc -o flock
[olchansk@tw04 flock]$ touch /tmp/aaa
[olchansk@tw04 flock]$ ./flock /tmp/aaa          <---- use local file

In "vmstat" observe 50% "user", 50% idle and 0% "system" cpu usage:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 2  0 1074880 1261436   6868  46624    0    0     0     0 1121   178 50  0 50  0

Now run:

[olchansk@tw04 flock]$ ./flock /some/nfs/mounted/file     <--- NFS mounted

In "vmstat" observe high "system" cpu usage:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 4  0 1074880 1261364   6932  46628    0    0     0     0 1217   162 50 37 13  0
 1  0 1074880 1261364   6940  46620    0    0     0    16 1158   279 50 26 24  0

Also try typing shell commands and observe the sluggish interactive response.

Inspect the example source code: after the lock is taken and the file is closed, there are no system calls, nothing to cause "system" cpu usage or slow down the machine.

Remove the "close" or add the "flock-unlock" statement, recompile, rerun, and observe that "system" cpu usage goes back to zero and interactive response returns to the same as for "local file locking".

Version-Release number of selected component (if applicable):

SMP FC2 and FC3 are affected. Anything older, and non-SMP FC2 and FC3, are fine (no performance degradation). AMD64, Athlon and Pentium4 machines all behave the same way.
[olchansk@bench flock]$ uname -a
Linux bench.triumf.ca 2.6.10-1.770_FC3smp #1 SMP Thu Feb 24 18:36:43 EST 2005 x86_64 x86_64 x86_64 GNU/Linux
[olchansk@bench flock]$ rpm -q glibc
glibc-2.3.5-0.fc3.1
[olchansk@bench flock]$ rpm -q gcc
gcc-3.4.2-6.fc3

K.O.
An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which may contain a fix for your problem. Please update to this new kernel, and report whether or not it fixes your problem. If you have updated to Fedora Core 4 since this bug was opened, and the problem still occurs with the latest updates for that release, please change the version field of this bug to 'fc4'. Thank you.
This is a mass-update to all currently open Fedora Core 3 kernel bugs. Fedora Core 3 support has transitioned to the Fedora Legacy project. Due to the limited resources of this project, typically only updates for new security issues are released. As this bug isn't security related, it has been migrated to a Fedora Core 4 bug. Please upgrade to this newer release, and test if this bug is still present there. This bug has been placed in NEEDINFO_REPORTER state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug. Thank you.
This is a mass-update to all currently open kernel bugs. A new kernel update has been released (Version: 2.6.15-1.1830_FC4) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem. This bug has been placed in NEEDINFO_REPORTER state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug. If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613. Thank you.
Closing per previous comment.