Description of problem:
I had my home directory (and thus ~/.ccache) mounted on an NFS share, to share it between machines. I was then running the autoconf testsuite with high parallelism (number of active cores + 2), and noticed windows where processor utilization dropped to nearly 0%, with tests taking a very long time to complete. Upon investigation, I noticed that when the tests were sluggish, 'time gcc --version' would take 30 seconds.

Version-Release number of selected component (if applicable):
    $ rpm -q gcc ccache
    gcc-4.4.4-10.fc14.x86_64
    ccache-3.0-0.2.pre1.fc14.x86_64

How reproducible:
very

Steps to Reproduce:
1. Point ~/.ccache to an NFSv3 mount.
2. git clone git://git.sv.gnu.org/autoconf.git
3. cd autoconf
4. autoreconf -vfi
5. make
6. make check TESTSUITEFLAGS=-j$(($(nproc) + 2))
7. monitor processor utilization during the exercise

Actual results:
During sequences where multiple processes are trying to use gcc at once (around test 250 or so in the autoconf testsuite), I noticed that processor utilization dropped severely, and tests were taking forever to complete. Inspecting the partial test output showed that tests were getting stuck on 'gcc --version', and I was able to reproduce this in another console, with 'time gcc --version' showing 30 seconds of elapsed time. Both strace and ltrace showed that a slow 'gcc --version' was invariably stuck on the fcntl() call in this portion of the process:

    open("/home/remote/eblake/.ccache/stats", O_RDWR) = 4
    fcntl(4, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=1}) = 0

In other words, locking ~/.ccache/stats causes lock contention over NFS, which results in long timeouts before things serialize correctly, and as a result ccache performance suffers needlessly.

Expected results:
Testsuite should complete within a few minutes, with nearly 100% processor utilization on all cores during the test.

File locking should NOT cause such a severe performance degradation, particularly for something as trivial as 'gcc --version'. Furthermore, using fcntl for file locking is inherently broken:
http://0pointer.de/blog/projects/locking.html

If ccache needs locking, it should use alternatives such as atomic mkdir() or symlink() calls, rather than fcntl() locking, particularly if ~/.ccache is not a local drive.

Additional info:
I was able to work around the issue by making ~/.ccache a symlink to a local directory, at which point NFS locking speed no longer interferes, and my autoconf testsuite completed faster.
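To illustrate the symlink() alternative suggested in this report, here is a minimal Python sketch of a lock taken by atomically creating a symlink. It is not ccache's code: the function names, the `stats.lock` file name, the timeout, and the polling interval are all assumptions made for illustration, and stale-lock handling is deliberately left out.

```python
import os
import socket
import tempfile
import time


def acquire_symlink_lock(lock_path, timeout=5.0, poll_interval=0.01):
    """Try to take a lock by creating a symlink at lock_path.

    symlink() either creates the link or fails because it already
    exists, so only one process can succeed at a time.
    Returns True on success, False if the timeout expires.
    """
    # The link target is informational only: it records the lock holder.
    owner = f"{socket.gethostname()}:{os.getpid()}"
    deadline = time.monotonic() + timeout
    while True:
        try:
            os.symlink(owner, lock_path)
            return True
        except FileExistsError:
            # Someone else holds the lock; retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_interval)


def release_symlink_lock(lock_path):
    """Release the lock by removing the symlink."""
    os.unlink(lock_path)


if __name__ == "__main__":
    # Hypothetical stand-in for a cache directory.
    with tempfile.TemporaryDirectory() as cache_dir:
        lock = os.path.join(cache_dir, "stats.lock")
        if acquire_symlink_lock(lock):
            try:
                print("lock held by", os.readlink(lock))
            finally:
                release_symlink_lock(lock)
```

A real implementation would also need a policy for locks left behind by a crashed process, for example breaking a lock whose holder has not changed for some time; that part is omitted here.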
Forwarded upstream: https://bugzilla.samba.org/show_bug.cgi?id=7545
This bug appears to have been reported against 'rawhide' during the Fedora 14 development cycle. Changing version to '14'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
ccache 3.1 will contain two changes to tackle the problem:

1. For things like "gcc --version", update the stats file in one of the 16 $CCACHE_DIR/[0-9a-f] subdirectories (selected pseudo-randomly) instead of $CCACHE_DIR/stats. This will reduce lock contention.
2. As suggested, use symlinks for locking instead of POSIX locks.

-- Joel (upstream ccache maintainer)
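The first change, spreading updates over 16 stats files, can be pictured with a short Python sketch. This only illustrates the idea as described in this comment, not ccache's implementation: the helper name `pick_stats_file` and the use of Python's `random` module are assumptions.

```python
import os
import random

HEX_DIGITS = "0123456789abcdef"


def pick_stats_file(cache_dir, rng=random):
    """Return the stats file path in one pseudo-randomly chosen
    [0-9a-f] subdirectory of cache_dir (hypothetical helper)."""
    subdir = rng.choice(HEX_DIGITS)
    return os.path.join(cache_dir, subdir, "stats")


if __name__ == "__main__":
    # Each call may land on a different file, so concurrent processes
    # are less likely to compete for the same lock.
    for _ in range(4):
        print(pick_stats_file(os.path.expanduser("~/.ccache")))
```

With 16 candidate files, two processes updating statistics at the same moment pick the same file only about one time in 16, assuming a uniform choice, which is why this is expected to reduce contention rather than eliminate it.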
ccache-3.1-1.fc14 has been submitted as an update for Fedora 14. https://admin.fedoraproject.org/updates/ccache-3.1-1.fc14
ccache-3.1-1.fc14 has been pushed to the Fedora 14 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update ccache'. You can provide feedback for this update here: https://admin.fedoraproject.org/updates/ccache-3.1-1.fc14
ccache-3.1-1.fc14 has been pushed to the Fedora 14 stable repository. If problems still persist, please make note of it in this bug report.