Bug 610853 - having ~/.ccache on NFS makes 'gcc --version' take 30 seconds
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: ccache
Version: 14
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Ville Skyttä
QA Contact: Fedora Extras Quality Assurance
Reported: 2010-07-02 15:30 UTC by Eric Blake
Modified: 2010-09-30 06:13 UTC (History)

Fixed In Version: ccache-3.1-1.fc14
Last Closed: 2010-09-30 06:13:07 UTC

Description Eric Blake 2010-07-02 15:30:39 UTC
Description of problem:
I had my home directory (and thus ~/.ccache) mounted on an NFS share so that I could share it between machines.  While running the autoconf testsuite with high parallelism (number of active cores + 2), I noticed windows where processor utilization dropped to nearly 0% and tests took a very long time to complete.  Upon investigation, I found that whenever the tests were sluggish, 'time gcc --version' took 30 seconds.

Version-Release number of selected component (if applicable):
$ rpm -q gcc ccache
gcc-4.4.4-10.fc14.x86_64
ccache-3.0-0.2.pre1.fc14.x86_64

How reproducible:
very

Steps to Reproduce:
1. Point ~/.ccache to an NFSv3 mount.
2. git clone git://git.sv.gnu.org/autoconf.git
3. cd autoconf
4. autoreconf -vfi
5. make
6. make check TESTSUITEFLAGS=-j$(($(nproc) + 2))
7. monitor processor utilization during the exercise
  
Actual results:
During sequences where multiple processes are trying to use gcc at once (around test 250 or so in the autoconf testsuite), processor utilization dropped severely and tests took far longer than usual to complete.  The partial test output produced so far showed that tests were getting stuck on 'gcc --version', and I was able to reproduce this in another console, with 'time gcc --version' showing 30 seconds of elapsed time.

Both strace and ltrace showed that a slow 'gcc --version' was invariably stuck on the fcntl() call in this portion of the process:
open("/home/remote/eblake/.ccache/stats", O_RDWR) = 4
fcntl(4, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=1}) = 0

In other words, every concurrent ccache invocation takes a blocking lock on ~/.ccache/stats, and over NFS that lock contention results in long waits before the processes serialize correctly.  As a result, ccache performance suffers needlessly.
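
For illustration, the locking pattern seen in the strace output can be sketched in Python. This is a minimal sketch, not ccache's code: the function name bump_counter_locked and the single-integer file format are assumptions made for the example (the real stats file holds several counters). It takes the same blocking write lock on byte 0 of the file, so concurrent callers queue up behind one another exactly where the trace shows 'gcc --version' waiting.

```python
import fcntl
import os


def bump_counter_locked(stats_path):
    """Increment an integer stored in stats_path while holding a write lock.

    Hypothetical helper: the file format (one decimal integer) is an
    assumption for this sketch, not ccache's real stats format.
    """
    fd = os.open(stats_path, os.O_RDWR)
    try:
        # Blocking exclusive lock on the first byte only; this mirrors
        # fcntl(fd, F_SETLKW, {F_WRLCK, SEEK_SET, start=0, len=1}).
        fcntl.lockf(fd, fcntl.LOCK_EX, 1, 0, os.SEEK_SET)
        raw = os.pread(fd, 64, 0).decode().strip()
        value = int(raw or "0") + 1
        os.ftruncate(fd, 0)
        os.pwrite(fd, str(value).encode(), 0)
        return value
    finally:
        # Closing the descriptor also releases the POSIX lock.
        os.close(fd)
```

On a local filesystem the lock call returns almost immediately; the delay described in this report comes from how long the same request can block when the file lives on NFS and many processes contend for it.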

Expected results:
Testsuite should complete within a few minutes, with nearly 100% processor utilization on all cores during the test.

File locking should NOT cause such a severe performance degradation, particularly for something as trivial as 'gcc --version'.  Furthermore, using fcntl for file locking is inherently broken:
http://0pointer.de/blog/projects/locking.html
If ccache needs locking, it should use alternatives such as atomic mkdir() or symlink() calls, rather than fcntl() locking, particularly if ~/.ccache is not a local drive.
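
As a rough sketch of the symlink() alternative, assuming hypothetical helper names (acquire_symlink_lock, release_symlink_lock) and an arbitrary timeout and polling interval: creating a symlink either succeeds or fails because the name already exists, so whichever process creates the link holds the lock. This is only an illustration of the idea, not what ccache implements, and it does not handle stale locks left by a crashed process.

```python
import os
import time


def acquire_symlink_lock(lock_path, timeout=5.0, poll=0.01):
    """Take the lock by creating lock_path as a symlink.

    Returns True on success, False if the lock is still held when the
    timeout expires. The timeout and poll values are assumptions.
    """
    # The link target is only informational: it records who holds the lock.
    owner = f"{os.uname().nodename}:{os.getpid()}"
    deadline = time.monotonic() + timeout
    while True:
        try:
            # symlink() creates the link or fails with EEXIST,
            # so at most one process can win.
            os.symlink(owner, lock_path)
            return True
        except FileExistsError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll)


def release_symlink_lock(lock_path):
    """Release the lock by removing the symlink."""
    os.unlink(lock_path)
```

A caller would wrap the update of the shared file between acquire_symlink_lock(path + ".lock") and release_symlink_lock(path + ".lock"), preferably in a try/finally so the link is removed even if the update fails.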

Additional info:
I was able to work around the issue by relocating ~/.ccache to be a symlink to a local directory, at which point NFS locking speed no longer interferes, and my autoconf testsuite completed faster.

Comment 1 Ville Skyttä 2010-07-03 12:26:50 UTC
Forwarded upstream: https://bugzilla.samba.org/show_bug.cgi?id=7545

Comment 2 Bug Zapper 2010-07-30 12:24:04 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 14 development cycle.
Changing version to '14'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 3 Joel Rosdahl 2010-08-01 15:56:27 UTC
ccache 3.1 will contain two changes to tackle the problem:

1. For things like "gcc --version", update the stats file in one of the 16 $CCACHE_DIR/[0-9a-f] subdirectories (selected pseudo-randomly) instead of $CCACHE_DIR/stats. This will reduce lock contention.

2. As suggested, use symlinks for locking instead of POSIX locks.

-- Joel (upstream ccache maintainer)
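
The first change above can be sketched as follows. This is a hedged illustration only: the function name pick_stats_file and the use of Python's random module are assumptions, and the comment above does not say how ccache 3.1 actually makes the pseudo-random choice. The point is that spreading updates over 16 files means two concurrent processes usually lock different files.

```python
import os
import random

# One subdirectory per hexadecimal digit, as in $CCACHE_DIR/[0-9a-f].
SHARDS = "0123456789abcdef"


def pick_stats_file(cache_dir, rng=random):
    """Return the path of a pseudo-randomly chosen per-subdirectory stats file.

    Hypothetical helper; rng may be the random module or a random.Random
    instance (useful for reproducible tests).
    """
    shard = rng.choice(SHARDS)
    return os.path.join(cache_dir, shard, "stats")
```

For example, pick_stats_file("/home/user/.ccache") returns a path such as /home/user/.ccache/7/stats, with the digit varying from call to call.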

Comment 4 Fedora Update System 2010-09-18 15:38:54 UTC
ccache-3.1-1.fc14 has been submitted as an update for Fedora 14.
https://admin.fedoraproject.org/updates/ccache-3.1-1.fc14

Comment 5 Fedora Update System 2010-09-20 18:39:53 UTC
ccache-3.1-1.fc14 has been pushed to the Fedora 14 testing repository.  If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
 su -c 'yum --enablerepo=updates-testing update ccache'
You can provide feedback for this update here: https://admin.fedoraproject.org/updates/ccache-3.1-1.fc14

Comment 6 Fedora Update System 2010-09-30 06:12:58 UTC
ccache-3.1-1.fc14 has been pushed to the Fedora 14 stable repository.  If problems still persist, please make note of it in this bug report.
