Red Hat Bugzilla – Bug 221756
updatedb and cluster filesystems
Last modified: 2014-01-17 10:54:54 EST
When using GFS or another cluster fs, one is faced with the problem that no one
really owns the filesystem. For NAS setups like NFS one can exclude the client
mounts and have only the server do the updatedb; for GFS and friends there is no
designated server that manages the mlocate database.
The current model is to have all cluster members do the updatedb at the same
time, which for large clusters is killing the IO. The current "workaround" is to
add GFS to the prunefs argument, or to designate one cluster member to do the
updatedb work. The former is bad if one is interested in using locate on the
SAN contents; the latter is bad because it introduces a manual asymmetry in the
How about the following model: The updatedb database is split across filesystems
(e.g. under /.mlocatedb/) and updatedb uses lock files to indicate that someone
is doing the updatedb work already - in that case updatedb skips this mount
point. locate then uses these databases automatically.
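
A minimal sketch of the lock-file step in Python, not taken from mlocate itself.
The lock file name and the helper names are hypothetical, and it assumes the
filesystem honours exclusive create (O_CREAT | O_EXCL) atomically across
cluster members, which would have to be verified for each cluster fs:

```python
import os

DB_DIR = ".mlocatedb"          # directory name from the proposal above
LOCK_NAME = "updatedb.lock"    # hypothetical lock file name


def try_claim(mount_point):
    """Try to become the scanner for this mount point.

    Returns the lock path if this node created the lock, or None if
    another node already holds it (the caller then skips the mount).
    """
    db_dir = os.path.join(mount_point, DB_DIR)
    os.makedirs(db_dir, exist_ok=True)
    lock_path = os.path.join(db_dir, LOCK_NAME)
    try:
        # Exclusive create fails if the file already exists, so at most
        # one creator succeeds (assuming the fs implements it atomically).
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        return None
    os.close(fd)
    return lock_path


def release(lock_path):
    """Remove the lock once the scan of this mount point is finished."""
    os.unlink(lock_path)
```

A caller would loop over its mount points, scan those where try_claim()
returns a path, and skip the rest.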
The benefits are:
a) cluster filesystems can be scanned using only one member, but all
members can later use locate on the contents.
b1) New cluster members have immediate access to a fresh updatedb.
b2) Same for moving around disks between systems
b3) Even NFS attached nodes could start using locate on NFS contents
c) no manual config tuning for cluster fs.
d) Independent of cluster fs in use, works with every current and upcoming
There are security implications to consider, especially for old-fashioned NFS
mounted systems, where the NFS client could spoof any userid including locate's,
but OTOH these setups are insecure on a much worse level than giving away
visibility of paths, e.g. the rogue NFS client can simply become the user whose
paths it wants to query, and even read the contents.
Would that model make sense? It looks easy to implement and perhaps it could
default to the current single db setup, but have easy switches to make cluster
fs behave as described.
Sounds interesting, but I wonder whether this dictates too much local policy.
e.g.: What if / and /usr are separate GFS mounts, with /usr mounted read-only?
There is no way to store data to /usr/.mlocatedb in that case.
Then there is the technical problem of detecting stale locks on a cluster
filesystem (without a shared PID space).
Read-only mounts could check for (a read-only) .mlocatedb and fall back to /var
if it doesn't exist. Or the policy could be chosen in updatedb.conf.
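
A rough Python sketch of that fallback, under a few assumptions: the function
names are made up for illustration, the fallback directory /var/lib/mlocate is
only one possible place under /var, and read-only detection uses the ST_RDONLY
mount flag from statvfs:

```python
import os


def mount_is_read_only(path):
    """True if the filesystem holding `path` is mounted read-only."""
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)


def choose_db_dir(mount_point, fallback_dir="/var/lib/mlocate",
                  is_read_only=mount_is_read_only):
    """Pick the directory holding the database for one mount point.

    `fallback_dir` is an assumed location under /var; `is_read_only`
    is injectable so the policy can be exercised without real mounts.
    """
    on_fs = os.path.join(mount_point, ".mlocatedb")
    if not is_read_only(mount_point):
        return on_fs        # writable mount: keep the db on the fs itself
    if os.path.isdir(on_fs):
        return on_fs        # read-only mount with an existing database
    return fallback_dir     # read-only mount without one: local fallback
```

The same decision could instead be driven by a setting in updatedb.conf, as
suggested above.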
Stale locks are nasty. One way to work around them would be to introduce
lock-stamping, e.g. have updatedb refresh the locks at fixed time
intervals and declare a lock stale if it's older than some higher value (e.g.
refresh every 15 minutes, declare stale if older than 30 minutes).
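
The lock-stamping idea could look roughly like this in Python. This is only a
sketch: the function names are hypothetical, the lock's mtime serves as the
stamp, and it assumes the cluster members' clocks are reasonably in sync:

```python
import os
import time

REFRESH_INTERVAL = 15 * 60   # seconds between refreshes by the lock holder
STALE_AFTER = 30 * 60        # a lock older than this is considered stale


def refresh_lock(lock_path):
    """Called by the lock holder every REFRESH_INTERVAL seconds."""
    os.utime(lock_path, None)   # set atime and mtime to the current time


def lock_is_stale(lock_path, now=None, stale_after=STALE_AFTER):
    """True if the lock has not been refreshed within `stale_after` seconds."""
    if now is None:
        now = time.time()
    return now - os.stat(lock_path).st_mtime > stale_after
```

A node that finds a stale lock would still have to take it over without racing
other nodes doing the same; that part is not covered here.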
Based on the above, I have just sent an RFC to fedora-devel-list. Could you take
a look, please?
This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component.