When using GFS or other cluster filesystems, one faces the problem that no one
really owns the filesystem. For NAS setups like NFS, one can exclude the client
mounts and have only the server run updatedb; for GFS and friends there is no
designated server that manages the mlocate database.
The current model is to have all cluster members run updatedb at the same
time, which for large clusters kills the I/O. The current "workaround" is to
add GFS to the prunefs argument, or to designate one cluster member to do the
updatedb work. The former is bad if one is interested in using locate on the
SAN contents; the latter is bad because it introduces a manual asymmetry in the
cluster configuration.
How about the following model: the updatedb database is split across filesystems
(e.g. stored under /.mlocatedb/ on each), and updatedb uses lock files to
indicate that someone is already doing the updatedb work; in that case updatedb
skips this mount point. locate then uses these databases automatically.
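A minimal sketch of that scheme, assuming a hypothetical per-filesystem `.mlocatedb/` directory and an `O_CREAT|O_EXCL` lock file as the "someone is already scanning" marker (none of this is an existing mlocate interface):

```python
import os

DB_DIR = ".mlocatedb"  # hypothetical per-filesystem database directory


def try_lock(mount_point):
    """Attempt to take the per-filesystem updatedb lock.
    O_CREAT|O_EXCL creation is atomic, so only one cluster
    member can win the race."""
    os.makedirs(os.path.join(mount_point, DB_DIR), exist_ok=True)
    lock_path = os.path.join(mount_point, DB_DIR, "lock")
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True
    except FileExistsError:
        return False  # another cluster member is already scanning


def scan_and_write_db(mount_point):
    # placeholder: the real updatedb would walk the tree here
    # and write the mlocate database for this mount point
    open(os.path.join(mount_point, DB_DIR, "mlocate.db"), "w").close()


def update_mount(mount_point):
    """Per-mount-point updatedb pass: skip if someone else holds the lock."""
    if not try_lock(mount_point):
        print("skipping %s: updatedb already running elsewhere" % mount_point)
        return
    try:
        scan_and_write_db(mount_point)
    finally:
        os.remove(os.path.join(mount_point, DB_DIR, "lock"))
```

locate would then simply read every `.mlocatedb/mlocate.db` it finds under the configured mount points, in addition to (or instead of) the single system-wide database.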
The benefits are:
a) cluster filesystems can be scanned by only one member, while all
members can later use locate on the contents.
b1) New cluster members have immediate access to a fresh updatedb.
b2) Same for moving around disks between systems
b3) Even NFS attached nodes could start using locate on NFS contents
c) no manual config tuning for cluster fs.
d) Independent of the cluster fs in use; works with every current and upcoming
one.
There are security implications to consider, especially for old-fashioned
NFS-mounted systems, where the NFS client could spoof any userid including
locate's. But OTOH these setups are insecure at a much worse level than giving
away visibility of paths: a rogue NFS client can simply become the user it
wants to query the paths for and even read the contents.
Would that model make sense? It looks easy to implement, and it could perhaps
default to the current single-db setup but offer easy switches to make cluster
filesystems behave as described.
Sounds interesting, but I wonder whether this dictates too much local policy.
e.g.: What if / and /usr are separate GFS mounts, with /usr mounted read-only?
There is no way to store data to /usr/.mlocatedb in that case.
Then there is the technical problem of detecting stale locks on a cluster
filesystem (without a shared PID space).
Read-only mounts could check for a (read-only) .mlocatedb and fall back to /var
if it doesn't exist. Or the policy could be chosen in updatedb.conf.
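That fallback could be sketched roughly like this (the /var path, the naming of the fallback database, and the lack of an updatedb.conf knob are all assumptions for illustration):

```python
import os


def db_location(mount_point, var_fallback="/var/lib/mlocate"):
    """Pick where this mount point's database lives.
    Prefer an on-filesystem .mlocatedb; if the mount is read-only
    (or the directory cannot be created), fall back under /var."""
    db_dir = os.path.join(mount_point, ".mlocatedb")
    if os.path.isdir(db_dir) and os.access(db_dir, os.W_OK):
        return db_dir
    try:
        os.makedirs(db_dir)
        return db_dir
    except OSError:  # read-only filesystem, permission denied, ...
        # hypothetical naming: one db file per mount point under /var,
        # named after the (flattened) mount path
        name = mount_point.strip("/").replace("/", "-") or "root"
        return os.path.join(var_fallback, name)
```

A read-only /usr would then still get a usable database under /var, while writable cluster mounts keep theirs on the filesystem itself where all members can see it.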
Stale locks are nasty. One way to work around them would be to introduce
lock-stamping, e.g. have updatedb refresh the locks at fixed time
intervals and declare a lock stale if it's older than some higher value (e.g.
refresh every 15 minutes, declare stale if older than 30 minutes).
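Lock-stamping could look roughly like this, using the lock file's mtime as the stamp (the 15/30-minute values are the ones above; the on-disk convention is an assumption):

```python
import os
import time

REFRESH = 15 * 60      # the scanning node bumps the lock this often
STALE_AFTER = 30 * 60  # others treat an older lock as abandoned


def refresh_lock(lock_path):
    """Called periodically by the node running updatedb:
    bump the mtime so other cluster members see progress."""
    os.utime(lock_path)  # no times given => set mtime to now


def lock_is_stale(lock_path, now=None):
    """A lock not refreshed for STALE_AFTER seconds is considered
    abandoned (the scanning node died mid-run) and may be broken."""
    now = time.time() if now is None else now
    return now - os.stat(lock_path).st_mtime > STALE_AFTER
```

This avoids needing a shared PID space entirely: members only ever compare a file timestamp against the agreed staleness threshold, which works on any shared filesystem whose clocks are reasonably in sync.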
Based on the above, I have just sent an RFC to fedora-devel-list. Could you take
a look, please?