Description of problem:

Something is causing the kernel lockd process to spew warnings of the form:

Oct 21 16:32:13 tr02 kernel: [2864238.243859] lockd: cannot monitor tfs
Oct 21 16:32:13 tr02 kernel: [2864238.243874] lockd: cannot monitor tfs
Oct 21 16:32:13 tr02 kernel: [2864238.243907] lockd: cannot monitor tfs
Oct 21 16:32:13 tr02 kernel: [2864238.243923] lockd: cannot monitor tfs
Oct 21 16:32:13 tr02 kernel: [2864238.243956] lockd: cannot monitor tfs
Oct 21 16:32:13 tr02 kernel: [2864238.243971] lockd: cannot monitor tfs

at a rate of up to 30,000 messages per second for maybe 3 or 4 seconds of every minute, causing /var/log/messages to grow by 100-200 megabytes per hour. The actual rate is not constant but varies seemingly at random.

I don't know what the reason for the messages is*, but even if there is a genuine problem the system should not spam the logs like this.

(* This is an NFS client and doesn't run an NFS server; it mounts filesystems off a remote server which seems to be running fine, and the client seems to be able to mount and lock files with no issues. Running "rpcinfo -p" on both client and server shows that both status and nlockmgr are registered. However, tracing the local statd process shows no activity. Restarting rpcbind and nfs-lock made no difference.)

Version-Release number of selected component (if applicable):
nfs-utils-1.3.0-2.2.fc20.x86_64
kernel-3.16.2-200.fc20.x86_64
rsyslog-7.4.8-1.fc20.1.x86_64

How reproducible:
No idea; this is a live server and I can't arbitrarily reboot it.
(In reply to Ian Collier from comment #0)
> Description of problem:
>
> Something is causing the kernel lockd process to spew warnings of
> the form:
>
> Oct 21 16:32:13 tr02 kernel: [2864238.243859] lockd: cannot monitor tfs
> Oct 21 16:32:13 tr02 kernel: [2864238.243874] lockd: cannot monitor tfs
> Oct 21 16:32:13 tr02 kernel: [2864238.243907] lockd: cannot monitor tfs
> Oct 21 16:32:13 tr02 kernel: [2864238.243923] lockd: cannot monitor tfs
> Oct 21 16:32:13 tr02 kernel: [2864238.243956] lockd: cannot monitor tfs
> Oct 21 16:32:13 tr02 kernel: [2864238.243971] lockd: cannot monitor tfs

This is generally a firewall problem, since lockd/statd listen on ports that are not known to firewalld.

> At the rate of up to 30 thousand messages per second, for maybe 3 or 4
> seconds of every minute and causing /var/log/messages to grow at the
> rate of 100-200 megabytes per hour. The actual rate is not constant
> but varies seemingly at random.

There must be a lot of file locking happening at different points, and you must be using NFS v3 instead of v4 (v4 handles locking within the protocol itself and does not use lockd/statd).

> I don't know what the reason for the messages is* but even if there
> is a genuine problem the system should not spam the logs like this.
>
> (* This is an NFS client and doesn't run an NFS server; it mounts
> filesystems off a remote server which seems to be running fine,
> and the client seems to be able to mount and lock files with no
> issues. Running "rpcinfo -p" on both client and server says that both
> status and nlockmgr are known to the system. However, tracing the local
> statd process shows no activity. Restarting rpcbind and nfs-lock did
> not make any difference.)

Again, it's probably a firewall problem. Either tell firewalld about the ports lockd/statd listen on (see the sketch below), or simply shut the firewall down, or use v4.

I'm closing this for now... If it's not a firewall issue please feel free to reopen.
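For reference, a minimal sketch of the port-pinning approach on a Fedora client. The port numbers here are arbitrary examples, and STATD_PORT/LOCKD_TCPPORT/LOCKD_UDPPORT are the variables Fedora's nfs-utils reads from /etc/sysconfig/nfs:

    # /etc/sysconfig/nfs -- pin statd and lockd to fixed ports
    STATD_PORT=662
    LOCKD_TCPPORT=32803
    LOCKD_UDPPORT=32769

    # open those ports in firewalld, then restart the lock service
    firewall-cmd --permanent --add-port=662/tcp
    firewall-cmd --permanent --add-port=662/udp
    firewall-cmd --permanent --add-port=32803/tcp
    firewall-cmd --permanent --add-port=32769/udp
    firewall-cmd --reload
    systemctl restart nfs-lock

Once the ports are fixed, statd's NSM callbacks and lockd's NLM traffic can pass the firewall instead of being dropped.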
Right, it turns out the actual culprit is evolution-addressbook-factory, which spins in a tight loop trying to lock its database. (To answer your questions: no, it's not a firewall issue, and yes, we are using NFS servers that only support v3.)

So nfs is almost in the clear here. I say almost, because it shouldn't really be possible for a userspace utility to cause a DoS in this way. I suspect the correct answer to that is to ask for an enhancement to rsyslog that collapses identical messages. (It always used to do that, but these days every kernel message comes with a timestamp which stops consecutive messages from being identical.)

I guess I'll leave this closed and report separately against evolution-data-server.
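In the meantime, a sketch of one possible local mitigation, assuming rsyslog's classic repeated-message suppression and the kernel's printk timestamp switch behave as documented (untested on this machine):

    # /etc/rsyslog.conf -- re-enable "last message repeated n times"
    $RepeatedMsgReduction on

    # kernel printk timestamps make otherwise-identical messages unique,
    # which defeats that suppression; they can be switched off at runtime:
    echo 0 > /sys/module/printk/parameters/time
    # or at boot, with printk.time=0 on the kernel command line

Of course, turning off printk timestamps loses useful information, so this is a trade-off rather than a real fix.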
Yes, it would be good to have evolution handle -ENOLCK errors sanely. That said, it also wouldn't hurt to turn that printk into a printk_ratelimited. I'll send something to Bruce on the linux-nfs mailing list.
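For concreteness, a sketch of that one-line change, assuming the message originates in nsm_monitor() in fs/lockd/mon.c (the surrounding code is paraphrased from memory, not an exact patch):

    /* fs/lockd/mon.c, nsm_monitor() -- sketch only */
    if (unlikely(status < 0)) {
            /* printk_ratelimited() suppresses output beyond the default
             * ratelimit (10 messages per 5 seconds) instead of emitting
             * tens of thousands of lines per second. */
            printk_ratelimited(KERN_NOTICE "lockd: cannot monitor %s\n",
                               nsm->sm_name);
            return status;
    }

That keeps the diagnostic visible without letting a looping userspace client flood the logs.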