Bug 1155232

Summary:	log spam "lockd: cannot monitor X"
Product:	[Fedora] Fedora	Reporter:	Ian Collier <imc>
Component:	nfs-utils	Assignee:	Steve Dickson <steved>
Status:	CLOSED NOTABUG	QA Contact:	Fedora Extras Quality Assurance <extras-qa>
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	20	CC:	bfields, jlayton, steved
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2014-10-29 13:15:39 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Ian Collier 2014-10-21 15:51:11 UTC

Description of problem:

Something is causing the kernel lockd process to spew warnings of 
the form:

Oct 21 16:32:13 tr02 kernel: [2864238.243859] lockd: cannot monitor tfs
Oct 21 16:32:13 tr02 kernel: [2864238.243874] lockd: cannot monitor tfs
Oct 21 16:32:13 tr02 kernel: [2864238.243907] lockd: cannot monitor tfs
Oct 21 16:32:13 tr02 kernel: [2864238.243923] lockd: cannot monitor tfs
Oct 21 16:32:13 tr02 kernel: [2864238.243956] lockd: cannot monitor tfs
Oct 21 16:32:13 tr02 kernel: [2864238.243971] lockd: cannot monitor tfs

At the rate of up to 30 thousand messages per second, for maybe 3 or 4
seconds of every minute and causing /var/log/messages to grow at the
rate of 100-200 megabytes per hour.  The actual rate is not constant
but varies seemingly at random.

I don't know what the reason for the messages is* but even if there
is a genuine problem the system should not spam the logs like this.

(* This is an NFS client and doesn't run an NFS server; it mounts
filesystems off a remote server which seems to be running fine,
and the client seems to be able to mount and lock files with no
issues. Running "rpcinfo -p" on both client and server says that both
status and nlockmgr are known to the system.  However, tracing the local
statd process shows no activity.  Restarting rpcbind and nfs-lock did
not make any difference.)

Version-Release number of selected component (if applicable):
nfs-utils-1.3.0-2.2.fc20.x86_64
kernel-3.16.2-200.fc20.x86_64
rsyslog-7.4.8-1.fc20.1.x86_64

How reproducible:
No idea, this is a live server and I can't arbitrarily reboot it.

Comment 1 Steve Dickson 2014-10-29 13:15:39 UTC

(In reply to Ian Collier from comment #0)
> Description of problem:
> 
> Something is causing the kernel lockd process to spew warnings of 
> the form:
> 
> Oct 21 16:32:13 tr02 kernel: [2864238.243859] lockd: cannot monitor tfs
> Oct 21 16:32:13 tr02 kernel: [2864238.243874] lockd: cannot monitor tfs
> Oct 21 16:32:13 tr02 kernel: [2864238.243907] lockd: cannot monitor tfs
> Oct 21 16:32:13 tr02 kernel: [2864238.243923] lockd: cannot monitor tfs
> Oct 21 16:32:13 tr02 kernel: [2864238.243956] lockd: cannot monitor tfs
> Oct 21 16:32:13 tr02 kernel: [2864238.243971] lockd: cannot monitor tfs
This generally is a firewall problem since the lockd/statd listen
and use ports that are not known to the firewalld.

> 
> At the rate of up to 30 thousand messages per second, for maybe 3 or 4
> seconds of every minute and causing /var/log/messages to grow at the
> rate of 100-200 megabytes per hour.  The actual rate is not constant
> but varies seemingly at random.
There must be a lot of file locking happening at different points
plus you must be using NFS v3 instead of v4.

> 
> I don't know what the reason for the messages is* but even if there
> is a genuine problem the system should not spam the logs like this.
> 
> (* This is an NFS client and doesn't run an NFS server; it mounts
> filesystems off a remote server which seems to be running fine,
> and the client seems to be able to mount and lock files with no
> issues. Running "rpcinfo -p" on both client and server says that both
> status and nlockmgr are known to the system.  However, tracing the local
> statd process shows no activity.  Restarting rpcbind and nfs-lock did
> not make any difference.)
Again, its probably a firewall problem... Either tell firewalld about
the port lockd/statd is listen on or simply shutdown or use v4.


I'm closing this for now... If its not a firewall issue please feel free to reopen...

Comment 2 Ian Collier 2014-10-31 11:04:40 UTC

Right, it turns out the actual culprit is evolution-addressbook-factory, which
spins in a tight loop trying to lock its database.

(To answer your questions: no, it's not a firewall issue, and yes we are using
NFS servers that only support v3.)

So nfs is almost in the clear here.  I say almost, because it shouldn't really
be possible for a userspace utility to cause a DoS in this way.  I suspect the
correct answer to that is to ask for an enhancement to rsyslog that collapses
identical messages.  (It always used to do that, but these days every kernel
message comes with a timestamp which stops them from being identical.)

I guess I'll leave this closed and report separately against
evolution-data-server.

Comment 3 Jeff Layton 2014-10-31 12:30:59 UTC

Yes, it would be good to have it handle -ENOLCK errors sanely. That said, it also wouldn't hurt to turn that printk into a printk_ratelimited. I'll send something to Bruce on the linux-nfs mailing list.