Description of problem:

Something is causing the kernel lockd process to spew warnings of the form:

Oct 21 16:32:13 tr02 kernel: [2864238.243859] lockd: cannot monitor tfs
Oct 21 16:32:13 tr02 kernel: [2864238.243874] lockd: cannot monitor tfs
Oct 21 16:32:13 tr02 kernel: [2864238.243907] lockd: cannot monitor tfs
Oct 21 16:32:13 tr02 kernel: [2864238.243923] lockd: cannot monitor tfs
Oct 21 16:32:13 tr02 kernel: [2864238.243956] lockd: cannot monitor tfs
Oct 21 16:32:13 tr02 kernel: [2864238.243971] lockd: cannot monitor tfs

at a rate of up to 30,000 messages per second for maybe 3 or 4 seconds of every minute, causing /var/log/messages to grow by 100-200 megabytes per hour. The actual rate is not constant but varies seemingly at random.

I don't know what the reason for the messages is*, but even if there is a genuine problem the system should not spam the logs like this.

(* This is an NFS client and doesn't run an NFS server; it mounts filesystems off a remote server which seems to be running fine, and the client seems to be able to mount and lock files with no issues. Running "rpcinfo -p" on both client and server shows that both status and nlockmgr are registered. However, tracing the local statd process shows no activity. Restarting rpcbind and nfs-lock made no difference.)

Version-Release number of selected component (if applicable):
nfs-utils-1.3.0-2.2.fc20.x86_64
kernel-3.16.2-200.fc20.x86_64
rsyslog-7.4.8-1.fc20.1.x86_64

How reproducible:
No idea; this is a live server and I can't arbitrarily reboot it.
(In reply to Ian Collier from comment #0)
> Description of problem:
>
> Something is causing the kernel lockd process to spew warnings of
> the form:
>
> Oct 21 16:32:13 tr02 kernel: [2864238.243859] lockd: cannot monitor tfs
> Oct 21 16:32:13 tr02 kernel: [2864238.243874] lockd: cannot monitor tfs
> Oct 21 16:32:13 tr02 kernel: [2864238.243907] lockd: cannot monitor tfs
> Oct 21 16:32:13 tr02 kernel: [2864238.243923] lockd: cannot monitor tfs
> Oct 21 16:32:13 tr02 kernel: [2864238.243956] lockd: cannot monitor tfs
> Oct 21 16:32:13 tr02 kernel: [2864238.243971] lockd: cannot monitor tfs

This is generally a firewall problem, since lockd/statd listen on ports that are not known to firewalld.

> At the rate of up to 30 thousand messages per second, for maybe 3 or 4
> seconds of every minute and causing /var/log/messages to grow at the
> rate of 100-200 megabytes per hour. The actual rate is not constant
> but varies seemingly at random.

There must be a lot of file locking happening at different points, and you must be using NFS v3 instead of v4 (v4 handles locking within the protocol itself and does not use lockd/statd).

> I don't know what the reason for the messages is* but even if there
> is a genuine problem the system should not spam the logs like this.
>
> (* This is an NFS client and doesn't run an NFS server; it mounts
> filesystems off a remote server which seems to be running fine,
> and the client seems to be able to mount and lock files with no
> issues. Running "rpcinfo -p" on both client and server says that both
> status and nlockmgr are known to the system. However, tracing the local
> statd process shows no activity. Restarting rpcbind and nfs-lock did
> not make any difference.)

Again, it's probably a firewall problem. Either tell firewalld about the ports lockd/statd listen on (see the sketch below), or simply shut the firewall down, or use v4.

I'm closing this for now... If it's not a firewall issue please feel free to reopen.
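For reference, a minimal sketch of the port-pinning approach on a Fedora client. The port numbers here are arbitrary examples, and STATD_PORT/LOCKD_TCPPORT/LOCKD_UDPPORT are the variables Fedora's nfs-utils reads from /etc/sysconfig/nfs:

    # /etc/sysconfig/nfs -- pin statd and lockd to fixed ports
    STATD_PORT=662
    LOCKD_TCPPORT=32803
    LOCKD_UDPPORT=32769

    # open those ports in firewalld, then restart the lock service
    firewall-cmd --permanent --add-port=662/tcp
    firewall-cmd --permanent --add-port=662/udp
    firewall-cmd --permanent --add-port=32803/tcp
    firewall-cmd --permanent --add-port=32769/udp
    firewall-cmd --reload
    systemctl restart nfs-lock

Once the ports are fixed, statd's NSM callbacks and lockd's NLM traffic can pass the firewall instead of being dropped.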
Right, it turns out the actual culprit is evolution-addressbook-factory, which spins in a tight loop trying to lock its database. (To answer your questions: no, it's not a firewall issue, and yes, we are using NFS servers that only support v3.)

So nfs is almost in the clear here. I say almost, because it shouldn't really be possible for a userspace utility to cause a DoS in this way. I suspect the correct answer to that is to ask for an enhancement to rsyslog that collapses identical messages. (It always used to do that, but these days every kernel message comes with a timestamp which stops consecutive messages from being identical.)

I guess I'll leave this closed and report separately against evolution-data-server.
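In the meantime, a sketch of one possible local mitigation, assuming rsyslog's classic repeated-message suppression and the kernel's printk timestamp switch behave as documented (untested on this machine):

    # /etc/rsyslog.conf -- re-enable "last message repeated n times"
    $RepeatedMsgReduction on

    # kernel printk timestamps make otherwise-identical messages unique,
    # which defeats that suppression; they can be switched off at runtime:
    echo 0 > /sys/module/printk/parameters/time
    # or at boot, with printk.time=0 on the kernel command line

Of course, turning off printk timestamps loses useful information, so this is a trade-off rather than a real fix.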
Yes, it would be good to have evolution handle -ENOLCK errors sanely. That said, it also wouldn't hurt to turn that printk into a printk_ratelimited. I'll send something to Bruce on the linux-nfs mailing list.
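For concreteness, a sketch of that one-line change, assuming the message originates in nsm_monitor() in fs/lockd/mon.c (the surrounding code is paraphrased from memory, not an exact patch):

    /* fs/lockd/mon.c, nsm_monitor() -- sketch only */
    if (unlikely(status < 0)) {
            /* printk_ratelimited() suppresses output beyond the default
             * ratelimit (10 messages per 5 seconds) instead of emitting
             * tens of thousands of lines per second. */
            printk_ratelimited(KERN_NOTICE "lockd: cannot monitor %s\n",
                               nsm->sm_name);
            return status;
    }

That keeps the diagnostic visible without letting a looping userspace client flood the logs.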