Bug 894483 - systemctl reports no space left on device due to inotify "max_user_watches" limit
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: 17
Hardware: x86_64
OS: Linux
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-01-11 20:56 UTC by Ted W.
Modified: 2016-03-11 10:17 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1316855
Environment:
Last Closed: 2013-08-01 11:27:03 UTC
Type: Bug
Embargoed:


Attachments:
strace output (24.36 KB, text/plain)
2013-01-22 15:03 UTC, Jonathan Kamens

Description Ted W. 2013-01-11 20:56:50 UTC
Description of problem: systemctl commands function normally but return "Error: no space left on device".


Version-Release number of selected component (if applicable): 
systemd-44-21.fc17.x86_64


How reproducible: Consistent


Steps to Reproduce:
1. Boot system normally
2. Run "sudo systemctl restart sshd" (replace sshd with any service)
3. Error produced
  
Actual results: "Error: no space left on device"


Expected results: No warning about space left on device


Additional info: It seems this is a result of the inotify "max_user_watches" limit. To silence the error I ran "echo 1048576 > /proc/sys/fs/inotify/max_user_watches".

As a temporary workaround, you can run this command or set the following in /etc/sysctl.conf: fs.inotify.max_user_watches=1048576
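
The workaround above can be checked with a short script before and after applying it. This is only a sketch: the helper names read_inotify_limits and sysctl_line are hypothetical and not part of systemd or any existing tool. It assumes the standard Linux layout where the inotify limits are exposed as files under /proc/sys/fs/inotify.

```python
from pathlib import Path

# Assumption: standard Linux location of the inotify sysctl files.
INOTIFY_SYSCTL_DIR = "/proc/sys/fs/inotify"


def read_inotify_limits(sysctl_dir=INOTIFY_SYSCTL_DIR):
    """Return {name: value} for each inotify limit that can be read."""
    limits = {}
    for name in ("max_user_watches", "max_user_instances", "max_queued_events"):
        try:
            limits[name] = int((Path(sysctl_dir) / name).read_text().strip())
        except (OSError, ValueError):
            # File missing or unreadable (for example, not on Linux): skip it.
            continue
    return limits


def sysctl_line(value, name="max_user_watches"):
    """Format an entry suitable for /etc/sysctl.conf."""
    return f"fs.inotify.{name}={value}"


if __name__ == "__main__":
    for name, value in read_inotify_limits().items():
        print(f"fs.inotify.{name} = {value}")
    # 1048576 is the value the reporter used; it is not a recommendation.
    print("sysctl.conf entry:", sysctl_line(1048576))
```

Reading the values needs no special privileges; changing them (via the echo command or sysctl.conf) requires root.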

Comment 1 Lennart Poettering 2013-01-14 17:53:58 UTC
Hmm, does this happen for all verbs, or only for some, such as "systemctl start"?

Not sure if we should do something about this. I mean, for some services we have to watch PID files with inotify, and if we can't do that we probably should refuse operation, the way we currently do. If the resources are all used up things will fail, which is kinda expected, no?

Comment 2 Ted W. 2013-01-14 20:52:09 UTC
I'm no expert on inotify or systemd, so I may be investigating this improperly. However, upon checking both "df" and "df -i", neither appears to show anything remotely full (maximum % full is around 20%). It appears that the verbs "start", "stop" and "reload" produce the message. Verbs such as "enable", "disable" and "kill" do not.

I should also specify (I did not realize this when I originally filed this bug) that the warning message does not impede proper functionality of systemd. The verbs work as expected, apart from the warning message. Is it possible that something on my system, such as the number of running processes, number of files, or partition size, is some order of magnitude larger than what the default max_user_watches limit assumes for the "average" or "normal" use case?

Perhaps this is less of an issue with systemd and more with some default setting in Fedora or inotify? I completely understand that "if X is size n and X is full, further attempts to fill X will fail". My question was more "Why is X full?" and "Is there anything that can be done globally (upstream or otherwise) which would benefit future users?" rather than just fixing the issue locally on my system and leaving it be.

(Please excuse me if any of the above was unclear, it's difficult for me to put thoughts together in this manner sometimes)

Comment 3 Jonathan Kamens 2013-01-20 21:04:54 UTC
I am seeing this as well. I think "Not sure if we should do something about this" in Comment 1 is just a little silly. OF COURSE you should "do something about this." It's ridiculous for a standard system command on a properly functioning, properly installed system to return a completely cryptic and nonsensical "Error: no space left on device" message.

Comment 4 Michal Schmidt 2013-01-21 18:06:55 UTC
sshd.service is a Type=simple service; it does not use a pidfile, so that's not what we need the inotify watch for. I think the error is reported by the password agent that systemctl spawns. Could you confirm that by producing a trace with the following command?
strace -f -o /tmp/trace.txt systemctl restart sshd

Comment 5 Jonathan Kamens 2013-01-22 15:03:12 UTC
Created attachment 685219
strace output

Trace attached.

Comment 6 Michal Schmidt 2013-01-22 15:27:29 UTC
It is indeed from the password agent:

12136 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f961bd14ad0) = 12137
...
12137 execve("/usr/bin/systemd-tty-ask-password-agent", ["/usr/bin/systemd-tty-ask-passwor"..., "--watch"], [/* 58 vars */] <unfinished ...>
...
12137 mkdir("/run/systemd/ask-password", 0755) = -1 EEXIST (File exists)
12137 stat("/run/systemd/ask-password", {st_mode=S_IFDIR|0755, st_size=40, ...}) = 0
12137 inotify_init1(IN_CLOEXEC)         = 4
12137 inotify_add_watch(4, "/run/systemd/ask-password", IN_CLOSE_WRITE|IN_MOVED_TO) = -1 ENOSPC (No space left on device)

The error message could be made clearer in this case.
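
As an illustration of what a clearer message might look like, here is a minimal sketch. It is not systemd's code (systemd is written in C), and explain_inotify_error is a hypothetical helper. It relies only on the documented behaviour that inotify_add_watch fails with ENOSPC when the per-user watch limit is reached or the kernel cannot allocate a needed resource.

```python
import errno
import os


def explain_inotify_error(err, path):
    """Build an error message for a failed inotify_add_watch() call.

    err is the errno value from the failed call; path is the watched path.
    """
    message = f"Failed to add inotify watch on {path}: {os.strerror(err)}"
    if err == errno.ENOSPC:
        # ENOSPC here does not mean the disk is full; it usually means the
        # per-user watch limit (fs.inotify.max_user_watches) was reached.
        message += (" (inotify watch limit likely reached; "
                    "see fs.inotify.max_user_watches)")
    return message


if __name__ == "__main__":
    print(explain_inotify_error(errno.ENOSPC, "/run/systemd/ask-password"))
```

The generic strerror text is kept so the message still matches what strace shows, and the hint is appended only for ENOSPC.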

Comment 7 Jonathan Kamens 2013-01-23 14:05:06 UTC
I can confirm that increasing fs.inotify.max_user_watches makes the error from systemctl go away, but this doesn't make any sense to me.

I have fs.inotify.max_user_watches set to 65535 when the error is occurring. According to "find /proc -lname anon_inode:inotify -print | wc -l", I have only 321 inotify watches total. That is obviously a lot less than 65535. So why does inotify think that I'm out of watches?

Comment 8 Jonathan Kamens 2013-01-23 14:22:33 UTC
I figured it out. Looking for anon_inode:inotify doesn't show you all watches; it only shows you instances, I think. I'm using CrashPlan and it's creating a ton of watches and exceeding my limit. I still think there's a bug here, though -- the "No space left on device" error is completely nonintuitive and needs to be better about explaining what's going wrong. Like, the error could maybe have the word "inotify" in it?
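
To count watches rather than instances, one option is to read /proc/<pid>/fdinfo/<fd>. This is a sketch under assumptions: it requires a kernel that lists one "inotify wd:..." line per watch in fdinfo (newer kernels do; older ones may not), and the function names are hypothetical. It is not necessarily the method used in this thread.

```python
from pathlib import Path


def count_watches_in_fdinfo(text):
    """Count watches in the contents of one /proc/<pid>/fdinfo/<fd> file.

    Each watch on an inotify instance appears as a line starting with
    "inotify wd:"; other descriptors have no such lines.
    """
    return sum(1 for line in text.splitlines() if line.startswith("inotify wd:"))


def watches_by_pid(proc_root="/proc"):
    """Return {pid: watch_count} for processes holding at least one watch."""
    totals = {}
    try:
        entries = list(Path(proc_root).iterdir())
    except OSError:
        return totals
    for pid_dir in entries:
        if not pid_dir.name.isdigit():
            continue
        try:
            fdinfo_files = list((pid_dir / "fdinfo").iterdir())
        except OSError:
            continue  # process exited, or permission denied
        count = 0
        for fdinfo in fdinfo_files:
            try:
                count += count_watches_in_fdinfo(fdinfo.read_text())
            except OSError:
                continue  # descriptor closed while scanning
        if count:
            totals[int(pid_dir.name)] = count
    return totals


if __name__ == "__main__":
    ranked = sorted(watches_by_pid().items(), key=lambda kv: kv[1], reverse=True)
    for pid, count in ranked[:10]:
        print(pid, count)
```

Run it as root to see other users' processes; without root, only your own processes are readable, so the totals will be incomplete.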

Comment 9 Ted W. 2013-01-23 19:02:37 UTC
(In reply to comment #8)
> I figured it out. looking for anon_inode:inotify doesn't show you all
> watches, it only shows you instances, I think. I'm using CrashPlan and it's
> creating a ton of watches and exceeding my limit. I still think there's a
> bug here, though -- the "No space left on device" error is completely
> nonintuitive and needs to be better about explaining what's going wrong.
> Like, the error could maybe have the word "inotify" in it?

Interesting that you should bring up CrashPlan; I happen to be running it on the system with the errors as well. How did you determine how many watches it was using? I would be curious to run the same check on my system and determine if I am seeing the same thing. Perhaps there is something that needs reporting in CrashPlan as well.

Comment 10 Jonathan Kamens 2013-01-23 23:29:37 UTC
How I determined that CrashPlan was the culprit is described at http://unix.stackexchange.com/a/62284/31003 .

I don't think there's a bug in CrashPlan; it's simply monitoring the files you're backing up, which is a reasonable thing for it to be doing. This problem is already discussed in the CrashPlan documentation at http://support.crashplan.com/doku.php/client/troubleshooting/real-time#linux .

Note that if you have too many files being backed up by CrashPlan for inotify to be usable (too much kernel memory), you can disable real-time backup protection in the advanced backup settings in the CrashPlan desktop app.

Comment 11 Fedora End Of Life 2013-07-04 03:36:28 UTC
This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 17 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora, you are encouraged to change the
'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 12 Fedora End Of Life 2013-08-01 11:27:09 UTC
Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 13 Andrea V. 2016-03-11 10:13:06 UTC
I'd like to re-open this bug for Fedora 23 x86_64...

Comment 14 Andrea V. 2016-03-11 10:17:15 UTC
(In reply to Andrea V. from comment #13)
> I'd like to re-open this bug for Fedora 23 x86_64...

Done, see... https://bugzilla.redhat.com/show_bug.cgi?id=1316855

