Bug 1316855

Summary: systemctl reports no space left on device due to inotify "max_user_watches" limit
Product: [Fedora] Fedora
Reporter: Andrea V. <vezza>
Component: systemd
Assignee: systemd-maint
Status: CLOSED EOL
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: low
Docs Contact:
Priority: unspecified
Version: 27
CC: d.ananyev, extras-qa, jhasse, jik, johannbg, jsynacek, lnykryn, louis, metherid, mschmidt, msekleta, muadda, plautrba, psi-jack, s, systemd-maint, ted-redhat, vezza, vpavlin, zbyszek
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 894483
Environment:
Last Closed: 2018-11-30 19:02:24 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Andrea V. 2016-03-11 10:15:17 UTC
+++ This bug was initially created as a clone of Bug #894483 +++

Description of problem: systemctl commands function normally but return "Error: no space left on device".


Version-Release number of selected component (if applicable): 
systemd-44-21.fc17.x86_64


How reproducible: Consistent


Steps to Reproduce:
1. Boot system normally
2. Run "sudo systemctl restart sshd" (replace sshd with any service)
3. Error produced
  
Actual results: "Error: no space left on device"


Expected results: No warning about space left on device


Additional info: It seems this is a result of the inotify "max_user_watches" limit. To silence the error I ran "echo 1048576 > /proc/sys/fs/inotify/max_user_watches".

As a temporary workaround, you can run this command or set the following in /etc/sysctl.conf: fs.inotify.max_user_watches=1048576
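
Before raising the limit it can help to see what the kernel currently allows. The following is a minimal Python sketch, not part of the original report. The helper names `read_inotify_limits` and `sysctl_line` are made up for illustration, and the sketch assumes the usual `/proc/sys/fs/inotify` layout.

```python
from pathlib import Path


def read_inotify_limits(base="/proc/sys/fs/inotify"):
    """Return the inotify limits found under `base` as a {name: int} dict."""
    limits = {}
    base_path = Path(base)
    if not base_path.is_dir():
        # Not a Linux system, or /proc is not mounted: nothing to report.
        return limits
    for entry in sorted(base_path.iterdir()):
        try:
            limits[entry.name] = int(entry.read_text().strip())
        except (OSError, ValueError):
            # Skip anything unreadable or non-numeric.
            continue
    return limits


def sysctl_line(limit):
    """Format the persistent setting quoted above for a sysctl config file."""
    return f"fs.inotify.max_user_watches={int(limit)}"
```

On a typical Linux system `read_inotify_limits()` should return entries such as `max_user_watches`, `max_user_instances` and `max_queued_events`. `sysctl_line(1048576)` produces the line quoted above, and `sysctl -p` as root loads it from /etc/sysctl.conf without a reboot.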

--- Additional comment from Lennart Poettering on 2013-01-14 12:53:58 EST ---

Hmm, does this happen for all verbs, or only for some, such as "systemctl start"?

Not sure if we should do something about this. I mean, for some services we have to watch PID files with inotify, and if we can't do that we probably should refuse operation, the way we currently do. If the resources are all used up things will fail, which is kinda expected, no?

--- Additional comment from Ted W. on 2013-01-14 15:52:09 EST ---

I'm no expert on inotify or systemd, so I may be investigating this improperly. However, after checking both "df" and "df -i", it does not appear that either shows anything remotely full (maximum % full is around 20%). It appears that the verbs "start", "stop" and "reload" produce the message. Verbs such as "enable", "disable" and "kill" do not.

I should also specify, though I did not realize this when I originally filed the bug, that the warning message does not impede proper functionality of systemd. The verbs work as expected apart from the warning message. Is it possible that something on my system, such as the number of running processes, the number of files, or the partition size, is some order of magnitude larger than what the default max_user_watches limit assumes for the "average" or "normal" use case?

Perhaps this is less of an issue with systemd and more with some default setting in Fedora or inotify? I completely understand that "if X is size n and X is full, further attempts to fill X will fail", I believe my question was more of "Why is X full?" and "Is there anything that can be done globally (upstream or otherwise) which would benefit future users" rather than just fixing the issue locally on my system and leaving it be.

(Please excuse me if any of the above was unclear, it's difficult for me to put thoughts together in this manner sometimes)

--- Additional comment from Jonathan Kamens on 2013-01-20 16:04:54 EST ---

I am seeing this as well. I think "Not sure if we should do something about this" in Comment 1 is just a little silly. OF COURSE you should "do something about this." It's ridiculous for a standard system command on a properly functioning, properly installed system to return a completely cryptic and nonsensical "Error: no space left on device" message.

--- Additional comment from Michal Schmidt on 2013-01-21 13:06:55 EST ---

sshd.service is a Type=simple service; it does not use a pidfile, so that's not what we need the inotify watch for. I think the error is reported by the password agent that systemctl spawns. Could you confirm that by producing a trace?
strace -f -o /tmp/trace.txt systemctl restart sshd

--- Additional comment from Jonathan Kamens on 2013-01-22 10:03:12 EST ---

Trace attached.

--- Additional comment from Michal Schmidt on 2013-01-22 10:27:29 EST ---

It is indeed from the password agent:

12136 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f961bd14ad0) = 12137
...
12137 execve("/usr/bin/systemd-tty-ask-password-agent", ["/usr/bin/systemd-tty-ask-passwor"..., "--watch"], [/* 58 vars */] <unfinished ...>
...
12137 mkdir("/run/systemd/ask-password", 0755) = -1 EEXIST (File exists)
12137 stat("/run/systemd/ask-password", {st_mode=S_IFDIR|0755, st_size=40, ...}) = 0
12137 inotify_init1(IN_CLOEXEC)         = 4
12137 inotify_add_watch(4, "/run/systemd/ask-password", IN_CLOSE_WRITE|IN_MOVED_TO) = -1 ENOSPC (No space left on device)

The error message could be made clearer in this case.
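
To illustrate what a clearer message could look like, here is a sketch in Python (not systemd's actual code) that repeats the two inotify calls from the trace through `ctypes` and rewords `ENOSPC`. The names `explain_inotify_error` and `try_watch` are hypothetical, and the wording of the message is only an assumption about what would be helpful.

```python
import ctypes
import errno
import os

IN_CLOSE_WRITE = 0x00000008  # value from <sys/inotify.h>
IN_MOVED_TO = 0x00000080     # value from <sys/inotify.h>


def explain_inotify_error(err, path):
    """Turn an errno from inotify_add_watch() into a more specific message."""
    if err == errno.ENOSPC:
        # For inotify_add_watch(), ENOSPC means the per-user watch limit was
        # hit (or the kernel could not allocate a resource), not a full disk.
        return (f"cannot watch {path}: out of inotify watches "
                "(check fs.inotify.max_user_watches)")
    return f"cannot watch {path}: {os.strerror(err)}"


def try_watch(path):
    """Repeat the calls from the trace; return 0 on success or the errno."""
    libc = ctypes.CDLL(None, use_errno=True)
    # IN_CLOEXEC has the same value as O_CLOEXEC on Linux.
    fd = libc.inotify_init1(os.O_CLOEXEC)
    if fd < 0:
        return ctypes.get_errno()
    try:
        wd = libc.inotify_add_watch(fd, os.fsencode(path),
                                    IN_CLOSE_WRITE | IN_MOVED_TO)
        return 0 if wd >= 0 else ctypes.get_errno()
    finally:
        os.close(fd)
```

On an affected Linux system, `explain_inotify_error(try_watch("/run/systemd/ask-password"), "/run/systemd/ask-password")` would then mention inotify instead of only printing the generic `ENOSPC` text.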

--- Additional comment from Jonathan Kamens on 2013-01-23 09:05:06 EST ---

I can confirm that increasing fs.inotify.max_user_watches makes the error from systemctl go away, but this doesn't make any sense to me.

I have fs.inotify.max_user_watches set to 65535 when the error is occurring. According to "find /proc -lname anon_inode:inotify -print | wc -l", I have only 321 inotify watches total. That is obviously a lot less than 65535. So why does inotify think that I'm out of watches?

--- Additional comment from Jonathan Kamens on 2013-01-23 09:22:33 EST ---

I figured it out. Looking for anon_inode:inotify doesn't show you all watches; it only shows you instances, I think. I'm using CrashPlan and it's creating a ton of watches and exceeding my limit. I still think there's a bug here, though -- the "No space left on device" error is completely nonintuitive and needs to be better about explaining what's going wrong. Like, the error could maybe have the word "inotify" in it?
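
The difference between instances and watches can be checked directly. This Python sketch rests on one assumption: that the kernel lists one `inotify wd:` line per watch in `/proc/<pid>/fdinfo/<fd>`, which newer kernels do. The function name `count_inotify_watches` is made up for this example, and the approach is not necessarily the one used in the answer linked later in this thread.

```python
import os
from collections import Counter


def count_inotify_watches(proc="/proc"):
    """Count 'inotify wd:' lines in every readable <proc>/<pid>/fdinfo/<fd>."""
    watches = Counter()
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        fdinfo_dir = os.path.join(proc, pid, "fdinfo")
        try:
            fds = os.listdir(fdinfo_dir)
        except OSError:
            continue  # process exited, or we may not inspect it
        for fd in fds:
            try:
                with open(os.path.join(fdinfo_dir, fd)) as handle:
                    n = sum(1 for line in handle
                            if line.startswith("inotify wd:"))
            except OSError:
                continue
            if n:
                watches[int(pid)] += n
    return watches
```

Run as root, `count_inotify_watches().most_common(5)` lists the processes holding the most watches. The limit itself is accounted per user rather than per process, and descriptors inherited across fork are counted once per process, so treat the totals as an estimate.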

--- Additional comment from Ted W. on 2013-01-23 14:02:37 EST ---

(In reply to comment #8)
> I figured it out. looking for anon_inode:inotify doesn't show you all
> watches, it only shows you instances, I think. I'm using CrashPlan and it's
> creating a ton of watches and exceeding my limit. I still think there's a
> bug here, though -- the "No space left on device" error is completely
> nonintuitive and needs to be better about explaining what's going wrong.
> Like, the error could maybe have the word "inotify" in it?

Interesting that you should bring up CrashPlan, I happen to be running it on the system with the errors as well. How did you determine how many watches it was using? I would be curious to run it on my system as well and determine if I am seeing the same thing. Perhaps there is something that needs reporting in CrashPlan as well.

--- Additional comment from Jonathan Kamens on 2013-01-23 18:29:37 EST ---

How I determined that CrashPlan was the culprit is described at http://unix.stackexchange.com/a/62284/31003 .

I don't think there's a bug in CrashPlan; it's simply monitoring the files you're backing up, which is a reasonable thing for it to be doing. This problem is already discussed in the CrashPlan documentation at http://support.crashplan.com/doku.php/client/troubleshooting/real-time#linux .

Note that if you have too many files being backed up by CrashPlan for inotify to be usable (too much kernel memory), you can disable real-time backup protection in the advanced backup settings in the CrashPlan desktop app.

--- Additional comment from Fedora End Of Life on 2013-07-03 23:36:28 EDT ---

This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 17 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora, you are encouraged to change the
'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

--- Additional comment from Fedora End Of Life on 2013-08-01 07:27:09 EDT ---

Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

--- Additional comment from Andrea V. on 2016-03-11 05:13:06 EST ---

I'd like to re-open this bug for Fedora 23 x86_64...

Comment 1 Fedora End Of Life 2016-11-24 16:01:37 UTC
This message is a reminder that Fedora 23 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 23. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '23'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 23 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 2 Fedora End Of Life 2016-12-20 19:23:14 UTC
Fedora 23 changed to end-of-life (EOL) status on 2016-12-20. Fedora 23 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 3 Eric Renfro 2017-02-02 02:13:38 UTC
Hmmm.. This bug got closed, yet not resolved.

I am experiencing this problem on CentOS 7.3 today. I noticed it out of the blue, which made me check my filesystems and then search Google for this issue, which led me here.

Comment 4 Louis van Dyk 2017-04-22 23:35:37 UTC
I am running Fedora 25. I am also seeing this since installing CrashPlan. What do we do? Open it again under a new bug report?

Comment 5 Jan Synacek 2017-05-23 15:51:35 UTC
Could you please provide exactly what you run and the exact output you get? I can reproduce this somewhat, but I would like to see what you get in your case. "systemctl start/stop <something>" should be enough to reproduce this under the right conditions.

Comment 6 Jan Niklas Hasse 2017-07-28 08:42:24 UTC
Kernel bug report about increasing the "max_user_watches" default value: https://bugzilla.kernel.org/show_bug.cgi?id=190011

Comment 7 Fedora End Of Life 2017-11-16 19:02:19 UTC
This message is a reminder that Fedora 25 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 25. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '25'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 25 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 8 Fedora End Of Life 2017-12-12 10:59:12 UTC
Fedora 25 changed to end-of-life (EOL) status on 2017-12-12. Fedora 25 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 9 Ben Cotton 2018-11-27 17:36:50 UTC
This message is a reminder that Fedora 27 is nearing its end of life.
On 2018-Nov-30 Fedora will stop maintaining and issuing updates for
Fedora 27. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora 'version' of '27'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 27 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 10 Ben Cotton 2018-11-30 19:02:24 UTC
Fedora 27 changed to end-of-life (EOL) status on 2018-11-30. Fedora 27 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.