Bug 1452933 - systemctl incorrectly reports "Error: No space left on device" when restarting a service
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: systemd
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Jan Synacek
QA Contact: qe-baseos-daemons
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-05-20 16:24 UTC by James Boyle
Modified: 2019-05-01 23:56 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:


Attachments
reproducer (10.00 KB, application/x-tar), 2017-05-23 09:53 UTC, Jan Synacek
patch (1.57 KB, patch), 2017-05-25 09:29 UTC, Jan Synacek


Links
Red Hat Bugzilla 1316855 (Priority: None, Status: CLOSED, Last Updated: 2019-08-01 05:05:12 UTC): systemctl reports no space left on device due to inotify "max_user_watches" limit

Internal Links: 1316855

Description James Boyle 2017-05-20 16:24:05 UTC
Description of problem:
systemctl incorrectly reports "Error: No space left on device" when restarting a service. The error is actually caused by exhausting the kernel limit fs.inotify.max_user_watches, not by a full disk.

Version-Release number of selected component (if applicable):
# rpm -qa |grep systemd
systemd-219-30.el7_3.8.x86_64

How reproducible:
Always

Steps to Reproduce / Actual Results:
(Prerequisite: the CrashPlan backup client, or another application that uses up inotify user watches, is installed.)
# sysctl fs.inotify.max_user_watches=8192
fs.inotify.max_user_watches = 8192
# systemctl restart ntpd
Error: No space left on device
# sysctl -w fs.inotify.max_user_watches=131072
fs.inotify.max_user_watches = 131072
# systemctl restart ntpd
#

Expected results:
An error message that describes the actual root cause of the error condition: max_user_watches is exhausted.

Additional info:
See also BZ# 1316855 and BZ# 894483

Comment 2 Jan Synacek 2017-05-23 09:53:16 UTC
Created attachment 1281430
reproducer

Steps to reproduce:
1) Untar the reproducer.
2) $ sh run.sh
3) Observe.

Note that the reproducer sets max_user_watches to 8192. If, by any chance, step 2 doesn't reproduce the issue, any subsequent "systemctl start test" followed by "systemctl stop test" should do it.

The thing here is that systemd simply takes the errno returned by inotify_add_watch(), which is ENOSPC, and converts it to its standard string, "No space left on device", as can be seen in the output of "systemctl status test". I think that, in this particular case, it would be reasonable to add one if (r == -ENOSPC) {...} to create a message fitting for this case instead of a general one.
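
To illustrate the idea, here is a minimal sketch of such a special case. It is not systemd code and not the attached patch: the helper name describe_watch_error() and the message wording are hypothetical, and the only assumption carried over from above is that the error arrives as a negative errno value r.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of systemd): format a message for a failed
 * inotify_add_watch() call. r is a negative errno value. */
static const char *describe_watch_error(int r, char *buf, size_t len) {
    if (r == -ENOSPC)
        /* For inotify_add_watch(), ENOSPC means the per-user watch limit was
         * reached or the kernel could not allocate a needed resource. */
        snprintf(buf, len,
                 "Failed to add inotify watch: watch limit reached or kernel "
                 "resource unavailable (check fs.inotify.max_user_watches)");
    else
        /* Every other error keeps the generic errno string. */
        snprintf(buf, len, "Failed to add inotify watch: %s", strerror(-r));
    return buf;
}
```

With a caller-provided buffer, describe_watch_error(-ENOSPC, buf, sizeof(buf)) yields a message that points at the inotify limit, while any other error still falls back to strerror().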

Comment 3 Jan Synacek 2017-05-24 06:20:33 UTC
Rescheduling this for RHEL-7.5.

(In reply to Jan Synacek from comment #2)
> I think that, in this particular case, it would be
> reasonable to add one if (r == -ENOSPC) {...} to create a message fitting
> for this case instead of a general one.

Unfortunately, after some debugging I found out that it's actually not so trivial to fix this bug.

Comment 4 Jan Synacek 2017-05-25 09:23:24 UTC
(In reply to Jan Synacek from comment #3)
> Unfortunately, after some debugging I found out that it's actually not so
> trivial to fix this bug.

Scratch that, it is easy to fix, it just wasn't obvious at first sight...

https://github.com/systemd/systemd/issues/6030

Comment 5 Jan Synacek 2017-05-25 09:29:09 UTC
Created attachment 1282191
patch

This patch fixes this particular problem, which will probably be enough. I figure that a correct upstream patch would be more general, though.

