Bug 902483

Summary: Cannot handle 2000 mounts
Product: [Fedora] Fedora
Reporter: Ben Greear <greearb>
Component: systemd
Assignee: systemd-maint
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 17
CC: david, joeym, johannbg, lnykryn, metherid, mschmidt, msekleta, notting, plautrba, systemd-maint, vpavlin
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Doc Type: Bug Fix
Last Closed: 2013-03-04 22:33:21 UTC
Type: Bug

Description Ben Greear 2013-01-21 18:56:13 UTC
Description of problem:

I have a system with 2000+ mount points.  On shutdown, systemd logs many
errors about having too many open files.  Shutdown sometimes hangs after
this, although not always, and I'm not sure whether this is the root cause.

systemd[1]: Failed to kill control group: Too many open files
systemd[1]: Failed to kill control group: Too many open files
systemd[1]: irqbalance.service failed to kill processes: Too many open files
systemd[1]: Failed to kill control group: Too many open files
systemd[1]: irqbalance.service failed to kill processes: Too many open files
systemd[1]: Unit irqbalance.service entered failed state.
systemd[1]: Failed to kill control group: Too many open files
systemd[1]: atd.service failed to kill processes: Too many open files
systemd[1]: Failed to kill control group: Too many open files
systemd[1]: atd.service failed to kill processes: Too many open files
systemd[1]: Unit atd.service entered failed state.
systemd[1]: Failed to kill control group: Too many open files
systemd[1]: chronyd.service failed to kill processes: Too many open files
systemd[1]: Failed to kill control group: Too many open files
systemd[1]: chronyd.service failed to kill processes: Too many open files
systemd[1]: Unit chronyd.service entered failed state.
systemd[1]: Failed to kill control group: Too many open files



Version-Release number of selected component (if applicable):


How reproducible:

The error messages are always reproducible. The hang on shutdown is not,
and may or may not be related to systemd.


Steps to Reproduce:
1. Create 2000 NFS mounts (probably other mounts would work as well)
2. 'reboot'
3. Watch console output for the errors.
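
To confirm that step 1 really produced the expected number of mounts before rebooting, the kernel's mount table can be counted by filesystem type. The following is a minimal sketch, assuming the standard `/proc/self/mounts` layout (device, mount point, filesystem type, options, ...); the function name `count_mounts_by_type` is illustrative and not part of any tool mentioned in this report.

```python
from collections import Counter


def count_mounts_by_type(mounts_text):
    """Count mount entries per filesystem type.

    Expects text in the /proc/self/mounts format, where the third
    whitespace-separated field of each line is the filesystem type.
    """
    counts = Counter()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:  # skip blank or malformed lines
            counts[fields[2]] += 1
    return counts


if __name__ == "__main__":
    # On Linux, read the live mount table of the current process.
    with open("/proc/self/mounts") as f:
        for fstype, n in count_mounts_by_type(f.read()).most_common():
            print(fstype, n)
```
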
  
Actual results:

Lots of 'too many open files' error messages.


Expected results:

Clean shutdown.

Additional info:

Comment 1 Jóhann B. Guðmundsson 2013-01-21 19:39:47 UTC
Have you increased the default file descriptor limit in /etc/systemd/system.conf to handle all these mount points?

Also see...

http://en.usenet.digipedia.org/thread/18978/19676/

Comment 2 Michal Schmidt 2013-01-22 12:40:13 UTC
This may be fixed by:

commit 4096d6f5879aef73e20dd7b62a01f447629945b0
Author: Lennart Poettering <lennart>
Date:   Mon Sep 17 16:35:59 2012 +0200

    main: bump up RLIMIT_NOFILE for systemd itself

... which is not in F17 however.
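
The commit itself is not reproduced here. As a rough sketch of the general technique (a process raising its own open-file limit at startup), the example below uses Python's `resource` module. The function name `raise_nofile_limit` and the 65536 target (the "64k" figure discussed later in this thread) are illustrative assumptions. Unlike a privileged process, this sketch only raises the soft limit up to the existing hard limit.

```python
import resource


def raise_nofile_limit(target=65536):
    """Raise this process's soft RLIMIT_NOFILE toward `target`.

    The hard limit is left untouched, so this works without privileges.
    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Never ask for more than the hard limit allows.
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if soft != resource.RLIM_INFINITY and new_soft > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        except (ValueError, OSError):
            # Some platforms reject values above a kernel-wide maximum;
            # keep the current limit in that case.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("RLIMIT_NOFILE (soft, hard):", raise_nofile_limit())
```
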

Comment 3 Ben Greear 2013-01-22 18:13:41 UTC
I tried editing file:

/etc/systemd/system.conf

and changed the setting as below:

DefaultLimitNOFILE=6000


It still complained and hung on reboot with 3000 mounts.
I have now raised the setting to 12,000 and will try again.  But maybe
the systemd code needs better logic for dealing with lots of open files,
and/or better recovery logic if it does hit an error?

Even if it can't bring things down gracefully, it would be nice if
at least it could manage a reboot...

Comment 4 Michal Schmidt 2013-01-22 19:25:34 UTC
DefaultLimitNOFILE has no effect on systemd itself. It's a setting for the services it spawns.
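
One way to see this distinction on a running system is to compare the limit the kernel reports for PID 1 with the limit of a spawned service. The sketch below parses the "Max open files" row of `/proc/<pid>/limits`; the helper names are illustrative, and reading another process's limits file may require sufficient privileges.

```python
def parse_nofile_limits(limits_text):
    """Return (soft, hard) from the 'Max open files' row of a
    /proc/<pid>/limits dump, as strings (values may be 'unlimited').
    Returns None if the row is missing."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Fields: "Max", "open", "files", <soft>, <hard>, "files"
            fields = line.split()
            return fields[3], fields[4]
    return None


def read_nofile_limits(pid=1):
    """Read the open-file limits of a process; pid 1 is the init process."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_nofile_limits(f.read())
```

Calling `read_nofile_limits(1)` and `read_nofile_limits(<pid of a service>)` side by side shows whether a change to DefaultLimitNOFILE reached only the services.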

We really need to backport that patch.

We can also consider economizing systemd's fd usage. For example, instead of one timerfd per timeout, we could keep a tree of timeouts and always schedule only the earliest one using a single timerfd.
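
The timeout idea can be sketched as follows. This is only an illustration of the data structure, not systemd code: it uses a binary min-heap in place of whatever tree systemd might choose, it does not create a real timerfd, and the class and method names are hypothetical. The single timer would be armed for `next_deadline()`, and when it fires, `pop_expired()` returns every timeout that is due.

```python
import heapq
import itertools


class TimeoutQueue:
    """Keeps many timeouts but exposes only the earliest deadline,
    so a single timer (one fd) is enough to serve all of them."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so entries with equal deadlines keep insertion order.
        self._counter = itertools.count()

    def add(self, deadline, name):
        heapq.heappush(self._heap, (deadline, next(self._counter), name))

    def next_deadline(self):
        """Deadline the single timer should be armed for, or None."""
        return self._heap[0][0] if self._heap else None

    def pop_expired(self, now):
        """Remove and return the names of all timeouts due at `now`."""
        expired = []
        while self._heap and self._heap[0][0] <= now:
            _, _, name = heapq.heappop(self._heap)
            expired.append(name)
        return expired
```
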

Comment 5 Ben Greear 2013-01-22 19:36:34 UTC
That patch in comment #2 still hard-codes things (though 64k is
big enough for anyone! (tm))

Maybe instead make it configurable in the system.conf file,
with a 64k default?
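
A minimal sketch of that suggestion, assuming an INI-style `[Manager]` section like the one in system.conf: read an optional setting and fall back to 65536 when it is absent or commented out. The key name `ManagerLimitNOFILE` is purely hypothetical and is not a real systemd option; it only stands in for whatever name such a setting might get.

```python
import configparser

DEFAULT_NOFILE = 65536  # the "64k" default proposed in the comment above


def manager_nofile(conf_text, key="ManagerLimitNOFILE"):
    """Return the configured limit, or DEFAULT_NOFILE if `key` is not set.

    `key` is a hypothetical option name used for illustration only.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(conf_text)
    if parser.has_option("Manager", key):
        return parser.getint("Manager", key)
    return DEFAULT_NOFILE
```
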

If someone can cook up a patched RPM for 64-bit Fedora 17,
I'll be happy to test it.

Comment 6 Lennart Poettering 2013-02-14 18:22:11 UTC
*** Bug 908531 has been marked as a duplicate of this bug. ***

Comment 7 Joe Miller 2013-02-14 20:25:33 UTC
We would love to see this backported to F17, which was our reason for opening the duplicate bug 908531 (it was more of a request than a bug report).

Ben, we have backported this to F17 RPMs because it is very important to us. You can build the RPMs yourself from this repo:  https://github.com/pantheon-systems/systemd


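# Install the build tooling and systemd's build dependencies
sudo yum install -y yum-utils rpm-build spectool
sudo yum-builddep systemd
# Fetch the backported sources and switch to the F17 branch
git clone https://github.com/pantheon-systems/systemd.git
cd systemd/
git checkout f17
cd ..
# Set up the rpmbuild tree, download the sources, and build
mkdir -p ~/rpmbuild/{BUILD,RPMS,SOURCES,SPECS,SRPMS}
spectool --get-files --sourcedir systemd/systemd.spec
cp systemd/* ~/rpmbuild/SOURCES/
rpmbuild -ba systemd/systemd.spec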

Comment 8 Fedora Update System 2013-02-15 10:15:33 UTC
systemd-44-24.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/systemd-44-24.fc17

Comment 9 Fedora Update System 2013-02-16 01:19:25 UTC
Package systemd-44-24.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing systemd-44-24.fc17'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-2564/systemd-44-24.fc17
then log in and leave karma (feedback).

Comment 10 Fedora Update System 2013-03-04 22:33:24 UTC
systemd-44-24.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.