Bug 1266508

Summary: defaults of tmpfs on PPC64LE architecture
Product: Fedora
Version: rawhide
Component: kernel
Status: NEW
Severity: low
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Keywords: Tracking
Type: Bug
Doc Type: Bug Fix
Reporter: Miroslav Suchý <msuchy>
Assignee: Kernel Maintainer List <kernel-maint>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, mchehab, msuchy
Bug Blocks: 1071880

Description Miroslav Suchý 2015-09-25 12:52:58 UTC
When you do:
  mount -t tmpfs tmpfs /mnt/foo
then the tmpfs is created with a default number of inodes. From the documentation (man mount):

       nr_inodes=
              The  maximum  number of inodes for this instance.  The default is
              half of the number of your physical RAM pages, or (on  a  machine
              with  highmem)  the  number of lowmem RAM pages, whichever is the
              lower.

This is a reasonable number on Intel architectures. However, PPC64LE has a RAM page 16 times larger (64 kB vs. 4 kB), and therefore gets 16 times fewer inodes by default. As a result, I use up all inodes on PPC64LE pretty quickly.
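
For illustration, the documented default can be reproduced from userspace (a minimal sketch, assuming a machine without highmem; getconf is provided by glibc):

  pages=$(getconf _PHYS_PAGES)    # number of physical RAM pages
  psize=$(getconf PAGE_SIZE)      # 4096 on x86_64, 65536 on ppc64le
  # documented default: half of the physical RAM pages
  echo "page size: $psize, default nr_inodes: $((pages / 2))"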

While this can be worked around with something like
  if [ "$(arch)" = "ppc64le" ]; then
      mount -t tmpfs -o nr_inodes=XXXX tmpfs /mnt/foo
  fi
it definitely sucks, as you usually do not know about this behavior before it hits you.

To put it in a use case: I am building packages using mock, and the whole chroot lives in a tmpfs. I have a moderate amount of memory (5 GB on Intel, 8 GB on PPC64) and a large swap (50 GB).
When I create a tmpfs on Intel I get a filesystem with 664k inodes, while on PPC64LE I get only 60k inodes. That is insufficient.
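
These numbers can be observed with df -i (using a hypothetical mount point):

  mount -t tmpfs tmpfs /mnt/foo
  df -i /mnt/foo    # the Inodes column shows the per-architecture default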

I will work around it in Mock in bug 1266453, but other users may benefit from changing the default on PPC64LE.
BTW: what is the reason for basing the number of inodes on the number of RAM pages? As it stands, you get the same number of inodes whether you create a 1 kB tmpfs or a 100 GB tmpfs. If the number of inodes were derived from the requested size instead, this architecture difference would be resolved -- for all architectures.
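
For instance, a size-based default could be a plain division by a bytes-per-inode ratio; the 4 kB ratio below is only an illustrative assumption, not what the kernel does today:

  size_bytes=$((50 * 1024 * 1024 * 1024))     # a 50 GB tmpfs
  echo "nr_inodes = $((size_bytes / 4096))"   # the same on every architecture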

Filing against the kernel, as the util-linux maintainer told me that those defaults live in the kernel.

Comment 1 Josh Boyer 2015-09-28 13:27:49 UTC
A few things.  First, we aren't going to change the defaults in the kernel itself only in Fedora.  Such changes should be made upstream first.

Second, this isn't limited to ppc64le.  Any architecture which uses a larger page size has this issue.  That includes at least ppc64le, ppc64, and aarch64.  Additionally, it is configurable on those architectures so it is possible to have one kernel running there that uses 4k pages and another that uses 64k pages.  Rather than hard coding mock to look for 'ppc64le', it would be better to simply query the system for the page size and set appropriate defaults according to what it returns.
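
A minimal sketch of that approach (the 4096-byte threshold and the nr_inodes value are illustrative assumptions, not tested values):

  if [ "$(getconf PAGE_SIZE)" -gt 4096 ]; then
      mount -t tmpfs -o nr_inodes=1000000 tmpfs /mnt/foo   # example value only
  else
      mount -t tmpfs tmpfs /mnt/foo
  fi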

As for why the kernel defaults to a percentage of the current number of pages, it is because anyone with write access to the tmpfs mount can consume system memory.  If it is unlimited, then a regular user can consume all the memory on a system simply by writing a bunch of files.

If you specify nr_inodes=0, then the number of inodes will no longer be limited.  That might work for your cases, but it would only take one errant build to run the whole machine out of memory.  (Yes, it may be swapped but as soon as you start hitting swap it negates the performance benefits of using a tmpfs to begin with and you can still exhaust swap space too.)
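
For reference, lifting the limit looks like this:

  mount -t tmpfs -o nr_inodes=0 tmpfs /mnt/foo
  df -i /mnt/foo    # reports 0 total inodes, i.e. no inode limit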

Comment 2 Miroslav Suchý 2015-09-29 06:31:07 UTC
(In reply to Josh Boyer from comment #1)
> A few things.  First, we aren't going to change the defaults in the kernel
> itself only in Fedora.  Such changes should be made upstream first.

*nod* However, I am not subscribed to any kernel mailing list and I know nothing about kernel development. So I will leave it up to you guys - after the discussion here - to either pursue the change upstream (or close this BZ).
 
> As for why the kernel defaults to a percentage of the current number of
> pages, it is because anyone with write access to the tmpfs mount can consume
> system memory.  If it is unlimited, then a regular user can consume all the
> memory on a system simply by writing a bunch of files.

This makes sense for *size*, but you can hardly exhaust memory just by consuming entries in the inode table.
 
> If you specify nr_inodes=0, then the number of inodes will no longer be
> limited.

I am aware of that, and I will use it in Mock as a workaround. But I am arguing for sensible defaults for everyone, on all architectures.

> That might work for your cases, but it would only take one errant
> build to run the whole machine out of memory.  (Yes, it may be swapped but
> as soon as you start hitting swap it negates the performance benefits of
> using a tmpfs to begin with and you can still exhaust swap space too.)

Again, it is more about the size of that volume. And if I create a 50 GB tmpfs on a system with 8 GB RAM and without appropriate swap... well, you provide the gun and I am the one shooting myself in the foot.
It simply makes no sense to create a 50 GB tmpfs and (with the defaults on) be able to write just 5 GB there.

Regarding the benefits once you start swapping: it still makes sense (at least for me), because I get a filesystem that stays mostly in memory, rarely used bits are swapped out, and I get LRU behavior for free.
I even benchmarked it, and it is a really big improvement:
http://miroslav.suchy.cz/blog/archives/2015/05/28/increase_mock_performance_-_build_packages_in_memory/index.html

Comment 3 Justin M. Forbes 2015-10-20 19:37:51 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 22 kernel bugs.

Fedora 22 has now been rebased to 4.2.3-200.fc22.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 23, and are still experiencing this issue, please change the version to Fedora 23.

If you experience different issues, please open a new bug report for those.

Comment 4 Laura Abbott 2016-09-23 19:46:23 UTC
*********** MASS BUG UPDATE **************
 
We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 23 kernel bugs.
 
Fedora 23 has now been rebased to 4.7.4-100.fc23.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.
 
If you have moved on to Fedora 24 or 25, and are still experiencing this issue, please change the version to Fedora 24 or 25.
 
If you experience different issues, please open a new bug report for those.

Comment 5 Laura Abbott 2017-01-17 01:21:28 UTC
*********** MASS BUG UPDATE **************
We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 25 kernel bugs.
 
Fedora 25 has now been rebased to 4.9.3-200.fc25.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.
 
If you have moved on to Fedora 26, and are still experiencing this issue, please change the version to Fedora 26.
 
If you experience different issues, please open a new bug report for those.