Bug 1406435 - If no limit to memory provided on container /tmp ends up at 4EB
Summary: If no limit to memory provided on container /tmp ends up at 4EB
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: oci-systemd-hook
Version: 25
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Daniel Walsh
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-12-20 14:09 UTC by James Hogarth
Modified: 2017-04-01 16:57 UTC

Fixed In Version: oci-systemd-hook-0.1.6-1.gitfe22236.fc25 oci-systemd-hook-0.1.6-1.gitfe22236.fc26
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-14 17:23:00 UTC
Type: Bug
Embargoed:


Description James Hogarth 2016-12-20 14:09:33 UTC
Description of problem:
When using a systemd-based container, the oci-systemd-hook uses /sys/fs/cgroup/memory/memory.limit_in_bytes to determine the size of /tmp.

If the container is not explicitly limited, this results in a 'limit' of 9223372036854771712/2 bytes ... or roughly 4EB to make it readable.
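The arithmetic behind the 4EB figure can be checked with a short sketch. The constant is the value quoted above; the function name is made up for illustration.

```python
# Value of memory.limit_in_bytes when no limit is set (taken from this report).
UNLIMITED_BYTES = 9223372036854771712

EIB = 2 ** 60  # one exbibyte, in bytes


def half_limit_in_eib(limit_bytes: int) -> float:
    """Return half of the given byte limit, expressed in EiB."""
    return (limit_bytes // 2) / EIB


if __name__ == "__main__":
    # The "unlimited" value is the largest multiple of 4096 that fits
    # in a signed 64-bit integer.
    print(UNLIMITED_BYTES == 2 ** 63 - 4096)               # True
    print(f"{half_limit_in_eib(UNLIMITED_BYTES):.1f} EiB")  # 4.0 EiB
```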

a) this is insane as few systems will have 4EB of RAM available to back this ;)
b) this causes issues on anything that uses the detected size to work out max sizing 
c) this breaks 32bit apps in a lovely subtle way. Since getconf FILESIZEBITS /tmp results in 32, the assumption is that large file support is not required. So statfs() (and family) get called instead of the 64bit equivalents and the application promptly fails with: statfs("/tmp/isjbeOWOh", 0xffa15b60)    = -1 EOVERFLOW (Value too large for defined data type)
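A rough sketch of why point c) ends in EOVERFLOW: it assumes a 4096-byte filesystem block size and that the non-64bit statfs() structure carries block counts in 32-bit fields. The helper names are hypothetical; this only illustrates the size mismatch and is not code from any affected application.

```python
UINT32_MAX = 2 ** 32 - 1

# Assumption: the tmpfs reports 4096-byte blocks.
ASSUMED_BLOCK_SIZE = 4096


def block_count(size_bytes: int, block_size: int = ASSUMED_BLOCK_SIZE) -> int:
    """Number of whole blocks a filesystem of size_bytes would report."""
    return size_bytes // block_size


def fits_in_32_bits(value: int) -> bool:
    """True if value can be stored in an unsigned 32-bit field."""
    return 0 <= value <= UINT32_MAX


if __name__ == "__main__":
    huge_tmp = 9223372036854771712 // 2   # the ~4EB /tmp from this report
    sane_tmp = 4 * 2 ** 30                # a 4 GiB /tmp for comparison
    print(fits_in_32_bits(block_count(huge_tmp)))  # False
    print(fits_in_32_bits(block_count(sane_tmp)))  # True
```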


Version-Release number of selected component (if applicable):
docker-1.12.3-12.git97974ae.fc25.x86_64
oci-systemd-hook-0.1.4-3.git41491a3.fc25.x86_64


How reproducible:
deterministic

Steps to Reproduce:
1. 

cat > Dockerfile.systemd-test << EOF
FROM centos:latest

RUN yum -y install systemd bash
ENTRYPOINT ["/sbin/init"]
EOF


2. docker build -f Dockerfile.systemd-test -t systemd-test . 
3. docker run -d --name systemd-test systemd-test
4. docker exec systemd-test df -h

Actual results:
Filesystem                                             Size  Used Avail Use% Mounted on
/dev/mapper/luks-533c35ab-2572-45ad-b18b-5203c8b8563f  231G  183G   46G  81% /
tmpfs                                                  3.9G     0  3.9G   0% /dev
tmpfs                                                  3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/mapper/luks-533c35ab-2572-45ad-b18b-5203c8b8563f  231G  183G   46G  81% /etc/hosts
shm                                                     64M     0   64M   0% /dev/shm
tmpfs                                                   64M   16K   64M   1% /run
tmpfs                                                  4.0E     0  4.0E   0% /tmp


Expected results:
A /tmp that is a sensible size

Additional info:
This happens on a RHEL7 host as well and also with a fedora container

Comment 1 Daniel Walsh 2016-12-20 14:22:56 UTC
I believe the limit is half of physical memory for a tmpfs, which is the same for the host.

   Mount options for tmpfs
       size=nbytes
              Override  default  maximum  size of the filesystem.  The size is
              given in bytes, and rounded up to entire pages.  The default  is
              half  of the memory.  The size parameter also accepts a suffix %
              to limit this tmpfs instance to that percentage of your physical
              RAM:  the default, when neither size nor nr_blocks is specified,
              is size=50%


Are you saying we are doing something wrong that is allowing this to go beyond that?
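For comparison, the tmpfs default quoted from the man page (size=50% of physical RAM) can be estimated on a Linux host with the sketch below. The function names are illustrative, and the result is only an approximation of what the kernel would pick.

```python
import os


def physical_ram_bytes() -> int:
    """Physical RAM as reported by sysconf (page count times page size)."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


def default_tmpfs_size_bytes() -> int:
    """Approximate tmpfs default: half of physical RAM (size=50%)."""
    return physical_ram_bytes() // 2


if __name__ == "__main__":
    gib = default_tmpfs_size_bytes() / 2 ** 30
    print(f"default tmpfs size would be about {gib:.1f} GiB")
```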

Comment 2 James Hogarth 2016-12-20 14:34:42 UTC
For anyone that bumps into this in their applications there are a couple of workarounds that can be employed:

1) use docker run -m XG -d systemd

Where X is the memory to limit the guest to, rather than permitting infinite which is the default. The hook then uses X/2 for the /tmp size 

2) use docker run -v /tmp -d systemd

This then has /tmp in the volumes listed in the container config so the hook skips mounting of /tmp letting it get the sensible sizing of the underlying volume presented. Note that --tmpfs /tmp:rw,mode=1777,size=10G doesn't work as it doesn't get listed in volumes, which is what oci-systemd-hook checks. 

It might be sensible for the hook to also check any tmpfs structures rather than just volume mounts as a nice way of configuring this.

This workaround has the distinct disadvantage that volumes don't get removed by default when a container is removed so could end up with space usage creep.

Comment 3 James Hogarth 2016-12-20 14:41:51 UTC
Yeah Dan - that might be the intention but the path used to get that info doesn't result in the expected values ...

https://github.com/projectatomic/oci-systemd-hook/blob/master/src/systemdhook.c#L482

That gets the limit as half the limit set by the memory cgroup:

/sys/fs/cgroup/memory/memory.limit_in_bytes

But in a default setup there is no memory limit, so this defaults to maximum accessible on the architecture ... which is ~8EB ... not the actual physical RAM size on the host.

This then results in the 4EB /tmp assigned ... which is kind of crazy ;)

Note that the hook doesn't use default tmpfs mount options for the sizing but explicitly sets it:

https://github.com/projectatomic/oci-systemd-hook/blob/master/src/systemdhook.c#L513

rc = asprintf(&options, "mode=1777,size=%" PRIu64 "k", memory_limit_in_kb);
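The behaviour described in this comment can be mirrored in a few lines. This is a sketch of the logic as described in the thread, not a translation of systemdhook.c; the exact order of the halving and the bytes-to-kB conversion is an assumption, and the function names are made up.

```python
from pathlib import Path

CGROUP_LIMIT_FILE = "/sys/fs/cgroup/memory/memory.limit_in_bytes"


def read_limit_in_bytes(path: str = CGROUP_LIMIT_FILE) -> int:
    """Read the memory cgroup limit (a single decimal integer) from path."""
    return int(Path(path).read_text().strip())


def tmp_mount_options(limit_in_bytes: int) -> str:
    """Build the /tmp mount options from half the cgroup limit, in kB."""
    # Assumption: halve first, then convert bytes to kB.
    memory_limit_in_kb = limit_in_bytes // 2 // 1024
    return f"mode=1777,size={memory_limit_in_kb}k"


if __name__ == "__main__":
    # With no memory limit set, the cgroup file holds 9223372036854771712.
    print(tmp_mount_options(9223372036854771712))
    # With a 2 GiB limit (docker run -m 2G), /tmp is capped at 1 GiB.
    print(tmp_mount_options(2 * 2 ** 30))
```

With the unlimited value the size option comes out at roughly 4.5e15 kB, which is the 4.0E that df reports.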

Comment 4 Daniel Walsh 2016-12-20 15:42:13 UTC
https://github.com/projectatomic/oci-systemd-hook/pull/40

Any chance you could check if this fixes your issue.

Comment 5 Daniel Walsh 2016-12-20 15:42:56 UTC
James, are you also saying that docker run --tmpfs /tmp:... does not fix the issue?

Comment 6 James Hogarth 2016-12-21 13:05:27 UTC
Sure, I'll give it a go this afternoon after my lunch ;)

And indeed, I tried docker run --tmpfs /tmp:rw,mode=1777,size=10G and it didn't fix it ... when you do a df -h in the container it still shows as 4EB in that instance.

I haven't had time to delve into the whys of that deeply, but on a quick check --tmpfs based things don't populate Mounts in the docker structure returned by inspect on the container, they appear in Tmpfs instead.

The code to check if it's mounted already only looks at Mounts:

https://github.com/projectatomic/oci-systemd-hook/blob/master/src/systemdhook.c#L775

A workaround from the hook's point of view would be to check both mounts and tmpfs ... a better 'fix' for this, given it'd be a little unexpected not to consider this a mount, would be for docker to populate tmpfs stuff into the mounts array as well.
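As a sketch of that hook-side workaround, the check could look at both structures. The code works on a dict shaped like one entry of docker inspect output. The "Destination" key and the placement of "Tmpfs" under "HostConfig" are assumptions about that layout, and the hook itself reads its own container config rather than inspect output, so treat this as an illustration only.

```python
def tmp_is_user_provided(inspect_entry: dict, target: str = "/tmp") -> bool:
    """Return True if target is listed in either Mounts or Tmpfs."""
    # Volume and bind mounts: a list of dicts with a "Destination" path.
    mounts = inspect_entry.get("Mounts") or []
    # --tmpfs mounts: assumed to be a {path: options} mapping in HostConfig.
    tmpfs = (inspect_entry.get("HostConfig") or {}).get("Tmpfs") or {}

    in_mounts = any(m.get("Destination") == target for m in mounts)
    return in_mounts or target in tmpfs


if __name__ == "__main__":
    # Hypothetical entry for: docker run --tmpfs /tmp:rw,mode=1777,size=15G
    entry = {
        "Mounts": [],
        "HostConfig": {"Tmpfs": {"/tmp": "rw,mode=1777,size=15G"}},
    }
    print(tmp_is_user_provided(entry))  # True
```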

 docker run --tmpfs /tmp:rw,mode=1777,size=15G --name blahbalhfoo  centos /bin/bash

 docker inspect blahbalhfoo  | grep -iE '(mount|volume|tmpfs)'
        "MountLabel": "system_u:object_r:container_file_t:s0:c204,c767",
            "VolumeDriver": "",
            "VolumesFrom": null,
            "Tmpfs": {
        "Mounts": [],
            "Volumes": null,

Comment 7 James Hogarth 2016-12-21 15:06:48 UTC
Confirming that a build from master (with the PR already merged) behaves as expected, sizing /tmp at half my system RAM:

[ja.hogarth@lap37607 ansible_role_wsc]$ docker exec systemd-test df -h
Filesystem                                             Size  Used Avail Use% Mounted on
/dev/mapper/luks-533c35ab-2572-45ad-b18b-5203c8b8563f  231G  192G   37G  85% /
tmpfs                                                  3.9G     0  3.9G   0% /dev
tmpfs                                                  3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/mapper/luks-533c35ab-2572-45ad-b18b-5203c8b8563f  231G  192G   37G  85% /etc/hosts
shm                                                     64M     0   64M   0% /dev/shm
tmpfs                                                   64M   16K   64M   1% /run
tmpfs                                                  3.9G     0  3.9G   0% /tmp
tmpfs                                                  3.9G  4.0K  3.9G   1% /var/log
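The contents of the PR are not shown in this thread, so the following is only one way to arrive at the result above: treat the cgroup limit as meaningful only when it is below physical RAM, then halve it. The function name and the clamping approach are assumptions, not the hook's actual code.

```python
def effective_tmp_size_bytes(cgroup_limit: int, physical_ram: int) -> int:
    """Half of whichever is smaller: the cgroup limit or physical RAM."""
    # An unlimited cgroup reports a value far above any real RAM size,
    # so min() falls back to physical RAM in that case.
    return min(cgroup_limit, physical_ram) // 2


if __name__ == "__main__":
    ram = 8 * 2 ** 30  # hypothetical 8 GiB host
    # No container limit: /tmp ends up at half of RAM (4 GiB).
    print(effective_tmp_size_bytes(9223372036854771712, ram) // 2 ** 30)  # 4
    # docker run -m 2G: /tmp ends up at half of the limit (1 GiB).
    print(effective_tmp_size_bytes(2 * 2 ** 30, ram) // 2 ** 30)          # 1
```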

Comment 8 James Hogarth 2016-12-21 15:16:04 UTC
The --tmpfs case now has a separate bug with a reproducible test case: bz1406830

Comment 9 Daniel Walsh 2017-02-09 14:23:48 UTC
Fixed in oci-systemd-hook-0.1.5-1.git16f7c8a.fc25

Comment 10 Fedora Update System 2017-03-12 11:41:56 UTC
oci-systemd-hook-0.1.6-1.gitfe22236.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-a25973481c

Comment 11 Fedora Update System 2017-03-12 11:42:12 UTC
oci-systemd-hook-0.1.6-1.gitfe22236.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2017-5e4259e590

Comment 12 Fedora Update System 2017-03-13 00:21:49 UTC
oci-systemd-hook-0.1.6-1.gitfe22236.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-5e4259e590

Comment 13 Fedora Update System 2017-03-13 01:51:16 UTC
oci-systemd-hook-0.1.6-1.gitfe22236.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-a25973481c

Comment 14 Fedora Update System 2017-03-14 17:23:00 UTC
oci-systemd-hook-0.1.6-1.gitfe22236.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.

Comment 15 Fedora Update System 2017-04-01 16:57:23 UTC
oci-systemd-hook-0.1.6-1.gitfe22236.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.

