Bug 1715254 - [Fix Proposal] Add nofile ulimit to default docker daemon options
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: moby-engine
Version: 30
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Olivier Lemasle
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-05-29 23:35 UTC by Nicolas MURE
Modified: 2020-03-09 02:27 UTC

Fixed In Version: moby-engine-18.09.7-4.ce.git2d0083d.fc30
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-09 02:27:42 UTC
Type: Bug
Embargoed:


Description Nicolas MURE 2019-05-29 23:35:16 UTC
Description of problem:

After installing the moby-engine package on a fresh Fedora 30 machine, some of my Docker images (based on `debian:jessie`) could no longer be built.

The `apt-get update` command was hanging. I first suspected a network issue, but I was able to ping the outside world.

After `strace`ing the process (as explained in [1]), I saw an error similar to the one in [2], which was solved by setting a limit on the number of open files.

Would you be interested in adding the following to the default daemon `OPTIONS` [3]?
```
--default-ulimit nofile=1024:1024
```
(cf. the Docker CLI reference for the daemon [4] and for containers [5])

[1] https://stackoverflow.com/questions/20980303/docker-build-freezes-installing-packages-from-apt
[2] https://bugs.launchpad.net/ubuntu/+source/apt/+bug/1332440
[3] https://src.fedoraproject.org/rpms/moby-engine/blob/master/f/docker.sysconfig
[4] https://docs.docker.com/engine/reference/commandline/dockerd/#default-ulimit-settings
[5] https://docs.docker.com/engine/reference/commandline/run/#set-ulimits-in-container---ulimit



Kernel : 5.0.17-300.fc30.x86_64

Installed `moby-engine` package details :
Name         : moby-engine
Version      : 18.06.3
Release      : 2.ce.gitd7080c1.fc30
Architecture : x86_64
Size         : 226 M
Source       : moby-engine-18.06.3-2.ce.gitd7080c1.fc30.src.rpm
Repository   : @System
From repo    : fedora

Steps to Reproduce:
1. install `moby-engine` package : `$ sudo dnf install -y moby-engine`
2. add yourself to the docker group : `$ sudo usermod -a -G docker <your_username>`
3. reboot (or log out and log in again)
4. pull a `debian:jessie` image : `$ docker pull debian:jessie`
5. run a container : `$ docker run --rm -it debian:jessie /bin/bash`
6. try to update the package list: `container$ apt-get update` <= hangs

Actual results:

The `apt-get update` task run inside the container hangs (and consumes 100% of a CPU core).

Expected results:

The task should complete normally.

Additional info:

When the ulimit is set, the task performs normally.
I'd like to open a PR, but forking the repo does not seem to be an easy step (although I accepted the contributor agreement).

Note that the issue does not appear with a `ubuntu:18.04` image.
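
As a per-container alternative to changing the daemon defaults, the `--ulimit` flag of `docker run` (reference [5] above) can cap the limit for a single container. Below is a minimal sketch; the wrapper name `run_with_nofile` and the `DRY_RUN` variable are hypothetical helpers for illustration and are not part of Docker.

```shell
#!/bin/sh
# Hypothetical wrapper around `docker run` that sets the same soft and
# hard nofile limit for one container.
# Set DRY_RUN=1 to print the command instead of executing it.
run_with_nofile() {
    limit="$1"
    shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "docker run --rm --ulimit nofile=${limit}:${limit} $*"
    else
        docker run --rm --ulimit "nofile=${limit}:${limit}" "$@"
    fi
}
```

For the reproduction steps above, this would be called as `run_with_nofile 1024 debian:jessie apt-get update`.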

Comment 1 Daniele Viganò 2019-06-25 10:27:54 UTC
+1

The default ulimit value (1073741816 on my Fedora 30) makes yum unusable in centos:7, and other programs, such as Erlang-based ones (e.g. rabbitmq-server), also have problems because of the high `ulimit -n` value.
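
To see which limit a shell actually gets (on the host or inside a container), `ulimit -Sn` and `ulimit -Hn` print the soft and hard values. The sketch below wraps that in a small check; the function name `nofile_is_excessive` and the 1048576 threshold are assumptions chosen for illustration, not values taken from this report.

```shell
#!/bin/sh
# Hypothetical helper: succeed (return 0) when an open-files limit is
# above a threshold. The default threshold of 1048576 is an assumption.
nofile_is_excessive() {
    limit="$1"
    threshold="${2:-1048576}"
    # The shell prints "unlimited" for an infinite limit.
    [ "$limit" = "unlimited" ] && return 0
    [ "$limit" -gt "$threshold" ]
}

soft="$(ulimit -Sn)"
hard="$(ulimit -Hn)"
echo "nofile soft=$soft hard=$hard"
if nofile_is_excessive "$soft"; then
    echo "soft nofile limit is very high; consider lowering it for this container"
fi
```

Run inside an affected container, the value reported above (1073741816) would trip the check, while the proposed 1024 would not.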

Comment 2 Fedora Update System 2019-07-13 12:43:25 UTC
FEDORA-2019-572b06a0f7 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-572b06a0f7

Comment 3 Fedora Update System 2019-07-14 03:07:37 UTC
moby-engine-18.09.7-4.ce.git2d0083d.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-572b06a0f7

Comment 4 sedrubal 2019-08-09 11:11:46 UTC
Bug 1723106 is related to this bug.

Comment 5 Nicolas MURE 2019-08-14 16:04:56 UTC
Thanks for the consideration,

The issue still persists on F30 with these changes when installing a package with yum inside a centos:7 container, e.g.:

root@container$ yum install -y gcc

One CPU core hangs at 100% usage and the process never completes.
(as reported by Daniele https://bugzilla.redhat.com/show_bug.cgi?id=1715254#c1 )

Comment 6 sedrubal 2019-08-14 20:28:43 UTC
Does anybody know the reason for the high limits? Is this a mistake? Podman also raises the limits compared to those outside the container, but not nearly as much.

To the maintainer: the priority of this issue should be quite high, as it is currently very hard to run a production-grade Docker infrastructure on Fedora.

Comment 7 Nicolas MURE 2019-08-15 02:30:15 UTC
Here's the systemd config for the docker.service from docker-ce on Fedora 30 [1]:

```
cat /usr/lib/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
BindsTo=containerd.service
After=network-online.target firewalld.service containerd.service
Wants=network-online.target
Requires=docker.socket

[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always

# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
# Both the old, and new location are accepted by systemd 229 and up, so using the old location
# to make them work for either version of systemd.
StartLimitBurst=3

# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
# this option work for either version of systemd.
StartLimitInterval=60s

# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity

# Comment TasksMax if your systemd version does not support it.
# Only systemd 226 and above support this option.
TasksMax=infinity

# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes

# kill only the docker process, not all processes in the cgroup
KillMode=process

[Install]
WantedBy=multi-user.target
```

As you can see, the ExecStart line is quite simple:
`ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock`
I tried the above line instead of the one provided by `moby-engine`, but Docker refused to start when running `$ sudo systemctl start docker`.

Then I reverted the `ExecStart` line to the original one provided by `moby-engine` and tried the OPTIONS from this commit [2]:

```
OPTIONS="--selinux-enabled \
  --log-driver=journald \
  --live-restore \
  --default-ulimit nofile=1024:1024 \
  --init-path /usr/libexec/docker/docker-init \
  --userland-proxy-path /usr/libexec/docker/docker-proxy \
"
```
(except for the `live-restore` option, which was preventing me from running Docker in swarm mode);

and it worked :)

I was able to run a `yum install` inside a container:

```
me@host$ docker pull centos:7.6.1810
me@host$ docker run --rm -it centos:7.6.1810 yum install -y gcc
# ...
me@host$ echo $?
0
```

Until now I had been missing the `--init-path` and `--userland-proxy-path` options. I think these two solved the issue.


Thank you for the fix :D
--------------------

[1] https://github.com/docker/for-linux/issues/600#issuecomment-515918169
[2] https://src.fedoraproject.org/rpms/moby-engine/c/b73040075e618f039def4adb0476adaba24b68bd

Comment 8 Nicolas MURE 2019-08-19 14:00:55 UTC
Actually, I had another issue with the `--userland-proxy-path` option. I wasn't able to start a container with a port binding on my host:

```
starting container failed: container 66453e7d9a481dbd6a0d6c75717e86bc2c71bcc770d14139adc89716d6094808: endpoint join on GW Network failed: driver failed programming external connectivity on endpoint gateway_88e5e7ca7cb5 (bb3095e9cbd0dd23623878ea1b235a3672d3f9501cb6115d46d0a7746807976b): fork/exec /usr/libexec/docker/docker-proxy: no such file or directory
```

with the following Docker service config:

```
services:
  varnish:
    ports:
      - published: 8080
        target: 80
        protocol: tcp
        mode: host
```
The binding for port 8080 on the host could not be created.

Nothing on my machine was using that port yet (`sudo lsof -i:8080` returned nothing).

Removing the `--userland-proxy-path` option fixed this issue (and I'm still able to install yum packages in the CentOS-based container).
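
Since the error above is a `no such file or directory` for the path given to `--userland-proxy-path`, it may help to check that every helper path passed in `OPTIONS` actually exists before restarting the daemon. A minimal sketch, where `missing_helpers` is a hypothetical function name introduced for illustration:

```shell
#!/bin/sh
# Hypothetical check: print each path argument that is not an existing
# executable regular file, and return non-zero if any is missing.
missing_helpers() {
    status=0
    for path in "$@"; do
        if [ ! -f "$path" ] || [ ! -x "$path" ]; then
            echo "missing: $path"
            status=1
        fi
    done
    return "$status"
}
```

With the paths from the `OPTIONS` shown earlier, this would be called as `missing_helpers /usr/libexec/docker/docker-init /usr/libexec/docker/docker-proxy`.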

Comment 9 Fedora Update System 2020-03-09 02:27:42 UTC
moby-engine-18.09.7-4.ce.git2d0083d.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.

