Bug 1715254

Summary: [Fix Proposal] Add nofile ulimit to default docker daemon options
Product: Fedora
Component: moby-engine
Version: 30
Reporter: Nicolas MURE <nm2107>
Assignee: Olivier Lemasle <o.lemasle>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: dennyvatwork, fedora.dm0, fedora, o.lemasle
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Hardware: x86_64   
OS: Linux   
Fixed In Version: moby-engine-18.09.7-4.ce.git2d0083d.fc30
Last Closed: 2020-03-09 02:27:42 UTC
Type: Bug

Description Nicolas MURE 2019-05-29 23:35:16 UTC
Description of problem:

After installing the moby-engine package on a fresh Fedora 30 machine, some of my Docker images (based on `debian:jessie`) no longer build.

The `apt-get update` command hung. I first suspected a network issue, but I was able to ping the outside world.

After `strace`ing the process (as explained in [1]), I saw an error similar to the one in [2], which was solved by setting a limit on the number of open files.

Would you be interested in adding the following to the default daemon `OPTIONS` [3]?
```
--default-ulimit nofile=1024:1024
```
(cf. the Docker CLI reference for the daemon [4] and for containers [5])

[1] https://stackoverflow.com/questions/20980303/docker-build-freezes-installing-packages-from-apt
[2] https://bugs.launchpad.net/ubuntu/+source/apt/+bug/1332440
[3] https://src.fedoraproject.org/rpms/moby-engine/blob/master/f/docker.sysconfig
[4] https://docs.docker.com/engine/reference/commandline/dockerd/#default-ulimit-settings
[5] https://docs.docker.com/engine/reference/commandline/run/#set-ulimits-in-container---ulimit



Kernel: 5.0.17-300.fc30.x86_64

Installed `moby-engine` package details:
Name         : moby-engine
Version      : 18.06.3
Release      : 2.ce.gitd7080c1.fc30
Architecture : x86_64
Size         : 226 M
Source       : moby-engine-18.06.3-2.ce.gitd7080c1.fc30.src.rpm
Repository   : @System
From repo    : fedora

Steps to Reproduce:
1. Install the `moby-engine` package: `$ sudo dnf install -y moby-engine`
2. Add yourself to the docker group: `$ sudo usermod -a -G docker <your_username>`
3. Reboot (or log out and log in again)
4. Pull a `debian:jessie` image: `$ docker pull debian:jessie`
5. Run a container: `$ docker run --rm -it debian:jessie /bin/bash`
6. Try to update the package list: `container$ apt-get update` <= hangs

Actual results:

The `apt-get update` command run inside the container hangs (and consumes 100% of a CPU core).

Expected results:

The command should complete normally.

Additional info:

With the ulimit set, the command completes normally.
I'd like to open a PR, but forking the repo does not seem to be straightforward (even though I accepted the contributor agreement).

Note that the issue does not appear with a `ubuntu:18.04` image.
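Setting the limit does not have to happen at the daemon level. As a hedged sketch, assuming a shell whose `ulimit` built-in supports `-n` (bash, dash and busybox sh do), a hypothetical helper `run_with_nofile_cap` can cap both the soft and hard open-files limits at 1024 for a single command, mirroring the proposed `nofile=1024:1024`. It runs in a subshell, so the calling shell keeps its own limits.

```shell
#!/bin/sh
# Hypothetical helper: run a command with the open-files limit (nofile)
# capped at 1024, for both the soft and the hard limit, similar to what
# --default-ulimit nofile=1024:1024 would give a container.
run_with_nofile_cap() {
    (
        # Without -S/-H, ulimit sets both the soft and the hard limit.
        # Lowering a limit is always permitted; this only fails if the
        # current hard limit is already below 1024.
        ulimit -n 1024 || exit 1
        # Replace the subshell with the command so it inherits the cap.
        exec "$@"
    )
}
```

Inside an affected container this would be used as `run_with_nofile_cap apt-get update`. The per-container `--ulimit nofile=1024:1024` option of `docker run` [5] achieves the same without modifying the command run inside the container.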

Comment 1 Daniele Viganò 2019-06-25 10:27:54 UTC
+1

The default ulimit value (1073741816 on my Fedora 30) makes yum unusable in centos:7, and other programs, such as Erlang-based ones (e.g. rabbitmq-server), also have problems because of the high `ulimit -n` value.
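To check whether a given environment inherits a limit as high as the 1073741816 reported here, the current soft limit can be compared against a threshold. The helper name `nofile_exceeds` and any threshold passed to it are hypothetical choices, not values from this report; the sketch assumes a `sh`-compatible shell with `ulimit -n`.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the current soft open-files
# limit is above the threshold given as $1, fail (exit 1) otherwise.
# "unlimited" is treated as exceeding any threshold.
nofile_exceeds() {
    threshold=$1
    current=$(ulimit -n)
    if [ "$current" = "unlimited" ]; then
        return 0
    fi
    [ "$current" -gt "$threshold" ]
}
```

For example, running `nofile_exceeds 1048576 && echo "nofile limit is $(ulimit -n)"` inside a `centos:7` container would print the inherited limit only when it is above that threshold.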

Comment 2 Fedora Update System 2019-07-13 12:43:25 UTC
FEDORA-2019-572b06a0f7 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-572b06a0f7

Comment 3 Fedora Update System 2019-07-14 03:07:37 UTC
moby-engine-18.09.7-4.ce.git2d0083d.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-572b06a0f7

Comment 4 sedrubal 2019-08-09 11:11:46 UTC
Bug 1723106 is related to this bug.

Comment 5 Nicolas MURE 2019-08-14 16:04:56 UTC
Thanks for the consideration,

With these changes, the issue still persists on F30 when installing a package with yum inside a centos:7 container, e.g.:

root@container$ yum install -y gcc

One CPU core sits at 100% usage and the process never completes.
(as reported by Daniele https://bugzilla.redhat.com/show_bug.cgi?id=1715254#c1 )

Comment 6 sedrubal 2019-08-14 20:28:43 UTC
Does anybody know the reason for the high limits? Is this a mistake? Podman also raises the limits compared to outside the container, but not nearly as much.

To the maintainer: the priority of this issue should be quite high, as it is currently very hard to run a production-grade Docker infrastructure on Fedora.

Comment 7 Nicolas MURE 2019-08-15 02:30:15 UTC
Here's the systemd unit for docker.service from docker-ce on Fedora 30 [1]:

```
$ cat /usr/lib/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
BindsTo=containerd.service
After=network-online.target firewalld.service containerd.service
Wants=network-online.target
Requires=docker.socket

[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always

# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
# Both the old, and new location are accepted by systemd 229 and up, so using the old location
# to make them work for either version of systemd.
StartLimitBurst=3

# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
# this option work for either version of systemd.
StartLimitInterval=60s

# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity

# Comment TasksMax if your systemd version does not support it.
# Only systemd 226 and above support this option.
TasksMax=infinity

# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes

# kill only the docker process, not all processes in the cgroup
KillMode=process

[Install]
WantedBy=multi-user.target
```

As you can see, the `ExecStart` line is quite simple:
`ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock`
I tried using this line in place of the one provided by `moby-engine`, but Docker refused to start when running `$ sudo systemctl start docker`.

Then I reverted the `ExecStart` line to the original one provided by `moby-engine` and tried the `OPTIONS` from this commit [2]:

```
OPTIONS="--selinux-enabled \
  --log-driver=journald \
  --live-restore \
  --default-ulimit nofile=1024:1024 \
  --init-path /usr/libexec/docker/docker-init \
  --userland-proxy-path /usr/libexec/docker/docker-proxy \
"
```
(except for the `--live-restore` option, which prevented me from running Docker in swarm mode),

and it worked :)

I was able to run a `yum install` inside a container:

```
me@host$ docker pull centos:7.6.1810
me@host$ docker run --rm -it centos:7.6.1810 yum install -y gcc
# ...
me@host$ echo $?
0
```

Until then, I had been missing the `--init-path` and `--userland-proxy-path` options; I think these two solved the issue.


Thank you for the fix :D
--------------------

[1] https://github.com/docker/for-linux/issues/600#issuecomment-515918169
[2] https://src.fedoraproject.org/rpms/moby-engine/c/b73040075e618f039def4adb0476adaba24b68bd

Comment 8 Nicolas MURE 2019-08-19 14:00:55 UTC
Actually, I ran into another issue with the `--userland-proxy-path` option: I wasn't able to start a container with a port binding on my host:

```
starting container failed: container 66453e7d9a481dbd6a0d6c75717e86bc2c71bcc770d14139adc89716d6094808: endpoint join on GW Network failed: driver failed programming external connectivity on endpoint gateway_88e5e7ca7cb5 (bb3095e9cbd0dd23623878ea1b235a3672d3f9501cb6115d46d0a7746807976b): fork/exec /usr/libexec/docker/docker-proxy: no such file or directory
```

with the following configuration of the Docker service:

```
services:
  varnish:
    ports:
      - published: 8080
        target: 80
        protocol: tcp
        mode: host
```
The binding of port 8080 on the host could not be created.

Nothing on my machine was using that port yet (`sudo lsof -i:8080` returned nothing).

Removing the `--userland-proxy-path` option fixed this issue (and I'm still able to install packages with yum in the CentOS-based container).
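The `fork/exec /usr/libexec/docker/docker-proxy: no such file or directory` error indicates that the daemon was pointed at a proxy binary that does not exist at that path. A minimal pre-flight sketch, assuming a `sh`-compatible shell, is to check every path passed to `--init-path` or `--userland-proxy-path` before restarting the daemon; `check_daemon_paths` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: verify that every argument is an existing, executable
# regular file. Each missing path is reported on stderr, and the function
# fails if at least one is missing.
check_daemon_paths() {
    status=0
    for path in "$@"; do
        # -f rejects directories (for which -x alone would also be true).
        if [ -f "$path" ] && [ -x "$path" ]; then
            continue
        fi
        echo "missing or not executable: $path" >&2
        status=1
    done
    return "$status"
}
```

For example: `check_daemon_paths /usr/libexec/docker/docker-init /usr/libexec/docker/docker-proxy && sudo systemctl restart docker`. These are the paths from the `OPTIONS` above; where the `moby-engine` package actually installs these binaries may differ.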

Comment 9 Fedora Update System 2020-03-09 02:27:42 UTC
moby-engine-18.09.7-4.ce.git2d0083d.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.