Bug 1489505

Summary: multiple docker cp operations exhaust system memory
Product: [Fedora] Fedora
Reporter: Jonathan Lebon <jlebon>
Component: docker
Assignee: Antonio Murdaca <amurdaca>
Status: CLOSED EOL
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 26
CC: adimania, admiller, amurdaca, andreas.bierfert, bbaude, dwalsh, fkluknav, ichavero, jcajka, jchaloup, jlebon, lsm5, marianne, miabbott, miminar, mpitt, nalin, vbatts
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1489517
Environment:
Last Closed: 2018-05-29 11:38:43 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1489517    

Description Jonathan Lebon 2017-09-07 14:51:44 UTC
Description of problem:

On the latest Fedora 26 AH, repeated `docker cp` commands easily make the docker daemon exhaust all the system memory, until it fails with:

Untar error on re-exec cmd: fork/exec /proc/self/exe: cannot allocate memory

This makes `docker cp` impractical in testing environments where we routinely need to spin up and provision throw-away containers.

Version-Release number of selected component (if applicable):

[root@jlebon ~]# rpm-ostree status
State: idle
Deployments:
* fedora-atomic:fedora/26/x86_64/atomic-host
                   Version: 26.101 (2017-08-06 21:27:14)
                    Commit: f6331bcd14577e0ee43db3ba5a44e0f63f74a86e3955604c20542df0b7ad8ad6
[root@jlebon ~]# rpm -q docker
docker-1.13.1-19.git27e468e.fc26.x86_64

How reproducible:

Always.

Steps to Reproduce:

[root@jlebon ~]# cat reproducer.sh
#!/bin/bash
set -xeuo pipefail

dd if=/dev/urandom of=myfile count=500000
docker run --detach --name cnt registry.fedoraproject.org/fedora:26 sleep infinity
for i in $(seq 5); do
  docker cp myfile cnt:/var/tmp
  docker cp cnt:/var/tmp/myfile .
done
[root@jlebon ~]# sh reproducer.sh
+ dd if=/dev/urandom of=myfile count=500000
500000+0 records in
500000+0 records out
256000000 bytes (256 MB, 244 MiB) copied, 1.81003 s, 141 MB/s
+ docker run --detach --name cnt registry.fedoraproject.org/fedora:26 sleep infinity
5c2203de1f075d2198194dc4cf2d7a3e891e2973f997ae19c9fb695e0922025f
++ seq 5
+ for i in $(seq 5)
+ docker cp myfile cnt:/var/tmp
Error response from daemon: Untar error on re-exec cmd: fork/exec /proc/self/exe: cannot allocate memory
[root@jlebon ~]#

Actual results:

`docker cp` fails with the "cannot allocate memory" error shown above.

Expected results:

All `docker cp` operations complete without error.

Additional info:

Watching the memory usage of the docker service (with a simple `watch -n 1 systemctl status docker`), it looks as if the daemon maps the whole file into memory and never releases it.
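
For anyone who wants numbers rather than eyeballing `systemctl status`, here is a minimal sketch that reads the resident set size of a single process from /proc. The function name `rss_kib` is made up for illustration. Note that this is the RSS of one process only, which is not the same figure as the per-service memory that `systemctl status` reports, so treat it as a rough indicator.

```shell
#!/bin/bash
# Print the resident set size (VmRSS, in KiB) of the process with the
# given PID, as reported by /proc/<pid>/status.
# "rss_kib" is a hypothetical helper name, not part of docker or systemd.
rss_kib() {
  local pid=$1
  awk '/^VmRSS:/ { print $2 }' "/proc/${pid}/status"
}

# Possible usage (assumes the daemon runs as the "docker" systemd unit):
#   pid=$(systemctl show --property MainPID --value docker)
#   while sleep 1; do echo "$(date +%T) $(rss_kib "$pid") KiB"; done
```

Running the sampling loop in a second terminal while the reproducer executes should show whether the figure keeps growing across `docker cp` calls.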

Comment 1 Jonathan Lebon 2017-09-07 14:54:11 UTC
Also reproduced on the latest Fedora 26 AH release:

[root@jlebon ~]# rpm-ostree status
State: idle
Deployments:
* fedora-atomic:fedora/26/x86_64/atomic-host
                   Version: 26.110 (2017-08-20 18:10:09)
                    Commit: 13ed0f241c9945fd5253689ccd081b5478e5841a71909020e719437bbeb74424

  fedora-atomic:fedora/26/x86_64/atomic-host
                   Version: 26.101 (2017-08-06 21:27:14)
                    Commit: f6331bcd14577e0ee43db3ba5a44e0f63f74a86e3955604c20542df0b7ad8ad6

Comment 2 Jonathan Lebon 2017-09-07 14:58:48 UTC
I hit this specifically when trying to move PAPR[1] to Fedora 26 AH. There we often spin up e.g. 8 containers at a time and need to `docker cp` whole git repositories simultaneously. As the reproducer shows, though, the copies don't even have to be simultaneous.

I forgot to add the docker version included in v26.110 in my previous comment:

# rpm -q docker
docker-1.13.1-21.git27e468e.fc26.x86_64

[1] https://github.com/projectatomic/papr

Comment 3 Martin Pitt 2018-02-15 14:58:48 UTC
Same bug on RHEL side: https://bugzilla.redhat.com/show_bug.cgi?id=1489517

This keeps us from updating Cockpit's OpenShift test VM to something newer than Fedora 25 (which is EOL).

Comment 4 Jonathan Lebon 2018-02-22 22:00:18 UTC
AFAICT, this is no longer an issue in the latest Fedora 27 AH release at least.

# rpm-ostree status
State: idle
Deployments:
* fedora-atomic:fedora/27/x86_64/atomic-host
                   Version: 27.81 (2018-02-12 17:50:48)
                    Commit: b25bde0109441817f912ece57ca1fc39efc60e6cef4a7a23ad9de51b1f36b742
              GPGSignature: Valid signature by 860E19B0AFA800A1751881A6F55E7430F5282EE4

Martin, might be worth checking if that is the case for you as well. Then we can probably close this bug.

Comment 5 Martin Pitt 2018-02-23 14:25:53 UTC
@Jonathan: This is still easily reproducible on current Fedora 27:

# docker create --name foo docker.io/openshift/origin:v3.7.1 
# docker cp foo:/usr/bin/ /tmp/oc-bin

[  186.781101] dockerd-current invoked oom-killer: gfp_mask=0x14280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null),  order=0, oom_score_adj=-999
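
To check whether a given run hit the same condition, a small filter over the kernel log can list which processes invoked the OOM killer. This is only a sketch: the function name `oom_invokers` is made up, and it assumes log lines shaped like the one above.

```shell
#!/bin/bash
# Read kernel log lines on stdin and print the name of each process that
# invoked the OOM killer (the word right before "invoked oom-killer:").
# "oom_invokers" is a hypothetical helper name.
oom_invokers() {
  awk '/ invoked oom-killer:/ {
    for (i = 2; i <= NF; i++) {
      if ($i == "invoked") { print $(i - 1); break }
    }
  }'
}

# Possible usage:
#   dmesg | oom_invokers
```

Fed the log line above, the filter prints `dockerd-current`.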

Comment 6 Jonathan Lebon 2018-02-23 15:30:40 UTC
Thanks Martin. Reproduced it as well here. My original reproducer no longer triggers a crash for some reason.

Comment 7 Antonio Murdaca 2018-03-13 14:00:50 UTC
(In reply to Jonathan Lebon from comment #6)
> Thanks Martin. Reproduced it as well here. My original reproducer no longer
> triggers a crash for some reason.

Jonathan, could you please re-test this with the rhel-push-plugin removed? You can remove it by deleting the line:

--authorization-plugin=rhel-push-plugin

from docker.service (`systemctl edit --full docker.service`). Then restart the daemon and re-check.

Thanks

This should be related to https://github.com/openshift/origin/issues/18952 as well
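
For test setups that need to apply this workaround non-interactively, the line removal can be sketched as a filter over the unit file text. The function name `strip_push_plugin` is made up, and the sketch assumes the flag sits on its own backslash-continued line inside ExecStart; if the unit file is laid out differently, nothing is removed.

```shell
#!/bin/bash
# Copy unit-file text from stdin to stdout, dropping any line that holds
# only the rhel-push-plugin authorization flag followed by a trailing
# backslash (i.e. a continuation line inside ExecStart).
# "strip_push_plugin" is a hypothetical helper name.
strip_push_plugin() {
  sed '/^[[:space:]]*--authorization-plugin=rhel-push-plugin[[:space:]]*\\$/d'
}

# Possible usage (the source path is an assumption about where the
# packaged unit lives; the override in /etc takes precedence):
#   strip_push_plugin < /usr/lib/systemd/system/docker.service \
#     > /etc/systemd/system/docker.service
#   systemctl daemon-reload
#   systemctl restart docker
```

Requiring the trailing backslash is deliberate: it avoids deleting the final line of the command and leaving a dangling continuation behind.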

Comment 8 Jonathan Lebon 2018-03-13 14:29:55 UTC
I can confirm that I can no longer reproduce the issue after removing the --authorization-plugin flag.

Comment 9 Jonathan Lebon 2018-03-13 14:30:50 UTC
[root@jlebon-tmp2 ~]# rpm-ostree status
State: idle; auto updates disabled
Deployments:
* ostree://fedora-atomic:fedora/27/x86_64/atomic-host
                   Version: 27.93 (2018-02-25 20:49:19)
                    Commit: da0bd968610aa1e29c5bb37065649407fbbfffa53e63831afdadbd34a3b05327
              GPGSignature: Valid signature by 860E19B0AFA800A1751881A6F55E7430F5282EE4
[root@jlebon-tmp2 ~]# rpm -q docker
docker-1.13.1-44.git584d391.fc27.x86_64

Comment 10 Fedora End Of Life 2018-05-03 07:52:44 UTC
This message is a reminder that Fedora 26 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 26. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '26'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 26 reaches end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 11 Martin Pitt 2018-05-03 08:51:02 UTC
My test case does not work any more on Fedora 28, since it seems docker cp is now broken in a different way:

# docker cp foo:/usr/bin /tmp/oc-bin
invalid symlink "/tmp/oc-bin/bin/Mail" -> "../../bin/mailx"

Same result with -a or -L. So I won't bump this for now.

Comment 12 Fedora End Of Life 2018-05-29 11:38:43 UTC
Fedora 26 changed to end-of-life (EOL) status on 2018-05-29. Fedora 26
is no longer maintained, which means that it will not receive any
further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.