Bug 1661443 - dockerd may leak memory resources if uncompressing a layer fails [NEEDINFO]
Summary: dockerd may leak memory resources if uncompressing a layer fails
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: docker
Version: 7.6
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Daniel Walsh
QA Contact: atomic-bugs@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1186913 1647729
 
Reported: 2018-12-21 09:22 UTC by Sergio Lopez
Modified: 2019-03-13 01:52 UTC (History)
CC: 9 users

Fixed In Version: docker-1.13.1-93.gitb2f74b2.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-03-13 01:52:23 UTC
Target Upstream Version:
fshaikh: needinfo? (lsm5)




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2019:0487 None None None 2019-03-13 01:52:49 UTC

Description Sergio Lopez 2018-12-21 09:22:59 UTC
Description of problem:

This is the analysis of a core dump from a dockerd instance that was using more than 20 GB of RAM:

 - While the type using the largest amount of memory isn't known (the unk* entries), the matching object count (855) gives us a hint about what it is:

# grep 855 viewcore-histogram.txt
    count     size      bytes type
      855 10027008 8573091840 unk10027008
      855   663552  567336960 unk663552
      855    73728   63037440 unk73728
      855    65536   56033280 unk65536
      855     4864    4158720 compress/flate.decompressor
      855     2688    2298240 [316]int
      855     1536    1313280 [1536]uint8
      855      704     601920 compress/gzip.Reader
      855      640     547200 github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar.Reader
      855      208     177840 github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar.Header
      855      160     136800 [19]int
      855      128     109440 crypto/sha256.digest
      855       96      82080 github.com/docker/docker/pkg/progress.Reader
      855       96      82080 hchan<github.com/docker/docker/pkg/progress.Progress>
      855       80      68400 github.com/docker/docker/vendor/golang.org/x/time/rate.Limiter
      855       32      27360 github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar.regFileReader
      855       32      27360 github.com/docker/docker/pkg/ioutils.cancelReadCloser
      855       32      27360 github.com/docker/docker/pkg/ioutils.readCloserWrapper
      855        8       6840 io.PipeReader


 - We also have the same number of goroutines stuck in tar.regFileReader:

# grep regFileReader dockerd.txt |wc -l
855

 - These goroutines are blocked writing to a pipe, waiting for someone to read from the other side:

(dlv) goroutine 4778918
Switched from 0 to 4778918 (thread 2151)
(dlv) bt
 0  0x000000000043d3dc in runtime.gopark
    at /usr/lib/golang/src/runtime/proc.go:288
 1  0x000000000043d4ce in runtime.goparkunlock
    at /usr/lib/golang/src/runtime/proc.go:293
 2  0x00000000004501f0 in sync.runtime_notifyListWait
    at /usr/lib/golang/src/runtime/sema.go:507
 3  0x0000000000480df0 in sync.(*Cond).Wait
    at /usr/lib/golang/src/sync/cond.go:56
 4  0x0000000000485faa in io.(*pipe).write
    at /usr/lib/golang/src/io/pipe.go:90
 5  0x00000000004864ec in io.(*PipeWriter).Write
    at /usr/lib/golang/src/io/pipe.go:157
 6  0x00000000004851e1 in io.(*teeReader).Read
    at /usr/lib/golang/src/io/io.go:527
 7  0x0000000000892f34 in github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar.(*regFileReader).Read
    at /usr/src/debug/docker-8633870faeff2c03dd6a75fdef9c2976379339c6/_build/src/github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar/reader.go:960
 8  0x0000000000892d7c in github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar.(*Reader).Read
    at /usr/src/debug/docker-8633870faeff2c03dd6a75fdef9c2976379339c6/_build/src/github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar/reader.go:945
 9  0x0000000000484ab3 in io.copyBuffer
    at /usr/lib/golang/src/io/io.go:392
10  0x0000000000484858 in io.Copy
    at /usr/lib/golang/src/io/io.go:362
11  0x0000000000863dd5 in github.com/docker/docker/vendor/github.com/vbatts/tar-split/tar/storage.(*bitBucketFilePutter).Put
    at /usr/src/debug/docker-8633870faeff2c03dd6a75fdef9c2976379339c6/_build/src/github.com/docker/docker/vendor/github.com/vbatts/tar-split/tar/storage/getter.go:99
12  0x0000000000895975 in github.com/docker/docker/vendor/github.com/vbatts/tar-split/tar/asm.NewInputTarStream.func1
    at /usr/src/debug/docker-8633870faeff2c03dd6a75fdef9c2976379339c6/_build/src/github.com/docker/docker/vendor/github.com/vbatts/tar-split/tar/asm/disassemble.go:87
13  0x000000000046cc21 in runtime.goexit
    at /usr/lib/golang/src/runtime/asm_amd64.s:2337

 - This is where the bug is located:

layer/layer_store.go:
    206 func (ls *layerStore) applyTar(tx MetadataTransaction, ts io.Reader, parent string, layer *roLayer        ) error {
    207         digester := digest.Canonical.New()
    208         tr := io.TeeReader(ts, digester.Hash())
    209 
    210         tsw, err := tx.TarSplitWriter(true)
    211         if err != nil {
    212                 return err
    213         }
    214         metaPacker := storage.NewJSONPacker(tsw)
    215         defer tsw.Close()
    216 
    217         // we're passing nil here for the file putter, because the ApplyDiff will
    218         // handle the extraction of the archive
    219         rdr, err := asm.NewInputTarStream(tr, metaPacker, nil)
    220         if err != nil {
    221                 return err
    222         }
    223 
    224         applySize, err := ls.driver.ApplyDiff(layer.cacheID, parent, rdr)
    225         if err != nil {

--> If ApplyDiff returns an error, the io.Copy below is never reached, and the resources created by the call to NewInputTarStream (the pipe and its producer goroutine) are never released.

    226                 return err
    227         }
    228 
    229         // Discard trailing data but ensure metadata is picked up to reconstruct stream
    230         io.Copy(ioutil.Discard, rdr) // ignore error as reader may be closed
    231 
    232         layer.size = applySize
    233         layer.diffID = DiffID(digester.Digest())
    234 
    235         logrus.Debugf("Applied tar %s to %s, size: %d", layer.diffID, layer.cacheID, applySize)
    236 
    237         return nil
    238 }


Version-Release number of selected component (if applicable):

All, including upstream (moby).


How reproducible:

Always.


Steps to Reproduce:

I'm using dlv to simulate the storage issue:

[root@localhost binary-daemon]# dlv exec ./dockerd-dev
Type 'help' for list of commands.
(dlv) break applyTar
Breakpoint 1 set at 0xee94da for github.com/docker/docker/layer.(*layerStore).applyTar() /go/src/github.com/docker/docker/layer/layer_store.go:234
(dlv) c

(another terminal)

[root@localhost ~]# docker run --rm -ti busybox /bin/sh
Unable to find image 'busybox:latest' locally
latest: Pulling from library/busybox
90e01955edcd: Extracting    728kB/728kB

(back to dlv)

> github.com/docker/docker/layer.(*layerStore).applyTar() /go/src/github.com/docker/docker/layer/layer_store.go:234 (hits goroutine(142):1 total:1) (PC: 0xee94da)
Warning: debugging optimized function
   229:		ls.mounts[ml.name] = ml
   230:	
   231:		return nil
   232:	}
   233:	
=> 234:	func (ls *layerStore) applyTar(tx *fileMetadataTransaction, ts io.Reader, parent string, layer *roLayer) error {
   235:		digester := digest.Canonical.Digester()
   236:		tr := io.TeeReader(ts, digester.Hash())
   237:	
   238:		rdr := tr
   239:		if ls.useTarSplit {
(dlv) n
(...)
> github.com/docker/docker/layer.(*layerStore).applyTar() /go/src/github.com/docker/docker/layer/layer_store.go:255 (PC: 0xee95d6)
Warning: debugging optimized function
   250:			if err != nil {
   251:				return err
   252:			}
   253:		}
   254:	
=> 255:		applySize, err := ls.driver.ApplyDiff(layer.cacheID, parent, rdr)
   256:		if err != nil {
   257:			return err
   258:		}
   259:	
   260:		// Discard trailing data but ensure metadata is picked up to reconstruct stream
(dlv) p layer.cacheID
"b4e6ddae7f5cace7bc165d0bdfe3efcc1650eb61d335806ea293e61391910b49"

(in yet another terminal)

[root@localhost moby]# # Make sure extracting the tar will fail...
[root@localhost moby]# mount -t tmpfs -o ro tmpfs /var/lib/docker/overlay2/b4e6ddae7f5cace7bc165d0bdfe3efcc1650eb61d335806ea293e61391910b49

(back to dlv)

(dlv) c
INFO[2018-12-21T09:56:56.454866823+01:00] Attempting next endpoint for pull after error: failed to register layer: mkdir /var/lib/docker/overlay2/b4e6ddae7f5cace7bc165d0bdfe3efcc1650eb61d335806ea293e61391910b49/diff: read-only file system 
(dlv) goroutine 196
Switched from 0 to 196 (thread 15923)
(dlv) bt
 0  0x000000000042f821 in runtime.gopark
    at /usr/local/go/src/runtime/proc.go:303
 1  0x000000000043fb3c in runtime.selectgo
    at /usr/local/go/src/runtime/select.go:313
 2  0x000000000046a78e in io.(*pipe).Write
    at /usr/local/go/src/io/pipe.go:87
 3  0x000000000046ac3e in io.(*PipeWriter).Write
    at /usr/local/go/src/io/pipe.go:153
 4  0x00000000004695c3 in io.(*teeReader).Read
    at /usr/local/go/src/io/io.go:537
 5  0x00000000004688ba in io.ReadAtLeast
    at /usr/local/go/src/io/io.go:310
 6  0x0000000000468a5a in io.ReadFull
    at /usr/local/go/src/io/io.go:329
 7  0x0000000000edc99e in github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar.(*Reader).readHeader
    at /go/src/github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar/reader.go:389
 8  0x0000000000edab02 in github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar.(*Reader).next
    at /go/src/github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar/reader.go:111
 9  0x0000000000eda894 in github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar.(*Reader).Next
    at /go/src/github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar/reader.go:74
10  0x0000000000ee3c6e in github.com/docker/docker/vendor/github.com/vbatts/tar-split/tar/asm.NewInputTarStream.func1
    at /go/src/github.com/docker/docker/vendor/github.com/vbatts/tar-split/tar/asm/disassemble.go:49
11  0x000000000045e561 in runtime.goexit
    at /usr/local/go/src/runtime/asm_amd64.s:1333

Without the patch, we get a stuck goroutine as shown above. With it, we get the same error (as expected), but without the stuck goroutine.


Additional info:

I've created a PR for upstream (moby):

 - https://github.com/moby/moby/pull/38413

If it's accepted, I'll create another PR for downstream (projectatomic).

Comment 2 Sergio Lopez 2018-12-24 08:16:03 UTC
The PR for upstream (moby) has been merged.

I've just created a PR for the backport downstream (projectatomic):

 - https://github.com/projectatomic/docker/pull/335

Comment 4 Daniel Walsh 2019-01-07 13:09:14 UTC
Lokesh and Franticek,
can we get a new version of docker out?

Comment 11 errata-xmlrpc 2019-03-13 01:52:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:0487

