Bug 1661443

Summary: dockerd may leak memory resources if uncompressing a layer fails

Product: Red Hat Enterprise Linux 7
Reporter: Sergio Lopez <slopezpa>
Component: docker
Assignee: Daniel Walsh <dwalsh>
Status: CLOSED ERRATA
QA Contact: atomic-bugs <atomic-bugs>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 7.6
CC: amurdaca, cshereme, dornelas, dwalsh, fkluknav, fshaikh, lsm5, pasik, suchaudh
Target Milestone: rc
Keywords: Extras
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: docker-1.13.1-93.gitb2f74b2.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-03-13 01:52:23 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1186913, 1647729

Description Sergio Lopez 2018-12-21 09:22:59 UTC
Description of problem:

This is the analysis of the core dump from a dockerd instance that was using more than 20GB of RAM:

 - While the types using the largest amounts of memory aren't known (the unk* entries), the object counts give us a hint about what they are, since every type in this group appears exactly 855 times:

# grep 855 viewcore-histogram.txt
    count     size      bytes type
      855 10027008 8573091840 unk10027008
      855   663552  567336960 unk663552
      855    73728   63037440 unk73728
      855    65536   56033280 unk65536
      855     4864    4158720 compress/flate.decompressor
      855     2688    2298240 [316]int
      855     1536    1313280 [1536]uint8
      855      704     601920 compress/gzip.Reader
      855      640     547200 github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar.Reader
      855      208     177840 github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar.Header
      855      160     136800 [19]int
      855      128     109440 crypto/sha256.digest
      855       96      82080 github.com/docker/docker/pkg/progress.Reader
      855       96      82080 hchan<github.com/docker/docker/pkg/progress.Progress>
      855       80      68400 github.com/docker/docker/vendor/golang.org/x/time/rate.Limiter
      855       32      27360 github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar.regFileReader
      855       32      27360 github.com/docker/docker/pkg/ioutils.cancelReadCloser
      855       32      27360 github.com/docker/docker/pkg/ioutils.readCloserWrapper
      855        8       6840 io.PipeReader
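
For reference, a histogram like this can be regenerated from the core dump with viewcore (golang.org/x/debug/cmd/viewcore). A sketch of the invocation, where the core and binary paths are examples and the exact flags may differ between viewcore versions:

# viewcore ./core --exe /usr/bin/dockerd-current histogram > viewcore-histogram.txt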


 - We also have the same number of goroutines stuck in tar.regFileReader:

# grep regFileReader dockerd.txt |wc -l
855

 - These goroutines are waiting for someone to read from the other side of the pipe:

(dlv) goroutine 4778918
Switched from 0 to 4778918 (thread 2151)
(dlv) bt
 0  0x000000000043d3dc in runtime.gopark
    at /usr/lib/golang/src/runtime/proc.go:288
 1  0x000000000043d4ce in runtime.goparkunlock
    at /usr/lib/golang/src/runtime/proc.go:293
 2  0x00000000004501f0 in sync.runtime_notifyListWait
    at /usr/lib/golang/src/runtime/sema.go:507
 3  0x0000000000480df0 in sync.(*Cond).Wait
    at /usr/lib/golang/src/sync/cond.go:56
 4  0x0000000000485faa in io.(*pipe).write
    at /usr/lib/golang/src/io/pipe.go:90
 5  0x00000000004864ec in io.(*PipeWriter).Write
    at /usr/lib/golang/src/io/pipe.go:157
 6  0x00000000004851e1 in io.(*teeReader).Read
    at /usr/lib/golang/src/io/io.go:527
 7  0x0000000000892f34 in github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar.(*regFileReader).Read
    at /usr/src/debug/docker-8633870faeff2c03dd6a75fdef9c2976379339c6/_build/src/github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar/reader.go:960
 8  0x0000000000892d7c in github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar.(*Reader).Read
    at /usr/src/debug/docker-8633870faeff2c03dd6a75fdef9c2976379339c6/_build/src/github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar/reader.go:945
 9  0x0000000000484ab3 in io.copyBuffer
    at /usr/lib/golang/src/io/io.go:392
10  0x0000000000484858 in io.Copy
    at /usr/lib/golang/src/io/io.go:362
11  0x0000000000863dd5 in github.com/docker/docker/vendor/github.com/vbatts/tar-split/tar/storage.(*bitBucketFilePutter).Put
    at /usr/src/debug/docker-8633870faeff2c03dd6a75fdef9c2976379339c6/_build/src/github.com/docker/docker/vendor/github.com/vbatts/tar-split/tar/storage/getter.go:99
12  0x0000000000895975 in github.com/docker/docker/vendor/github.com/vbatts/tar-split/tar/asm.NewInputTarStream.func1
    at /usr/src/debug/docker-8633870faeff2c03dd6a75fdef9c2976379339c6/_build/src/github.com/docker/docker/vendor/github.com/vbatts/tar-split/tar/asm/disassemble.go:87
13  0x000000000046cc21 in runtime.goexit
    at /usr/lib/golang/src/runtime/asm_amd64.s:2337
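
 - The mechanics of the leak can be reproduced in isolation. The following is a minimal, self-contained Go sketch (not docker code; every name in it is illustrative) of how a producer goroutine writing into an io.Pipe parks forever in pipe.Write once the consumer stops reading, which is exactly the state of the 855 goroutines above:

package main

import (
	"fmt"
	"io"
	"io/ioutil"
	"runtime"
	"strings"
	"time"
)

func main() {
	pr, pw := io.Pipe()

	// Stands in for the goroutine spawned by asm.NewInputTarStream: it
	// keeps writing the stream into the unbuffered pipe, so every Write
	// blocks until a Read on the other end consumes the data.
	go func() {
		io.Copy(pw, strings.NewReader(strings.Repeat("x", 1<<20)))
		pw.Close()
	}()

	// Consume a little, then abandon the pipe without draining or
	// closing it, which is what applyTar does when ApplyDiff fails.
	io.CopyN(ioutil.Discard, pr, 4096)

	time.Sleep(100 * time.Millisecond)
	// The producer is still parked in pipe.Write, pinning its stack and
	// everything reachable from it (readers, buffers, digesters).
	fmt.Println("goroutines:", runtime.NumGoroutine()) // prints 2, not 1
}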

 - This is where the bug is located:

layer/layer_store.go:
    206 func (ls *layerStore) applyTar(tx MetadataTransaction, ts io.Reader, parent string, layer *roLayer) error {
    207         digester := digest.Canonical.New()
    208         tr := io.TeeReader(ts, digester.Hash())
    209 
    210         tsw, err := tx.TarSplitWriter(true)
    211         if err != nil {
    212                 return err
    213         }
    214         metaPacker := storage.NewJSONPacker(tsw)
    215         defer tsw.Close()
    216 
    217         // we're passing nil here for the file putter, because the ApplyDiff will
    218         // handle the extraction of the archive
    219         rdr, err := asm.NewInputTarStream(tr, metaPacker, nil)
    220         if err != nil {
    221                 return err
    222         }
    223 
    224         applySize, err := ls.driver.ApplyDiff(layer.cacheID, parent, rdr)
    225         if err != nil {

--> If ApplyDiff returns an error, the io.Copy below is never reached, and the resources created by the call to NewInputTarStream, including its goroutine, are never released.

    226                 return err
    227         }
    228 
    229         // Discard trailing data but ensure metadata is picked up to reconstruct stream
    230         io.Copy(ioutil.Discard, rdr) // ignore error as reader may be closed
    231 
    232         layer.size = applySize
    233         layer.diffID = DiffID(digester.Digest())
    234 
    235         logrus.Debugf("Applied tar %s to %s, size: %d", layer.diffID, layer.cacheID, applySize)
    236 
    237         return nil
    238 }
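
One way to fix this is to perform the io.Copy drain unconditionally, before acting on the ApplyDiff error, so the goroutine spawned by NewInputTarStream can run to completion. A minimal, self-contained sketch of that pattern; newProducer and applyDiff are hypothetical stand-ins for asm.NewInputTarStream and ls.driver.ApplyDiff:

package main

import (
	"errors"
	"io"
	"io/ioutil"
	"strings"
)

// newProducer stands in for asm.NewInputTarStream: it spawns a goroutine
// that feeds the input through a pipe and returns the read side.
func newProducer(src io.Reader) io.Reader {
	pr, pw := io.Pipe()
	go func() {
		_, err := io.Copy(pw, src)
		pw.CloseWithError(err)
	}()
	return pr
}

// applyDiff stands in for ls.driver.ApplyDiff failing mid-stream, e.g.
// because the layer directory sits on a read-only filesystem.
func applyDiff(r io.Reader) error {
	io.CopyN(ioutil.Discard, r, 10)
	return errors.New("read-only file system")
}

func apply(src io.Reader) error {
	rdr := newProducer(src)

	err := applyDiff(rdr)

	// Drain unconditionally, before checking err: this lets the producer
	// goroutine finish instead of parking forever in pipe.Write.
	io.Copy(ioutil.Discard, rdr) // ignore error as reader may be closed

	return err
}

func main() {
	if err := apply(strings.NewReader(strings.Repeat("x", 1<<20))); err != nil {
		println("apply failed as expected:", err.Error())
	}
}

With the drain in place, apply still returns the same error, but the producer goroutine exits instead of leaking.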


Version-Release number of selected component (if applicable):

All, including upstream (moby).


How reproducible:

Always.


Steps to Reproduce:

I'm using dlv to simulate the storage issue:

[root@localhost binary-daemon]# dlv exec ./dockerd-dev
Type 'help' for list of commands.
(dlv) break applyTar
Breakpoint 1 set at 0xee94da for github.com/docker/docker/layer.(*layerStore).applyTar() /go/src/github.com/docker/docker/layer/layer_store.go:234
(dlv) c

(another terminal)

[root@localhost ~]# docker run --rm -ti busybox /bin/sh
Unable to find image 'busybox:latest' locally
latest: Pulling from library/busybox
90e01955edcd: Extracting    728kB/728kB

(back to dlv)

> github.com/docker/docker/layer.(*layerStore).applyTar() /go/src/github.com/docker/docker/layer/layer_store.go:234 (hits goroutine(142):1 total:1) (PC: 0xee94da)
Warning: debugging optimized function
   229:		ls.mounts[ml.name] = ml
   230:	
   231:		return nil
   232:	}
   233:	
=> 234:	func (ls *layerStore) applyTar(tx *fileMetadataTransaction, ts io.Reader, parent string, layer *roLayer) error {
   235:		digester := digest.Canonical.Digester()
   236:		tr := io.TeeReader(ts, digester.Hash())
   237:	
   238:		rdr := tr
   239:		if ls.useTarSplit {
(dlv) n
(...)
> github.com/docker/docker/layer.(*layerStore).applyTar() /go/src/github.com/docker/docker/layer/layer_store.go:255 (PC: 0xee95d6)
Warning: debugging optimized function
   250:			if err != nil {
   251:				return err
   252:			}
   253:		}
   254:	
=> 255:		applySize, err := ls.driver.ApplyDiff(layer.cacheID, parent, rdr)
   256:		if err != nil {
   257:			return err
   258:		}
   259:	
   260:		// Discard trailing data but ensure metadata is picked up to reconstruct stream
(dlv) p layer.cacheID
"b4e6ddae7f5cace7bc165d0bdfe3efcc1650eb61d335806ea293e61391910b49"

(in yet another terminal)

[root@localhost moby]# # Make sure extracting the tar will fail...
[root@localhost moby]# mount -t tmpfs -o ro tmpfs /var/lib/docker/overlay2/b4e6ddae7f5cace7bc165d0bdfe3efcc1650eb61d335806ea293e61391910b49

(back to dlv)

(dlv) c
INFO[2018-12-21T09:56:56.454866823+01:00] Attempting next endpoint for pull after error: failed to register layer: mkdir /var/lib/docker/overlay2/b4e6ddae7f5cace7bc165d0bdfe3efcc1650eb61d335806ea293e61391910b49/diff: read-only file system 
(dlv) goroutine 196
Switched from 0 to 196 (thread 15923)
(dlv) bt
 0  0x000000000042f821 in runtime.gopark
    at /usr/local/go/src/runtime/proc.go:303
 1  0x000000000043fb3c in runtime.selectgo
    at /usr/local/go/src/runtime/select.go:313
 2  0x000000000046a78e in io.(*pipe).Write
    at /usr/local/go/src/io/pipe.go:87
 3  0x000000000046ac3e in io.(*PipeWriter).Write
    at /usr/local/go/src/io/pipe.go:153
 4  0x00000000004695c3 in io.(*teeReader).Read
    at /usr/local/go/src/io/io.go:537
 5  0x00000000004688ba in io.ReadAtLeast
    at /usr/local/go/src/io/io.go:310
 6  0x0000000000468a5a in io.ReadFull
    at /usr/local/go/src/io/io.go:329
 7  0x0000000000edc99e in github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar.(*Reader).readHeader
    at /go/src/github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar/reader.go:389
 8  0x0000000000edab02 in github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar.(*Reader).next
    at /go/src/github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar/reader.go:111
 9  0x0000000000eda894 in github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar.(*Reader).Next
    at /go/src/github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar/reader.go:74
10  0x0000000000ee3c6e in github.com/docker/docker/vendor/github.com/vbatts/tar-split/tar/asm.NewInputTarStream.func1
    at /go/src/github.com/docker/docker/vendor/github.com/vbatts/tar-split/tar/asm/disassemble.go:49
11  0x000000000045e561 in runtime.goexit
    at /usr/local/go/src/runtime/asm_amd64.s:1333

Without the patch, we get a stuck goroutine as shown above. With it, we get the same error (as expected), but without the stuck goroutine.


Additional info:

I've created a PR for upstream (moby):

 - https://github.com/moby/moby/pull/38413

If it's accepted, I'll create another PR for downstream (projectatomic).

Comment 2 Sergio Lopez 2018-12-24 08:16:03 UTC
The PR for upstream (moby) has been merged.

I've just created a backport PR for downstream (projectatomic):

 - https://github.com/projectatomic/docker/pull/335

Comment 4 Daniel Walsh 2019-01-07 13:09:14 UTC
Lokesh and Franticek, can we get a new version of docker out?

Comment 11 errata-xmlrpc 2019-03-13 01:52:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:0487

Comment 12 Red Hat Bugzilla 2023-09-14 04:44:10 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days