Description of problem:
This is an analysis of a core dump from a dockerd instance that was using more than 20 GB of RAM:
- While the types using the largest amount of memory are unknown to viewcore (the unk* entries below), the object counts give us a hint about what they are:
# grep 855 viewcore-histogram.txt
count size bytes type
855 10027008 8573091840 unk10027008
855 663552 567336960 unk663552
855 73728 63037440 unk73728
855 65536 56033280 unk65536
855 4864 4158720 compress/flate.decompressor
855 2688 2298240 [316]int
855 1536 1313280 [1536]uint8
855 704 601920 compress/gzip.Reader
855 640 547200 github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar.Reader
855 208 177840 github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar.Header
855 160 136800 [19]int
855 128 109440 crypto/sha256.digest
855 96 82080 github.com/docker/docker/pkg/progress.Reader
855 96 82080 hchan<github.com/docker/docker/pkg/progress.Progress>
855 80 68400 github.com/docker/docker/vendor/golang.org/x/time/rate.Limiter
855 32 27360 github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar.regFileReader
855 32 27360 github.com/docker/docker/pkg/ioutils.cancelReadCloser
855 32 27360 github.com/docker/docker/pkg/ioutils.readCloserWrapper
855 8 6840 io.PipeReader
- We also have the same number of goroutines stuck in tar.regFileReader:
# grep regFileReader dockerd.txt |wc -l
855
- These goroutines are blocked writing into a pipe, waiting for someone to read from the other side (a minimal illustration of this blocking behaviour follows the backtrace below):
(dlv) goroutine 4778918
Switched from 0 to 4778918 (thread 2151)
(dlv) bt
0 0x000000000043d3dc in runtime.gopark
at /usr/lib/golang/src/runtime/proc.go:288
1 0x000000000043d4ce in runtime.goparkunlock
at /usr/lib/golang/src/runtime/proc.go:293
2 0x00000000004501f0 in sync.runtime_notifyListWait
at /usr/lib/golang/src/runtime/sema.go:507
3 0x0000000000480df0 in sync.(*Cond).Wait
at /usr/lib/golang/src/sync/cond.go:56
4 0x0000000000485faa in io.(*pipe).write
at /usr/lib/golang/src/io/pipe.go:90
5 0x00000000004864ec in io.(*PipeWriter).Write
at /usr/lib/golang/src/io/pipe.go:157
6 0x00000000004851e1 in io.(*teeReader).Read
at /usr/lib/golang/src/io/io.go:527
7 0x0000000000892f34 in github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar.(*regFileReader).Read
at /usr/src/debug/docker-8633870faeff2c03dd6a75fdef9c2976379339c6/_build/src/github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar/reader.go:960
8 0x0000000000892d7c in github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar.(*Reader).Read
at /usr/src/debug/docker-8633870faeff2c03dd6a75fdef9c2976379339c6/_build/src/github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar/reader.go:945
9 0x0000000000484ab3 in io.copyBuffer
at /usr/lib/golang/src/io/io.go:392
10 0x0000000000484858 in io.Copy
at /usr/lib/golang/src/io/io.go:362
11 0x0000000000863dd5 in github.com/docker/docker/vendor/github.com/vbatts/tar-split/tar/storage.(*bitBucketFilePutter).Put
at /usr/src/debug/docker-8633870faeff2c03dd6a75fdef9c2976379339c6/_build/src/github.com/docker/docker/vendor/github.com/vbatts/tar-split/tar/storage/getter.go:99
12 0x0000000000895975 in github.com/docker/docker/vendor/github.com/vbatts/tar-split/tar/asm.NewInputTarStream.func1
at /usr/src/debug/docker-8633870faeff2c03dd6a75fdef9c2976379339c6/_build/src/github.com/docker/docker/vendor/github.com/vbatts/tar-split/tar/asm/disassemble.go:87
13 0x000000000046cc21 in runtime.goexit
at /usr/lib/golang/src/runtime/asm_amd64.s:2337
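To illustrate why these goroutines never exit, here is a minimal, self-contained Go sketch (not taken from the docker code base) of a goroutine parked on an io.Pipe write until the read side either consumes the data or is closed:

package main

import (
    "fmt"
    "io"
    "time"
)

func main() {
    pr, pw := io.Pipe()
    done := make(chan error, 1)

    go func() {
        // This is the state the leaked dockerd goroutines are in: the write
        // blocks until a reader consumes the data or the pipe is closed.
        _, err := pw.Write([]byte("payload"))
        done <- err
    }()

    select {
    case err := <-done:
        fmt.Println("write returned:", err)
    case <-time.After(200 * time.Millisecond):
        fmt.Println("writer still blocked: nobody is reading the other end")
    }

    // Closing the read side unblocks the writer with io.ErrClosedPipe.
    pr.Close()
    fmt.Println("write returned:", <-done)
}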
- This is where the bug is located:
layer/layer_store.go:
206 func (ls *layerStore) applyTar(tx MetadataTransaction, ts io.Reader, parent string, layer *roLayer ) error {
207 digester := digest.Canonical.New()
208 tr := io.TeeReader(ts, digester.Hash())
209
210 tsw, err := tx.TarSplitWriter(true)
211 if err != nil {
212 return err
213 }
214 metaPacker := storage.NewJSONPacker(tsw)
215 defer tsw.Close()
216
217 // we're passing nil here for the file putter, because the ApplyDiff will
218 // handle the extraction of the archive
219 rdr, err := asm.NewInputTarStream(tr, metaPacker, nil)
220 if err != nil {
221 return err
222 }
223
224 applySize, err := ls.driver.ApplyDiff(layer.cacheID, parent, rdr)
225 if err != nil {
--> If ApplyDiff returns an error, the io.Copy below is never called, so the goroutine started by NewInputTarStream stays blocked writing into the pipe and the resources it holds are never released.
226 return err
227 }
228
229 // Discard trailing data but ensure metadata is picked up to reconstruct stream
230 io.Copy(ioutil.Discard, rdr) // ignore error as reader may be closed
231
232 layer.size = applySize
233 layer.diffID = DiffID(digester.Digest())
234
235 logrus.Debugf("Applied tar %s to %s, size: %d", layer.diffID, layer.cacheID, applySize)
236
237 return nil
238 }
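One possible shape of a fix (a sketch only; the upstream PR may differ in its details) is to drain rdr in the error branch as well, so the goroutine started by asm.NewInputTarStream can run to completion and its memory can be reclaimed:

applySize, err := ls.driver.ApplyDiff(layer.cacheID, parent, rdr)
if err != nil {
    // Drain what is left of the tar stream so the goroutine started by
    // asm.NewInputTarStream is not left blocked writing into the pipe.
    io.Copy(ioutil.Discard, rdr)
    return err
}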
Version-Release number of selected component (if applicable):
All, including upstream (moby).
How reproducible:
Always.
Steps to Reproduce:
I'm using dlv to pause dockerd at the right point so I can simulate a storage failure (a read-only layer directory):
[root@localhost binary-daemon]# dlv exec ./dockerd-dev
Type 'help' for list of commands.
(dlv) break applyTar
Breakpoint 1 set at 0xee94da for github.com/docker/docker/layer.(*layerStore).applyTar() /go/src/github.com/docker/docker/layer/layer_store.go:234
(dlv) c
(another terminal)
[root@localhost ~]# docker run --rm -ti busybox /bin/sh
Unable to find image 'busybox:latest' locally
latest: Pulling from library/busybox
90e01955edcd: Extracting 728kB/728kB
(back to dlv)
> github.com/docker/docker/layer.(*layerStore).applyTar() /go/src/github.com/docker/docker/layer/layer_store.go:234 (hits goroutine(142):1 total:1) (PC: 0xee94da)
Warning: debugging optimized function
229: ls.mounts[ml.name] = ml
230:
231: return nil
232: }
233:
=> 234: func (ls *layerStore) applyTar(tx *fileMetadataTransaction, ts io.Reader, parent string, layer *roLayer) error {
235: digester := digest.Canonical.Digester()
236: tr := io.TeeReader(ts, digester.Hash())
237:
238: rdr := tr
239: if ls.useTarSplit {
(dlv) n
(...)
> github.com/docker/docker/layer.(*layerStore).applyTar() /go/src/github.com/docker/docker/layer/layer_store.go:255 (PC: 0xee95d6)
Warning: debugging optimized function
250: if err != nil {
251: return err
252: }
253: }
254:
=> 255: applySize, err := ls.driver.ApplyDiff(layer.cacheID, parent, rdr)
256: if err != nil {
257: return err
258: }
259:
260: // Discard trailing data but ensure metadata is picked up to reconstruct stream
(dlv) p layer.cacheID
"b4e6ddae7f5cace7bc165d0bdfe3efcc1650eb61d335806ea293e61391910b49"
(in yet another terminal)
[root@localhost moby]# # Make sure extracting the tar will fail...
[root@localhost moby]# mount -t tmpfs -o ro tmpfs /var/lib/docker/overlay2/b4e6ddae7f5cace7bc165d0bdfe3efcc1650eb61d335806ea293e61391910b49
(back to dlv)
(dlv) c
INFO[2018-12-21T09:56:56.454866823+01:00] Attempting next endpoint for pull after error: failed to register layer: mkdir /var/lib/docker/overlay2/b4e6ddae7f5cace7bc165d0bdfe3efcc1650eb61d335806ea293e61391910b49/diff: read-only file system
(dlv) goroutine 196
Switched from 0 to 196 (thread 15923)
(dlv) bt
0 0x000000000042f821 in runtime.gopark
at /usr/local/go/src/runtime/proc.go:303
1 0x000000000043fb3c in runtime.selectgo
at /usr/local/go/src/runtime/select.go:313
2 0x000000000046a78e in io.(*pipe).Write
at /usr/local/go/src/io/pipe.go:87
3 0x000000000046ac3e in io.(*PipeWriter).Write
at /usr/local/go/src/io/pipe.go:153
4 0x00000000004695c3 in io.(*teeReader).Read
at /usr/local/go/src/io/io.go:537
5 0x00000000004688ba in io.ReadAtLeast
at /usr/local/go/src/io/io.go:310
6 0x0000000000468a5a in io.ReadFull
at /usr/local/go/src/io/io.go:329
7 0x0000000000edc99e in github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar.(*Reader).readHeader
at /go/src/github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar/reader.go:389
8 0x0000000000edab02 in github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar.(*Reader).next
at /go/src/github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar/reader.go:111
9 0x0000000000eda894 in github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar.(*Reader).Next
at /go/src/github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar/reader.go:74
10 0x0000000000ee3c6e in github.com/docker/docker/vendor/github.com/vbatts/tar-split/tar/asm.NewInputTarStream.func1
at /go/src/github.com/docker/docker/vendor/github.com/vbatts/tar-split/tar/asm/disassemble.go:49
11 0x000000000045e561 in runtime.goexit
at /usr/local/go/src/runtime/asm_amd64.s:1333
Without the patch, we get a stuck goroutine as shown above. With it, we get the same error (as expected), but without the stuck goroutine.
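For reference, a generic way to check for leftover goroutines without attaching a debugger (an illustration only, not part of dockerd) is to dump the goroutine profile from a test or a debug build; leaked tar-split goroutines show up blocked in io.(*pipe).Write / regFileReader.Read:

package main

import (
    "os"
    "runtime/pprof"
)

// dumpGoroutines prints one full stack trace per live goroutine to stdout.
func dumpGoroutines() {
    pprof.Lookup("goroutine").WriteTo(os.Stdout, 2)
}

func main() {
    dumpGoroutines()
}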
Additional info:
I've created a PR for upstream (moby):
- https://github.com/moby/moby/pull/38413
If it's accepted, I'll create another PR for downstream (projectatomic).
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2019:0487