Description of problem:
This is the analysis of a core dump from a dockerd instance that was using more than 20 GB of RAM:
- While the types using the largest amounts of memory aren't known (the unk* entries), the number of objects (855 of each) gives us a hint about what they are:
# grep 855 viewcore-histogram.txt
count size bytes type
855 10027008 8573091840 unk10027008
855 663552 567336960 unk663552
855 73728 63037440 unk73728
855 65536 56033280 unk65536
855 4864 4158720 compress/flate.decompressor
855 2688 2298240 [316]int
855 1536 1313280 [1536]uint8
855 704 601920 compress/gzip.Reader
855 640 547200 github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar.Reader
855 208 177840 github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar.Header
855 160 136800 [19]int
855 128 109440 crypto/sha256.digest
855 96 82080 github.com/docker/docker/pkg/progress.Reader
855 96 82080 hchan<github.com/docker/docker/pkg/progress.Progress>
855 80 68400 github.com/docker/docker/vendor/golang.org/x/time/rate.Limiter
855 32 27360 github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar.regFileReader
855 32 27360 github.com/docker/docker/pkg/ioutils.cancelReadCloser
855 32 27360 github.com/docker/docker/pkg/ioutils.readCloserWrapper
855 8 6840 io.PipeReader
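- For scale, the two largest buckets alone account for roughly 9 GB: 855 x 10,027,008 bytes ≈ 8.6 GB plus 855 x 663,552 bytes ≈ 0.57 GB, i.e. 855 leaked sets of roughly 10 MB of buffers each. (The histogram appears to be the output of golang.org/x/debug/cmd/viewcore's histogram command run against the core dump.)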
- We also have the same number of goroutines stuck in tar.regFileReader:
# grep regFileReader dockerd.txt |wc -l
855
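(For reference, dockerd.txt is a full goroutine stack dump of the daemon. Such a dump can be obtained from a live dockerd by sending it SIGUSR1, e.g. kill -USR1 $(pidof dockerd), which should make it write a goroutine-stacks-*.log file under its exec root, or it can be extracted from the core dump with dlv as shown below.)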
- These goroutines are blocked writing into a pipe, waiting for someone to read the other side:
(dlv) goroutine 4778918
Switched from 0 to 4778918 (thread 2151)
(dlv) bt
0 0x000000000043d3dc in runtime.gopark
at /usr/lib/golang/src/runtime/proc.go:288
1 0x000000000043d4ce in runtime.goparkunlock
at /usr/lib/golang/src/runtime/proc.go:293
2 0x00000000004501f0 in sync.runtime_notifyListWait
at /usr/lib/golang/src/runtime/sema.go:507
3 0x0000000000480df0 in sync.(*Cond).Wait
at /usr/lib/golang/src/sync/cond.go:56
4 0x0000000000485faa in io.(*pipe).write
at /usr/lib/golang/src/io/pipe.go:90
5 0x00000000004864ec in io.(*PipeWriter).Write
at /usr/lib/golang/src/io/pipe.go:157
6 0x00000000004851e1 in io.(*teeReader).Read
at /usr/lib/golang/src/io/io.go:527
7 0x0000000000892f34 in github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar.(*regFileReader).Read
at /usr/src/debug/docker-8633870faeff2c03dd6a75fdef9c2976379339c6/_build/src/github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar/reader.go:960
8 0x0000000000892d7c in github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar.(*Reader).Read
at /usr/src/debug/docker-8633870faeff2c03dd6a75fdef9c2976379339c6/_build/src/github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar/reader.go:945
9 0x0000000000484ab3 in io.copyBuffer
at /usr/lib/golang/src/io/io.go:392
10 0x0000000000484858 in io.Copy
at /usr/lib/golang/src/io/io.go:362
11 0x0000000000863dd5 in github.com/docker/docker/vendor/github.com/vbatts/tar-split/tar/storage.(*bitBucketFilePutter).Put
at /usr/src/debug/docker-8633870faeff2c03dd6a75fdef9c2976379339c6/_build/src/github.com/docker/docker/vendor/github.com/vbatts/tar-split/tar/storage/getter.go:99
12 0x0000000000895975 in github.com/docker/docker/vendor/github.com/vbatts/tar-split/tar/asm.NewInputTarStream.func1
at /usr/src/debug/docker-8633870faeff2c03dd6a75fdef9c2976379339c6/_build/src/github.com/docker/docker/vendor/github.com/vbatts/tar-split/tar/asm/disassemble.go:87
13 0x000000000046cc21 in runtime.goexit
at /usr/lib/golang/src/runtime/asm_amd64.s:2337
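- The leak mechanism can be reproduced outside of Docker with a small standalone program (a sketch, not Docker code; newStream and applyDiff are made-up names for illustration). NewInputTarStream feeds the reader it returns from a background goroutine through an io.Pipe; if the consumer errors out without draining or closing that reader, the goroutine blocks in the pipe Write forever and pins its buffers:

package main

import (
	"fmt"
	"io"
	"runtime"
	"strings"
	"time"
)

// newStream mimics asm.NewInputTarStream: it returns a reader that is fed
// by a background goroutine through an unbuffered io.Pipe.
func newStream(src io.Reader) io.Reader {
	pr, pw := io.Pipe()
	go func() {
		// Blocks inside pw.Write as soon as the consumer stops reading.
		io.Copy(pw, src)
		pw.Close()
	}()
	return pr
}

// applyDiff stands in for ls.driver.ApplyDiff failing early: it reads a
// little and then returns an error without draining rdr.
func applyDiff(rdr io.Reader) error {
	buf := make([]byte, 32)
	_, _ = io.ReadFull(rdr, buf)
	return fmt.Errorf("simulated storage failure")
}

func main() {
	for i := 0; i < 5; i++ {
		rdr := newStream(strings.NewReader(strings.Repeat("x", 1<<20)))
		if err := applyDiff(rdr); err != nil {
			// Bug pattern: returning here leaks the producer goroutine
			// and the ~1 MB it still has to write. Draining rdr, e.g.
			// io.Copy(ioutil.Discard, rdr), would release it.
			continue
		}
	}
	time.Sleep(100 * time.Millisecond)
	// Prints 6: main plus the 5 producer goroutines stuck in pipe.Write.
	fmt.Println("goroutines still alive:", runtime.NumGoroutine())
}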
- This is where the bug is located:
layer/layer_store.go:
206 func (ls *layerStore) applyTar(tx MetadataTransaction, ts io.Reader, parent string, layer *roLayer) error {
207 digester := digest.Canonical.New()
208 tr := io.TeeReader(ts, digester.Hash())
209
210 tsw, err := tx.TarSplitWriter(true)
211 if err != nil {
212 return err
213 }
214 metaPacker := storage.NewJSONPacker(tsw)
215 defer tsw.Close()
216
217 // we're passing nil here for the file putter, because the ApplyDiff will
218 // handle the extraction of the archive
219 rdr, err := asm.NewInputTarStream(tr, metaPacker, nil)
220 if err != nil {
221 return err
222 }
223
224 applySize, err := ls.driver.ApplyDiff(layer.cacheID, parent, rdr)
225 if err != nil {
--> If ApplyDiff returns an error, the io.Copy below is never called, and the resources created by the call to NewInputTarStream are never released.
226 return err
227 }
228
229 // Discard trailing data but ensure metadata is picked up to reconstruct stream
230 io.Copy(ioutil.Discard, rdr) // ignore error as reader may be closed
231
232 layer.size = applySize
233 layer.diffID = DiffID(digester.Digest())
234
235 logrus.Debugf("Applied tar %s to %s, size: %d", layer.diffID, layer.cacheID, applySize)
236
237 return nil
238 }
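- A minimal sketch of the kind of fix this calls for (my reading; the actual PR linked below may differ in details): run the io.Copy into ioutil.Discard unconditionally, before acting on the ApplyDiff error, so the goroutine started by NewInputTarStream always gets to drain the pipe and exit:

	applySize, err := ls.driver.ApplyDiff(layer.cacheID, parent, rdr)

	// Discard trailing data but ensure metadata is picked up to reconstruct stream.
	// Do this even when ApplyDiff failed, so the goroutine spawned by
	// NewInputTarStream is unblocked and its buffers can be freed.
	io.Copy(ioutil.Discard, rdr) // ignore error as reader may be closed

	if err != nil {
		return err
	}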
Version-Release number of selected component (if applicable):
All, including upstream (moby).
How reproducible:
Always.
Steps to Reproduce:
I'm using dlv to simulate the storage issue:
[root@localhost binary-daemon]# dlv exec ./dockerd-dev
Type 'help' for list of commands.
(dlv) break applyTar
Breakpoint 1 set at 0xee94da for github.com/docker/docker/layer.(*layerStore).applyTar() /go/src/github.com/docker/docker/layer/layer_store.go:234
(dlv) c
(another terminal)
[root@localhost ~]# docker run --rm -ti busybox /bin/sh
Unable to find image 'busybox:latest' locally
latest: Pulling from library/busybox
90e01955edcd: Extracting 728kB/728kB
(back to dlv)
> github.com/docker/docker/layer.(*layerStore).applyTar() /go/src/github.com/docker/docker/layer/layer_store.go:234 (hits goroutine(142):1 total:1) (PC: 0xee94da)
Warning: debugging optimized function
229: ls.mounts[ml.name] = ml
230:
231: return nil
232: }
233:
=> 234: func (ls *layerStore) applyTar(tx *fileMetadataTransaction, ts io.Reader, parent string, layer *roLayer) error {
235: digester := digest.Canonical.Digester()
236: tr := io.TeeReader(ts, digester.Hash())
237:
238: rdr := tr
239: if ls.useTarSplit {
(dlv) n
(...)
> github.com/docker/docker/layer.(*layerStore).applyTar() /go/src/github.com/docker/docker/layer/layer_store.go:255 (PC: 0xee95d6)
Warning: debugging optimized function
250: if err != nil {
251: return err
252: }
253: }
254:
=> 255: applySize, err := ls.driver.ApplyDiff(layer.cacheID, parent, rdr)
256: if err != nil {
257: return err
258: }
259:
260: // Discard trailing data but ensure metadata is picked up to reconstruct stream
(dlv) p layer.cacheID
"b4e6ddae7f5cace7bc165d0bdfe3efcc1650eb61d335806ea293e61391910b49"
(in yet another terminal)
[root@localhost moby]# # Make sure extracting the tar will fail...
[root@localhost moby]# mount -t tmpfs -o ro tmpfs /var/lib/docker/overlay2/b4e6ddae7f5cace7bc165d0bdfe3efcc1650eb61d335806ea293e61391910b49
(back to dlv)
(dlv) c
INFO[2018-12-21T09:56:56.454866823+01:00] Attempting next endpoint for pull after error: failed to register layer: mkdir /var/lib/docker/overlay2/b4e6ddae7f5cace7bc165d0bdfe3efcc1650eb61d335806ea293e61391910b49/diff: read-only file system
(dlv) goroutine 196
Switched from 0 to 196 (thread 15923)
(dlv) bt
0 0x000000000042f821 in runtime.gopark
at /usr/local/go/src/runtime/proc.go:303
1 0x000000000043fb3c in runtime.selectgo
at /usr/local/go/src/runtime/select.go:313
2 0x000000000046a78e in io.(*pipe).Write
at /usr/local/go/src/io/pipe.go:87
3 0x000000000046ac3e in io.(*PipeWriter).Write
at /usr/local/go/src/io/pipe.go:153
4 0x00000000004695c3 in io.(*teeReader).Read
at /usr/local/go/src/io/io.go:537
5 0x00000000004688ba in io.ReadAtLeast
at /usr/local/go/src/io/io.go:310
6 0x0000000000468a5a in io.ReadFull
at /usr/local/go/src/io/io.go:329
7 0x0000000000edc99e in github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar.(*Reader).readHeader
at /go/src/github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar/reader.go:389
8 0x0000000000edab02 in github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar.(*Reader).next
at /go/src/github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar/reader.go:111
9 0x0000000000eda894 in github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar.(*Reader).Next
at /go/src/github.com/docker/docker/vendor/github.com/vbatts/tar-split/archive/tar/reader.go:74
10 0x0000000000ee3c6e in github.com/docker/docker/vendor/github.com/vbatts/tar-split/tar/asm.NewInputTarStream.func1
at /go/src/github.com/docker/docker/vendor/github.com/vbatts/tar-split/tar/asm/disassemble.go:49
11 0x000000000045e561 in runtime.goexit
at /usr/local/go/src/runtime/asm_amd64.s:1333
Without the patch, we get a stuck goroutine as shown above. With it, we get the same error (as expected), but without the stuck goroutine.
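One rough way to verify this from the same dlv session (exact listing format may vary) is to list the goroutine stacks after the failed pull:
(dlv) goroutines -t
Without the patch the listing contains a goroutine parked in github.com/docker/docker/vendor/github.com/vbatts/tar-split/tar/asm.NewInputTarStream.func1; with the patch it does not.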
Additional info:
I've created a PR for upstream (moby):
- https://github.com/moby/moby/pull/38413
If it's accepted, I'll create another PR for downstream (projectatomic).
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2019:0487
Comment 12 (Red Hat Bugzilla, 2023-09-14 04:44:10 UTC)
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days