Bug 1314463

Summary: Core dump tailing container logs
Product: [Fedora] Fedora Reporter: Andy Goldstein <agoldste>
Component: dockerAssignee: Nalin Dahyabhai <nalin>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 23CC: adimania, admiller, amurdaca, dwalsh, ichavero, jcajka, jchaloup, lsm5, marianne, miminar, nalin, vbatts
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: docker-1.9.1-9.gitee06d03.fc23 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-04-26 20:58:35 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description Flags
core dump none

Description Andy Goldstein 2016-03-03 16:53:15 UTC
Description of problem: Docker panicked and core dumped while tailing container logs


Version-Release number of selected component (if applicable): docker-1.10.2-6.git0f5ac89.fc23.x86_64


How reproducible: sporadic


Steps to Reproduce:
1. Start 3 OpenShift builds simultaneously
2. Run `oc logs bc/<build config name>` for each build config simultaneously


Actual results: Docker panicked and core dumped


Expected results: no panic / no core


Additional info: here's the goroutine that panicked:

Mar 03 11:22:49 f23 docker[6551]: fatal error: unexpected signal during runtime execution
Mar 03 11:22:49 f23 docker[6551]: [signal 0xb code=0x1 addr=0x7fde873af0a0 pc=0x7fdfc32ab692]
Mar 03 11:22:49 f23 docker[6551]: runtime stack:
Mar 03 11:22:49 f23 docker[6551]: runtime.throw(0x1849e00, 0x2a)
Mar 03 11:22:49 f23 docker[6551]: /usr/lib/golang/src/runtime/panic.go:527 +0x90 fp=0x7fdf6cff89b0 sp=0x7fdf6cff8998
Mar 03 11:22:49 f23 docker[6551]: runtime.sigpanic()
Mar 03 11:22:49 f23 docker[6551]: /usr/lib/golang/src/runtime/sigpanic_unix.go:12 +0x5a fp=0x7fdf6cff8a00 sp=0x7fdf6cff89b0
Mar 03 11:22:49 f23 docker[6551]: goroutine 9678 [syscall, locked to thread]:
Mar 03 11:22:49 f23 docker[6551]: runtime.cgocall(0x1072330, 0xc821e1bd98, 0x0)
Mar 03 11:22:49 f23 docker[6551]: /usr/lib/golang/src/runtime/cgocall.go:120 +0x11b fp=0xc821e1bd68 sp=0xc821e1bd38
Mar 03 11:22:49 f23 docker[6551]: github.com/docker/docker/daemon/logger/journald._Cfunc_sd_journal_next(0x7fdf20000910, 0xc800000000)
Mar 03 11:22:49 f23 docker[6551]: github.com/docker/docker/daemon/logger/journald/_obj/_cgo_gotypes.go:191 +0x35 fp=0xc821e1bd98 sp=0xc821e1bd68
Mar 03 11:22:49 f23 docker[6551]: github.com/docker/docker/daemon/logger/journald.(*journald).drainJournal(0xc821b11bc0, 0xc82147d700, 0x0, 0xc800000000, 0x0, 0xffffffffffffffff, 0xc821386101, 0x7fdf20000910, 0xc821652180, 0x7b, ...)
Mar 03 11:22:49 f23 docker[6551]: /builddir/build/BUILD/docker-0f5ac89062e1f80bd87e7db9a94859adf7a188a7/_build/src/github.com/docker/docker/daemon/logger/journald/read.go:164 +0x615 fp=0xc821e1bef0 sp=0xc821e1bd98
Mar 03 11:22:49 f23 docker[6551]: github.com/docker/docker/daemon/logger/journald.(*journald).followJournal.func1(0x7fdf20000910, 0x8e0000008d, 0xc822681410, 0xc821b11bc0, 0xc82147d700, 0x0, 0xc800000000, 0x0, 0xffffffffffffffff, 0xc821c5c601)
Mar 03 11:22:49 f23 docker[6551]: /builddir/build/BUILD/docker-0f5ac89062e1f80bd87e7db9a94859adf7a188a7/_build/src/github.com/docker/docker/daemon/logger/journald/read.go:180 +0xc4 fp=0xc821e1bf70 sp=0xc821e1bef0
Mar 03 11:22:49 f23 docker[6551]: runtime.goexit()
Mar 03 11:22:49 f23 docker[6551]: /usr/lib/golang/src/runtime/asm_amd64.s:1721 +0x1 fp=0xc821e1bf78 sp=0xc821e1bf70
Mar 03 11:22:49 f23 docker[6551]: created by github.com/docker/docker/daemon/logger/journald.(*journald).followJournal
Mar 03 11:22:49 f23 docker[6551]: /builddir/build/BUILD/docker-0f5ac89062e1f80bd87e7db9a94859adf7a188a7/_build/src/github.com/docker/docker/daemon/logger/journald/read.go:187 +0xf0

Comment 1 Andy Goldstein 2016-03-03 16:56:24 UTC
Created attachment 1132865 [details]
core dump

Comment 2 Andy Goldstein 2016-03-03 16:57:44 UTC
Note that while I found this when testing OpenShift, it's probably possible to reproduce just by running multiple 'docker logs -f' commands simultaneously.

Comment 3 Daniel Walsh 2016-03-03 16:59:45 UTC
Could you try this without using the journald back end to show see if this happens on the standard backend?

Comment 4 Daniel Walsh 2016-03-03 17:00:41 UTC
If it is we can open an issue with this upstream.

Comment 5 Andy Goldstein 2016-03-03 18:07:05 UTC
Doesn't this trace point to an issue with docker+journald specifically?

Comment 6 Daniel Walsh 2016-03-03 19:00:56 UTC
Yes

Comment 7 Nalin Dahyabhai 2016-03-04 16:32:22 UTC
We don't have a debuginfo package for that build, and the panic trace isn't a lot to go on once it's in cgo.  I'll have to try to reproduce it using the origin package.

Comment 8 Nalin Dahyabhai 2016-03-18 13:24:30 UTC
There's a race condition or two in the reading logic.  See
https://github.com/docker/docker/pull/21311 for proposed fixes for them and 
https://github.com/docker/docker/pull/21309 for some improvements to error reporting in that area.

Comment 9 Nalin Dahyabhai 2016-04-26 20:58:35 UTC
The fixes are in as of docker-1.9.1-9.gitee06d03.fc23, so I'm marking this as fixed by that update.