Bug 1309149

Summary: Docker segmentation fault on sched_getaffinity syscall
Product: [Fedora] Fedora Reporter: Mairi Dulaney <jdulaney>
Component: glibc Assignee: Carlos O'Donell <codonell>
Status: CLOSED DUPLICATE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: unspecified    
Version: rawhide CC: adimania, admiller, amurdaca, arjun, codonell, dj, dwalsh, fweimer, ichavero, jakub, jcajka, jchaloup, jdulaney, law, lsm5, marianne, mfabian, miminar, pfrankli, siddhesh, vbatts
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-02-18 00:02:24 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
docker-strace none
docker-coredump.tar.gz none

Description Mairi Dulaney 2016-02-17 01:15:58 UTC
Created attachment 1127783
docker-strace

Description of problem:
Running docker in a fully up-to-date rawhide VM on an F23 host, docker segfaults very early in startup, on sched_getaffinity(0, 8192, [0]) = 8


Full strace is attached
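
For reference, the strace line above shows sched_getaffinity being called with pid 0 (the calling thread), an 8192-byte mask buffer, and a returned mask containing only CPU 0; the raw syscall's return value of 8 is the number of bytes written into the mask. A minimal sketch of the same query using Python's standard library, assuming a Linux host (the helper name current_affinity is illustrative only):

```python
import os


def current_affinity(pid=0):
    """Return the set of CPU indices the process may run on.

    pid 0 means the calling process, matching the first argument
    in the strace line. os.sched_getaffinity is Linux-only.
    """
    return set(os.sched_getaffinity(pid))


if __name__ == "__main__":
    cpus = current_affinity()
    print("allowed CPUs:", sorted(cpus))
```

On a single-CPU VM like the one implied by the `[0]` mask, this would be expected to report just CPU 0.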

Version-Release number of selected component (if applicable):
docker-1.10.1-3.git49805e4.fc24.x86_64

How reproducible:
Always

Steps to Reproduce:
1.  run /usr/bin/docker

Actual results:
seg fault

Expected results:
no seg fault

Additional info:
strace is attached

Comment 1 Daniel Walsh 2016-02-17 14:02:01 UTC
If you run an older version of docker are you seeing this?

Comment 2 Mairi Dulaney 2016-02-17 17:50:22 UTC
Created attachment 1127986
docker-coredump.tar.gz

Comment 3 Mairi Dulaney 2016-02-17 17:53:11 UTC
Aye, it's still a thing if I downgrade to the last 1.10.0 build in koji.

I went ahead, re-upgraded, and grabbed the abrt output.

Comment 4 Mairi Dulaney 2016-02-17 18:07:13 UTC
Running gdb:

(gdb) bt
#0  _dl_lookup_symbol_x (undef_name=0x4056e9 "mmap", undef_map=0x7f6ae3f5b128, ref=ref@entry=0x7ffd6b3c8920, 
    symbol_scope=0x7f6ae3f5b480, version=0x7f6ae3e9c450, type_class=type_class@entry=1, flags=1, skip_map=0x0) at dl-lookup.c:809
#1  0x00007f6ae3d43ea4 in _dl_fixup (l=<optimized out>, reloc_arg=<optimized out>) at ../elf/dl-runtime.c:111
#2  0x00007f6ae3d4c2af in _dl_runtime_resolve_sse () at ../sysdeps/x86_64/dl-trampoline.h:112

This is starting to look like a bug in glibc.

Comment 5 Florian Weimer 2016-02-17 18:25:41 UTC
Does docker work if you set LD_BIND_NOW=1?

Comment 6 Mairi Dulaney 2016-02-17 21:24:09 UTC
(In reply to Florian Weimer from comment #5)
> Does docker work if you set LD_BIND_NOW=1?

Gave that a try, same result.
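
The LD_BIND_NOW=1 check from comment 5 can be scripted so the result is unambiguous. The following is only a sketch: the helper name run_with_bind_now is hypothetical, and it assumes a POSIX host where a child killed by a signal is reported with a negative return code.

```python
import os
import signal
import subprocess


def run_with_bind_now(argv):
    """Run argv with LD_BIND_NOW=1 and report whether it died from SIGSEGV.

    Returns (returncode, segfaulted, stdout_bytes).
    """
    # Copy the current environment and force eager symbol binding.
    env = dict(os.environ, LD_BIND_NOW="1")
    proc = subprocess.run(
        argv, env=env, stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    # On POSIX, a return code of -N means the child was killed by signal N.
    segfaulted = proc.returncode == -signal.SIGSEGV
    return proc.returncode, segfaulted, proc.stdout
```

Calling `run_with_bind_now(["/usr/bin/docker"])` and inspecting the second element of the result would show whether the segfault persists with lazy binding disabled.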

Comment 7 Carlos O'Donell 2016-02-17 22:09:23 UTC
(In reply to Mairi Dulaney from comment #4)
> Running gdb:
> 
> (gdb) bt
> #0  _dl_lookup_symbol_x (undef_name=0x4056e9 "mmap", undef_map=0x7f6ae3f5b128, ref=ref@entry=0x7ffd6b3c8920, 
>     symbol_scope=0x7f6ae3f5b480, version=0x7f6ae3e9c450, type_class=type_class@entry=1, flags=1, skip_map=0x0) at dl-lookup.c:809
> #1  0x00007f6ae3d43ea4 in _dl_fixup (l=<optimized out>, reloc_arg=<optimized out>) at ../elf/dl-runtime.c:111
> #2  0x00007f6ae3d4c2af in _dl_runtime_resolve_sse () at ../sysdeps/x86_64/dl-trampoline.h:112
> 
> This is starting to look like a bug in glibc.

I can reproduce this.

Comment 8 Florian Weimer 2016-02-17 22:18:37 UTC
There is a known golang ABI issue, see bug 1304591 comment 7.

I'm surprised LD_BIND_NOW=1 doesn't work around this.  Can you set a breakpoint on x_cgo_mmap and see if you can get a longer backtrace?
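
One way to capture that longer backtrace non-interactively is to drive gdb in batch mode. The sketch below only builds the command line; the helper name gdb_backtrace_command is hypothetical, and it assumes gdb is installed and that the x_cgo_mmap symbol is visible to it.

```python
import shlex


def gdb_backtrace_command(binary, breakpoint):
    """Build a gdb batch command: break at `breakpoint`, run, print a backtrace."""
    return [
        "gdb", "-batch",
        # Allow the breakpoint to stay pending if the symbol is not loaded yet.
        "-ex", "set breakpoint pending on",
        "-ex", "break " + breakpoint,
        "-ex", "run",
        "-ex", "bt",
        "--args", binary,
    ]


if __name__ == "__main__":
    argv = gdb_backtrace_command("/usr/bin/docker", "x_cgo_mmap")
    # Print a shell-quoted form that can be pasted into a terminal.
    print(" ".join(shlex.quote(a) for a in argv))
```

The printed command can be pasted into a shell on the affected VM, or the list can be passed to `subprocess.run` directly.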

Comment 9 Carlos O'Donell 2016-02-18 00:02:24 UTC
(In reply to Florian Weimer from comment #8)
> There is a known golang ABI issue, see bug 1304591 comment 7.
> 
> I'm surprised LD_BIND_NOW=1 doesn't work around this.  Can you set a
> breakpoint on x_cgo_mmap and see if you can get a longer backtrace?

Using LD_BIND_NOW=1 resolves the issue.

In comment 7 (of this issue) I only meant to say that I could reproduce the crash on rawhide, but had not done anything further.

The crash backtrace is exactly as in bug 1304591 and it is indeed an incorrect stack alignment issue with the hand-written Go assembly.
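
For context, the x86-64 System V ABI expects the stack pointer to be 16-byte aligned at the point of a call, and code that uses aligned SSE loads and stores on stack memory can fault when that does not hold. A minimal sketch of the alignment arithmetic follows; the stack-pointer values are hypothetical and are not taken from the core dump.

```python
STACK_ALIGNMENT = 16  # bytes; x86-64 System V ABI requirement at a call


def is_aligned(address, alignment=STACK_ALIGNMENT):
    """Return True if address is a multiple of alignment."""
    return address % alignment == 0


def align_down(address, alignment=STACK_ALIGNMENT):
    """Round address down to a multiple of alignment (must be a power of two)."""
    return address & ~(alignment - 1)


if __name__ == "__main__":
    # Hypothetical stack-pointer values, for illustration only.
    for rsp in (0x7FFD6B3C8900, 0x7FFD6B3C8908):
        print(hex(rsp), is_aligned(rsp), hex(align_down(rsp)))
```

A stack pointer that is off by 8 bytes, as in the second value, is the kind of misalignment hand-written assembly can introduce when it calls into C code without re-aligning the stack first.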

*** This bug has been marked as a duplicate of bug 1304591 ***