Bug 1309149

Summary:

Docker segmentation fault on sched_getaffinity syscall

Product:

[Fedora] Fedora

Reporter:

Mairi Dulaney <jdulaney>

Component:

glibc

Assignee:

Carlos O'Donell <codonell>

Status:

CLOSED DUPLICATE

QA Contact:

Fedora Extras Quality Assurance <extras-qa>

Severity:

high

Priority:

unspecified

Version:

rawhide

CC:

adimania, admiller, amurdaca, arjun, codonell, dj, dwalsh, fweimer, ichavero, jakub, jcajka, jchaloup, jdulaney, law, lsm5, marianne, mfabian, miminar, pfrankli, siddhesh, vbatts

Hardware:

x86_64

OS:

Linux

Doc Type:

Bug Fix

Last Closed:

2016-02-18 00:02:24 UTC

Type:

Bug

Attachments:

Description	Flags
docker-strace	none
docker-coredump.tar.gz	none

Description Mairi Dulaney 2016-02-17 01:15:58 UTC

Created attachment 1127783 [details]
docker-strace

Description of problem:
Running docker in a fully up-to-date rawhide VM on an F23 host, docker segfaults very early in startup, at sched_getaffinity(0, 8192, [0]) = 8

Full strace is attached.

Version-Release number of selected component (if applicable):
docker-1.10.1-3.git49805e4.fc24.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Run /usr/bin/docker

Actual results:
seg fault

Expected results:
no seg fault

Additional info:
strace is attached
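
For readers decoding the strace line quoted in the description: the arguments are the PID (0 means the calling thread), the size of the CPU-mask buffer in bytes, and the decoded mask. The value after `=` is the return value of the raw syscall, which on success is the number of bytes the kernel copied into the mask. A non-negative value means the call itself completed, which suggests the fault happens after the syscall returns rather than inside it. A minimal sketch in Python; the helper name `parse_affinity_line` is hypothetical and the pattern handles only this one line shape.

```python
import os
import re

# Hypothetical helper: matches only the sched_getaffinity line shape quoted in this report.
_LINE = re.compile(
    r"sched_getaffinity\((\d+),\s*(\d+),\s*\[([\d\s,]*)\]\)\s*=\s*(-?\d+)"
)

def parse_affinity_line(line):
    """Split one strace sched_getaffinity line into its fields."""
    m = _LINE.search(line)
    if m is None:
        raise ValueError("not a sched_getaffinity strace line")
    pid, size, cpus, ret = m.groups()
    return {
        "pid": int(pid),          # 0 means the calling thread
        "mask_bytes": int(size),  # size of the buffer passed by the caller
        "cpus": [int(c) for c in re.split(r"[\s,]+", cpus.strip()) if c],
        "ret": int(ret),          # bytes written by the kernel on success
    }

info = parse_affinity_line("sched_getaffinity(0, 8192, [0])         = 8")
print(info)

# For comparison, the CPUs the current process may run on (Linux only).
if hasattr(os, "sched_getaffinity"):
    print(sorted(os.sched_getaffinity(0)))
```

Here the caller offered an 8192-byte buffer, the kernel filled in 8 bytes, and the mask contains only CPU 0.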

Comment 1 Daniel Walsh 2016-02-17 14:02:01 UTC

If you run an older version of docker are you seeing this?

Comment 2 Mairi Dulaney 2016-02-17 17:50:22 UTC

Created attachment 1127986 [details]
docker-coredump.tar.gz

Comment 3 Mairi Dulaney 2016-02-17 17:53:11 UTC

Aye, it's still a thing if I downgrade to the last 1.10.0 build in koji.

I went ahead, re-upgraded, and grabbed the abrt output.

Comment 4 Mairi Dulaney 2016-02-17 18:07:13 UTC

Running gdb:

(gdb) bt
#0  _dl_lookup_symbol_x (undef_name=0x4056e9 "mmap", undef_map=0x7f6ae3f5b128, ref=ref@entry=0x7ffd6b3c8920, 
    symbol_scope=0x7f6ae3f5b480, version=0x7f6ae3e9c450, type_class=type_class@entry=1, flags=1, skip_map=0x0) at dl-lookup.c:809
#1  0x00007f6ae3d43ea4 in _dl_fixup (l=<optimized out>, reloc_arg=<optimized out>) at ../elf/dl-runtime.c:111
#2  0x00007f6ae3d4c2af in _dl_runtime_resolve_sse () at ../sysdeps/x86_64/dl-trampoline.h:112

This is starting to look like a bug in glibc.

Comment 5 Florian Weimer 2016-02-17 18:25:41 UTC

Does docker work if you set LD_BIND_NOW=1?
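
One way to make this check repeatable is to run the binary with LD_BIND_NOW=1 in its environment and look at how the process ends. LD_BIND_NOW asks the dynamic linker to resolve all symbols at startup instead of lazily on first call. The sketch below is an assumption about how one might script the test, not something taken from this report; the helper name `run_with_bind_now` is hypothetical.

```python
import os
import signal
import subprocess

def run_with_bind_now(argv):
    """Run argv with LD_BIND_NOW=1 and describe how the process ended."""
    # Copy the current environment and request eager symbol resolution.
    env = dict(os.environ, LD_BIND_NOW="1")
    proc = subprocess.run(
        argv,
        env=env,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if proc.returncode < 0:
        # On POSIX, a negative return code is the negated number of the
        # signal that terminated the child.
        return "killed by " + signal.Signals(-proc.returncode).name
    return "exited with status %d" % proc.returncode

# Usage, with the path from the reproduction steps:
#   print(run_with_bind_now(["/usr/bin/docker"]))
```

A result of "killed by SIGSEGV" would mean the segfault still occurs with eager binding; any "exited with status" result means the process ended without being killed by a signal.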

Comment 6 Mairi Dulaney 2016-02-17 21:24:09 UTC

(In reply to Florian Weimer from comment #5)
> Does docker work if you set LD_BIND_NOW=1?

Gave that a try, same result.

Comment 7 Carlos O'Donell 2016-02-17 22:09:23 UTC

(In reply to John Dulaney from comment #4)
> Running gdb:
> 
> (gdb) bt
> #0  _dl_lookup_symbol_x (undef_name=0x4056e9 "mmap",
> undef_map=0x7f6ae3f5b128, ref=ref@entry=0x7ffd6b3c8920, 
>     symbol_scope=0x7f6ae3f5b480, version=0x7f6ae3e9c450,
> type_class=type_class@entry=1, flags=1, skip_map=0x0) at dl-lookup.c:809
> #1  0x00007f6ae3d43ea4 in _dl_fixup (l=<optimized out>, reloc_arg=<optimized
> out>) at ../elf/dl-runtime.c:111
> #2  0x00007f6ae3d4c2af in _dl_runtime_resolve_sse () at
> ../sysdeps/x86_64/dl-trampoline.h:112
> 
> This is starting to look like a bug in glibc.

I can reproduce this.

Comment 8 Florian Weimer 2016-02-17 22:18:37 UTC

There is a known golang ABI issue, see bug 1304591 comment 7.

I'm surprised LD_BIND_NOW=1 doesn't work around this.  Can you set a breakpoint on x_cgo_mmap and see if you can get a longer backtrace?

Comment 9 Carlos O'Donell 2016-02-18 00:02:24 UTC

(In reply to Florian Weimer from comment #8)
> There is a known golang ABI issue, see bug 1304591 comment 7.
> 
> I'm surprised LD_BIND_NOW=1 doesn't work around this.  Can you set a
> breakpoint on x_cgo_mmap and see if you can get a longer backtrace?

Using LD_BIND_NOW=1 resolves the issue.

In comment 7 (of this issue) I only meant to say that I could reproduce the crash on rawhide, but had not done anything further.

The crash backtrace is exactly as in bug 1304591, and it is indeed an incorrect stack alignment issue with the hand-written Go assembly.
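
For context on what stack alignment means here: the x86-64 System V ABI requires the stack pointer to be a multiple of 16 at the point of a call instruction, so after the 8-byte return address is pushed the callee sees rsp % 16 == 8. Hand-written assembly has to maintain this itself. The arithmetic can be sketched as follows; the function names and addresses are hypothetical and are not taken from the core dump.

```python
STACK_ALIGN = 16  # x86-64 System V ABI: rsp must be a multiple of 16 at a call
RETURN_ADDR = 8   # bytes pushed onto the stack by the call instruction

def aligned_at_call(rsp):
    """True if rsp satisfies the ABI alignment rule just before a call."""
    return rsp % STACK_ALIGN == 0

def rsp_on_entry(rsp_at_call):
    """Stack pointer seen by the callee after the return address is pushed."""
    return rsp_at_call - RETURN_ADDR

good = 0x7ffd00001000  # hypothetical, 16-byte aligned
bad = good - 8         # hypothetical, e.g. after one unbalanced 8-byte push

print(aligned_at_call(good), rsp_on_entry(good) % STACK_ALIGN)
print(aligned_at_call(bad), rsp_on_entry(bad) % STACK_ALIGN)
```

Callee code that assumes the ABI rule, for example by using aligned SSE loads and stores on stack slots, can fault when entered with the second kind of stack pointer.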

*** This bug has been marked as a duplicate of bug 1304591 ***