| Summary: | /lib64/ld64.so.1 --verify on valgrind tool libraries leads to segfault | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Miloš Prchlík <mprchlik> |
| Component: | glibc | Assignee: | Carlos O'Donell <codonell> |
| Status: | CLOSED UPSTREAM | QA Contact: | qe-baseos-tools-bugs |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 6.8 | CC: | ashankar, fweimer, hannsj_uhl, mcermak, mnewsome, mprchlik, pfrankli |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | ppc64 | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2016-11-23 11:31:57 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
|
Description
Miloš Prchlík
2016-11-22 15:05:54 UTC
A segmentation fault during the handling of __mmap is really very odd. There is almost nothing in __mmap at all, just argument prep and then the inlined syscall. Exactly where in mmap did it fail? Can you disassemble and see where the instruction pointer is pointing? For now I'm moving this to rhel-6.10, since without a deep triage we can't do anything for rhel-6.9.

Sure, I can disassemble. But I'm none the wiser after that; I hope you will be :) I can even lend you the box I kept for that purpose if you'd like to investigate on your own. Unfortunately I cannot attach a core dump, bugzilla won't let me :(
Core was generated by `/lib64/ld64.so.1 --verify /usr/lib64/valgrind/cachegrind-ppc64-linux '.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000038ec271c in .__mmap ()
(gdb) disas
Dump of assembler code for function .__mmap:
0x0000000038ec2714 <+0>: li r0,90
0x0000000038ec2718 <+4>: sc
=> 0x0000000038ec271c <+8>: bnslr+
0x0000000038ec2720 <+12>: b 0x38ec2110 <__syscall_error>
0x0000000038ec2724 <+16>: .long 0x0
0x0000000038ec2728 <+20>: .long 0xc2040
0x0000000038ec272c <+24>: .long 0x0
0x0000000038ec2730 <+28>: .long 0x10
0x0000000038ec2734 <+32>: .long 0x65f5f
0x0000000038ec2738 <+36>: xoris r13,r11,24944
End of assembler dump.
(gdb)
(gdb) info registers
r0 0x5a 90
r1 0xfffcce75f70 17591328792432
r2 0x38ee8af8 955157240
r3 0x38390000 943259648
r4 0x26b7850 40597584
r5 0x3 3
r6 0x32 50
r7 0xffffffffffffffff 18446744073709551615
r8 0x0 0
r9 0x0 0
r10 0x0 0
r11 0x0 0
r12 0x0 0
r13 0x0 0
r14 0x38ee2a70 955132528
r15 0x38390000 943259648
r16 0xfffcce76040 17591328792640
r17 0xfffcce762b8 17591328793272
r18 0x6 6
r19 0xfffcce76010 17591328792592
r20 0x2 2
r21 0xfffcce762f8 17591328793336
r22 0x38ee2a48 955132488
r23 0x2a47850 44333136
r24 0xfffcce7fae1 17591328832225
r25 0x38ee1900 955128064
r26 0x3 3
r27 0x20000000 536870912
r28 0x0 0
r29 0x38edfb88 955120520
r30 0x38ee2aa0 955132576
r31 0xfffcce76010 17591328792592
pc 0x38ec271c 0x38ec271c <.__mmap+8>
msr 0x800000004000d032 9223372037928570930
cr 0x48048444 1208255556
lr 0x38eac534 0x38eac534 <_dl_map_object_from_fd+2948>
ctr 0x0 0
xer 0x20000000 536870912
orig_r3 0x38390000 943259648
trap 0x400 1024
(gdb)
(gdb) up
#1 0x0000000038eac534 in _dl_map_object_from_fd (name=0xfffcce7fae1 "/usr/lib64/valgrind/cachegrind-ppc64-linux", origname=0x0, fd=<value optimized out>, fbp=0xfffcce762b0,
realname=0x38ee2a70 <Address 0x38ee2a70 out of bounds>, loader=<value optimized out>, l_type=<value optimized out>, mode=<value optimized out>, stack_endp=0xfffcce762a0, nsid=0)
at dl-load.c:1322
1322 mapat = __mmap ((caddr_t) zeropage, zeroend - zeropage,
(gdb) info locals
mapat = <value optimized out>
zero = <value optimized out>
zeroend = <value optimized out>
zeropage = 943259648
c = 0xfffcce76040
nloadcmds = 2
loadcmds = 0xfffcce76010
has_holes = <value optimized out>
l = 0x38ee2aa0
header = <value optimized out>
phdr = 0xfffcce762f8
ph = <value optimized out>
maplength = 44333136
type = 2
st = {st_dev = 64768, st_ino = 2122697, st_nlink = 1, st_mode = 33261, st_uid = 0, st_gid = 0, __pad2 = 0, st_rdev = 0, st_size = 14827066, st_blksize = 4096, st_blocks = 28960, st_atim = {
tv_sec = 1479825101, tv_nsec = 776033347}, st_mtim = {tv_sec = 1424941587, tv_nsec = 0}, st_ctim = {tv_sec = 1479825045, tv_nsec = 26026581}, __unused4 = 0, __unused5 = 0,
__unused6 = 0}
errstring = 0x0
errval = 0
r = 0x38ee2a48
make_consistent = true
stack_flags = 6
(gdb)
Reproduced successfully. strace output:
execve("/lib64/ld64.so.1", ["/lib64/ld64.so.1", "--verify", "/usr/lib64/valgrind/cachegrind-p"...], [/* 44 vars */]) = 0
brk(0) = 0x10035af0000
open("/usr/lib64/valgrind/cachegrind-ppc64-linux", O_RDONLY) = 3
read(3, "\177ELF\2\2\1\0\0\0\0\0\0\0\0\0\0\2\0\25\0\0\0\1\0\0\0\00087/0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=14827066, ...}) = 0
mmap(0x38000000, 3473408, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x10000) = 0x38000000
mmap(0x38350000, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x360000) = 0x38350000
mmap(0x38390000, 40597584, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x38390000
The final mapping apparently overlaps with the dynamic linker code, but succeeds because of MAP_FIXED. This is difficult to fix because we do not have control over where the kernel loads /lib64/ld64.so.1.
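The overlap can be checked by hand from the numbers already posted. The sketch below is only arithmetic on values copied from the strace and gdb output above; the function names are made up for illustration and nothing here is glibc code.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr lies inside the half-open range [start, start + len). */
bool mapping_contains(uint64_t start, uint64_t len, uint64_t addr)
{
    return addr >= start && addr - start < len;
}

/*
 * Checks the faulting pc (0x38ec271c, .__mmap+8 in the core dump) against
 * the third mmap call from the strace output:
 *   mmap(0x38390000, 40597584, ..., MAP_FIXED|MAP_ANONYMOUS, -1, 0)
 * 40597584 is 0x26b7850, so the mapping ends at 0x3aa47850.
 */
bool faulting_pc_is_overmapped(void)
{
    const uint64_t anon_start = 0x38390000ULL;
    const uint64_t anon_len   = 40597584ULL;
    const uint64_t fault_pc   = 0x38ec271cULL;

    return mapping_contains(anon_start, anon_len, fault_pc);
}
```

The first two mappings end at 0x38350000 and 0x38390000 respectively, so only the anonymous one reaches the address where the crash happened.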
Do you see this with --verify only, or also when running programs? When loading a program, the kernel should make sure there is no overlap.
Reproducer on x86_64, Fedora 23:
/* 3 * 2^45 bytes = 96 TiB of zero-initialized data.  */
char buffer[3ULL << 45];
int main (void)
{
  return 0;
}
The following needs to run with vm.overcommit_memory=1 unless you have a lot of RAM.
$ gdb --args /lib64/ld-linux-x86-64.so.2 ./a.out
…
(gdb) r
Starting program: /usr/lib64/ld-linux-x86-64.so.2 ./a.out
Program received signal SIGSEGV, Segmentation fault.
0x000055555556d4ba in mmap64 () at ../sysdeps/unix/syscall-template.S:84
84 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) disas
Dump of assembler code for function mmap64:
0x000055555556d4b0 <+0>: add %al,(%rax)
0x000055555556d4b2 <+2>: add %al,(%rax)
0x000055555556d4b4 <+4>: add %al,(%rax)
0x000055555556d4b6 <+6>: add %al,(%rax)
0x000055555556d4b8 <+8>: add %al,(%rax)
=> 0x000055555556d4ba <+10>: add %al,(%rax)
0x000055555556d4bc <+12>: add %al,(%rax)
0x000055555556d4be <+14>: add %al,(%rax)
0x000055555556d4c0 <+16>: add %al,(%rax)
0x000055555556d4c2 <+18>: add %al,(%rax)
0x000055555556d4c4 <+20>: add %al,(%rax)
0x000055555556d4c6 <+22>: add %al,(%rax)
0x000055555556d4c8 <+24>: add %al,(%rax)
0x000055555556d4ca <+26>: add %al,(%rax)
0x000055555556d4cc <+28>: add %al,(%rax)
0x000055555556d4ce <+30>: add %al,(%rax)
0x000055555556d4d0 <+32>: add %al,(%rax)
0x000055555556d4d2 <+34>: add %al,(%rax)
End of assembler dump.
(gdb)
The opcodes correspond to NUL bytes, so the mmap system call stub itself has been over-mapped.
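To see why the stub reads back as NUL bytes, here is a minimal sketch of the same effect in isolation, assuming Linux mmap semantics. It does not touch the loader: it fills one of its own pages with a marker byte (0x90, chosen arbitrarily) and then maps over it with the MAP_FIXED|MAP_ANONYMOUS flags seen in the strace output. The function name is hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if a MAP_FIXED anonymous mapping replaced an existing page
 * and left it zero-filled, 0 if the old contents survived, and -1 if a
 * system call failed.
 */
int map_fixed_zeroes_existing_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t len = (size_t) page;

    /* Stand-in for already-mapped code: one page of non-zero bytes. */
    unsigned char *victim = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (victim == MAP_FAILED)
        return -1;
    memset(victim, 0x90, len);

    /* Map over the same address; the kernel reports no error here. */
    unsigned char *fresh = mmap(victim, len, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS,
                                -1, 0);
    if (fresh == MAP_FAILED) {
        munmap(victim, len);
        return -1;
    }

    /* The marker bytes are gone; every byte now reads as zero. */
    int zeroed = (fresh == victim);
    for (size_t i = 0; zeroed && i < len; i++) {
        if (fresh[i] != 0)
            zeroed = 0;
    }

    munmap(fresh, len);
    return zeroed;
}
```

Calling this from a small main and printing the result should give 1 on Linux. In the crash above the replaced page holds the code that is still executing, so the first instruction after the system call returns is fetched from the new mapping.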
ldd is affected as well, and this use case could eventually be addressed by Carlos' eu-ldd. I'm resolving this as CLOSED/UPSTREAM because there is no downstream work until we have an upstream fix or a suitable replacement for ldd. |