Bug 1381315 - SIGSEGV when executable lacks PT_DYNAMIC
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: 25
Hardware: All
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: glibc team
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-10-03 16:31 UTC by John Reiser
Modified: 2017-12-12 10:05 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-12 10:05:09 UTC
Type: Bug


Description John Reiser 2016-10-03 16:31:13 UTC
Description of problem: Explicitly running ld-linux-x86-64.so.2 on an ELF executable file that lacks PT_DYNAMIC causes a SIGSEGV in ld-linux.


Version-Release number of selected component (if applicable):
glibc-2.24-3.fc25.x86_64


How reproducible: every time


Steps to Reproduce:
1. Create an ELF executable file that lacks PT_DYNAMIC, such as by running 'upx' on a copy of /bin/date.
2. /lib64/ld-linux-x86-64.so.2 date.upx

Actual results:
Program received signal SIGSEGV, Segmentation fault.
_dl_relocate_object (scope=0x55555577b488, reloc_mode=<optimized out>, 
    consider_profiling=consider_profiling@entry=0) at dl-reloc.c:232
232	    const char *strtab = (const void *) D_PTR (l, l_info[DT_STRTAB]);
(gdb) bt
#0  _dl_relocate_object (scope=0x55555577b488, reloc_mode=<optimized out>, 
    consider_profiling=consider_profiling@entry=0) at dl-reloc.c:232
#1  0x0000555555558051 in dl_main (phdr=<optimized out>, phnum=<optimized out>, 
    user_entry=<optimized out>, auxv=<optimized out>) at rtld.c:2066
#2  0x000055555556d91f in _dl_sysdep_start (start_argptr=start_argptr@entry=0x7fffffffdfa0, 
    dl_main=dl_main@entry=0x5555555559a0 <dl_main>) at ../elf/dl-sysdep.c:249
#3  0x0000555555558f68 in _dl_start_final (arg=0x7fffffffdfa0) at rtld.c:305
#4  _dl_start (arg=0x7fffffffdfa0) at rtld.c:411
#5  0x0000555555554cd8 in _start ()
(gdb) x/i $pc
=> 0x5555555604a1 <_dl_relocate_object+161>:	mov    0x8(%rax),%rax
(gdb) l
227	
228	  {
229	    /* Do the actual relocation of the object's GOT and other data.  */
230	
231	    /* String table object symbols.  */
=>232	    const char *strtab = (const void *) D_PTR (l, l_info[DT_STRTAB]);
233	
234	    /* This macro is used as a callback from the ELF_DYNAMIC_RELOCATE code.  */
235	#define RESOLVE_MAP(ref, version, r_type) \
236	    ((ELFW(ST_BIND) ((*ref)->st_info) != STB_LOCAL			      \



Expected results: no SIGSEGV from ld-linux.


Additional info:
Here is a patch which fixes this specific case.  In general this patch is not sufficient because individual elements such as DT_STRTAB could be omitted from the PT_DYNAMIC, but the patch does handle the case when there is no PT_DYNAMIC at all. 

--- rtld.c	2016-08-18 05:12:39.000000000 -0700
+++ /tmp/rtld.c	2016-10-03 09:15:09.989481696 -0700
@@ -2062,7 +2062,7 @@
 	  /* Also allocated with the fake malloc().  */
 	  l->l_free_initfini = 0;
 
-	  if (l != &GL(dl_rtld_map))
+	  if (l != &GL(dl_rtld_map) && l->l_ld)
 	    _dl_relocate_object (l, l->l_scope, GLRO(dl_lazy) ? RTLD_LAZY : 0,
 				 consider_profiling);
 

=====
For the general case it is necessary to guard every instance of D_PTR with something such as

--- sysdeps/generic/ldsodefs.h	2016-08-18 03:00:11.000000000 -0700
+++ /tmp/ldsodefs.h	2016-10-03 09:23:51.737284490 -0700
@@ -58,9 +58,9 @@
   most architectures the entry is already relocated - but for some not
   and we need to relocate at access time.  */
 #ifdef DL_RO_DYN_SECTION
-# define D_PTR(map, i) ((map)->i->d_un.d_ptr + (map)->l_addr)
+# define D_PTR(map, i) (!(map)->i ? NULL : ((map)->i->d_un.d_ptr + (map)->l_addr))
 #else
-# define D_PTR(map, i) (map)->i->d_un.d_ptr
+# define D_PTR(map, i) (!(map)->i ? NULL : ((map)->i->d_un.d_ptr))
 #endif
 
 /* Result of the lookup functions and how to retrieve the base address.  */

=====
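The two guards proposed above can be illustrated in isolation. The following is only a minimal sketch under stated assumptions: toy_dyn, toy_link_map, TOY_D_PTR, toy_needs_relocation and toy_strtab are hypothetical stand-ins invented for this illustration, not glibc's real ElfW(Dyn), struct link_map or D_PTR. The toy macro yields integer 0 in the "missing" arm so that both arms of the conditional have the same type; a conditional that mixes NULL with an integer d_ptr value may draw a compiler diagnostic.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins (assumptions for illustration only); just the fields
   needed here are modelled.  */
#define TOY_DT_STRTAB 5         /* Same value as DT_STRTAB in <elf.h>.  */
#define TOY_DT_NUM 35           /* Arbitrary table size for the sketch.  */

struct toy_dyn
{
  int64_t d_tag;
  union { uint64_t d_val; uint64_t d_ptr; } d_un;
};

struct toy_link_map
{
  uint64_t l_addr;
  struct toy_dyn *l_ld;                 /* NULL when there is no PT_DYNAMIC.  */
  struct toy_dyn *l_info[TOY_DT_NUM];   /* NULL for tags that are absent.  */
};

/* Guarded accessor in the spirit of the second patch: yield 0 instead of
   dereferencing a missing l_info entry.  */
#define TOY_D_PTR(map, i) \
  ((map)->i == NULL ? (uint64_t) 0 : (map)->i->d_un.d_ptr)

/* Guard in the spirit of the first patch: an object without a dynamic
   section has nothing to relocate.  */
int
toy_needs_relocation (const struct toy_link_map *map)
{
  return map->l_ld != NULL;
}

/* Return the string table address, or NULL if the object has none.  */
const char *
toy_strtab (const struct toy_link_map *map)
{
  return (const char *) (uintptr_t) TOY_D_PTR (map, l_info[TOY_DT_STRTAB]);
}
```

With a zero-initialised toy_link_map, toy_needs_relocation() returns 0 and toy_strtab() returns NULL, where an unguarded accessor would dereference a null pointer.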

Comment 1 Jakub Jelinek 2016-10-03 16:41:46 UTC
Why is anything needed?  There are many ways to construct invalid ELF objects, and IMO the dynamic linker doesn't need to waste time trying to avoid crashing on garbage.  You are trying to run it, so a crash is always possible, e.g. because the executable itself does something that crashes.

Comment 2 John Reiser 2016-10-03 16:54:00 UTC
Such an ELF executable is valid; it just happens to need no relocation or other assistance from ld-linux, so in particular there is no PT_DYNAMIC.  There is no problem running the executable directly using execve().  ld-linux should not cause SIGSEGV when running it.
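Whether a given file falls into this category can be checked by scanning its program headers. The sketch below is a hypothetical helper (elf64_has_phdr is not a glibc or kernel function) that assumes an ELF64 image already read into memory, in the host's byte order, and uses the type and constant definitions from <elf.h>.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return 1 if the ELF64 image has a program header
   of the given type, 0 if it has none, and -1 if the image is truncated
   or is not an ELF64 file.  Byte order is assumed to match the host.  */
int
elf64_has_phdr (const unsigned char *image, size_t size, uint32_t type)
{
  Elf64_Ehdr ehdr;

  if (size < sizeof ehdr)
    return -1;
  memcpy (&ehdr, image, sizeof ehdr);   /* Avoid unaligned access.  */

  if (memcmp (ehdr.e_ident, ELFMAG, SELFMAG) != 0
      || ehdr.e_ident[EI_CLASS] != ELFCLASS64)
    return -1;
  if (ehdr.e_phnum == 0)
    return 0;
  if (ehdr.e_phentsize != sizeof (Elf64_Phdr))
    return -1;

  /* The whole program header table must lie inside the image.  */
  if (ehdr.e_phoff > size
      || (size - ehdr.e_phoff) / sizeof (Elf64_Phdr) < ehdr.e_phnum)
    return -1;

  for (size_t i = 0; i < ehdr.e_phnum; i++)
    {
      Elf64_Phdr phdr;
      memcpy (&phdr, image + ehdr.e_phoff + i * sizeof phdr, sizeof phdr);
      if (phdr.p_type == type)
        return 1;
    }
  return 0;
}
```

Calling it with PT_DYNAMIC answers the question raised in this report; calling it with PT_INTERP shows whether the kernel would involve a dynamic loader at all when the file is run directly.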

Comment 3 Carlos O'Donell 2016-10-07 06:34:16 UTC
(In reply to John Reiser from comment #2)
> Such a ELF executable is valid; it just happens to need no relocation or
> other assistance from ld-linux, so in particular there is no PT_DYNAMIC. 
> There is no problem running the executable directly using execve(). 
> ld-linux should not cause SIGSEGV when running it.

ELF validity is a question that can only be answered given a specific use case.

Your use case appears to be upx so let us go with that.

In Fedora Rawhide upx appears broken. In fact the test case of upx should never involve the dynamic loader because without a PT_INTERP the kernel runs the executable directly.

rpm -qa | grep upx
upx-3.91-7.fc24.x86_64

cp /bin/date .
upx ./date

eu-readelf -a -W ./date 
eu-readelf: failed reading './date': (null)

Using binutils readelf (tolerant of garbage) we see:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0] <no-name>         NULL            000120d0000120d0 b900000238 ff21fbf600000002 19901740760ecb6f     1179403647 65794 1094902530871001091
  [ 1] <no-name>         LOUSER+0x5d602c09 f88f21764b072705 6ed9b23c00670801 1c070f02380304 e8ab7ecdb2133720 WAXMSIOTxxxoxxxxxxxxx 2244480706 1281 2249689869432521
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)

Total garbage.

It contains a real PT_DYNAMIC, but the binary has been completely mangled by upx without correctly updating the ELF information, and so it crashes.

In Fedora 23 it works just fine:

cp /bin/date ./date
upx ./date
./date
Fri Oct  7 02:25:30 EDT 2016

Copying the resulting F23 upx built binary to Rawhide:
./date
Thu Oct  6 23:26:11 PDT 2016

It also works fine, showing there is no problem.

And the dynamic loader is smart enough _not_ to load it (combination of ET_DYN without PT_DYNAMIC):
/lib64/ld-linux-x86-64.so.2 ./date
./date: error while loading shared libraries: ./date: object file has no dynamic section

So for now this looks like a upx bug.

Do you have any other reproducers you want to talk about?

Comment 4 John Reiser 2016-10-09 12:10:10 UTC
Here is a reproducer of the same problem on i686:

===== upxtest.c
int const x[10000] = {1,2,3};  /* highly compressible, and in .text */
int main(int argc, char *argv[]) { return x[1]; }
=====
# dnf install glibc.i686 glibc-devel.i686 libgcc.i686  # i686 runtime support
$ gcc -m32 -o upxtest.i686 upxtest.c
$ upx upxtest.i686
$ gdb /lib/ld-linux.so.2
(gdb) run ./upxtest.i686
Starting program: /usr/lib/ld-linux.so.2 ./upxtest.i686

Program received signal SIGSEGV, Segmentation fault.
_dl_relocate_object (scope=0x5657bab8, reloc_mode=1, consider_profiling=0)
    at dl-reloc.c:232
232	    const char *strtab = (const void *) D_PTR (l, l_info[DT_STRTAB]);

(gdb) l
227	
228	  {
229	    /* Do the actual relocation of the object's GOT and other data.  */
230	
231	    /* String table object symbols.  */
232	    const char *strtab = (const void *) D_PTR (l, l_info[DT_STRTAB]);
233	
234	    /* This macro is used as a callback from the ELF_DYNAMIC_RELOCATE code.  */
235	#define RESOLVE_MAP(ref, version, r_type) \
236	    ((ELFW(ST_BIND) ((*ref)->st_info) != STB_LOCAL			      \

(gdb) bt
#0  _dl_relocate_object (scope=0x5657bab8, reloc_mode=1, consider_profiling=0)
    at dl-reloc.c:232
#1  0x565589e0 in dl_main (phdr=<optimized out>, phnum=<optimized out>, 
    user_entry=<optimized out>, auxv=<optimized out>) at rtld.c:2066
#2  0x5656ddd9 in _dl_sysdep_start (start_argptr=0xffffd120, 
    dl_main=0x565566e0 <dl_main>) at ../elf/dl-sysdep.c:249
#3  0x56559ec2 in _dl_start_final (arg=0xffffd120) at rtld.c:305
#4  _dl_start (arg=<optimized out>) at rtld.c:411
#5  0x56555ad7 in _start ()

(gdb) x/i $pc
=> 0x56560f14 <_dl_relocate_object+132>:	mov    0x4(%eax),%eax
(gdb) p/x $eax
$1 = 0x0

[The corresponding case for x86_64 currently fails to reproduce _this_ problem because of a dispute over the validity of (PT_LOAD.p_offset > .st_size) when (0 == .p_memsz).  upx uses (PT_LOAD[data_segment].p_memsz == 0) to set the value (brk(0) = .p_vaddr).  The validity question arises for small files, where (.p_align > .st_size) because .p_align is 0x200000 (2MiB) on x86_64.  upx-3.92 will work around the issue by reducing .p_align for .data to 0x1000, which will be OK because .p_align for .text remains at 2MiB.]
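The two sides of that validity dispute can be written down as two policies. This is only a sketch with hypothetical function names (segment_ok_strict and segment_ok_lenient); it is not taken from the kernel, glibc, or upx, and it does not claim to match what any of them actually checks.

```c
#include <stdint.h>

/* Strict policy (assumption): the segment's file offset must lie within
   the file even if the segment occupies no bytes, and any file-backed
   bytes must fit before the end of the file.  */
int
segment_ok_strict (uint64_t p_offset, uint64_t p_filesz, uint64_t st_size)
{
  return p_offset <= st_size && p_filesz <= st_size - p_offset;
}

/* Lenient policy (assumption): a segment with p_memsz == 0 maps nothing,
   so its p_offset is not compared against the file size at all.  */
int
segment_ok_lenient (uint64_t p_offset, uint64_t p_filesz,
                    uint64_t p_memsz, uint64_t st_size)
{
  if (p_memsz == 0)
    return 1;
  return segment_ok_strict (p_offset, p_filesz, st_size);
}
```

For a 0x1000-byte file with an empty segment at p_offset 0x200000, the strict policy rejects the file and the lenient one accepts it.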

Comment 5 John Reiser 2016-10-09 12:32:51 UTC
(In reply to Carlos O'Donell from comment #3.  This reply provides background information, but should not derail attention to the SIGSEGV in ld-linux.)

> eu-readelf -a -W ./date 
> eu-readelf: failed reading './date': (null)
> 
> Using binutils readelf (tolerant of garbage) we see:
> 
> Section Headers:
>   [Nr] Name              Type            Address          Off    Size   ES
> Flg Lk Inf Al
>   [ 0] <no-name>         NULL            000120d0000120d0 b900000238
> ff21fbf600000002 19901740760ecb6f     1179403647 65794 1094902530871001091
>   [ 1] <no-name>         LOUSER+0x5d602c09 f88f21764b072705 6ed9b23c00670801
> 1c070f02380304 e8ab7ecdb2133720 WAXMSIOTxxxoxxxxxxxxx 2244480706 1281
> 2249689869432521
> Key to Flags:
>   W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
>   L (link order), O (extra OS processing required), G (group), T (TLS),
>   C (compressed), x (unknown), o (OS specific), E (exclude),
>   l (large), p (processor specific)
> 
> Total garbage.

The reason for "total garbage" is that libbfd and its clients do not grok (0 == .e_shnum).  An ELF file contains two views of the program: the execution view (the ElfXX_Phdr) and the linking view (the ElfXX_Shdr).  The operating system pays attention to the Phdr only; the Shdr are ignored by execve().  Speaking charitably, the readelf in binutils has good intentions of providing as much information as possible: trying to interpret .e_shoff as the start of ElfXX_Shdr table even though (0==.e_shnum).  However when (0==.e_shnum) then the printout of the "extra" Shdr should contain a warning.

upx sets (.e_shnum = 0) because upx aims for small file size of executables, and the Shdr are ignored by execution.  upx used to set (.e_shoff = 0) also, but previous versions of libbfd had an even worse understanding of that.

Incidentally, gdb cannot be used to debug some runnable programs because gdb relies on libbfd, and libbfd balks when the linking view (Shdrs) is not to its liking, even though the operating system digests the execution view (Phdrs) just fine.  Of course when (0==.e_shnum) then I don't expect symbols from the debugger, but I do expect register values, program counter, backtraces using the frame pointer, breakpoints and single stepping by instruction, etc.  I'll file a bug report about that later.

Comment 6 Fedora End Of Life 2017-11-16 18:50:52 UTC
This message is a reminder that Fedora 25 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 25. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '25'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 25 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 7 Fedora End Of Life 2017-12-12 10:05:09 UTC
Fedora 25 changed to end-of-life (EOL) status on 2017-12-12. Fedora 25 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

