Bug 1126199

Summary: qemu is mis-linked on aarch64 when PIE+RELRO+combreloc
Product: [Fedora] Fedora
Reporter: Richard W.M. Jones <rjones>
Component: binutils
Assignee: Kyle McMartin <kmcmartin>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: rawhide
CC: agk, amit.shah, berrange, cfergeau, dwmw2, itamar, jakub, kmcmartin, nickc, pbonzini, pbrobinson, peterm, rjones, scottt.tw, virt-maint
Target Milestone: ---   
Target Release: ---   
Hardware: aarch64   
OS: Unspecified   
Whiteboard:
Fixed In Version: binutils-2.24-22.fc22
Doc Type: Bug Fix
Last Closed: 2014-08-22 02:38:45 UTC
Type: Bug
Bug Blocks: 922257    
Attachments:
Description          Flags
cpus.o-no-opt.txt    none
cpus.o-opt.txt       none

Description Richard W.M. Jones 2014-08-03 13:16:50 UTC
Description of problem:

When qemu is compiled with optimizations enabled on aarch64,
the resulting binary crashes reliably here:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  qemu_kvm_cpu_thread_fn (arg=0x556cefd880)
    at /usr/src/debug/qemu-2.1.0/cpus.c:858
858	    current_cpu = cpu;

#0  0x00000055670bccbc in qemu_kvm_cpu_thread_fn (arg=0x556cefd880)
    at /usr/src/debug/qemu-2.1.0/cpus.c:858
#1  0x0000007f8d92f04c in start_thread (arg=0x7f88add550)
    at pthread_create.c:312
#2  0x0000007f8b25e590 in thread_start ()
    at ../sysdeps/unix/sysv/linux/aarch64/clone.S:89

Version-Release number of selected component (if applicable):

qemu 2.1.0

How reproducible:

100%

Steps to Reproduce:
1. Start any VM.

Additional info:

The current_cpu macro accesses a Thread-Local Storage (TLS)
variable, which might be relevant.
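
One way to test the TLS guess is to look at the TLS-related relocations in the built binary. The sketch below is illustrative and not from the original report: list_tls_relocs is a hypothetical helper name, and it assumes that binutils' readelf is installed and that TLS relocation type names contain the string "TLS" (on aarch64, for example, R_AARCH64_TLSDESC).

```shell
# Hypothetical helper (not part of qemu or binutils): print the TLS-related
# relocation entries of an ELF file, one per line.
list_tls_relocs() {
    # readelf --relocs dumps every relocation section; --wide avoids
    # truncating long symbol and type names.
    readelf --relocs --wide "$1" 2>/dev/null | grep 'TLS'
}

# Example (the path is an assumption; point it at your own build):
#   list_tls_relocs ./aarch64-softmmu/qemu-system-aarch64
```

Comparing this output between a working and a crashing build may show whether the relocations for the TLS variables differ.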

Comment 1 Richard W.M. Jones 2014-08-03 14:37:30 UTC
Created attachment 923631 [details]
cpus.o-no-opt.txt

Compiled code with no optimization (working).

Comment 2 Richard W.M. Jones 2014-08-03 14:38:07 UTC
Created attachment 923632 [details]
cpus.o-opt.txt

Compiled code with optimizations and PIE (not working).

Comment 3 Richard W.M. Jones 2014-08-04 11:17:49 UTC
I added this patch to qemu in Rawhide to temporarily work
around the issue while we try to work out what's going on:

http://pkgs.fedoraproject.org/cgit/qemu.git/commit/?id=a6c45000fe26a552c7f72ba90e5ebfb9d27ffb90

Comment 4 Richard W.M. Jones 2014-08-04 13:36:14 UTC
Kyle McMartin asked me to try -mtls-dialect=trad.  However it
crashes in the same place.

Note that I'm only guessing that it's to do with TLS.  It could
be something completely different.

Comment 5 Richard W.M. Jones 2014-08-15 20:44:45 UTC
The reproducer for this is as follows:

Check out qemu from git.

./configure \
  --target-list="aarch64-softmmu" \
  --extra-cflags="-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches" \
  --extra-ldflags="-Wl,-z,relro -Wl,-z,now" \
  --enable-kvm

make

You will need a kernel (any kernel) which is uncompressed, so do
something like:

zcat /boot/vmlinuz-3.WHATEVER.fc22.aarch64 > /tmp/vmlinux

Then try to boot the kernel in qemu:

gdb --args ./aarch64-softmmu/qemu-system-aarch64 -nodefaults \
  -machine virt,accel=kvm -kernel /tmp/vmlinux -monitor none -serial stdio

and gdb will catch the segfault.

Note that I am using an aarch64 host running Fedora Rawhide.

Comment 6 Kyle McMartin 2014-08-20 18:30:24 UTC
http://arm.koji.fedoraproject.org/koji/taskinfo?taskID=2568981
http://arm.koji.fedoraproject.org/koji/taskinfo?taskID=2568986

Can you try both of these builds and let me know which ones work? I think I've narrowed the problem down, but it's a bit nasty.

Comment 7 Richard W.M. Jones 2014-08-20 21:32:40 UTC
The bz1126199jkkm1 package:

error: kvm run failed Bad address

This error is followed by a call to abort(), so the process
is killed with SIGABRT (signal 6) rather than segfaulting:

(gdb) bt
#0  0x0000007fb549d098 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/unix/sysv/linux/raise.c:55
#1  0x0000007fb549ee0c in __GI_abort () at abort.c:89
#2  0x0000005589d51f18 in kvm_cpu_exec (cpu=cpu@entry=0x558a7d1f60)
    at /usr/src/debug/qemu-2.1.0/kvm-all.c:1727
#3  0x0000005589d40dcc in qemu_kvm_cpu_thread_fn (arg=0x558a7d1f60)
    at /usr/src/debug/qemu-2.1.0/cpus.c:874
#4  0x0000007fb7d4604c in start_thread (arg=0x7fb2f38550)
    at pthread_create.c:312
#5  0x0000007fb554b590 in thread_start ()
    at ../sysdeps/unix/sysv/linux/aarch64/clone.S:89
(gdb) frame 2
#2  0x0000005589d51f18 in kvm_cpu_exec (cpu=cpu@entry=0x558a7d1f60)
    at /usr/src/debug/qemu-2.1.0/kvm-all.c:1727
1727	            abort();
(gdb) frame 3
#3  0x0000005589d40dcc in qemu_kvm_cpu_thread_fn (arg=0x558a7d1f60)
    at /usr/src/debug/qemu-2.1.0/cpus.c:874
874	            r = kvm_cpu_exec(cpu);
(gdb) print cpu
$1 = (CPUState *) 0x558a7d1f60
(gdb) print *cpu
$2 = {
  parent_obj = {
    parent_obj = {
      class = 0x558a7d1d90, 
      free = 0x7fb7c1f564 <g_free>, 
      properties = {
        tqh_first = 0x558a7c1960, 
        tqh_last = 0x558a7e0038
      }, 
      ref = 2, 
      parent = 0x558a7e6990
    }, 
    id = 0x0, 
    realized = true, 
    pending_deleted_event = false, 
    opts = 0x0, 
    hotplugged = 0, 
    parent_bus = 0x0, 
    gpios = {
      lh_first = 0x558a7e0440
    }, 
    child_bus = {
      lh_first = 0x0
    }, 
    num_child_bus = 0, 
    instance_id_alias = -1, 
    alias_required_for_version = 0
  }, 
  nr_cores = 1, 
  nr_threads = 1, 
  numa_node = 0, 
  thread = 0x558a7ecb60, 
  thread_id = 3556, 
  host_tid = 0, 
  running = false, 
  halt_cond = 0x558a7ecb80, 
  queued_work_first = 0x0, 
  queued_work_last = 0x0, 
  thread_kicked = false, 
  created = true, 
  stop = false, 
  stopped = false, 
  exit_request = 0, 
  interrupt_request = 0, 
  singlestep_enabled = 0, 
  icount_extra = 0, 
  jmp_env = {{
      __jmpbuf = {0 <repeats 22 times>}, 
      __mask_was_saved = 0, 
      __saved_mask = {
        __val = {0 <repeats 16 times>}
      }
    }}, 
  as = 0x558a291b28 <address_space_memory>, 
  tcg_as_listener = 0x0, 
  env_ptr = 0x558a7da218, 
  current_tb = 0x0, 
  tb_jmp_cache = {0x0 <repeats 4096 times>}, 
  gdb_regs = 0x558a7ecb30, 
  gdb_num_regs = 68, 
  gdb_num_g_regs = 34, 
  node = {
    tqe_next = 0x0, 
    tqe_prev = 0x558a2231f0 <cpus>
  }, 
  breakpoints = {
    tqh_first = 0x0, 
    tqh_last = 0x558a7da1a8
  }, 
  watchpoints = {
    tqh_first = 0x0, 
    tqh_last = 0x558a7da1b8
  }, 
  watchpoint_hit = 0x0, 
  opaque = 0x0, 
  mem_io_pc = 0, 
  mem_io_vaddr = 0, 
  kvm_fd = 10, 
  kvm_vcpu_dirty = false, 
  kvm_state = 0x558a7bfba0, 
  kvm_run = 0x7fb7fd9000, 
  cpu_index = 0, 
  halted = 0, 
  icount_decr = {
    u32 = 0, 
    u16 = {
      low = 0, 
      high = 0
    }
  }, 
  can_do_io = 0, 
  exception_index = 0, 
  tcg_exit_req = 0
}

Comment 8 Richard W.M. Jones 2014-08-20 21:41:46 UTC
OK let's ignore the previous comment.  I checked back with
unoptimized qemu from git and that is now failing in the
same way as above on this machine.

Comment 9 Richard W.M. Jones 2014-08-20 22:03:16 UTC
This time with a working kernel.

The bz1126199jkkm1 package works.

The bz1126199jkkm2 package works.

Comment 10 Kyle McMartin 2014-08-20 23:20:07 UTC
Spiffy, this is going to be fun to debug... Thanks Richard, just wanted to double check that you were seeing the same results, since the issue is weird. :)

Comment 11 Kyle McMartin 2014-08-22 00:25:15 UTC
OK, it appears to be fixed with upstream binutils... I'll work on identifying a fix.

A workaround for the moment is to link with -Wl,-z,nocombreloc, which avoids sorting the .rela entries; that seems to result in the right GOT entries for the TLS variables.
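
As a rough sketch of applying the workaround to the reproducer from comment 5: the snippet below appends the flag to the linker flags used there and prints the resulting configure command. The variable name EXTRA_LDFLAGS is only illustrative, and the other configure options are copied from the reproducer (with --extra-cflags omitted for brevity), not re-tested here.

```shell
# Linker flags from the reproducer in comment 5.
EXTRA_LDFLAGS="-Wl,-z,relro -Wl,-z,now"

# Append the workaround: ask ld not to combine and sort dynamic relocations.
EXTRA_LDFLAGS="$EXTRA_LDFLAGS -Wl,-z,nocombreloc"

# Print the configure command to run from the qemu source checkout.
printf './configure --target-list="aarch64-softmmu" --extra-ldflags="%s" --enable-kvm\n' \
    "$EXTRA_LDFLAGS"
```

Printing the command instead of running it keeps the snippet harmless outside a qemu checkout; paste the printed line into the source directory to use it.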

regards, Kyle

Comment 12 Kyle McMartin 2014-08-22 02:17:46 UTC
the fix is:

commit f44a1f8e513b37bcc52ba9ea0c172c3e94852756
Author: Christophe Lyon <christophe.lyon>
Date:   Tue Jan 14 15:53:50 2014 +0100

    2014-01-14  Michael Hudson-Doyle  <michael.hudson>
            Kugan Vivekanandarajah  <kugan.vivekanandarajah>
    
        bfd/
        * elfnn-aarch64.c (elfNN_aarch64_final_link_relocate): Use correct
        offset while calculating relocation address.
        (elfNN_aarch64_create_small_pltn_entry): Likewise.
        (elfNN_aarch64_init_small_plt0_entry): Likewise.

I'll commit it to binutils after I do a bit more testing.

Comment 13 Kyle McMartin 2014-08-22 02:38:45 UTC
test results look good, pushed.

Comment 14 Richard W.M. Jones 2014-08-22 08:15:29 UTC
Thanks Kyle!

I have verified that a self-compiled binutils 2.24-22 fixes the
problem for me.
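
For anyone checking their own machine, here is a minimal sketch that compares the installed binutils package against the "Fixed In Version" build listed for this bug. It assumes an rpm-based Fedora system with GNU sort (for sort -V); version_at_least is a hypothetical helper, and the comparison is only meaningful for Fedora's own binutils package numbering.

```shell
# Hypothetical helper: succeed if version $1 is at least version $2,
# using GNU sort's version ordering.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

REQUIRED="2.24-22"   # from "Fixed In Version: binutils-2.24-22.fc22"

# Only query the package database if rpm is available and binutils is installed.
if command -v rpm >/dev/null 2>&1 &&
   INSTALLED="$(rpm -q --qf '%{VERSION}-%{RELEASE}' binutils 2>/dev/null)"; then
    INSTALLED="${INSTALLED%.fc*}"   # drop the Fedora dist tag, e.g. ".fc22"
    if version_at_least "$INSTALLED" "$REQUIRED"; then
        echo "binutils $INSTALLED is at or after the fixed build $REQUIRED"
    else
        echo "binutils $INSTALLED is older than the fixed build $REQUIRED"
    fi
fi
```

Binaries linked with an older binutils would still need to be rebuilt after upgrading.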