Description of problem:

When qemu is compiled with optimizations enabled on aarch64, the qemu binary crashes reliably here:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  qemu_kvm_cpu_thread_fn (arg=0x556cefd880) at /usr/src/debug/qemu-2.1.0/cpus.c:858
858         current_cpu = cpu;
#0  0x00000055670bccbc in qemu_kvm_cpu_thread_fn (arg=0x556cefd880) at /usr/src/debug/qemu-2.1.0/cpus.c:858
#1  0x0000007f8d92f04c in start_thread (arg=0x7f88add550) at pthread_create.c:312
#2  0x0000007f8b25e590 in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:89

Version-Release number of selected component (if applicable):
qemu 2.1.0

How reproducible:
100%

Steps to Reproduce:
1. Start any VM.

Additional info:
The current_cpu macro is doing some Thread-Local Storage stuff, which might be relevant.
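For readers unfamiliar with the TLS angle, here is a minimal sketch of what an assignment like the one at cpus.c:858 does when the target is a thread-local variable. This is not QEMU's code: the names fake_cpu, current_cpu_demo, vcpu_thread_fn and tls_demo are made up for illustration, and the sketch assumes GCC or Clang's __thread keyword with POSIX threads.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-vCPU state structure. */
struct fake_cpu {
    int index;
    int seen_ok;   /* set to 1 if the thread read back its own pointer */
};

/* Each thread gets its own copy of this pointer (GCC/Clang __thread). */
static __thread struct fake_cpu *current_cpu_demo;

/* Thread body: store the argument in the thread-local slot, read it back. */
static void *vcpu_thread_fn(void *arg)
{
    struct fake_cpu *cpu = arg;

    /* A new thread starts with its own zero-initialized copy. */
    assert(current_cpu_demo == NULL);

    current_cpu_demo = cpu;   /* analogous to "current_cpu = cpu;" */
    cpu->seen_ok = (current_cpu_demo == cpu);
    return NULL;
}

/* Returns 0 if both threads saw their own value and the calling thread's
 * copy was left untouched, non-zero otherwise. */
int tls_demo(void)
{
    struct fake_cpu cpus[2] = { { 0, 0 }, { 1, 0 } };
    pthread_t threads[2];
    int created = 0;

    for (; created < 2; created++) {
        if (pthread_create(&threads[created], NULL,
                           vcpu_thread_fn, &cpus[created]) != 0)
            break;
    }
    /* Join whatever was started before inspecting the results. */
    for (int i = 0; i < created; i++)
        pthread_join(threads[i], NULL);

    if (created != 2)
        return -1;
    if (!cpus[0].seen_ok || !cpus[1].seen_ok)
        return 1;
    return current_cpu_demo == NULL ? 0 : 2;
}
```

Build with -pthread and call tls_demo() from main(). The store inside vcpu_thread_fn has to locate the calling thread's copy of the variable, and that lookup depends on TLS relocations emitted by the compiler and resolved by the linker, which is why a crash on such a line can point at the toolchain rather than at the C source.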
Created attachment 923631
cpus.o-no-opt.txt

Compiled code with no optimization (working).

Created attachment 923632
cpus.o-opt.txt

Compiled code with optimizations and PIE (not working).
I added this patch to qemu in Rawhide to temporarily work around the issue while we try to work out what's going on: http://pkgs.fedoraproject.org/cgit/qemu.git/commit/?id=a6c45000fe26a552c7f72ba90e5ebfb9d27ffb90
Kyle McMartin asked me to try -mtls-dialect=trad. However it crashes in the same place. Note that I'm only guessing that it's to do with TLS. It could be something completely different.
The reproducer for this is as follows:

Check out qemu from git.

  ./configure \
    --target-list="aarch64-softmmu" \
    --extra-cflags="-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches" \
    --extra-ldflags="-Wl,-z,relro -Wl,-z,now" \
    --enable-kvm
  make

You will need a kernel (any kernel) which is uncompressed, so do something like:

  zcat /boot/vmlinuz-3.WHATEVER.fc22.aarch64 > /tmp/vmlinux

Then try to boot the kernel in qemu:

  gdb --args ./aarch64-softmmu/qemu-system-aarch64 -nodefaults -machine virt,accel=kvm -kernel /tmp/vmlinux -monitor none -serial stdio

and gdb will catch the segfault.

Note that I am using an aarch64 host running Fedora Rawhide.
http://arm.koji.fedoraproject.org/koji/taskinfo?taskID=2568981
http://arm.koji.fedoraproject.org/koji/taskinfo?taskID=2568986

Can you try both these builds and let me know which work? I think I've narrowed the problem down, but it's a bit nasty.
The bz1126199jkkm1 package:

error: kvm run failed Bad address

This error path calls abort(), so the process is killed by SIGABRT (signal 6):

(gdb) bt
#0  0x0000007fb549d098 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55
#1  0x0000007fb549ee0c in __GI_abort () at abort.c:89
#2  0x0000005589d51f18 in kvm_cpu_exec (cpu=cpu@entry=0x558a7d1f60) at /usr/src/debug/qemu-2.1.0/kvm-all.c:1727
#3  0x0000005589d40dcc in qemu_kvm_cpu_thread_fn (arg=0x558a7d1f60) at /usr/src/debug/qemu-2.1.0/cpus.c:874
#4  0x0000007fb7d4604c in start_thread (arg=0x7fb2f38550) at pthread_create.c:312
#5  0x0000007fb554b590 in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:89
(gdb) frame 2
#2  0x0000005589d51f18 in kvm_cpu_exec (cpu=cpu@entry=0x558a7d1f60) at /usr/src/debug/qemu-2.1.0/kvm-all.c:1727
1727        abort();
(gdb) frame 3
#3  0x0000005589d40dcc in qemu_kvm_cpu_thread_fn (arg=0x558a7d1f60) at /usr/src/debug/qemu-2.1.0/cpus.c:874
874         r = kvm_cpu_exec(cpu);
(gdb) print cpu
$1 = (CPUState *) 0x558a7d1f60
(gdb) print *cpu
$2 = {
  parent_obj = {
    parent_obj = {
      class = 0x558a7d1d90,
      free = 0x7fb7c1f564 <g_free>,
      properties = { tqh_first = 0x558a7c1960, tqh_last = 0x558a7e0038 },
      ref = 2,
      parent = 0x558a7e6990
    },
    id = 0x0,
    realized = true,
    pending_deleted_event = false,
    opts = 0x0,
    hotplugged = 0,
    parent_bus = 0x0,
    gpios = { lh_first = 0x558a7e0440 },
    child_bus = { lh_first = 0x0 },
    num_child_bus = 0,
    instance_id_alias = -1,
    alias_required_for_version = 0
  },
  nr_cores = 1,
  nr_threads = 1,
  numa_node = 0,
  thread = 0x558a7ecb60,
  thread_id = 3556,
  host_tid = 0,
  running = false,
  halt_cond = 0x558a7ecb80,
  queued_work_first = 0x0,
  queued_work_last = 0x0,
  thread_kicked = false,
  created = true,
  stop = false,
  stopped = false,
  exit_request = 0,
  interrupt_request = 0,
  singlestep_enabled = 0,
  icount_extra = 0,
  jmp_env = {{
    __jmpbuf = {0 <repeats 22 times>},
    __mask_was_saved = 0,
    __saved_mask = { __val = {0 <repeats 16 times>} }
  }},
  as = 0x558a291b28 <address_space_memory>,
  tcg_as_listener = 0x0,
  env_ptr = 0x558a7da218,
  current_tb = 0x0,
  tb_jmp_cache = {0x0 <repeats 4096 times>},
  gdb_regs = 0x558a7ecb30,
  gdb_num_regs = 68,
  gdb_num_g_regs = 34,
  node = { tqe_next = 0x0, tqe_prev = 0x558a2231f0 <cpus> },
  breakpoints = { tqh_first = 0x0, tqh_last = 0x558a7da1a8 },
  watchpoints = { tqh_first = 0x0, tqh_last = 0x558a7da1b8 },
  watchpoint_hit = 0x0,
  opaque = 0x0,
  mem_io_pc = 0,
  mem_io_vaddr = 0,
  kvm_fd = 10,
  kvm_vcpu_dirty = false,
  kvm_state = 0x558a7bfba0,
  kvm_run = 0x7fb7fd9000,
  cpu_index = 0,
  halted = 0,
  icount_decr = { u32 = 0, u16 = { low = 0, high = 0 } },
  can_do_io = 0,
  exception_index = 0,
  tcg_exit_req = 0
}
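As an aside on the backtrace above: frame #0 shows raise() being called with sig=6, which is what abort() does. The following is a small sketch, unrelated to QEMU's own code, that confirms which signal terminates a process that calls abort(). The helper name abort_signal_demo is hypothetical, and the sketch assumes a POSIX system with fork() and waitpid().

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that calls abort(), then return the
 * number of the signal that terminated it, or -1 on error. */
int abort_signal_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0)
        abort();   /* child: raises SIGABRT, like the abort() at kvm-all.c:1727 */

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    /* Only report a signal number if the child was killed by a signal. */
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

On Linux the returned value equals SIGABRT, so this failure is a deliberate abort after the "kvm run failed" error and not the SIGSEGV from the original report.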
OK let's ignore the previous comment. I checked back with unoptimized qemu from git and that is now failing in the same way as above on this machine.
This time with a working kernel. The bz1126199jkkm1 package works. The bz1126199jkkm2 package works.
Spiffy, this is going to be fun to debug... Thanks Richard, just wanted to double check that you were seeing the same results, since the issue is weird. :)
OK, it appears to be fixed with upstream binutils... I'll work on identifying a fix.

A workaround for the moment is to set -Wl,-z,nocombreloc to avoid sorting .rela, which seems to result in the right GOT entries for the TLS vars.

regards, Kyle
The fix is:

commit f44a1f8e513b37bcc52ba9ea0c172c3e94852756
Author: Christophe Lyon <christophe.lyon>
Date:   Tue Jan 14 15:53:50 2014 +0100

    2014-01-14  Michael Hudson-Doyle  <michael.hudson>
                Kugan Vivekanandarajah  <kugan.vivekanandarajah>

        bfd/
        * elfnn-aarch64.c (elfNN_aarch64_final_link_relocate): Use correct
        offset while calculating relocation address.
        (elfNN_aarch64_create_small_pltn_entry): Likewise.
        (elfNN_aarch64_init_small_plt0_entry): Likewise.

I'll commit it to binutils after I do a bit more testing.
test results look good, pushed.
Thanks Kyle! I have verified that a self-compiled binutils -22 fixes the problem for me.