Bug 1271387

Summary: systemd segfaults when starting up, possibly in 'detect_virtualization'
Product: [Fedora] Fedora
Reporter: Richard W.M. Jones <rjones>
Component: binutils
Assignee: Nick Clifton <nickc>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 24
CC: jakub, johannbg, jsynacek, lnykryn, mjuszkie, msekleta, nickc, pbrobinson, riku.voipio, s, systemd-maint, zbyszek
Target Milestone: ---   
Target Release: ---   
Hardware: aarch64   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-03-08 11:44:20 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 922257    
Attachments:
Description      Flags
serial port log  none

Description Richard W.M. Jones 2015-10-13 19:53:14 UTC
Created attachment 1082618
serial port log

Description of problem:

I don't have much detail, but my Fedora Rawhide/aarch64 machine
is now unbootable.  The last log messages are:

         Starting Switch Root...
[    4.937969] systemd-journald[225]: Received SIGTERM from PID 1 (systemd).
[    5.824280] audit_printk_skb: 105 callbacks suppressed
[    5.829396] audit: type=1403 audit(1444765672.070:46): policy loaded auid=4294967295 ses=4294967295
[    5.856387] systemd[1]: Successfully loaded SELinux policy in 245.214ms.
[    6.021822] systemd[1]: Relabelled /dev and /run in 44.831ms.
[    6.062282] systemd[1]: unhandled level 0 translation fault (11) at 0x6aa484d8a50, esr 0x92000004
[    6.071117] pgd = fffffe00c81f0000
[    6.074499] [6aa484d8a50] *pgd=0000000000000000, *pud=0000000000000000, *pmd=0000000000000000
[    6.083013] 
[    6.084498] CPU: 7 PID: 1 Comm: systemd Tainted: G        W       4.2.0-1.fc24.aarch64 #1
[    6.092634] Hardware name: AppliedMicro Mustang/Mustang, BIOS 1.1.0 Aug 26 2015
[    6.099903] task: fffffe03dc080000 ti: fffffe03dc100000 task.ti: fffffe03dc100000
[    6.107351] PC is at 0x2aabcc03d98
[    6.110734] LR is at 0x2aabcc03d80
[    6.114116] pc : [<000002aabcc03d98>] lr : [<000002aabcc03d80>] pstate: a0000000
[    6.121475] sp : 000003ffe90b1ba0
[    6.124770] x29: 000003ffe90b1bc0 x28: 000002aabcd30000 
[    6.130075] x27: 000002aabcd2f000 x26: 000003ffe90b1fc8 
[    6.135382] x25: 00052201b8524844 x24: 0000000000000005 
[    6.140687] x23: 000002aaeee79ea0 x22: 000003ffe90b1d08 
[    6.145990] x21: 000002aabcd31000 x20: 000002aabcd30000 
[    6.151295] x19: 000002aaeee79ea0 x18: 000002aabccbae38 
[    6.156599] x17: 000003ff8b38f0e0 x16: 000002aabcd2f4a0 
[    6.161904] x15: 000002aabcca3353 x14: 000002aabccc6a10 
[    6.167204] x13: 000002aabccbae38 x12: 000002aabcca3353 
[    6.172508] x11: 000002aabcca3353 x10: 000002aabcca3353 
[    6.177811] x9 : 000003ffe90b0700 x8 : 00000000000000d3 
[    6.183119] x7 : 7f7f7f7f7f7f7f7f x6 : fefefeff7dff284d 
[    6.188420] x5 : 00000000000000b0 x4 : 0000000000000000 
[    6.193724] x3 : 0000000000000004 x2 : 0000000000000014 
[    6.199027] x1 : 000003ff8b7a9a50 x0 : 000002aabcd2f000 
[    6.204332] 
[    6.206064] audit: type=1701 audit(1444765672.450:47): auid=4294967295 uid=0 gid=0 ses=4294967295 subj=system_u:system_r:init_t:s0 pid=505 comm="systemd" exe="/usr/lib/systemd/systemd" sig=11

The full messages are attached.
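
The esr value in the oops can be decoded by hand. The following is a minimal
sketch, assuming the ARMv8-A ESR_ELx layout for data aborts (EC in bits 31:26,
WnR in bit 6, DFSC in bits 5:0). The helper names are made up for illustration
and are not taken from the kernel or systemd sources.

```c
#include <assert.h>
#include <stdint.h>

/* Field extraction for an ESR_ELx value describing a data abort.
 * Layout assumed here (ARMv8-A):
 *   EC   = bits [31:26]  exception class
 *   WnR  = bit  6        1 = faulting access was a write, 0 = a read
 *   DFSC = bits [5:0]    data fault status code
 */
static inline unsigned esr_ec(uint32_t esr)   { return (esr >> 26) & 0x3f; }
static inline unsigned esr_wnr(uint32_t esr)  { return (esr >> 6) & 0x1; }
static inline unsigned esr_dfsc(uint32_t esr) { return esr & 0x3f; }

/* Hypothetical helper: name the translation-fault codes only. */
static inline const char *dfsc_name(unsigned dfsc) {
    switch (dfsc) {
    case 0x04: return "translation fault, level 0";
    case 0x05: return "translation fault, level 1";
    case 0x06: return "translation fault, level 2";
    case 0x07: return "translation fault, level 3";
    default:   return "other (not decoded by this sketch)";
    }
}
```

For esr 0x92000004 this gives EC 0x24 (data abort from a lower exception
level), WnR 0 (a read) and DFSC 0x04, which agrees with the "level 0
translation fault" text printed by the kernel.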

Version-Release number of selected component (if applicable):

Probably systemd-227-1.fc24

How reproducible:

100%

Steps to Reproduce:
1. Install systemd, reboot.

Comment 1 Richard W.M. Jones 2015-10-14 09:19:09 UTC
It was systemd-227-1.fc24 which is broken.

I recovered the system by booting it with 'init=/bin/bash', manually
bringing up LVM, network etc., and then dnf downgrading to the previous
working version of systemd.

It looks as if systemd-coredump collected a core file when dnf
installed the broken systemd, so it apparently also dumped core
during the service reload.  Hopefully this has the same root cause
as the crash on startup.  Here is the stack trace:

#0  0x000003ff935044d8 in kill () from /lib64/libc.so.6
#1  0x000002aaae9c616c in crash.lto_priv.246 (sig=11) at src/core/main.c:185
#2  <signal handler called>
#3  0x000002aaae993d98 in detect_vm () at src/basic/virt.c:263
#4  detect_virtualization () at src/basic/virt.c:410
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Comment 2 Richard W.M. Jones 2015-10-14 09:25:21 UTC
Crash happens here:

int detect_vm(void) {
        static thread_local int cached_found = _VIRTUALIZATION_INVALID;
        int r;

        if (cached_found >= 0)        <--- line 263
                return cached_found;

So this could be related to how thread-local variables are handled on aarch64.
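
For anyone who wants to isolate the pattern, here is a minimal sketch of the
same shape: a function-local, thread-local cache that is read on entry. The
names VIRT_INVALID, probe_vm and detect_vm_sketch are stand-ins invented for
this example, and __thread is used as the GCC/Clang spelling of thread-local
storage (an assumption about how thread_local expands in the systemd build).

```c
#include <assert.h>

/* Stand-in for _VIRTUALIZATION_INVALID; the actual value is an assumption. */
#define VIRT_INVALID (-1)

/* Counts how often the (hypothetical) probe really runs. */
static int probe_calls;

/* Hypothetical probe; the real function inspects the running system. */
static int probe_vm(void) {
    probe_calls++;
    return 0;
}

/* Same structure as detect_vm(): the first statement loads a static
 * thread-local variable, which is the access that faulted in the report. */
int detect_vm_sketch(void) {
    static __thread int cached_found = VIRT_INVALID;

    if (cached_found >= 0)
        return cached_found;

    cached_found = probe_vm();
    return cached_found;
}
```

Calling detect_vm_sketch() twice on the same thread should run the probe only
once. Whether a program this small reproduces the fault is untested; the crash
may depend on the TLS access model and link options used for the real build.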

Comment 3 Marcin Juszkiewicz 2015-11-19 14:01:49 UTC
*** Bug 1282392 has been marked as a duplicate of this bug. ***

Comment 4 Marcin Juszkiewicz 2015-11-19 19:41:13 UTC
The problem is even older. With systemd 226-3, "sudo systemd-nspawn -D $PWD/ROOTFS/ -b" also ends with an error:

[ 2251.691640] systemd-nspawn[1109]: unhandled level 0 translation fault (11) at 0x6aa70af64a0, esr 0x92000044
[ 2251.701357] pgd = fffffe03dfe10000
[ 2251.704739] [6aa70af64a0] *pgd=0000000000000000, *pud=0000000000000000, *pmd=0000000000000000
[ 2251.713258] 
[ 2251.714742] CPU: 0 PID: 1109 Comm: systemd-nspawn Tainted: G        W       4.3.0-1.fc24.aarch64 #1
[ 2251.723743] Hardware name: AppliedMicro Mustang/Mustang, BIOS 1.1.0 Oct 20 2015
[ 2251.731016] task: fffffe03dfcb8980 ti: fffffe03dade8000 task.ti: fffffe03dade8000
[ 2251.738464] PC is at 0x2aac8080fcc
[ 2251.741845] LR is at 0x2aac80872d4
[ 2251.745227] pc : [<000002aac8080fcc>] lr : [<000002aac80872d4>] pstate: 00000000
[ 2251.752586] sp : 000003fff808b810
[ 2251.755882] x29: 000003fff808ba80 x28: 000002aaf81210d0 
[ 2251.761189] x27: 0000000000000001 x26: 0000000000000001 
[ 2251.766493] x25: 000002aac810067c x24: 000002aaf81210d0 
[ 2251.771799] x23: 0000000000000001 x22: 0000000000000000 
[ 2251.777100] x21: 0000000000000000 x20: 0000000000000000 
[ 2251.782407] x19: 000002aaf8120030 x18: 0000000000000001 
[ 2251.787711] x17: 000003ffa86444d8 x16: 000002aac80ff940 
[ 2251.793014] x15: 0000000000000060 x14: 0000000000000000 
[ 2251.798321] x13: 0000000000000000 x12: 0000000000000000 
[ 2251.803623] x11: 0000000000000000 x10: 0000000000000000 
[ 2251.808930] x9 : 0000000000000000 x8 : 0000000000000087 
[ 2251.814232] x7 : 0000000000000000 x6 : 0000000000000001 
[ 2251.819539] x5 : 0000000000000000 x4 : 0000000000000001 
[ 2251.824842] x3 : 0000000000000000 x2 : 00000000ffffffff 
[ 2251.830151] x1 : 000003ffa89f74a0 x0 : 000002aac80ff000 
[ 2251.835454]
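
One pattern is visible in this register dump and in the one from the
description (an observation from the logged values, not something stated by
the reporters): the faulting address equals x0 + x1. In both cases x0 is a
page-aligned address close to the PC and x1 is an address near the top of user
space, so their sum points into unmapped memory. That would be consistent with
a mis-resolved thread-local access, but this interpretation is an assumption.
The arithmetic can be checked with a short sketch; the struct and function
names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Register values and fault addresses copied from the two oops logs. */
struct oops_regs {
    uint64_t x0;
    uint64_t x1;
    uint64_t fault_addr;
};

/* systemd (PID 1) crash from the description. */
static const struct oops_regs systemd_oops = {
    0x000002aabcd2f000ULL, 0x000003ff8b7a9a50ULL, 0x6aa484d8a50ULL
};

/* systemd-nspawn crash from comment 4. */
static const struct oops_regs nspawn_oops = {
    0x000002aac80ff000ULL, 0x000003ffa89f74a0ULL, 0x6aa70af64a0ULL
};

/* Returns 1 if the faulting address is exactly x0 + x1, else 0. */
static inline int fault_is_x0_plus_x1(const struct oops_regs *r) {
    return r->x0 + r->x1 == r->fault_addr;
}
```

Both logged crashes satisfy fault_is_x0_plus_x1().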

Comment 5 Zbigniew Jędrzejewski-Szmek 2015-12-07 04:32:59 UTC
Is there a fedora-developer-accessible aarch64 machine for debugging?

Comment 6 Peter Robinson 2015-12-07 06:52:30 UTC
(In reply to Zbigniew Jędrzejewski-Szmek from comment #5)
> Is there a fedora-developer-accessible aarch64 machine for debugging?

https://lists.fedoraproject.org/pipermail/arm/2015-November/010142.html

Comment 7 Richard W.M. Jones 2016-01-04 09:59:40 UTC
This post shows you how to boot an aarch64 VM on x86-64:

  https://rwmj.wordpress.com/2015/05/26/fedora-22-aarch64-virt-builder-image/

Substitute 23 for 22 in that post; everything else should work as described.

However it's super-slow.  Do we have aarch64 remote servers
available for debugging?  I know we have power machines for a
similar purpose.

Comment 8 Peter Robinson 2016-01-04 10:50:25 UTC
There are aarch64 machines in Beaker, just as there are for ppc.

Comment 9 Riku Voipio 2016-01-04 14:16:37 UTC
I've just debugged a similar issue in Debian. The bug appears to be in binutils. Results for systemd compiled with:

binutils_2.25.1-7 -> crash
binutils_2.25.51.20151113-1 -> boots fine

It would appear that a fix was applied to binutils somewhere between 2.25.1 and Git master as of 20151113, probably related to TLS relocations.

I reverified locally that systemd 228 compiled with 2.25.1 crashed and that today's snapshot of Git HEAD worked. Unfortunately I don't have time to dig out the commit for backporting.

Comment 10 Marcin Juszkiewicz 2016-01-05 10:47:10 UTC
Built binutils 2.26.51 for Fedora. Rebuilt systemd 228 with it. Installed in F23 vm, updated initramfs, rebooted.

Works.

Now the question is: when will binutils 2.26 be released?

Comment 11 Marcin Juszkiewicz 2016-01-13 13:50:00 UTC
Looks like it is not only systemd ;(

Is there any chance of getting a binutils 2.26 snapshot before the mass rebuild takes place?

[26739.411673] libvirtd[4669]: unhandled level 2 translation fault (11) at 0x2ae1e70fdda, esr 0x92000006
[26739.420907] pgd = fffffe00bedc0000
[26739.424316] [2ae1e70fdda] *pgd=0000000000000000, *pud=0000000000000000, *pmd=0000000000000000
[26739.432880] 
[26739.434386] CPU: 0 PID: 4669 Comm: libvirtd Tainted: G        W       4.4.0-0.rc8.git1.1.fc24.aarch64 #1
[26739.443833] Hardware name: AppliedMicro Mustang/Mustang, BIOS 1.1.0 Oct 20 2015
[26739.451124] task: fffffe03d52ad700 ti: fffffe03d6778000 task.ti: fffffe03d6778000
[26739.458584] PC is at 0x3ff802c1964
[26739.461975] LR is at 0x3ff802c18bc
[26739.465371] pc : [<000003ff802c1964>] lr : [<000003ff802c18bc>] pstate: 20000000
[26739.472756] sp : 000003ff76dfdc20
[26739.476065] x29: 000003ff76dfdc20 x28: 000002ab0e70fdd0 
[26739.481385] x27: 0000000031000000 x26: 000003ff76dfdd70 
[26739.486718] x25: 0000000000000000 x24: 0000000000000000 
[26739.492050] x23: 000003ff5400b3e0 x22: 0000000000000001 
[26739.497367] x21: 0000000000000003 x20: 0000000000100006 
[26739.502703] x19: 000002ab0e70fa70 x18: 000003ff540086f2 
[26739.508037] x17: 0000000000000001 x16: 0000000000000001 
[26739.513358] x15: 0000000000000004 x14: 000002ab0e70f2a0 
[26739.518694] x13: 000003ff5400a110 x12: 0000000000000006 
[26739.524030] x11: 0000000000000006 x10: 000003ff76dfde20 
[26739.529356] x9 : 000003ff76dfdae8 x8 : 000003ff540086f1 
[26739.534688] x7 : 0000000000000004 x6 : 0000000000000030 
[26739.540024] x5 : 0000000000000113 x4 : 0000000000000000 
[26739.545349] x3 : 000003ff76dfdd70 x2 : 000003ff5400b3e0 
[26739.550675] x1 : 0000000000000002 x0 : 000002ae1e70fdd0

Comment 12 Peter Robinson 2016-01-13 14:48:36 UTC
Hi Nick, any chance you could take a look at this for us?

Comment 13 Nick Clifton 2016-01-13 16:52:05 UTC
Hi Marcin,

> Is there any chance of getting a binutils 2.26 snapshot before the mass
> rebuild takes place?

2.26 should be coming out next week.  Will that be OK, or would you prefer me to create a tarball from today's current sources and upload that?

Cheers
  Nick

Comment 14 Marcin Juszkiewicz 2016-01-13 17:21:43 UTC
Yes, we can wait. We just need to have it before any mass rebuilds take place.

Comment 15 Richard W.M. Jones 2016-01-26 10:21:24 UTC
FWIW binutils 2.26 has been released:

https://release-monitoring.org/project/7981/

Comment 16 Marcin Juszkiewicz 2016-01-26 10:35:47 UTC
APM Mustang boots fine with systemd 228-7 built using binutils 2.26-2 (both built locally).

Comment 17 Marcin Juszkiewicz 2016-01-26 10:55:08 UTC
https://fedora.juszkiewicz.com.pl/20160105-systemd-binutils/ has both binutils 2.26-2 packages and systemd 228-7 built with them.

Comment 18 Nick Clifton 2016-01-29 12:48:34 UTC
Rawhide binutils appears to fix the problem.

Comment 19 Richard W.M. Jones 2016-02-03 13:52:20 UTC
(In reply to Marcin Juszkiewicz from comment #17)
> https://fedora.juszkiewicz.com.pl/20160105-systemd-binutils/ has both
> binutils 2.26-2 packages and systemd 228-7 built with them.

Can confirm that this systemd package works.

The binutils package is no longer needed since arm.koji provides
a newer version.

Comment 20 Jan Kurik 2016-02-24 13:50:19 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 24 development cycle.
Changing version to '24'.

More information and the reason for this action are here:
https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora24#Rawhide_Rebase

Comment 21 Peter Robinson 2016-03-08 11:44:20 UTC
new binutils done, new systemd now built.