Description of problem: Attempting to boot the armhfp iot disk images fails: .. [ OK ] Found device /dev/disk/by-…d-b143-46f2-bd43-da13120110ab. [ OK ] Reached target Initrd Root Device. [ OK ] Reached target Remote File Systems (Pre). [ OK ] Reached target Remote File Systems. Starting File System Check…b143-46f2-bd43-da13120110ab... [ OK ] Finished File System Check…d-b143-46f2-bd43-da13120110ab. Mounting /sysroot... [ 9.051723] EXT4-fs (vda3): mounted filesystem with ordered data mode. Opts: (null) [ OK ] Mounted /sysroot. Starting OSTree Prepare OS/... [FAILED] Failed to start OSTree Prepare OS/. See 'systemctl status ostree-prepare-root.service' for details. ● ostree-prepare-root.service - OSTree Prepare OS/ Loaded: loaded (/usr/lib/systemd/system/ostree-prepare-root.service; enabled; vendor preset: enabled) Active: failed (Result: core-dump) since Wed 2020-09-30 00:00:29 UTC; 5min ago Docs: man:ostree(1) Main PID: 768 (code=dumped, signal=SEGV) CPU: 10ms Sep 30 00:00:29 fedora systemd[1]: ostree-prepare-root.service: Failed with result 'core-dump'. Sep 30 00:00:29 fedora systemd[1]: ostree-prepare-root.service: Service will not restart (restart setting) Sep 30 00:00:29 fedora systemd[1]: ostree-prepare-root.service: Changed start -> failed Sep 30 00:00:29 fedora systemd[1]: ostree-prepare-root.service: Job 538 ostree-prepare-root.service/start finished, result=failed Sep 30 00:00:29 fedora systemd[1]: Failed to start OSTree Prepare OS/. Sep 30 00:00:29 fedora systemd[1]: ostree-prepare-root.service: Unit entered failed state. Sep 30 00:00:29 fedora systemd[1]: ostree-prepare-root.service: Triggering OnFailure= dependencies. Sep 30 00:00:29 fedora systemd[1]: ostree-prepare-root.service: Consumed 10ms CPU time. Sep 30 00:00:29 fedora systemd[1]: ostree-prepare-root.service: Control group is empty. Sep 30 00:00:30 fedora systemd[1]: ostree-prepare-root.service: Changed dead -> failed Version-Release number of selected component (if applicable): ostree-2020.6-5.fc33 How reproducible: Everytime
Created attachment 1721588 [details] journalctl
From the logs: Sep 30 00:02:21 fedora systemd[2690]: ostree-prepare-root.service: Executing: /usr/lib/ostree/ostree-prepare-root /sysroot Sep 30 00:02:21 fedora systemd[1]: systemd-journald.service: Got notification message from PID 219 (FDSTORE=1) Sep 30 00:02:21 fedora systemd[1]: systemd-journald.service: Added fd 40 (n/a) to fd store. Sep 30 00:02:21 fedora systemd[1]: systemd-journald.service: Got notification message from PID 219 (FDSTORE=1) Sep 30 00:02:21 fedora systemd[1]: systemd-journald.service: Added fd 41 (n/a) to fd store. Sep 30 00:02:21 fedora audit[2690]: ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=2690 comm="ostree-prepare-" exe="/usr/lib/ostree/ostree-prepare-root" sig=11 res=1 Sep 30 00:02:21 fedora systemd-coredump[2691]: Failed to connect to coredump service: No such file or directory Sep 30 00:02:21 fedora systemd[1]: Received SIGCHLD from PID 2690 (ostree-prepare-). Sep 30 00:02:21 fedora systemd[1]: Child 2690 (ostree-prepare-) died (code=dumped, status=11/SEGV)
Created attachment 1721590 [details] rdsosreport.txt
Ahh fun! The SIGSEGV and armhfp tells me this could be arch-specific (something int-size related issue maybe?). Hmm, this is going to be tricky to debug. Do you have access to the emergency console after? You should be able to try to run ostree-prepare-root manually yourself. You'll have to mount /sysroot first. So e.g.: $ mount /dev/disk/by-label/root /sysroot $ /usr/lib/ostree/ostree-prepare-root /sysroot If you can do that and the segfault reproduces, one thing you can try is to run it with strace at least so we can tell where roughly in the code to look. You can fetch strace from the host (if it's in the IoT package set), e.g.: $ /sysroot/ostree/deploy/*/deploy/*.0/usr/bin/strace /usr/lib/ostree/ostree-prepare-root /sysroot If it's not in the IoT package set, you can also transfer it via e.g. an attached USB device or from the network if booted with rd.neednet=1 (or even using libguestfs to copy it into the /boot partition of the image beforehand). Could probably take this to the next level and run `gdb.minimal --args ...` on it to get the exact spot where the segfault occurs. :)
> (something int-size related issue maybe?) Right, may be reproducible with a 32 bit x86 userspace too. Unfortunately Fedora (sensibly) doesn't ship that anymore... How reproducible is this? What version of ostree is involved? Is it only happening in F33 right now?
> Ahh fun! The SIGSEGV and armhfp tells me this could be arch-specific > (something int-size related issue maybe?). Hmm, this is going to be tricky > to debug. Yes, but we're not the first/only to use ostree for ARMv7, there's Endless and Foundries that support armv7 platforms
(To be clear I meant 32 bit *x86* doesn't make much sense IMO, 32 bit ARM is clearly here for years to come and I definitely want to support it) Anyways I tried a few things, first a basic 32 bit x86 build seems OK without warnings, but we've had bugs in the past so I added it to our CI (Debian/Ubuntu still support 32 bit x86): https://github.com/ostreedev/ostree/pull/2218 Second is looking at the file history... https://github.com/ostreedev/ostree/commits/master/src/switchroot/ostree-prepare-root.c Not much has changed here recently, the biggest is https://github.com/ostreedev/ostree/commit/35642259175973617da937f3cab6ce5f13c95077#diff-e72f49c5f255b804b75fc887dd0490bff1d6d999ac6a9476dd23d3fb85a21260 And I stared at that diff for a bit and didn't see any problems. A possible source of problems might be passing the wrong size value to getline() but that looks right to me.
(In reply to Jonathan Lebon from comment #4) > Ahh fun! The SIGSEGV and armhfp tells me this could be arch-specific > (something int-size related issue maybe?). Hmm, this is going to be tricky > to debug. > > Do you have access to the emergency console after? You should be able to try > to run ostree-prepare-root manually yourself. You'll have to mount /sysroot > first. So e.g.: > > $ mount /dev/disk/by-label/root /sysroot > $ /usr/lib/ostree/ostree-prepare-root /sysroot > > If you can do that and the segfault reproduces, one thing you can try is to > run it with strace at least so we can tell where roughly in the code to > look. You can fetch strace from the host (if it's in the IoT package set), > e.g.: > > $ /sysroot/ostree/deploy/*/deploy/*.0/usr/bin/strace > /usr/lib/ostree/ostree-prepare-root /sysroot .. readlink("/sysroot/ostree/boot.0", "boot.0.1", 4095) = 8 lstat64("/sysroot/ostree/boot.0.1", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 lstat64("/sysroot/ostree/boot.0.1/fedora-iot", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 lstat64("/sysroot/ostree/boot.0.1/fedora-iot/45ebb61204d0dfafdd5022960937239165a9feb31e6996254c80bce3f16681df", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 lstat64("/sysroot/ostree/boot.0.1/fedora-iot/45ebb61204d0dfafdd5022960937239165a9feb31e6996254c80bce3f16681df/0", {st_mode=S_IFLNK|0777, st_size=100, ...}) = 0 readlink("/sysroot/ostree/boot.0.1/fedora-iot/45ebb61204d0dfafdd5022960937239165a9feb31e6996254c80bce3f16681df/0", "../../../deploy/fedora-iot/deplo"..., 4095) = 100 lstat64("/sysroot/ostree/deploy", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 lstat64("/sysroot/ostree/deploy/fedora-iot", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 lstat64("/sysroot/ostree/deploy/fedora-iot/deploy", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 lstat64("/sysroot/ostree/deploy/fedora-iot/deploy/bc246420ec9cbfe5c0780b4285a86150e9cefa1385b5acf52dacc8ab7edb0b54.0", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 stat64("/sysroot/ostree/deploy/fedora-iot/deploy/bc246420ec9cbfe5c0780b4285a86150e9cefa1385b5acf52dacc8ab7edb0b54.0", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xfc03} --- [ 811.506475] kauditd_printk_skb: 8 callbacks suppressed [ 811.506480] audit: type=1701 audit(1601424807.528:1109): auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=3696 comm="ostree-prepare-" exe="/usr/lib/ostree/ostree-prepare-root" sig=11 res=1 +++ killed by SIGSEGV (core dumped) +++ [ 811.538309] audit: type=1701 audit(1601424807.560:1110): auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=3693 comm="strace" exe="/tmp/boot/strace" sig=11 res=1 Segmentation fault (core dumped)
Created attachment 1721875 [details] full strace output
Hmm. Most useful here would be a core dump. Use `ulimit -c unlimited` to start should work. In order to get content out of the initramfs, you can try to bring up IP networking, or one trick I've used in the past is 9p mounts. For example with coreos-assembler you can just: cosa run --bind-rw .,/run/workdir And then in the initramfs manually mount it; see this bit of code for an example https://github.com/coreos/coreos-assembler/blob/6e4108c1a984513f9adbebb95eb5564b31ca86a5/mantle/platform/conf/conf.go#L1268
OK noticed this at least: https://github.com/ostreedev/ostree/pull/2219 Though I'm not entirely convinced that's the root cause. Using %u instead of %llu would only cause print to read part of the integer, not more of it.
FEDORA-2020-52d6060686 has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-52d6060686
Proposed as a Freeze Exception for 33-final by Fedora user pbrobinson using the blocker tracking app because: FIxes and issue on ARMv7 IoT
to be clear, is the bug here "it'll always completely fail to boot on 32-bit ARM"? Or is it more specific than that?
(In reply to Adam Williamson from comment #14) > to be clear, is the bug here "it'll always completely fail to boot on 32-bit > ARM"? Or is it more specific than that? No, nothing more specific, it fails to boot to login on ARMv7
FEDORA-2020-52d6060686 has been pushed to the Fedora 33 testing repository. In short time you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-52d6060686` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-52d6060686 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
We have +3 FE votes in https://pagure.io/fedora-qa/blocker-review/issue/180 , accepting.
FEDORA-2020-52d6060686 has been pushed to the Fedora 33 stable repository. If problem still persists, please make note of it in this bug report.
Update was not set as closing the bug; is it fixed now, can we close it? Or is more required?
(In reply to Adam Williamson from comment #19) > Update was not set as closing the bug; is it fixed now, can we close it? Or > is more required? That was a mistake on my part. Yes, this issue is closed now. Not sure if CURRENTRELEASE or NEXTRELEASE is more appropriate. Going with the former since it's fixed in the same release this bug is filed against.
buhbuhhhhhhh the correct answer is "ERRATA". :)