Bug 2000247 - Problems booting off new partition after copying ext4 root FS to Stratis filesystem
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: stratisd
Version: 34
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: John Baublitz
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-09-01 16:32 UTC by Dennis Keefe
Modified: 2021-09-16 14:48 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2021-09-16 14:48:02 UTC
Type: Bug
Embargoed:



Description Dennis Keefe 2021-09-01 16:32:06 UTC
Problem taken from https://bugzilla.redhat.com/show_bug.cgi?id=1970600, comment 4

"I've tried copying (cp -a) a Fedora 34 ext4 root fs to a stratis filesystem, using a separate ext4 boot partition. With stratis* installed and stratisd enabled, I then ran dracut -a stratis to update the /boot/initramfs and tried to boot following the hints (and flags) at https://stratis-storage.github.io/stratis-rootfs/, but the boot failed with a "Failed to start D-Bus System Message Bus" error. The new initramfs boots the old ext4 root partition just fine."

Comment 1 John Baublitz 2021-09-01 17:06:32 UTC
Hi Ed!
I'm assuming from your bug report that this is failing in systemd after exiting the initramfs (this would be indicated in journald by the D-Bus service failure log occurring after the log message indicating exiting the initramfs). Can you please confirm that this is the case and also provide the output of the D-Bus service failure from journald?

If this is the case, this is likely an issue with the process of copying over the previous install to the new install. If it has indeed exited the initramfs (which, if it is trying to start D-Bus, is likely given that D-Bus is not included in the initramfs), this means that the root filesystem is being set up correctly, or it would not be able to access the D-Bus executable on your root filesystem to try to start it. I've had problems copying an entire install from one filesystem to another in certain cases, and I will try to reproduce this behavior by installing into an ext4 filesystem and copying it over with the same flags. So far, I've been installing into a Stratis filesystem using a non-standard Anaconda process, but we would like to provide instructions for copying over from another filesystem as well.

Thanks!

Comment 2 ed leaver 2021-09-01 20:42:04 UTC
Hi John

Thanks for the quick response. I'll try to provide more details
1. NVME drive, UEFI, GPT
   * 1   GB fat32 EFI
   * 2   GB ext4  boot
   * 1.3 TB /dev/stratis/stratis_b1 pool
            /dev/stratis/stratis_b1/Fedora  desired target / rootfs
            /dev/stratis/stratis_b1/Home    desired /home  (this is actually populated with some 300 GB from my other machine)
   * 16  GB free to eventually be used for a stratis cache for a rotating HDD
   * 64  GB linux swap partition
   * 400 GB LVM
            Fedora 64 GB thin-provisioned ext4 source / rootfs
            (/dev/disk/by-id/dm-name-fedora34-root)

The / partition under LVM installed fine from the Fedora 34 Live Workstation USB, although iirc I needed to delete my (empty) old LVM volume and let Anaconda recreate it from free space. I used the F34-WORK-x86_64-LIVE-20210816.iso re-spin. To fstab I added

/dev/stratis/stratis_b1/Home /home xfs defaults,x-systemd.requires=stratis-fstab-setup@[POOL_UUID],x-systemd.after=stratis-fstab-setup@[POOL_UUID] 0 2

as directed in the above link. This part works fine.
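A minimal sketch of how the fstab entry quoted above could be generated for a given pool, so that the UUID is typed only once. The helper name `stratis_fstab_line` is hypothetical, and the option string simply mirrors the form quoted in this comment; check it against the linked Stratis instructions before relying on it.

```shell
#!/bin/sh
# Sketch: print an /etc/fstab line for a Stratis filesystem, in the form quoted above.
# stratis_fstab_line is a hypothetical helper name, not part of Stratis.
# Arguments: pool UUID, device path, mount point.
stratis_fstab_line() {
    pool_uuid=$1
    device=$2
    mount_point=$3
    # Both systemd options reference the same stratis-fstab-setup@ unit instance
    printf '%s %s xfs defaults,x-systemd.requires=stratis-fstab-setup@%s,x-systemd.after=stratis-fstab-setup@%s 0 2\n' \
        "$device" "$mount_point" "$pool_uuid" "$pool_uuid"
}
```

Calling `stratis_fstab_line <pool uuid> /dev/stratis/stratis_b1/Home /home` prints one line that can be reviewed and then appended to /etc/fstab by hand.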

The EFI, boot, swap, and stratis filesystems were created without issue from the 34 Live USB after "dnf install parted stratis*" and "systemctl enable stratisd --now"

2. Here there be weeds: I boot into the livecd, install and enable stratis, mkdir some mount points under /mnt, and
   * mount /dev/stratis/stratis_b1/Fedora          /mnt/FedoraStratis
   * mount /dev/disk/by-id/dm-name-fedora34-root   /mnt/FedoraLVM
   * mount /dev/disk/by-label/Boot                 /mnt/Boot
   * cp -a /mnt/FedoraLVM/* /mnt/FedoraStratis
   * change /mnt/FedoraStratis/etc/machine-id to <new id>
   * cd /boot/loader/entries
        cp <old id>-5.13.12-200.fc34.x86_64.conf <new id>-5.13.12-200.fc34.x86_64.conf
        edit <new id>-5.13.12-200.fc34.x86_64.conf. Its new contents are

title Fedora (5.13.12-200.fc34.x86_64) 34 (Stratis Edition)
version 5.13.12-200.fc34.x86_64
linux /vmlinuz-5.13.12-200.fc34.x86_64
initrd /initramfs-5.13.12-200.fc34.x86_64.img
options root=/dev/stratis/stratis_b1/Fedora ro resume=UUID=ede1d432-feba-4a0f-8d8f-f6b27c637270 stratis.rootfs.pool_uuid=d2d74ead-14e6-4718-8fb8-4800d3870a9a rhgb quiet rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1
grub_users $grub_users
grub_arg --unrestricted
grub_class kernel

(The resume=UUID looks a bit shaky, as the one given is for the swap partition. But it was given by Anaconda, so it must be right...)
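The two options that distinguish this entry from the LVM one are root=/dev/stratis/... and stratis.rootfs.pool_uuid=. A small sketch of a sanity check for them follows; `check_stratis_entry` is a hypothetical helper, and it only confirms the options are present, not that the UUID matches the pool.

```shell
#!/bin/sh
# Sketch: check that a boot loader entry carries the two Stratis-related options.
# check_stratis_entry is a hypothetical helper; the entry path is passed by the caller.
check_stratis_entry() {
    entry=$1
    # Only the "options" line holds the kernel command line
    opts=$(grep '^options ' "$entry") || { echo "no options line"; return 1; }
    case " $opts " in
        *" root=/dev/stratis/"*) ;;
        *) echo "root= does not point at /dev/stratis/"; return 1 ;;
    esac
    case " $opts " in
        *" stratis.rootfs.pool_uuid="*) ;;
        *) echo "stratis.rootfs.pool_uuid= is missing"; return 1 ;;
    esac
    echo "ok"
}
```

Run against the new entry, for example `check_stratis_entry "/boot/loader/entries/<new id>-5.13.12-200.fc34.x86_64.conf"`, it prints "ok" or names the missing option.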

3. Last, I boot into working FedoraLVM with the same /vmlinuz-5.13.12-200.fc34.x86_64, 
   * stratis fs list to make sure stratisd is running
   * cd /boot; cp -p  initramfs-5.13.12-200.fc34.x86_64.img initramfs-5.13.12-200.fc34.x86_64.img.bak; dracut -a stratis --force

Rebooting into the new "Stratis Edition" entry fails with:
  [FAILED] Failed to start Virtual Maintainer Registration Service
  [FAILED] Failed to start D-Bus System Message Bus

...and after removing "rhgb quiet" I also get (among other stuff):
Failed to start D-Bus System Message Bus
Starting D-Bus System Message Bus
dbus-broker.service: Main process exited, code=exited, status=127/n/a
dbus-broker.service: Failed with result 'exit-code'
Failed to start D-Bus System Message Bus
dbus-broker.service: Start request repeated too quickly
dbus-broker.service: Failed with result 'exit-code'
dbus.socket: Failed with result 'service-start-limit-hit'

...and eventually grinds to a halt. I'll reboot into "LVM Edition" and see if I can't find more detailed info in the Stratis root folder's /var/messages. Meantime any suggestion you might have for copying the LVM/ext4 root partition to stratis, or boot flags in /boot/loader/entries, would be appreciated. 

The nvidia drivers are from rpmfusion-nonfree, but as I recall I saw the same problem without them.  If you'd like, I can uninstall nvidia and work from nouveau, which is fine for these purposes.

Comment 3 ed leaver 2021-09-01 21:01:13 UTC
Oh yeah. I changed the rootfs entry in the Stratis Edition's /etc/fstab:

/dev/stratis/stratis_b1/Fedora    /   xfs   defaults 0 1


-Ed

Comment 4 John Baublitz 2021-09-07 14:44:26 UTC
Hi Ed!
I apologize for the delay. What I'm really looking for is this: when the boot drops into the emergency console, can you use journalctl to determine why D-Bus is failing to start? An error message would be helpful here to determine exactly why it's failing. journalctl -xe should show error messages for the failed services; search its output for the reason D-Bus failed. Furthermore, can you confirm that the log message in journalctl that reads "Starting D-Bus System Message Bus" happens after the log message "Stopped target Initrd Root File System"?

What I'm really trying to determine here is two things:
1. That this error is occurring in early boot but after the initramfs exits
2. What the cause of the D-Bus failure is so I can potentially help you resolve it if it is an issue with the filesystem itself rather than the initramfs
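The ordering check in point 1 can be sketched as a small filter over saved log text. The function name `dbus_after_initrd` is hypothetical; it compares only the first occurrence of each of the two messages quoted above and assumes the log lines arrive in chronological order.

```shell
#!/bin/sh
# Sketch: decide from log text whether D-Bus was first started after the
# initramfs exited. dbus_after_initrd is a hypothetical helper name.
# Reads log lines on stdin; exit status 0 means "after", 1 means "not confirmed".
dbus_after_initrd() {
    awk '
        # Remember the line number of the first occurrence of each message
        /Stopped target Initrd Root File System/ && !initrd { initrd = NR }
        /Starting D-Bus System Message Bus/ && !dbus      { dbus = NR }
        END { exit !(initrd && dbus && dbus > initrd) }
    '
}
```

If the journal of the failed boot is available, something like `journalctl -b | dbus_after_initrd && echo "D-Bus started after the initramfs"` would answer the question directly.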

Filesystem configuration issues do not have to do with our work on supporting set up of Stratis pools during boot, but rather they may have to do with the process of setting up the root filesystem prior to attempting to boot from it so I may be able to help you resolve it with no code changes. Thanks so much!

Comment 5 ed leaver 2021-09-07 23:10:16 UTC
No problem John: my LVM rootfs works fine so I've plenty to do. And I do very much appreciate your help. As for the Stratis rootfs, well... I'm really looking for when the boot drops into the emergency console as well. But it never does. Just trundles along for 3410-odd seconds and then just kinda grinds to a halt.

Booting into the working LVM rootfs, mounting the Stratis rootfs under /mnt/FedoraStratis, and doing "journalctl --root=/mnt/FedoraStratis -xe" didn't help either: the only entries journalctl could recover were from the initial installation, dnf update, and nvidia-drv install from Aug 29, back when the partition still lived under LVM and before I copied it to /mnt/FedoraStratis. (Using cp -a from the fc34 live usb so as not to be copying an active rootfs.)

Today I thought I'd give runlevel=3 a try, see if FedoraStratis would boot to there.

Nope, same problems. systemd.unit=single-user.target didn't boot either, even to an emergency console. 

But single-user.target did pause early on just long enough for me to note:

[8.892] systemd[1405]: /usr/lib/systemd/system-generators/systemd-gpt-auto-generator failed with exit status 127
[8.893] systemd[1405]: /usr/lib/systemd/system-generators/stratis-setup-generator failed with exit status 1
[8.961] systemd[1]: stratisd.service: Main process exited code=exited status=1/FAILURE
[8.963] systemd[1]: Failed to start Stratis daemon

...so *if* one were to ignore that first systemd-gpt-auto-generator failure at [8.892], one could conclusively conclude the problem is with Stratis.

But I doubt you are one to ignore a systemd-gpt-auto-generator failure. Not even sure *I* am.

So although it seems to boot my LVM rootfs just fine, I wouldn't rule out an initramfs issue, at least not quite just yet. Perhaps stratis is asking systemd for partition information that LVM gets otherwise?  Your suggestions are welcome!

Comment 6 John Baublitz 2021-09-08 13:54:40 UTC
Hi Ed!
That is problematic, but also indicates that this does not have to do with the initramfs, which is ultimately what I'm getting at. For stratisd.service to even be invoked, we have long since exited the initramfs.

Just to explain in a bit more depth, the stratis-setup-generator failure that you're noting should be fixed in a forthcoming release, but the reason for this is that stratis-setup-generator is for use in the initramfs. To create the dracut initramfs, we also need this generator to be present on the host system to be copied into the initramfs. This may be able to be solved using packaging by including the binary in our stratisd-dracut package so it gets installed into the dracut directories and no longer needs to be present in the system generator directory. This would mean that after the initramfs is done, this generator would not be invoked a second time where it is guaranteed to fail. However, I need to discuss this with the team first and see how we all want to proceed. All that said, the failure you're seeing is a known problem that I intend to fix that does not affect booting at all because, if you're seeing that error message with stratisd.service next to it in the log messages, we have already moved past the point where that generator was needed, and it has completed successfully.

For now, I'm going to try to reproduce this behavior in a VM, and I'll get back to you. I'm going to do a stock LVM/ext4 install using Anaconda, copy it over to a Stratis filesystem with the command you highlighted above, and attempt to boot. My guess is that I will be able to reproduce it given my own problems getting a Stratis filesystem to boot when copying the operating system from one filesystem to another. I will do my best to come up with some commands to fix this once I reproduce this issue. Thanks so much for the information you've been able to give me so far, and I'll do my best to figure this out!

Comment 7 John Baublitz 2021-09-08 17:29:27 UTC
Hi Ed!
Just wanted to let you know that I've reproduced the issue. I was able to fix it, and as I suspected, it appears to be an SELinux issue. Running restorecon -R / in a chroot and then booting into the operating system fixed it for me. Can you try that?

Comment 8 ed leaver 2021-09-09 14:44:26 UTC
Thanks John. I tried your restorecon suggestion, to no effect. restorecon exited with status 0 but reported no changes. In a futile effort to simplify, I next made a fresh fc34 install from live usb on vanilla ext4 partition FEDORA_A1 on a different disk (/dev/sda) with different boot partition BOOT_A1. (No nvidia drivers on /dev/sda -- this is my internal backup drive.)

1. Then copied the new boot/loader/entry to the original /boot partition BOOT_B1. Ran dracut -a stratis in original /boot BOOT_B1. The new FEDORA_A1 boots fine from either location.

2. Then from original FedoraLVM, cp -a new FEDORA_A1 to another new vanilla /dev/sda4 ext4 partition FEDORA_A2.
   a. From original FedoraLVM, mount /dev/disk/by-label/FEDORA_A2 /mnt/FedoraA2; systemd-firstboot --setup-machine-id --root=/mnt/FedoraA2 --force
   b. make appropriate edits to /mnt/FedoraA2/etc/fstab
   c. make a new entry with the new machine-id for FedoraA2 in original BOOT_B1/loader/entries
   d. re-run dracut -a stratis on BOOT_B1

3. I *think* I've got all the partition UUIDs set correctly in loader/entries/<new entry> and fstab. But I must be missing something pretty basic, as the new FEDORA_A2 will not boot. Neither Stratis nor LVM is involved. This is about as vanilla as it gets.

4. When trying to boot FEDORA_A2 to systemd.unit=single-user.target (no graphical boot) everything looks good (i.e. no errors) until
   - Reached target Switch Root
   - Finished Plymouth switch root service
   - [FAILED] Failed to start Switch Root
     see 'systemctl status initrd-switch-root.service' for details

At which point the boot stops, again no emergency console, and no journalctl entry.

5. Color me dumb, but I'm sure missing something. Only thing I haven't done is eliminate -a stratis from the dracut command, but the new initramfs images seem to work fine on the Anaconda-installed FedoraLVM and FedoraA1 partitions with -a stratis, so I doubt that's a problem. I'll re-re-re-recheck my partition UUIDs later, but for now I'm stumped. This is all EFI. Is there anything grub needs to know about?

Thanks for your help!

Comment 9 John Baublitz 2021-09-14 17:48:35 UTC
Hi Ed!
Do you still have your original setup, or are you able to reproduce it? The reason I ask is that I reproduced the issue that you described in the original setup exactly, and restorecon -R / did resolve the issue for me.

Could you use the original setup and:
* give me all of the commands that you used to create your chroot
* try adding selinux=0 to the kernel command line and see if that resolves your issue
* if this does resolve your issue, remove selinux=0, add autorelabel=1 to the kernel command line, boot once, and see whether the issue stays resolved after you remove autorelabel=1 again

This does ultimately seem like an SELinux issue to me for a number of reasons, and so I'd like to try to resolve it a few different ways just to ensure that it is not that prior to trying other debugging steps.
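The kernel command line for an entry like the one in comment 2 lives on its "options" line, so the selinux=0 and autorelabel=1 experiments amount to adding and removing one argument there. A rough sketch follows; `add_karg` and `remove_karg` are hypothetical helpers that edit the entry file in place, so keep a backup copy of the entry first.

```shell
#!/bin/sh
# Sketch: add or remove a single kernel argument on the "options" line of a
# boot loader entry file. add_karg and remove_karg are hypothetical helpers.
add_karg() {
    entry=$1 arg=$2
    tmp=$(mktemp) || return 1
    # Append the argument to the end of the options line; copy other lines as-is
    awk -v arg="$arg" '/^options / { print $0 " " arg; next } { print }' \
        "$entry" > "$tmp" && cat "$tmp" > "$entry"
    rm -f "$tmp"
}

remove_karg() {
    entry=$1 arg=$2
    tmp=$(mktemp) || return 1
    # Rebuild the options line without any field equal to the argument
    awk -v arg="$arg" '
        /^options / {
            line = $1
            for (i = 2; i <= NF; i++) if ($i != arg) line = line " " $i
            print line; next
        }
        { print }
    ' "$entry" > "$tmp" && cat "$tmp" > "$entry"
    rm -f "$tmp"
}
```

The sequence suggested above would then be `add_karg <entry> selinux=0`, a test boot, `remove_karg <entry> selinux=0`, `add_karg <entry> autorelabel=1`, one more boot, and finally `remove_karg <entry> autorelabel=1`.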

Thanks so much!

Comment 10 ed leaver 2021-09-14 20:33:39 UTC
Hi John, and thanks for getting back.

I copied my stratis rootfs from a working LVM installation as described above in https://bugzilla.redhat.com/show_bug.cgi?id=2000247#c2

For the chroot, from my LVM fc34 install (the one that works) I
1. sudo mount /dev/stratis/stratis_b/Fedora /mnt/FedoraStratis (this is the rootfs)
2. sudo su
3. chroot /mnt/FedoraStratis
4. restorecon -rp /

As described above, restorecon immediately exited (exit 0) and did not report any changes.

However, your latest selinux=0 boot flag worked fine. Following up with autorelabel=1 (without selinux=0) seemed to fix the problem permanently...

...until I broke it again by booting my newer ext4 test partition FEDORA_A2 with both selinux=0 and autorelabel=1. Which fixed FEDORA_A2, but broke /dev/stratis/stratis_b/Fedora. Oh well.

But another autorelabel=1 refixed /dev/stratis/stratis_b/Fedora, and I've now got four bootable fc34 partitions on two different disks.

I'm still writing this from the LVM partition; I'll dnf update the stratis rootfs later and hope to use it permanently. (Unless I need the LVM space for VMs, I'll keep the LVM fc34 install at least until next summer "just to be safe")

I think this ticket can be closed. Thanks again for all your help!

Comment 11 John Baublitz 2021-09-16 14:48:02 UTC
Hi Ed!
Okay, that's what I suspected. Just for future reference, I'm not entirely sure if this is the reason the chroot restorecon didn't work, but I typically run the following commands before the chroot command just to ensure that utilities operate correctly:
# mount -t proc proc /path/to/chroot/proc
# mount --rbind /dev /path/to/chroot/dev
# mount --rbind /sys /path/to/chroot/sys
# mount --rbind /run /path/to/chroot/run

I did these additional steps and restorecon -R / fixed my issue so my guess is that these additional mounts in the chroot are required for restorecon to operate properly.
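The mounts above and the relabel can be put together as one function. This is only a sketch: `relabel_in_chroot` and the `DRY_RUN` variable are hypothetical names, and by default the function prints the commands instead of running them so they can be reviewed first.

```shell
#!/bin/sh
# Sketch: the mounts from this comment followed by the relabel, as one function.
# relabel_in_chroot is a hypothetical helper. With DRY_RUN=1 (the default here)
# it only prints the commands; set DRY_RUN=0 and run as root to execute them.
relabel_in_chroot() {
    root=$1
    run() {
        if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
    }
    # Stop at the first failing step so restorecon never runs in a half-prepared chroot
    run mount -t proc proc "$root/proc" &&
    run mount --rbind /dev "$root/dev" &&
    run mount --rbind /sys "$root/sys" &&
    run mount --rbind /run "$root/run" &&
    run chroot "$root" restorecon -R /
}
```

For the layout in this report the call would be `relabel_in_chroot /mnt/FedoraStratis`, with the Stratis root filesystem already mounted at that path.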

Regardless, I'm happy the additional steps resolved the issue! I'll close this out.

