Bug 1925973 - Improve help, documentation to fix root disk /dev/... does not exist (emergency mode after copying sata root device to nvme root device)
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: grub2
Version: 34
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Peter Jones
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-02-07 20:24 UTC by Basic Six
Modified: 2021-02-09 16:05 UTC (History)

Fixed In Version:
Doc Type: ---
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug



Description Basic Six 2021-02-07 20:24:18 UTC
Description of problem:

After replacing the old SATA root device with an NVME SSD device and cloning it 1:1 (see below), the Linux installation won't boot anymore. The Grub boot loader works fine, but then the boot process drops into an emergency shell because it can't find the root device containing the OS installation, i.e., the new NVME SSD. It seems like the NVME driver is missing from the default boot image, and I wish it were included by default. Other than that, this is mainly a request for documentation and for a repair tool, as I was unable to find a solution in the official documentation.

It would look like this:

_________________ O/_________________________________________
                  O\

Reached target Paths.
dracut-initqueue: Warning: dracut-initqueue timeout - starting timeout scripts
dracut-initqueue: Warning: dracut-initqueue timeout - starting timeout scripts
dracut-initqueue: Warning: dracut-initqueue timeout - starting timeout scripts
(last message filled the screen)
dracut-initqueue: Warning: Could not boot.
Starting Setup Virtual Console...
Started Setup Virtual Console.
Starting Dracut Emergency Shell...
Warning: /dev/disk/by-uuid/... does not exist

Entering emergency mode. ...

Give root password for maintenance ...

# lsblk
sh: lsblk: command not found
# ls /dev/nvme*
ls: cannot access '/dev/nvme*': No such file or directory

_________________ O/_________________________________________
                  O\

Here's a brief description of the problem and how it could be fixed.

The old root device was a SATA SSD with 3 partitions: 1: fs = ext4, mountpoint = /boot, size = 1 GB; 2: fs = btrfs, mountpoint = /; 3: fs = swap, size = 8 GB

First, a 1:1 image copy was made:
- From a Fedora Live disk (Fedora install ISO copied to USB stick), mount an external USB hard disk (/dev/sdb) and make a 1:1 image of the full root device, identified as /dev/sda:
  # cd /mnt; mkdir -m 0 USB; mount /dev/sdb1 USB; dd bs=4M if=/dev/sda of=USB/old_root_$(date +%F).dd.img
- Replace the root device, boot into a live system again, mount the external hdd (now /dev/sda) and restore the image onto the new root device, identified as /dev/nvme0n1:
  # cd /mnt; mkdir -m 0 USB; mount -o ro /dev/sda1 USB; dd bs=4M if=USB/old_root_$(date +%F).dd.img of=/dev/nvme0n1
- Use parted (or gparted from a graphical live system) to grow and/or move partitions as required.
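
Before growing or moving any partitions, it may be worth checking that the restored device really matches the image. The following is only a sketch: the function name verify_clone is made up for illustration, and it assumes GNU stat and GNU cmp as found on a Fedora live system.

```shell
# verify_clone IMAGE TARGET   (hypothetical helper, not an official tool)
# Compares TARGET against IMAGE, but only over the length of IMAGE,
# because the new device is usually larger than the old one.
# Returns 0 if the compared bytes are identical, non-zero otherwise.
verify_clone() {
    vc_image=$1
    vc_target=$2
    # Size of the image file in bytes (GNU stat).
    vc_size=$(stat -c %s "$vc_image") || return 2
    # Compare at most vc_size bytes (GNU cmp -n).
    cmp -n "$vc_size" "$vc_image" "$vc_target"
}
```

In the scenario above, this would be called as "verify_clone USB/old_root_....dd.img /dev/nvme0n1" right after the dd restore. Also note that $(date +%F) expands to the current date, so the restore command shown above only finds the image when it is run on the same day the image was made.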

This would have worked if the device type hadn't changed but it has changed. Trying to reboot now loads an emergency shell as shown above. From that shell, no NVME device can be found which indicates that it's missing a driver for that NVME device.
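
Since the emergency shell lacks most tools (lsblk wasn't available, see above), a check for the driver has to get by with shell builtins. Here's a minimal sketch under the assumption that the nvme driver is built as a loadable module (a driver compiled into the kernel would not be listed in /proc/modules). The function name module_loaded is made up for illustration.

```shell
# module_loaded NAME [MODULES_FILE]   (hypothetical helper)
# Returns 0 if a module called NAME is listed in MODULES_FILE
# (default: /proc/modules), 1 otherwise. Uses only shell builtins.
module_loaded() {
    ml_name=$1
    ml_file=${2:-/proc/modules}
    # The first field of each line in /proc/modules is the module name.
    while read -r ml_mod ml_rest; do
        [ "$ml_mod" = "$ml_name" ] && return 0
    done < "$ml_file"
    return 1
}
```

For example: module_loaded nvme || echo "nvme driver not loaded"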

The path to the partition is located in three places:
- grep ' / ' /etc/fstab:
  UUID=...7
  It doesn't need to be changed because the UUID stays the same when making a 1:1 copy.
- /boot/grub2/grub.cfg also contains the UUID path: root=UUID=...7
- In /etc/default/grub, GRUB_CMDLINE_LINUX contained "resume=/dev/sda3", which had been added manually because the Fedora installer didn't add it, so hibernation wasn't possible in the default configuration. Although /dev/sda3 (the swap partition, not smaller than RAM) wasn't corrected before creating the 1:1 image, that's not why the system won't boot, and it can be fixed later.
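
To find leftover device node paths like that resume= entry, the relevant files can be searched for /dev/sdX or /dev/nvme... names. This is just a sketch: the function name find_device_nodes and the pattern are my own assumptions and only cover SATA/SCSI and NVMe naming.

```shell
# find_device_nodes FILE...   (hypothetical helper)
# Prints every line (with file name and line number) that refers to a
# device node such as /dev/sda3 or /dev/nvme0n1p2 instead of a UUID.
# Returns 1 if no such line is found.
find_device_nodes() {
    grep -HnE '/dev/(sd[a-z]+|nvme[0-9]+n[0-9]+p?)[0-9]*' "$@"
}
```

From the chroot described below, this could be run as "find_device_nodes /etc/fstab /etc/default/grub /boot/grub2/grub.cfg". A device path found this way can be replaced by a UUID reference, e.g. resume=UUID=..., where the UUID of the swap partition is printed by "blkid -s UUID -o value /dev/nvme0n1p3" (assuming the partition numbers stayed the same), followed by regenerating grub.cfg with grub2-mkconfig.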

So the question is how to repair the boot configuration to make the system recognize the new NVME device and boot from it. Here's the official documentation I found when searching for "fedora documentation update grub initramfs boot image with dracut from chroot":
https://docs.fedoraproject.org/en-US/quick-docs/bootloading-with-grub2/
I may have missed something but I couldn't find "dracut" anywhere on the page. It explains how to generate the Grub config file (grub2-mkconfig -o /boot/grub2/grub.cfg) but that file isn't the problem as Grub itself works fine.

Further down the page, there's an explanation how to mount a Fedora installation from a live system:
https://docs.fedoraproject.org/en-US/quick-docs/bootloading-with-grub2/#restoring-bootloader-using-live-disk

In this case, the OS installation was placed in a subvolume "root" (by the Fedora installer), so:

# cd /mnt; mkdir -m 0 ROOT
# mount -o subvol=root /dev/nvme0n1p2 ROOT
# for i in proc sys dev run; do mount --bind /$i ROOT/$i; done
# mount /dev/nvme0n1p1 ROOT/boot/
# chroot ROOT/
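
The steps above could be wrapped roughly like this. It's only a sketch of the idea, not an existing tool: the function name print_chroot_steps, its argument order and the default target /mnt/ROOT are made up, and it only prints the commands so they can be reviewed before anything is mounted.

```shell
# print_chroot_steps ROOT_DEV BOOT_DEV [SUBVOL] [TARGET]   (hypothetical)
# Prints the commands needed to mount an installed system and chroot
# into it. SUBVOL defaults to "root", TARGET defaults to /mnt/ROOT.
print_chroot_steps() {
    pcs_root=$1
    pcs_boot=$2
    pcs_subvol=${3:-root}
    pcs_target=${4:-/mnt/ROOT}
    echo "mkdir -p $pcs_target"
    echo "mount -o subvol=$pcs_subvol $pcs_root $pcs_target"
    # Make the live system's kernel interfaces visible inside the chroot.
    for pcs_dir in proc sys dev run; do
        echo "mount --bind /$pcs_dir $pcs_target/$pcs_dir"
    done
    echo "mount $pcs_boot $pcs_target/boot"
    echo "chroot $pcs_target"
}
```

With the devices from this report, "print_chroot_steps /dev/nvme0n1p2 /dev/nvme0n1p1" lists the same mounts as shown above. The commands can then be run one by one as root.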

There should probably be a helper script for that on the official live system, but more importantly, there should be one for recreating the boot loader or rather the initramfs boot image to include missing drivers. If such a helper already exists, it should be explained in the documentation in such a way that someone who's in a hurry to fix a boot loader can find it. After looking for config files and more information online, this was one attempt:

# echo 'add_drivers+=" nvme "' >/etc/dracut.conf.d/nvme.conf
# dracut -f

Which failed:

dracut: Cannot find module directory /lib/modules/5.8.15-301.fc33.x86_64/

Please note that when restoring a computer that won't boot anymore, it's easy to forget a specific option that may be necessary, especially if you have to guess your way around because you can't find a solution in the official documentation. Again, the idea was to recreate the initramfs image to include the nvme driver. But as the error message shows, dracut doesn't create one for the latest installed kernel (as claimed in certain unofficial online guides). Instead, it looks for module files of the currently running kernel, which is newer because it's the latest Fedora live system. The latest installed one would've been:

# ls -rth /boot/init* | tail -n1
/boot/initramfs-5.5.13-200.fc31.x86_64.img

This behavior is also explained on the dracut manpage, but like most manpages, it also explains lots of other stuff, and at first glance it seemed like the latest installed kernel version has to be supplied as an argument. It takes some reading to get to the point where it says "A shortcut to generate the image at the default location", "dracut --kver 2.6.40-1.rc5.f20", so the required option would be --kver. However, it doesn't work as shown in that example because the version string must include the architecture suffix ".x86_64", which the example lacks.
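
One way to get the full version string, including the architecture suffix, is to read it from the module directories of the installed system instead of typing it by hand. The following is a sketch, assuming GNU sort (for -V) and the usual layout with one directory per kernel under /lib/modules. The function name latest_installed_kver is made up.

```shell
# latest_installed_kver [MODULES_DIR]   (hypothetical helper)
# Prints the highest kernel version that has a module directory in
# MODULES_DIR (default: /lib/modules), e.g. 5.5.13-200.fc31.x86_64.
latest_installed_kver() {
    lik_dir=${1:-/lib/modules}
    for lik_path in "$lik_dir"/*/; do
        [ -d "$lik_path" ] || continue
        basename "$lik_path"
    done | sort -V | tail -n 1
}
```

Inside the chroot, the result can be passed to dracut as --kver "$(latest_installed_kver)". Unlike "uname -r", which reports the kernel of the running live system, this only looks at what is installed on the mounted root.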

To actually make dracut recreate the initramfs boot image, this command finally worked:

# dracut -f --no-hostonly --add-drivers nvme --kver 5.5.13-200.fc31.x86_64

It recreated the boot image file /boot/initramfs-5.5.13-200.fc31.x86_64.img and with it, the system was again able to boot normally.
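
Before rebooting, the new image can be checked for the driver. lsinitrd (shipped with dracut) lists the contents of an initramfs image. The small filter below is a sketch with a made-up name, listing_has_driver, and it assumes the module file is called NAME.ko, optionally with a compression suffix such as .xz.

```shell
# listing_has_driver NAME   (hypothetical helper)
# Reads a file listing (e.g. lsinitrd output) on stdin and returns 0
# if it contains a kernel module file NAME.ko or NAME.ko.<suffix>.
listing_has_driver() {
    grep -Eq "/$1\.ko(\.[a-z0-9]+)?\$"
}
```

For example: lsinitrd /boot/initramfs-5.5.13-200.fc31.x86_64.img | listing_has_driver nvme && echo "nvme driver included"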

This would be a very simple procedure which should be explained in the official documentation. Unlike on the dracut manpage, all non-relevant info should come at the end, after the typical recovery steps, because on that manpage you first have to read things like this before --kver is mentioned:

       If you are dropped to an emergency shell, while booting your initramfs, the file /run/initramfs/rdsosreport.txt is created, which can be saved to a (to be mounted by hand) partition (usually /boot) or a USB stick. Additional
       debugging info can be produced by adding rd.debug to the kernel command line. /run/initramfs/rdsosreport.txt contains all logs and the output of some tools. It should be attached to any report about dracut problems.



Version-Release number of selected component (if applicable):


How reproducible:

Boot failure happens after cloning a root device to one of another type which requires an additional driver in the initrd/initramfs boot image.



Actual results:

The nvme driver is missing by default, so cloning the root device is not enough. The official documentation does not seem to explain how to recover, i.e., how to add the missing driver to the boot image.



Expected results:

In addition to a documented solution for such a scenario (on a default installation, so no custom paths or anything), an official helper tool to fix or recreate the boot config + image from an official Fedora live system would be extremely helpful.

Comment 1 Ben Cotton 2021-02-09 16:05:05 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 34 development cycle.
Changing version to 34.

