Bug 2403249 - Anaconda fails to add root= parameter for MD RAID root devices, causing boot failure after kernel upgrades
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: anaconda
Version: 42
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: anaconda-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2025-10-11 02:09 UTC by Raymond Johnson
Modified: 2025-10-11 02:35 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed:
Type: Bug
Embargoed:


Attachments
Complete technical analysis with source code evidence, exact file locations, line numbers, and proposed fix (10.72 KB, text/plain)
2025-10-11 02:09 UTC, Raymond Johnson

Description Raymond Johnson 2025-10-11 02:09:14 UTC
Created attachment 2109338
Complete technical analysis with source code evidence, exact file locations, line numbers, and proposed fix

## Summary

The Anaconda installer does not add the `root=` boot parameter to `/etc/default/grub` when installing to an MD RAID root device (RAID0, RAID1, etc.). New kernel installations then generate boot entries that fail to boot.

## Severity

High - System fails to boot after kernel upgrades, requires emergency console recovery

## Affected Configurations

- RAID Levels: RAID0, RAID1, RAID4, RAID5, RAID6, RAID10 (any MD RAID used as root)
- Distribution: Fedora 42, Nobara 42 (likely affects all recent Fedora/RHEL derivatives)
- Bootloader: GRUB2 with BLS (Boot Loader Specification)

## Steps to Reproduce

1. Install Fedora/Nobara with MD RAID (mdadm) as root filesystem
   - Example: 4x NVMe drives in RAID0 array (md0) mounted as `/`
2. Complete installation successfully
3. Boot system normally (initial boot works)
4. Install/upgrade kernel via DNF: `dnf upgrade kernel`
5. Reboot and select new kernel from GRUB menu
6. Result: System fails to boot, drops to emergency console

## Root Cause - THE SMOKING GUN

Found in source code at three locations:

### 1. blivet/devices/md.py (lines 688-689)
```python
def dracut_setup_args(self):
    return set(["rd.md.uuid=%s" % self.mdadm_format_uuid])
```

This method returns only the RAID UUID argument. It does not return `root=UUID=` or `ro`.

### 2. pyanaconda/modules/storage/bootloader/base.py (lines 826-881)
Calls `dracut_setup_args()` for the RAID device, which returns only `rd.md.uuid`.
It never explicitly adds `root=UUID=` for the root filesystem.

### 3. pyanaconda/modules/storage/bootloader/grub2.py (line 269)
```python
defaults.write("GRUB_CMDLINE_LINUX=\"%s\"\n" % self.boot_args)
```
Writes `boot_args` to `/etc/default/grub` without a `root=` parameter.
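
The interaction of the three locations can be illustrated with a small stand-alone simulation. This is only a sketch: `FakeMDRaidDevice`, `build_boot_args` and `format_grub_default` are hypothetical stand-ins, not blivet or Anaconda code. Only the `rd.md.uuid=%s` and `GRUB_CMDLINE_LINUX` format strings are taken from the snippets above.

```python
class FakeMDRaidDevice:
    """Hypothetical stand-in for blivet's MD RAID device class."""

    def __init__(self, mdadm_format_uuid):
        self.mdadm_format_uuid = mdadm_format_uuid

    def dracut_setup_args(self):
        # Same return value as the blivet snippet quoted above.
        return set(["rd.md.uuid=%s" % self.mdadm_format_uuid])


def build_boot_args(root_device):
    """Hypothetical simplification: collect only what the device reports."""
    boot_args = set()
    boot_args.update(root_device.dracut_setup_args())
    return boot_args


def format_grub_default(boot_args):
    # Uses the same format string as the grub2.py line quoted above;
    # joining a sorted set is an assumption made for this sketch.
    return "GRUB_CMDLINE_LINUX=\"%s\"\n" % " ".join(sorted(boot_args))


device = FakeMDRaidDevice("ecf1873b:08dfd5d7:7110c8fd:bdb2786e")
line = format_grub_default(build_boot_args(device))
print(line, end="")
```

In this simplified model the resulting line carries the RAID UUID but no `root=` argument, which matches the behavior described in this report.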

## Proposed Fix

In pyanaconda/modules/storage/bootloader/base.py, method _set_storage_boot_args(), after line 826:

```python
if storage.root_device:
    root_spec = storage.root_device.fstab_spec
    if root_spec:
        self.boot_args.add("root=%s" % root_spec)
        self.boot_args.add("ro")
```

This is a small fix (the snippet above is five lines) that should work for all storage configurations, because it relies only on the root device's `fstab_spec`.
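
The effect of the proposed snippet can be exercised outside Anaconda with stub objects. This is a minimal sketch under stated assumptions: `FakeRootDevice`, `FakeStorage` and `add_root_args` are hypothetical names, a plain Python `set` stands in for Anaconda's boot argument container, and `fstab_spec` is assumed to yield a `UUID=...` string for the root device.

```python
class FakeRootDevice:
    """Hypothetical stand-in for a root device exposing fstab_spec."""

    def __init__(self, fstab_spec):
        self.fstab_spec = fstab_spec


class FakeStorage:
    """Hypothetical stand-in for the storage object used by the bootloader."""

    def __init__(self, root_device):
        self.root_device = root_device


def add_root_args(boot_args, storage):
    """Apply the logic of the proposed snippet to a plain set of arguments."""
    if storage.root_device:
        root_spec = storage.root_device.fstab_spec
        if root_spec:
            boot_args.add("root=%s" % root_spec)
            boot_args.add("ro")
    return boot_args


# Values taken from the production system described in this report.
args = {"rd.md.uuid=ecf1873b:08dfd5d7:7110c8fd:bdb2786e"}
storage = FakeStorage(FakeRootDevice("UUID=957d9b7f-c9b1-42f7-b4a4-b1b050d622a5"))
add_root_args(args, storage)
print(sorted(args))
```

When no root device is set, the helper leaves the arguments unchanged, so the guard clauses in the snippet are covered as well.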

## Evidence from Production System

- Distribution: Nobara 42 (Fedora 42-based)
- Root Device: /dev/md0 (RAID0, 4x NVMe drives)
- Root UUID: 957d9b7f-c9b1-42f7-b4a4-b1b050d622a5
- RAID UUID: ecf1873b:08dfd5d7:7110c8fd:bdb2786e

Generated boot entry (broken):
```
options rd.auto=1 rd.md=1 rd.md.uuid=ecf1873b:08dfd5d7:7110c8fd:bdb2786e ...
```
Missing: root=UUID=957d9b7f... ro

Result: Kernel panic - VFS: Unable to mount root fs
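
A check along the following lines could flag such an entry before rebooting. It is a sketch only: `missing_root_args` is a hypothetical helper that inspects the text of a single `options` line, and reading the actual boot entry files is left out. The sample line is the visible part of the broken entry above.

```python
def missing_root_args(options_line):
    """Return the required arguments absent from a BLS 'options' line."""
    tokens = options_line.split()
    # Drop the leading 'options' keyword if it is present.
    if tokens and tokens[0] == "options":
        tokens = tokens[1:]
    missing = []
    # Only root= is checked; it is the argument this report is about.
    if not any(token.startswith("root=") for token in tokens):
        missing.append("root=")
    return missing


broken = "options rd.auto=1 rd.md=1 rd.md.uuid=ecf1873b:08dfd5d7:7110c8fd:bdb2786e"
repaired = broken + " root=UUID=957d9b7f-c9b1-42f7-b4a4-b1b050d622a5 ro"

print(missing_root_args(broken))
print(missing_root_args(repaired))
```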

## Why This Matters

1. RAID1 is recommended for production servers (redundancy)
2. RAID0 is used for high-performance workstations (speed)
3. RAID5/6 are common for NAS and storage servers
4. These are legitimate, supported configurations, not edge cases

## Not the GRUB2 Probing Bug (BZ #1443144)

This is DISTINCT from the known GRUB2 mdraid probing issue:
- Initial installation succeeds, first kernel boots
- New kernels fail after DNF upgrade
- Fixing /etc/default/grub resolves the issue
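
The manual repair of `/etc/default/grub` mentioned in the last point could look roughly like the sketch below. `ensure_root_arg` is a hypothetical helper that works on the file's text only. It assumes a single-line, double-quoted `GRUB_CMDLINE_LINUX` setting and does not write the file or regenerate the bootloader configuration.

```python
def ensure_root_arg(grub_defaults_text, root_spec):
    """Prepend root=<root_spec> and ro to GRUB_CMDLINE_LINUX if root= is absent."""
    prefix = 'GRUB_CMDLINE_LINUX="'
    fixed_lines = []
    for line in grub_defaults_text.splitlines(keepends=True):
        stripped = line.rstrip("\n")
        ending = line[len(stripped):]  # preserve the original line ending
        is_setting = (
            stripped.startswith(prefix)
            and stripped.endswith('"')
            and len(stripped) > len(prefix)
        )
        if is_setting:
            args = stripped[len(prefix):-1].split()
            if not any(arg.startswith("root=") for arg in args):
                args = ["root=%s" % root_spec, "ro"] + args
            line = prefix + " ".join(args) + '"' + ending
        fixed_lines.append(line)
    return "".join(fixed_lines)


# Sample content; the UUIDs are the ones from the production system above.
sample = (
    "GRUB_TIMEOUT=5\n"
    'GRUB_CMDLINE_LINUX="rd.md.uuid=ecf1873b:08dfd5d7:7110c8fd:bdb2786e"\n'
)
root_spec = "UUID=957d9b7f-c9b1-42f7-b4a4-b1b050d622a5"
fixed = ensure_root_arg(sample, root_spec)
print(fixed, end="")
```

Running the helper a second time on its own output changes nothing, so it is safe to apply repeatedly.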

Full technical analysis available on request.

Comment 1 Raymond Johnson 2025-10-11 02:11:56 UTC
## CLARIFICATION: Initial Installation Also Fails to Boot

Important timeline correction: The bug occurs during **initial Anaconda installation**, not just kernel upgrades.

### Actual User Experience:

1. Fresh Nobara 42 installation with RAID0 root (4x NVMe drives as md0)
2. Installation completes successfully
3. **FIRST REBOOT AFTER INSTALLATION FAILS** - drops to emergency console
4. Manual fix required to make system bootable
5. Every subsequent kernel upgrade also fails (same missing root= issue)

### Why This Matters:

This shows that Anaconda writes the broken `/etc/default/grub` **during initial installation**, not just during kernel upgrades. The system is **unusable right after installation** without manual intervention from the emergency console.

### Severity Increase:

- User cannot boot freshly installed system
- Requires emergency console knowledge to fix
- Affects all fresh RAID root installations
- Not limited to kernel upgrade scenario

The proposed fix in `pyanaconda/modules/storage/bootloader/base.py` will prevent this from happening during initial installation AND during subsequent kernel upgrades.

**This is a day-one installation bug, not just an upgrade bug.**

Comment 2 Raymond Johnson 2025-10-11 02:21:47 UTC
## ADDITIONAL CONTEXT: Anaconda RAID Installer Completely Non-Functional

### User's Actual Experience (Even Worse Than Initially Reported):

**Attempt 1:** Anaconda RAID0 installation → **BOOT FAILURE**

**Attempt 2:** Wiped disks, Anaconda RAID0 installation again → **BOOT FAILURE AGAIN**

**Conclusion:** Anaconda's RAID installation feature is completely broken, not just missing a parameter.

### User's Extreme Workaround (Required AI Assistance):

Since Anaconda's RAID installer failed twice, user had to manually work around it:

1. Normal Anaconda install to single nvme0n1 (64GB root partition) - this worked
2. Boot to desktop, complete updates
3. Create nvme0n1p4 partition manually for RAID
4. Clone partition to nvme1, nvme2, nvme3
5. Build RAID0 array manually with mdadm
6. Use `/etc/fstab` to remap OS folders to the RAID volume
7. rsync entire root partition (nvme0n1p3) to /dev/md0
8. **MANUALLY add correct boot parameters** (root=UUID= and rd.md.uuid=) to bootloader
9. Delete original root partition
10. Successfully boot from RAID0

### Why This Matters:

- **Anaconda's RAID installer is unusable** - failed 2/2 attempts
- User required **AI assistance** to implement complex manual workaround
- Average user would **abandon RAID entirely** or abandon Fedora/Nobara
- This is not a "minor bug" - **core installer functionality is broken**

**The proposed fix is critical** - without it, RAID root installations are impossible through Anaconda.

