Bug 2232406

Summary: coreos-installer-growfs.service fails to start Grow root filesystem in Onlogic HX401 bare-metal hardware
Product: Red Hat Enterprise Linux 9
Reporter: Mario Cattamo <mcattamo>
Component: rust-coreos-installer
Assignee: RHCOS SST <rhcos-sst>
Status: NEW ---
QA Contact: RHCOS SST QE <rhcos-sst-qe>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 9.3
CC: amurdaca, miabbott, perobins, qzhang, xiaofwan, yih
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Mario Cattamo 2023-08-16 16:50:35 UTC
Description of problem:
After generating a UEFI bootable disk with osbuild-composer edge-simplified-installer, the OS installation fails (one-time failure).

After the coreos-installer-service disk reading finishes, coreos-installer-growfs.service reports the following failure:

[FAILED] Failed to start Grow root filesystem.
See 'systemctl status coreos-installer-growfs.service' for details.
[DEPEND] Dependency failed for Initrd Default Target.

This seems to be a one-time failure: after rebooting the system, the failure message does not appear again and the installation appears to succeed.


Version-Release number of selected component (if applicable):
Packages used to generate the UEFI bootable disk (edge-simplified-installer):
osbuild-composer-87-1.el9.x86_64
osbuild-91-1.el9.noarch
weldr-client-35.9-1.el9.x86_64

Bare Metal Hardware details:
Onlogic HX401

How reproducible:
100%

Steps to Reproduce:
1. Deploy a RHEL-9.3 OpenStack VM
2. git clone https://github.com/virt-s1/rhel-edge.git
3. Edit the simplified-installer blueprint to specify the proper installation device
$ vim ostree-simplified-installer.sh
  [customizations]
  - installation_device = "/dev/vda"
  + installation_device = "/dev/nvme0n1"
4. Generate the UEFI bootable ISO
$ ./ostree-simplified-installer.sh
5. Write the ISO to a bootable USB device
$ sudo dd if=123456-simplified-installer.iso of=/dev/sdx bs=1M status=progress
6. Boot the bare-metal hardware from the USB device

Actual results:
The installation fails once; the failure does not recur after a reboot.

Expected results:
The installation completes without errors.

Additional info:
The experiment was repeated on another bare-metal machine (Onlogic Karbon K410) that had /dev/sda as the installation device name. The boot process and installation were successful.
This experiment is part of the verification of bug https://bugzilla.redhat.com/show_bug.cgi?id=2177645

Comment 1 Xiaofeng Wang 2023-08-17 01:44:15 UTC
This should be a bug, because NVMe disks use a different partition naming scheme, such as nvme0n1p[0-9], compared with /dev/sda[0-9].
https://github.com/coreos/coreos-installer-dracut/blob/e3568c4be6b1db019b792fcd08323435b185c27a/dracut/scripts/coreos-installer-growfs#L30C24-L30C30 does not work with NVMe disk partitions.
Here's an example of NVMe partitions:
➜ cat /proc/partitions
major minor  #blocks  name

 259        0 1000204632 nvme0n1
 259        1     524288 nvme0n1p1
 259        2    1048576 nvme0n1p2
 259        3  780519424 nvme0n1p3
 259        4  209715200 nvme0n1p4
 259        5    8388608 nvme0n1p5
 252        0    8388608 zram0
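
The naming difference can be illustrated with a minimal sketch. It is not the code in coreos-installer-growfs; `partition_path` is a hypothetical helper, and it assumes only the kernel convention that a "p" separator is inserted when the disk name ends in a digit (nvme0n1, mmcblk0) and omitted otherwise (sda, vda):

```shell
# Hypothetical helper, not part of coreos-installer-dracut.
# Usage: partition_path DISK PARTNUM
# Prints the partition device path for partition PARTNUM on DISK.
partition_path() {
    disk=$1
    partnum=$2
    case "$disk" in
        # Disk name ends in a digit: the kernel adds a "p" separator.
        *[0-9]) printf '%sp%s\n' "$disk" "$partnum" ;;
        # Disk name ends in a letter: the number is appended directly.
        *)      printf '%s%s\n' "$disk" "$partnum" ;;
    esac
}
```

For example, `partition_path /dev/nvme0n1 4` yields /dev/nvme0n1p4, while `partition_path /dev/sda 4` yields /dev/sda4.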

Comment 2 Micah Abbott 2023-08-22 14:56:29 UTC
As noted by @perobins in the Dev/Doc/QE sync meeting, we will encounter a similar error when using storage devices based on SD cards:

```
pi@pihole:~ $ cat /proc/partitions 
major minor  #blocks  name

   1        0       4096 ram0
   1        1       4096 ram1
   1        2       4096 ram2
   1        3       4096 ram3
   1        4       4096 ram4
   1        5       4096 ram5
   1        6       4096 ram6
   1        7       4096 ram7
   1        8       4096 ram8
   1        9       4096 ram9
   1       10       4096 ram10
   1       11       4096 ram11
   1       12       4096 ram12
   1       13       4096 ram13
   1       14       4096 ram14
   1       15       4096 ram15
 179        0   31166976 mmcblk0
 179        1     262144 mmcblk0p1
 179        2   30900736 mmcblk0p2
```

We need to make the `coreos-installer-growfs` script more robust to handle the different ways partitions are represented.  Perhaps by using one of the `/dev/disk/by-*` identifiers?
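
One possible direction, sketched here under assumptions and not taken from the script: instead of parsing device names (or the `/dev/disk/by-*` links suggested above), ask sysfs for the parent disk and partition number. `split_partition` is a hypothetical helper; its optional second argument (the sysfs root) exists only so the function can be exercised against a fake tree:

```shell
# Hypothetical helper, not part of coreos-installer-dracut.
# Usage: split_partition PARTITION_DEVICE [SYSFS_ROOT]
# Prints "<parent disk path> <partition number>", e.g. "/dev/nvme0n1 4".
split_partition() {
    name=${1##*/}          # strip the leading /dev/
    sysfs=${2:-/sys}
    # /sys/class/block/<name> is a symlink into the device tree; for a
    # partition, the resolved directory sits inside its parent disk's directory.
    link=$(readlink -f "$sysfs/class/block/$name") || return 1
    # Only partitions expose a "partition" attribute holding their number.
    [ -f "$link/partition" ] || return 1
    disk=$(basename "$(dirname "$link")")
    num=$(cat "$link/partition")
    printf '/dev/%s %s\n' "$disk" "$num"
}
```

Because the disk name and partition number come from the kernel rather than from string manipulation, the same call would behave identically for sda4, nvme0n1p4, and mmcblk0p2.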

Comment 3 Mario Cattamo 2023-08-22 16:47:32 UTC
The output of the lsblk command shows that the size of partition nvme0n1p4 is 9.5G.

[admin@localhost ~]$ lsblk
NAME                                          MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
nvme0n1                                       259:0    0 119.2G  0 disk  
├─nvme0n1p1                                   259:1    0     1M  0 part  
├─nvme0n1p2                                   259:2    0   127M  0 part  /boot/efi
├─nvme0n1p3                                   259:3    0   384M  0 part  /boot
└─nvme0n1p4                                   259:4    0   9.5G  0 part
  └─luks-94da2594-8d33-4aba-b3d4-416a098861e6 253:0    0   9.5G  0 crypt
    └─rootvg-rootlv                           253:1    0     9G  0 lvm   /var
                                                                         /sysroot/ostree/deploy/redhat/var
                                                                         /usr
                                                                         /etc
                                                                         /
                                                                         /sysroot

Compared with the lsblk output from a successful installation generated by edge-installer, the partition above should be much bigger, just as @perobins mentioned in the Dev/Doc/QE sync meeting.
[admin@localhost ~]$ lsblk
NAME          MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
nvme0n1       259:0    0 119.2G  0 disk  
├─nvme0n1p1   259:1    0   600M  0 part  /boot/efi
├─nvme0n1p2   259:2    0     1G  0 part  /boot
└─nvme0n1p3   259:3    0 117.7G  0 part  
  ├─rhel-root 253:0    0    70G  0 lvm   /var
  |                                      /usr
  |                                      /
  |                                      /sysroot
  ├─rhel-swap 253:1    0   7.6G  0 lvm   [SWAP]
  └─rhel-home 253:2    0  40.1G  0 lvm   /var/home