Bug 1949413

Summary: Automatic boot order setting is done incorrectly when using by-path style device names
Product: OpenShift Container Platform
Reporter: Udi Kalifon <ukalifon>
Component: assisted-installer
Assignee: Eran Cohen <ercohen>
assisted-installer sub component: assisted-service
QA Contact: Yuri Obshansky <yobshans>
Status: CLOSED ERRATA
Docs Contact:
Severity: urgent
Priority: urgent
CC: alazar, aos-bugs, lgamliel, otuchfel, yobshans
Version: 4.8
Target Milestone: ---
Target Release: 4.8.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: AI-Team-Core
Fixed In Version: OCP-Metal-v1.0.19.1
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-07-27 23:00:24 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Udi Kalifon 2021-04-14 08:44:48 UTC
Description of problem:
The boot partition name is always built by appending "2" to the boot device name. For example, for the device "sda" it creates the string "sda2". See the code here: https://github.com/openshift/assisted-installer/blob/749f7737317f407c97d6ff347c569aac71d224db/src/ops/ops.go#L225-L229

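However, when using by-path devices, this logic is incorrect. Appending "2" to a name such as /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0 yields /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:02, which does not exist, because by-path partition links use a "-part2" suffix. Since the resulting string contains colons, mount apparently treats it as an NFS source (server:path). We see errors in the log: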

level=info msg="Done writing image to disk"
level=info msg="Setting efibootmgr to boot from disk"
level=info msg="mount.nfs: Failed to resolve server /dev/disk/by-path/pci-0000: Name or service not known\n"
level=info msg="failed executing nsenter [-t 1 -m -i -- mount /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:02 /mnt], env vars [PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin TERM=xterm USER_UID=0 HTTPS_PROXY= http_proxy= HTTP_PROXY= https_proxy= container=podman no_proxy= NO_PROXY= PULL_SECRET_TOKEN=b3BlbnNoaWZ0LXJlbGVhc2UtZGV2K29jbV9hY2Nlc3NfM2ZlN2U2NTRiN2Q3NDk2ZThmMjA3MTNlMjk2ZTlmZDc6VlA0SzZaNkZWS0lWRVBYQ1Y4VFk3NjFTWjFSSk01Mkk5OEU2UTRYQ1c3WlRaMzZVQ0RSWTlVTkVMMDQ3QUxJVA== HOSTNAME=master-0-1 HOME=/root], error exit status 32, waitStatus 32, Output \"mount.nfs: Failed to resolve server /dev/disk/by-path/pci-0000: Name or service not known\""
level=error msg="Failed to mount device /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:02, err: failed executing nsenter [-t 1 -m -i -- mount /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:02 /mnt], Error exit status 32, LastOutput \"mount.nfs: Failed to resolve server /dev/disk/by-path/pci-0000: Name or service not known\""
level=warning msg="Failed to set boot order" error="failed executing nsenter [-t 1 -m -i -- mount /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:02 /mnt], Error exit status 32, LastOutput \"mount.nfs: Failed to resolve server /dev/disk/by-path/pci-0000: Name or service not known\""
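
The failure in the log can be reproduced with a minimal Go sketch. It is an illustration only, not the code from the installer or from the fix: naivePartitionPath and partitionPath are hypothetical helper names, and the suffix rules assume the usual Linux conventions (udev by-path and by-id links use "-partN", kernel names ending in a digit such as nvme0n1 use "pN", everything else appends the number directly).

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// naivePartitionPath mirrors the behaviour described in this report:
// the partition number is appended directly to the device name.
func naivePartitionPath(device string, part int) string {
	return fmt.Sprintf("%s%d", device, part)
}

// partitionPath is a hypothetical helper (not the installer's code) that
// chooses the partition suffix from common Linux naming conventions.
func partitionPath(device string, part int) string {
	switch {
	// udev by-path and by-id symlinks name partitions "<disk>-partN".
	case strings.HasPrefix(device, "/dev/disk/by-path/"),
		strings.HasPrefix(device, "/dev/disk/by-id/"):
		return fmt.Sprintf("%s-part%d", device, part)
	// Kernel names ending in a digit (nvme0n1, mmcblk0) use a "p" separator.
	case device != "" && unicode.IsDigit(rune(device[len(device)-1])):
		return fmt.Sprintf("%sp%d", device, part)
	// Plain names such as /dev/sda take the number directly.
	default:
		return fmt.Sprintf("%s%d", device, part)
	}
}

func main() {
	byPath := "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"
	fmt.Println(naivePartitionPath(byPath, 2)) // the invalid path seen in the log
	fmt.Println(partitionPath(byPath, 2))
	fmt.Println(partitionPath("/dev/sda", 2))
	fmt.Println(partitionPath("/dev/nvme0n1", 2))
}
```

String rules like these are still guesses about naming. Asking the system for the partitions of a disk (for example through lsblk or sysfs) would likely be more robust, but that is outside the scope of this sketch.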


How reproducible:
On some multi-disk hosts


Steps to Reproduce:
1. Create a libvirt environment of 3 masters and 3 workers
2. The workers in my case had 2 disks, sda and sdb, of identical sizes
3. Deploy a cluster


Actual results:
Automatic boot order is not set correctly, and the user has to fix it manually. In my case, I didn't get a chance to fix the boot order manually and the installation failed; we are still investigating whether the failure is due to other errors.


Expected results:
We shouldn't see these errors in the log. The user should not have to fix the boot order manually.

Comment 1 Yuri Obshansky 2021-05-05 13:18:30 UTC
lgamliel
ercohen
Are there any details about the fix? Is there a PR?
The bug moved from New to ON-QA.
Thank you

Comment 2 Eran Cohen 2021-05-06 10:58:10 UTC
@yobshans the info you are looking for (fix details and PR) is linked from the Jira ticket that matches this BZ:
https://github.com/openshift/assisted-installer/pull/265

Comment 3 Udi Kalifon 2021-05-11 06:07:38 UTC
Verified.

Comment 6 errata-xmlrpc 2021-07-27 23:00:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438