Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1860186

Summary: [4.2] - worker RHCOS won't boot after reboot on physical machine
Product: OpenShift Container Platform
Reporter: Vladislav Walek <vwalek>
Component: Machine Config Operator
Assignee: Antonio Murdaca <amurdaca>
Status: CLOSED NEXTRELEASE
QA Contact: Michael Nguyen <mnguyen>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.2.z
CC: bbreard, dornelas, imcleod, jligon, nstielau, smilner, stbenjam
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-08-07 14:06:58 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1186913    

Description Vladislav Walek 2020-07-23 22:54:39 UTC
Description of problem:

RHCOS 4.2 won't boot after the reboot that follows applying changes from the API server.
The scenario is as follows:

- the worker system boots the RHCOS installer
- the system is installed with the ignition config and boots normally, showing the login prompt
- the worker node pulls configuration from the machine server API and applies the changes
- after the reboot triggered by the previous step, the system won't boot at all

The machine boots into PXE, the admin selects "Boot from disk", and the system tries to boot from disk but only shows "booting from disk...". The screen then goes black and the machine reverts to PXE boot.

It seems the boot sector was changed when the config from the machine API was applied.
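
One way to test the boot-sector hypothesis is to fingerprint the first sector of the disk before and after the configuration is applied and compare the results. The following is only a minimal sketch, assuming a legacy BIOS (MBR) boot; on UEFI systems the boot loader lives in the EFI System Partition and this check would not be conclusive. The function name boot_sector_fingerprint and the device path in the usage note are hypothetical and not taken from this report.

```python
import hashlib

SECTOR_SIZE = 512  # size of a classic MBR boot sector in bytes


def boot_sector_fingerprint(path):
    """Return (sha256 hex digest, has_boot_signature) for the first sector.

    `path` may be a block device or a raw disk image (hypothetical input).
    """
    with open(path, "rb") as disk:
        sector = disk.read(SECTOR_SIZE)
    if len(sector) < SECTOR_SIZE:
        raise ValueError(f"{path}: shorter than one {SECTOR_SIZE}-byte sector")
    # A bootable MBR ends with the bytes 0x55 0xAA at offsets 510 and 511.
    has_signature = sector[510:512] == b"\x55\xaa"
    return hashlib.sha256(sector).hexdigest(), has_signature
```

Running this as root against the install disk (for example a hypothetical /dev/sda) once after the first successful boot and once after the failed reboot, from a live environment, would show whether the first sector actually changed and whether the boot signature is still present.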

Please note that no error or other message is shown on the boot screen.

When the disk is physically moved to a different machine (different HW), it boots normally. (This was done only to check the boot process.)


Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.2 on UPI bare metal - physical machines


How reproducible:
- yes, on the customer side


Steps to Reproduce:
1. boot the installer and install from the "metal" image with the ignition config; the install completes
2. the system boots successfully into the RHCOS image, shows the login screen, and pulls data from the machine API
3. the system reboots and never boots again

Actual results:
- blank black screen; the system does not boot

Expected results:


Additional info:
- will provide the dmesg output from the same HW but a different RHCOS node (a master, not a worker)
- it seems the machine API configuration breaks the system
- the same installation works on master nodes