Bug 1860186 - [4.2] - worker RHCOS won't boot after reboot on physical machine
Summary: [4.2] - worker RHCOS won't boot after reboot on physical machine
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.2.z
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: Antonio Murdaca
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks: 1186913
Reported: 2020-07-23 22:54 UTC by Vladislav Walek
Modified: 2020-08-07 14:06 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-08-07 14:06:58 UTC
Target Upstream Version:
Embargoed:


Description Vladislav Walek 2020-07-23 22:54:39 UTC
Description of problem:

RHCOS 4.2 fails to boot after the reboot that follows applying changes from the API server.
The scenario is as follows:

- the worker system boots the RHCOS installer
- the system is installed with the ignition config and boots normally, showing a login shell
- the worker node pulls its configuration from the machine server API and applies the changes
- after the reboot triggered by the previous step, the system won't boot at all

The boot loader starts from PXE, the admin selects "Boot from disk", and the system tries to boot from disk but only shows "booting from disk...". The screen then goes black and the system reverts to PXE boot.

It seems that the boot sector was changed after the config from the machine API was applied.

Please note that no error or other message is shown on the boot screen.

When the disk is physically moved to a different machine (different HW), it boots normally. (This was done only to check the boot process.)
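
One way to test the boot-sector hypothesis above would be to capture the first sector of the disk before the machine config is applied and again afterwards (for example while the disk is attached to the other machine), then compare the two. The Python sketch below is only an illustration and is not part of the original report: the function names are hypothetical, and it assumes a 512-byte MBR-style first sector. On a GPT disk the first sector is only a protective MBR (partition type 0xEE), so identical output there would not rule out changes elsewhere on the disk.

```python
import hashlib
import struct

SECTOR_SIZE = 512          # assumption: 512-byte logical sectors
PARTITION_TABLE_OFFSET = 446
BOOT_SIGNATURE = b"\x55\xaa"


def read_boot_sector(path):
    """Read the first sector of a disk or disk image (needs read access)."""
    with open(path, "rb") as f:
        data = f.read(SECTOR_SIZE)
    if len(data) != SECTOR_SIZE:
        raise ValueError(f"expected {SECTOR_SIZE} bytes, got {len(data)}")
    return data


def summarize_boot_sector(data):
    """Summarize an MBR-style sector: digest, signature check, partition entries."""
    if len(data) != SECTOR_SIZE:
        raise ValueError(f"expected {SECTOR_SIZE} bytes, got {len(data)}")
    partitions = []
    for index in range(4):
        offset = PARTITION_TABLE_OFFSET + 16 * index
        # Entry layout: status, CHS start (skipped), type, CHS end (skipped),
        # first LBA, sector count; all little-endian.
        status, ptype, lba_start, sectors = struct.unpack_from(
            "<B3xB3xII", data, offset
        )
        if ptype == 0:
            continue  # unused entry
        partitions.append(
            {
                "index": index,
                "bootable": status == 0x80,
                "type": ptype,
                "lba_start": lba_start,
                "sectors": sectors,
            }
        )
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "signature_ok": data[510:512] == BOOT_SIGNATURE,
        "partitions": partitions,
    }
```

Calling `summarize_boot_sector(read_boot_sector("/dev/sda"))` before and after the reboot (the device name `/dev/sda` is an assumption) and comparing the `sha256` values would show whether the first sector changed at all. Since the disk boots on different hardware, an unchanged sector would instead point towards the firmware or boot configuration of the affected machine.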


Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.2 on UPI bare metal - physical machines


How reproducible:
- yes, on customer side


Steps to Reproduce:
1. boot the installer and install from the "metal" image with the ignition config; the install completes
2. the system successfully boots into the RHCOS image, shows the login screen, and pulls data from the machine API
3. the system reboots and never boots again

Actual results:
- blank black screen - not booting

Expected results:


Additional info:
- will provide the dmesg from the same HW but a different RHCOS - from a master, not a worker
- it seems that the machine API configuration breaks the system

- the same installation on master nodes works

