Bug 1733848

Summary: OpenShift v4.1.6 installation failed using UPI method on Bare Metal
Product: OpenShift Container Platform Reporter: rushil <rushil>
Component: Installer Assignee: Abhinav Dahiya <adahiya>
Installer sub component: openshift-installer QA Contact: Johnny Liu <jialiu>
Status: CLOSED INSUFFICIENT_DATA Docs Contact:
Severity: urgent    
Priority: low CC: adahiya, bbreard, bleanhar, jerzhang, jmalde, misalunk, spuranam, wking
Version: 4.1.z   
Target Milestone: ---   
Target Release: 4.1.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-08-13 20:10:01 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Comment 1 Miheer Salunke 2019-07-30 00:04:52 UTC
Description ->

We followed the instructions at https://docs.openshift.com/container-platform/4.1/installing/installing_bare_metal/installing-bare-metal.html#installation-user-infra-machines-pxe_installing-bare-metal to install OpenShift 4.1 on UEFI-enabled systems, but all our attempts have been unsuccessful so far. I have attached our DHCP configuration, along with the Ignition file and the iPXE chainloader menu.
Please note that with the very same configuration we are able to install RHEL 8 on both BIOS and UEFI systems, and we can also successfully install RHCOS on a BIOS-enabled server. Any help will be highly appreciated.


I can confirm that the ipxe.efi NBP works for RHEL 8. We also tried replacing ipxe.efi with grubx64.efi, which we copied from the RHEL ISO because it does not exist in the RHCOS ISO. After making this change in DHCP, we placed a file with the following content at /tftproot/grub2/grub.conf, but that did not work either. We then moved /tftproot/grub2/grub.conf to /tftproot/grub2/grub.cfg-01-00-50-56-2b-b7-30 with the same content. The RHCOS documentation at https://docs.openshift.com/container-platform/4.1/installing/installing_bare_metal/installing-bare-metal.html#installation-user-infra-machines-pxe_installing-bare-metal mentions where this file should be placed.



set default="1"

function load_video {
  insmod efi_gop
  insmod efi_uga
  insmod video_bochs
  insmod video_cirrus
  insmod all_video
}

load_video
set gfxpayload=keep
insmod gzio
insmod part_gpt
insmod ext2

set timeout=60

menuentry 'Install Red Hat Enterprise Linux CoreOS' --class fedora --class gnu-linux --class gnu --class os {
linux /images/rhcos/vmlinuz nomodeset rd.neednet=1 coreos.inst=yes coreos.inst.install_dev=sda coreos.inst.image_url=http://bastion.ocp4.local:8000/dist/rhcos/media/rhcos-4.1.0-x86_64-metal-uefi.raw.gz coreos.inst.ignition_url=http://bastion.ocp4.local:8000/dist/rhcos/ign/bootstrap.ign
initrd http://bastion.ocp4.local:8000/dist/rhcos/media/rhcos-4.1.0-x86_64-installer-initramfs.img
}
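
The per-host file name used above follows GRUB's network-boot convention: `grub.cfg-`, then `01` (the Ethernet hardware type), then the client's MAC address in lowercase with dashes. A minimal Python sketch of that mapping follows; `grub_cfg_name` is a hypothetical helper written for illustration, not part of GRUB or the OpenShift tooling.

```python
import string


def grub_cfg_name(mac: str) -> str:
    """Return the per-host GRUB config file name for an Ethernet MAC address."""
    # Accept either ':' or '-' as the separator and normalise to lowercase.
    octets = mac.replace("-", ":").lower().split(":")
    valid = len(octets) == 6 and all(
        len(o) == 2 and all(c in string.hexdigits for c in o) for o in octets
    )
    if not valid:
        raise ValueError(f"not a MAC address: {mac!r}")
    # "01" is the ARP hardware type for Ethernet.
    return "grub.cfg-01-" + "-".join(octets)


print(grub_cfg_name("00:50:56:2B:B7:30"))  # grub.cfg-01-00-50-56-2b-b7-30
```

This only checks the file name. It does not validate the contents of the menu entry itself.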

Comment 6 Ben Breard 2019-08-01 20:02:46 UTC
Looking at the configs I see they're passing in a file under /etc/environment. Can I ask what that is? This is a common way to configure proxy support, but that isn't supported in 4.1.

Comment 7 Satish Puranam 2019-08-03 00:59:30 UTC
Hello Ben, Miheer,

The original intent was to teach the RHCOS instances about our HTTP proxies. We have since removed all the automated patching we were applying to the Ignition files generated by the openshift-install utility, including assigning static IPs, after learning that this is not supported in OCP v4.1. None of this changes the final outcome, though: the cluster still fails to come online. I have attached an updated support bundle to the case (https://access.redhat.com/support/cases/#/case/02431309?commentId=a0a2K00000RAFxQQAX).

The bootstrap node seems to be working and I am able to SSH into it, but on the master and worker nodes I see an endless stream of these messages:

ignition[764]: GET error: get https://api-int.ocp4.local:22623/config/master: x509: certificate has expired or not yet valid
dhclient[810]: DHCPREQUEST on ens160 to 172.17.5.5 port 67 (xid=0x91000d72)
dhclient[810]: DHCPACK from 172.17.5.5 (xid=0x91000d7c)
dhclient[810]: bound to 172.17.5.20 -- renewal in 10 seconds

ignition[764]: GET error: get https://api-int.ocp4.local:22623/config/worker: x509: certificate has expired or not yet valid
dhclient[810]: DHCPREQUEST on ens160 to 172.17.5.5 port 67 (xid=0x91000d72)
dhclient[810]: DHCPACK from 172.17.5.5 (xid=0x91000d7c)
dhclient[810]: bound to 172.17.5.30 -- renewal in 10 seconds

Regards,
Satish Puranam

Comment 11 Abhinav Dahiya 2019-08-13 20:10:01 UTC
Closing this as the user has a time mismatch across machines.
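
For readers hitting the same symptom: the "x509: certificate has expired or not yet valid" error in comment 7 is what a TLS client reports when its own clock falls outside the certificate's validity window, which is how a time mismatch across machines shows up. A minimal Python sketch of that comparison follows. The dates are made-up illustrations, not values from this cluster, and `clock_vs_cert` is a hypothetical helper.

```python
import ssl


def clock_vs_cert(now_epoch: float, not_before: str, not_after: str) -> str:
    """Classify a clock reading against a certificate's validity window."""
    start = ssl.cert_time_to_seconds(not_before)
    end = ssl.cert_time_to_seconds(not_after)
    if now_epoch < start:
        return "not yet valid"
    if now_epoch > end:
        return "expired"
    return "valid"


# Hypothetical validity window, for illustration only.
NOT_BEFORE = "Aug  2 00:00:00 2019 GMT"
NOT_AFTER = "Aug  3 00:00:00 2019 GMT"
start = ssl.cert_time_to_seconds(NOT_BEFORE)

print(clock_vs_cert(start - 3600, NOT_BEFORE, NOT_AFTER))       # clock one hour behind
print(clock_vs_cert(start + 3600, NOT_BEFORE, NOT_AFTER))       # clock inside the window
print(clock_vs_cert(start + 2 * 86400, NOT_BEFORE, NOT_AFTER))  # clock two days ahead
```

On the affected hosts, comparing the output of `date -u` across the bootstrap, master, and worker machines would show whether such a skew exists.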