Description of problem:
If secure boot is currently disabled and the user attempts to enable it via ZTP, the install does not begin the first time ZTP is triggered. When secure boot is enabled via ZTP, the boot options are configured before the virtual CD is attached, so the first boot goes into the existing HD with secure boot on. The install then gets stuck because boot from CD was never triggered.

Version-Release number of selected component (if applicable):
4.10

How reproducible:
Always

Steps to Reproduce:
1. Secure boot is currently disabled in BIOS
2. Attempt to deploy a cluster with secure boot enabled via ZTP
3.

Actual results:
- spoke cluster booted with the secure boot option toggled, into the existing HD
- spoke cluster did not boot into the virtual CD, so the install never started
- agentclusterinstall gets stuck here:
    State: insufficient
    State Info: Cluster is not ready for install

Expected results:
- installation started and completed successfully

Additional info:
Secure boot config used in ZTP siteconfig: http://registry.kni-qe-0.lab.eng.rdu2.redhat.com:3000/kni-qe/ztp-site-configs/src/ff814164cdcd355ed980f1edf269dbc2afbe09aa/siteconfig/master-2.yaml#L40
Jim, can you triage this one while Ian is away?
Does this issue occur when the disk is blank, or does the installation happen correctly in that case (skipping over the blank disk and booting from the ISO)?
@imiller the disk was not blank. It booted into the existing HD.
@yliu1 if the disk is erased prior to an installation where secureboot is enabled, does it succeed? What I am hoping to understand is whether this issue is limited to a re-deployment of a cluster (where secureboot is turned on) versus affecting deployment of new (blank) servers.
@imiller install started when disk was wiped. It attempted HD boot - failed, then fell back to CD boot.
But then, it seems like the real issue is that ZTP does not set "boot from virtual CD once" to override any existing boot order. In that case, I suspect a similar issue will exist in any installation mode. It should never boot from anything that exists on the disk, in any case.
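For reference, a one-time boot override in standard Redfish terms looks roughly like the sketch below. This only illustrates the generic `Boot` object of the Redfish ComputerSystem schema (`BootSourceOverrideEnabled`, `BootSourceOverrideTarget`, `BootSourceOverrideMode`). It is not taken from the ZTP or Ironic code, the helper name `one_time_cd_boot_payload` is made up for this example, and the iDRAC virtual media driver may well use a different (OEM) mechanism.

```python
import json


def one_time_cd_boot_payload(uefi: bool = True) -> dict:
    """Build the body of a PATCH to a Redfish ComputerSystem resource
    that asks for a single boot from the (virtual) CD device.

    Hypothetical helper: only the property names and values come from
    the Redfish ComputerSystem schema.
    """
    boot = {
        # "Once" means the override applies to the next boot only.
        "BootSourceOverrideEnabled": "Once",
        # "Cd" selects the CD/DVD device, which includes virtual media.
        "BootSourceOverrideTarget": "Cd",
    }
    if uefi:
        boot["BootSourceOverrideMode"] = "UEFI"
    return {"Boot": boot}


if __name__ == "__main__":
    # The system path (e.g. /redfish/v1/Systems/System.Embedded.1) is where
    # such a body would be PATCHed; no request is sent here.
    print(json.dumps(one_time_cd_boot_payload(), indent=2))
```

Note that "Once" is consumed by the next boot, whatever that boot turns out to be, so the ordering between this override and any other reboot matters.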
I'm not sure what specific option is selected but the boot menu flashes a message "boot from virtual CD" and re-installs of clusters where there is an existing image already on disks is something we do regularly without problems.
@yliu1 are you sure machine (BIOS) is set to UEFI?
also can you add machine info? is that a Dell machine? something else? what's the iDRAC version?
Reproduced again. System had 4.11.6 installed without secureboot. Attempted ZTP (assisted service in ACM 2.5.2) installation of 4.10.30 with secureboot enabled.

The BMH CR contents:

    spec:
      automatedCleaningMode: disabled
      bmc:
        address: idrac-virtualmedia+https://X.X.X.X/redfish/v1/Systems/System.Embedded.1
        credentialsName: cnfocto1-master0
        disableCertificateVerification: true
      bootMACAddress: E4:43:4B:F6:12:E0
      bootMode: UEFISecureBoot
      online: true
      rootDeviceHints:
        hctl: "1:2:0:0"

The system booted back into the 4.11.6 installation already on disk.

The system is a Dell PowerEdge R740xd
idrac version: 4.40.40.00
@ykashtan yes, it's set to UEFI. It is a Dell machine with the following firmware:
BIOS Version 2.14.2
iDRAC Firmware Version 5.10.10.00
And I believe @dpenney also encountered this issue with later iDRAC firmware, e.g., 5.10.30.00.
There seems to be a separate issue with the latest iDRAC firmware, 5.10.50.00, where the response to the BIOS config request to enable secure boot is seen as an error, leading to the bmh being put into a "deprovisioning" state and getting powered off without completing the BIOS config change. Downgrading to 5.10.30.00 results in the same behaviour described in the above comments.

It seems as though the BIOS config request to change the secure boot value is seen as completed immediately, but it's only "pending" in iDRAC. The "first boot" is then used to actually perform the config change job, and then the node is rebooted. After this reboot, the node goes back to the default boot order and boots from disk.

Similarly, if your node is already configured as secure boot and the bmh enters a "deprovisioning" state, another BIOS config change is triggered to set secure boot to false and then the node is powered off. However, this leaves the config change in a pending state as well, and the bmh gets stuck in this "deprovisioning" state (the secure boot state is still "true", with a pending config change, so it seems to keep trying to set it to "false").

As an odd side note, if I manually wipe the disk prior to triggering install with a change to enable secure boot, then the first boot after the config change completes does fall through to booting from the ISO, as expected. However, when the first "boot from disk" is attempted as part of the install, the node boots back to the ISO again, as though the pre-wiped disk didn't get set up as a bootable disk prior to the reboot for some reason. At this point, the system blocks, requiring a manual reboot ("pending user action") to get it to boot from disk.
We're consulting with Dell engineers wrt path forward here. Meanwhile,

> There seems to be a separate issue with the latest iDRAC firmware, 5.10.50.00, where the response to BIOS config request to enable secure boot is seen as an error, leading to the bmh being put into a "deprovisioning" state and getting powered off without completing the BIOS config change

Could you file this bug separately, providing the logs? This is definitely worth fixing.
Got some information from a Dell engineer:

11:36 <dtantsur> ajya: could you check if SecureBootEnable is changed instantly or only after a reboot? If the latter, we can probably try rebooting and checking the updated value.
11:41 <ajya> dtantsur: only after reboot, PATCHing only creates a job in iDRAC that is is Scheduled state. Only during reboot it is started. The workflow is the same as for BIOS attribute update because it is a BIOS attribute change (probably, could do the same with BIOS clean step).
11:57 <dtantsur> ajya: okay, so if I update the redfish code to check if the value has changed immediately, reboot if not, and check again, it will work?
12:06 <ajya> dtantsur: yes, GET /SecureBoot returns old value until job is finished
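The "check, reboot if not changed, check again" idea from the chat can be sketched as follows. This is a toy model, not the actual redfish driver change: `FakeBMC` and `ensure_secure_boot` are hypothetical names, and the fake BMC only imitates the behaviour described above (the change is scheduled as a job and applied during the next reboot, and reads return the old value until then).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeBMC:
    """Hypothetical stand-in for a BMC that applies secure boot changes on reboot."""
    secure_boot: bool = False
    pending: Optional[bool] = None
    reboots: int = 0

    def get_secure_boot(self) -> bool:
        # The old value is returned until the scheduled job has run.
        return self.secure_boot

    def set_secure_boot(self, enabled: bool) -> None:
        # The request only schedules a job; nothing changes yet.
        self.pending = enabled

    def reboot(self) -> None:
        # The scheduled job, if any, is executed during the reboot.
        self.reboots += 1
        if self.pending is not None:
            self.secure_boot = self.pending
            self.pending = None


def ensure_secure_boot(bmc: FakeBMC, enabled: bool) -> bool:
    """Request the change, reboot once if it has not taken effect, re-check."""
    if bmc.get_secure_boot() == enabled:
        return True  # already in the desired state, nothing to do
    bmc.set_secure_boot(enabled)
    if bmc.get_secure_boot() == enabled:
        return True  # hardware that applies the change immediately
    bmc.reboot()  # extra reboot so the scheduled job can run
    return bmc.get_secure_boot() == enabled


if __name__ == "__main__":
    bmc = FakeBMC()
    print(ensure_secure_boot(bmc, True), bmc.reboots)
```

The point of doing the extra reboot inside the secure boot step is that it happens before the boot device for the install is selected, so the one-time boot into the ISO is not spent on the config job.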
Ian, if I prototype a patch, could we collaborate to test it on your environment? I have some Dell machines, but it may take me longer to reproduce the problem.
Dmitry, sure, I'm happy to provide access to my system for testing
OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira. https://issues.redhat.com/browse/OCPBUGS-9303