Created attachment 1714486: screenshot of the VM console

Description of problem:
While verifying Bug 1862851 (enabling Secure Boot while installing OCP with rhcos-46.82.202009091306-0), I hit a new issue on the VM console:

Failed at step STDIN spawning /bin/dracut-emergency: Inappropriate ioctl for device

Please see the attached screenshot.

Version-Release number of selected component (if applicable):

How reproducible:
Always, once Secure Boot is enabled

Steps to Reproduce:
1. Install OCP on vSphere with Secure Boot enabled, using RHCOS template rhcos-46.82.202009091306-0

Actual results:
VMs could not be started correctly.

Expected results:
VMs should come up successfully.

Additional info:
From the attached console screenshot: kernel_lockdown (which is automatically enabled in Secure Boot) is denying iopl access. Afterburn uses that to unlock I/O access to the hypervisor, in order to read guestinfo properties. The underlying root-cause is https://github.com/lucab/vmw_backdoor-rs/issues/6.
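To confirm on an affected node that lockdown is active, the kernel's current lockdown mode can be read from securityfs. This is a minimal sketch, assuming the running kernel exposes `/sys/kernel/security/lockdown` (upstream kernels do since 5.4; whether the RHCOS kernel in this bug does is an assumption). `lockdown_mode` is a hypothetical helper name, not part of Afterburn or any tool mentioned here.

```shell
#!/bin/sh
# lockdown_mode: print the active lockdown mode from a securityfs-style
# line such as "none [integrity] confidentiality". The active mode is the
# one shown in square brackets.
lockdown_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)].*/\1/p'
}

# On a live system the line comes from securityfs. Print "unknown" when
# the file is missing or unreadable (assumption: no lockdown LSM exposed).
LOCKDOWN_FILE=/sys/kernel/security/lockdown
if [ -r "$LOCKDOWN_FILE" ]; then
    lockdown_mode "$(cat "$LOCKDOWN_FILE")"
else
    echo "unknown"
fi
```

Any mode other than `none` would be consistent with the `iopl` denial seen in the screenshot.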
I'm trying to reproduce/investigate this with the current 4.6 pre-release OVA (4.6.0-0.nightly-2020-09-09-130911 [0]) but it even fails to boot to the kernel due to invalid signature (normal EFI boot is fine though). Jinyun, does that image boot up to the kernel for you? [0] https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/pre-release/latest-4.6/rhcos-4.6.0-0.nightly-2020-09-09-130911-x86_64-vmware.x86_64.ova
See https://gitlab.cee.redhat.com/coreos/redhat-coreos/-/merge_requests/1112 - TL;DR 4.6 is back to an 8.2.z kernel which fixes this. You need to use the build browser to get the latest.
Indeed, the latest nightly internally available at this time (rhcos-46.82.202009101640-0) has a good signature and can reach kernel-space. I can fully reproduce the failure there: `iopl` fails with EPERM. As a (very bad) quick workaround, I've verified that this can be bypassed by catching the GRUB prompt and manually adding `ip=dhcp,dhcp6` as a kernel argument.
I had some bad feeling about Ignition, and indeed bypassing the Afterburn failure with the hackish workaround above leads to the same thing causing failures to Ignition fetch stages. For reference, that's tracked separately at https://github.com/coreos/ignition/issues/1092.
Not sure how many customers are using SecureBoot on VMware, so setting medium priority for now. I wonder if this should really be considered a TestBlocker because of that.
Created a doc BZ to advise users not to use Secure Boot on VMware - https://bugzilla.redhat.com/show_bug.cgi?id=1878262
@jima, about the Regression keyword: can you confirm that Secure Boot worked on VMware on previous versions of RHCOS? Due to the nature of the bug, I suspect that it did not.
On 4.5 with Secure Boot enabled, how did you install the nodes? Did you use the RHCOS installer PXE or ISO image to install to the VM disk, passing the Ignition config via coreos.inst.ignition_url? Did you start from the OVA and pass the Ignition config via guestinfo ignition.config.data? Or something else?
(In reply to Benjamin Gilbert from comment #12)
> On 4.5 with Secure Boot enabled, how did you install the nodes? Did you use
> the RHCOS installer PXE or ISO image to install to the VM disk, passing the
> Ignition config via coreos.inst.ignition_url? Did you start from the OVA
> and pass the Ignition config via guestinfo ignition.config.data? Or
> something else?

On 4.5 with Secure Boot enabled, the VMs are created from the OVA template and the Ignition config is set via guestinfo ignition.config.data.
And I did a 4.5 "bare metal" installation on pre-created VMs, booting from the CoreOS ISO & passing the ignition config. On VMware, when I create a VM of type RHEL 8 it defaults to EFI boot and SecureBoot is enabled; it is the option I chose to create my VMs to do the bare metal install. If I pick CoreOS for a new VM, it defaults to BIOS, no secure boot.
JP: the "CoreOS" OS type in VMware probably refers to CoreOS Container Linux. Amending my statement in comment 10: I'd expect that both Ignition and Afterburn will fail when accessing VMware guestinfo variables if Secure Boot is enabled. Afterburn only gained this functionality in RHCOS 4.6, and Ignition does not access guestinfo when a machine is installed via the bare-metal installer. So the success report in comment 14 makes sense, but I'm surprised by the report in comment 13. At least as to Afterburn in the bare-metal install case, I agree that this is a regression.
> So the success report in comment 14 makes sense, but I'm surprised by the report in comment 13.

I think I can explain that: the older library in Ignition 0.x did not perform an `iopl` and thus it won't fail this way (but that in turn means it is prone to other non-deterministic failures). I have noted more details and references about this at https://github.com/coreos/ignition/issues/1092#issuecomment-692549607.
Pushed `rust-afterburn-4.5.0-2.rhaos4.6.el8` with a patch to skip the `iopl` call (essentially matching previous Ignition 0.x behavior) as a quickfix for 4.6.
The other half of this fix has been pushed to `ignition-2.6.0-4.rhaos4.6.git947598e.el8`. Both sides have landed in RHCOS 4.6 nightlies. I have manually tested this on RHCOS `46.82.202009182140-0`:
```
$ head -2 /etc/os-release
NAME="Red Hat Enterprise Linux CoreOS"
VERSION="46.82.202009182140-0"
$ grep -o ignition.platform.id='[[:alnum:]]*' /proc/cmdline
ignition.platform.id=vmware
$ mokutil --sb-state
SecureBoot enabled
```
I installed OCP 4.6.0-0.nightly-2020-09-21-182309 with template RHCOS 46.82.202009182140-0 on vSphere. The Secure Boot error is no longer reproduced, but the master/worker node consoles show a new error when fetching the Ignition file:

x509: certificate relies on legacy Common Name field, use SANs or temporarily enable Common Name matching with GODEBUG=x509ignoreCN=0

Please see the attached screenshot.
Created attachment 1715647: new error from master/worker nodes.
Thanks for confirming the Secure Boot fixes worked. The additional error you are seeing depends on the TLS certificates set up in your environment. If you are providing this custom certificate, you need to add Subject Alternative Name entries as suggested in the error. If it was the OpenShift Installer that auto-generated it, please open a BZ against that component so that the certificate generation logic is tweaked to introduce hostname-matching SAN entries.
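One way to check whether a given certificate carries SAN entries is to look for the extension in the `openssl x509 -text` output. This is a rough sketch, not the installer's own logic; `has_san` and `check_cert_file` are hypothetical helper names, and the certificate path passed in is whatever PEM file your environment serves.

```shell
#!/bin/sh
# has_san: read `openssl x509 -noout -text` output on stdin and succeed
# only if it contains a Subject Alternative Name extension.
has_san() {
    grep -q 'X509v3 Subject Alternative Name'
}

# check_cert_file: hypothetical wrapper that inspects a PEM certificate
# file and reports whether any SAN extension is present.
check_cert_file() {
    if openssl x509 -in "$1" -noout -text | has_san; then
        echo "SAN present"
    else
        echo "no SAN found"
    fi
}
```

Calling `check_cert_file` with the path to the certificate prints `no SAN found` for a certificate that only sets the Common Name, which is the case the error above complains about.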
Marking verified with 4.6.0-0.nightly-2020-09-21-182309 based on comment #27
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196
I am using Openshift 4.6.8 and using BIOS to boot my CoreOS vms and I get inappropriate ioctl for device errors, the problem looks like only happened in UEFI mode, there is something else that I can do? Thanks
(In reply to Matan Carmeli from comment #34)
> I am using Openshift 4.6.8 and using BIOS to boot my CoreOS vms and I get
> inappropriate ioctl for device errors, the problem looks like only happened
> in UEFI mode, there is something else that I can do?
> Thanks

Please open a new BZ with the problem you are facing.