Bug 2106378
Summary: | Spoke BMH stuck “provisioning” after changing a BIOS attribute via the converged workflow | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | tali <tali> |
Component: | Bare Metal Hardware Provisioning | Assignee: | Dmitry Tantsur <dtantsur> |
Bare Metal Hardware Provisioning sub component: | ironic | QA Contact: | tali <tali> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | ||
Priority: | high | CC: | ccrum, rpittau, sasha |
Version: | 4.11 | Keywords: | OtherQA, TestBlocker, Triaged |
Target Milestone: | --- | ||
Target Release: | 4.12.0 | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Last Closed: | 2023-01-17 19:52:29 UTC | Type: | Bug |
Description
tali@redhat.com
2022-07-12 13:39:36 UTC
The must-gather is available: https://drive.google.com/file/d/1nQhCrfHRwTT1c6TbvwOZG5wItXWSdVGU/view?usp=sharing

Reproduced with an SNO spoke on real BM (Dell PowerEdge R640):
4.11.0-0.nightly-2022-07-11-080250
multicluster-engine.v2.1.0

[kni@r640-u01 ~]$ oc get bmh
NAME         STATE       CONSUMER   ONLINE   ERROR   AGE
master-1-0   preparing              true             35m

[kni@r640-u01 ~]$ oc get hostfirmwaresettings.metal3.io master-1-0 -o yaml
apiVersion: metal3.io/v1alpha1
kind: HostFirmwareSettings
metadata:
  creationTimestamp: "2022-07-12T16:18:48Z"
  generation: 1
  name: master-1-0
  namespace: qe1
  resourceVersion: "1263089"
  uid: a9bdfd70-8eb0-4286-9c57-b197ab06bc84
spec:
  settings:
    SecureBoot: Disabled
status:
  conditions:
  - lastTransitionTime: "2022-07-12T16:19:13Z"
    message: ""
    observedGeneration: 1
    reason: Success
    status: "True"
    type: ChangeDetected
  - lastTransitionTime: "2022-07-12T16:19:13Z"
    message: ""
    observedGeneration: 1
    reason: Success
    status: "True"
    type: Valid
  lastUpdated: "2022-07-12T16:19:13Z"
  schema:
    name: schema-8b6476c0
    namespace: qe1
  settings:
    AcPwrRcvry: Last
    AcPwrRcvryDelay: Immediate
    AcPwrRcvryUserDelay: "60"
    AesNi: Enabled
    AssetTag: ""
    AuthorizeDeviceFirmware: Disabled
    BootMode: Uefi
    BootSeqRetry: Enabled
    ConTermType: Vt100Vt220
    ControlledTurbo: Disabled
    ControlledTurboMinusBin: "0"
    CorrEccSmi: Enabled
    CpuInterconnectBusLinkPower: Disabled
    CpuInterconnectBusSpeed: MaxDataRate
    CurrentEmbVideoState: Enabled
    DcuIpPrefetcher: Enabled
    DcuStreamerPrefetcher: Enabled
    DellAutoDiscovery: PlatformDefault
    DellWyseP25BIOSAccess: Enabled
    DynamicCoreAllocation: Disabled
    EmbSata: AhciMode
    EmbVideo: Enabled
    EnergyPerformanceBias: MaxPower
    ErrPrompt: Enabled
    ExtSerialConnector: Serial1
    FailSafeBaud: "115200"
    ForceInt10: Disabled
    GenericUsbBoot: Disabled
    HddFailover: Enabled
    HddPlaceholder: Enabled
    HttpDev1EnDis: Disabled
    HttpDev1Interface: NIC.Integrated.1-1-1
    HttpDev1Protocol: IPv4
    HttpDev1Uri: ""
    HttpDev1VlanEnDis: Disabled
    HttpDev1VlanId: "1"
    HttpDev1VlanPriority: "0"
    HttpDev2EnDis: Disabled
    HttpDev2Interface: NIC.Integrated.1-1-1
    HttpDev2Protocol: IPv4
    HttpDev2Uri: ""
    HttpDev2VlanEnDis: Disabled
    HttpDev2VlanId: "1"
    HttpDev2VlanPriority: "0"
    HttpDev3EnDis: Disabled
    HttpDev3Interface: NIC.Integrated.1-1-1
    HttpDev3Protocol: IPv4
    HttpDev3Uri: ""
    HttpDev3VlanEnDis: Disabled
    HttpDev3VlanId: "1"
    HttpDev3VlanPriority: "0"
    HttpDev4EnDis: Disabled
    HttpDev4Interface: NIC.Integrated.1-1-1
    HttpDev4Protocol: IPv4
    HttpDev4Uri: ""
    HttpDev4VlanEnDis: Disabled
    HttpDev4VlanId: "1"
    HttpDev4VlanPriority: "0"
    InBandManageabilityInterface: Enabled
    IntegratedNetwork1: Enabled
    IntegratedRaid: Enabled
    IntelTxt: "Off"
    InternalUsb: "On"
    IoatEngine: Disabled
    IscsiDev1Con1Auth: None
    IscsiDev1Con1ChapName: ""
    IscsiDev1Con1ChapSecret: ""
    IscsiDev1Con1ChapType: OneWay
    IscsiDev1Con1DhcpEnDis: Disabled
    IscsiDev1Con1EnDis: Disabled
    IscsiDev1Con1Gateway: ""
    IscsiDev1Con1Interface: NIC.Integrated.1-1-1
    IscsiDev1Con1Ip: ""
    IscsiDev1Con1IsId: ""
    IscsiDev1Con1Lun: "0"
    IscsiDev1Con1Mask: ""
    IscsiDev1Con1Port: "3260"
    IscsiDev1Con1Protocol: IPv4
    IscsiDev1Con1Retry: "3"
    IscsiDev1Con1RevChapName: ""
    IscsiDev1Con1RevChapSecret: ""
    IscsiDev1Con1TargetIp: ""
    IscsiDev1Con1TargetName: ""
    IscsiDev1Con1TgtDhcpEnDis: Disabled
    IscsiDev1Con1Timeout: "10000"
    IscsiDev1Con1VlanEnDis: Disabled
    IscsiDev1Con1VlanId: "1"
    IscsiDev1Con1VlanPriority: "0"
    IscsiDev1Con2Auth: None
    IscsiDev1Con2ChapName: ""
    IscsiDev1Con2ChapSecret: ""
    IscsiDev1Con2ChapType: OneWay
    IscsiDev1Con2DhcpEnDis: Disabled
    IscsiDev1Con2EnDis: Disabled
    IscsiDev1Con2Gateway: ""
    IscsiDev1Con2Interface: NIC.Integrated.1-1-1
    IscsiDev1Con2Ip: ""
    IscsiDev1Con2IsId: ""
    IscsiDev1Con2Lun: "0"
    IscsiDev1Con2Mask: ""
    IscsiDev1Con2Port: "3260"
    IscsiDev1Con2Protocol: IPv4
    IscsiDev1Con2Retry: "3"
    IscsiDev1Con2RevChapName: ""
    IscsiDev1Con2RevChapSecret: ""
    IscsiDev1Con2TargetIp: ""
    IscsiDev1Con2TargetName: ""
    IscsiDev1Con2TgtDhcpEnDis: Disabled
    IscsiDev1Con2Timeout: "10000"
    IscsiDev1Con2VlanEnDis: Disabled
    IscsiDev1Con2VlanId: "1"
    IscsiDev1Con2VlanPriority: "0"
    IscsiDev1ConOrder: Con1Con2
    IscsiDev1EnDis: Disabled
    IscsiInitiatorName: ""
    LogicalProc: Enabled
    MemFrequency: MaxPerf
    MemOpMode: OptimizerMode
    MemPatrolScrub: Standard
    MemRefreshRate: 1x
    MemTest: Disabled
    MemoryMappedIOH: 56TB
    MmioAbove4Gb: Enabled
    MonitorMwait: Enabled
    NodeInterleave: Disabled
    NumLock: "On"
    NvmeMode: NonRaid
    OneTimeBootMode: Disabled
    OneTimeUefiBootSeqDev: RAID.Integrated.1-1
    OppSrefEn: Disabled
    OsWatchdogTimer: Disabled
    PcieAspmL1: Disabled
    PowerCycleRequest: None
    Proc1Brand: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
    Proc1Id: 6-55-4
    Proc1L2Cache: 16x1 MB
    Proc1L3Cache: 22 MB
    Proc1NumCores: "16"
    Proc1TurboCoreNum: All
    Proc2Brand: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
    Proc2Id: 6-55-4
    Proc2L2Cache: 16x1 MB
    Proc2L3Cache: 22 MB
    Proc2NumCores: "16"
    Proc2TurboCoreNum: All
    ProcAdjCacheLine: Enabled
    ProcBusSpeed: 10.40 GT/s
    ProcC1E: Disabled
    ProcCStates: Disabled
    ProcConfigTdp: Nominal
    ProcCoreSpeed: 2.10 GHz
    ProcCores: All
    ProcHwPrefetcher: Enabled
    ProcPwrPerf: MaxPerf
    ProcTurboMode: Enabled
    ProcVirtualization: Enabled
    ProcX2Apic: Disabled
    PwrButton: Enabled
    PxeDev1EnDis: Enabled
    PxeDev1Interface: NIC.Integrated.1-1-1
    PxeDev1Protocol: IPv6
    PxeDev1VlanEnDis: Disabled
    PxeDev1VlanId: "1"
    PxeDev1VlanPriority: "0"
    PxeDev2EnDis: Disabled
    PxeDev2Interface: NIC.Integrated.1-1-1
    PxeDev2Protocol: IPv4
    PxeDev2VlanEnDis: Disabled
    PxeDev2VlanId: "1"
    PxeDev2VlanPriority: "0"
    PxeDev3EnDis: Disabled
    PxeDev3Interface: NIC.Integrated.1-1-1
    PxeDev3Protocol: IPv4
    PxeDev3VlanEnDis: Disabled
    PxeDev3VlanId: "1"
    PxeDev3VlanPriority: "0"
    PxeDev4EnDis: Disabled
    PxeDev4Interface: NIC.Integrated.1-1-1
    PxeDev4Protocol: IPv4
    PxeDev4VlanEnDis: Disabled
    PxeDev4VlanId: "1"
    PxeDev4VlanPriority: "0"
    RedirAfterBoot: Enabled
    RedundantOsLocation: None
    SataPortA: Auto
    SataPortACapacity: N/A
    SataPortADriveType: Unknown Device
    SataPortAModel: Unknown
    SataPortB: Auto
    SataPortBCapacity: N/A
    SataPortBDriveType: Unknown Device
    SataPortBModel: Unknown
    SataPortC: Auto
    SataPortCCapacity: N/A
    SataPortCDriveType: Unknown Device
    SataPortCModel: Unknown
    SataPortD: Auto
    SataPortDCapacity: N/A
    SataPortDDriveType: Unknown Device
    SataPortDModel: Unknown
    SataPortE: Auto
    SataPortECapacity: N/A
    SataPortEDriveType: Unknown Device
    SataPortEModel: Unknown
    SataPortF: Auto
    SataPortFCapacity: N/A
    SataPortFDriveType: Unknown Device
    SataPortFModel: Unknown
    SataPortG: Auto
    SataPortGCapacity: N/A
    SataPortGDriveType: Unknown Device
    SataPortGModel: Unknown
    SataPortH: Auto
    SataPortHCapacity: N/A
    SataPortHDriveType: Unknown Device
    SataPortHModel: Unknown
    SataPortI: Auto
    SataPortICapacity: N/A
    SataPortIDriveType: Unknown Device
    SataPortIModel: Unknown
    SataPortJ: Auto
    SataPortJCapacity: N/A
    SataPortJDriveType: Unknown Device
    SataPortJModel: Unknown
    SataPortK: Auto
    SataPortKCapacity: N/A
    SataPortKDriveType: Unknown Device
    SataPortKModel: Unknown
    SataPortL: Auto
    SataPortLCapacity: N/A
    SataPortLDriveType: Unknown Device
    SataPortLModel: Unknown
    SataPortM: Auto
    SataPortMCapacity: N/A
    SataPortMDriveType: Unknown Device
    SataPortMModel: Unknown
    SataPortN: Auto
    SataPortNCapacity: N/A
    SataPortNDriveType: Unknown Device
    SataPortNModel: Unknown
    SecureBoot: Enabled
    SecureBootMode: DeployedMode
    SecureBootPolicy: Standard
    SecurityFreezeLock: Enabled
    SerialComm: OnConRedirAuto
    SerialPortAddress: Serial1Com2Serial2Com1
    SetBootOrderDis: ""
    SetBootOrderEn: ""
    SetBootOrderFqdd1: ""
    SetBootOrderFqdd2: ""
    SetBootOrderFqdd3: ""
    SetBootOrderFqdd4: ""
    SetBootOrderFqdd5: ""
    SetBootOrderFqdd6: ""
    SetBootOrderFqdd7: ""
    SetBootOrderFqdd8: ""
    SetBootOrderFqdd9: ""
    SetBootOrderFqdd10: ""
    SetBootOrderFqdd11: ""
    SetBootOrderFqdd12: ""
    SetBootOrderFqdd13: ""
    SetBootOrderFqdd14: ""
    SetBootOrderFqdd15: ""
    SetBootOrderFqdd16: ""
    SetLegacyHddOrderFqdd1: ""
    SetLegacyHddOrderFqdd2: ""
    SetLegacyHddOrderFqdd3: ""
    SetLegacyHddOrderFqdd4: ""
    SetLegacyHddOrderFqdd5: ""
    SetLegacyHddOrderFqdd6: ""
    SetLegacyHddOrderFqdd7: ""
    SetLegacyHddOrderFqdd8: ""
    SetLegacyHddOrderFqdd9: ""
    SetLegacyHddOrderFqdd10: ""
    SetLegacyHddOrderFqdd11: ""
    SetLegacyHddOrderFqdd12: ""
    SetLegacyHddOrderFqdd13: ""
    SetLegacyHddOrderFqdd14: ""
    SetLegacyHddOrderFqdd15: ""
    SetLegacyHddOrderFqdd16: ""
    Slot1: Enabled
    Slot1Bif: x16
    Slot2: Enabled
    Slot2Bif: x16
    Slot3: Enabled
    Slot3Bif: x16
    SriovGlobalEnable: Disabled
    SubNumaCluster: Disabled
    SysMemSize: 192 GB
    SysMemSpeed: 2666 Mhz
    SysMemType: ECC DDR4
    SysMemVolt: 1.20 V
    SysMfrContactInfo: www.dell.com
    SysProfile: PerfOptimized
    SystemBiosVersion: 1.6.13
    SystemCpldVersion: 1.0.2
    SystemManufacturer: Dell Inc.
    SystemMeVersion: 4.0.4.401
    SystemModelName: PowerEdge R640
    SystemServiceTag: 176Q2W2
    TpmInfo: No TPM present
    TpmPpiBypassClear: Disabled
    TpmPpiBypassProvision: Disabled
    UefiComplianceVersion: "2.5"
    UefiVariableAccess: Standard
    UncoreFrequency: MaxUFS
    UpiPrefetch: Enabled
    UsbManagedPort: "On"
    UsbPorts: AllOn
    VideoMem: 16 MB
    WorkloadProfile: NotAvailable
    WriteCache: Disabled
    WriteDataCrc: Disabled

[kni@r640-u01 ~]$ ssh core.134.3 sudo crictl ps -a|head
CONTAINER  IMAGE  CREATED  STATE  NAME  ATTEMPT  POD ID  POD
21b96ac5e5faf  b4ce2d5db90a4b1f12ae9d799ccf1400fb4507b1cf5974bfe3814f631f89054e  10 minutes ago  Exited  collect-profiles  0  f80cd95fc1cfa  collect-profiles-27627405-f99h5
6ce890c1f56a7  b4ce2d5db90a4b1f12ae9d799ccf1400fb4507b1cf5974bfe3814f631f89054e  21 minutes ago  Exited  collect-profiles  0  da3f273ba0299  collect-profiles-27627390-gtfls
cc1898b6d972a  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a3143920c200b67fcd60d05abaae99c125a450b7b95fe682d1c84960f0a1e897  21 minutes ago  Running  dns  1  c44cc526d67ca  dns-default-r4hds
4231d0dcfed58  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:884b46e766f92e6a4ddfd493b0eeab5b6ad72f01cab7651a7f457509d8d76fb0  21 minutes ago  Running  csi-snapshot-controller-operator  1  ba5062861adce  csi-snapshot-controller-operator-86884c7b4d-lvzb7
5f19b6878dc03  f334e212c405aa88a76f9e23708952da3d5f4cf6715632895f4d50035f51538a  22 minutes ago  Running  kube-rbac-proxy-thanos  1  318c46737f3ab  prometheus-k8s-0
f97bdfed2c2ab  f334e212c405aa88a76f9e23708952da3d5f4cf6715632895f4d50035f51538a  22 minutes ago  Running  kube-rbac-proxy  1  318c46737f3ab  prometheus-k8s-0
a9f4533f88033  e081cc8c15586c74c1683dd70325ad55cbc08fc6627359bbe65801e1ac5e35f7  22 minutes ago  Running  prometheus-proxy  1  318c46737f3ab  prometheus-k8s-0
260ae1e801a31  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d791a91d25a8f8d8eae66d888ede55de4eda82806d722e944b4ce8564444ace7  22 minutes ago  Running  thanos-query  1  1c24c73bbad47  thanos-querier-7b57f85644-zgfhc
4bfc96d853025  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d791a91d25a8f8d8eae66d888ede55de4eda82806d722e944b4ce8564444ace7  22 minutes ago  Running  thanos-sidecar  1  318c46737f3ab  prometheus-k8s-0
[kni@r640-u01 ~]$

The solution to this issue should not introduce an additional reboot (as the host is installed as expected at this stage).

Has anyone been able to reproduce this on a virtual setup? My naive attempt to create a BMH with Redfish, change a setting and provision an image has succeeded.

I think I'm onto something. Technical notes for myself: when fast-track is on, we never re-configure the ISO, even when rebooting the node after a clean step. And since we use one-time ISO boot by default, the node boots into nothing. We need to re-configure the ISO before rebooting.

Cannot be reproduced on a virtual environment because these don't support one-time boot.

The fix should be available now. Would it be possible for you to test the fix in 4.12? I think it should solve the issue, but I haven't been able to reproduce the issue to begin with.

I tested the fix on 4.12.0-0.nightly-2022-08-23-031342. The BMH is still stuck in a “provisioning” state. It booted to a live-ISO when rebooting after BIOS/RAID settings changes. A "clean step timed out" event was also generated (although the change was applied successfully).

Could you retest with a newer nightly? The patch merged on the same day; I have a feeling the nightly did not include it yet.

Ok, I will test the fix with a newer nightly.

I tested the fix on 4.12.0-0.nightly-2022-08-29-102035. The original problem has been fixed. The host was installed properly after changing a BIOS attribute.
However, I observed two reboots during cleaning. One reboot is intended to finish the clean step (although it acts as an extra reboot in comparison to non converged ZTP workflow). The second reboot needs to be investigated.

> I tested the fix on 4.12.0-0.nightly-2022-08-29-102035. The original problem has been fixed. The host was installed properly after changing a BIOS attribute.

Great, thank you! I'll close this bug as verified by you.

> However, I observed two reboots during cleaning.

Let's file a new bug for this (it's done in jira OCPBUGS nowadays). Please include must-gather.

P.S.

> it acts as an extra reboot in comparison to non converged ZTP workflow

Well, the non-converged workflow did not have BIOS settings :) We need the reboot to make sure the settings are applied before deployment.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399
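
For readers following the root-cause note in the comments above (fast-track on, one-time ISO boot, no re-configuration before the post-clean-step reboot), here is a toy sketch of that failure mode. It is not ironic code: the `Node` class, the `apply_bios_setting` helper, and the string labels such as "ramdisk-iso" are hypothetical names chosen only for illustration, and the model assumes nothing is installed on disk yet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Toy model of a host whose BMC honours a one-time boot device (hypothetical)."""
    one_time_boot: Optional[str] = None  # device used for the next boot only
    disk_has_os: bool = False            # assumption: nothing deployed yet

    def reboot(self) -> str:
        """Return what the node boots into; a one-time setting is consumed."""
        if self.one_time_boot is not None:
            target, self.one_time_boot = self.one_time_boot, None
            return target
        return "disk" if self.disk_has_os else "nothing"


def apply_bios_setting(node: Node, reconfigure_iso: bool) -> str:
    """Simulate a clean step that needs a reboot; return the next boot target."""
    if reconfigure_iso:
        # Re-arm the one-time ISO boot before rebooting.
        node.one_time_boot = "ramdisk-iso"
    return node.reboot()


# The first boot into the ramdisk uses up the one-time setting.
node = Node(one_time_boot="ramdisk-iso")
first_boot = node.reboot()

# Behaviour described in the bug: the ISO is not re-configured.
before_fix = apply_bios_setting(node, reconfigure_iso=False)

# Behaviour described for the fix: re-configure the ISO, then reboot.
after_fix = apply_bios_setting(node, reconfigure_iso=True)

print(first_boot, before_fix, after_fix)
```

In this model the second reboot finds no boot target unless the ISO is re-armed first, which matches the "boots into nothing" symptom described in the notes. The same model also suggests why a virtual environment without one-time boot would not show the problem, since the boot device there would persist across reboots.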