Description of problem:
Deploy a spoke cluster with a HostFirmwareSettings CR via the converged workflow. The BMH is stuck in “available” state after successfully changing a BIOS attribute during deployment.

cat HostFirmwareSettings.yaml
apiVersion: metal3.io/v1alpha1
kind: HostFirmwareSettings
metadata:
  name: "cnfde11.ptp.lab.eng.bos.redhat.com"
  namespace: "cnfde11"
spec:
  settings:
    PowerButtonFunction: "4 Seconds Override"

oc get bmh -n cnfde11
NAME                                 STATE       CONSUMER   ONLINE   ERROR   AGE
cnfde11.ptp.lab.eng.bos.redhat.com   available              true             70m

Ironic log shows applying the configuration change:

2022-07-05 13:47:57.713 1 DEBUG ironic.drivers.modules.redfish.bios [req-e29ff593-3e6f-4dd3-a34f-06568e01d2da - - - - -] Apply BIOS configuration for node 93422fec-4ada-4009-becb-a8799ce3f7c3: [{'name': 'PowerButtonFunction', 'value': '4 Seconds Override'}] apply_configuration /usr/lib/python3.6/site-packages/ironic/drivers/modules/redfish/bios.py:230
2022-07-05 13:47:57.715 1 DEBUG sushy.connector [req-e29ff593-3e6f-4dd3-a34f-06568e01d2da - - - - -] HTTP request: PATCH https://10.16.231.98/redfish/v1/Systems/1/Bios/SD; headers: {'Content-Type': 'application/json', 'OData-Version': '4.0'}; body: {'Attributes': {'PowerButtonFunction': '4 Seconds Override'}}; blocking: False; timeout: 60; session arguments: {}; _op /usr/lib/python3.6/site-packages/sushy/connector.py:111

Version-Release number of selected component (if applicable):
- Latest upstream assisted-service-operator
- OCP 4.11 on hub (4.11.0-fc.3)
- 4.10 spoke

How reproducible:
100%

Steps to Reproduce:
1. Deploy OCP 4.11 hub with upstream assisted-service-operator
2. Try to deploy spoke using manually created CRs including a HostFirmwareSettings CR

Actual results:
BMH stuck "available"

Expected results:
The SuperMicro server is deployed as expected

Additional info:
The must-gather is available: https://drive.google.com/file/d/1c_7Eg5-6Vf6YSzPyjJjYSRlmhPgAYrkm/view?usp=sharing
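For readers following the Ironic log in the description, the relationship between the CR's spec.settings, the name/value list logged by ironic.drivers.modules.redfish.bios, and the body of the Redfish PATCH can be sketched as below. This is only an illustration of the data shapes visible in the log, not Ironic's or sushy's actual code; the function names to_ironic_settings and to_redfish_patch_body are hypothetical.

```python
from typing import Dict, List


def to_ironic_settings(spec_settings: Dict[str, str]) -> List[Dict[str, str]]:
    """Turn HostFirmwareSettings spec.settings into the name/value list seen in the Ironic log.

    Hypothetical helper: it only mirrors the logged data shape.
    """
    return [{"name": name, "value": value} for name, value in spec_settings.items()]


def to_redfish_patch_body(settings: List[Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Build the 'Attributes' body seen in the logged PATCH to .../Bios/SD.

    Hypothetical helper: it only mirrors the logged request body.
    """
    return {"Attributes": {item["name"]: item["value"] for item in settings}}


# Values taken from the HostFirmwareSettings CR in the description.
spec_settings = {"PowerButtonFunction": "4 Seconds Override"}
ironic_settings = to_ironic_settings(spec_settings)
patch_body = to_redfish_patch_body(ironic_settings)

print(ironic_settings)
print(patch_body)
```

Both printed structures match what the log shows, which supports the statement that the BIOS attribute change itself was submitted as intended.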
> operationalStatus: detached

This is suspicious: reconciliation won't happen for detached nodes.
According to the agent CR, this host was successfully installed:

[root@cnfdt08-installer ~]# oc get agent -A
NAMESPACE   NAME                                   CLUSTER   APPROVED   ROLE     STAGE
cnfde11     aba3d84c-44c5-f521-e1b8-f24d29c26080   cnfde11   true       master   Done

Unclear why the BMH is still Available and not provisioned, detached.

From the host motd we can see that the host is no longer running the discovery ISO and completed the installation successfully. Since the Ironic agent is the one starting the assisted-agent and also the one rebooting the node, I don't think this is related to the assisted part of the converged flow.
The BMH was detached while it was still in preparing state.

oc describe bmh -n cnfde11 cnfde11.ptp.lab.eng.bos.redhat.com
Name:         cnfde11.ptp.lab.eng.bos.redhat.com
Namespace:    cnfde11
Labels:       infraenvs.agent-install.openshift.io=cnfde11
Annotations:  argocd.argoproj.io/sync-wave: 1
              baremetalhost.metal3.io/detached: assisted-service-controller
              bmac.agent-install.openshift.io/hostname: cnfde11.ptp.lab.eng.bos.redhat.com
              bmac.agent-install.openshift.io/role: master
              ran.openshift.io/ztp-gitops-generated: {}
API Version:  metal3.io/v1alpha1
Kind:         BareMetalHost
Metadata:
  Creation Timestamp:  2022-07-06T15:54:16Z
  Finalizers:
    baremetalhost.metal3.io
  Generation:  2
  Managed Fields:
    API Version:  metal3.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:baremetalhost.metal3.io/detached:
      f:spec:
        f:customDeploy:
          .:
          f:method:
    Manager:      assisted-service
    Operation:    Update
    Time:         2022-07-06T15:54:16Z
    API Version:  metal3.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"baremetalhost.metal3.io":
    Manager:      baremetal-operator
    Operation:    Update
    Time:         2022-07-06T15:54:16Z
    API Version:  metal3.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:argocd.argoproj.io/sync-wave:
          f:bmac.agent-install.openshift.io/hostname:
          f:bmac.agent-install.openshift.io/role:
          f:kubectl.kubernetes.io/last-applied-configuration:
          f:ran.openshift.io/ztp-gitops-generated:
        f:labels:
          .:
          f:infraenvs.agent-install.openshift.io:
      f:spec:
        .:
        f:automatedCleaningMode:
        f:bmc:
          .:
          f:address:
          f:credentialsName:
          f:disableCertificateVerification:
        f:bootMACAddress:
        f:bootMode:
        f:online:
        f:rootDeviceHints:
          .:
          f:deviceName:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2022-07-06T15:54:16Z
    API Version:  metal3.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:errorCount:
        f:errorMessage:
        f:goodCredentials:
          .:
          f:credentials:
            .:
            f:name:
            f:namespace:
          f:credentialsVersion:
        f:hardware:
          .:
          f:cpu:
            .:
            f:arch:
            f:clockMegahertz:
            f:count:
            f:flags:
            f:model:
          f:firmware:
            .:
            f:bios:
              .:
              f:date:
              f:vendor:
              f:version:
          f:hostname:
          f:nics:
          f:ramMebibytes:
          f:storage:
          f:systemVendor:
            .:
            f:manufacturer:
            f:productName:
            f:serialNumber:
        f:hardwareProfile:
        f:lastUpdated:
        f:operationHistory:
          .:
          f:deprovision:
            .:
            f:end:
            f:start:
          f:inspect:
            .:
            f:end:
            f:start:
          f:provision:
            .:
            f:end:
            f:start:
          f:register:
            .:
            f:end:
            f:start:
        f:operationalStatus:
        f:poweredOn:
        f:provisioning:
          .:
          f:ID:
          f:bootMode:
          f:image:
            .:
            f:url:
          f:raid:
            .:
            f:hardwareRAIDVolumes:
            f:softwareRAIDVolumes:
          f:rootDeviceHints:
            .:
            f:deviceName:
          f:state:
        f:triedCredentials:
          .:
          f:credentials:
            .:
            f:name:
            f:namespace:
          f:credentialsVersion:
    Manager:         baremetal-operator
    Operation:       Update
    Subresource:     status
    Time:            2022-07-06T16:07:48Z
  Resource Version:  5800011
  UID:               bc5e2d8d-1ce3-4936-a064-806052192629
Spec:
  Automated Cleaning Mode:  disabled
  Bmc:
    Address:                           redfish-virtualmedia+https://10.16.231.98/redfish/v1/Systems/1
    Credentials Name:                  bmh-secret
    Disable Certificate Verification:  true
  Boot MAC Address:  3c:ec:ef:1e:d3:5e
  Boot Mode:         UEFI
  Custom Deploy:
    Method:  start_assisted_install
  Online:    true
  Root Device Hints:
    Device Name:  /dev/sdb
Status:
  Error Count:    0
  Error Message:
  Good Credentials:
    Credentials:
      Name:               bmh-secret
      Namespace:          cnfde11
    Credentials Version:  5792566
  Hardware:
    Cpu:
      Arch:             x86_64
      Clock Megahertz:  3900
      Count:            48
      Flags:            3dnowprefetch abm acpi adx aes aperfmperf apic arat arch_capabilities arch_perfmon art avx avx2 avx512_vnni avx512bw avx512cd avx512dq avx512f avx512vl bmi1 bmi2 bts cat_l3 cdp_l3 clflush clflushopt clwb cmov constant_tsc cpuid cpuid_fault cqm cqm_llc cqm_mbm_local cqm_mbm_total cqm_occup_llc cx16 cx8 dca de ds_cpl dtes64 dtherm dts epb ept ept_ad erms est f16c flexpriority flush_l1d fma fpu fsgsbase fxsr hle ht ibpb ibrs ibrs_enhanced ida intel_ppin intel_pt invpcid invpcid_single lahf_lm lm mba mca mce md_clear mmx monitor movbe mpx msr mtrr nonstop_tsc nopl nx ospke pae pat pbe pcid pclmulqdq pdcm pdpe1gb pebs pge pku pln pni popcnt pse pse36 pts rdrand rdseed rdt_a rdtscp rep_good sdbg sep smap smep smx ss ssbd sse sse2 sse4_1 sse4_2 ssse3 stibp syscall tm tm2 tpr_shadow tsc tsc_adjust tsc_deadline_timer vme vmx vnmi vpid x2apic xgetbv1 xsave xsavec xsaveopt xsaves xtopology xtpr
      Model:            Intel(R) Xeon(R) Gold 6212U CPU @ 2.40GHz
    Firmware:
      Bios:
        Date:     05/18/2021
        Vendor:   American Megatrends Inc.
        Version:  3.5
    Hostname:  api.cnfde11.ptp.lab.eng.bos.redhat.com
    Nics:
      Mac:    ac:1f:6b:e1:1d:d2
      Model:  0x8086 0x158b
      Name:   ens2f0
      Ip:     10.16.231.52
      Mac:    3c:ec:ef:1e:d3:5e
      Model:  0x8086 0x37d2
      Name:   eno1
      Mac:    ac:1f:6b:e1:1d:d3
      Model:  0x8086 0x158b
      Name:   ens2f1
      Mac:    3c:ec:ef:1e:d3:5f
      Model:  0x8086 0x37d2
      Name:   eno2
    Ram Mebibytes:  98304
    Storage:
      Model:       INTEL SSDPELKX010T8
      Name:        /dev/nvme0n1
      Size Bytes:  1000204886016
      Type:        NVME
    System Vendor:
      Manufacturer:   Supermicro
      Product Name:   Super Server (To be filled by O.E.M.)
      Serial Number:  SHUBIWC00001
  Hardware Profile:  unknown
  Last Updated:      2022-07-06T16:07:48Z
  Operation History:
    Deprovision:
      End:    <nil>
      Start:  <nil>
    Inspect:
      End:    2022-07-06T16:07:48Z
      Start:  2022-07-06T15:54:38Z
    Provision:
      End:    <nil>
      Start:  <nil>
    Register:
      End:    2022-07-06T15:54:38Z
      Start:  2022-07-06T15:54:16Z
  Operational Status:  OK
  Powered On:          false
  Provisioning:
    ID:         ffc5213a-8271-4942-982e-7831d543efdd
    Boot Mode:  UEFI
    Image:
      URL:
    Raid:
      Hardware RAID Volumes:  <nil>
      Software RAID Volumes:
    Root Device Hints:
      Device Name:  /dev/sdb
    State:          preparing
  Tried Credentials:
    Credentials:
      Name:               bmh-secret
      Namespace:          cnfde11
    Credentials Version:  5792566
Events:
  Type    Reason              Age  From                         Message
  ----    ------              ---- ----                         -------
  Normal  Registered          39m  metal3-baremetal-controller  Registered new host
  Normal  BMCAccessValidated  38m  metal3-baremetal-controller  Verified access to BMC
  Normal  InspectionStarted   38m  metal3-baremetal-controller  Hardware inspection started
  Normal  InspectionComplete  25m  metal3-baremetal-controller  Hardware inspection completed
  Normal  ProfileSet          25m  metal3-baremetal-controller  Hardware profile set: unknown
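The condition visible in the describe output (the baremetalhost.metal3.io/detached annotation is already set while the provisioning state is still "preparing") could be flagged with a small check like the one below. This is a rough sketch, not BMAC or baremetal-operator code: the function name detached_before_provisioned is hypothetical, and treating "provisioned" as the only state in which the annotation is expected is an assumption made for this illustration.

```python
DETACHED_ANNOTATION = "baremetalhost.metal3.io/detached"


def detached_before_provisioned(bmh: dict) -> bool:
    """Return True if the BMH carries the detached annotation before reaching "provisioned".

    Assumption for this sketch: the annotation is only expected once
    status.provisioning.state is "provisioned".
    """
    annotations = bmh.get("metadata", {}).get("annotations") or {}
    state = bmh.get("status", {}).get("provisioning", {}).get("state", "")
    return DETACHED_ANNOTATION in annotations and state != "provisioned"


# Minimal stand-in for the BMH shown above (only the fields the check reads).
bmh = {
    "metadata": {
        "annotations": {DETACHED_ANNOTATION: "assisted-service-controller"},
    },
    "status": {
        "operationalStatus": "OK",
        "provisioning": {"state": "preparing"},
    },
}

print(detached_before_provisioned(bmh))
```

In practice the dictionary would come from the JSON form of the BMH (for example the output of oc get bmh with -o json) rather than being written by hand.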
After fixing the premature detach annotation in BMAC there is still an issue.

@tali: I looked at the Ironic logs from the previous tests; the clean_step timed out there as well. The only difference is that the BMH is now stuck in "provisioning" state (instead of "available" state without the AI changes). I think we need something to drive Ironic cleaning to completion and the BMH transition to "provisioned".
The Ironic clean_step times out after the host is rebooted and installation is completed. The BMH is stuck in "provisioning" state after the premature detachment is fixed. I will raise a new BZ to track the new issue.
The BMH is no longer stuck in "available" state after fixing the premature detach annotation in BMAC. It is now stuck in "provisioning" state (tracked by 2106378).
Will move to verified per previous note.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Advanced Cluster Management 2.6.0 security updates and bug fixes), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:6370