Created attachment 1782320 [details]
metal3-ironic-inspector logs

Description of problem:
----------------
While adding a BMH we got errors and ran into a problem with the iDRAC.

[kni@ocp-edge24 ~]$ oc get bmh -A -o yaml
status:
  errorCount: 1
  errorMessage: 'Failed to inspect hardware. Reason: unable to start inspection: Redfish exception occurred. Error: iDRAC Redfish set boot device failed for node ee79f546-4aff-4d91-9828-5ebf08450dbe, because system 4c4c4544-0037-3610-8050-b1c04f325732 has no manager which could.'
  errorType: inspection error

A second problem: sushy_oem_idrac could not proceed with the operation because of jobs running on the iDRAC, even after we cleared the jobs. We also noticed a warning about "pending attribute changes or a configuration job is in progress", but as mentioned, the jobs had already been cleared.
----------------
2021-05-12 06:47:33.604 1 ERROR sushy_oem_idrac.resources.manager.manager [req-720fe7ce-66d5-4bda-8081-94e04fd8a8d5 ironic-user - - - -] Too many (10) retries, bailing out.: sushy.exceptions.BadRequestError: HTTP POST https://<ip>/redfish/v1/Managers/iDRAC.Embedded.1/Actions/Oem/EID_674_Manager.ImportSystemConfiguration returned code 400. Base.1.7.GeneralError: Unable to perform the import or export operation because there are pending attribute changes or a configuration job is in progress. Extended information: [{'Message': 'Unable to perform the import or export operation because there are pending attribute changes or a configuration job is in progress.', 'MessageArgs': [], 'MessageArgs': 0, 'MessageId': 'IDRAC.2.2.LC068', 'RelatedProperties': [], 'RelatedProperties': 0, 'Resolution': 'Apply or cancel any pending attribute changes. Changes can be applied by creating a targeted configuration job, or the changes can be cancelled by invoking the DeletePendingConfiguration method. If a configuration job is in progress, wait until it is completed before retrying the import or export system configuration operation.', 'Severity': 'Warning'}]
----------------

Version-Release number of selected component (if applicable):
iDRAC 4.40.0.0
Dell server R640
-----------------

How reproducible:
100%

Steps to Reproduce:
1. vi add-bmh.yaml
-----------------
apiVersion: v1
kind: Secret
metadata:
  name: rwn-5-secret
type: Opaque
data:
  username: <base-64-username>
  password: <base-64-password>
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: openshift-worker-4
spec:
  online: true
  bmc:
    address: idrac-virtualmedia://<ip>/redfish/v1/Systems/System.Embedded.1
    credentialsName: rwn-5-secret
    disableCertificateVerification: True
    username: <username>
    password: <password>
  bootMACAddress: <mac-address>
-----------------
2. oc create -f add-bmh.yaml -n openshift-machine-api
3. oc get bmh openshift-worker-4 -n openshift-machine-api -o yaml
-----------------

Actual results:
NAME                 STATE                    CONSUMER                         ONLINE   ERROR
openshift-master-0   externally provisioned   ocp-edge3-ws94g-master-0         true
openshift-master-1   externally provisioned   ocp-edge3-ws94g-master-1         true
openshift-master-2   externally provisioned   ocp-edge3-ws94g-master-2         true
openshift-worker-0   provisioned              ocp-edge3-ws94g-worker-0-jkqn7   true
openshift-worker-1   provisioned              ocp-edge3-ws94g-worker-0-jhzcb   true
openshift-worker-4   inspecting                                                true     inspection error
-----------------
[kni@ocp-edge24 ~]$ oc get bmh openshift-worker-4 -A -o yaml
status:
  errorCount: 1
  errorMessage: 'Failed to inspect hardware. Reason: unable to start inspection: Redfish exception occurred. Error: iDRAC Redfish set boot device failed for node ee79f546-4aff-4d91-9828-5ebf08450dbe, because system 4c4c4544-0037-3610-8050-b1c04f325732 has no manager which could.'
  errorType: inspection error
-----------------

Expected results:
The "PROVISIONING STATUS" of the newly added BMH should switch to "inspecting" and, once inspection finishes, to "ready".
-----------------
NAME                 STATE                    CONSUMER                         ONLINE   ERROR
openshift-master-0   externally provisioned   ocp-edge3-ws94g-master-0         true
openshift-master-1   externally provisioned   ocp-edge3-ws94g-master-1         true
openshift-master-2   externally provisioned   ocp-edge3-ws94g-master-2         true
openshift-worker-0   provisioned              ocp-edge3-ws94g-worker-0-jkqn7   true
openshift-worker-1   provisioned              ocp-edge3-ws94g-worker-0-jhzcb   true
openshift-worker-4   ready
-----------------

Additional info:
I attached logs from the ironic containers:
metal3-ironic-inspector logs
metal3-ironic-conductor logs
metal3-baremetal-operator logs
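For step 1 of the reproduction, the values under the Secret's `data` key have to be base64-encoded before they go into add-bmh.yaml. A minimal sketch of producing and checking those values follows; the helper names and the sample credential are hypothetical and are not taken from this report.

```python
import base64


def encode_secret_value(value: str) -> str:
    """Return the base64 text form used under a Kubernetes Secret's `data` key."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Reverse of encode_secret_value, handy for checking what a Secret holds."""
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    # "example-user" is a placeholder, not a credential from this bug.
    encoded = encode_secret_value("example-user")
    print(encoded)
    print(decode_secret_value(encoded))
```

Encoding the exact string matters: a trailing newline picked up from a shell pipeline ends up inside the credential and may cause BMC authentication failures that look unrelated to the Secret.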
Created attachment 1782321 [details] metal3-ironic conductor logs
Created attachment 1782322 [details] metal3-baremetal-operator.log
> even after we clear the jobs

Have you also tried a complete iDRAC reset?
(In reply to Dmitry Tantsur from comment #3)
> > even after we clear the jobs
>
> Have you also tried a complete iDRAC reset?

Yes, both a simple reset and a factory reset of the configuration (everything except the NIC).
Can you try the following approach and tell us the result:

<ajya> iurygregory: ok, interesting. Can they do anything else with the system? That is, can they manually (iDRAC web) create a BIOS or RAID job? Can they do Import manually? (Configuration->Server Configuration Profile->Import).
<ajya> If it is possible manually, then something is going wrong in sushy-oem-idrac; if they can't, then it's iDRAC
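Alongside the manual check above, it may help to confirm that no configuration job is still queued, since the LC068 message points at a job in progress or pending attribute changes. The following is only a sketch: it assumes the job queue has been saved from the iDRAC as an expanded Redfish collection in JSON, with a `Members` list whose entries carry `Id` and `JobState`, and the set of states treated as finished is likewise an assumption to adjust for the firmware in use.

```python
from typing import Any

# Assumption: states that mean a job no longer blocks an import or export.
FINISHED_STATES = {"Completed", "CompletedWithErrors", "Failed"}


def unfinished_jobs(job_collection: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the jobs in the collection whose state is not a finished one."""
    members = job_collection.get("Members", [])
    return [job for job in members if job.get("JobState") not in FINISHED_STATES]


if __name__ == "__main__":
    # Hypothetical sample data, not output captured from the affected server.
    sample = {
        "Members": [
            {"Id": "JID_000000000001", "JobState": "Completed"},
            {"Id": "JID_000000000002", "JobState": "Scheduled"},
        ]
    }
    for job in unfinished_jobs(sample):
        print(job["Id"], job["JobState"])
```

An empty result only rules out queued jobs; pending attribute changes would not show up in the job list and need a separate check.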
It looks like this was an issue with the upgrade process: 4.40.0.0 seems to have been installed while there was a problem, and it needed to be reinstalled. After we reinstalled 4.40.0.0, we moved on to the next issue (provisioning network and IPv6).
Polina, thank you! Do you mind if I close this bz?
Yes, I think it can be closed, thank you!