Description Fujitsu container team
2021-07-28 02:32:21 UTC
Description of Problem:
 Note: This report is made public at the request of Red Hat.
An Ironic node enters the clean failed state after the delete_configuration clean step fails.
This happens when hardwareRAIDVolumes is nil and the target node does not have a RAID controller.
The cause is that the BuildRAIDCleanSteps function does not handle this case and always generates a delete_configuration clean step.
We have already opened a PR in the Metal3 community to address this case:
https://github.com/metal3-io/baremetal-operator/pull/942
With this PR, when hardwareRAIDVolumes is nil, the existing RAID configuration is kept (delete_configuration is not run).
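The difference between the old and the fixed behavior can be sketched in Go. This is a simplified, hypothetical illustration, not the actual baremetal-operator code: the types CleanStep and HardwareRAIDVolume and the function buildRAIDCleanSteps are made up for this example, and the handling of an empty (non-nil) list and the create_configuration step are assumptions. The part taken from this report is the first branch: a nil hardwareRAIDVolumes produces no delete_configuration step, so a node without a RAID controller is never asked to clear a RAID configuration.

```go
package main

import "fmt"

// CleanStep is a hypothetical, simplified stand-in for an Ironic manual
// clean step: the hardware interface plus the step name.
type CleanStep struct {
	Interface string
	Step      string
}

// HardwareRAIDVolume is a hypothetical, trimmed-down volume description.
type HardwareRAIDVolume struct {
	Name  string
	Level string
}

// buildRAIDCleanSteps is a hypothetical sketch of the fixed logic.
//   - nil volumes: keep the node's existing RAID configuration, so no RAID
//     clean steps are generated (the case from this report).
//   - empty, non-nil slice (assumption): only clear the configuration.
//   - non-empty slice (assumption): clear, then create the requested volumes.
//
// Before the fix, delete_configuration was generated even for nil volumes.
func buildRAIDCleanSteps(volumes []HardwareRAIDVolume) []CleanStep {
	if volumes == nil {
		return nil
	}
	steps := []CleanStep{{Interface: "raid", Step: "delete_configuration"}}
	if len(volumes) > 0 {
		steps = append(steps, CleanStep{Interface: "raid", Step: "create_configuration"})
	}
	return steps
}

func main() {
	// Node without RAID settings (e.g. no RAID controller): no RAID steps.
	fmt.Println(buildRAIDCleanSteps(nil))
	// Explicitly empty list: clear the existing configuration only.
	fmt.Println(buildRAIDCleanSteps([]HardwareRAIDVolume{}))
	// Requested volumes: clear, then create.
	fmt.Println(buildRAIDCleanSteps([]HardwareRAIDVolume{{Name: "root", Level: "1"}}))
}
```

The nil-versus-empty distinction relies on Go's ability to tell a nil slice apart from an empty one, which lets "not specified" mean "leave the hardware alone".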
Version-Release number of selected component:
This issue was detected in the Pre-GA version.
Red Hat OpenShift Container Platform Version Number: 4.9.0-0.nightly-2021-07-26-071921
Release Number: 4.9
Kubernetes Version: 1.21
Cri-o Version: 0.1.0
Related Component: None
Related Middleware/Application: None
Underlying RHCOS Release Number: 4.9
Underlying RHCOS Architecture: x86_64
Underlying RHCOS Kernel Version: 4.18.0
Drivers or hardware or architecture dependency:
This error occurs when the target node doesn't have a RAID controller.
How reproducible:
Always
Steps to Reproduce:
1. Create install-config.yaml in clusterconfigs:
The worker machine does not have a RAID card installed.
$ vim ~/clusterconfigs/install-config.yaml
2. Create manifests:
$ openshift-baremetal-install --dir ~/clusterconfigs create manifests
3. Create cluster:
$ openshift-baremetal-install --dir ~/clusterconfigs --log-level debug create cluster
Actual Results:
The Ironic node enters the clean failed state.
Expected Results:
The Ironic node does not enter the clean failed state.
Summary of actions taken to resolve issue:
The upstream (Metal3) and downstream (RHOCP) PRs need to be merged:
- Upstream: https://github.com/metal3-io/baremetal-operator/pull/942
- Downstream: https://github.com/openshift/baremetal-operator/pull/170
Location of diagnostic data:
None
Hardware configuration:
Model: RX2540 M4
Target Release:
RHOCP 4.9
Additional Info:
None
Could you please verify this BZ? We don't have Fujitsu machines to verify it on.
Comment 9 Fujitsu container team
2021-08-24 02:28:07 UTC
Hi, Lubov
Yes, Fujitsu is going to verify it, please wait.
Best Regards,
Yasuhiro Futakawa
Comment 10 Fujitsu container team
2021-08-26 01:27:07 UTC
Hi, Lubov,
Fujitsu verified that it works correctly with 4.9.0-0.nightly-2021-08-23-192406.
We also confirmed that the fix for this BZ was included in this nightly build.
Best Regards,
Yasuhiro Futakawa
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2021:3759