Bug 1986656 - [OCP4.9 Bug] Ironic node enters the clean failed state when the target node doesn't have a RAID controller.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Bare Metal Hardware Provisioning
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.9.0
Assignee: Steven Hardy
QA Contact: Lubov
URL:
Whiteboard:
Depends On:
Blocks: 1920358
 
Reported: 2021-07-28 02:32 UTC by Fujitsu container team
Modified: 2021-10-18 17:42 UTC
CC: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-18 17:42:48 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift baremetal-operator pull 170 0 None open Fix missing case of BuildRAIDCleanSteps 2021-07-28 02:32:21 UTC
Red Hat Product Errata RHSA-2021:3759 0 None None None 2021-10-18 17:42:58 UTC

Description Fujitsu container team 2021-07-28 02:32:21 UTC
Description of Problem:

 Note: This report is published publicly at the request of Red Hat.

  The Ironic node enters the clean failed state after the delete_configuration clean step fails.
  This happens when hardwareRAIDVolumes is nil and the target node doesn't have a RAID controller.
  The root cause is that the BuildRAIDCleanSteps function does not handle this case and always performs delete_configuration.

  We have already created a PR in the Metal3 community to address this case:
  https://github.com/metal3-io/baremetal-operator/pull/942

  With this PR, when hardwareRAIDVolumes is nil, the actual RAID configuration is kept (delete_configuration is not performed).
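
  For illustration only, here is a minimal Go sketch of the intended behaviour after the fix. RAIDConfig, CleanStep, and buildRAIDCleanSteps below are simplified stand-ins, not the actual baremetal-operator types or signatures; the point is only the conditional: a nil hardwareRAIDVolumes keeps the node's existing RAID configuration, so no delete_configuration clean step is generated.

    // Minimal, self-contained sketch (not the actual baremetal-operator code).
    package main

    import "fmt"

    // RAIDConfig mirrors the idea of the BareMetalHost RAID spec: a nil
    // HardwareRAIDVolumes means "keep the current RAID configuration", while an
    // empty (non-nil) slice means "delete the existing configuration".
    type RAIDConfig struct {
            HardwareRAIDVolumes []string
    }

    // CleanStep is a stand-in for an Ironic manual-clean step.
    type CleanStep struct {
            Interface string
            Step      string
    }

    // buildRAIDCleanSteps illustrates the fixed behaviour: clean steps are only
    // generated when the spec actually requests a hardware RAID change.
    func buildRAIDCleanSteps(target *RAIDConfig) []CleanStep {
            // The fix: nil hardwareRAIDVolumes -> keep the node's current RAID
            // setup, so nodes without a RAID controller no longer fail cleaning
            // on delete_configuration.
            if target == nil || target.HardwareRAIDVolumes == nil {
                    return nil
            }
            steps := []CleanStep{{Interface: "raid", Step: "delete_configuration"}}
            if len(target.HardwareRAIDVolumes) > 0 {
                    steps = append(steps, CleanStep{Interface: "raid", Step: "create_configuration"})
            }
            return steps
    }

    func main() {
            // Non-nil but empty slice: explicitly request deletion of existing RAID.
            fmt.Println(buildRAIDCleanSteps(&RAIDConfig{HardwareRAIDVolumes: []string{}}))
            // Nil (unset): keep the current configuration -> no clean steps (the fix).
            fmt.Println(buildRAIDCleanSteps(&RAIDConfig{HardwareRAIDVolumes: nil}))
    }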

Version-Release number of selected component:

  This issue was detected in the Pre-GA version.

  Red Hat OpenShift Container Platform Version Number: 4.9.0-0.nightly-2021-07-26-071921
  Release Number: 4.9
  Kubernetes Version: 1.21
  Cri-o Version: 0.1.0
  Related Component: None
  Related Middleware/Application: None
  Underlying RHCOS Release Number: 4.9
  Underlying RHCOS Architecture: x86_64
  Underlying RHCOS Kernel Version: 4.18.0

Drivers or hardware or architecture dependency:

  This error occurs when the target node doesn't have a RAID controller.

How reproducible:

  Always

Steps to Reproduce:

  1. Create install-config.yaml in clusterconfigs.
     The worker machine does not have a RAID card installed, and no RAID settings are specified for its host entry (see the illustrative sketch after this list).
     $ vim ~/clusterconfigs/install-config.yaml

  2. Create manifests:

     $ openshift-baremetal-install --dir ~/clusterconfigs create manifests

  3. Create cluster:

     $ openshift-baremetal-install --dir ~/clusterconfigs --log-level debug create cluster
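
  For reference, below is a minimal, illustrative host entry for the baremetal platform section of install-config.yaml; all names, addresses, MACs, and credentials are placeholders. The relevant point is that no RAID configuration is specified for the worker host, so hardwareRAIDVolumes remains nil.

     # Illustrative only; placeholder values.
     platform:
       baremetal:
         hosts:
           - name: worker-0
             role: worker
             bmc:
               address: ipmi://192.0.2.10
               username: admin
               password: changeme
             bootMACAddress: 52:54:00:00:00:01
             rootDeviceHints:
               deviceName: /dev/sda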

Actual Results:

  Ironic node enters the clean failed state.

Expected Results:

  Ironic node does not enter the clean failed state.

Summary of actions taken to resolve issue:

  We need to merge the upstream (Metal3) and downstream (RHOCP) PRs.
  - Upstream: https://github.com/metal3-io/baremetal-operator/pull/942
  - Downstream: https://github.com/openshift/baremetal-operator/pull/170

Location of diagnostic data:

  None

Hardware configuration:

  Model: RX2540 M4

Target Release:

  RHOCP4.9

Additional Info:

  None

Comment 8 Lubov 2021-08-23 15:58:34 UTC
Could you please verify this BZ? We don't have Fujitsu machines to verify it on.

Comment 9 Fujitsu container team 2021-08-24 02:28:07 UTC
Hi, Lubov

Yes, Fujitsu is going to verify it; please wait.

Best Regards,
Yasuhiro Futakawa

Comment 10 Fujitsu container team 2021-08-26 01:27:07 UTC
Hi, Lubov,

Fujitsu verified that it works correctly with 4.9.0-0.nightly-2021-08-23-192406.
We also confirmed that the fix for this BZ was included in this nightly build.

Best Regards,
Yasuhiro Futakawa

Comment 11 Lubov 2021-08-26 12:15:23 UTC
Good news, closing

Comment 14 errata-xmlrpc 2021-10-18 17:42:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

