
Bug 2083445

Summary: [FJ OCP4.11 Bug]: RAID setting during IPI cluster deployment fails if iRMC port number is specified
Product: OpenShift Container Platform
Reporter: Fujitsu container team <fj-lsoft-rh-cnt>
Component: Installer
Assignee: Jacob Anders <janders>
Installer sub component: OpenShift on Bare Metal IPI
QA Contact: Amit Ugol <augol>
Status: CLOSED ERRATA
Severity: high
Priority: high
CC: ecosystem-partners-infrastructure, fj-lsoft-bm, hfukumot, jniu, kahara, mvalsecc, rpittau
Version: 4.11
Keywords: OtherQA, Triaged
Target Milestone: ---
Target Release: 4.11.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: QJ220510-001
Doc Type: If docs needed, set a value
Story Points: ---
Last Closed: 2022-08-10 11:10:43 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Bug Blocks: 1920358

Description Fujitsu container team 2022-05-10 06:18:03 UTC
Customer Contact Name:

  Yasuhiro Futakawa

Description of Problem:

  When installing OpenShift on Fujitsu iRMC servers with IPI, if the iRMC port is specified as 443 in the bmc address in install-config.yaml:

  -----------------------
  bmc:
    address: irmc://127.0.0.1:443
  -----------------------

  The following error occurs when configuring RAID:

  -----------------------
  Normal 14m metal3-baremetal-controller Node 87829fb9-ccc0-4bd0-8cd5-3ccf288b8473 failed step {'interface': 'raid', 'step': 'delete_configuration', 'abortable': False, 'priority': 0}: %d format: a number is required, not str
  -----------------------

  The cause of this error is that irmc_port is not converted to int, so it remains a string.
  Therefore, this error is not limited to "443"; it occurs whenever any value is specified for the port.

  Also, if we don't set the port, "443" is selected as the default value, which works fine.
  This error only occurs if we explicitly specify the port number.
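
  The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the Ironic code: the function names and the formatted string are hypothetical and only assume that the port value reaches a "%d" format as a string when it comes from the address, and as an integer when the default is used.

```python
# Hypothetical illustration; these names are not taken from the Ironic source.

def format_port_unconverted(port):
    # A port parsed out of the BMC address arrives as a string ("443"),
    # and "%d" rejects anything that is not a number.
    return "port=%d" % port


def format_port_converted(port):
    # Converting once with int() makes a string port behave like the
    # integer default.
    return "port=%d" % int(port)


try:
    format_port_unconverted("443")
except TypeError as exc:
    # The exact wording of the message varies slightly between Python versions.
    print("explicit port, no conversion:", exc)

print("default integer port:", format_port_unconverted(443))
print("explicit port, converted:", format_port_converted("443"))
```

  This also shows why the default works: an integer default never passes through the string path, so only an explicitly specified port hits the TypeError.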

  The following official documentation does not describe how to configure the iRMC port, so port configuration is not supported in RHOCP (when using the iRMC driver).
  https://docs.openshift.com/container-platform/4.10/installing/installing_bare_metal_ipi/ipi-install-installation-workflow.html#configuring-raid-for-worker-node_ipi-install-installation-workflow

  Therefore, customers use the default port number (443) and do not encounter this error.
  So, in practice, this issue has no effect on them.
  However, Fujitsu uses a DCI environment internally to continuously validate the latest OCP builds.
  https://docs.distributed-ci.io/dci-openshift-agent/

  This error occurs in DCI because the iRMC port must be explicitly specified for automated testing with DCI.
  So while this issue has no impact on our customers, it does affect Fujitsu's CI environment.
  Therefore, I would like it to be fixed in the latest OCP.
  No backporting to past releases is required.

Version-Release number of selected component: 4.11

  This issue was detected in the Pre-GA version.
    Red Hat OpenShift Container Platform Version Number: 4.11
    Release Number:  4.11.0-0.nightly-2022-04-16-163450
    Kubernetes Version: 1.23.3
    Cri-o Version: 1.24.0
    Related Component: NONE
    Related Middleware/Application: None
    Underlying RHCOS Release Number: 8.5
    Underlying RHCOS Architecture: x86_64
    Underlying RHCOS Kernel Version: 4.18.0

Drivers or hardware or architecture dependency:

  Fujitsu iRMC driver

How reproducible:

  Every time

Steps to Reproduce:

  $ openshift-install --dir ~/clusterconfigs create manifests

  Change the bmh file corresponding to the worker:
    $ vim ~/clusterconfigs/openshift/99_openshift-cluster-api_hosts-3.yaml

  1. Add the port number in its address field.
    address: irmc://192.168.1.1 -> address: irmc://192.168.1.1:443

  2. Add the RAID configuration to spec.

  -----------------------
  spec:
    raid:
      hardwareRAIDVolumes:
      - level: "0"
        name: "raid0"
  -----------------------

  $ openshift-install --dir ~/clusterconfigs --log-level debug create cluster
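
  Only an address with an explicit port (step 1) triggers the failure. As a rough sketch, the two address forms can be told apart with Python's standard urllib.parse before running the installer. The helper name describe_bmc_address is hypothetical, and this is not a claim about how the installer or Ironic parses the address.

```python
from urllib.parse import urlsplit


def describe_bmc_address(address):
    """Split a BMC address such as irmc://192.168.1.1:443 into its parts."""
    parts = urlsplit(address)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        # urlsplit returns the port as an int, or None when none is given
        "port": parts.port,
        "explicit_port": parts.port is not None,
    }


for addr in ("irmc://192.168.1.1", "irmc://192.168.1.1:443"):
    print(addr, "->", describe_bmc_address(addr))
```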

Actual Results:

  The worker node cannot be provisioned successfully, and the following error message appears in ironic:

  -----------------------
  Normal 14m metal3-baremetal-controller Node 87829fb9-ccc0-4bd0-8cd5-3ccf288b8473 failed step {'interface': 'raid', 'step': 'delete_configuration', 'abortable': False, 'priority': 0}: %d format: a number is required, not str
  -----------------------
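
  The event message embeds the failed step as a Python-style dict followed by the reason. For triage in a CI job, a sketch like the following could pull both out. The helper name parse_failed_step and the regular expression are assumptions based only on the single message shown in this report, and the quote normalisation is there because some viewers replace ASCII quotes with typographic ones.

```python
import ast
import re


def parse_failed_step(message):
    """Return (step_dict, reason) from a 'failed step {...}: reason' message."""
    # Normalise typographic quotes so the dict can be parsed as a literal.
    text = message.replace("\u2018", "'").replace("\u2019", "'")
    match = re.search(r"failed step (\{.*?\}): (.*)$", text)
    if match is None:
        return None
    step = ast.literal_eval(match.group(1))
    return step, match.group(2)


message = (
    "Normal 14m metal3-baremetal-controller Node "
    "87829fb9-ccc0-4bd0-8cd5-3ccf288b8473 failed step "
    "{'interface': 'raid', 'step': 'delete_configuration', "
    "'abortable': False, 'priority': 0}: "
    "%d format: a number is required, not str"
)
step, reason = parse_failed_step(message)
print(step["interface"], step["step"], "->", reason)
```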

Expected Results:

  The worker node is provisioned successfully.

Summary of actions taken to resolve issue:

  Fujitsu has already submitted patches to Ironic:
  https://review.opendev.org/c/openstack/ironic/+/839122 (Merged)
  https://review.opendev.org/c/openstack/ironic/+/839675 (Merged)

  839122 is the patch for the master branch and has already been merged.
  839675 is the backport to the yoga branch, which the latest OCP uses.

Location of diagnostic data:

  None

Hardware configuration:

  Model: RX2540 M4

Target Release: OCP4.11

Comment 1 Riccardo Pittau 2022-05-10 16:47:32 UTC
the change for OCP 4.11 is included in https://github.com/openshift/ironic-image/pull/273 which upgrades all the packages to the latest available versions

Comment 2 Riccardo Pittau 2022-05-11 08:48:44 UTC
prevalidation tests have passed, I'm tagging the new packages in production and the code will be available today

Comment 3 Riccardo Pittau 2022-05-12 07:05:20 UTC
change has successfully merged and it's available in OCP 4.11
I'll leave it to Jacob for the final verification

Comment 4 Fujitsu container team 2022-05-20 00:51:31 UTC
Hi,

Fujitsu verified that this bug was fixed with the latest nightly (4.11.0-0.nightly-2022-05-18-010528).

Best Regards,
Yasuhiro Futakawa

Comment 5 Jacob Anders 2022-06-01 11:12:04 UTC
Based on Yasuhiro's comment https://bugzilla.redhat.com/show_bug.cgi?id=2083445#c4, setting this to VERIFIED with OtherQA.

Comment 7 errata-xmlrpc 2022-08-10 11:10:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069