Bug 1992240 - Unable to install a zVM hosted OCP 4.7.24 on Z Cluster based on new RHCOS 47 RHEL 8.4 based build
Summary: Unable to install a zVM hosted OCP 4.7.24 on Z Cluster based on new RHCOS 47 ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Multi-Arch
Version: 4.7
Hardware: s390x
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.7.z
Assignee: Prashanth Sundararaman
QA Contact: Douglas Slavens
URL:
Whiteboard:
Duplicates: 1992245 1992676
Depends On: 1950974
Blocks:
Reported: 2021-08-10 19:24 UTC by krmoser
Modified: 2021-09-01 18:24 UTC
CC List: 13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-09-01 18:23:56 UTC
Target Upstream Version:
Embargoed:


Links
GitHub openshift/machine-config-operator pull 2709 (last updated 2021-08-11 15:11:00 UTC)
Red Hat Issue Tracker MULTIARCH-1579 (last updated 2021-08-10 19:25:48 UTC)
Red Hat Product Errata RHSA-2021:3262 (last updated 2021-09-01 18:24:21 UTC)

Description krmoser 2021-08-10 19:24:21 UTC
Description of problem:
1. When attempting to install a z/VM hosted OCP 4.7.24 on Z cluster, using the 4.7.0-0.nightly-s390x-2021-08-06-180814 build and the accompanying RHCOS 47.84.202108052001-0 build for the bootstrap node, the master/control nodes do not acquire their private network interface and the installation process does not continue, with the OCP cluster installation ultimately failing.

2. The master/control nodes boot but do not acquire their network interface configuration or hostname: they are not reachable over the network and report the hostname "localhost".

3. For the OCP 4.7.23 on Z build, the corresponding required RHCOS build version is 47.83.202107271611-0, which is built on RHEL 8.3.  This consistently installs the OCP 4.7.23 on Z z/VM hosted cluster without issue.

4. For the OCP 4.7.24 on Z build, the corresponding required RHCOS build version is 47.84.202108052001-0, which is built on RHEL 8.4.  This consistently fails to install the OCP 4.7.24 on Z z/VM hosted cluster.

5. The OCP 4.7.24 on Z build is the first OCP 4.7.z build that includes an RHCOS 47 build version built on RHEL 8.4.


6. A workaround for this RHCOS 47.84 build install issue is to use the RHCOS 47.83 build 47.83.202107271611-0 for the bootstrap node install, after which all of the master/control nodes and worker/compute nodes will successfully install, including all network configuration.  These master/control nodes and worker/compute nodes will successfully install the required RHCOS 47.84.202108052001-0 build as part of the install process.

7. This OCP 4.7.24 on Z installation issue with the introduction of RHCOS 47.84 is very similar to the issue found with the initial introduction of RHCOS 48.84 for OCP 4.8, as documented in Red Hat OpenShift Bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=1950974
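Points 3 to 5 rely on reading the RHEL base out of the RHCOS build ID (47.83.x built on RHEL 8.3, 47.84.x built on RHEL 8.4). The Python sketch below decodes a build ID under that reading. It assumes the first field is the OCP stream (47 = 4.7), the second the RHEL base (84 = 8.4), the third a YYYYMMDDHHMM build timestamp and the suffix a respin counter; the timestamp and respin interpretations are inferred from the IDs in this report rather than from a documented format, and `parse_rhcos_build_id` is a hypothetical helper.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Assumed layout: <OCP major><OCP minor>.<RHEL major><RHEL minor>.<YYYYMMDDHHMM>-<respin>
# inferred from IDs such as 47.84.202108052001-0; not an official specification.
_RHCOS_ID = re.compile(r"^(\d)(\d+)\.(\d)(\d+)\.(\d{12})-(\d+)$")


@dataclass(frozen=True)
class RhcosBuild:
    ocp_stream: str     # e.g. "4.7"
    rhel_base: str      # e.g. "8.4"
    built_at: datetime  # assumed build timestamp
    respin: int         # assumed respin counter


def parse_rhcos_build_id(build_id: str) -> RhcosBuild:
    """Hypothetical helper: decode an RHCOS build ID under the assumed layout."""
    m = _RHCOS_ID.match(build_id)
    if m is None:
        raise ValueError(f"unrecognized RHCOS build ID: {build_id!r}")
    ocp_major, ocp_minor, rhel_major, rhel_minor, stamp, respin = m.groups()
    return RhcosBuild(
        ocp_stream=f"{ocp_major}.{ocp_minor}",
        rhel_base=f"{rhel_major}.{rhel_minor}",
        built_at=datetime.strptime(stamp, "%Y%m%d%H%M"),
        respin=int(respin),
    )


if __name__ == "__main__":
    # Build IDs taken from this report: the working 4.7.23 image and the failing 4.7.24 image.
    for bid in ("47.83.202107271611-0", "47.84.202108052001-0"):
        b = parse_rhcos_build_id(bid)
        print(bid, "-> OCP", b.ocp_stream, "on RHEL", b.rhel_base, "built", b.built_at)
```

Under this reading, the parsed RHEL base is what separates the working 47.83 bootstrap image from the failing 47.84 one, which is the distinction points 4 and 5 draw.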



Version-Release number of selected component (if applicable):
OCP 4.7.24 (4.7.0-0.nightly-s390x-2021-08-06-180814) 

How reproducible:
Consistently reproducible

Steps to Reproduce:
1. Initiate installation of OCP 4.7.24 on Z cluster using the OCP 4.7.0-0.nightly-s390x-2021-08-06-180814 build and the corresponding RHCOS 47.84.202108052001-0 build.



Actual results:
The z/VM hosted OCP 4.7.24 on Z install process will fail as the master/control nodes do not configure their network interfaces.

Expected results:
The z/VM hosted OCP 4.7.24 on Z install process should succeed with the master/control nodes properly configuring their network interfaces.


Additional info:



Thank you.

Comment 1 Prashanth Sundararaman 2021-08-11 14:00:19 UTC
Yes. Because 4.7 moved to RHEL 8.4, it pulled in the new systemd, which has a bug that prevents the system-connections-merged directory from being mounted: https://bugzilla.redhat.com/show_bug.cgi?id=1952686. It looks like https://github.com/openshift/machine-config-operator/pull/2543 needs to be backported to 4.7.

Kyle,

Just to confirm, can you check the status of this systemd unit? "systemctl status etc-NetworkManager-system\x2dconnections\x2dmerged.mount"

Thanks
Prashanth
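The unit name in the systemctl command follows from systemd's rule for naming mount units after their mount point: "/" becomes "-", and any character other than ASCII letters, digits, ":", "_" and "." (plus a leading ".") is written as a lowercase C-style \xNN escape, so every "-" in the directory name turns into \x2d. The Python sketch below approximates that path escaping, roughly what `systemd-escape --path` produces. It assumes the directory lives at /etc/NetworkManager/system-connections-merged, as the unit name suggests; `escape_path` and `escape_mount_unit` are hypothetical helpers, and systemd's own validation (for example rejecting ".." components) is not reproduced.

```python
def escape_path(path: str) -> str:
    """Simplified sketch of systemd path escaping for unit names."""
    # Drop empty components: strips leading/trailing slashes and
    # collapses duplicate slashes ("//etc//x/" -> "etc/x").
    parts = [p for p in path.split("/") if p]
    if not parts:
        return "-"  # the root directory "/" escapes to "-"
    out = []
    for i, byte in enumerate("/".join(parts).encode("utf-8")):
        ch = chr(byte)
        if ch == "/":
            out.append("-")  # path separators become dashes
        elif ch.isascii() and (ch.isalnum() or ch in ":_.") and not (i == 0 and ch == "."):
            out.append(ch)  # safe characters are kept as-is
        else:
            # Everything else, including "-" itself and a leading ".",
            # becomes a lowercase \xNN escape of the UTF-8 byte.
            out.append(f"\\x{byte:02x}")
    return "".join(out)


def escape_mount_unit(mount_point: str) -> str:
    """Hypothetical helper: mount unit name for an absolute mount point."""
    return escape_path(mount_point) + ".mount"


if __name__ == "__main__":
    print(escape_mount_unit("/etc/NetworkManager/system-connections-merged"))
```

Quoting the resulting name in the shell, as the command above does, keeps the backslashes from being consumed before systemctl sees them.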

Comment 2 Dan Li 2021-08-11 14:20:09 UTC
Re-assigning to Prashanth, as he has worked on a similar bug before. The bug triage team is setting "Blocker+" because the bug blocks the build. Also adding "reviewed-in-sprint" because this is a new bug and may take time to fix.

Comment 3 Prashanth Sundararaman 2021-08-12 13:53:52 UTC
*** Bug 1992676 has been marked as a duplicate of this bug. ***

Comment 4 krmoser 2021-08-17 13:27:14 UTC
Folks,

The same or similar network configuration issue(s) for the master/control plane nodes appear to be present with the RHCOS 47.84.202108161003-0 build specified for the OCP 4.7.25 4.7.0-0.nightly-s390x-2021-08-16-204650 build.

Thank you,
Kyle

Comment 5 Scott Dodson 2021-08-18 14:03:10 UTC
You're not using the boot images defined in the installer, which is not a supported installation path. We should fix this bug in case we ever bump the boot images, but I'm dropping blocker+.

Comment 6 Prashanth Sundararaman 2021-08-18 14:25:38 UTC
Hi Kyle,

Are you getting the RHCOS bootimages from https://releases-rhcos-art.cloud.privileged.psi.redhat.com/ ? The bootimages you should be using need to be aligned with the ones here: https://github.com/openshift/installer/blob/release-4.7/data/data/rhcos-s390x.json as the installer is the definitive source of bootimages.

These images are available here: https://mirror.openshift.com/pub/openshift-v4/s390x/dependencies/rhcos/4.7/latest/ and are the ones that should be used, since that is how customers would install.

Thanks
Prashanth

Comment 7 Prashanth Sundararaman 2021-08-19 18:46:58 UTC
*** Bug 1992245 has been marked as a duplicate of this bug. ***

Comment 10 Prashanth Sundararaman 2021-08-25 15:44:34 UTC
Hi Kyle,

The latest nightlies have the fix: https://openshift-release-s390x.apps.ci.l2s4.p1.openshiftapps.com/#4.7.0-0.nightly-s390x . Could you please test it and confirm that the problem is fixed?

Thanks
Prashanth

Comment 11 krmoser 2021-08-25 19:24:26 UTC
Prashanth,

Thanks for the update.  

1. We've successfully performed multiple zVM hosted install tests with OCP 4.9 nightly build 4.7.0-0.nightly-s390x-2021-08-21-044859, with RHCOS 47.84.202108181404-0 build for the bootstrap

2. We're in the process of performing some additional zVM hosted tests with OCP 4.9 nightly build 4.7.0-0.nightly-s390x-2021-08-25-185227 and RHCOS 47.84.202108251004-0 for the bootstrap, and will provide a follow-on update here.

Thank you,
Kyle

Comment 12 krmoser 2021-08-25 19:28:17 UTC
(In reply to Prashanth Sundararaman from comment #6)
> Hi Kyle,
> 
> Are you getting the RHCOS bootimages from
> https://releases-rhcos-art.cloud.privileged.psi.redhat.com/ ? The bootimages
> you should be using need to be aligned with the ones here:
> https://github.com/openshift/installer/blob/release-4.7/data/data/rhcos-s390x.json as the installer is the definitive source of bootimages.
> 
> These images are available here:
> https://mirror.openshift.com/pub/openshift-v4/s390x/dependencies/rhcos/4.7/latest/ and should be the ones used which is how the customers would install.
> 
> Thanks
> Prashanth


Prashanth,

Thanks for the information.

For due diligence purposes, including to help find RHCOS build issues before an RHCOS build is potentially promoted to the latest customer-available RHCOS build (for example at https://mirror.openshift.com/pub/openshift-v4/s390x/dependencies/rhcos/4.7/latest/), we test with both this latest customer-available RHCOS build and the RHCOS build documented in the OCP 4.x build's release.txt file.

Thank you,
Kyle

Comment 13 krmoser 2021-08-26 13:24:53 UTC
(In reply to krmoser from comment #11)
> Prashanth,
> 
> Thanks for the update.  
> 
> 1. We've successfully performed multiple zVM hosted install tests with OCP
> 4.9 nightly build 4.7.0-0.nightly-s390x-2021-08-21-044859, with RHCOS
> 47.84.202108181404-0 build for the bootstrap
> 
> 2. We're in the process of performing some additional zVM hosted tests with
> OCP 4.9 nightly build 4.7.0-0.nightly-s390x-2021-08-25-185227 and RHCOS
> 47.84.202108251004-0 for the bootstrap, and will provide a follow-on update
> here.
> 
> Thank you,
> Kyle

My apologies for the typos in comment 11 above, where I wrote "OCP 4.9" but meant "OCP 4.7".

Thank you.

Comment 14 krmoser 2021-08-26 13:33:30 UTC
Prashanth,

We've successfully installed the following OCP 4.7 nightly builds in z/VM hosted environments, with the listed RHCOS latest and 47.84 builds for the bootstrap node.

1. OCP 4.7 nightly build 4.7.0-0.nightly-s390x-2021-08-21-044859
================================================================
  1. RHCOS 4.7.13 (latest) 
  2. RHCOS 47.84.202108181404-0
  3. RHCOS 47.84.202108251004-0


2. OCP 4.7 nightly build 4.7.0-0.nightly-s390x-2021-08-25-185227
================================================================ 
  1. RHCOS 4.7.13 (latest) 
  2. RHCOS 47.84.202108181404-0
  3. RHCOS 47.84.202108251004-0

Thank you,
Kyle

Comment 16 errata-xmlrpc 2021-09-01 18:23:56 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.7.28 security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3262

