Bug 2098123 - OCP 4.11 on Z build 4.11.0-0.nightly-s390x-2022-06-16-003753 fails to install for KVM and zVM environments
Summary: OCP 4.11 on Z build 4.11.0-0.nightly-s390x-2022-06-16-003753 fails to install...
Keywords:
Status: CLOSED DUPLICATE of bug 2098151
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Multi-Arch
Version: 4.11
Hardware: s390x
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Jeremy Poulin
QA Contact: Douglas Slavens
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-06-17 10:53 UTC by krmoser
Modified: 2022-06-18 07:51 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-06-17 16:25:25 UTC
Target Upstream Version:
Embargoed:


Attachments
master-0 node journalctl log (19.00 MB, text/plain)
2022-06-17 13:31 UTC, krmoser


Links
System                  ID               Private   Priority   Status   Summary   Last Updated
Red Hat Issue Tracker   MULTIARCH-2613   0         None       None     None      2022-06-17 11:01:28 UTC

Description krmoser 2022-06-17 10:53:35 UTC
Description of problem:
OCP 4.11 on Z build 4.11.0-0.nightly-s390x-2022-06-16-003753 does not install in zVM environments (and potentially KVM).

The installation does not proceed past attempting to install the network cluster operator.  The 3 master nodes remain in the "NotReady" state.



For zVM environments:
===============================================================================================================================================
Installs fail because the network operator does not complete installation. The following "oc get clusterversion", "oc get nodes", and "oc get co" output is a repeatable example:


NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          13m     Unable to apply 4.11.0-0.nightly-s390x-2022-06-16-003753: an unknown error has occurred: MultipleErrors

NAME                                          STATUS     ROLES    AGE   VERSION
master-0.pok-96.ocptest.pok.stglabs.ibm.com   NotReady   master   13m   v1.24.0+25f9057
master-1.pok-96.ocptest.pok.stglabs.ibm.com   NotReady   master   13m   v1.24.0+25f9057
master-2.pok-96.ocptest.pok.stglabs.ibm.com   NotReady   master   13m   v1.24.0+25f9057


NAME                                       VERSION                                    AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                                                                                                                     
baremetal                                                                                                                          
cloud-controller-manager                   4.11.0-0.nightly-s390x-2022-06-16-003753   True        False         False      12m     
cloud-credential                                                                      True        False         False      12m     
cluster-autoscaler                                                                                                                 
config-operator                                                                                                                    
console                                                                                                                            
csi-snapshot-controller                                                                                                            
dns                                                                                                                                
etcd                                                                                                                               
image-registry                                                                                                                     
ingress                                                                                                                            
insights                                                                                                                           
kube-apiserver                                                                                                                     
kube-controller-manager                                                                                                            
kube-scheduler                                                                                                                     
kube-storage-version-migrator                                                                                                      
machine-api                                                                                                                        
machine-approver                                                                                                                   
machine-config                                                                                                                     
marketplace                                                                                                                        
monitoring                                                                                                                         
network                                                                               False       True          True       12m     The network is starting up
node-tuning                                                                                                                        
openshift-apiserver                                                                                                                
openshift-controller-manager                                                                                                       
openshift-samples                                                                                                                  
operator-lifecycle-manager                                                                                                         
operator-lifecycle-manager-catalog                                                                                                 
operator-lifecycle-manager-packageserver                                                                                           
service-ca                                                                                                                         
storage                                                                                                                            
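
For triaging pasted output like the tables above, a minimal Python sketch (not part of the original report) can slice each row at the header's column offsets, so that empty cells stay in the right column. The names `parse_fixed_width` and `not_available` and the shortened sample rows are assumptions for illustration; only the column names are taken from the "oc get co" output above.

```python
import re


def parse_fixed_width(table: str) -> list[dict[str, str]]:
    """Parse a whitespace-aligned table by slicing rows at the header's column offsets."""
    lines = [ln for ln in table.splitlines() if ln.strip()]
    header = lines[0]
    # (column name, start offset) pairs taken from the header line.
    cols = [(m.group(), m.start()) for m in re.finditer(r"\S+", header)]
    rows = []
    for ln in lines[1:]:
        row = {}
        for i, (name, start) in enumerate(cols):
            # The last column (MESSAGE) runs to the end of the line.
            end = cols[i + 1][1] if i + 1 < len(cols) else None
            row[name] = ln[start:end].strip()
        rows.append(row)
    return rows


def not_available(rows: list[dict[str, str]]) -> list[str]:
    """Names of operators whose AVAILABLE cell is anything other than "True"."""
    return [r["NAME"] for r in rows if r["AVAILABLE"] != "True"]


# Hypothetical, shortened sample; the format string only keeps the columns aligned.
FMT = "{:<10}{:<10}{:<12}{:<14}{:<11}{:<8}{}"
SAMPLE = "\n".join([
    FMT.format("NAME", "VERSION", "AVAILABLE", "PROGRESSING", "DEGRADED", "SINCE", "MESSAGE"),
    FMT.format("dns", "", "", "", "", "", ""),
    FMT.format("network", "", "False", "True", "True", "12m", "The network is starting up"),
    FMT.format("storage", "4.11", "True", "False", "False", "12m", ""),
])

if __name__ == "__main__":
    print(not_available(parse_fixed_width(SAMPLE)))  # prints ['dns', 'network']
```

An empty AVAILABLE cell is treated the same as "False" here, on the assumption that an operator that has not reported status yet is also of interest. The sketch assumes single-word column headers, as in the output above.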




Version-Release number of selected component (if applicable):
OCP 4.11 on Z build 4.11.0-0.nightly-s390x-2022-06-16-003753

How reproducible:
Consistently reproducible.

Steps to Reproduce:
1. Attempt to install OCP on Z build 4.11.0-0.nightly-s390x-2022-06-16-003753 in a zVM environment.
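
To confirm the failure state after this step without reading the table by eye, a minimal Python sketch (an illustration, not something the reporter ran) could check the Ready condition of each node from "oc get nodes -o json". The function names and the sample data are hypothetical; the sketch assumes a logged-in `oc` on the PATH for `fetch_nodes`, and the standard Kubernetes node status layout (`status.conditions` with a `Ready` entry).

```python
import json
import subprocess


def not_ready_nodes(node_list: dict) -> list[str]:
    """Return names of nodes whose Ready condition is missing or not "True"."""
    names = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        if ready is None or ready.get("status") != "True":
            names.append(node["metadata"]["name"])
    return names


def fetch_nodes() -> dict:
    """Run `oc get nodes -o json`; requires a logged-in `oc` on the PATH."""
    result = subprocess.run(
        ["oc", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


# Hypothetical sample data with shortened node names, for illustration only.
SAMPLE = {
    "items": [
        {"metadata": {"name": "master-0"},
         "status": {"conditions": [{"type": "Ready", "status": "False"}]}},
        {"metadata": {"name": "master-1"},
         "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
    ]
}

if __name__ == "__main__":
    print(not_ready_nodes(SAMPLE))  # prints ['master-0']
```

Against a live cluster, `not_ready_nodes(fetch_nodes())` would be the equivalent call; for the failure described in this report it would be expected to list all three master nodes.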


Actual results:
The OCP 4.11 on Z 4.11.0-0.nightly-s390x-2022-06-16-003753 cluster will fail to install, with the etcd and network cluster operators failing to complete installation.

Expected results:
The OCP 4.11 on Z 4.11.0-0.nightly-s390x-2022-06-16-003753 cluster should install successfully and consistently.


Additional info:
Will provide the partial results of a must-gather a bit later this morning. 

Thank you.

Comment 1 krmoser 2022-06-17 13:24:42 UTC
Here is the output when attempting to collect an "oc adm must-gather":

[root@ospbmgr4 ~]# oc adm must-gather
[must-gather      ] OUT the server could not find the requested resource (get imagestreams.image.openshift.io must-gather)
[must-gather      ] OUT
[must-gather      ] OUT Using must-gather plug-in image: registry.redhat.io/openshift4/ose-must-gather:latest
When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information:
ClusterID: c7c6149b-cbc7-4721-843c-4083934c88dc
ClusterVersion: Installing "4.11.0-0.nightly-s390x-2022-06-16-003753" for 3 hours: Working towards 4.11.0-0.nightly-s390x-2022-06-16-003753: 647 of 802 done (80% complete)
ClusterOperators:
        clusteroperator/authentication is not available (<missing>) because <missing>
        clusteroperator/baremetal is not available (<missing>) because <missing>
        clusteroperator/cluster-autoscaler is not available (<missing>) because <missing>
        clusteroperator/config-operator is not available (<missing>) because <missing>
        clusteroperator/console is not available (<missing>) because <missing>
        clusteroperator/csi-snapshot-controller is not available (<missing>) because <missing>
        clusteroperator/dns is not available (<missing>) because <missing>
        clusteroperator/etcd is not available (<missing>) because <missing>
        clusteroperator/image-registry is not available (<missing>) because <missing>
        clusteroperator/ingress is not available (<missing>) because <missing>
        clusteroperator/insights is not available (<missing>) because <missing>
        clusteroperator/kube-apiserver is not available (<missing>) because <missing>
        clusteroperator/kube-controller-manager is not available (<missing>) because <missing>
        clusteroperator/kube-scheduler is not available (<missing>) because <missing>
        clusteroperator/kube-storage-version-migrator is not available (<missing>) because <missing>
        clusteroperator/machine-api is not available (<missing>) because <missing>
        clusteroperator/machine-approver is not available (<missing>) because <missing>
        clusteroperator/machine-config is not available (<missing>) because <missing>
        clusteroperator/marketplace is not available (<missing>) because <missing>
        clusteroperator/monitoring is not available (<missing>) because <missing>
        clusteroperator/network is not available (The network is starting up) because DaemonSet "/openshift-ovn-kubernetes/ovn-ipsec" rollout is not making progress - last change 2022-06-17T10:31:45Z
DaemonSet "/openshift-ovn-kubernetes/ovnkube-master" rollout is not making progress - pod ovnkube-master-7bbwg is in CrashLoopBackOff State
DaemonSet "/openshift-ovn-kubernetes/ovnkube-master" rollout is not making progress - pod ovnkube-master-mft6s is in CrashLoopBackOff State
DaemonSet "/openshift-ovn-kubernetes/ovnkube-master" rollout is not making progress - pod ovnkube-master-nwbz5 is in CrashLoopBackOff State
DaemonSet "/openshift-ovn-kubernetes/ovnkube-master" rollout is not making progress - last change 2022-06-17T10:31:46Z
DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - last change 2022-06-17T10:31:46Z
        clusteroperator/node-tuning is not available (<missing>) because <missing>
        clusteroperator/openshift-apiserver is not available (<missing>) because <missing>
        clusteroperator/openshift-controller-manager is not available (<missing>) because <missing>
        clusteroperator/openshift-samples is not available (<missing>) because <missing>
        clusteroperator/operator-lifecycle-manager is not available (<missing>) because <missing>
        clusteroperator/operator-lifecycle-manager-catalog is not available (<missing>) because <missing>
        clusteroperator/operator-lifecycle-manager-packageserver is not available (<missing>) because <missing>
        clusteroperator/service-ca is not available (<missing>) because <missing>
        clusteroperator/storage is not available (<missing>) because <missing>


[must-gather      ] OUT namespace/openshift-must-gather-mp78s created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-2npds created
Warning: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (containers "gather", "copy" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers "gather", "copy" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or containers "gather", "copy" must set securityContext.runAsNonRoot=true), seccompProfile (pod or containers "gather", "copy" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
[must-gather      ] OUT pod for plug-in image registry.redhat.io/openshift4/ose-must-gather:latest created

^C
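
The pods named in the network operator message above can be pulled out of the summary mechanically. The following is a minimal Python sketch under the assumption that the message keeps the wording shown in this output; the name `stuck_pods` is hypothetical, and the sample lines are copied from the output above.

```python
import re

# Matches lines such as:
#   DaemonSet "/ns/name" rollout is not making progress - pod <pod> is in <state> State
POD_STATE = re.compile(
    r'DaemonSet "(?P<daemonset>[^"]+)" rollout is not making progress'
    r" - pod (?P<pod>\S+) is in (?P<state>\S+) State"
)


def stuck_pods(summary: str) -> list[tuple[str, str, str]]:
    """Return (daemonset, pod, state) for every pod-level line in the summary."""
    return [(m["daemonset"], m["pod"], m["state"]) for m in POD_STATE.finditer(summary)]


SAMPLE = """\
DaemonSet "/openshift-ovn-kubernetes/ovnkube-master" rollout is not making progress - pod ovnkube-master-7bbwg is in CrashLoopBackOff State
DaemonSet "/openshift-ovn-kubernetes/ovnkube-master" rollout is not making progress - pod ovnkube-master-mft6s is in CrashLoopBackOff State
DaemonSet "/openshift-ovn-kubernetes/ovnkube-master" rollout is not making progress - pod ovnkube-master-nwbz5 is in CrashLoopBackOff State
DaemonSet "/openshift-ovn-kubernetes/ovnkube-master" rollout is not making progress - last change 2022-06-17T10:31:46Z
"""

if __name__ == "__main__":
    for daemonset, pod, state in stuck_pods(SAMPLE):
        print(daemonset, pod, state)
```

Lines that only report a "last change" timestamp are skipped, since they name no pod. The resulting pod names could then be used with "oc logs" in the openshift-ovn-kubernetes namespace to look at why the containers are crashing.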

Comment 2 krmoser 2022-06-17 13:31:11 UTC
Created attachment 1890914
master-0 node journalctl log

Comment 3 Prashanth Sundararaman 2022-06-17 14:21:01 UTC
Same issue as https://bugzilla.redhat.com/show_bug.cgi?id=2098151

All nightly payloads are experiencing this issue and a fix is being worked on.

Comment 5 Jeremy Poulin 2022-06-17 16:25:25 UTC
Going to go ahead and close this as a duplicate, based on the info in https://bugzilla.redhat.com/show_bug.cgi?id=2098123#c3.

*** This bug has been marked as a duplicate of bug 2098151 ***

Comment 6 krmoser 2022-06-18 06:53:04 UTC
FYI.  The OCP on Z Solution Test team has confirmed the same install issue exists with KVM environments.

Thank you.

