Bug 1995622 - [IBM Z] : Storage cluster creation fails and the monitor pods crash continuously on a Metro DR stretch cluster
Summary: [IBM Z] : Storage cluster creation fails and the monitor pods crash continuously on a Metro DR stretch cluster
Keywords:
Status: CLOSED DUPLICATE of bug 2000301
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: rook
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Travis Nielsen
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-08-19 14:08 UTC by Sravika
Modified: 2023-08-09 17:03 UTC
CC: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-09-13 17:43:05 UTC
Embargoed:



Description Sravika 2021-08-19 14:08:16 UTC
Description of problem (please be as detailed as possible and provide log snippets):

Storage cluster creation fails and the monitor pods are in "CrashLoopBackOff" state on a Metro DR stretch cluster.

# oc get nodes -L topology.kubernetes.io/zone
NAME                             STATUS   ROLES    AGE   VERSION                ZONE
master-0.ocsm4205001.lnxne.boe   Ready    master   19h   v1.22.0-rc.0+3dfed96   arbiter
master-1.ocsm4205001.lnxne.boe   Ready    master   19h   v1.22.0-rc.0+3dfed96   datacenter1
master-2.ocsm4205001.lnxne.boe   Ready    master   19h   v1.22.0-rc.0+3dfed96   datacenter2
worker-0.ocsm4205001.lnxne.boe   Ready    worker   18h   v1.22.0-rc.0+3dfed96   datacenter1
worker-1.ocsm4205001.lnxne.boe   Ready    worker   18h   v1.22.0-rc.0+3dfed96   datacenter1
worker-2.ocsm4205001.lnxne.boe   Ready    worker   18h   v1.22.0-rc.0+3dfed96   datacenter2
worker-3.ocsm4205001.lnxne.boe   Ready    worker   18h   v1.22.0-rc.0+3dfed96   datacenter2


Version of all relevant components (if applicable):

OCP: 4.9.0-0.nightly-s390x-2021-08-17-145334
LSO: 4.9.0-202107210242
OCS-operator: 4.9.0-91.ci

Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?


Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Install OCP
2. Apply topology zone labels to the OCP nodes:
# oc label node master-0 topology.kubernetes.io/zone=arbiter
# oc label node master-1 worker-0 worker-1 topology.kubernetes.io/zone=datacenter1
# oc label node master-2 worker-2 worker-3 topology.kubernetes.io/zone=datacenter2
3. Install the Local Storage Operator
4. Install the OpenShift Container Storage Operator
5. Create the Storage Cluster (a sketch of an arbiter-enabled spec is shown below).
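
For reference, a minimal sketch of what the arbiter/stretch StorageCluster spec in step 5 might look like. The device-set count, requested capacity, and storage class name (localblock) are illustrative assumptions and are not taken from this cluster:

# cat <<EOF | oc apply -f -
apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
  name: ocs-storagecluster
  namespace: openshift-storage
spec:
  arbiter:
    enable: true                      # stretch (Metro DR) mode with an arbiter zone
  nodeTopologies:
    arbiterLocation: arbiter          # matches the topology.kubernetes.io/zone label applied in step 2
  storageDeviceSets:
  - name: ocs-deviceset
    count: 1                          # assumed device-set count
    replica: 4                        # two replicas per data zone in arbiter mode
    dataPVCTemplate:
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 512Gi            # assumed size of each local block PV
        storageClassName: localblock  # assumed LSO-provided storage class
        volumeMode: Block
EOF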

Actual results:


Storage cluster creation hangs and the monitor pods crash continuously


# oc get po -n openshift-storage
NAME                                            READY   STATUS             RESTARTS        AGE
csi-cephfsplugin-grl8r                          3/3     Running            0               3h56m
csi-cephfsplugin-h9g5z                          3/3     Running            0               3h56m
csi-cephfsplugin-lc8lx                          3/3     Running            0               3h56m
csi-cephfsplugin-provisioner-7b96dbcbff-cstpc   6/6     Running            0               3h56m
csi-cephfsplugin-provisioner-7b96dbcbff-g249g   6/6     Running            0               3h56m
csi-cephfsplugin-zxsfh                          3/3     Running            0               3h56m
csi-rbdplugin-2nlgq                             3/3     Running            0               3h56m
csi-rbdplugin-7bfqd                             3/3     Running            0               3h56m
csi-rbdplugin-provisioner-8665ff549b-vnvnx      6/6     Running            0               3h56m
csi-rbdplugin-provisioner-8665ff549b-zxskx      6/6     Running            0               3h56m
csi-rbdplugin-rm788                             3/3     Running            0               3h56m
csi-rbdplugin-v76fh                             3/3     Running            0               3h56m
noobaa-operator-d64687bc8-nfl6h                 1/1     Running            0               4h6m
ocs-metrics-exporter-7f98855d8-xvv8s            1/1     Running            0               4h6m
ocs-operator-5fb4896949-9qk5w                   0/1     Running            0               4h6m
rook-ceph-mon-a-869869cc7d-8z9pv                1/2     CrashLoopBackOff   51 (43s ago)    3h56m
rook-ceph-mon-b-6c6c6cb6b4-rqqsv                1/2     CrashLoopBackOff   48 (4m8s ago)   3h45m
rook-ceph-mon-c-cff959f5c-mdngp                 1/2     CrashLoopBackOff   46 (3m5s ago)   3h33m
rook-ceph-mon-d-767db5c6f9-6jvz7                1/2     CrashLoopBackOff   44 (104s ago)   3h22m
rook-ceph-mon-e-6586455cf4-nz4dg                1/2     CrashLoopBackOff   42 (54s ago)    3h10m
rook-ceph-operator-56f4df8697-d8mpt             1/1     Running            0               4h6m



# oc get csv -A
NAMESPACE                              NAME                                        DISPLAY                       VERSION              REPLACES   PHASE
openshift-local-storage                local-storage-operator.4.9.0-202107210242   Local Storage                 4.9.0-202107210242              Succeeded
openshift-operator-lifecycle-manager   packageserver                               Package Server                0.18.3                          Succeeded
openshift-storage                      ocs-operator.v4.9.0-91.ci                   OpenShift Container Storage   4.9.0-91.ci                     Installing


Expected results:

Storage Cluster creation should be successful
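
On a healthy deployment the cluster would be expected to reach the Ready phase, which could be checked with something like:

# oc get storagecluster -n openshift-storage
# oc get cephcluster -n openshift-storage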

Additional info:

Must Gather Logs:


https://drive.google.com/file/d/1kAJXuTuNdp_t5NcjvfN1mVjLRvt19X0D/view?usp=sharing

Comment 2 Travis Nielsen 2021-09-13 17:43:05 UTC
This is actually being tracked by BZ 2000301; not sure how this one was missed during the earlier evaluation.

*** This bug has been marked as a duplicate of bug 2000301 ***

