Bug 2140001

Summary: mon pods are not coming up in 4.6
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Vijay Avuthu <vavuthu>
Component: ocs-operator
Assignee: Mudit Agarwal <muagarwa>
Status: CLOSED DUPLICATE
QA Contact: Martin Bukatovic <mbukatov>
Severity: urgent
Priority: unspecified
Version: 4.6
CC: madam, ocs-bugs, odf-bz-bot, sostapov
Target Milestone: ---
Keywords: Automation
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2022-11-04 12:12:29 UTC
Type: Bug

Description Vijay Avuthu 2022-11-04 05:11:06 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

The mon pods are not coming up in 4.6.


Version of all relevant components (if applicable):

openshift installer (4.6.0-0.nightly-2022-11-01-164348)
ocs-registry:4.6.16-210.ci

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
2/2

Can this issue be reproduced from the UI?
Not tried

If this is a regression, please provide more details to justify this:
Yes

Steps to Reproduce:
1. Install a FIPS-enabled cluster using ocs-ci.
2. Check whether all pods are created.


Actual results:

2022-11-04 00:01:31  18:31:31 - MainThread - ocs_ci.utility.utils - INFO  - Executing command: oc -n openshift-storage get Pod  -n openshift-storage --selector=app=rook-ceph-mon -o yaml
2022-11-04 00:01:31  18:31:31 - MainThread - ocs_ci.ocs.ocp - INFO  - status of  at column STATUS - item(s) were [], but we were waiting for all 3 of them to be Running
2022-11-04 00:01:31  18:31:31 - MainThread - ocs_ci.utility.utils - INFO  - Going to sleep for 3 seconds before next iteration
2022-11-04 00:01:35  18:31:34 - MainThread - ocs_ci.ocs.ocp - ERROR  - timeout expired: Timed out after 900s running get("", True, "app=rook-ceph-mon")
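The log above shows the deployment check polling pods with the selector app=rook-ceph-mon, sleeping 3 seconds between attempts, and giving up after 900s because the list stayed empty instead of reaching 3 Running pods. Below is a minimal sketch of that kind of wait loop, for anyone reproducing the check by hand. It is not ocs-ci's actual implementation: the function names `get_mon_pod_phases` and `wait_for_running` are hypothetical, and it assumes `oc` is on PATH and logged in to the cluster.

```python
import json
import subprocess
import time
from typing import Callable, List


def get_mon_pod_phases(namespace: str = "openshift-storage") -> List[str]:
    """Hypothetical helper: return status.phase of every pod labelled app=rook-ceph-mon.

    Assumes `oc` is installed and logged in; `-o json` returns a List with an "items" array.
    """
    out = subprocess.run(
        ["oc", "-n", namespace, "get", "pod", "--selector=app=rook-ceph-mon", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [item["status"]["phase"] for item in json.loads(out)["items"]]


def wait_for_running(
    fetch: Callable[[], List[str]],
    expected: int = 3,
    timeout: float = 900,
    sleep: float = 3,
    clock: Callable[[], float] = time.monotonic,
    sleeper: Callable[[float], None] = time.sleep,
) -> bool:
    """Hypothetical wait loop: poll until exactly `expected` pods report phase Running.

    Raises TimeoutError once `timeout` seconds have passed without success.
    `clock` and `sleeper` are injectable so the loop can be exercised without waiting.
    """
    deadline = clock() + timeout
    while True:
        phases = fetch()
        if len(phases) == expected and all(p == "Running" for p in phases):
            return True
        if clock() >= deadline:
            # Mirrors the failure above: an empty list ([]) never reaches 3 Running pods.
            raise TimeoutError(f"Timed out after {timeout}s; last phases were {phases}")
        sleeper(sleep)
```

Against a live cluster this would be called as `wait_for_running(get_mon_pod_phases)`. In this bug it would time out the same way the ocs-ci run did.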


Expected results:

All 3 mon pods should be Running.

Additional info:

$ oc get pods
NAME                                                  READY   STATUS      RESTARTS   AGE
compute-0-debug                                       0/1     Completed   0          10h
compute-1-debug                                       0/1     Completed   0          10h
compute-2-debug                                       0/1     Completed   0          10h
csi-cephfsplugin-2s64f                                3/3     Running     0          10h
csi-cephfsplugin-j666b                                3/3     Running     0          10h
csi-cephfsplugin-lkb4m                                3/3     Running     0          10h
csi-cephfsplugin-provisioner-5bbfb886b8-tnhnc         6/6     Running     0          10h
csi-cephfsplugin-provisioner-5bbfb886b8-x5bp2         6/6     Running     0          10h
csi-rbdplugin-7q4mq                                   3/3     Running     0          10h
csi-rbdplugin-c89vg                                   3/3     Running     0          10h
csi-rbdplugin-mgtlr                                   3/3     Running     0          10h
csi-rbdplugin-provisioner-8644596f59-ppcqv            6/6     Running     0          10h
csi-rbdplugin-provisioner-8644596f59-r46xt            6/6     Running     0          10h
must-gather-w7q97-helper                              1/1     Running     0          10h
noobaa-operator-dd76d76c9-7f977                       1/1     Running     0          10h
ocs-metrics-exporter-65ddfd844b-7c7ll                 1/1     Running     0          10h
ocs-operator-5b887d69fd-5rt5q                         0/1     Running     0          10h
rook-ceph-crashcollector-compute-0-68bcc94444-cgn2r   0/1     Init:0/2    0          10h
rook-ceph-mon-a-6fd649d577-d5c67                      1/1     Running     0          10h
rook-ceph-operator-798bdd4699-fvvqg                   1/1     Running     0          10h
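In the listing above only rook-ceph-mon-a exists, while the test waits for 3 mons. Two pods are also not ready: ocs-operator (0/1 Running) and the crashcollector (Init:0/2). Below is a minimal sketch for flagging such pods in pasted `oc get pods` output. It assumes the default column layout (NAME READY STATUS RESTARTS AGE) and treats Completed pods (the compute-*-debug pods) as expected. The function name `not_ready_pods` is hypothetical.

```python
from typing import List, Tuple


def not_ready_pods(table: str) -> List[Tuple[str, str, str]]:
    """Hypothetical helper: return (name, ready, status) for pods that need attention.

    A pod is flagged when it is not Completed and either its STATUS is not Running
    or fewer containers are ready than exist (e.g. 0/1).
    """
    problems = []
    for line in table.strip().splitlines()[1:]:  # skip the NAME/READY/STATUS header
        name, ready, status = line.split()[:3]
        if status == "Completed":
            continue  # finished debug pods legitimately show 0/1
        up, total = (int(n) for n in ready.split("/"))
        if status != "Running" or up < total:
            problems.append((name, ready, status))
    return problems
```

Run on the full listing above, this would report the ocs-operator and crashcollector pods and nothing else.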


> rook-ceph-crashcollector is still in the Init stage (Init:0/2)

$ oc describe pod rook-ceph-crashcollector-compute-0-68bcc94444-cgn2r
Name:           rook-ceph-crashcollector-compute-0-68bcc94444-cgn2r
Namespace:      openshift-storage
Priority:       0
Node:           compute-0/10.1.112.122
Start Time:     Fri, 04 Nov 2022 00:14:56 +0530
Labels:         app=rook-ceph-crashcollector
                ceph-version=14.2.11-208
                ceph_daemon_id=crash
                crashcollector=crash
                kubernetes.io/hostname=compute-0
                node_name=compute-0
                pod-template-hash=68bcc94444
                rook-version=4.6-127.a177726.release_4.6
                rook_cluster=openshift-storage
Annotations:    openshift.io/scc: rook-ceph
Status:         Pending

Events:
  Type     Reason       Age                    From     Message
  ----     ------       ----                   ----     -------
  Warning  FailedMount  87m (x51 over 10h)     kubelet  Unable to attach or mount volumes: unmounted volumes=[rook-ceph-crash-collector-keyring], unattached volumes=[default-token-p9kp5 rook-ceph-crash-collector-keyring rook-config-override rook-ceph-log rook-ceph-crash]: timed out waiting for the condition
  Warning  FailedMount  51m (x47 over 10h)     kubelet  Unable to attach or mount volumes: unmounted volumes=[rook-ceph-crash-collector-keyring], unattached volumes=[rook-ceph-log rook-ceph-crash default-token-p9kp5 rook-ceph-crash-collector-keyring rook-config-override]: timed out waiting for the condition
  Warning  FailedMount  42m (x40 over 10h)     kubelet  Unable to attach or mount volumes: unmounted volumes=[rook-ceph-crash-collector-keyring], unattached volumes=[rook-ceph-crash-collector-keyring rook-config-override rook-ceph-log rook-ceph-crash default-token-p9kp5]: timed out waiting for the condition
  Warning  FailedMount  38m (x68 over 10h)     kubelet  Unable to attach or mount volumes: unmounted volumes=[rook-ceph-crash-collector-keyring], unattached volumes=[rook-config-override rook-ceph-log rook-ceph-crash default-token-p9kp5 rook-ceph-crash-collector-keyring]: timed out waiting for the condition
  Warning  FailedMount  17m (x50 over 9h)      kubelet  Unable to attach or mount volumes: unmounted volumes=[rook-ceph-crash-collector-keyring], unattached volumes=[rook-ceph-crash default-token-p9kp5 rook-ceph-crash-collector-keyring rook-config-override rook-ceph-log]: timed out waiting for the condition
  Warning  FailedMount  2m55s (x308 over 10h)  kubelet  MountVolume.SetUp failed for volume "rook-ceph-crash-collector-keyring" : secret "rook-ceph-crash-collector-keyring" not found
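The last event gives the concrete reason the crashcollector never leaves Init: the secret "rook-ceph-crash-collector-keyring" does not exist, so the kubelet cannot mount the volume of the same name. Below is a minimal sketch for pulling missing secret names out of pasted `oc describe` event text. It assumes the kubelet message format shown above (`secret "<name>" not found`); the helper name `missing_secrets` is hypothetical.

```python
import re
from typing import List

# Matches kubelet messages of the form: secret "<name>" not found
MISSING_SECRET = re.compile(r'secret "([^"]+)" not found')


def missing_secrets(events: str) -> List[str]:
    """Hypothetical helper: return each distinct secret reported as not found, in first-seen order."""
    seen: List[str] = []
    for name in MISSING_SECRET.findall(events):
        if name not in seen:
            seen.append(name)
    return seen
```

Each reported name can then be confirmed with `oc -n openshift-storage get secret <name>`.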



Must-gather logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-227vuf1cs33-t1/j-227vuf1cs33-t1_20221103T174049/logs/failed_testcase_ocs_logs_1667499206/deployment_ocs_logs/ocs_must_gather/

Comment 3 Vijay Avuthu 2022-11-04 12:12:29 UTC

*** This bug has been marked as a duplicate of bug 2139951 ***