Bug 2112021
| Field | Value |
|---|---|
| Summary | After shutting down 2 worker nodes on the MS provider cluster 2 mons are down and ceph health is not recovered |
| Product | [Red Hat Storage] Red Hat OpenShift Data Foundation |
| Component | odf-managed-service |
| Version | 4.10 |
| Status | CLOSED CURRENTRELEASE |
| Severity | high |
| Priority | unspecified |
| Reporter | Itzhak <ikave> |
| Assignee | Nobody <nobody> |
| QA Contact | Itzhak <ikave> |
| CC | aeyal, dbindra, fbalak, mmuench, ocs-bugs, odf-bz-bot, owasserm, rchikatw, tnielsen |
| Keywords | Automation, Tracking |
| Target Milestone | --- |
| Target Release | --- |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | If docs needed, set a value |
| Cloned To | 2133683 (view as bug list) |
| Bug Depends On | 2133683 |
| Type | Bug |
| Last Closed | 2023-03-14 15:37:27 UTC |
|
Description (Itzhak, 2022-07-28 16:06:34 UTC)
Additional info:

```
$ oc get nodes
NAME                           STATUS   ROLES          AGE     VERSION
ip-10-0-135-143.ec2.internal   Ready    worker         104m    v1.23.5+012e945
ip-10-0-137-202.ec2.internal   Ready    master         6h59m   v1.23.5+012e945
ip-10-0-139-102.ec2.internal   Ready    infra,worker   6h36m   v1.23.5+012e945
ip-10-0-147-116.ec2.internal   Ready    infra,worker   6h37m   v1.23.5+012e945
ip-10-0-154-186.ec2.internal   Ready    master         7h      v1.23.5+012e945
ip-10-0-158-49.ec2.internal    Ready    worker         46m     v1.23.5+012e945
ip-10-0-163-159.ec2.internal   Ready    master         7h      v1.23.5+012e945
ip-10-0-172-207.ec2.internal   Ready    worker         46m     v1.23.5+012e945
ip-10-0-174-144.ec2.internal   Ready    infra,worker   6h37m   v1.23.5+012e945
```

```
$ oc get pods -n openshift-storage
NAME                                                              READY   STATUS             RESTARTS         AGE
addon-ocs-provider-qe-catalog-nfqm5                               1/1     Running            0                49m
alertmanager-managed-ocs-alertmanager-0                           2/2     Running            0                49m
alertmanager-managed-ocs-alertmanager-1                           2/2     Running            0                49m
alertmanager-managed-ocs-alertmanager-2                           2/2     Running            0                49m
csi-addons-controller-manager-b4495976c-l9xxz                     2/2     Running            0                53m
ocs-metrics-exporter-97cdff48f-zdsq4                              1/1     Running            0                53m
ocs-operator-5bf7c58cc9-gghmj                                     1/1     Running            0                53m
ocs-osd-controller-manager-67658f4d75-hj6p2                       2/3     Running            0                53m
ocs-provider-server-67fd6b6885-kx95k                              1/1     Running            0                53m
odf-console-5f4494795-mdpmr                                       1/1     Running            0                53m
odf-operator-controller-manager-7ff6cc9d4-8w662                   2/2     Running            0                53m
prometheus-managed-ocs-prometheus-0                               3/3     Running            0                49m
prometheus-operator-8547cc9f89-xp6wm                              1/1     Running            0                53m
rook-ceph-crashcollector-ip-10-0-135-143.ec2.internal-7b88cc69v   1/1     Running            0                104m
rook-ceph-crashcollector-ip-10-0-158-49.ec2.internal-66cb6hwkgz   1/1     Running            0                47m
rook-ceph-crashcollector-ip-10-0-172-207.ec2.internal-7577s66zw   1/1     Running            0                46m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-5975758bqx7lf   1/2     CrashLoopBackOff   19 (2m14s ago)   58m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-f9fd6d9bdr6zb   1/2     Running            16 (5m38s ago)   58m
rook-ceph-mgr-a-6dc6b5bf94-lxjct                                  1/2     CrashLoopBackOff   19 (3m15s ago)   58m
rook-ceph-mon-a-8d9b6979b-d6wq2                                   0/2     Pending            0                58m
rook-ceph-mon-e-bbbc799b6-5dswt                                   0/2     Pending            0                58m
rook-ceph-mon-f-6c8c6c979-bm5pb                                   2/2     Running            0                89m
rook-ceph-operator-848fbd9dd7-wf9ph                               1/1     Running            0                53m
rook-ceph-osd-0-b47bcf64-nvcd7                                    1/2     Running            14 (5m48s ago)   58m
rook-ceph-osd-1-6cdb75979c-kj5gt                                  1/2     Running            14 (6m28s ago)   58m
rook-ceph-osd-10-84b6676bbb-tffdz                                 2/2     Running            0                77m
rook-ceph-osd-11-6ff74fc9f4-xk2rr                                 2/2     Running            0                77m
rook-ceph-osd-12-77f96b4dfd-564tm                                 2/2     Running            0                77m
rook-ceph-osd-13-bd5dbc5f-sp8bx                                   2/2     Running            0                77m
rook-ceph-osd-14-78c457f467-wf546                                 2/2     Running            0                77m
rook-ceph-osd-2-689458fc4c-ntvxj                                  1/2     Running            14 (6m18s ago)   52m
rook-ceph-osd-3-b9657b758-5f2nc                                   1/2     Running            14 (6m18s ago)   58m
rook-ceph-osd-4-8499df47d7-sbr66                                  1/2     Running            14 (6m28s ago)   58m
rook-ceph-osd-5-7bf556b477-z2t9z                                  1/2     Running            14 (6m48s ago)   58m
rook-ceph-osd-6-676dcbc4f8-tflxl                                  1/2     Running            14 (5m48s ago)   58m
rook-ceph-osd-7-7f5fdd757d-rxpvn                                  1/2     Running            14 (5m48s ago)   58m
rook-ceph-osd-8-5754cc984b-wkrbn                                  1/2     Running            14 (5m48s ago)   58m
rook-ceph-osd-9-fcbf77c67-hwskw                                   1/2     Running            14 (5m48s ago)   58m
rook-ceph-tools-74fb4f5d9c-6pfvv                                  1/1     Running            0                53m
```

*** Bug 2072612 has been marked as a duplicate of this bug. ***

This will be verified by a rolling shutdown test, since shutting down 2 nodes at the same time is not a supported case.

I ran the test "test_rolling_shutdown_and_recovery_in_controlled_fashion": https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-odf-multicluster/1338/console, and it passed successfully. So, I am moving the bug to Verified.
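For reference, a minimal shell sketch of the kind of check involved here: querying mon quorum and Ceph health from the rook-ceph-tools pod listed above, then restarting worker nodes one at a time and waiting for HEALTH_OK between nodes. This is an illustration under stated assumptions, not the actual test_rolling_shutdown_and_recovery_in_controlled_fashion implementation; the namespace, label selector, node names, and reboot method are examples taken from or assumed around the output above.

```
# Hedged sketch only -- not the ocs-ci test itself. Namespace, label selector,
# node names, and the reboot method are assumptions/examples for illustration.
NS=openshift-storage
TOOLS_POD=$(oc -n "$NS" get pod -l app=rook-ceph-tools -o name | head -n1)

# Inspect mon quorum and overall Ceph health from the toolbox pod.
oc -n "$NS" rsh "$TOOLS_POD" ceph status
oc -n "$NS" rsh "$TOOLS_POD" ceph health detail
oc -n "$NS" rsh "$TOOLS_POD" ceph quorum_status --format json-pretty

# Rolling restart of one worker node at a time: drain, reboot, wait for the
# node to come back Ready, then wait for HEALTH_OK before touching the next node.
for NODE in ip-10-0-135-143.ec2.internal ip-10-0-158-49.ec2.internal; do
  oc adm drain "$NODE" --ignore-daemonsets --delete-emptydir-data --force
  oc debug node/"$NODE" -- chroot /host systemctl reboot || true  # example reboot method
  sleep 120                                                       # allow the node to go NotReady first
  oc wait --for=condition=Ready "node/$NODE" --timeout=30m
  oc adm uncordon "$NODE"
  until oc -n "$NS" rsh "$TOOLS_POD" ceph health | grep -q HEALTH_OK; do
    sleep 30
  done
done
```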
Provider cluster versions:

OC version:
```
Client Version: 4.10.24
Server Version: 4.10.50
Kubernetes Version: v1.23.12+8a6bfe4
```

OCS version:
```
ocs-operator.v4.10.9   OpenShift Container Storage   4.10.9   ocs-operator.v4.10.5   Succeeded
```

Cluster version:
```
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.50   True        False         4h25m   Cluster version is 4.10.50
```

Rook version:
```
rook: v4.10.9-0.b7b3a0044169fd9364683e2e4e6968361f8f3c08
go: go1.16.12
```

Ceph version:
```
ceph version 16.2.7-126.el8cp (fe0af61d104d48cb9d116cde6e593b5fc8c197e4) pacific (stable)
```

CSV version:
```
NAME                                      DISPLAY                        VERSION           REPLACES                                  PHASE
mcg-operator.v4.10.9                      NooBaa Operator                4.10.9            mcg-operator.v4.10.8                      Succeeded
observability-operator.v0.0.20            Observability Operator         0.0.20            observability-operator.v0.0.19            Succeeded
ocs-operator.v4.10.9                      OpenShift Container Storage    4.10.9            ocs-operator.v4.10.5                      Succeeded
ocs-osd-deployer.v2.0.11                  OCS OSD Deployer               2.0.11-11         ocs-osd-deployer.v2.0.10                  Succeeded
odf-csi-addons-operator.v4.10.9           CSI Addons                     4.10.9            odf-csi-addons-operator.v4.10.5           Succeeded
odf-operator.v4.10.9                      OpenShift Data Foundation      4.10.9            odf-operator.v4.10.5                      Succeeded
ose-prometheus-operator.4.10.0            Prometheus Operator            4.10.0            ose-prometheus-operator.4.8.0             Succeeded
route-monitor-operator.v0.1.461-dbddf1f   Route Monitor Operator         0.1.461-dbddf1f   route-monitor-operator.v0.1.456-02ea942   Succeeded
```

Closing this bug as fixed in v2.0.11 and tested by QE.
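For reference, a minimal sketch of how the version information above can be collected on the provider cluster. The exact commands used for this report are not shown in the bug; the namespace, label selector, and the use of the rook-ceph-operator deployment for the Rook version are assumptions.

```
# Hedged sketch -- commands assumed for illustration, not taken from the report.
NS=openshift-storage

oc version                      # client/server/Kubernetes versions
oc get clusterversion           # cluster version and status
oc -n "$NS" get csv             # operator CSVs (OCS/ODF/MCG/deployer, etc.)

# Rook and Ceph versions, via the operator deployment and the toolbox pod.
oc -n "$NS" exec deploy/rook-ceph-operator -- rook version
TOOLS_POD=$(oc -n "$NS" get pod -l app=rook-ceph-tools -o name | head -n1)
oc -n "$NS" rsh "$TOOLS_POD" ceph version
```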