Bug 1915953
| Summary: | Must-gather takes hours to complete if the OCS cluster is not fully deployed, delay seen in ceph command collection step | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Container Storage | Reporter: | Neha Berry <nberry> |
| Component: | must-gather | Assignee: | Pulkit Kundra <pkundra> |
| Status: | CLOSED ERRATA | QA Contact: | Oded <oviner> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.7 | CC: | ebenahar, muagarwa, nobody+410372, ocs-bugs, pkundra, sabose |
| Target Milestone: | --- | Keywords: | AutomationBackLog |
| Target Release: | OCS 4.7.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | 4.7.0-731.ci | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-05-19 09:18:01 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Neha Berry
2021-01-13 19:33:43 UTC
I also faced a similar issue: must-gather spent a long time in the ceph command collection steps.
[must-gather-cw76p] OUT gather logs unavailable: http2: server sent GOAWAY and closed the connection; LastStreamID=13, ErrCode=NO_ERROR, debug=""
[must-gather-cw76p] OUT waiting for gather to complete
[must-gather-cw76p] OUT gather never finished: timed out waiting for the condition
[must-gather ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-tkzd6 deleted
[must-gather ] OUT namespace/openshift-must-gather-2vmj2 deleted
error: gather never finished for pod must-gather-cw76p: timed out waiting for the condition
In the end, the command oc adm must-gather --image="quay.io/rhceph-dev/ocs-must-gather:latest-4.7" failed with the above error.
Tested on ocs-operator.v4.7.0-278.ci.
NAME READY STATUS RESTARTS AGE
csi-cephfsplugin-955fl 3/3 Running 0 178m
csi-cephfsplugin-provisioner-5f84f94c57-2vcft 6/6 Running 0 178m
csi-cephfsplugin-provisioner-5f84f94c57-p29vb 6/6 Running 3 178m
csi-cephfsplugin-qv9qf 3/3 Running 0 178m
csi-cephfsplugin-rlphb 3/3 Running 0 178m
csi-rbdplugin-7td22 3/3 Running 0 178m
csi-rbdplugin-8cmhv 3/3 Running 0 178m
csi-rbdplugin-jqqx7 3/3 Running 0 178m
csi-rbdplugin-provisioner-68bd88fb68-lzjvn 6/6 Running 4 178m
csi-rbdplugin-provisioner-68bd88fb68-qrn7t 6/6 Running 0 178m
noobaa-core-0 1/1 Running 0 175m
noobaa-db-pg-0 0/1 Pending 0 175m
noobaa-operator-6fb598688b-vxx8d 1/1 Running 0 3h3m
ocs-metrics-exporter-64967ddb76-nxfck 1/1 Running 0 3h3m
ocs-operator-6fd8ccdcf5-vmrdf 1/1 Running 1 3h3m
rook-ceph-crashcollector-compute-0-8474776685-2c56z 1/1 Running 0 177m
rook-ceph-crashcollector-compute-1-5f7f757894-s4h9n 1/1 Running 0 176m
rook-ceph-crashcollector-compute-2-758fc7df9-656w5 1/1 Running 0 177m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-c4d5547c72srj 2/2 Running 0 174m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-6dffb4749d4wp 2/2 Running 0 174m
rook-ceph-mgr-a-679dff6dbd-8xbk5 2/2 Running 0 176m
rook-ceph-mon-a-b856bdcb-jn4hk 2/2 Running 0 177m
rook-ceph-mon-b-f6545f4fb-kb8rg 2/2 Running 0 177m
rook-ceph-mon-c-59dd86bf4d-kzpw4 2/2 Running 0 176m
rook-ceph-operator-7778fb54f9-5hfmw 1/1 Running 0 3h3m
rook-ceph-osd-0-598d454d8b-v2rz6 0/2 Init:1/9 0 57m
rook-ceph-osd-1-f88db587f-drp8t 0/2 Init:CrashLoopBackOff 34 154m
rook-ceph-osd-2-869799c9f6-tndl8 0/2 Init:CrashLoopBackOff 32 143m
rook-ceph-osd-prepare-ocs-deviceset-thin-0-data-0bwwhj-hvnxc 0/1 Completed 0 176m
rook-ceph-osd-prepare-ocs-deviceset-thin-1-data-0xr9tm-556wl 0/1 Completed 0 176m
rook-ceph-osd-prepare-ocs-deviceset-thin-2-data-0l696c-r7tpx 0/1 Completed 0 176m
rook-ceph-tools-5c5f779f59-9w7gb 1/1 Running 0 76m
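The listing shows why the cluster counts as not fully deployed: none of the three OSD pods has started, and noobaa-db-pg-0 is Pending. The following is a minimal Python sketch, not part of must-gather or the `oc` CLI, that picks such pods out of plain `oc get pods` text. The helper name `unhealthy_pods` is hypothetical, and it checks only the STATUS column, so a pod that is Running but not Ready would not be flagged.

```python
# Pod listing copied from the `oc get pods` output above (columns separated by whitespace).
LISTING = """\
NAME READY STATUS RESTARTS AGE
csi-cephfsplugin-955fl 3/3 Running 0 178m
csi-cephfsplugin-provisioner-5f84f94c57-2vcft 6/6 Running 0 178m
csi-cephfsplugin-provisioner-5f84f94c57-p29vb 6/6 Running 3 178m
csi-cephfsplugin-qv9qf 3/3 Running 0 178m
csi-cephfsplugin-rlphb 3/3 Running 0 178m
csi-rbdplugin-7td22 3/3 Running 0 178m
csi-rbdplugin-8cmhv 3/3 Running 0 178m
csi-rbdplugin-jqqx7 3/3 Running 0 178m
csi-rbdplugin-provisioner-68bd88fb68-lzjvn 6/6 Running 4 178m
csi-rbdplugin-provisioner-68bd88fb68-qrn7t 6/6 Running 0 178m
noobaa-core-0 1/1 Running 0 175m
noobaa-db-pg-0 0/1 Pending 0 175m
noobaa-operator-6fb598688b-vxx8d 1/1 Running 0 3h3m
ocs-metrics-exporter-64967ddb76-nxfck 1/1 Running 0 3h3m
ocs-operator-6fd8ccdcf5-vmrdf 1/1 Running 1 3h3m
rook-ceph-crashcollector-compute-0-8474776685-2c56z 1/1 Running 0 177m
rook-ceph-crashcollector-compute-1-5f7f757894-s4h9n 1/1 Running 0 176m
rook-ceph-crashcollector-compute-2-758fc7df9-656w5 1/1 Running 0 177m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-c4d5547c72srj 2/2 Running 0 174m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-6dffb4749d4wp 2/2 Running 0 174m
rook-ceph-mgr-a-679dff6dbd-8xbk5 2/2 Running 0 176m
rook-ceph-mon-a-b856bdcb-jn4hk 2/2 Running 0 177m
rook-ceph-mon-b-f6545f4fb-kb8rg 2/2 Running 0 177m
rook-ceph-mon-c-59dd86bf4d-kzpw4 2/2 Running 0 176m
rook-ceph-operator-7778fb54f9-5hfmw 1/1 Running 0 3h3m
rook-ceph-osd-0-598d454d8b-v2rz6 0/2 Init:1/9 0 57m
rook-ceph-osd-1-f88db587f-drp8t 0/2 Init:CrashLoopBackOff 34 154m
rook-ceph-osd-2-869799c9f6-tndl8 0/2 Init:CrashLoopBackOff 32 143m
rook-ceph-osd-prepare-ocs-deviceset-thin-0-data-0bwwhj-hvnxc 0/1 Completed 0 176m
rook-ceph-osd-prepare-ocs-deviceset-thin-1-data-0xr9tm-556wl 0/1 Completed 0 176m
rook-ceph-osd-prepare-ocs-deviceset-thin-2-data-0l696c-r7tpx 0/1 Completed 0 176m
rook-ceph-tools-5c5f779f59-9w7gb 1/1 Running 0 76m
"""

# Statuses treated as healthy; only the STATUS column is inspected.
HEALTHY = {"Running", "Completed"}


def unhealthy_pods(listing: str) -> list[tuple[str, str]]:
    """Return (name, status) for every pod whose STATUS is not Running or Completed."""
    rows = listing.strip().splitlines()[1:]  # skip the header row
    result = []
    for row in rows:
        name, _ready, status, _restarts, _age = row.split()
        if status not in HEALTHY:
            result.append((name, status))
    return result


for name, status in unhealthy_pods(LISTING):
    print(f"{name}: {status}")
```

For this listing it reports noobaa-db-pg-0 (Pending) and the three rook-ceph-osd pods (Init:1/9 and Init:CrashLoopBackOff), which matches the `osd: 3 osds: 0 up, 0 in` line in the ceph status.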
============================
[root@compute-2 /]# ceph -s
  cluster:
    id:     03ed1f0c-6b32-40f6-979a-ca1412f9ef05
    health: HEALTH_WARN
            2 MDSs report slow metadata IOs
            Reduced data availability: 176 pgs inactive

  services:
    mon: 3 daemons, quorum a,b,c (age 2h)
    mgr: a(active, since 2h)
    mds: ocs-storagecluster-cephfilesystem:1 {0=ocs-storagecluster-cephfilesystem-b=up:creating} 1 up:standby-replay
    osd: 3 osds: 0 up, 0 in

  task status:
    scrub status:
        mds.ocs-storagecluster-cephfilesystem-a: idle
        mds.ocs-storagecluster-cephfilesystem-b: idle

  data:
    pools:   10 pools, 176 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     100.000% pgs unknown
             176 unknown
================================
Total time taken to collect logs (start and end timestamps, about 6 minutes):
2021-03-02 09:10:40.355834012 +0000 UTC m=+0.409726534
2021-03-02 09:16:47.297227082 +0000 UTC m=+367.351119644
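These timestamps appear to be Go `time.Time` values printed with their monotonic clock reading (`m=+<seconds>`), so the collection time can be checked from both the wall clock and the monotonic offset. A minimal Python sketch, assuming that format; the helper names `wall_clock` and `monotonic` are illustrative only. `Decimal` is used because `datetime` keeps only microseconds while the timestamps carry nanoseconds.

```python
from datetime import datetime
from decimal import Decimal

# Start and end timestamps copied from the comment above.
START = "2021-03-02 09:10:40.355834012 +0000 UTC m=+0.409726534"
END = "2021-03-02 09:16:47.297227082 +0000 UTC m=+367.351119644"


def wall_clock(ts: str) -> tuple[datetime, Decimal]:
    """Split the wall-clock part into whole seconds and a nanosecond fraction."""
    stamp = ts.split(" +0000")[0]  # e.g. "2021-03-02 09:10:40.355834012"
    whole, frac = stamp.split(".")
    return datetime.strptime(whole, "%Y-%m-%d %H:%M:%S"), Decimal("0." + frac)


def monotonic(ts: str) -> Decimal:
    """Return the monotonic reading that follows 'm=+', in seconds."""
    return Decimal(ts.split("m=+")[1])


(start_dt, start_frac), (end_dt, end_frac) = wall_clock(START), wall_clock(END)
# Whole-second difference from datetime plus the sub-second difference kept exactly.
wall_elapsed = Decimal(int((end_dt - start_dt).total_seconds())) + (end_frac - start_frac)
mono_elapsed = monotonic(END) - monotonic(START)

minutes, seconds = divmod(mono_elapsed, 60)
print(f"wall clock: {wall_elapsed} s")
print(f"monotonic:  {mono_elapsed} s (about {int(minutes)} min {seconds:.0f} s)")
```

This prints `wall clock: 366.941393070 s` and `monotonic:  366.941393110 s (about 6 min 7 s)`; the two clocks agree to within tens of nanoseconds.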
Gather debug log: http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz1915953/gather-debug.log
Moving the bug to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2041