Bug 2187580
| Summary: | [GSS] CU deleted 3 nodes from AWS, 2 related to storage then 16 OSDs down, we try to rebuild OSDs and so on | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | lema |
| Component: | ceph | Assignee: | Radoslaw Zarzynski <rzarzyns> |
| ceph sub component: | RADOS | QA Contact: | Elad <ebenahar> |
| Status: | CLOSED DUPLICATE | Docs Contact: | |
| Severity: | high | | |
| Priority: | unspecified | CC: | bkunal, bniver, hnallurv, juqiao, lsantann, muagarwa, ocs-bugs, odf-bz-bot, rzarzyns, sostapov, tnielsen |
| Version: | 4.10 | Flags: | lsantann: needinfo? (rzarzyns) |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-05-04 06:47:13 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
lema
2023-04-18 03:28:48 UTC
```
sh-4.4$ ceph health detail
HEALTH_WARN 1 clients failing to respond to capability release; 1 MDSs report slow metadata IOs; 1 MDSs report slow requests; Reduced data availability: 10 pgs inactive, 4 pgs incomplete; 1 daemons have recently crashed; 56 slow ops, oldest one blocked for 15249 sec, daemons [osd.13,osd.3,osd.5,osd.9] have slow ops.
[WRN] MDS_CLIENT_LATE_RELEASE: 1 clients failing to respond to capability release
    mds.ocs-storagecluster-cephfilesystem-a(mds.0): Client ip-10-40-9-207:csi-cephfs-node failing to respond to capability release client_id: 6560476
[WRN] MDS_SLOW_METADATA_IO: 1 MDSs report slow metadata IOs
    mds.ocs-storagecluster-cephfilesystem-a(mds.0): 1 slow metadata IOs are blocked > 30 secs, oldest blocked for 7513 secs
[WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
    mds.ocs-storagecluster-cephfilesystem-a(mds.0): 3 slow requests are blocked > 30 secs
[WRN] PG_AVAILABILITY: Reduced data availability: 10 pgs inactive, 4 pgs incomplete
    pg 2.1c is stuck inactive for 4h, current state unknown, last acting []
    pg 2.24 is stuck inactive for 4h, current state unknown, last acting []
    pg 2.27 is stuck inactive for 4h, current state unknown, last acting []
    pg 2.3f is incomplete, acting [3,19,12] (reducing pool ocs-storagecluster-cephblockpool min_size from 2 may help; search ceph.com/docs for 'incomplete')
    pg 2.b8 is incomplete, acting [0,3,12] (reducing pool ocs-storagecluster-cephblockpool min_size from 2 may help; search ceph.com/docs for 'incomplete')
    pg 2.e9 is stuck inactive for 4h, current state unknown, last acting []
    pg 2.189 is stuck inactive for 4h, current state unknown, last acting []
    pg 2.1b2 is incomplete, acting [19,3,18] (reducing pool ocs-storagecluster-cephblockpool min_size from 2 may help; search ceph.com/docs for 'incomplete')
    pg 2.1c7 is incomplete, acting [13,0,3] (reducing pool ocs-storagecluster-cephblockpool min_size from 2 may help; search ceph.com/docs for 'incomplete')
    pg 4.30 is stuck inactive for 4h, current state unknown, last acting []
[WRN] RECENT_CRASH: 1 daemons have recently crashed
    client.admin crashed on host rook-ceph-osd-15-75875f74b4-mcl2f at 2023-04-18T12:22:56.598997Z
[WRN] SLOW_OPS: 56 slow ops, oldest one blocked for 15249 sec, daemons [osd.13,osd.3,osd.5,osd.9] have slow ops.
```
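The PG_AVAILABILITY lines repeat Ceph's own hint that lowering the pool's min_size from 2 may let the incomplete PGs go active. A minimal sketch of how that hint could be inspected and applied from the rook-ceph toolbox pod (pool name and PG id are taken from the log above; this is a temporary availability-over-durability trade, not a fix for the lost OSDs):

```shell
# Query one of the incomplete PGs to see why peering is stuck
# (look at "recovery_state" and "peering_blocked_by" in the output).
ceph pg 2.3f query

# Check the pool's current replication settings.
ceph osd pool get ocs-storagecluster-cephblockpool size
ceph osd pool get ocs-storagecluster-cephblockpool min_size

# Temporarily lower min_size so PGs with only one surviving replica
# can serve I/O. CAUTION: data written while min_size=1 has no redundancy.
ceph osd pool set ocs-storagecluster-cephblockpool min_size 1

# After recovery completes and PGs are active+clean, restore the default.
ceph osd pool set ocs-storagecluster-cephblockpool min_size 2
```

Details of the reported daemon crash can be pulled with `ceph crash ls` and `ceph crash info <crash-id>`.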