Bug 2174612
| Summary: | [GSS] OSDs are crashing due to ceph_assert(r == 0) | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | kelwhite |
| Component: | ceph | Assignee: | Michael J. Kidd <linuxkidd> |
| ceph sub component: | RADOS | QA Contact: | Elad <ebenahar> |
| Status: | NEW | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | akupczyk, bhull, bkunal, bniver, bskopova, jansingh, lema, linuxkidd, lsantann, mduasope, mmuench, muagarwa, nojha, nravinas, odf-bz-bot, pdhange, rzarzyns, sapillai, sostapov, tnielsen, tpetr, vumrao |
| Version: | 4.10 | Flags: | linuxkidd: needinfo- |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Comment 2 · kelwhite · 2023-03-07 14:16:15 UTC
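The output below was gathered from the Rook toolbox pod (hence the `sh-4.4$` prompt). As a minimal sketch of reaching that pod, assuming the standard ODF toolbox deployment and its `app=rook-ceph-tools` label in the `openshift-storage` namespace:

```sh
# Sketch only: assumes the standard ODF/Rook toolbox deployment
# labeled app=rook-ceph-tools in the openshift-storage namespace.
TOOLS_POD=$(oc -n openshift-storage get pod -l app=rook-ceph-tools -o name)
oc -n openshift-storage rsh "$TOOLS_POD"
# Inside the pod, the ceph CLI used in the output below is available.
```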
Output of `ceph health detail`:
sh-4.4$ ceph health detail
HEALTH_WARN 1 clients failing to respond to capability release; 1 MDSs report slow metadata IOs; 1 MDSs report slow requests; Reduced data availability: 10 pgs inactive, 4 pgs incomplete; 1 daemons have recently crashed; 56 slow ops, oldest one blocked for 15249 sec, daemons [osd.13,osd.3,osd.5,osd.9] have slow ops.
[WRN] MDS_CLIENT_LATE_RELEASE: 1 clients failing to respond to capability release
mds.ocs-storagecluster-cephfilesystem-a(mds.0): Client ip-10-40-9-207:csi-cephfs-node failing to respond to capability release client_id: 6560476
[WRN] MDS_SLOW_METADATA_IO: 1 MDSs report slow metadata IOs
mds.ocs-storagecluster-cephfilesystem-a(mds.0): 1 slow metadata IOs are blocked > 30 secs, oldest blocked for 7513 secs
[WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
mds.ocs-storagecluster-cephfilesystem-a(mds.0): 3 slow requests are blocked > 30 secs
[WRN] PG_AVAILABILITY: Reduced data availability: 10 pgs inactive, 4 pgs incomplete
pg 2.1c is stuck inactive for 4h, current state unknown, last acting []
pg 2.24 is stuck inactive for 4h, current state unknown, last acting []
pg 2.27 is stuck inactive for 4h, current state unknown, last acting []
pg 2.3f is incomplete, acting [3,19,12] (reducing pool ocs-storagecluster-cephblockpool min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 2.b8 is incomplete, acting [0,3,12] (reducing pool ocs-storagecluster-cephblockpool min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 2.e9 is stuck inactive for 4h, current state unknown, last acting []
pg 2.189 is stuck inactive for 4h, current state unknown, last acting []
pg 2.1b2 is incomplete, acting [19,3,18] (reducing pool ocs-storagecluster-cephblockpool min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 2.1c7 is incomplete, acting [13,0,3] (reducing pool ocs-storagecluster-cephblockpool min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 4.30 is stuck inactive for 4h, current state unknown, last acting []
[WRN] RECENT_CRASH: 1 daemons have recently crashed
client.admin crashed on host rook-ceph-osd-15-75875f74b4-mcl2f at 2023-04-18T12:22:56.598997Z
[WRN] SLOW_OPS: 56 slow ops, oldest one blocked for 15249 sec, daemons [osd.13,osd.3,osd.5,osd.9] have slow ops.
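The incomplete PGs and the recorded crash can be investigated further from the same toolbox session. A sketch of plausible follow-up commands; the PG ID and pool name are taken from the health output above, while `<crash-id>` is a placeholder for the ID that `ceph crash ls` prints:

```sh
# Query one of the incomplete PGs reported above (PG 2.3f is from
# the health output) to see which OSDs it is waiting on.
ceph pg 2.3f query

# Check the pool's current min_size before weighing the hinted
# workaround of lowering it.
ceph osd pool get ocs-storagecluster-cephblockpool min_size

# Inspect the crash recorded under RECENT_CRASH; crash IDs are
# cluster-specific, so <crash-id> is a placeholder.
ceph crash ls
ceph crash info <crash-id>
```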
Output of `ceph osd df`:
sh-4.4$ ceph osd df
| ID | CLASS | WEIGHT | REWEIGHT | SIZE | RAW USE | DATA | OMAP | META | AVAIL | %USE | VAR | PGS | STATUS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15 | ssd | 0.50000 | 1.00000 | 512 GiB | 302 GiB | 301 GiB | 128 KiB | 1.0 GiB | 210 GiB | 59.04 | 1.09 | 108 | up |
| 18 | ssd | 0.50000 | 1.00000 | 512 GiB | 275 GiB | 273 GiB | 161 KiB | 1.6 GiB | 237 GiB | 53.71 | 0.99 | 101 | up |
| 1 | ssd | 0.50000 | 1.00000 | 512 GiB | 247 GiB | 246 GiB | 0 B | 916 MiB | 265 GiB | 48.15 | 0.89 | 91 | up |
| 17 | ssd | 0.50000 | 1.00000 | 512 GiB | 298 GiB | 297 GiB | 0 B | 1.0 GiB | 214 GiB | 58.15 | 1.07 | 100 | up |
| 4 | ssd | 0.50000 | 0 | 0 B | 0 B | 0 B | 0 B | 0 B | 0 B | 0 | 0 | 0 | up |
| 6 | ssd | 0.50000 | 1.00000 | 512 GiB | 225 GiB | 224 GiB | 116 KiB | 1.5 GiB | 287 GiB | 43.94 | 0.81 | 84 | up |
| 13 | ssd | 0.50000 | 1.00000 | 512 GiB | 199 GiB | 198 GiB | 247 KiB | 995 MiB | 313 GiB | 38.79 | 0.71 | 80 | up |
| 12 | ssd | 0.50000 | 1.00000 | 512 GiB | 313 GiB | 312 GiB | 0 B | 996 MiB | 199 GiB | 61.12 | 1.12 | 103 | up |
| 7 | ssd | 0.50000 | 1.00000 | 512 GiB | 244 GiB | 243 GiB | 127 KiB | 1.4 GiB | 268 GiB | 47.71 | 0.88 | 94 | up |
| 11 | ssd | 0.50000 | 1.00000 | 512 GiB | 263 GiB | 262 GiB | 36 KiB | 984 MiB | 249 GiB | 51.40 | 0.95 | 98 | up |
| 20 | ssd | 0.50000 | 1.00000 | 512 GiB | 243 GiB | 242 GiB | 77 KiB | 1.6 GiB | 269 GiB | 47.56 | 0.87 | 87 | up |
| 9 | ssd | 0.50000 | 1.00000 | 512 GiB | 288 GiB | 287 GiB | 0 B | 906 MiB | 224 GiB | 56.22 | 1.03 | 97 | up |
| 3 | ssd | 0.50000 | 1.00000 | 512 GiB | 261 GiB | 260 GiB | 191 KiB | 550 MiB | 251 GiB | 50.88 | 0.94 | 100 | up |
| 2 | ssd | 0.50000 | 1.00000 | 512 GiB | 288 GiB | 287 GiB | 0 B | 1.1 GiB | 224 GiB | 56.17 | 1.03 | 104 | up |
| 5 | ssd | 0.50000 | 1.00000 | 512 GiB | 269 GiB | 268 GiB | 0 B | 828 MiB | 243 GiB | 52.46 | 0.96 | 87 | up |
| 8 | ssd | 0.50000 | 1.00000 | 512 GiB | 292 GiB | 291 GiB | 276 KiB | 1.4 GiB | 220 GiB | 57.06 | 1.05 | 108 | up |
| 10 | ssd | 0.50000 | 1.00000 | 512 GiB | 284 GiB | 283 GiB | 14 MiB | 972 MiB | 228 GiB | 55.42 | 1.02 | 103 | up |
| 14 | ssd | 0.50000 | 1.00000 | 512 GiB | 326 GiB | 325 GiB | 174 KiB | 1.1 GiB | 186 GiB | 63.66 | 1.17 | 116 | up |
| 0 | ssd | 0.50000 | 1.00000 | 512 GiB | 331 GiB | 329 GiB | 205 KiB | 1.3 GiB | 181 GiB | 64.58 | 1.19 | 117 | up |
| 16 | ssd | 0.50000 | 1.00000 | 512 GiB | 324 GiB | 323 GiB | 44 KiB | 1.1 GiB | 188 GiB | 63.24 | 1.16 | 110 | up |
| 19 | ssd | 0.50000 | 1.00000 | 512 GiB | 299 GiB | 298 GiB | 147 KiB | 1.4 GiB | 213 GiB | 58.41 | 1.07 | 113 | up |
| TOTAL | | | | 10 TiB | 5.4 TiB | 5.4 TiB | 16 MiB | 23 GiB | 4.6 TiB | 54.38 | | | |

MIN/MAX VAR: 0.71/1.19 STDDEV: 6.66
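Since the RECENT_CRASH entry names the pod rook-ceph-osd-15-75875f74b4-mcl2f, the assert backtrace should also appear in that OSD's container log. A sketch of retrieving it, assuming the pod still exists and Rook's standard `app=rook-ceph-osd` label:

```sh
# Log of the container instance that hit the assert; --previous
# shows the terminated container's log after the restart.
oc -n openshift-storage logs rook-ceph-osd-15-75875f74b4-mcl2f --previous

# Confirm restart counts across all OSD pods (assumes Rook's
# standard app=rook-ceph-osd label).
oc -n openshift-storage get pods -l app=rook-ceph-osd
```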
*** Bug 2187580 has been marked as a duplicate of this bug. ***