Bug 2174612 - [GSS] OSDs are crashing due to ceph_assert(r == 0)
Summary: [GSS] OSDs are crashing due to ceph_assert(r == 0)
Keywords:
Status: NEW
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ceph
Version: 4.10
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Michael J. Kidd
QA Contact: Elad
URL:
Whiteboard:
Duplicates: 2187580
Depends On:
Blocks:
 
Reported: 2023-03-02 00:56 UTC by kelwhite
Modified: 2023-08-09 16:37 UTC
CC List: 22 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:
Flags: linuxkidd: needinfo-


Attachments:

Comment 2 kelwhite 2023-03-07 14:16:15 UTC
Any updates on this BZ?

Comment 73 Levy Sant'Anna 2023-04-18 18:37:02 UTC
ceph health detail

sh-4.4$ ceph health detail
HEALTH_WARN 1 clients failing to respond to capability release; 1 MDSs report slow metadata IOs; 1 MDSs report slow requests; Reduced data availability: 10 pgs inactive, 4 pgs incomplete; 1 daemons have recently crashed; 56 slow ops, oldest one blocked for 15249 sec, daemons [osd.13,osd.3,osd.5,osd.9] have slow ops.
[WRN] MDS_CLIENT_LATE_RELEASE: 1 clients failing to respond to capability release
    mds.ocs-storagecluster-cephfilesystem-a(mds.0): Client ip-10-40-9-207:csi-cephfs-node failing to respond to capability release client_id: 6560476
[WRN] MDS_SLOW_METADATA_IO: 1 MDSs report slow metadata IOs
    mds.ocs-storagecluster-cephfilesystem-a(mds.0): 1 slow metadata IOs are blocked > 30 secs, oldest blocked for 7513 secs
[WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
    mds.ocs-storagecluster-cephfilesystem-a(mds.0): 3 slow requests are blocked > 30 secs
[WRN] PG_AVAILABILITY: Reduced data availability: 10 pgs inactive, 4 pgs incomplete
    pg 2.1c is stuck inactive for 4h, current state unknown, last acting []
    pg 2.24 is stuck inactive for 4h, current state unknown, last acting []
    pg 2.27 is stuck inactive for 4h, current state unknown, last acting []
    pg 2.3f is incomplete, acting [3,19,12] (reducing pool ocs-storagecluster-cephblockpool min_size from 2 may help; search ceph.com/docs for 'incomplete')
    pg 2.b8 is incomplete, acting [0,3,12] (reducing pool ocs-storagecluster-cephblockpool min_size from 2 may help; search ceph.com/docs for 'incomplete')
    pg 2.e9 is stuck inactive for 4h, current state unknown, last acting []
    pg 2.189 is stuck inactive for 4h, current state unknown, last acting []
    pg 2.1b2 is incomplete, acting [19,3,18] (reducing pool ocs-storagecluster-cephblockpool min_size from 2 may help; search ceph.com/docs for 'incomplete')
    pg 2.1c7 is incomplete, acting [13,0,3] (reducing pool ocs-storagecluster-cephblockpool min_size from 2 may help; search ceph.com/docs for 'incomplete')
    pg 4.30 is stuck inactive for 4h, current state unknown, last acting []
[WRN] RECENT_CRASH: 1 daemons have recently crashed
    client.admin crashed on host rook-ceph-osd-15-75875f74b4-mcl2f at 2023-04-18T12:22:56.598997Z
[WRN] SLOW_OPS: 56 slow ops, oldest one blocked for 15249 sec, daemons [osd.13,osd.3,osd.5,osd.9] have slow ops.
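
For anyone triaging from this output: the inactive/incomplete PGs, the pool's min_size, and the recorded crash can be inspected from the rook-ceph toolbox with standard ceph commands. A minimal sketch only; the PG and pool names are taken from the output above, and <crash-id> is a placeholder for an ID returned by "ceph crash ls":

    ceph pg dump_stuck inactive                                   # list PGs stuck inactive/unknown
    ceph pg 2.3f query                                            # peering details for one incomplete PG
    ceph osd pool get ocs-storagecluster-cephblockpool min_size   # current min_size (warning above suggests lowering from 2)
    ceph crash ls                                                 # list recorded daemon crashes
    ceph crash info <crash-id>                                    # backtrace, e.g. for the ceph_assert(r == 0) crash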

Comment 74 Levy Sant'Anna 2023-04-18 18:39:05 UTC
ceph osd df

sh-4.4$ ceph osd df
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
15    ssd  0.50000   1.00000  512 GiB  302 GiB  301 GiB  128 KiB  1.0 GiB  210 GiB  59.04  1.09  108      up
18    ssd  0.50000   1.00000  512 GiB  275 GiB  273 GiB  161 KiB  1.6 GiB  237 GiB  53.71  0.99  101      up
 1    ssd  0.50000   1.00000  512 GiB  247 GiB  246 GiB      0 B  916 MiB  265 GiB  48.15  0.89   91      up
17    ssd  0.50000   1.00000  512 GiB  298 GiB  297 GiB      0 B  1.0 GiB  214 GiB  58.15  1.07  100      up
 4    ssd  0.50000         0      0 B      0 B      0 B      0 B      0 B      0 B      0     0    0      up
 6    ssd  0.50000   1.00000  512 GiB  225 GiB  224 GiB  116 KiB  1.5 GiB  287 GiB  43.94  0.81   84      up
13    ssd  0.50000   1.00000  512 GiB  199 GiB  198 GiB  247 KiB  995 MiB  313 GiB  38.79  0.71   80      up
12    ssd  0.50000   1.00000  512 GiB  313 GiB  312 GiB      0 B  996 MiB  199 GiB  61.12  1.12  103      up
 7    ssd  0.50000   1.00000  512 GiB  244 GiB  243 GiB  127 KiB  1.4 GiB  268 GiB  47.71  0.88   94      up
11    ssd  0.50000   1.00000  512 GiB  263 GiB  262 GiB   36 KiB  984 MiB  249 GiB  51.40  0.95   98      up
20    ssd  0.50000   1.00000  512 GiB  243 GiB  242 GiB   77 KiB  1.6 GiB  269 GiB  47.56  0.87   87      up
 9    ssd  0.50000   1.00000  512 GiB  288 GiB  287 GiB      0 B  906 MiB  224 GiB  56.22  1.03   97      up
 3    ssd  0.50000   1.00000  512 GiB  261 GiB  260 GiB  191 KiB  550 MiB  251 GiB  50.88  0.94  100      up
 2    ssd  0.50000   1.00000  512 GiB  288 GiB  287 GiB      0 B  1.1 GiB  224 GiB  56.17  1.03  104      up
 5    ssd  0.50000   1.00000  512 GiB  269 GiB  268 GiB      0 B  828 MiB  243 GiB  52.46  0.96   87      up
 8    ssd  0.50000   1.00000  512 GiB  292 GiB  291 GiB  276 KiB  1.4 GiB  220 GiB  57.06  1.05  108      up
10    ssd  0.50000   1.00000  512 GiB  284 GiB  283 GiB   14 MiB  972 MiB  228 GiB  55.42  1.02  103      up
14    ssd  0.50000   1.00000  512 GiB  326 GiB  325 GiB  174 KiB  1.1 GiB  186 GiB  63.66  1.17  116      up
 0    ssd  0.50000   1.00000  512 GiB  331 GiB  329 GiB  205 KiB  1.3 GiB  181 GiB  64.58  1.19  117      up
16    ssd  0.50000   1.00000  512 GiB  324 GiB  323 GiB   44 KiB  1.1 GiB  188 GiB  63.24  1.16  110      up
19    ssd  0.50000   1.00000  512 GiB  299 GiB  298 GiB  147 KiB  1.4 GiB  213 GiB  58.41  1.07  113      up
                       TOTAL   10 TiB  5.4 TiB  5.4 TiB   16 MiB   23 GiB  4.6 TiB  54.38
MIN/MAX VAR: 0.71/1.19  STDDEV: 6.66
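
For reference when reading the table: the VAR column is each OSD's %USE divided by the cluster-average %USE (54.38 here). Worked from the rows above:

    osd.0:  64.58 / 54.38 ≈ 1.19   (fullest OSD, about 19% above average)
    osd.13: 38.79 / 54.38 ≈ 0.71   (emptiest OSD, about 29% below average)

which matches the MIN/MAX VAR: 0.71/1.19 line.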

Comment 92 lema 2023-05-04 06:47:13 UTC
*** Bug 2187580 has been marked as a duplicate of this bug. ***

