Bug 1832993 - RHCS 4.0 : OSDs flapping issue as workload is applied on the cluster
Summary: RHCS 4.0 : OSDs flapping issue as workload is applied on the cluster
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 4.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: z1
Target Release: 4.1
Assignee: Neha Ojha
QA Contact: Manohar Murthy
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-05-07 15:20 UTC by karan singh
Modified: 2020-05-10 10:00 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-10 10:00:43 UTC
Embargoed:


Attachments (Terms of Use)
Logs from one of the OSD (8.31 MB, text/plain)
2020-05-07 15:22 UTC, karan singh
no flags Details
COSBench Workload file (23.80 KB, application/xml)
2020-05-07 15:26 UTC, karan singh
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Ceph Project Bug Tracker 44532 0 None None None 2020-05-07 16:56:28 UTC
Github ceph ceph pull 33910 0 None closed osd/PeeringState: do not trim pg log past last_update_ondisk 2021-01-26 10:32:08 UTC
Github ceph ceph pull 34957 0 None closed nautilus: osd/PeeringState: do not trim pg log past last_update_ondisk 2021-01-26 10:32:51 UTC

Description karan singh 2020-05-07 15:20:46 UTC
Description of problem:

I am performing a benchmarking activity in which I run workloads through COSBench. As soon as I apply the workload, the OSDs in the Ceph cluster start to flap.


Version-Release number of selected component (if applicable):

Image: registry.redhat.io/rhceph/rhceph-4-rhel8:latest
ceph version 14.2.4-125.el8cp (db63624068590e593c47150c7574d08c1ec0d3e4) nautilus (stable)

How reproducible:

Always, whenever I apply load.

Steps to Reproduce:
1. RHCS 4.0 containerized deployment
2. 12 x RGW instances
3. Run COSBench workload (file attached)

Actual results:
- As soon as I run the COSBench workload, OSDs start to flap and the entire cluster becomes unstable.


Expected results:

- Workload should complete without performance degradation
- OSDs should not flap


Additional info:

Ceph.conf in use
=================

[client.rgw.rgw-5.rgw0]
host = rgw-5
keyring = /var/lib/ceph/radosgw/ceph-rgw.rgw-5.rgw0/keyring
log file = /var/log/ceph/ceph-rgw-rgw-5.rgw0.log
rgw frontends = beast endpoint=172.18.7.55:8080
rgw thread pool size = 1024

[client.rgw.rgw-5.rgw1]
host = rgw-5
keyring = /var/lib/ceph/radosgw/ceph-rgw.rgw-5.rgw1/keyring
log file = /var/log/ceph/ceph-rgw-rgw-5.rgw1.log
rgw frontends = beast endpoint=172.18.7.55:8081
rgw thread pool size = 1024


[global]
bluefs_buffered_io = False
cluster network = 172.18.7.0/24
fsid = ebe0aa4b-4fb5-4c68-84ab-cbf1118937a2
log_file = /dev/null
max_open_files = 500000
mon host = [v2:172.18.7.51:3300,v1:172.18.7.51:6789],[v2:172.18.7.52:3300,v1:172.18.7.52:6789],[v2:172.18.7.53:3300,v1:172.18.7.53:6789]
mon_allow_pool_delete = True
mon_max_pg_per_osd = 1000
ms_dispatch_throttle_bytes = 1048576000
objecter_inflight_op_bytes = 1048576000
objecter_inflight_ops = 5120
osd_enable_op_tracker = False
osd_max_pg_log_entries = 10
osd_min_pg_log_entries = 10
osd_pg_log_dups_tracked = 10
osd_pg_log_trim_min = 10
public network = 172.18.7.0/24
rgw_list_buckets_max_chunk = 999999
osd_op_thread_timeout = 900
osd_op_thread_suicide_timeout = 2000
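
For completeness, the running value of any of these options can be confirmed per OSD via the admin socket. This is only a rough sketch: osd.0 is an example id, and in this containerized deployment the command has to be run inside (or exec'd into) the corresponding OSD container:

[root@rgw-4 /]# ceph daemon osd.0 config get osd_min_pg_log_entries
[root@rgw-4 /]# ceph daemon osd.0 config get osd_max_pg_log_entries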

Comment 1 karan singh 2020-05-07 15:22:43 UTC
Created attachment 1686216 [details]
Logs from one of the OSD

Attached are the logs from one of the OSDs.

Comment 2 karan singh 2020-05-07 15:24:14 UTC
Here are some Ceph command outputs:


[root@rgw-4 /]# ceph -s
  cluster:
    id:     ebe0aa4b-4fb5-4c68-84ab-cbf1118937a2
    health: HEALTH_WARN
            1 osds down
            Reduced data availability: 3 pgs inactive, 3 pgs incomplete
            Degraded data redundancy: 448392/146731677 objects degraded (0.306%), 345 pgs degraded
            24 slow ops, oldest one blocked for 4014 sec, daemons [mon,rgw-1,mon,rgw-2,mon,rgw-3] have slow ops.

  services:
    mon: 3 daemons, quorum rgw-1,rgw-2,rgw-3 (age 47m)
    mgr: rgw-5(active, since 38m), standbys: rgw-4, rgw-6
    osd: 318 osds: 317 up (since 3s), 318 in (since 47h); 21 remapped pgs
    rgw: 12 daemons active (rgw-1.rgw0, rgw-1.rgw1, rgw-2.rgw0, rgw-2.rgw1, rgw-3.rgw0, rgw-3.rgw1, rgw-4.rgw0, rgw-4.rgw1, rgw-5.rgw0, rgw-5.rgw1, rgw-6.rgw0, rgw-6.rgw1)

  task status:

  data:
    pools:   7 pools, 21760 pgs
    objects: 24.46M objects, 1.4 TiB
    usage:   260 TiB used, 4.5 PiB / 4.8 PiB avail
    pgs:     0.142% pgs not active
             448392/146731677 objects degraded (0.306%)
             21395 active+clean
             269   active+undersized+degraded+remapped+backfill_wait
             41    active+recovery_wait+undersized+degraded+remapped
             28    activating+undersized+degraded+remapped
             14    active+undersized
             4     active+undersized+degraded
             3     remapped+incomplete
             3     active+undersized+remapped+backfill_wait
             2     active+undersized+degraded+remapped+backfilling
             1     active+recovering+undersized+degraded+remapped

  io:
    client:   1.7 KiB/s rd, 1 op/s rd, 0 op/s wr
    recovery: 9.8 KiB/s, 0 objects/s

[root@rgw-4 /]#
[root@rgw-4 /]# ceph df
RAW STORAGE:
    CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
    hdd       4.8 PiB     4.5 PiB     260 TiB      260 TiB          5.34
    TOTAL     4.8 PiB     4.5 PiB     260 TiB      260 TiB          5.34

POOLS:
    POOL                          ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
    .rgw.root                     10     1.2 KiB           4     768 KiB         0       1.4 PiB
    default.rgw.control           11         0 B           8         0 B         0       1.5 PiB
    default.rgw.meta              12     302 KiB       1.25k     235 MiB         0       1.4 PiB
    default.rgw.log               13         0 B         208         0 B         0       1.4 PiB
    default.rgw.buckets.index     14         0 B         626         0 B         0       1.4 PiB
    default.rgw.data.root         15         0 B           0         0 B         0       1.4 PiB
    default.rgw.buckets.data      16     1.5 TiB      24.45M     8.7 TiB      0.20       2.9 PiB
[root@rgw-4 /]#
[root@rgw-4 /]# ceph osd tree
ID  CLASS WEIGHT     TYPE NAME       STATUS REWEIGHT PRI-AFF
 -1       4878.71826 root default
 -7        813.11523     host rgw-1
  0   hdd   15.34180         osd.0       up  1.00000 1.00000
  1   hdd   15.34180         osd.1       up  1.00000 1.00000
  2   hdd   15.34180         osd.2       up  1.00000 1.00000
  3   hdd   15.34180         osd.3       up  1.00000 1.00000
  4   hdd   15.34180         osd.4       up  1.00000 1.00000
  5   hdd   15.34180         osd.5       up  1.00000 1.00000
  6   hdd   15.34180         osd.6       up  1.00000 1.00000
  7   hdd   15.34180         osd.7       up  1.00000 1.00000
  8   hdd   15.34180         osd.8       up  1.00000 1.00000
  9   hdd   15.34180         osd.9       up  1.00000 1.00000
 10   hdd   15.34180         osd.10      up  1.00000 1.00000
 11   hdd   15.34180         osd.11      up  1.00000 1.00000
 14   hdd   15.34180         osd.14      up  1.00000 1.00000
 15   hdd   15.34180         osd.15      up  1.00000 1.00000
 18   hdd   15.34180         osd.18      up  1.00000 1.00000
 19   hdd   15.34180         osd.19      up  1.00000 1.00000
 20   hdd   15.34180         osd.20      up  1.00000 1.00000
 23   hdd   15.34180         osd.23      up  1.00000 1.00000
 24   hdd   15.34180         osd.24      up  1.00000 1.00000
 26   hdd   15.34180         osd.26      up  1.00000 1.00000
 28   hdd   15.34180         osd.28      up  1.00000 1.00000
 29   hdd   15.34180         osd.29      up  1.00000 1.00000
 32   hdd   15.34180         osd.32      up  1.00000 1.00000
 33   hdd   15.34180         osd.33      up  1.00000 1.00000
 35   hdd   15.34180         osd.35      up  1.00000 1.00000
 37   hdd   15.34180         osd.37      up  1.00000 1.00000
 38   hdd   15.34180         osd.38      up  1.00000 1.00000
 41   hdd   15.34180         osd.41      up  1.00000 1.00000
 42   hdd   15.34180         osd.42      up  1.00000 1.00000
 45   hdd   15.34180         osd.45      up  1.00000 1.00000
 46   hdd   15.34180         osd.46      up  1.00000 1.00000
 47   hdd   15.34180         osd.47      up  1.00000 1.00000
 50   hdd   15.34180         osd.50      up  1.00000 1.00000
 51   hdd   15.34180         osd.51      up  1.00000 1.00000
 54   hdd   15.34180         osd.54      up  1.00000 1.00000
 55   hdd   15.34180         osd.55      up  1.00000 1.00000
 58   hdd   15.34180         osd.58      up  1.00000 1.00000
 59   hdd   15.34180         osd.59      up  1.00000 1.00000
 61   hdd   15.34180         osd.61      up  1.00000 1.00000
 63   hdd   15.34180         osd.63      up  1.00000 1.00000
 65   hdd   15.34180         osd.65      up  1.00000 1.00000
 67   hdd   15.34180         osd.67      up  1.00000 1.00000
 68   hdd   15.34180         osd.68      up  1.00000 1.00000
 71   hdd   15.34180         osd.71      up  1.00000 1.00000
 72   hdd   15.34180         osd.72      up  1.00000 1.00000
 75   hdd   15.34180         osd.75      up  1.00000 1.00000
 76   hdd   15.34180         osd.76      up  1.00000 1.00000
 79   hdd   15.34180         osd.79      up  1.00000 1.00000
 80   hdd   15.34180         osd.80      up  1.00000 1.00000
 83   hdd   15.34180         osd.83      up  1.00000 1.00000
 84   hdd   15.34180         osd.84      up  1.00000 1.00000
 87   hdd   15.34180         osd.87      up  1.00000 1.00000
 88   hdd   15.34180         osd.88      up  1.00000 1.00000
 -5        813.11523     host rgw-2
 13   hdd   15.34180         osd.13      up  1.00000 1.00000
 17   hdd   15.34180         osd.17      up  1.00000 1.00000
 22   hdd   15.34180         osd.22      up  1.00000 1.00000
 27   hdd   15.34180         osd.27      up  1.00000 1.00000
 31   hdd   15.34180         osd.31      up  1.00000 1.00000
 36   hdd   15.34180         osd.36      up  1.00000 1.00000
 40   hdd   15.34180         osd.40      up  1.00000 1.00000
 44   hdd   15.34180         osd.44      up  1.00000 1.00000
 49   hdd   15.34180         osd.49      up  1.00000 1.00000
 53   hdd   15.34180         osd.53      up  1.00000 1.00000
 57   hdd   15.34180         osd.57      up  1.00000 1.00000
 62   hdd   15.34180         osd.62      up  1.00000 1.00000
 66   hdd   15.34180         osd.66      up  1.00000 1.00000
 70   hdd   15.34180         osd.70      up  1.00000 1.00000
 74   hdd   15.34180         osd.74      up  1.00000 1.00000
 78   hdd   15.34180         osd.78      up  1.00000 1.00000
 82   hdd   15.34180         osd.82      up  1.00000 1.00000
 86   hdd   15.34180         osd.86      up  1.00000 1.00000
 90   hdd   15.34180         osd.90      up  1.00000 1.00000
 92   hdd   15.34180         osd.92      up  1.00000 1.00000
 94   hdd   15.34180         osd.94      up  1.00000 1.00000
 96   hdd   15.34180         osd.96      up  1.00000 1.00000
 98   hdd   15.34180         osd.98      up  1.00000 1.00000
100   hdd   15.34180         osd.100     up  1.00000 1.00000
102   hdd   15.34180         osd.102     up  1.00000 1.00000
104   hdd   15.34180         osd.104     up  1.00000 1.00000
106   hdd   15.34180         osd.106     up  1.00000 1.00000
108   hdd   15.34180         osd.108     up  1.00000 1.00000
110   hdd   15.34180         osd.110     up  1.00000 1.00000
112   hdd   15.34180         osd.112     up  1.00000 1.00000
114   hdd   15.34180         osd.114     up  1.00000 1.00000
116   hdd   15.34180         osd.116     up  1.00000 1.00000
118   hdd   15.34180         osd.118     up  1.00000 1.00000
120   hdd   15.34180         osd.120     up  1.00000 1.00000
122   hdd   15.34180         osd.122     up  1.00000 1.00000
124   hdd   15.34180         osd.124     up  1.00000 1.00000
126   hdd   15.34180         osd.126     up  1.00000 1.00000
128   hdd   15.34180         osd.128     up  1.00000 1.00000
130   hdd   15.34180         osd.130     up  1.00000 1.00000
132   hdd   15.34180         osd.132     up  1.00000 1.00000
134   hdd   15.34180         osd.134     up  1.00000 1.00000
136   hdd   15.34180         osd.136     up  1.00000 1.00000
138   hdd   15.34180         osd.138     up  1.00000 1.00000
140   hdd   15.34180         osd.140     up  1.00000 1.00000
142   hdd   15.34180         osd.142     up  1.00000 1.00000
144   hdd   15.34180         osd.144     up  1.00000 1.00000
146   hdd   15.34180         osd.146     up  1.00000 1.00000
148   hdd   15.34180         osd.148     up  1.00000 1.00000
150   hdd   15.34180         osd.150     up  1.00000 1.00000
152   hdd   15.34180         osd.152     up  1.00000 1.00000
154   hdd   15.34180         osd.154     up  1.00000 1.00000
156   hdd   15.34180         osd.156     up  1.00000 1.00000
158   hdd   15.34180         osd.158     up  1.00000 1.00000
 -3        813.11523     host rgw-3
 43   hdd   15.34180         osd.43      up  1.00000 1.00000
 52   hdd   15.34180         osd.52      up  1.00000 1.00000
107   hdd   15.34180         osd.107     up  1.00000 1.00000
109   hdd   15.34180         osd.109     up  1.00000 1.00000
111   hdd   15.34180         osd.111     up  1.00000 1.00000
113   hdd   15.34180         osd.113     up  1.00000 1.00000
115   hdd   15.34180         osd.115     up  1.00000 1.00000
117   hdd   15.34180         osd.117     up  1.00000 1.00000
119   hdd   15.34180         osd.119     up  1.00000 1.00000
121   hdd   15.34180         osd.121     up  1.00000 1.00000
123   hdd   15.34180         osd.123     up  1.00000 1.00000
125   hdd   15.34180         osd.125     up  1.00000 1.00000
127   hdd   15.34180         osd.127     up  1.00000 1.00000
129   hdd   15.34180         osd.129     up  1.00000 1.00000
131   hdd   15.34180         osd.131     up  1.00000 1.00000
133   hdd   15.34180         osd.133     up  1.00000 1.00000
135   hdd   15.34180         osd.135     up  1.00000 1.00000
137   hdd   15.34180         osd.137     up  1.00000 1.00000
139   hdd   15.34180         osd.139     up  1.00000 1.00000
141   hdd   15.34180         osd.141     up  1.00000 1.00000
143   hdd   15.34180         osd.143     up  1.00000 1.00000
145   hdd   15.34180         osd.145     up  1.00000 1.00000
147   hdd   15.34180         osd.147     up  1.00000 1.00000
149   hdd   15.34180         osd.149     up  1.00000 1.00000
151   hdd   15.34180         osd.151     up  1.00000 1.00000
153   hdd   15.34180         osd.153     up  1.00000 1.00000
155   hdd   15.34180         osd.155     up  1.00000 1.00000
157   hdd   15.34180         osd.157     up  1.00000 1.00000
187   hdd   15.34180         osd.187     up  1.00000 1.00000
188   hdd   15.34180         osd.188     up  1.00000 1.00000
190   hdd   15.34180         osd.190     up  1.00000 1.00000
191   hdd   15.34180         osd.191     up  1.00000 1.00000
192   hdd   15.34180         osd.192     up  1.00000 1.00000
193   hdd   15.34180         osd.193     up  1.00000 1.00000
194   hdd   15.34180         osd.194     up  1.00000 1.00000
196   hdd   15.34180         osd.196     up  1.00000 1.00000
197   hdd   15.34180         osd.197     up  1.00000 1.00000
198   hdd   15.34180         osd.198     up  1.00000 1.00000
199   hdd   15.34180         osd.199     up  1.00000 1.00000
200   hdd   15.34180         osd.200     up  1.00000 1.00000
202   hdd   15.34180         osd.202     up  1.00000 1.00000
203   hdd   15.34180         osd.203     up  1.00000 1.00000
204   hdd   15.34180         osd.204     up  1.00000 1.00000
205   hdd   15.34180         osd.205     up  1.00000 1.00000
206   hdd   15.34180         osd.206     up  1.00000 1.00000
208   hdd   15.34180         osd.208     up  1.00000 1.00000
209   hdd   15.34180         osd.209     up  1.00000 1.00000
210   hdd   15.34180         osd.210     up  1.00000 1.00000
211   hdd   15.34180         osd.211     up  1.00000 1.00000
212   hdd   15.34180         osd.212     up  1.00000 1.00000
214   hdd   15.34180         osd.214     up  1.00000 1.00000
215   hdd   15.34180         osd.215     up  1.00000 1.00000
216   hdd   15.34180         osd.216     up  1.00000 1.00000
 -9        813.11523     host rgw-4
265   hdd   15.34180         osd.265     up  1.00000 1.00000
268   hdd   15.34180         osd.268     up  1.00000 1.00000
271   hdd   15.34180         osd.271     up  1.00000 1.00000
273   hdd   15.34180         osd.273     up  1.00000 1.00000
276   hdd   15.34180         osd.276     up  1.00000 1.00000
279   hdd   15.34180         osd.279     up  1.00000 1.00000
281   hdd   15.34180         osd.281     up  1.00000 1.00000
284   hdd   15.34180         osd.284     up  1.00000 1.00000
287   hdd   15.34180         osd.287     up  1.00000 1.00000
290   hdd   15.34180         osd.290     up  1.00000 1.00000
293   hdd   15.34180         osd.293     up  1.00000 1.00000
296   hdd   15.34180         osd.296     up  1.00000 1.00000
299   hdd   15.34180         osd.299     up  1.00000 1.00000
301   hdd   15.34180         osd.301     up  1.00000 1.00000
304   hdd   15.34180         osd.304     up  1.00000 1.00000
307   hdd   15.34180         osd.307     up  1.00000 1.00000
310   hdd   15.34180         osd.310     up  1.00000 1.00000
313   hdd   15.34180         osd.313     up  1.00000 1.00000
316   hdd   15.34180         osd.316     up  1.00000 1.00000
319   hdd   15.34180         osd.319     up  1.00000 1.00000
321   hdd   15.34180         osd.321     up  1.00000 1.00000
324   hdd   15.34180         osd.324     up  1.00000 1.00000
327   hdd   15.34180         osd.327     up  1.00000 1.00000
329   hdd   15.34180         osd.329     up  1.00000 1.00000
332   hdd   15.34180         osd.332     up  1.00000 1.00000
335   hdd   15.34180         osd.335     up  1.00000 1.00000
338   hdd   15.34180         osd.338     up  1.00000 1.00000
341   hdd   15.34180         osd.341     up  1.00000 1.00000
344   hdd   15.34180         osd.344     up  1.00000 1.00000
347   hdd   15.34180         osd.347     up  1.00000 1.00000
349   hdd   15.34180         osd.349     up  1.00000 1.00000
352   hdd   15.34180         osd.352     up  1.00000 1.00000
354   hdd   15.34180         osd.354     up  1.00000 1.00000
357   hdd   15.34180         osd.357     up  1.00000 1.00000
360   hdd   15.34180         osd.360     up  1.00000 1.00000
363   hdd   15.34180         osd.363     up  1.00000 1.00000
366   hdd   15.34180         osd.366     up  1.00000 1.00000
369   hdd   15.34180         osd.369     up  1.00000 1.00000
372   hdd   15.34180         osd.372     up  1.00000 1.00000
375   hdd   15.34180         osd.375     up  1.00000 1.00000
378   hdd   15.34180         osd.378     up  1.00000 1.00000
380   hdd   15.34180         osd.380     up  1.00000 1.00000
382   hdd   15.34180         osd.382     up  1.00000 1.00000
385   hdd   15.34180         osd.385     up  1.00000 1.00000
388   hdd   15.34180         osd.388     up  1.00000 1.00000
391   hdd   15.34180         osd.391     up  1.00000 1.00000
394   hdd   15.34180         osd.394     up  1.00000 1.00000
397   hdd   15.34180         osd.397     up  1.00000 1.00000
399   hdd   15.34180         osd.399   down  1.00000 1.00000
402   hdd   15.34180         osd.402     up  1.00000 1.00000
404   hdd   15.34180         osd.404     up  1.00000 1.00000
407   hdd   15.34180         osd.407     up  1.00000 1.00000
410   hdd   15.34180         osd.410     up  1.00000 1.00000
-13        813.14203     host rgw-5
189   hdd   15.34279         osd.189     up  1.00000 1.00000
195   hdd   15.34279         osd.195     up  1.00000 1.00000
201   hdd   15.34180         osd.201     up  1.00000 1.00000
207   hdd   15.34180         osd.207     up  1.00000 1.00000
213   hdd   15.34180         osd.213     up  1.00000 1.00000
217   hdd   15.34180         osd.217     up  1.00000 1.00000
218   hdd   15.34279         osd.218     up  1.00000 1.00000
219   hdd   15.34180         osd.219     up  1.00000 1.00000
220   hdd   15.34279         osd.220     up  1.00000 1.00000
221   hdd   15.34180         osd.221     up  1.00000 1.00000
222   hdd   15.34279         osd.222     up  1.00000 1.00000
223   hdd   15.34180         osd.223     up  1.00000 1.00000
224   hdd   15.34279         osd.224     up  1.00000 1.00000
225   hdd   15.34180         osd.225     up  1.00000 1.00000
226   hdd   15.34279         osd.226     up  1.00000 1.00000
227   hdd   15.34180         osd.227     up  1.00000 1.00000
228   hdd   15.34279         osd.228     up  1.00000 1.00000
229   hdd   15.34180         osd.229     up  1.00000 1.00000
230   hdd   15.34279         osd.230     up  1.00000 1.00000
231   hdd   15.34180         osd.231     up  1.00000 1.00000
232   hdd   15.34279         osd.232     up  1.00000 1.00000
233   hdd   15.34180         osd.233     up  1.00000 1.00000
234   hdd   15.34180         osd.234     up  1.00000 1.00000
235   hdd   15.34279         osd.235     up  1.00000 1.00000
236   hdd   15.34180         osd.236     up  1.00000 1.00000
237   hdd   15.34279         osd.237     up  1.00000 1.00000
238   hdd   15.34180         osd.238     up  1.00000 1.00000
239   hdd   15.34279         osd.239     up  1.00000 1.00000
240   hdd   15.34180         osd.240     up  1.00000 1.00000
241   hdd   15.34279         osd.241     up  1.00000 1.00000
242   hdd   15.34180         osd.242     up  1.00000 1.00000
243   hdd   15.34279         osd.243     up  1.00000 1.00000
244   hdd   15.34180         osd.244     up  1.00000 1.00000
245   hdd   15.34279         osd.245     up  1.00000 1.00000
246   hdd   15.34180         osd.246     up  1.00000 1.00000
247   hdd   15.34279         osd.247     up  1.00000 1.00000
248   hdd   15.34180         osd.248     up  1.00000 1.00000
249   hdd   15.34279         osd.249     up  1.00000 1.00000
250   hdd   15.34180         osd.250     up  1.00000 1.00000
251   hdd   15.34279         osd.251     up  1.00000 1.00000
252   hdd   15.34180         osd.252     up  1.00000 1.00000
253   hdd   15.34279         osd.253     up  1.00000 1.00000
254   hdd   15.34180         osd.254     up  1.00000 1.00000
255   hdd   15.34279         osd.255     up  1.00000 1.00000
256   hdd   15.34180         osd.256     up  1.00000 1.00000
257   hdd   15.34279         osd.257     up  1.00000 1.00000
258   hdd   15.34180         osd.258     up  1.00000 1.00000
259   hdd   15.34279         osd.259     up  1.00000 1.00000
260   hdd   15.34180         osd.260     up  1.00000 1.00000
261   hdd   15.34279         osd.261     up  1.00000 1.00000
262   hdd   15.34279         osd.262     up  1.00000 1.00000
263   hdd   15.34279         osd.263     up  1.00000 1.00000
264   hdd   15.34279         osd.264     up  1.00000 1.00000
-11        813.11523     host rgw-6
 12   hdd   15.34180         osd.12      up  1.00000 1.00000
 16   hdd   15.34180         osd.16      up  1.00000 1.00000
 21   hdd   15.34180         osd.21      up  1.00000 1.00000
 25   hdd   15.34180         osd.25      up  1.00000 1.00000
 30   hdd   15.34180         osd.30      up  1.00000 1.00000
 34   hdd   15.34180         osd.34      up  1.00000 1.00000
 39   hdd   15.34180         osd.39      up  1.00000 1.00000
 48   hdd   15.34180         osd.48      up  1.00000 1.00000
 56   hdd   15.34180         osd.56      up  1.00000 1.00000
 60   hdd   15.34180         osd.60      up  1.00000 1.00000
 64   hdd   15.34180         osd.64      up  1.00000 1.00000
 69   hdd   15.34180         osd.69      up  1.00000 1.00000
 73   hdd   15.34180         osd.73      up  1.00000 1.00000
 77   hdd   15.34180         osd.77      up  1.00000 1.00000
 81   hdd   15.34180         osd.81      up  1.00000 1.00000
 85   hdd   15.34180         osd.85      up  1.00000 1.00000
 89   hdd   15.34180         osd.89      up  1.00000 1.00000
 91   hdd   15.34180         osd.91      up  1.00000 1.00000
 93   hdd   15.34180         osd.93      up  1.00000 1.00000
 95   hdd   15.34180         osd.95      up  1.00000 1.00000
 97   hdd   15.34180         osd.97      up  1.00000 1.00000
 99   hdd   15.34180         osd.99      up  1.00000 1.00000
101   hdd   15.34180         osd.101     up  1.00000 1.00000
103   hdd   15.34180         osd.103     up  1.00000 1.00000
105   hdd   15.34180         osd.105     up  1.00000 1.00000
159   hdd   15.34180         osd.159     up  1.00000 1.00000
160   hdd   15.34180         osd.160     up  1.00000 1.00000
161   hdd   15.34180         osd.161     up  1.00000 1.00000
162   hdd   15.34180         osd.162     up  1.00000 1.00000
163   hdd   15.34180         osd.163     up  1.00000 1.00000
164   hdd   15.34180         osd.164     up  1.00000 1.00000
165   hdd   15.34180         osd.165     up  1.00000 1.00000
166   hdd   15.34180         osd.166     up  1.00000 1.00000
167   hdd   15.34180         osd.167     up  1.00000 1.00000
168   hdd   15.34180         osd.168     up  1.00000 1.00000
169   hdd   15.34180         osd.169     up  1.00000 1.00000
170   hdd   15.34180         osd.170     up  1.00000 1.00000
171   hdd   15.34180         osd.171     up  1.00000 1.00000
172   hdd   15.34180         osd.172     up  1.00000 1.00000
173   hdd   15.34180         osd.173     up  1.00000 1.00000
174   hdd   15.34180         osd.174     up  1.00000 1.00000
175   hdd   15.34180         osd.175     up  1.00000 1.00000
176   hdd   15.34180         osd.176     up  1.00000 1.00000
177   hdd   15.34180         osd.177     up  1.00000 1.00000
178   hdd   15.34180         osd.178     up  1.00000 1.00000
179   hdd   15.34180         osd.179     up  1.00000 1.00000
180   hdd   15.34180         osd.180     up  1.00000 1.00000
181   hdd   15.34180         osd.181     up  1.00000 1.00000
182   hdd   15.34180         osd.182     up  1.00000 1.00000
183   hdd   15.34180         osd.183     up  1.00000 1.00000
184   hdd   15.34180         osd.184     up  1.00000 1.00000
185   hdd   15.34180         osd.185     up  1.00000 1.00000
186   hdd   15.34180         osd.186     up  1.00000 1.00000
[root@rgw-4 /]#
[root@rgw-4 /]# ceph osd dump | grep -i pool
pool 10 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 256 autoscale_mode warn last_change 12075 lfor 0/0/12064 flags hashpspool stripe_width 0 application rgw
pool 11 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 256 autoscale_mode warn last_change 12075 lfor 0/0/12068 flags hashpspool stripe_width 0 application rgw
pool 12 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 256 autoscale_mode warn last_change 12075 lfor 0/0/12066 flags hashpspool stripe_width 0 application rgw
pool 13 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 256 autoscale_mode warn last_change 12075 lfor 0/0/12068 flags hashpspool stripe_width 0 application rgw
pool 14 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4096 pgp_num 4096 autoscale_mode warn last_change 12172 lfor 0/0/12085 flags hashpspool stripe_width 0 application rgw
pool 15 'default.rgw.data.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 256 autoscale_mode warn last_change 12173 flags hashpspool stripe_width 0 application rgw
pool 16 'default.rgw.buckets.data' erasure size 6 min_size 5 crush_rule 1 object_hash rjenkins pg_num 16384 pgp_num 16384 autoscale_mode warn last_change 12174 lfor 0/0/12162 flags hashpspool stripe_width 16384 application rgw
[root@rgw-4 /]#

Comment 3 karan singh 2020-05-07 15:26:34 UTC
Created attachment 1686217 [details]
COSBench Workload file

Attaching cosbench workload file

Comment 4 karan singh 2020-05-07 15:29:48 UTC
Clarification on the ceph -s output; here is the pattern:
- Multiple OSDs (5-10) fail/flap from one node.
- All 53 OSDs flap from one node (but come back up within, say, 30 seconds).



I am also seeing slow ops:

[root@rgw-4 /]# ceph -s
  cluster:
    id:     ebe0aa4b-4fb5-4c68-84ab-cbf1118937a2
    health: HEALTH_WARN
            1 osds down
            Reduced data availability: 2 pgs inactive, 19 pgs incomplete
            Degraded data redundancy: 491400/146999901 objects degraded (0.334%), 339 pgs degraded
            24 slow ops, oldest one blocked for 4335 sec, daemons [mon,rgw-1,mon,rgw-2,mon,rgw-3] have slow ops.

Comment 5 Vikhyat Umrao 2020-05-07 15:30:14 UTC
Logs from one of the OSDs:

2020-05-07 08:51:17.141 7fd8360af700  1 osd.399 pg_epoch: 15005 pg[16.3368s5( v 14993'1582 (14977'1572,14993'1582] lb MIN (bitwise) local-lis/les=14995/14996 n=0 ec=12150/12099 lis/c 15002/13313 les/c/f 15003/13317/0 15004/15005/14022) [35,56,203,74,254,399]/[35,56,203,74,254,2147483647]p35(0) r=-1 lpr=15005 pi=[13313,15005)/4 crt=14993'1582 lcod 0'0 remapped NOTIFY mbc={}] state<Start>: transitioning to Stray
2020-05-07 08:51:17.141 7fd8360af700  1 osd.399 pg_epoch: 15005 pg[16.35f7s5( v 14998'1548 (14929'1538,14998'1548] lb MIN (bitwise) local-lis/les=15000/15001 n=0 ec=12152/12099 lis/c 15002/13292 les/c/f 15003/13295/0 15004/15005/12251) [125,103,116,47,244,399]/[125,103,116,47,244,2147483647]p125(0) r=-1 lpr=15005 pi=[13292,15005)/3 crt=14998'1548 lcod 0'0 remapped NOTIFY mbc={}] start_peering_interval up [125,103,116,47,244,399] -> [125,103,116,47,244,399], acting [125,103,116,47,244,399] -> [125,103,116,47,244,2147483647], acting_primary 125(0) -> 125, up_primary 125(0) -> 125, role 5 -> -1, features acting 4611087854031667199 upacting 4611087854031667199
2020-05-07 08:51:17.141 7fd8360af700  1 osd.399 pg_epoch: 15005 pg[16.35f7s5( v 14998'1548 (14929'1538,14998'1548] lb MIN (bitwise) local-lis/les=15000/15001 n=0 ec=12152/12099 lis/c 15002/13292 les/c/f 15003/13295/0 15004/15005/12251) [125,103,116,47,244,399]/[125,103,116,47,244,2147483647]p125(0) r=-1 lpr=15005 pi=[13292,15005)/3 crt=14998'1548 lcod 0'0 remapped NOTIFY mbc={}] state<Start>: transitioning to Stray
2020-05-07 08:51:18.152 7fd8340ab700  0 log_channel(cluster) log [INF] : 16.3452s0 continuing backfill to osd.249(2) from (14980'1610,15001'1627] MIN to 15001'1627
2020-05-07 08:51:18.152 7fd8350ad700  0 log_channel(cluster) log [INF] : 16.1b78s0 continuing backfill to osd.65(5) from (14969'1510,15001'1524] MIN to 15001'1524
2020-05-07 08:51:18.155 7fd8340ab700  0 log_channel(cluster) log [INF] : 16.267es0 continuing backfill to osd.65(3) from (14969'1539,15001'1553] MIN to 15001'1553
/builddir/build/BUILD/ceph-14.2.4/src/osd/PGLog.h: In function 'void PGLog::IndexedLog::add(const pg_log_entry_t&, bool)' thread 7fd8360af700 time 2020-05-07 08:51:18.171087
/builddir/build/BUILD/ceph-14.2.4/src/osd/PGLog.h: 511: FAILED ceph_assert(head.version == 0 || e.version.version > head.version)
 ceph version 14.2.4-125.el8cp (db63624068590e593c47150c7574d08c1ec0d3e4) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x156) [0x561590c1234c]
 2: (()+0x51f566) [0x561590c12566]
 3: (bool PGLog::append_log_entries_update_missing<pg_missing_set<true> >(hobject_t const&, bool, std::__cxx11::list<pg_log_entry_t, mempool::pool_allocator<(mempool::pool_index_t)14, pg_log_entry_t> > const&, bool, PGLog::IndexedLog*, pg_missing_set<true>&, PGLog::LogEntryHandler*, DoutPrefixProvider const*)+0xaed) [0x561590e4442d]
 4: (PGLog::merge_log(pg_info_t&, pg_log_t&, pg_shard_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)+0xf6b) [0x561590e37a6b]
 5: (PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_shard_t)+0x68) [0x561590d8e278]
 6: (PG::RecoveryState::Stray::react(MLogRec const&)+0x23b) [0x561590dce9ab]
 7: (boost::statechart::simple_state<PG::RecoveryState::Stray, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xa5) [0x561590e2aa35]
 8: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x5a) [0x561590df8bda]
 9: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, PG::RecoveryCtx*)+0x2c2) [0x561590de98e2]
 10: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x2bc) [0x561590d2a36c]
 11: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x55) [0x561590fa84b5]
 12: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x1366) [0x561590d26c36]
 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4) [0x561591306134]
 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x561591308cf4]
 15: (()+0x82de) [0x7fd8575ac2de]
 16: (clone()+0x43) [0x7fd856356133]
*** Caught signal (Aborted) **
 in thread 7fd8360af700 thread_name:tp_osd_tp
2020-05-07 08:51:18.174 7fd8360af700 -1 /builddir/build/BUILD/ceph-14.2.4/src/osd/PGLog.h: In function 'void PGLog::IndexedLog::add(const pg_log_entry_t&, bool)' thread 7fd8360af700 time 2020-05-07 08:51:18.171087
/builddir/build/BUILD/ceph-14.2.4/src/osd/PGLog.h: 511: FAILED ceph_assert(head.version == 0 || e.version.version > head.version)


Suspected issue:

https://tracker.ceph.com/issues/44532
https://github.com/ceph/ceph/pull/33910
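
For reference, the same assert can be pulled from the other OSDs to confirm they hit the identical backtrace. A rough sketch only: with log_file pointed at /dev/null, the daemon output lands in the journal of the containerized units, the ceph-osd@<id> unit name assumes the ceph-ansible convention, and 399 is simply the OSD from the attached log:

journalctl -u ceph-osd@399 | grep -B1 -A20 'FAILED ceph_assert'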

Comment 9 karan singh 2020-05-07 16:03:53 UTC
Some more commentary per my discussion with Vikhyat


So the cluster was running fine; I ran several COSBench tests successfully. Then, all of a sudden, when I ran another test, I started to see this flapping issue. I stopped the workload and restarted the OSDs that had been down for a long time, and the cluster became healthy again. After that I re-applied the load from COSBench and again started to see flapping issues with the OSDs. When I run watch "ceph -s", I can see a random number of OSDs going down (1, 2, 10, or even 53), and within a few seconds they come back up; this is a repeating pattern.
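
To pin down which OSDs are down at a given moment, and to confirm the flap pattern from the cluster log, something like the following helps (a rough sketch; /var/log/ceph/ceph.log is the default cluster-log location on the monitor hosts and may differ in this containerized setup, and the grep pattern is approximate):

[root@rgw-4 /]# ceph osd tree down
[root@rgw-4 /]# grep -E 'osd\.[0-9]+ (failed|boot)' /var/log/ceph/ceph.log | tail -20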

Comment 14 karan singh 2020-05-08 13:11:48 UTC
I received wonderful support from Vikhyat and Neha in this case, many thanks guys, you ROCK !!

Happy to report that the change Neha suggested, i.e. switching the pg_log* tunables back to their default values, has fixed the issue. I have not seen a similar assert since making that change, so this is not a bug.
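
For the record, the change was roughly the following (a sketch only; it assumes ceph.conf is edited directly or via ceph-ansible's ceph_conf_overrides, that the containerized ceph-osd@<id> systemd units are in use, and osd.399 is just an example id):

# Drop the overrides from the [global] section so the built-in defaults apply again:
#   osd_max_pg_log_entries = 10
#   osd_min_pg_log_entries = 10
#   osd_pg_log_dups_tracked = 10
#   osd_pg_log_trim_min = 10
# Restart the OSDs (one host / failure domain at a time) and verify the running value:
systemctl restart ceph-osd@399
ceph daemon osd.399 config get osd_min_pg_log_entries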

Going forward, my plan is to ingest 10 billion RADOS objects via 12 RGW instances onto this cluster, so if I encounter this again I will probably re-open it. Until then, all good :)

Comment 15 Yaniv Kaul 2020-05-10 10:00:43 UTC
(In reply to karan singh from comment #14)
> I received wonderful support from Vikhyat and Neha in this case, many thanks
> guys, you ROCK !!
> 
> Happy to report that the change Neha suggested, i.e. switching the pg_log*
> tunables back to their default values, has fixed the issue. I have not seen a
> similar assert since making that change, so this is not a bug.

Well, if we allow our users to change from default settings, it is a bug.

> 
> Going forward, my plan is to ingest 10 billion RADOS objects via 12 RGW
> instances onto this cluster, so if I encounter this again I will probably
> re-open it. Until then, all good :)


Based on the above, closing. Please re-open if we need to fix for 4.x.

