+++ This bug was initially created as a clone of Bug #1943357 +++

Description of problem:
When there are a large number of messages sent to the cluster log, such as slow ops warnings, the monitors can grow their db and fill up their disk. This occurs when the ingest rate of log messages is greater than the paxos_service_trim_max, which defaults to 500 entries. At this point logs are stored faster than they are deleted, and the store grows continuously.

Version-Release number of selected component (if applicable):
Any

How reproducible:
Always

Steps to Reproduce:
1. Set paxos_service_trim_min = 10 and paxos_service_trim_max = 100
2. Generate slow ops on a cluster by running I/O and setting the slow ops threshold very low (e.g. osd_op_complaint_time = 0.001).

Actual results:
mon db grows continuously

Expected results:
mon db should not grow continuously

Additional info:
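The growth condition in the description can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not Ceph's actual trimming logic: the function name simulate_store and the idea of a fixed "cycle" in which a constant number of log entries arrive and at most trim_max entries are trimmed are both hypothetical.

```python
def simulate_store(ingest_per_cycle: int, trim_max: int, cycles: int) -> list[int]:
    """Return the number of retained log entries after each trim cycle.

    Toy model (assumption): every cycle, ingest_per_cycle entries arrive
    and at most trim_max entries are removed.
    """
    retained = 0
    history = []
    for _ in range(cycles):
        retained += ingest_per_cycle         # new cluster-log entries arrive
        retained -= min(retained, trim_max)  # trimming is capped at trim_max
        history.append(retained)
    return history


# Ingest above the cap: the backlog grows by (800 - 500) every cycle.
growing = simulate_store(ingest_per_cycle=800, trim_max=500, cycles=5)
# Ingest below the cap: everything is trimmed and the store stays flat.
stable = simulate_store(ingest_per_cycle=300, trim_max=500, cycles=5)

print(growing)  # [300, 600, 900, 1200, 1500]
print(stable)   # [0, 0, 0, 0, 0]
```

In this model the store is only bounded while the ingest rate stays at or below the trim cap, which matches the behaviour reported above.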
Verified the bug fix by executing the following steps:

1. Set paxos_service_trim_min = 10 and paxos_service_trim_max = 100

[root@ceph-bharath-1621903885667-node1-mon-mgr-installer /]# ceph daemon /var/run/ceph/ceph-mon.ceph-bharath-1621903885667-node1-mon-mgr-installer.asok config show | grep "paxos_service_trim_max"
    "paxos_service_trim_max": "100",
    "paxos_service_trim_max_multiplier": "20",
[root@ceph-bharath-1621903885667-node1-mon-mgr-installer /]# ceph daemon /var/run/ceph/ceph-mon.ceph-bharath-1621903885667-node1-mon-mgr-installer.asok config show | grep "paxos_stash_full_interval"
    "paxos_stash_full_interval": "10",
[root@ceph-bharath-1621903885667-node1-mon-mgr-installer /]#

2. Generate slow ops

2.1 -
[root@ceph-bharath-1621924714974-node1-mon-mgr-installer ceph-ceph-bharath-1621924714974-node1-mon-mgr-installer]# ceph daemon /var/run/ceph/ceph-mon.ceph-bharath-1621924714974-node1-mon-mgr-installer.asok config show | grep "osd_op_complaint_time"
    "osd_op_complaint_time": "0.000001",

2.2 - Perform the IO operations on the cluster

[root@ceph-bharath-1621924714974-node1-mon-mgr-installer cephuser]# rados bench -p rbd 300 write -b 8192 --no-cleanup
-------------------------------------------------
-------------------------------------------------
  291      16    225101    225085   6.04209    0.015625      1.2053   0.0205476
  292      16    225132    225116   6.02223    0.242188   0.0819285   0.0207254
  293      16    225141    225125   6.00191   0.0703125     1.01162   0.0207605
  294      16    225145    225129    5.9816     0.03125     1.06602   0.0207884
  295      16    225162    225146   5.96178    0.132812     3.20863   0.0209133
  296      16    225188    225172   5.94232    0.203125    0.103691   0.0209911
  297      16    225201    225185   5.92265    0.101562     2.79583   0.0210419
  298      16    225215    225199   5.90315    0.109375    0.100911   0.0210891
  299      16    225230    225214    5.8838    0.117188    0.956521   0.0211968
2021-05-25 07:59:33.496814 min lat: 0.00251739 max lat: 13.8412 avg lat: 0.0212447
  sec Cur ops   started  finished  avg MB/s    cur MB/s last lat(s)  avg lat(s)
  300      16    225242    225226   5.86449     0.09375    0.704823   0.0212447
  301      16    225243    225227   5.84504   0.0078125    0.815811   0.0212482
  302      16    225243    225227   5.82568           0           -   0.0212482
Total time run:         302.246
Total writes made:      225243
Write size:             8192
Object size:            8192
Bandwidth (MB/sec):     5.82211
Stddev Bandwidth:       4.75857
Max bandwidth (MB/sec): 11.8828
Min bandwidth (MB/sec): 0
Average IOPS:           745
Stddev IOPS:            609.113
Max IOPS:               1521
Min IOPS:               0
Average Latency(s):     0.0214694
Stddev Latency(s):      0.195288
Max latency(s):         13.8412
Min latency(s):         0.00251739

2.3 - Performed a power cycle on an OSD

3. Checked the mon DB and confirmed that it is not growing continuously

[root@ceph-bharath-1621924714974-node1-mon-mgr-installer ceph-ceph-bharath-1621924714974-node1-mon-mgr-installer]# pwd
/var/lib/ceph/mon/ceph-ceph-bharath-1621924714974-node1-mon-mgr-installer
[root@ceph-bharath-1621924714974-node1-mon-mgr-installer ceph-ceph-bharath-1621924714974-node1-mon-mgr-installer]# du
70644   ./store.db
70656   .
[root@ceph-bharath-1621924714974-node1-mon-mgr-installer ceph-ceph-bharath-1621924714974-node1-mon-mgr-installer]# du
70644   ./store.db
70656   .
[root@ceph-bharath-1621924714974-node1-mon-mgr-installer ceph-ceph-bharath-1621924714974-node1-mon-mgr-installer]# du
70644   ./store.db
70656   .
[root@ceph-bharath-1621924714974-node1-mon-mgr-installer ceph-ceph-bharath-1621924714974-node1-mon-mgr-installer]# du
70644   ./store.db
70656   .
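The repeated du check in step 3 could be automated along the following lines. This is a minimal sketch, not part of the verification that was actually run: the helper names dir_size_bytes, sample_sizes and grows_continuously are hypothetical, and the size is computed as the sum of apparent file sizes in bytes, whereas du reports allocated disk blocks, so the absolute numbers will differ from the output above.

```python
import os
import time


def dir_size_bytes(path: str) -> int:
    """Sum the apparent sizes of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # A file can disappear between listing and stat,
                # e.g. when the store is compacted; skip it.
                pass
    return total


def sample_sizes(path: str, count: int, interval_s: float) -> list[int]:
    """Take count size samples of path, interval_s seconds apart."""
    samples = []
    for i in range(count):
        samples.append(dir_size_bytes(path))
        if i < count - 1:
            time.sleep(interval_s)
    return samples


def grows_continuously(samples: list[int]) -> bool:
    """True only if every sample is strictly larger than the previous one."""
    return len(samples) >= 2 and all(b > a for a, b in zip(samples, samples[1:]))


# The four store.db readings from the du output in this comment:
print(grows_continuously([70644, 70644, 70644, 70644]))  # False
```

On a monitor host, sample_sizes would be pointed at the store.db directory under the mon data path shown by pwd above, with an interval long enough to span several trim cycles.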
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage 4.2 Security and Bug Fix Update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2445