Bug 1955218 - monitor db can fill up with cluster log messages
Summary: monitor db can fill up with cluster log messages
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 4.2
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.2z2
Assignee: Aishwarya Mathuria
QA Contact: skanta
URL:
Whiteboard:
Depends On: 1943357
Blocks: 1941939
 
Reported: 2021-04-29 17:34 UTC by Neha Ojha
Modified: 2021-06-15 17:14 UTC (History)
CC List: 18 users

Fixed In Version: ceph-14.2.11-157.el8cp, ceph-14.2.11-157.el7cp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1943357
Environment:
Last Closed: 2021-06-15 17:14:17 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github ceph ceph pull 41099 0 None closed nautilus: mon: Modifying trim logic to change paxos_service_trim_max dynamically 2021-05-06 18:05:05 UTC
Red Hat Product Errata RHSA-2021:2445 0 None None None 2021-06-15 17:14:35 UTC

Description Neha Ojha 2021-04-29 17:34:47 UTC
+++ This bug was initially created as a clone of Bug #1943357 +++

Description of problem:
When a large number of messages is sent to the cluster log, such as slow ops warnings, the monitors' database can grow until it fills the disk. This happens when the ingest rate of log messages exceeds paxos_service_trim_max, which defaults to 500 entries: log entries are then stored faster than they are trimmed, and the store grows continuously.
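The growth condition can be illustrated with a toy model (this is illustration only, not Ceph code; paxos rounds, ingest, and trimming are heavily simplified):

```python
# Toy model of monitor log growth vs. trimming (illustration only, not Ceph code).
# Each "round", `ingest` new cluster-log entries are committed, then the trim
# logic removes at most `trim_max` entries once the backlog reaches `trim_min`.

def simulate_backlog(ingest, trim_min, trim_max, rounds):
    backlog = 0
    for _ in range(rounds):
        backlog += ingest                      # new log entries committed
        if backlog >= trim_min:                # trim only past the threshold
            backlog -= min(backlog, trim_max)  # trim is capped per round
    return backlog

# With ingest below trim_max the backlog stays bounded; above it, the
# backlog (and hence the mon db) grows without limit.
bounded = simulate_backlog(ingest=400, trim_min=250, trim_max=500, rounds=1000)
unbounded = simulate_backlog(ingest=600, trim_min=250, trim_max=500, rounds=1000)
```

In this model, ingest=400 is fully trimmed every round, while ingest=600 leaves a net 100 entries per round behind, which is the failure mode described above.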

Version-Release number of selected component (if applicable):
Any

How reproducible:
Always

Steps to Reproduce:
1. Set paxos_service_trim_min = 10 and paxos_service_trim_max = 100
2. Generate slow ops on a cluster by running I/O and setting the slow ops threshold very low (e.g. osd_op_complaint_time = 0.001).

Actual results:
mon db grows continuously

Expected results:
mon db should not grow continuously

Additional info:

Comment 4 skanta 2021-05-25 12:39:17 UTC
Verified the bug fix by executing the following steps-

1. Set paxos_service_trim_min = 10 and paxos_service_trim_max = 100

[root@ceph-bharath-1621903885667-node1-mon-mgr-installer /]# ceph daemon /var/run/ceph/ceph-mon.ceph-bharath-1621903885667-node1-mon-mgr-installer.asok config show | grep "paxos_service_trim_max"
    "paxos_service_trim_max": "100",
    "paxos_service_trim_max_multiplier": "20",
[root@ceph-bharath-1621903885667-node1-mon-mgr-installer /]# ceph daemon /var/run/ceph/ceph-mon.ceph-bharath-1621903885667-node1-mon-mgr-installer.asok config show | grep "paxos_stash_full_interval"
    "paxos_stash_full_interval": "10",
[root@ceph-bharath-1621903885667-node1-mon-mgr-installer /]#
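The fix tracked in the linked PR makes the trim batch size scale dynamically. A rough sketch of that idea, using the values shown above (the function name and exact conditions are hypothetical simplifications, not the actual nautilus code):

```python
# Rough sketch of dynamic trim sizing (simplified illustration of the idea
# behind the fix; function name and conditions are hypothetical, not Ceph code).

def trim_amount(backlog, trim_min, trim_max, multiplier):
    if backlog < trim_min:
        return 0                                # nothing to trim yet
    if backlog > trim_max and multiplier > 0:
        # Large backlog: allow trimming up to trim_max * multiplier entries
        # so trimming can keep pace with a burst of cluster-log messages.
        return min(backlog, trim_max * multiplier)
    return min(backlog, trim_max)

# With the values shown above (paxos_service_trim_max=100,
# paxos_service_trim_max_multiplier=20), a backlog of 5000 entries can be
# trimmed in batches of up to 2000 instead of 100.
```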

2. Generate slow ops
    2.1 - [root@ceph-bharath-1621924714974-node1-mon-mgr-installer ceph-ceph-bharath-1621924714974-node1-mon-mgr-installer]# ceph daemon /var/run/ceph/ceph-mon.ceph-bharath-1621924714974-node1-mon-mgr-installer.asok config show | grep "osd_op_complaint_time"
                     "osd_op_complaint_time": "0.000001",
    2.2 - Perform the IO operations on the cluster
          [root@ceph-bharath-1621924714974-node1-mon-mgr-installer cephuser]# rados bench -p rbd 300 write -b 8192 --no-cleanup
           -------------------------------------------------
           -------------------------------------------------
  291      16    225101    225085   6.04209  0.015625      1.2053   0.0205476
  292      16    225132    225116   6.02223  0.242188   0.0819285   0.0207254
  293      16    225141    225125   6.00191 0.0703125     1.01162   0.0207605
  294      16    225145    225129    5.9816   0.03125     1.06602   0.0207884
  295      16    225162    225146   5.96178  0.132812     3.20863   0.0209133
  296      16    225188    225172   5.94232  0.203125    0.103691   0.0209911
  297      16    225201    225185   5.92265  0.101562     2.79583   0.0210419
  298      16    225215    225199   5.90315  0.109375    0.100911   0.0210891
  299      16    225230    225214    5.8838  0.117188    0.956521   0.0211968
2021-05-25 07:59:33.496814 min lat: 0.00251739 max lat: 13.8412 avg lat: 0.0212447
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
  300      16    225242    225226   5.86449   0.09375    0.704823   0.0212447
  301      16    225243    225227   5.84504 0.0078125    0.815811   0.0212482
  302      16    225243    225227   5.82568         0           -   0.0212482
Total time run:         302.246
Total writes made:      225243
Write size:             8192
Object size:            8192
Bandwidth (MB/sec):     5.82211
Stddev Bandwidth:       4.75857
Max bandwidth (MB/sec): 11.8828
Min bandwidth (MB/sec): 0
Average IOPS:           745
Stddev IOPS:            609.113
Max IOPS:               1521
Min IOPS:               0
Average Latency(s):     0.0214694
Stddev Latency(s):      0.195288
Max latency(s):         13.8412
Min latency(s):         0.00251739
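As a sanity check, the summary figures above are internally consistent (total writes, 8 KiB write size, and total runtime imply the reported IOPS and bandwidth):

```python
# Cross-checking the rados bench summary above: average IOPS and bandwidth
# follow from total writes, write size, and total runtime.

total_writes = 225243
write_size = 8192            # bytes
total_time = 302.246         # seconds

avg_iops = total_writes / total_time
avg_bw_mb = total_writes * write_size / (1024 * 1024) / total_time

# avg_iops  ≈ 745.2  (reported: 745)
# avg_bw_mb ≈ 5.82   (reported: 5.82211)
```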

    2.3 - Performed a power cycle on an OSD


3. Verified the mon DB and observed that the db is not continuously growing

  [root@ceph-bharath-1621924714974-node1-mon-mgr-installer ceph-ceph-bharath-1621924714974-node1-mon-mgr-installer]# pwd
/var/lib/ceph/mon/ceph-ceph-bharath-1621924714974-node1-mon-mgr-installer

[root@ceph-bharath-1621924714974-node1-mon-mgr-installer ceph-ceph-bharath-1621924714974-node1-mon-mgr-installer]# du 
70644	./store.db
70656	.
[root@ceph-bharath-1621924714974-node1-mon-mgr-installer ceph-ceph-bharath-1621924714974-node1-mon-mgr-installer]# du 
70644	./store.db
70656	.
[root@ceph-bharath-1621924714974-node1-mon-mgr-installer ceph-ceph-bharath-1621924714974-node1-mon-mgr-installer]# du 
70644	./store.db
70656	.
[root@ceph-bharath-1621924714974-node1-mon-mgr-installer ceph-ceph-bharath-1621924714974-node1-mon-mgr-installer]# du 
70644	./store.db
70656	.
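The repeated `du` checks above can also be scripted. A minimal sketch (hypothetical helper; the mon data path shown is specific to this cluster and will differ elsewhere):

```python
import os
import time

# Sum of file sizes under a directory, in bytes (rough analogue of `du -sb`).
def dir_size(path):
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

# Sample the mon store a few times; a healthy store should stay roughly flat.
# store = "/var/lib/ceph/mon/ceph-<hostname>/store.db"
# for _ in range(4):
#     print(dir_size(store))
#     time.sleep(60)
```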

Comment 6 errata-xmlrpc 2021-06-15 17:14:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage 4.2 Security and Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2445

