Bug 1955782 - Ceph Monitors incorrectly report slow operations
Summary: Ceph Monitors incorrectly report slow operations
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 4.2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.2z2
Assignee: Kefu Chai
QA Contact: skanta
URL:
Whiteboard:
Duplicates: 1890899
Depends On: 1905339
Blocks:
 
Reported: 2021-04-30 18:57 UTC by Neha Ojha
Modified: 2021-06-22 21:34 UTC
CC List: 20 users

Fixed In Version: ceph-14.2.11-177.el8cp, ceph-14.2.11-177.el7cp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1905339
Environment:
Last Closed: 2021-06-15 17:14:17 UTC
Embargoed:


Attachments
Error snippet (20.93 KB, text/plain), uploaded 2021-05-20 14:39 UTC by skanta


Links
Ceph Project Bug Tracker 50964 (last updated 2021-05-25 06:42:37 UTC)
Github ceph/ceph pull 41213, closed: nautilus: mon/OSDMonitor: drop stale failure_info after a grace period (last updated 2021-05-11 16:57:13 UTC)
Github ceph/ceph pull 41516, open: mon/OSDMonitor: drop stale failure_info even if can_mark_down() (last updated 2021-05-25 06:42:37 UTC)
Github ceph/ceph pull 41519, open: nautilus: mon/OSDMonitor: drop stale failure_info even if can_mark_down() (last updated 2021-05-25 06:42:37 UTC)
Red Hat Product Errata RHSA-2021:2445 (last updated 2021-06-15 17:14:35 UTC)

Comment 7 skanta 2021-05-20 14:35:17 UTC
The issue is still seen after performing the following steps:

[root@ceph-bharath-1621408854173-node1-mon-mgr-installer cephuser]# ceph osd lspools
1 rbd
[root@ceph-bharath-1621408854173-node1-mon-mgr-installer cephuser]# rados bench -p rbd 300 write -b 8192 --no-cleanup
.........................................
.........................................
  296      16    273341    273325   7.21304    7.8125   0.0542765   0.0173262
  297      16    274429    274413   7.21737       8.5  0.00938193   0.0173185
  298      16    275714    275698   7.22684   10.0391   0.0118252   0.0172957
  299      16    277085    277069   7.23849   10.7109   0.0120228    0.017268
Total time run:         300.007
Total writes made:      278447
Write size:             8192
Object size:            8192
Bandwidth (MB/sec):     7.25105
Stddev Bandwidth:       1.56969
Max bandwidth (MB/sec): 10.8359
Min bandwidth (MB/sec): 1.57812
Average IOPS:           928
Stddev IOPS:            200.921
Max IOPS:               1387
Min IOPS:               202
Average Latency(s):     0.0172383
Stddev Latency(s):      0.0188797
Max latency(s):         0.602089
Min latency(s):         0.0024007
[root@ceph-bharath-1621408854173-node1-mon-mgr-installer cephuser]#



[root@ceph-bharath-1621408854173-node1-mon-mgr-installer cephuser]# ceph daemon mon.`hostname` ops
{
    "ops": [],
    "num_ops": 0
}
[root@ceph-bharath-1621408854173-node1-mon-mgr-installer cephuser]# ceph -s
  cluster:
    id:     4fc966cd-df20-4772-8703-7fd99fd7355b
    health: HEALTH_WARN
            Long heartbeat ping times on back interface seen, longest is 63736.858 msec
            Long heartbeat ping times on front interface seen, longest is 63736.353 msec
            8 slow ops, oldest one blocked for 1277 sec, mon.ceph-bharath-1621408854173-node2-mon has slow ops
 
  services:
    mon: 3 daemons, quorum ceph-bharath-1621408854173-node2-mon,ceph-bharath-1621408854173-node3-mon-osd,ceph-bharath-1621408854173-node1-mon-mgr-installer (age 13m)
    mgr: ceph-bharath-1621408854173-node1-mon-mgr-installer(active, since 11h)
    osd: 11 osds: 11 up (since 13m), 11 in (since 13m)
 
  data:
    pools:   1 pools, 64 pgs
    objects: 278.45k objects, 2.1 GiB
    usage:   86 GiB used, 239 GiB / 325 GiB avail
    pgs:     64 active+clean
 
[root@ceph-bharath-1621408854173-node1-mon-mgr-installer cephuser]#

[root@ceph-bharath-1621408854173-node1-mon-mgr-installer cephuser]# ceph -v
ceph version 14.2.11-170.el8cp (b49a031f4d70d49462afb70f730b6b346effdd14) nautilus (stable)
[root@ceph-bharath-1621408854173-node1-mon-mgr-installer cephuser]#


Error snippet is attached.
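Note that the empty `ops` dump above comes from the monitor on the installer node, while the health warning blames mon.ceph-bharath-1621408854173-node2-mon; `ceph daemon` talks only to the local admin socket, so it reports ops for the daemon on the local host. A sketch of how to look at the blocked ops themselves (assuming root SSH access from the installer node to node2):

# Detailed form of the current health warnings:
ceph health detail

# Dump the op tracker of the monitor named in the warning; the admin
# socket command must run on that monitor's own host, so it is wrapped
# in ssh here (SSH reachability is an assumption of this sketch):
ssh ceph-bharath-1621408854173-node2-mon 'ceph daemon mon.$(hostname) ops'

Per the linked upstream tracker and pull requests, the expectation before the fix is that stale OSD failure reports linger in that monitor's op tracker and keep the SLOW_OPS warning alive even though client I/O is healthy.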

Comment 8 skanta 2021-05-20 14:39:37 UTC
Created attachment 1785220 [details]
Error snippet

Comment 9 Kefu Chai 2021-05-25 06:42:38 UTC
It turns out the fix does not work in all cases.

Created https://github.com/ceph/ceph/pull/41516 to address this issue.

Comment 12 skanta 2021-05-31 12:01:05 UTC
Moving the bug to VERIFIED state with the following successful steps:

1. [root@ceph-bharath-1622458349362-node1-mon-mgr-installer cephuser]# ceph osd lspools
1 rbd
[root@ceph-bharath-1622458349362-node1-mon-mgr-installer cephuser]# 

2. [root@ceph-bharath-1622458349362-node1-mon-mgr-installer cephuser]# rados bench -p rbd 300 write -b 8192 --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 8192 bytes to objects of size 8192 for up to 300 seconds or 0 objects
Object prefix: benchmark_data_ceph-bharath-1622458349362-no_86661
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16      1102      1086   8.48496   8.48438   0.0125168    0.014662
    2      16      2065      2049   8.00364   7.52344   0.0121516   0.0155233
    3      16      2862      2846   7.41102   6.22656   0.0150411    0.016823
    4      16      3976      3960   7.73377   8.70312   0.0111872   0.0161329
    5      16      4812      4796   7.49313   6.53125   0.0120251   0.0166651
    6      16      5756      5740    7.4733     7.375   0.0139815   0.0167099
    7      16      6828      6812   7.60198     8.375  0.00679112   0.0164136
.................................................................................
...............................................................................
  297      16    320093    320077   8.41853   10.2734   0.0206678   0.0148473
  298      16    321517    321501   8.42761    11.125   0.0100201   0.0148315
  299      16    322814    322798   8.43331   10.1328   0.0157222   0.0148213
Total time run:         300.007
Total writes made:      324097
Write size:             8192
Object size:            8192
Bandwidth (MB/sec):     8.43983
Stddev Bandwidth:       1.36136
Max bandwidth (MB/sec): 12.1797
Min bandwidth (MB/sec): 3.26562
Average IOPS:           1080
Stddev IOPS:            174.254
Max IOPS:               1559
Min IOPS:               418
Average Latency(s):     0.0148102
Stddev Latency(s):      0.0133563
Max latency(s):         0.434727
Min latency(s):         0.00250338

3. [root@ceph-bharath-1622458349362-node1-mon-mgr-installer cephuser]# ceph -s
  cluster:
    id:     b6dabf91-1c96-45ee-9635-92961f393f9c
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum ceph-bharath-1622458349362-node2-mon,ceph-bharath-1622458349362-node3-mon-osd,ceph-bharath-1622458349362-node1-mon-mgr-installer (age 19m)
    mgr: ceph-bharath-1622458349362-node1-mon-mgr-installer(active, since 19m)
    osd: 14 osds: 14 up (since 16m), 14 in (since 16m)
 
  data:
    pools:   1 pools, 64 pgs
    objects: 324.10k objects, 2.5 GiB
    usage:   101 GiB used, 339 GiB / 441 GiB avail
    pgs:     64 active+clean
 
4. [root@ceph-bharath-1622458349362-node1-mon-mgr-installer cephuser]# ceph daemon mon.`hostname` ops
{
    "ops": [],
    "num_ops": 0
}
[root@ceph-bharath-1622458349362-node1-mon-mgr-installer cephuser]#  


Ceph version:

[root@ceph-bharath-1622458349362-node1-mon-mgr-installer cephuser]# ceph -v
ceph version 14.2.11-177.el8cp (0486420967ea3327d3ba01d3184f3ab96ddaa616) nautilus (stable)
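As a further check, `ceph -v` only reports the version of the locally installed binaries; to confirm that all three monitors and the OSD daemons are actually running the fixed build, the cluster-wide version report can also be consulted, for example:

# Cluster-wide daemon versions; every mon/mgr/osd should report the
# fixed build (ceph-14.2.11-177) after the upgrade:
ceph versions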

Comment 14 errata-xmlrpc 2021-06-15 17:14:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage 4.2 Security and Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2445

Comment 15 Neha Ojha 2021-06-22 21:34:40 UTC
*** Bug 1890899 has been marked as a duplicate of this bug. ***

