Bug 2125433

Summary: [ceph-Dashboard] RHCS 6.0 - Ceph node Network Packet drop alerts are seen frequently in RHCS 6.0 dashboard
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Preethi <pnataraj>
Component: Ceph-Dashboard Assignee: Aashish sharma <aasharma>
Status: CLOSED ERRATA QA Contact: Preethi <pnataraj>
Severity: high Docs Contact: Eliska <ekristov>
Priority: unspecified    
Version: 6.0 CC: aasharma, ceph-eng-bugs, cephqe-warriors, ekristov, maydin
Target Milestone: ---   
Target Release: 6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ceph-17.2.3-30.el9cp Doc Type: Bug Fix
Doc Text:
.Ceph node Network Packet drop alerts are shown appropriately on the Ceph dashboard
Previously, there was an issue in the query related to Ceph node Network Packet drop alerts. As a consequence, those alerts would be seen frequently on the Ceph Dashboard. With this fix, the related query no longer causes the issue and Ceph node Network Packet drop alerts are shown appropriately.
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-03-20 18:58:05 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2126050    
Attachments:
Description Flags
mgr snippet logs none

Description Preethi 2022-09-09 02:44:43 UTC
Created attachment 1910585 [details]
mgr snippet logs


Description of problem:
[ceph-Dashboard] RHCS 6.0 - Ceph node Network Packet drop alerts are seen frequently in the RHCS 6.0 dashboard

Version-Release number of selected component (if applicable):
ceph version 17.2.3-21.el9cp (c988b360f12cbb4bc1c80ee7a9771814bc0f49d6) quincy 

How reproducible:


Steps to Reproduce:
1. Have a cluster running RHCS 5.2 and perform an upgrade to 6.0, with 250+ images across pools and the cluster 60%+ filled (the OS is upgraded from RHEL 8.6 to 9.0 before performing the Ceph upgrade to 6.0)
2. Once the upgrade is successful, check the dashboard Alerts section
3. Observe the behaviour
We see continuous pop-ups showing "Network packet drop" alerts in red in the Alerts section

NOTE: The cluster health status was good and there were no significant network-related errors in the ceph status
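For reference, one way to cross-check whether a node is really dropping packets is to look at the kernel counters that the node exporter's network metrics are built from (the interface name eno1 below is only an example; use whichever interface the alert names):

[root@magna021 pnataraj]# ip -s link show eno1   # per-interface RX/TX "dropped" counters
[root@magna021 pnataraj]# cat /proc/net/dev      # raw counters behind node_network_*_drop_total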

Actual results:
We are seeing continuous network packet drop alerts in the Ceph dashboard Alerts section
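As an additional check, the firing alert can be queried straight from the Prometheus instance that cephadm deploys (the host and port 9095 below are assumptions based on a default cephadm deployment; adjust them to where Prometheus actually runs, e.g. as shown by "ceph orch ps"):

[root@magna021 pnataraj]# ceph orch ps | grep prometheus
[root@magna021 pnataraj]# curl -s http://magna021:9095/api/v1/alerts | python3 -m json.tool | grep -i -A3 packet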

Expected results:

We should not see any network-related packet drop alerts

Additional info:

magna021
https://10.8.128.22:8443/#/login
admin123/myPassword

Snippet of mgr logs:
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/_ping): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/_ping): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/_ping): [errno: 111] Connection refused
::ffff:10.8.128.21 - - [08/Sep/2022:07:43:05] "GET /metrics HTTP/1.1" 200 1042443 "" "Prometheus/2.33.4"
::ffff:10.8.128.21 - - [08/Sep/2022:07:43:15] "GET /metrics HTTP/1.1" 200 1042442 "" "Prometheus/2.33.4"
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/_ping): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/_ping): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/_ping): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/_ping): [errno: 111] Connection refused
::ffff:10.8.128.21 - - [08/Sep/2022:07:43:25] "GET /metrics HTTP/1.1" 200 1042442 "" "Prometheus/2.33.4"
Dashboard Exception
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/dashboard/services/exception.py", line 47, in dashboard_exception_handler
    return handler(*args, **kwargs)
  File "/lib/python3.9/site-packages/cherrypy/_cpdispatch.py", line 54, in __call__
    return self.callable(*self.args, **self.kwargs)
  File "/usr/share/ceph/mgr/dashboard/controllers/_base_controller.py", line 258, in inner
    ret = func(*args, **kwargs)
  File "/usr/share/ceph/mgr/dashboard/controllers/_rest_controller.py", line 191, in wrapper
    return func(*vpath, **params)
  File "/lib64/python3.9/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/lib64/python3.9/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/usr/share/ceph/mgr/dashboard/controllers/rbd.py", line 105, in list
    return self._rbd_list(pool_name)
  File "/usr/share/ceph/mgr/dashboard/controllers/rbd.py", line 90, in _rbd_list
    status, value = RbdService.rbd_pool_list(pool)
  File "/usr/share/ceph/mgr/dashboard/tools.py", line 245, in wrapper
    return rvc.run(fn, args, kwargs)
  File "/usr/share/ceph/mgr/dashboard/tools.py", line 233, in run
    raise ViewCacheNoDataException()
dashboard.exceptions.ViewCacheNoDataException: ViewCache: unable to retrieve data
::ffff:10.8.128.21 - - [08/Sep/2022:07:43:35] "GET /metrics HTTP/1.1" 200 1042631 "" "Prometheus/2.33.4"
Error while calling fn=<function RbdService.rbd_pool_list at 0x7ff9cd3f7790> ex=[errno 19] error generating diff from snapshot None
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/dashboard/tools.py", line 147, in run
    val = self.fn(*self.args, **self.kwargs)
  File "/usr/share/ceph/mgr/dashboard/services/rbd.py", line 421, in rbd_pool_list
    stat = cls._rbd_image_stat(
  File "/usr/share/ceph/mgr/dashboard/services/rbd.py", line 386, in _rbd_image_stat
    return cls._rbd_image(ioctx, pool_name, namespace, image_name)
  File "/usr/share/ceph/mgr/dashboard/services/rbd.py", line 360, in _rbd_image
    total_prov_bytes, snaps_prov_bytes = cls._rbd_disk_usage(
  File "/usr/share/ceph/mgr/dashboard/services/rbd.py", line 256, in _rbd_disk_usage
    image.diff_iterate(0, size, prev_snap, du_callb,
  File "rbd.pyx", line 2770, in rbd.requires_not_closed.wrapper
  File "rbd.pyx", line 3925, in rbd.Image.diff_iterate
rbd.OSError: [errno 19] error generating diff from snapshot None
::ffff:10.8.128.21 - - [08/Sep/2022:07:43:45] "GET /metrics HTTP/1.1" 200 1042626 "" "Prometheus/2.33.4"
::ffff:10.8.128.21 - - [08/Sep/2022:07:43:55] "GET /metrics HTTP/1.1" 200 1042626 "" "Prometheus/2.33.4"
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/_ping): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/_ping): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/_ping): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/_ping): [errno: 111] Connection refused
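(The ViewCache/diff_iterate traceback in the log above looks unrelated to the packet drop alert itself. If needed, a similar disk-usage calculation can be run from the CLI to see whether that error is specific to the dashboard; the pool and image names below are placeholders:)

[root@magna021 pnataraj]# rbd du <pool>/<image>   # CLI counterpart of the dashboard's RBD disk-usage lookup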


ceph status:
[root@magna021 pnataraj]# ceph status
  cluster:
    id:     c8ce6d50-c0a1-11ec-a99b-002590fc2a2e
    health: HEALTH_WARN
            2 failed cephadm daemon(s)
 
  services:
    mon:        5 daemons, quorum magna021,magna022,magna024,magna025,magna026 (age 20h)
    mgr:        magna022.icxgsh(active, since 20h), standbys: magna021.syfuos
    osd:        52 osds: 52 up (since 19h), 52 in (since 2d)
    rbd-mirror: 1 daemon active (1 hosts)
 
  data:
    pools:   17 pools, 1569 pgs
    objects: 2.39M objects, 9.1 TiB
    usage:   27 TiB used, 20 TiB / 48 TiB avail
    pgs:     1569 active+clean
 
  io:
    client:   520 KiB/s rd, 34 KiB/s wr, 623 op/s rd, 64 op/s wr
 
[root@magna021 pnataraj]# ceph health detail
HEALTH_WARN 2 failed cephadm daemon(s)
[WRN] CEPHADM_FAILED_DAEMON: 2 failed cephadm daemon(s)
    daemon iscsi.test.plena001.konnne on plena001 is in error state
    daemon iscsi.test.plena002.wgcgle on plena002 is in error state
[root@magna021 pnataraj]#
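(The two failed daemons are the iscsi gateways on plena001/plena002, which most likely also explains the repeated "Connection refused" errors in the mgr log above; they are separate from the packet drop alert. As an example, they can be inspected and restarted with:)

[root@magna021 pnataraj]# ceph orch ps --refresh | grep iscsi
[root@magna021 pnataraj]# ceph orch daemon restart iscsi.test.plena001.konnne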

Comment 6 Preethi 2022-09-21 07:08:32 UTC
The issue is not seen in the latest version of RHCS 6.0. Screenshot attached for reference.

[ceph: root@magna021 /]# ceph versions
{
    "mon": {
        "ceph version 17.2.3-36.el9cp (962ea0ecffbdeffd7352299549353cc6f71aa979) quincy (stable)": 5
    },
    "mgr": {
        "ceph version 17.2.3-36.el9cp (962ea0ecffbdeffd7352299549353cc6f71aa979) quincy (stable)": 2
    },
    "osd": {
        "ceph version 17.2.3-36.el9cp (962ea0ecffbdeffd7352299549353cc6f71aa979) quincy (stable)": 52
    },
    "mds": {},
    "rbd-mirror": {
        "ceph version 17.2.3-36.el9cp (962ea0ecffbdeffd7352299549353cc6f71aa979) quincy (stable)": 1
    },
    "overall": {
        "ceph version 17.2.3-36.el9cp (962ea0ecffbdeffd7352299549353cc6f71aa979) quincy (stable)": 60
    }
}
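To confirm which alert rule the cluster is actually running, the packet-drop rule (upstream name CephNodeNetworkPacketDrops) can be checked in the Prometheus alert rules file shipped by cephadm (the path below is an assumption based on a typical cephadm layout and may vary by release; <fsid> and <host> are placeholders):

[root@magna021 pnataraj]# grep -i -A6 packetdrops /var/lib/ceph/<fsid>/prometheus.<host>/etc/prometheus/alerting/ceph_alerts.yml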

Comment 7 Nizamudeen 2022-09-23 06:45:43 UTC
*** Bug 2115668 has been marked as a duplicate of this bug. ***

Comment 18 errata-xmlrpc 2023-03-20 18:58:05 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 6.0 Bug Fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:1360