Bug 2125433 - [ceph-Dashboard] RHCS 6.0 - Ceph node Network Packet drop alerts are seen frequently in RHCS 6.0 dashboard
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Dashboard
Version: 6.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 6.0
Assignee: Aashish sharma
QA Contact: Preethi
Docs Contact: Eliska
URL:
Whiteboard:
Duplicates: 2115668
Depends On:
Blocks: 2126050
Reported: 2022-09-09 02:44 UTC by Preethi
Modified: 2023-03-20 18:58 UTC (History)

Fixed In Version: ceph-17.2.3-30.el9cp
Doc Type: Bug Fix
Doc Text:
.Ceph node Network Packet drop alerts are shown appropriately on the Ceph dashboard
Previously, there was an issue in the query behind the Ceph node Network Packet drop alerts. As a consequence, those alerts were shown frequently on the Ceph Dashboard. With this fix, the related query no longer causes the issue, and Ceph node Network Packet drop alerts are shown appropriately.
Clone Of:
Environment:
Last Closed: 2023-03-20 18:58:05 UTC
Embargoed:


Attachments
mgr snippet logs (43.28 KB, text/plain)
2022-09-09 02:44 UTC, Preethi


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | ceph ceph pull 47707 | 0 | None | Merged | Ceph-mixin: Fix CephNodeNetworkPacket alerts | 2022-09-14 06:03:46 UTC
Red Hat Issue Tracker | RHCEPH-5229 | 0 | None | None | None | 2022-09-09 02:53:15 UTC
Red Hat Issue Tracker | RHCSDASH-827 | 0 | None | None | None | 2022-09-09 02:54:18 UTC
Red Hat Product Errata | RHBA-2023:1360 | 0 | None | None | None | 2023-03-20 18:58:40 UTC

Description Preethi 2022-09-09 02:44:43 UTC
Created attachment 1910585 [details]
mgr snippet logs

Description of problem:
[ceph-Dashboard] RHCS 6.0 - Ceph node Network Packet drop alerts are seen frequently in the RHCS 6.0 dashboard

Version-Release number of selected component (if applicable):
ceph version 17.2.3-21.el9cp (c988b360f12cbb4bc1c80ee7a9771814bc0f49d6) quincy 

How reproducible:


Steps to Reproduce:
1. Start with a 5.2 cluster that has 250+ images across pools and is 60+ filled, then upgrade it to 6.0 (the OS is upgraded from 8.6 to 9.0 before the Ceph upgrade to 6.0).
2. Once the upgrade succeeds, check the alerts section of the dashboard.
3. Observe the behaviour.
Continuous popups reporting "Network packet drop alerts" appear in RED in the alerts section.

NOTE: Cluster health status was good and there were no significant network-related errors in the ceph status output.
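
One way to cross-check the note above is to compare drop counters against packet counters on the affected node. The following is a minimal sketch, not the alert query shipped with Ceph: the `NicSample` type, the function names, and both thresholds are hypothetical and chosen only for illustration. It flags an interface only when the drop ratio and the absolute number of drops are both above a limit, so that a handful of drops on a busy link does not count as significant.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class NicSample:
    """Cumulative interface counters read at one point in time."""
    rx_packets: int
    tx_packets: int
    rx_dropped: int
    tx_dropped: int


def drop_stats(earlier: NicSample, later: NicSample) -> Tuple[int, float]:
    """Return (dropped packets, drop ratio) between two samples."""
    dropped = (later.rx_dropped - earlier.rx_dropped) + (
        later.tx_dropped - earlier.tx_dropped
    )
    packets = (later.rx_packets - earlier.rx_packets) + (
        later.tx_packets - earlier.tx_packets
    )
    # Avoid dividing by zero on an idle interface.
    ratio = dropped / packets if packets > 0 else 0.0
    return dropped, ratio


def looks_significant(
    earlier: NicSample,
    later: NicSample,
    min_ratio: float = 0.005,  # hypothetical threshold
    min_dropped: int = 10,     # hypothetical threshold
) -> bool:
    """True only if both the ratio and the absolute drop count are high."""
    dropped, ratio = drop_stats(earlier, later)
    return ratio >= min_ratio and dropped >= min_dropped
```

The counters could be taken, for example, from two reads of the interface statistics a minute apart; where the values come from is left open here.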

Actual results:
We are seeing continuous network packet drop errors in the alerts section of the Ceph dashboard.

Expected results:

We should not see any network alerts about packet drops.

Additional info:

magna021
https://10.8.128.22:8443/#/login
admin123/myPassword

Snippet of mgr logs:
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/_ping): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/_ping): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/_ping): [errno: 111] Connection refused
::ffff:10.8.128.21 - - [08/Sep/2022:07:43:05] "GET /metrics HTTP/1.1" 200 1042443 "" "Prometheus/2.33.4"
::ffff:10.8.128.21 - - [08/Sep/2022:07:43:15] "GET /metrics HTTP/1.1" 200 1042442 "" "Prometheus/2.33.4"
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/_ping): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/_ping): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/_ping): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/_ping): [errno: 111] Connection refused
::ffff:10.8.128.21 - - [08/Sep/2022:07:43:25] "GET /metrics HTTP/1.1" 200 1042442 "" "Prometheus/2.33.4"
Dashboard Exception
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/dashboard/services/exception.py", line 47, in dashboard_exception_handler
    return handler(*args, **kwargs)
  File "/lib/python3.9/site-packages/cherrypy/_cpdispatch.py", line 54, in __call__
    return self.callable(*self.args, **self.kwargs)
  File "/usr/share/ceph/mgr/dashboard/controllers/_base_controller.py", line 258, in inner
    ret = func(*args, **kwargs)
  File "/usr/share/ceph/mgr/dashboard/controllers/_rest_controller.py", line 191, in wrapper
    return func(*vpath, **params)
  File "/lib64/python3.9/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/lib64/python3.9/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/usr/share/ceph/mgr/dashboard/controllers/rbd.py", line 105, in list
    return self._rbd_list(pool_name)
  File "/usr/share/ceph/mgr/dashboard/controllers/rbd.py", line 90, in _rbd_list
    status, value = RbdService.rbd_pool_list(pool)
  File "/usr/share/ceph/mgr/dashboard/tools.py", line 245, in wrapper
    return rvc.run(fn, args, kwargs)
  File "/usr/share/ceph/mgr/dashboard/tools.py", line 233, in run
    raise ViewCacheNoDataException()
dashboard.exceptions.ViewCacheNoDataException: ViewCache: unable to retrieve data
::ffff:10.8.128.21 - - [08/Sep/2022:07:43:35] "GET /metrics HTTP/1.1" 200 1042631 "" "Prometheus/2.33.4"
Error while calling fn=<function RbdService.rbd_pool_list at 0x7ff9cd3f7790> ex=[errno 19] error generating diff from snapshot None
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/dashboard/tools.py", line 147, in run
    val = self.fn(*self.args, **self.kwargs)
  File "/usr/share/ceph/mgr/dashboard/services/rbd.py", line 421, in rbd_pool_list
    stat = cls._rbd_image_stat(
  File "/usr/share/ceph/mgr/dashboard/services/rbd.py", line 386, in _rbd_image_stat
    return cls._rbd_image(ioctx, pool_name, namespace, image_name)
  File "/usr/share/ceph/mgr/dashboard/services/rbd.py", line 360, in _rbd_image
    total_prov_bytes, snaps_prov_bytes = cls._rbd_disk_usage(
  File "/usr/share/ceph/mgr/dashboard/services/rbd.py", line 256, in _rbd_disk_usage
    image.diff_iterate(0, size, prev_snap, du_callb,
  File "rbd.pyx", line 2770, in rbd.requires_not_closed.wrapper
  File "rbd.pyx", line 3925, in rbd.Image.diff_iterate
rbd.OSError: [errno 19] error generating diff from snapshot None
::ffff:10.8.128.21 - - [08/Sep/2022:07:43:45] "GET /metrics HTTP/1.1" 200 1042626 "" "Prometheus/2.33.4"
::ffff:10.8.128.21 - - [08/Sep/2022:07:43:55] "GET /metrics HTTP/1.1" 200 1042626 "" "Prometheus/2.33.4"
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/_ping): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/_ping): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/_ping): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/_ping): [errno: 111] Connection refused
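
The snippet above is dominated by repeated iSCSI connection errors. A small sketch like the following could condense such a log into a count per gateway endpoint; the function name `count_iscsi_failures` is hypothetical, and the pattern is written only against the message format shown in this snippet.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Matches lines such as:
# iscsi REST API failed GET, connection error (url=http://host:5000/api/_ping): [errno: 111] ...
ISCSI_ERROR = re.compile(
    r"iscsi REST API failed GET, connection error "
    r"\(url=(?P<url>\S+)\): \[errno: (?P<errno>\d+)\]"
)


def count_iscsi_failures(log_text: str) -> Counter:
    """Count iSCSI REST API connection errors per host:port endpoint."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        match = ISCSI_ERROR.search(line)
        if match:
            # netloc is the "host:port" part of the URL.
            counts[urlsplit(match.group("url")).netloc] += 1
    return counts
```

Lines that do not match, such as the Prometheus `GET /metrics` requests and the tracebacks, are ignored.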


ceph status:
[root@magna021 pnataraj]# ceph status
  cluster:
    id:     c8ce6d50-c0a1-11ec-a99b-002590fc2a2e
    health: HEALTH_WARN
            2 failed cephadm daemon(s)
 
  services:
    mon:        5 daemons, quorum magna021,magna022,magna024,magna025,magna026 (age 20h)
    mgr:        magna022.icxgsh(active, since 20h), standbys: magna021.syfuos
    osd:        52 osds: 52 up (since 19h), 52 in (since 2d)
    rbd-mirror: 1 daemon active (1 hosts)
 
  data:
    pools:   17 pools, 1569 pgs
    objects: 2.39M objects, 9.1 TiB
    usage:   27 TiB used, 20 TiB / 48 TiB avail
    pgs:     1569 active+clean
 
  io:
    client:   520 KiB/s rd, 34 KiB/s wr, 623 op/s rd, 64 op/s wr
 
[root@magna021 pnataraj]# ceph health detail
HEALTH_WARN 2 failed cephadm daemon(s)
[WRN] CEPHADM_FAILED_DAEMON: 2 failed cephadm daemon(s)
    daemon iscsi.test.plena001.konnne on plena001 is in error state
    daemon iscsi.test.plena002.wgcgle on plena002 is in error state
[root@magna021 pnataraj]#

Comment 6 Preethi 2022-09-21 07:08:32 UTC
The issue is not seen in the latest version of RHCS 6.0. Screenshot attached for reference.

[ceph: root@magna021 /]# ceph versions
{
    "mon": {
        "ceph version 17.2.3-36.el9cp (962ea0ecffbdeffd7352299549353cc6f71aa979) quincy (stable)": 5
    },
    "mgr": {
        "ceph version 17.2.3-36.el9cp (962ea0ecffbdeffd7352299549353cc6f71aa979) quincy (stable)": 2
    },
    "osd": {
        "ceph version 17.2.3-36.el9cp (962ea0ecffbdeffd7352299549353cc6f71aa979) quincy (stable)": 52
    },
    "mds": {},
    "rbd-mirror": {
        "ceph version 17.2.3-36.el9cp (962ea0ecffbdeffd7352299549353cc6f71aa979) quincy (stable)": 1
    },
    "overall": {
        "ceph version 17.2.3-36.el9cp (962ea0ecffbdeffd7352299549353cc6f71aa979) quincy (stable)": 60
    }
}
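
When verifying on other clusters, the `ceph versions` output can be compared against the Fixed In Version from this bug (ceph-17.2.3-30.el9cp). The following is a rough sketch under the assumption that version strings follow the `X.Y.Z-N` pattern seen above; the helper names `parse_build` and `daemons_older_than` are hypothetical.

```python
import json
import re
from typing import List, Tuple

# Extracts e.g. (17, 2, 3, 36) from "ceph version 17.2.3-36.el9cp (...)".
VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)-(\d+)")


def parse_build(text: str) -> Tuple[int, ...]:
    """Return the numeric version and release fields found in text."""
    match = VERSION_RE.search(text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def daemons_older_than(versions_json: str, fixed_in: str) -> List[Tuple[str, str, int]]:
    """List (daemon type, version string, count) for builds older than fixed_in."""
    fixed = parse_build(fixed_in)
    report = json.loads(versions_json)
    stale = []
    for daemon_type, builds in report.items():
        if daemon_type == "overall":
            continue  # summary of the per-daemon sections
        for version_string, count in builds.items():
            if parse_build(version_string) < fixed:
                stale.append((daemon_type, version_string, count))
    return stale
```

An empty result means every reported daemon is at or above the given build. The JSON text could be captured from `ceph versions` and passed in together with `"ceph-17.2.3-30.el9cp"`.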

Comment 7 Nizamudeen 2022-09-23 06:45:43 UTC
*** Bug 2115668 has been marked as a duplicate of this bug. ***

Comment 18 errata-xmlrpc 2023-03-20 18:58:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 6.0 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:1360

