Created attachment 1910585 [details]
mgr snippet logs

Description of problem:
[ceph-Dashboard] RHCS 6.0 - Ceph node Network Packet drop alerts are seen frequently in RHCS 6.0 dashboard

Version-Release number of selected component (if applicable):
ceph version 17.2.3-21.el9cp (c988b360f12cbb4bc1c80ee7a9771814bc0f49d6) quincy

How reproducible:

Steps to Reproduce:
1. Have a cluster with 5.2 and perform upgrade to 6.0 with 250+images across pools cluster with 60+ filled
   (OS is upgraded from 8.6 to 9.0 before performing ceph upgrade to 6.0)
2. once upgrade is successful check the dashboard alerts section
3. Observe the behaviour

We see continuous popups throwing "Network packet drop alerts" in RED in alert sections

NOTE: cluster health status was good and there was no significant network related errors in the ceph status

Actual results:
We are seeing continuous network packet drop errors in the ceph dashboard alert section

Expected results:
We should not see any network related alerts with packet drops

Additional info:
magna021
https://10.8.128.22:8443/#/login
admin123/myPassword

Snippet of mgr logs:

iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/_ping): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/_ping): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/_ping): [errno: 111] Connection refused
::ffff:10.8.128.21 - - [08/Sep/2022:07:43:05] "GET /metrics HTTP/1.1" 200 1042443 "" "Prometheus/2.33.4"
::ffff:10.8.128.21 - - [08/Sep/2022:07:43:15] "GET /metrics HTTP/1.1" 200 1042442 "" "Prometheus/2.33.4"
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/_ping): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/_ping): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/_ping): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/_ping): [errno: 111] Connection refused
::ffff:10.8.128.21 - - [08/Sep/2022:07:43:25] "GET /metrics HTTP/1.1" 200 1042442 "" "Prometheus/2.33.4"
Dashboard Exception
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/dashboard/services/exception.py", line 47, in dashboard_exception_handler
    return handler(*args, **kwargs)
  File "/lib/python3.9/site-packages/cherrypy/_cpdispatch.py", line 54, in __call__
    return self.callable(*self.args, **self.kwargs)
  File "/usr/share/ceph/mgr/dashboard/controllers/_base_controller.py", line 258, in inner
    ret = func(*args, **kwargs)
  File "/usr/share/ceph/mgr/dashboard/controllers/_rest_controller.py", line 191, in wrapper
    return func(*vpath, **params)
  File "/lib64/python3.9/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/lib64/python3.9/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/usr/share/ceph/mgr/dashboard/controllers/rbd.py", line 105, in list
    return self._rbd_list(pool_name)
  File "/usr/share/ceph/mgr/dashboard/controllers/rbd.py", line 90, in _rbd_list
    status, value = RbdService.rbd_pool_list(pool)
  File "/usr/share/ceph/mgr/dashboard/tools.py", line 245, in wrapper
    return rvc.run(fn, args, kwargs)
  File "/usr/share/ceph/mgr/dashboard/tools.py", line 233, in run
    raise ViewCacheNoDataException()
dashboard.exceptions.ViewCacheNoDataException: ViewCache: unable to retrieve data
::ffff:10.8.128.21 - - [08/Sep/2022:07:43:35] "GET /metrics HTTP/1.1" 200 1042631 "" "Prometheus/2.33.4"
Error while calling fn=<function RbdService.rbd_pool_list at 0x7ff9cd3f7790> ex=[errno 19] error generating diff from snapshot None
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/dashboard/tools.py", line 147, in run
    val = self.fn(*self.args, **self.kwargs)
  File "/usr/share/ceph/mgr/dashboard/services/rbd.py", line 421, in rbd_pool_list
    stat = cls._rbd_image_stat(
  File "/usr/share/ceph/mgr/dashboard/services/rbd.py", line 386, in _rbd_image_stat
    return cls._rbd_image(ioctx, pool_name, namespace, image_name)
  File "/usr/share/ceph/mgr/dashboard/services/rbd.py", line 360, in _rbd_image
    total_prov_bytes, snaps_prov_bytes = cls._rbd_disk_usage(
  File "/usr/share/ceph/mgr/dashboard/services/rbd.py", line 256, in _rbd_disk_usage
    image.diff_iterate(0, size, prev_snap, du_callb,
  File "rbd.pyx", line 2770, in rbd.requires_not_closed.wrapper
  File "rbd.pyx", line 3925, in rbd.Image.diff_iterate
rbd.OSError: [errno 19] error generating diff from snapshot None
::ffff:10.8.128.21 - - [08/Sep/2022:07:43:45] "GET /metrics HTTP/1.1" 200 1042626 "" "Prometheus/2.33.4"
::ffff:10.8.128.21 - - [08/Sep/2022:07:43:55] "GET /metrics HTTP/1.1" 200 1042626 "" "Prometheus/2.33.4"
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/_ping): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/_ping): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/_ping): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused
iscsi REST API failed GET, connection error (url=http://10.1.172.2:5000/api/_ping): [errno: 111] Connection refused

ceph status-

[root@magna021 pnataraj]# ceph status
  cluster:
    id:     c8ce6d50-c0a1-11ec-a99b-002590fc2a2e
    health: HEALTH_WARN
            2 failed cephadm daemon(s)

  services:
    mon:        5 daemons, quorum magna021,magna022,magna024,magna025,magna026 (age 20h)
    mgr:        magna022.icxgsh(active, since 20h), standbys: magna021.syfuos
    osd:        52 osds: 52 up (since 19h), 52 in (since 2d)
    rbd-mirror: 1 daemon active (1 hosts)

  data:
    pools:   17 pools, 1569 pgs
    objects: 2.39M objects, 9.1 TiB
    usage:   27 TiB used, 20 TiB / 48 TiB avail
    pgs:     1569 active+clean

  io:
    client:   520 KiB/s rd, 34 KiB/s wr, 623 op/s rd, 64 op/s wr

[root@magna021 pnataraj]# ceph health detail
HEALTH_WARN 2 failed cephadm daemon(s)
[WRN] CEPHADM_FAILED_DAEMON: 2 failed cephadm daemon(s)
    daemon iscsi.test.plena001.konnne on plena001 is in error state
    daemon iscsi.test.plena002.wgcgle on plena002 is in error state
[root@magna021 pnataraj]#
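
For triage, the repeated iscsi "Connection refused" lines in the mgr snippet can be tallied per gateway endpoint. The following is only a rough sketch, not part of any Ceph tooling: the function name tally_iscsi_errors and the regular expression are assumptions based on the log format shown above, and the sample text is three lines copied from the snippet.

```python
import re
from collections import Counter

# Assumed pattern, derived only from the mgr log lines quoted in this report.
ISCSI_ERR = re.compile(
    r"iscsi REST API failed GET, connection error\s+"
    r"\(url=(?P<url>\S+?)\): \[errno: (?P<errno>\d+)\]"
)


def tally_iscsi_errors(log_text):
    """Count iscsi REST API connection errors per scheme://host:port."""
    counts = Counter()
    for match in ISCSI_ERR.finditer(log_text):
        # Drop the path so /api/_ping and /api/sysinfo/hostname are merged.
        endpoint = re.match(r"https?://[^/]+", match.group("url")).group(0)
        counts[endpoint] += 1
    return counts


# Three lines taken from the snippet above.
sample = (
    "iscsi REST API failed GET, connection error "
    "(url=http://10.1.172.2:5000/api/sysinfo/hostname): [errno: 111] Connection refused\n"
    "iscsi REST API failed GET, connection error "
    "(url=http://10.1.172.2:5000/api/_ping): [errno: 111] Connection refused\n"
    "iscsi REST API failed GET, connection error "
    "(url=http://10.1.172.1:5000/api/sysinfo/hostname): [errno: 111] Connection refused\n"
)

for endpoint, count in sorted(tally_iscsi_errors(sample).items()):
    print(endpoint, count)
```

In the snippet, both refused endpoints (10.1.172.1:5000 and 10.1.172.2:5000) show up repeatedly, and ceph health detail reports two iscsi daemons in error state. Whether these are related to the packet drop alerts is not established in this report.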
The issue is not seen in the latest version of RHCS 6.0. Attached screenshot for reference.

[ceph: root@magna021 /]# ceph versions
{
    "mon": {
        "ceph version 17.2.3-36.el9cp (962ea0ecffbdeffd7352299549353cc6f71aa979) quincy (stable)": 5
    },
    "mgr": {
        "ceph version 17.2.3-36.el9cp (962ea0ecffbdeffd7352299549353cc6f71aa979) quincy (stable)": 2
    },
    "osd": {
        "ceph version 17.2.3-36.el9cp (962ea0ecffbdeffd7352299549353cc6f71aa979) quincy (stable)": 52
    },
    "mds": {},
    "rbd-mirror": {
        "ceph version 17.2.3-36.el9cp (962ea0ecffbdeffd7352299549353cc6f71aa979) quincy (stable)": 1
    },
    "overall": {
        "ceph version 17.2.3-36.el9cp (962ea0ecffbdeffd7352299549353cc6f71aa979) quincy (stable)": 60
    }
}
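
When verifying a fix after an upgrade, it can help to confirm that every daemon reports the same build. A minimal sketch, assuming the JSON layout of the ceph versions output shown above; the helper name summarize_versions is hypothetical, and the sample is abbreviated to the per-type counts from this comment.

```python
import json


def summarize_versions(versions_json):
    """Return (distinct version strings, total daemon count) from 'ceph versions' JSON."""
    data = json.loads(versions_json)
    overall = data.get("overall", {})
    return set(overall), sum(overall.values())


BUILD = ("ceph version 17.2.3-36.el9cp "
         "(962ea0ecffbdeffd7352299549353cc6f71aa979) quincy (stable)")

# Same shape and counts as the output in this comment.
sample = json.dumps({
    "mon": {BUILD: 5},
    "mgr": {BUILD: 2},
    "osd": {BUILD: 52},
    "mds": {},
    "rbd-mirror": {BUILD: 1},
    "overall": {BUILD: 60},
})

versions, total = summarize_versions(sample)
if len(versions) == 1:
    print("all", total, "daemons run", next(iter(versions)))
else:
    print("mixed versions:", sorted(versions))
```

More than one key under "overall" would indicate that some daemons are still running an older build.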
*** Bug 2115668 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 6.0 Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2023:1360