Bug 2230801
| Summary: | Unable to get "dump_osd_network" output via mgr admin socket | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Pawan <pdhiran> |
| Component: | RADOS | Assignee: | Radoslaw Zarzynski <rzarzyns> |
| Status: | CLOSED NOTABUG | QA Contact: | Pawan <pdhiran> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 6.1 | CC: | bhubbard, ceph-eng-bugs, cephqe-warriors, nojha, vumrao |
| Target Milestone: | --- | | |
| Target Release: | 7.1 | | |
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-08-11 16:43:56 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description of problem:

We observed slow heartbeats on the front and back interfaces of the OSDs, and a health warning was generated for them:

```
[WARN] OSD_SLOW_PING_TIME_BACK: Slow OSD heartbeats on back (longest 3477.396ms)
    Slow OSD heartbeats on back from osd.4 [] to osd.11 [] 3477.396 msec
    Slow OSD heartbeats on back from osd.4 [] to osd.8 [] 3470.730 msec
    Slow OSD heartbeats on back from osd.13 [] to osd.8 [] 3229.386 msec
    Slow OSD heartbeats on back from osd.7 [] to osd.11 [] 3019.577 msec
    Slow OSD heartbeats on back from osd.7 [] to osd.8 [] 3012.205 msec
    Slow OSD heartbeats on back from osd.7 [] to osd.2 [] 2450.898 msec
    Slow OSD heartbeats on back from osd.7 [] to osd.5 [] 2450.715 msec
    Slow OSD heartbeats on back from osd.7 [] to osd.14 [] 2436.617 msec
    Slow OSD heartbeats on back from osd.13 [] to osd.14 [] 1833.005 msec
    Slow OSD heartbeats on back from osd.13 [] to osd.2 [] 1832.006 msec
    Truncated long network list. Use ceph daemon mgr.# dump_osd_network for more information
[WARN] OSD_SLOW_PING_TIME_FRONT: Slow OSD heartbeats on front (longest 3019.537ms)
    Slow OSD heartbeats on front from osd.7 [] to osd.11 [] 3019.537 msec
    Slow OSD heartbeats on front from osd.7 [] to osd.8 [] 3014.470 msec
    Slow OSD heartbeats on front from osd.7 [] to osd.14 [] 2451.640 msec
    Slow OSD heartbeats on front from osd.7 [] to osd.5 [] 2450.600 msec
    Slow OSD heartbeats on front from osd.7 [] to osd.2 [] 2438.592 msec
    Slow OSD heartbeats on front from osd.13 [] to osd.2 [] 1826.537 msec
    Slow OSD heartbeats on front from osd.13 [] to osd.5 [] 1826.496 msec
    Slow OSD heartbeats on front from osd.13 [] to osd.11 [] 1820.281 msec
    Slow OSD heartbeats on front from osd.13 [] to osd.14 [] 1819.868 msec
    Slow OSD heartbeats on front from osd.13 [] to osd.8 [] 1816.324 msec
    Truncated long network list. Use ceph daemon mgr.# dump_osd_network for more information
```

Following the hint in the health warning, we tried to get the "dump_osd_network" output as suggested, but the command fails with "invalid command":

```
# ceph daemon /var/run/ceph/66070a80-2f84-11ee-bc2c-0cc47af3ea56/ceph-mgr.argo012.odttqx.asok dump_osd_network
no valid command found; 10 closest matches:
0
1
2
abort
assert
config diff
config diff get <var>
config get <var>
config help [<var>]
config set <var> <val>...
admin_socket: invalid command
```

We also ran the command as specified in the document referenced below, but with no luck:

```
# ceph daemon /var/run/ceph/66070a80-2f84-11ee-bc2c-0cc47af3ea56/ceph-mgr.argo012.odttqx.asok dump_osd_network 0
no valid command found; 10 closest matches:
0
1
2
abort
assert
config diff
config diff get <var>
config get <var>
config help [<var>]
config set <var> <val>...
admin_socket: invalid command
```

I am connected to the correct admin socket, as I am getting output for other commands:

```
# ceph daemon /var/run/ceph/66070a80-2f84-11ee-bc2c-0cc47af3ea56/ceph-mgr.argo012.odttqx.asok dump_cache
{
    "cache": []
}
```

Trying the same command against an OSD admin socket works:

```
# ceph daemon /var/run/ceph/66070a80-2f84-11ee-bc2c-0cc47af3ea56/ceph-osd.23.asok dump_osd_network
{
    "threshold": 1000,
    "entries": []
}
```

The upstream guide says: "This command is usually sent to a Ceph Manager Daemon, but it can be used to collect information about a specific OSD's interactions by sending it to that OSD." Even though the guide says the command works on both the MGR and the OSDs, I am observing it to work only on OSDs.

Reference: https://docs.ceph.com/en/latest/rados/operations/monitoring/#network-performance-checks
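To check whether the command is registered on the mgr socket at all (as opposed to being mis-invoked), the admin socket's built-in `help` command lists every command the socket accepts. A minimal diagnostic sketch, assuming the same socket paths as in the outputs above:

```sh
# List every command registered on this mgr admin socket; if
# dump_osd_network does not appear, the socket does not register it.
ceph daemon /var/run/ceph/66070a80-2f84-11ee-bc2c-0cc47af3ea56/ceph-mgr.argo012.odttqx.asok help

# The same check against an OSD socket, where the command is known to work:
ceph daemon /var/run/ceph/66070a80-2f84-11ee-bc2c-0cc47af3ea56/ceph-osd.23.asok help | grep -i dump_osd_network
```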
Questions:
1. Are we running the command with the wrong options? If yes, can you please share the correct usage for the command?
2. Has support for this command on the mgr been revoked? If yes, the guides and the health warning text should be updated.

Version-Release number of selected component (if applicable):

```
# ceph version
ceph version 17.2.6-100.el9cp (ea4e3ef8df2cf26540aae06479df031dcfc80343) quincy (stable)
```

How reproducible:
Always

Steps to Reproduce:
1. Deploy an RHCS 6.1 cluster and upgrade it to 6.1z1.
2. Observe slow heartbeat warnings post upgrade.
3. Try running the "dump_osd_network" command against the mgr admin socket to get details.
4. The command errors out.

Actual results:
The command errors out stating "invalid command".

Expected results:
The command runs as expected and returns the OSD network report.

Additional info:
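One possible workaround, given that the command does answer on the per-OSD sockets: gather the report host by host by iterating over the OSD admin sockets. This is a sketch under the assumption that the per-OSD socket naming matches the paths shown above; a threshold of 0 requests all tracked ping times rather than only those above the default 1000 ms, per the referenced upstream doc.

```sh
#!/bin/sh
# Workaround sketch: collect dump_osd_network from every OSD admin
# socket on this host, since the mgr socket rejects the command here.
# FSID matches the cluster fsid in the socket paths shown above.
FSID=66070a80-2f84-11ee-bc2c-0cc47af3ea56

for sock in /var/run/ceph/"$FSID"/ceph-osd.*.asok; do
    echo "== $sock =="
    # Threshold of 0 ms dumps all tracked ping times, not only the
    # ones exceeding the default 1000 ms warning threshold.
    ceph daemon "$sock" dump_osd_network 0
done
```

If per-host socket access is inconvenient, `ceph tell osd.<id> dump_osd_network` may be an alternative, since recent Ceph releases expose many admin-socket commands via tell as well; treat that as an assumption to verify on this build.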