Bug 2276862
| Summary: | [CephFS-Mirror] - Traceback error seen while running - "ceph fs snapshot mirror daemon status" | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Hemanth Kumar <hyelloji> |
| Component: | CephFS | Assignee: | Jos Collin <jcollin> |
| Status: | CLOSED ERRATA | QA Contact: | Hemanth Kumar <hyelloji> |
| Severity: | high | Docs Contact: | Akash Raj <akraj> |
| Priority: | unspecified | | |
| Version: | 7.1 | CC: | akraj, ceph-eng-bugs, cephqe-warriors, jcollin, tserlin, vshankar |
| Target Milestone: | --- | | |
| Target Release: | 7.1z1 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | ceph-18.2.1-205.el9cp | Doc Type: | Bug Fix |
| Doc Text: | (see below) | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2024-08-07 11:21:52 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |

Doc Text:

Previously, the `directory_count` key was intermittently missing from the `self.mgr.get_daemon_status()` output JSON when a delay in `m_listener.handle_mirroring_enabled()` postponed the `directory_count` update. In that case, `ServiceDaemon::update_status()` created JSON without the `directory_count` key/value. The issue occurred intermittently when mirroring was disabled and re-enabled repeatedly and 'daemon status' was checked in between; `ceph fs snapshot mirror daemon status` then failed with `KeyError: 'directory_count'`.

With this fix, `daemon_status()` sets a default value of 0 for `directory_count`, and the key error in `ceph fs snapshot mirror daemon status` no longer occurs.
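The fix described above boils down to a defaulted lookup. A minimal, illustrative sketch follows; the function shape and variable names are assumptions (the real code lives in the mgr `mirroring` module), not the actual Ceph source:

```python
import json

# Illustrative stand-in for the mgr mirroring module's daemon_status();
# names and structure here are assumptions, not the Ceph source tree.
def daemon_status(raw_daemon_status):
    """Build the `ceph fs snapshot mirror daemon status` output.

    raw_daemon_status maps a daemon id to the raw status blob from
    self.mgr.get_daemon_status(); per the mgr logs in the comments
    below, that blob can still lack 'directory_count' right after
    mirroring is (re-)enabled.
    """
    result = []
    for daemon_id, blob in raw_daemon_status.items():
        filesystems = []
        for fs_id, fs_desc in json.loads(blob).items():
            filesystems.append({
                'filesystem_id': int(fs_id),
                'name': fs_desc['name'],
                # The fix: default to 0 rather than indexing
                # fs_desc['directory_count'], which raised the KeyError
                # when the key was not yet populated.
                'directory_count': fs_desc.get('directory_count', 0),
                'peers': fs_desc.get('peers', {}),
            })
        result.append({'daemon_id': daemon_id, 'filesystems': filesystems})
    return result
```

Fed the failing blob from comment 5 below (`{"1":{"name":"a","peers":{}}}`), this returns `directory_count: 0` instead of raising `KeyError: 'directory_count'`.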
Comment 1
Jos Collin
2024-04-24 15:39:26 UTC
Hemanth,

Could you please attach the mirror logs and mgr logs to check?

Thanks.

Comment 2
Hemanth Kumar

(In reply to Jos Collin from comment #1)
> Could you please attach the mirror logs and mgr logs to check?

All requested logs are uploaded here -- http://magna002.ceph.redhat.com/ceph-qe-logs/hemanth_k/Bug_2276862/

Comment 3
Jos Collin

'ceph fs snapshot mirror daemon status' didn't show the KeyError for me when following the steps from the mgr logs. It was still showing:

    [{"daemon_id": 4161, "filesystems": [{"filesystem_id": 1, "name": "a", "directory_count": 2, "peers": [{"uuid": "f2b6795b-7d35-4ec0-8507-0713676fae2b", "remote": {"client_name": "client.mirror_remote", "cluster_name": "ceph", "fs_name": "remotefs"}, "stats": {"failure_count": 0, "recovery_count": 0}}]}]}]

So I couldn't reproduce this issue; it appears to be intermittent. @Hemanth, could you please check and provide reproduction steps for hitting the KeyError, so that I can fix what's causing the issue instead of adding a workaround?

Comment 4

(In reply to Hemanth Kumar from comment #2)
> All requested logs are uploaded here --
> http://magna002.ceph.redhat.com/ceph-qe-logs/hemanth_k/Bug_2276862/

`SERVICE_DAEMON_DIR_COUNT_KEY` is only updated when a directory is added. See FSMirror::handle_acquire_directory(). Jos, please try to reproduce on a fresh cluster without any dirs added.

Comment 5
Jos Collin

It's not always reproducible; it's intermittent. But when it errored, self.mgr.get_daemon_status() returned the JSON below. When 'daemon status' was run again after some time, the KeyError was gone.

    mgr.x.log:2024-05-03T18:34:57.208+0530 7f18d3a046c0 0 [mirroring DEBUG mirroring.fs.snapshot_mirror] daemon_status: {'status_json': '{}'}
    mgr.x.log:2024-05-03T18:35:05.077+0530 7f18d3a046c0 0 [mirroring DEBUG mirroring.fs.snapshot_mirror] daemon_status: {'status_json': '{"1":{"name":"a","peers":{}}}'}

Comment 6

(In reply to Jos Collin from comment #5)
> It's not always reproducible; it's intermittent. But when it errored,
> self.mgr.get_daemon_status() returned the JSON below.

Have you identified the case where the keys go missing from the JSON output? The fix is likely to be verifying that the key exists before accessing it, but I would still like to know under which circumstances the keys are missing from the daemon status JSON.

Comment 7

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 7.1 security and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:5080
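For reference, a hedged reproduction sketch of the trigger described in this thread: toggle mirroring and query daemon status in between. The filesystem name `a` is taken from the logs above; the attempt count, the sleep, and the assumption that the mgr-side traceback surfaces on the CLI's error output are all illustrative, not confirmed reproduction steps:

```python
import subprocess
import time

FS_NAME = 'a'  # filesystem name from the logs above; adjust for your cluster

def ceph(*args):
    """Run a ceph CLI command and return the completed process."""
    return subprocess.run(('ceph',) + args, capture_output=True, text=True)

# Toggle mirroring and query daemon status in between: per the doc text,
# the race window is between mirroring being (re-)enabled and
# m_listener.handle_mirroring_enabled() updating directory_count.
for attempt in range(50):
    ceph('fs', 'snapshot', 'mirror', 'disable', FS_NAME)
    ceph('fs', 'snapshot', 'mirror', 'enable', FS_NAME)
    status = ceph('fs', 'snapshot', 'mirror', 'daemon', 'status')
    # Assumption: on affected builds the mgr-side KeyError is returned in
    # the command's error output; otherwise, check the mgr log for
    # daemon_status entries as in comment 5.
    if "KeyError: 'directory_count'" in status.stderr:
        print(f'reproduced on attempt {attempt}:')
        print(status.stderr)
        break
    time.sleep(1)
else:
    print('did not reproduce; the race is intermittent')
```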