Bug 2268179
| Summary: | ceph fs snap-schedule command is erroring with EIO: disk I/O error | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Amarnath <amk> |
| Component: | CephFS | Assignee: | Milind Changire <mchangir> |
| Status: | CLOSED ERRATA | QA Contact: | Hemanth Kumar <hyelloji> |
| Severity: | medium | Docs Contact: | Rivka Pollack <rpollack> |
| Priority: | unspecified | | |
| Version: | 7.1 | CC: | ceph-eng-bugs, cephqe-warriors, gfarnum, mchangir, ngangadh, rpollack, sumr, tserlin, vshankar |
| Target Milestone: | --- | Flags: | hyelloji: needinfo-, hyelloji: needinfo- |
| Target Release: | 7.1z5 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | ceph-18.2.1-335.el9cp | Doc Type: | Bug Fix |
| Doc Text: | .Improved Handling of fs_map Notifications After File-System Removal<br><br>Previously, after a file system was removed from the cluster, the fs_map notification about the change was not handled properly. This oversight caused the snap_schedule Manager Module to continue accessing the associated snap_schedule SQLite database in the metadata pool, which in turn resulted in disk I/O errors.<br><br>With this fix, all timers related to the file system are canceled and the SQLite database connection is closed after deletion, helping ensure that no invalid metadata pool references remain (see the sketch after this table).<br><br>NOTE: A small window still exists between file-system deletion and notification processing, during which a snapshot schedule could run for a recently deleted file system and occasionally report disk I/O errors in the Manager logs or at the console. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2025-06-23 02:51:40 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
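
The Doc Text above describes the fix in terms of fs_map notification handling: when a file system disappears from the fs_map, its schedule timers are canceled and its SQLite connection is closed so the module stops touching objects in a pool that no longer exists. The following is a minimal, illustrative sketch of that pattern only, not the actual snap_schedule patch; the class and attribute names (SnapScheduler, ScheduleState, timers), the in-memory SQLite database, and the fs_map field layout (taken from the shape of `ceph fs dump` output) are all assumptions made for the example.

```
# Illustrative sketch (not the real snap_schedule code): tear down per-FS state
# when an fs_map notification shows the file system has been removed.
import sqlite3
import threading
from typing import Dict, List


class ScheduleState:
    """Per-file-system state kept by a hypothetical scheduler."""

    def __init__(self) -> None:
        self.timers: List[threading.Timer] = []
        self.db: sqlite3.Connection = sqlite3.connect(':memory:')


class SnapScheduler:
    def __init__(self) -> None:
        self.state: Dict[str, ScheduleState] = {}

    def handle_fs_map(self, fs_map: dict) -> None:
        """Called whenever a new fs_map is received."""
        alive = {fs['mdsmap']['fs_name'] for fs in fs_map.get('filesystems', [])}
        for fs_name in list(self.state):
            if fs_name not in alive:
                self._teardown(fs_name)

    def _teardown(self, fs_name: str) -> None:
        state = self.state.pop(fs_name)
        for timer in state.timers:
            timer.cancel()          # stop pending snapshot-schedule callbacks
        state.db.close()            # drop the now-invalid DB connection
```

In the real module the schedule database lives in the file system's metadata pool, which is why keeping a stale connection after the file system (and its pools) is deleted surfaces to the client as EIO: disk I/O error.
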
Description
Amarnath
2024-03-06 14:08:56 UTC
Please specify the severity of this bug. Severity is defined here: https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

(In reply to Amarnath from comment #0)
> Description of problem:
> The ceph fs snap-schedule command is erroring with EIO: disk I/O error.
>
> As part of the test case we create an FS named cephfs_snap_1, enable
> snap-schedule (which works fine), and then delete the FS.
> http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-TCN23X/snap_schedule_test_0.log
>
> If we rerun the same test case on the same setup, we see the error below:
> [root@ceph-amk-nfs-n308a3-node7 ~]# ceph fs snap-schedule add /dir_kernel 1m --fs cephfs_snap_1
> Error EIO: disk I/O error
>
> Log:
> http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-RQWN0B/snap_schedule_test_0.log
>
> mgr log:
> 2024-03-06T14:05:53.442+0000 7fd68f399640 0 [rbd_support INFO root] MirrorSnapshotScheduleHandler: load_schedules
> 2024-03-06T14:05:53.454+0000 7fd68ab90640 0 [rbd_support INFO root] TrashPurgeScheduleHandler: load_schedules
> 2024-03-06T14:05:54.007+0000 7fd67b532640 0 [volumes INFO mgr_util] scanning for idle connections..
> 2024-03-06T14:05:54.007+0000 7fd67b532640 0 [volumes INFO mgr_util] cleaning up connections: []
> 2024-03-06T14:05:54.387+0000 7fd6b2d6e640 0 log_channel(audit) log [DBG] : from='client.25922 -' entity='client.admin' cmd=[{"prefix": "fs snap-schedule add", "path": "/dir_kernel", "snap_schedule": "1m", "fs": "cephfs_snap_1", "target": ["mon-mgr", ""]}]: dispatch
> 2024-03-06T14:05:54.389+0000 7fd67dd37640 -1 client.14706: SimpleRADOSStriper: lock: snap_db_v0.db: lock failed: (2) No such file or directory

That is the schedules database being loaded, where we do handle ENOENT:

```
with open_ioctx(self, pool_param) as ioctx:
    try:
        size, _mtime = ioctx.stat(SNAP_DB_OBJECT_NAME)
        dump = ioctx.read(SNAP_DB_OBJECT_NAME, size).decode('utf-8')
        db.executescript(dump)
        ioctx.remove_object(SNAP_DB_OBJECT_NAME)
    except rados.ObjectNotFound:
        log.debug(f'No legacy schedule DB found in {fs}')
```

Milind?

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 7.1 security and bug fix updates), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2025:9335
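
For reference, the ENOENT-tolerant load quoted above can be exercised outside the manager module with the plain librados Python bindings. This is a minimal, self-contained sketch, not the module's own open_ioctx() helper; the pool name is a placeholder assumption, the object name is taken from the mgr log for illustration, and an in-memory SQLite database stands in for the module's schedule DB.

```
# Standalone sketch of the ENOENT-tolerant legacy-DB load quoted above,
# using the librados Python bindings directly.
import sqlite3

import rados

POOL = 'cephfs.cephfs_snap_1.meta'      # assumption: metadata pool of the test FS
SNAP_DB_OBJECT_NAME = 'snap_db_v0.db'   # object name as seen in the mgr log

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ioctx = cluster.open_ioctx(POOL)
    db = sqlite3.connect(':memory:')    # stand-in for the module's schedule DB
    try:
        size, _mtime = ioctx.stat(SNAP_DB_OBJECT_NAME)
        dump = ioctx.read(SNAP_DB_OBJECT_NAME, size).decode('utf-8')
        db.executescript(dump)          # replay the dumped schema and data
        ioctx.remove_object(SNAP_DB_OBJECT_NAME)
    except rados.ObjectNotFound:
        # A missing legacy object (ENOENT) is expected and benign; per the doc
        # text, the EIO in this bug comes from a stale DB connection kept open
        # after the file system was deleted.
        print('no legacy schedule DB object found')
    finally:
        ioctx.close()
        db.close()
finally:
    cluster.shutdown()
```
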