
Bug 2370370

Summary: [8.x Backport] - ceph fs snap-schedule command is erroring with EIO: disk I/O error
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Hemanth Kumar <hyelloji>
Component: CephFS
Assignee: Milind Changire <mchangir>
Status: CLOSED ERRATA
QA Contact: Hemanth Kumar <hyelloji>
Severity: medium
Docs Contact: Rivka Pollack <rpollack>
Priority: unspecified
Version: 8.0
CC: ceph-eng-bugs, cephqe-warriors, gfarnum, mchangir, ngangadh, rpollack, vshankar
Target Milestone: ---
Target Release: 8.1z1
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ceph-19.2.1-229
Doc Type: Bug Fix
Doc Text:
.Improved handling of `fs_map` notifications after file system removal
Previously, after a Ceph File System (CephFS) was removed from the cluster, the `fs_map` notification about the change was not handled properly. This oversight caused the `snap_schedule` manager module to continue accessing the associated `snap_schedule` SQLite database in the metadata pool. As a result, disk I/O errors occurred. With this fix, all timers related to the CephFS are now canceled and the SQLite database connection is closed after deletion, helping ensure no invalid metadata pool references remain. (An illustrative sketch of this cleanup is shown after the field list below.)
NOTE: A small window still exists between CephFS deletion and notification processing, during which a snapshot schedule could run for a recently deleted CephFS and occasionally report disk I/O errors in the manager logs or at the console.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2025-08-18 14:02:03 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
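
For illustration, the following is a minimal sketch of the cleanup the Doc Text describes: on an fs_map notification, cancel the schedule timers and close the SQLite connection for any file system that no longer exists. This is a hypothetical sketch only; the class, method, and attribute names (SnapSchedState, delete_references, handle_fs_map_notify) are made up and do not mirror the actual snap_schedule module code.

from typing import Dict, List
import sqlite3
import threading


class SnapSchedState:
    """Illustrative per-mgr bookkeeping: one SQLite connection and timer set per CephFS."""

    def __init__(self) -> None:
        self.sqlite_connections: Dict[str, sqlite3.Connection] = {}
        self.timers: Dict[str, List[threading.Timer]] = {}

    def delete_references(self, fs_name: str) -> None:
        # Cancel pending schedule timers so no snapshot is attempted against
        # a file system (and metadata pool) that no longer exists.
        for timer in self.timers.pop(fs_name, []):
            timer.cancel()
        # Close the SQLite connection that was backed by the deleted metadata pool.
        conn = self.sqlite_connections.pop(fs_name, None)
        if conn is not None:
            conn.close()

    def handle_fs_map_notify(self, fs_map: dict) -> None:
        # On every fs_map notification, drop state for file systems that are
        # no longer present in the map.
        existing = {fs['mdsmap']['fs_name'] for fs in fs_map.get('filesystems', [])}
        for fs_name in list(self.sqlite_connections):
            if fs_name not in existing:
                self.delete_references(fs_name)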

Description Hemanth Kumar 2025-06-05 03:24:47 UTC
This bug was initially created as a copy of Bug #2268179

I am copying this bug because: The upstream tracker has backports created for all three releases: Quincy, Reef, and Squid.



Description of problem:
ceph fs snap-schedule command is erroring with EIO: disk I/O error

As part of the test case, we create an FS named cephfs_snap_1, enable snap-schedule on it (which works fine), and then delete the FS.
http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-TCN23X/snap_schedule_test_0.log

On the same setup, if we rerun the same test case, we see the error below:
[root@ceph-amk-nfs-n308a3-node7 ~]# ceph fs snap-schedule add /dir_kernel 1m --fs cephfs_snap_1
Error EIO: disk I/O error

Log : http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-RQWN0B/snap_schedule_test_0.log

mgr Log : 
2024-03-06T14:05:53.442+0000 7fd68f399640  0 [rbd_support INFO root] MirrorSnapshotScheduleHandler: load_schedules
2024-03-06T14:05:53.454+0000 7fd68ab90640  0 [rbd_support INFO root] TrashPurgeScheduleHandler: load_schedules
2024-03-06T14:05:54.007+0000 7fd67b532640  0 [volumes INFO mgr_util] scanning for idle connections..
2024-03-06T14:05:54.007+0000 7fd67b532640  0 [volumes INFO mgr_util] cleaning up connections: []
2024-03-06T14:05:54.387+0000 7fd6b2d6e640  0 log_channel(audit) log [DBG] : from='client.25922 -' entity='client.admin' cmd=[{"prefix": "fs snap-schedule add", "path": "/dir_kernel", "snap_schedule": "1m", "fs": "cephfs_snap_1", "target": ["mon-mgr", ""]}]: dispatch
2024-03-06T14:05:54.389+0000 7fd67dd37640 -1 client.14706: SimpleRADOSStriper: lock: snap_db_v0.db:  lock failed: (2) No such file or directory
2024-03-06T14:05:54.390+0000 7fd67dd37640 -1 mgr.server reply reply (5) Input/output error disk I/O error

Logs : http://magna002.ceph.redhat.com/ceph-qe-logs/amar/snap-scedule/ceph-mgr.ceph-amk-nfs-n308a3-node1-installer.ddwlwo.log 


Version-Release number of selected component (if applicable):
[root@ceph-amk-nfs-n308a3-node7 ~]# ceph versions
{
    "mon": {
        "ceph version 18.2.1-46.el9cp (141acb8d05e675ccf507f89585369b2c90c6d4a9) reef (stable)": 3
    },
    "mgr": {
        "ceph version 18.2.1-46.el9cp (141acb8d05e675ccf507f89585369b2c90c6d4a9) reef (stable)": 2
    },
    "osd": {
        "ceph version 18.2.1-46.el9cp (141acb8d05e675ccf507f89585369b2c90c6d4a9) reef (stable)": 12
    },
    "mds": {
        "ceph version 18.2.1-46.el9cp (141acb8d05e675ccf507f89585369b2c90c6d4a9) reef (stable)": 5
    },
    "overall": {
        "ceph version 18.2.1-46.el9cp (141acb8d05e675ccf507f89585369b2c90c6d4a9) reef (stable)": 22
    }
}


How reproducible:
Reproducible by rerunning the same test case on a setup where the FS was previously created, scheduled, and deleted.

Steps to Reproduce (reconstructed from the description above):
1. Create an FS (cephfs_snap_1) and add a snap-schedule on a directory; snapshots are taken as expected.
2. Delete the FS.
3. Recreate an FS with the same name and run "ceph fs snap-schedule add" again (see the CLI sketch below).
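
The steps above as a rough command-line sketch. Pool names here are placeholders, and mounting the FS to create /dir_kernel is omitted; this is an assumed sequence based on the description, not the exact test automation.

# 1. create pools and the FS, then add a snap-schedule (works fine)
ceph osd pool create cephfs_snap_1_meta
ceph osd pool create cephfs_snap_1_data
ceph fs new cephfs_snap_1 cephfs_snap_1_meta cephfs_snap_1_data
ceph fs snap-schedule add /dir_kernel 1m --fs cephfs_snap_1

# 2. delete the FS
ceph fs fail cephfs_snap_1
ceph fs rm cephfs_snap_1 --yes-i-really-mean-it

# 3. recreate an FS with the same name and add the schedule again
ceph osd pool create cephfs_snap_1_meta2
ceph osd pool create cephfs_snap_1_data2
ceph fs new cephfs_snap_1 cephfs_snap_1_meta2 cephfs_snap_1_data2
ceph fs snap-schedule add /dir_kernel 1m --fs cephfs_snap_1
# -> Error EIO: disk I/O error (before the fix)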

Actual results:
"ceph fs snap-schedule add" fails with "Error EIO: disk I/O error", and the mgr log reports "SimpleRADOSStriper: lock: snap_db_v0.db: lock failed: (2) No such file or directory".

Expected results:
The snap-schedule add command succeeds on the recreated FS; deleting an FS should not leave the snap_schedule module holding stale references to its metadata pool.

Additional info:

Comment 8 errata-xmlrpc 2025-08-18 14:02:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 8.1 security and bug fix updates), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2025:14015