Description of problem:
Many error messages are logged when snapshots are renamed while I/O is running on the parent image.

Version-Release number of selected component (if applicable):
ceph version 10.2.1-6.el7cp

How reproducible:
Reproduced 2 times

Steps to Reproduce:
1. Create an image, take 100 snapshots, and protect them.

rbd image 'testing2':
    size 102400 MB in 25600 objects
    order 22 (4096 kB objects)
    block_name_prefix: rbd_data.124d642ae8944a
    format: 2
    features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, journaling
    flags:
    journal: 124d642ae8944a
    mirroring state: disabled

2. Rename the created snapshots.

for i in {1..100}; do rbd snap rename cephfs_data/testing2@snap$i cephfs_data/testing2@snappey$i; done

3. Start bench-write on the parent image.

rbd bench-write -p cephfs_data --image testing2 --io-size 10240

Steps 2 and 3 were started in parallel.

Actual results:
Both the I/O and the snapshot renames proceed, but many error messages are still logged:

2016-05-30 08:52:11.847718 7f8329ffb700 -1 librbd::SnapshotRenameRequest: encountered error: (17) File exists
2016-05-30 08:52:12.293367 7f8329ffb700 -1 librbd::SnapshotRenameRequest: encountered error: (17) File exists
2016-05-30 08:52:13.731278 7f8329ffb700 -1 librbd::SnapshotRenameRequest: encountered error: (17) File exists
2016-05-30 08:52:16.385266 7f8329ffb700 -1 librbd::SnapshotRenameRequest: encountered error: (17) File exists
2016-05-30 08:52:20.514545 7f8329ffb700 -1 librbd::SnapshotRenameRequest: encountered error: (17) File exists
2016-05-30 08:52:24.469216 7f8329ffb700 -1 librbd::SnapshotRenameRequest: encountered error: (17) File exists

Expected results:
There should not be any error messages.

Additional info:
Debug logs attached.
Created attachment 1162744
Debug log

Please rename the file after downloading: mv log log.tar.gz
@Tanay: your cluster is overloaded on IO and isn't able to service the rename request. As a result, your rbd CLI client is resending the request every 5 seconds because it was told the request timed out. I'd imagine this wouldn't be an issue on a non-overloaded cluster. It took ~30 seconds for the SnapRename journal event to be committed to disk, and the snap rename cannot proceed until it has been safely journaled (when journaling is enabled).
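To make the resend behaviour concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: ToyImage and rename_with_resend are hypothetical names, not librbd APIs, and the model assumes the "(17) File exists" errors come from a resent rename arriving after the original request has already been applied. It does not model journaling or timing, only the effect of handling the same rename twice.

```python
import errno


class ToyImage:
    """Hypothetical in-memory stand-in for an image's snapshot name table."""

    def __init__(self, snaps):
        self.snaps = set(snaps)

    def snap_rename(self, src, dst):
        # Return 0 on success or a negative errno (assumed convention for this sketch).
        if dst in self.snaps:
            return -errno.EEXIST
        if src not in self.snaps:
            return -errno.ENOENT
        self.snaps.remove(src)
        self.snaps.add(dst)
        return 0


def rename_with_resend(image, src, dst, resends):
    """Handle the same rename 1 + resends times, as if each earlier reply had timed out."""
    return [image.snap_rename(src, dst) for _ in range(1 + resends)]


image = ToyImage(["snap1"])
results = rename_with_resend(image, "snap1", "snappey1", resends=1)
print(results)
```

In this model the first request renames the snapshot and the resent copy fails with EEXIST (17 on Linux), even though the snapshot ends up with the intended name. That matches the reported symptom: the renames complete, but each resend logs an error.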
Upstream pull request: https://github.com/ceph/ceph/pull/9724
Verified with 10.2.3-8.el7cp.x86_64. Followed the procedure in the bug description and saw no error messages, hence moving to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2016-2815.html