.Renaming snapshots no longer returns errors on overloaded clusters
When a Ceph Storage Cluster was overloaded and an image was in use for I/O operations, rename requests sometimes took unexpectedly long. Consequently, the RADOS Block Device (RBD) CLI kept resending the rename request every 5 seconds because it received a message that the request had timed out. As a result, error messages were returned in the logs of the process performing the I/O operations on the image. This update fixes this bug, and the error log messages are no longer returned in the described scenario.
Description of problem:
Many error messages are logged when snapshots are renamed while I/O is running on the parent image.
Version-Release number of selected component (if applicable):
ceph version 10.2.1-6.el7cp
How reproducible:
2 Times
Steps to Reproduce:
1. Create an image, take 100 snapshots, and protect them.
rbd image 'testing2':
size 102400 MB in 25600 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.124d642ae8944a
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, journaling
flags:
journal: 124d642ae8944a
mirroring state: disabled
2. Rename the created snapshots.
for i in {1..100}; do rbd snap rename cephfs_data/testing2@snap$i cephfs_data/testing2@snappey$i; done
3. Start a write benchmark on the parent image.
rbd bench-write -p cephfs_data --image testing2 --io-size 10240
Steps 2 and 3 were started in parallel.
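The steps above can be sketched as a single script. This is only an illustration, not the exact procedure used in the report: the variables RBD, POOL, IMAGE, and COUNT and the function names are hypothetical helpers, the image is assumed to exist already, and the snapshot names follow the pattern used in step 2. Setting RBD="echo rbd" prints the commands instead of running them.

```shell
#!/bin/sh
# Reproduction sketch. RBD, POOL, IMAGE and COUNT are illustrative variables
# that do not appear in the original report. Use RBD="echo rbd" for a dry run.
RBD=${RBD:-rbd}
POOL=${POOL:-cephfs_data}
IMAGE=${IMAGE:-testing2}
COUNT=${COUNT:-100}

# Step 1: take and protect the snapshots (the image is assumed to exist).
create_snaps() {
  for i in $(seq 1 "$COUNT"); do
    $RBD snap create "$POOL/$IMAGE@snap$i"
    $RBD snap protect "$POOL/$IMAGE@snap$i"
  done
}

# Step 2: rename every snapshot.
rename_snaps() {
  for i in $(seq 1 "$COUNT"); do
    $RBD snap rename "$POOL/$IMAGE@snap$i" "$POOL/$IMAGE@snappey$i"
  done
}

# Step 3: run the write benchmark on the parent image.
bench_write() {
  $RBD bench-write -p "$POOL" --image "$IMAGE" --io-size 10240
}

# Run steps 2 and 3 in parallel: the benchmark goes to the background,
# the renames run in the foreground, then the script waits for the benchmark.
reproduce() {
  create_snaps
  bench_write &
  bench_pid=$!
  rename_snaps
  wait "$bench_pid"
}
```

Calling reproduce after sourcing the script runs all three steps. Running the benchmark in the background keeps the image busy with I/O while the rename requests are issued, which is the condition the report describes.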
Actual results:
Both the I/O and the snapshot renames proceed, but many error messages are still logged.
2016-05-30 08:52:11.847718 7f8329ffb700 -1 librbd::SnapshotRenameRequest: encountered error: (17) File exists
2016-05-30 08:52:12.293367 7f8329ffb700 -1 librbd::SnapshotRenameRequest: encountered error: (17) File exists
2016-05-30 08:52:13.731278 7f8329ffb700 -1 librbd::SnapshotRenameRequest: encountered error: (17) File exists
2016-05-30 08:52:16.385266 7f8329ffb700 -1 librbd::SnapshotRenameRequest: encountered error: (17) File exists
2016-05-30 08:52:20.514545 7f8329ffb700 -1 librbd::SnapshotRenameRequest: encountered error: (17) File exists
2016-05-30 08:52:24.469216 7f8329ffb700 -1 librbd::SnapshotRenameRequest: encountered error: (17) File exists
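To compare runs, the number of these messages can be counted. The sketch below assumes the output of the process performing the I/O was captured to a file; the function name and the file path in the usage note are hypothetical.

```shell
#!/bin/sh
# Count the snapshot rename errors in a captured log file.
# $1 is the path to the log; the path is chosen by the user, not by rbd.
count_rename_errors() {
  # grep -c prints the number of matching lines (0 if there are none).
  grep -c 'librbd::SnapshotRenameRequest: encountered error' "$1"
}
```

For example, count_rename_errors bench.log prints the number of matching lines in bench.log. On a build with the fix, the expected count for the described scenario is 0.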
Expected results:
There should not be any error messages.
Additional info:
Debug logs attached.
@Tanay: your cluster is overloaded on IO and isn't able to service the rename request. As a result, your rbd CLI client is resending the request every 5 seconds because it was told the request timed out. I'd imagine this wouldn't be an issue on a non-overloaded cluster. It took ~30 seconds for the SnapRename journal event to be committed to disk, and the snap rename cannot proceed until it has been safely journaled (when journaling is enabled).
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHSA-2016-2815.html