Bug 1340772 - Seeing Error Message "librbd::SnapshotRenameRequest: encountered error: (17) File exists"
Summary: Seeing Error Message "librbd::SnapshotRenameRequest: encountered error: (17) ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RBD
Version: 2.0
Hardware: x86_64
OS: Linux
medium
medium
Target Milestone: rc
: 2.1
Assignee: Jason Dillaman
QA Contact: Rachana Patel
Bara Ancincova
URL:
Whiteboard:
Depends On:
Blocks: 1322504 1383917
TreeView+ depends on / blocked
 
Reported: 2016-05-30 09:25 UTC by Tanay Ganguly
Modified: 2017-07-30 15:28 UTC (History)
5 users (show)

Fixed In Version: RHEL: ceph-10.2.3-2.el7cp Ubuntu: ceph_10.2.3-3redhat1xenial
Doc Type: Bug Fix
Doc Text:
.Renaming snapshots no longer returns errors on overloaded clusters When a Ceph Storage Cluster was overloaded and an image was in-use for I/O operations, performing rename requests sometimes took unexpectedly long. Consequently, the RADOS Block Device (RBD) CLI kept sending the rename request every 5 seconds because it received a message that the request had timed out. This caused error messages were returned in the logs of the process performing the I/O operations on the image. This update fixes this bug, and the error log messages are no longer returned in the described scenario.
Clone Of:
Environment:
Last Closed: 2016-11-22 19:26:07 UTC
Target Upstream Version:


Attachments (Terms of Use)
Debug log (3.58 MB, application/x-gzip)
2016-05-30 09:34 UTC, Tanay Ganguly
no flags Details


Links
System ID Priority Status Summary Last Updated
Ceph Project Bug Tracker 16130 None None None 2016-06-02 18:24:02 UTC
Red Hat Product Errata RHSA-2016:2815 normal SHIPPED_LIVE Moderate: Red Hat Ceph Storage security, bug fix, and enhancement update 2017-03-22 02:06:33 UTC

Description Tanay Ganguly 2016-05-30 09:25:41 UTC
Description of problem:
Seeing lots of Error message while Rename of Snapshot and IO is happening on the Parent Image.

Version-Release number of selected component (if applicable):
ceph version 10.2.1-6.el7cp

How reproducible:
2 Times

Steps to Reproduce:
1. Create an Image, take 100 Snap, and protect them.
rbd image 'testing2':
        size 102400 MB in 25600 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.124d642ae8944a
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, journaling
        flags: 
        journal: 124d642ae8944a
        mirroring state: disabled
2.Rename the created snapshot.
for i in {1..100}; do rbd snap rename cephfs_data/testing2@snap$i cephfs_data/testing2@snappey$i; done
3. Start Write-bench on the parent Volume.
rbd bench-write -p cephfs_data --image testing2 --io-size 10240

Started step 2 and 3 in parallel.

Actual results:
Both IO and renaming of snapshot is happening, still i am seeing lots of Error message.

2016-05-30 08:52:11.847718 7f8329ffb700 -1 librbd::SnapshotRenameRequest: encountered error: (17) File exists
2016-05-30 08:52:12.293367 7f8329ffb700 -1 librbd::SnapshotRenameRequest: encountered error: (17) File exists
2016-05-30 08:52:13.731278 7f8329ffb700 -1 librbd::SnapshotRenameRequest: encountered error: (17) File exists
2016-05-30 08:52:16.385266 7f8329ffb700 -1 librbd::SnapshotRenameRequest: encountered error: (17) File exists
2016-05-30 08:52:20.514545 7f8329ffb700 -1 librbd::SnapshotRenameRequest: encountered error: (17) File exists
2016-05-30 08:52:24.469216 7f8329ffb700 -1 librbd::SnapshotRenameRequest: encountered error: (17) File exists

Expected results:
There should not be any Error message.

Additional info:
Debug logs attached.

Comment 2 Tanay Ganguly 2016-05-30 09:34:41 UTC
Created attachment 1162744 [details]
Debug log

Please rename the file to 

mv log log.tar.gz

Comment 3 Jason Dillaman 2016-06-01 22:46:03 UTC
@Tanay: your cluster is overloaded on IO and isn't able to service the rename request.  As a result, your rbd CLI client is resending the request every 5 seconds because it was told the request timed out. I'd imagine this wouldn't be an issue on a non-overloaded cluster. It took ~30 seconds for the SnapRename journal event to be committed to disk, and the snap rename cannot proceed until it has been safely journaled (when journaling is enabled).

Comment 8 Jason Dillaman 2016-08-10 19:20:19 UTC
Upstream pull request: https://github.com/ceph/ceph/pull/9724

Comment 14 Rachana Patel 2016-10-21 16:23:58 UTC
verified with 10.2.3-8.el7cp.x86_64.

Followed procedure mentioned in bug description. no error messages hence moving to verified

Comment 18 errata-xmlrpc 2016-11-22 19:26:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2815.html


Note You need to log in before you can comment on or make changes to this bug.