Bug 1340772

Summary: Seeing Error Message "librbd::SnapshotRenameRequest: encountered error: (17) File exists"
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Tanay Ganguly <tganguly>
Component: RBDAssignee: Jason Dillaman <jdillama>
Status: CLOSED ERRATA QA Contact: Rachana Patel <racpatel>
Severity: medium Docs Contact: Bara Ancincova <bancinco>
Priority: medium    
Version: 2.0CC: ceph-eng-bugs, hnallurv, jdillama, kurs, uboppana
Target Milestone: rc   
Target Release: 2.1   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: RHEL: ceph-10.2.3-2.el7cp Ubuntu: ceph_10.2.3-3redhat1xenial Doc Type: Bug Fix
Doc Text:
.Renaming snapshots no longer returns errors on overloaded clusters When a Ceph Storage Cluster was overloaded and an image was in-use for I/O operations, performing rename requests sometimes took unexpectedly long. Consequently, the RADOS Block Device (RBD) CLI kept sending the rename request every 5 seconds because it received a message that the request had timed out. This caused error messages were returned in the logs of the process performing the I/O operations on the image. This update fixes this bug, and the error log messages are no longer returned in the described scenario.
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-11-22 19:26:07 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1322504, 1383917    
Attachments:
Description Flags
Debug log none

Description Tanay Ganguly 2016-05-30 09:25:41 UTC
Description of problem:
Seeing lots of Error message while Rename of Snapshot and IO is happening on the Parent Image.

Version-Release number of selected component (if applicable):
ceph version 10.2.1-6.el7cp

How reproducible:
2 Times

Steps to Reproduce:
1. Create an Image, take 100 Snap, and protect them.
rbd image 'testing2':
        size 102400 MB in 25600 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.124d642ae8944a
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, journaling
        flags: 
        journal: 124d642ae8944a
        mirroring state: disabled
2.Rename the created snapshot.
for i in {1..100}; do rbd snap rename cephfs_data/testing2@snap$i cephfs_data/testing2@snappey$i; done
3. Start Write-bench on the parent Volume.
rbd bench-write -p cephfs_data --image testing2 --io-size 10240

Started step 2 and 3 in parallel.

Actual results:
Both IO and renaming of snapshot is happening, still i am seeing lots of Error message.

2016-05-30 08:52:11.847718 7f8329ffb700 -1 librbd::SnapshotRenameRequest: encountered error: (17) File exists
2016-05-30 08:52:12.293367 7f8329ffb700 -1 librbd::SnapshotRenameRequest: encountered error: (17) File exists
2016-05-30 08:52:13.731278 7f8329ffb700 -1 librbd::SnapshotRenameRequest: encountered error: (17) File exists
2016-05-30 08:52:16.385266 7f8329ffb700 -1 librbd::SnapshotRenameRequest: encountered error: (17) File exists
2016-05-30 08:52:20.514545 7f8329ffb700 -1 librbd::SnapshotRenameRequest: encountered error: (17) File exists
2016-05-30 08:52:24.469216 7f8329ffb700 -1 librbd::SnapshotRenameRequest: encountered error: (17) File exists

Expected results:
There should not be any Error message.

Additional info:
Debug logs attached.

Comment 2 Tanay Ganguly 2016-05-30 09:34:41 UTC
Created attachment 1162744 [details]
Debug log

Please rename the file to 

mv log log.tar.gz

Comment 3 Jason Dillaman 2016-06-01 22:46:03 UTC
@Tanay: your cluster is overloaded on IO and isn't able to service the rename request.  As a result, your rbd CLI client is resending the request every 5 seconds because it was told the request timed out. I'd imagine this wouldn't be an issue on a non-overloaded cluster. It took ~30 seconds for the SnapRename journal event to be committed to disk, and the snap rename cannot proceed until it has been safely journaled (when journaling is enabled).

Comment 8 Jason Dillaman 2016-08-10 19:20:19 UTC
Upstream pull request: https://github.com/ceph/ceph/pull/9724

Comment 14 Rachana Patel 2016-10-21 16:23:58 UTC
verified with 10.2.3-8.el7cp.x86_64.

Followed procedure mentioned in bug description. no error messages hence moving to verified

Comment 18 errata-xmlrpc 2016-11-22 19:26:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2815.html