Bug 1340772

Summary:

Seeing Error Message "librbd::SnapshotRenameRequest: encountered error: (17) File exists"

Product:

[Red Hat Storage] Red Hat Ceph Storage

Reporter:

Tanay Ganguly <tganguly>

Component:

RBD

Assignee:

Jason Dillaman <jdillama>

Status:

CLOSED ERRATA

QA Contact:

Rachana Patel <racpatel>

Severity:

medium

Docs Contact:

Bara Ancincova <bancinco>

Priority:

medium

Version:

2.0

CC:

ceph-eng-bugs, hnallurv, jdillama, kurs, uboppana

Target Milestone:

Target Release:

2.1

Hardware:

x86_64

OS:

Linux

Whiteboard:

Fixed In Version:

RHEL: ceph-10.2.3-2.el7cp Ubuntu: ceph_10.2.3-3redhat1xenial

Doc Type:

Bug Fix

Doc Text:

.Renaming snapshots no longer returns errors on overloaded clusters When a Ceph Storage Cluster was overloaded and an image was in-use for I/O operations, performing rename requests sometimes took unexpectedly long. Consequently, the RADOS Block Device (RBD) CLI kept sending the rename request every 5 seconds because it received a message that the request had timed out. This caused error messages were returned in the logs of the process performing the I/O operations on the image. This update fixes this bug, and the error log messages are no longer returned in the described scenario.

Story Points:

---

Clone Of:

Environment:

Last Closed:

2016-11-22 19:26:07 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

1322504, 1383917

Attachments:

Description	Flags
Debug log	none

Description Tanay Ganguly 2016-05-30 09:25:41 UTC

Description of problem:
Seeing lots of Error message while Rename of Snapshot and IO is happening on the Parent Image.

Version-Release number of selected component (if applicable):
ceph version 10.2.1-6.el7cp

How reproducible:
2 Times

Steps to Reproduce:
1. Create an Image, take 100 Snap, and protect them.
rbd image 'testing2':
        size 102400 MB in 25600 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.124d642ae8944a
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, journaling
        flags: 
        journal: 124d642ae8944a
        mirroring state: disabled
2.Rename the created snapshot.
for i in {1..100}; do rbd snap rename cephfs_data/testing2@snap$i cephfs_data/testing2@snappey$i; done
3. Start Write-bench on the parent Volume.
rbd bench-write -p cephfs_data --image testing2 --io-size 10240

Started step 2 and 3 in parallel.

Actual results:
Both IO and renaming of snapshot is happening, still i am seeing lots of Error message.

2016-05-30 08:52:11.847718 7f8329ffb700 -1 librbd::SnapshotRenameRequest: encountered error: (17) File exists
2016-05-30 08:52:12.293367 7f8329ffb700 -1 librbd::SnapshotRenameRequest: encountered error: (17) File exists
2016-05-30 08:52:13.731278 7f8329ffb700 -1 librbd::SnapshotRenameRequest: encountered error: (17) File exists
2016-05-30 08:52:16.385266 7f8329ffb700 -1 librbd::SnapshotRenameRequest: encountered error: (17) File exists
2016-05-30 08:52:20.514545 7f8329ffb700 -1 librbd::SnapshotRenameRequest: encountered error: (17) File exists
2016-05-30 08:52:24.469216 7f8329ffb700 -1 librbd::SnapshotRenameRequest: encountered error: (17) File exists

Expected results:
There should not be any Error message.

Additional info:
Debug logs attached.

Comment 2 Tanay Ganguly 2016-05-30 09:34:41 UTC

Created attachment 1162744 [details]
Debug log

Please rename the file to 

mv log log.tar.gz

Comment 3 Jason Dillaman 2016-06-01 22:46:03 UTC

@Tanay: your cluster is overloaded on IO and isn't able to service the rename request.  As a result, your rbd CLI client is resending the request every 5 seconds because it was told the request timed out. I'd imagine this wouldn't be an issue on a non-overloaded cluster. It took ~30 seconds for the SnapRename journal event to be committed to disk, and the snap rename cannot proceed until it has been safely journaled (when journaling is enabled).

Comment 8 Jason Dillaman 2016-08-10 19:20:19 UTC

Upstream pull request: https://github.com/ceph/ceph/pull/9724

Comment 14 Rachana Patel 2016-10-21 16:23:58 UTC

verified with 10.2.3-8.el7cp.x86_64.

Followed procedure mentioned in bug description. no error messages hence moving to verified

Comment 18 errata-xmlrpc 2016-11-22 19:26:07 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2815.html