Bug 1291560 - Renames/deletes failed with "No such file or directory" when few of the bricks from the hot tier went offline
Summary: Renames/deletes failed with "No such file or directory" when few of the brick...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.1.2
Assignee: Bug Updates Notification Mailing List
QA Contact: krishnaram Karthick
URL:
Whiteboard:
Depends On:
Blocks: 1291701 1292046
 
Reported: 2015-12-15 07:21 UTC by spandura
Modified: 2016-09-17 12:18 UTC (History)
CC: 9 users

Fixed In Version: glusterfs-3.7.5-13
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1291701 (view as bug list)
Environment:
Last Closed: 2016-03-01 06:03:52 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:0193 0 normal SHIPPED_LIVE Red Hat Gluster Storage 3.1 update 2 2016-03-01 10:20:36 UTC

Description spandura 2015-12-15 07:21:49 UTC
Description of problem:
===========================
On a tiered volume with a 2x2 dis-rep cold tier and a 2x3 dis-rep hot tier, renames/deletes were being performed on files/dirs. When one brick from each sub-volume of the hot tier went offline, the renames/deletes failed with "No such file or directory". All bricks of the cold tier remained online, and quorum was set.

Version-Release number of selected component (if applicable):
===============================================================
glusterfs 3.7.5 built on Dec  3 2015 11:30:45


How reproducible:
====================
Often

Steps to Reproduce:
======================
1. Create a tiered volume with a 2x2 dis-rep cold tier and a 2x3 dis-rep hot tier. Start the volume. Mount the volume.

2. From the mount, create files/dirs.

3. Rename a few of the created files/dirs to different names.

4. While renames are in progress, take the brick filesystems down using the "godown" utility (available in xfsprogs), so that one brick from each hot-tier sub-volume goes offline.
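The steps above can be sketched as a shell session. This is only an illustrative sketch: the node names (node1..node4) and brick paths are placeholders rather than the exact hosts in this report, and the attach-tier syntax is the glusterfs 3.7-era CLI.

```shell
# 1. Create the 2x2 dis-rep cold tier and attach a 2x3 dis-rep hot tier.
gluster volume create testvol replica 2 \
    node1:/bricks/brick0/testvol_brick0 node2:/bricks/brick0/testvol_brick1 \
    node3:/bricks/brick0/testvol_brick2 node4:/bricks/brick0/testvol_brick3
gluster volume start testvol
gluster volume attach-tier testvol replica 3 \
    node1:/bricks/brick1/testvol_tier0 node2:/bricks/brick1/testvol_tier1 \
    node3:/bricks/brick1/testvol_tier2 node1:/bricks/brick2/testvol_tier3 \
    node2:/bricks/brick2/testvol_tier4 node3:/bricks/brick2/testvol_tier5

# 2. Mount the volume and create files.
mount -t glusterfs node1:/testvol /mnt/testvol
for i in $(seq 1 100); do touch "/mnt/testvol/E_file_$i"; done

# 3./4. Rename files while, on one brick node per hot-tier sub-volume,
# the brick filesystem is forced down (run on the brick node itself):
#   godown /bricks/brick1    # "godown" utility, per the report from xfsprogs
for i in $(seq 1 100); do mv "/mnt/testvol/E_file_$i" "/mnt/testvol/R_file_$i"; done
```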


Actual results:
================
On the mount, renames fail with "No such file or directory"

Expected results:
====================
Renames/deletes shouldn't fail

Additional info:
===================
 
Volume Name: testvol
Type: Tier
Volume ID: 5a2f042d-ee04-4b3d-b5d5-d36e29cea325
Status: Started
Number of Bricks: 10
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 3 = 6
Brick1: rhsauto020.lab.eng.blr.redhat.com:/bricks/brick2/testvol_tier5
Brick2: rhsauto017.lab.eng.blr.redhat.com:/bricks/brick2/testvol_tier4
Brick3: rhsauto038.lab.eng.blr.redhat.com:/bricks/brick1/testvol_tier3
Brick4: rhsauto021.lab.eng.blr.redhat.com:/bricks/brick1/testvol_tier2
Brick5: rhsauto020.lab.eng.blr.redhat.com:/bricks/brick1/testvol_tier1
Brick6: rhsauto017.lab.eng.blr.redhat.com:/bricks/brick1/testvol_tier0
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick7: rhsauto017.lab.eng.blr.redhat.com:/bricks/brick0/testvol_brick0
Brick8: rhsauto020.lab.eng.blr.redhat.com:/bricks/brick0/testvol_brick1
Brick9: rhsauto021.lab.eng.blr.redhat.com:/bricks/brick0/testvol_brick2
Brick10: rhsauto038.lab.eng.blr.redhat.com:/bricks/brick0/testvol_brick3
Options Reconfigured:
diagnostics.brick-log-level: DEBUG
diagnostics.client-log-level: DEBUG
performance.readdir-ahead: on
features.ctr-enabled: on
cluster.tier-mode: cache
cluster.watermark-low: 75
cluster.watermark-hi: 90
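The "Options Reconfigured" above can be applied with `gluster volume set`; a sketch of the equivalent commands (volume name taken from this report):

```shell
# Tiering behaviour: cache mode with promotion/demotion watermarks.
gluster volume set testvol cluster.tier-mode cache
gluster volume set testvol cluster.watermark-low 75
gluster volume set testvol cluster.watermark-hi 90
gluster volume set testvol features.ctr-enabled on
gluster volume set testvol performance.readdir-ahead on

# Debug-level logging used while reproducing the issue.
gluster volume set testvol diagnostics.brick-log-level DEBUG
gluster volume set testvol diagnostics.client-log-level DEBUG
```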

Error messages seen in client log:
=================================
[2015-12-14 07:45:12.156546] E [MSGID: 114031] [client-rpc-fops.c:251:client3_3_mknod_cbk] 0-testvol-client-9: remote operation failed. Path: (null) [Input/output error]
[2015-12-14 07:45:12.159452] W [MSGID: 114031] [client-rpc-fops.c:664:client3_3_unlink_cbk] 0-testvol-client-0: remote operation failed [Device or resource busy]
[2015-12-14 07:45:12.159480] W [MSGID: 114031] [client-rpc-fops.c:664:client3_3_unlink_cbk] 0-testvol-client-1: remote operation failed [Device or resource busy]
[2015-12-14 07:45:12.160293] I [MSGID: 109069] [dht-common.c:1159:dht_lookup_unlink_stale_linkto_cbk] 0-testvol-tier-dht: Returned with op_ret -1 and op_errno 16 for /E_file_44

Comment 4 Krutika Dhananjay 2015-12-17 12:36:43 UTC
https://code.engineering.redhat.com/gerrit/#/c/64082/ <-- AFR fix

Comment 7 spandura 2016-01-22 10:06:11 UTC
Verified the issue on build glusterfs-3.7.5-14.el7rhgs.x86_64. The bug is fixed. Moving the bug to Verified state.

Comment 9 errata-xmlrpc 2016-03-01 06:03:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html

