Bug 1060676

Summary: [add-brick]: I/O on NFS fails when bricks are added to a distribute-replicate volume
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Sachidananda Urs <surs>
Component: distribute
Assignee: Nithya Balachandran <nbalacha>
Status: CLOSED ERRATA
QA Contact: storage-qa-internal <storage-qa-internal>
Severity: high
Docs Contact:
Priority: high
Version: rhgs-3.0
CC: annair, byarlaga, nbalacha, nlevinki, rgowdapp, rmekala, sankarshan, smohan, srangana, vbellur
Target Milestone: ---
Keywords: ZStream
Target Release: RHGS 3.1.2
Hardware: x86_64
OS: Linux
Whiteboard: dht-add-brick
Fixed In Version: glusterfs-3.7.5-6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-03-01 05:22:22 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1260783    
Attachments:
Description Flags
NFS client logs none

Description Sachidananda Urs 2014-02-03 10:40:18 UTC
Created attachment 858495
NFS client logs

Description of problem:

When new bricks are added to a distribute-replicate volume, I/O on the NFS mount fails with the following error messages:

dd: opening `dir.94/file.100': No such file or directory
mkdir: cannot create directory `dir.95': Invalid argument
dd: opening `dir.95/file.1': No such file or directory

Attaching NFS logs. No errors are reported in the glusterd logs.
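
The error messages above suggest a workload that creates dir.N directories and writes file.M files inside them. A minimal Python sketch of such a workload follows; the directory and file counts, the block size, and the function name run_workload are assumptions for illustration, not the reporter's actual script (the report only shows dd and mkdir failures). It records each failed operation instead of stopping, so failures that begin during add-brick can be seen in sequence.

```python
import os
import tempfile


def run_workload(mount_point, num_dirs=3, files_per_dir=2, block=b"\0" * 1024):
    """Create dir.N directories with file.M files under mount_point.

    Returns a list of (operation, path, error text) for every failed call.
    The counts and block size are illustrative assumptions.
    """
    failures = []
    for d in range(1, num_dirs + 1):
        dir_path = os.path.join(mount_point, f"dir.{d}")
        try:
            os.mkdir(dir_path)
        except OSError as exc:
            failures.append(("mkdir", dir_path, exc.strerror))
        for f in range(1, files_per_dir + 1):
            file_path = os.path.join(dir_path, f"file.{f}")
            try:
                # Stand-in for the dd write seen in the error messages.
                with open(file_path, "wb") as fh:
                    fh.write(block)
            except OSError as exc:
                failures.append(("write", file_path, exc.strerror))
    return failures


if __name__ == "__main__":
    # A scratch directory stands in for the NFS mount point here.
    with tempfile.TemporaryDirectory() as scratch:
        print(run_workload(scratch))
```

To use it against a real setup, pass the NFS mount point instead of the scratch directory and run it while add-brick is issued.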

Version-Release number of selected component (if applicable):

glusterfs 3.4.0.58rhs built on Jan 25 2014 07:04:06

How reproducible:
Always.

Steps to Reproduce:
1. Create a 2x2 volume and run some I/O on the NFS mount.
2. Peer probe two more machines.
3. Add bricks to the volume with add-brick.

I/O fails on the mount.
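
The steps above can be sketched as a sequence of gluster CLI invocations. The Python helper below only builds the argument lists and does not execute anything; the volume name, host names, and brick paths in the usage example are hypothetical placeholders, and mounting the volume over NFS and starting the I/O are left out.

```python
def add_brick_repro_commands(volume, initial_bricks, new_hosts, new_bricks, replica=2):
    """Build gluster CLI argument lists for the reproduction steps.

    All names passed in are caller-supplied placeholders. Brick counts must be
    a multiple of the replica count, as a replicated volume requires.
    """
    if len(initial_bricks) % replica or len(new_bricks) % replica:
        raise ValueError("brick count must be a multiple of the replica count")
    commands = [
        # Step 1: create and start the distribute-replicate volume
        # (four bricks with replica 2 gives a 2x2 layout).
        ["gluster", "volume", "create", volume, "replica", str(replica), *initial_bricks],
        ["gluster", "volume", "start", volume],
    ]
    # Step 2: add the new machines to the trusted pool.
    commands += [["gluster", "peer", "probe", host] for host in new_hosts]
    # Step 3: expand the volume while I/O is running on the mount.
    commands.append(["gluster", "volume", "add-brick", volume, *new_bricks])
    return commands


if __name__ == "__main__":
    # Hypothetical hosts and brick paths, for illustration only.
    for cmd in add_brick_repro_commands(
        "testvol",
        ["host1:/bricks/b1", "host2:/bricks/b1", "host1:/bricks/b2", "host2:/bricks/b2"],
        ["host3", "host4"],
        ["host3:/bricks/b1", "host4:/bricks/b1"],
    ):
        print(" ".join(cmd))
```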

Expected results:

add-brick should be seamless.

Comment 2 santosh pradhan 2014-06-24 08:28:40 UTC
The NFS protocol has no relation to brick operations such as add-brick, remove-brick, or rebalance. The DHT or AFR team, most likely DHT, needs to look into why the NFS fop fails.

Comment 3 Shyamsundar 2014-08-13 13:53:41 UTC
From the logs, I see errors in the dht_access function, which in certain cases treated directories as files when the cluster was expanded (i.e., bricks added, etc.).

This is being fixed as part of bug #1125824.

Once it is fixed there and downstream, I would like to repeat this test case to ensure that the problem is no longer present.

Comment 4 Raghavendra G 2015-11-10 05:16:33 UTC
A duplicate of:
https://bugzilla.redhat.com/show_bug.cgi?id=1278399

Fixed by:
https://code.engineering.redhat.com/gerrit/#/c/61036/2

With Patch #61036 and fixes to dht-access, this issue should be fixed.

Comment 6 RajeshReddy 2015-11-23 07:40:32 UTC
Tested with build glusterfs-server-3.7.5-6. Created a 2x2 volume, mounted it on the client using NFS, created 200-deep nested directories, and changed into the leaf directory (../dir199/dir200). Then added two new bricks to the volume; while rebalance was in progress, ls and mkdir ran successfully from the client, so marking this bug as verified.
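
The client-side part of this verification can be approximated with the Python sketch below. It assumes the directories are named dir1 through dir200 and nested one inside the next, which is inferred from the ../dir199/dir200 path; the function names and the "probe" directory name are hypothetical. It uses path-based calls rather than changing the working directory into the leaf, so it is close to, but not identical to, the manual test.

```python
import os
import tempfile


def make_deep_tree(root, depth=200):
    """Create root/dir1/dir2/.../dir<depth> and return the leaf path."""
    path = root
    for level in range(1, depth + 1):
        path = os.path.join(path, f"dir{level}")
    os.makedirs(path)
    return path


def probe_leaf(leaf, new_name="probe"):
    """List the leaf directory (ls), then create a subdirectory in it (mkdir)."""
    entries = os.listdir(leaf)
    os.mkdir(os.path.join(leaf, new_name))
    return entries


if __name__ == "__main__":
    # A scratch directory stands in for the NFS mount point here.
    with tempfile.TemporaryDirectory() as scratch:
        leaf = make_deep_tree(scratch)
        print(probe_leaf(leaf))
```

On a real setup, make_deep_tree would run before add-brick and probe_leaf while rebalance is in progress; an OSError from either call would indicate the failure this bug describes.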

Comment 8 errata-xmlrpc 2016-03-01 05:22:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html