Bug 831940

Summary: dd on nfs mount failed with Input/output error
Product: [Community] GlusterFS
Reporter: Shwetha Panduranga <shwetha.h.panduranga>
Component: nfs
Assignee: Rajesh <rajesh>
Status: CLOSED DUPLICATE
Severity: high
Priority: unspecified
Version: 3.3-beta
CC: gluster-bugs, sgowda, vagarwal
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2012-08-02 05:24:09 EDT
Type: Bug
Attachments:
glusterfs logs, history of commands executed on storage_node1/node2/node3 and nfs mount (flags: none)

Description Shwetha Panduranga 2012-06-14 02:43:13 EDT
Description of problem:
-----------------------
dd on an NFS mount failed with "Input/output error" when add-brick was performed on a replicate volume, followed by a rebalance of that volume, while the dd was still running.

Version-Release number of selected component (if applicable):
------------------------------------------------------------
3.3.0qa45

Steps to Reproduce:
------------------
1. Create a replicate volume (1x3: brick1, brick2, brick3).

2. Set "write-behind on" and "eager-lock on" on the volume.

3. Start the volume.

4. Create an NFS mount.

5. Execute the command "echo 3>/proc/sys/vm/drop_caches ; time dd if=/dev/urandom of=./file bs=2M count=2048" from the NFS mount.

While dd is still in progress, perform the following tasks:
-----------------------------------------------------------
6. Bounce bricks "brick2" and "brick3".

7. Bring down "brick1".

8. Add bricks to the volume to change it from replicate to distribute-replicate (2x3).

9. Bring back "brick1".

10. Perform a rebalance.
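
The steps above can be sketched as a dry-run shell script. This is only an illustration under assumptions, not the exact commands the reporter ran (those are in the attached command history). The hosts and brick paths are taken from the "gluster v info" output in this report; the mount point /mnt/nfsc1 is a hypothetical placeholder, and the report does not say how the bricks were stopped and restarted, so those steps appear only as comments or as a generic "start ... force".

```shell
#!/bin/sh
# Dry-run sketch of the reproduction steps. By default every command is
# only printed; set DRY_RUN=0 to actually execute (on a test cluster only).
DRY_RUN="${DRY_RUN:-1}"
VOL=vol
H1=10.16.159.184; H2=10.16.159.188; H3=10.16.159.196
MNT=/mnt/nfsc1          # hypothetical mount point, not from the report

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repro_steps() {
    # Steps 1-3: create a 1x3 replicate volume, set the options, start it.
    run gluster volume create "$VOL" replica 3 \
        "$H1:/export_b1/dir1" "$H2:/export_b1/dir1" "$H3:/export_b1/dir1"
    run gluster volume set "$VOL" performance.write-behind on
    run gluster volume set "$VOL" cluster.eager-lock on
    run gluster volume start "$VOL"

    # Step 4: NFSv3 mount on the client.
    run mount -t nfs -o vers=3 "$H1:/$VOL" "$MNT"

    # Step 5: start the dd from inside $MNT in another terminal.
    # Steps 6-7: bounce brick2/brick3 and stop brick1 while dd runs
    # (method not stated in the report; done by hand here).

    # Step 8: add three bricks, turning the volume into 2x3.
    run gluster volume add-brick "$VOL" \
        "$H1:/export_c1/dir1" "$H2:/export_c1/dir1" "$H3:/export_c1/dir1"

    # Step 9: one way to restart brick processes that are down.
    run gluster volume start "$VOL" force

    # Step 10: rebalance.
    run gluster volume rebalance "$VOL" start
}

repro_steps
```

Keeping the script in dry-run mode by default makes it safe to read and adapt before pointing it at a real cluster.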
  
Actual results:
--------------
[06/14/12 - 01:41:52 root@ARF-Client1 nfsc1]# echo 3>/proc/sys/vm/drop_caches ; time dd if=/dev/urandom of=./file bs=2M count=2048

dd: writing `./file': Input/output error
dd: closing output file `./file': Input/output error

real	8m53.063s
user	0m0.006s
sys	6m51.750s
[06/14/12 - 01:51:17 root@ARF-Client1 nfsc1]# 

[06/14/12 - 01:53:34 root@ARF-Client1 nfsc1]# ls
file

[06/14/12 - 01:53:35 root@ARF-Client1 nfsc1]# ls -lh file
ls: cannot access file: Remote I/O error

[06/14/12 - 01:53:44 root@ARF-Client1 nfsc1]# stat file
stat: cannot stat `file': Remote I/O error
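
A side note on the command as typed in the transcript above: in a POSIX shell, "echo 3>FILE" (no space between 3 and >) redirects file descriptor 3 to FILE rather than writing the character 3 into it. That would explain the blank line printed before dd's error messages, and it suggests the page cache was probably not dropped before the run. This likely has no bearing on the I/O error itself. The sketch below shows the difference using a temporary file as a stand-in for /proc/sys/vm/drop_caches.

```shell
#!/bin/sh
# Compare "echo 3>FILE" with "echo 3 > FILE" using a temporary file,
# so nothing under /proc is touched.
tmp=$(mktemp)

# "3>" is a redirection of fd 3: echo has no arguments left, so it prints
# an empty line to stdout and the file stays empty.
echo 3>"$tmp"
size_fd=$(( $(wc -c < "$tmp") ))

# With a space, stdout is redirected and "3" plus a newline is written.
echo 3 > "$tmp"
size_stdout=$(( $(wc -c < "$tmp") ))

echo "fd-3 redirect wrote $size_fd bytes; stdout redirect wrote $size_stdout bytes"
rm -f "$tmp"
```

To actually drop caches before a test run, the form with a space ("echo 3 > /proc/sys/vm/drop_caches", as root) is the one that writes the value.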


Expected results:
----------------
dd should not fail

Additional info:
----------------
[06/14/12 - 01:51:03 root@AFR-Server1 ~]# gluster v info
 
Volume Name: vol
Type: Distributed-Replicate
Volume ID: a14bdfdb-c4d7-4794-9924-4fa41a97883d
Status: Started
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 10.16.159.184:/export_b1/dir1
Brick2: 10.16.159.188:/export_b1/dir1
Brick3: 10.16.159.196:/export_b1/dir1
Brick4: 10.16.159.184:/export_c1/dir1
Brick5: 10.16.159.188:/export_c1/dir1
Brick6: 10.16.159.196:/export_c1/dir1
Options Reconfigured:
cluster.eager-lock: on
performance.write-behind: on

nfs log output:-
------------------

[2012-06-14 01:51:09.198543] I [client-helpers.c:100:this_fd_set_ctx] 0-vol-client-5: <gfid:eb749521-3dc1-4bee-a5f5-ca251a082180> (eb749521-3dc1-4bee-a5f5-ca251a082180): trying duplicate remote fd set. 
[2012-06-14 01:51:09.198675] I [client-helpers.c:100:this_fd_set_ctx] 0-vol-client-3: <gfid:eb749521-3dc1-4bee-a5f5-ca251a082180> (eb749521-3dc1-4bee-a5f5-ca251a082180): trying duplicate remote fd set. 
[2012-06-14 01:51:09.199444] I [client-helpers.c:100:this_fd_set_ctx] 0-vol-client-4: <gfid:eb749521-3dc1-4bee-a5f5-ca251a082180> (eb749521-3dc1-4bee-a5f5-ca251a082180): trying duplicate remote fd set. 


[2012-06-14 01:51:09.372516] W [client3_1-fops.c:821:client3_1_writev_cbk] 0-vol-client-4: remote operation failed: Bad file descriptor
[2012-06-14 01:51:09.373042] W [client3_1-fops.c:821:client3_1_writev_cbk] 0-vol-client-3: remote operation failed: Bad file descriptor
[2012-06-14 01:51:09.373194] W [client3_1-fops.c:821:client3_1_writev_cbk] 0-vol-client-5: remote operation failed: Bad file descriptor
[2012-06-14 01:51:09.373264] W [nfs3.c:2079:nfs3svc_write_cbk] 0-nfs: 21eed347: <gfid:eb749521-3dc1-4bee-a5f5-ca251a082180> => -1 (Bad file descriptor)
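
When going through a longer nfs log, the writev failures shown above can be tallied per client subvolume. The following is a minimal sketch that assumes the log line layout seen in this report (the subvolume name is the fifth whitespace-separated field); the function name summarize_writev_failures is made up for illustration. If the usual zero-based client naming applies, vol-client-3 to vol-client-5 would correspond to the newly added Brick4 to Brick6, but that mapping is an assumption, not something the log states.

```shell
#!/bin/sh
# Count failed writev callbacks per client subvolume.
# Reads the log files given as arguments, or stdin if there are none.
summarize_writev_failures() {
    awk '/client3_1_writev_cbk/ && /remote operation failed/ {
             name = $5
             sub(/:$/, "", name)     # "0-vol-client-4:" -> "0-vol-client-4"
             count[name]++
         }
         END { for (c in count) print c, count[c] }' "$@" | sort
}

# Example using the three warning lines from this report.
summarize_writev_failures <<'EOF'
[2012-06-14 01:51:09.372516] W [client3_1-fops.c:821:client3_1_writev_cbk] 0-vol-client-4: remote operation failed: Bad file descriptor
[2012-06-14 01:51:09.373042] W [client3_1-fops.c:821:client3_1_writev_cbk] 0-vol-client-3: remote operation failed: Bad file descriptor
[2012-06-14 01:51:09.373194] W [client3_1-fops.c:821:client3_1_writev_cbk] 0-vol-client-5: remote operation failed: Bad file descriptor
EOF
```
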
Comment 1 Shwetha Panduranga 2012-06-14 02:46:51 EDT
Created attachment 591745 [details]
glusterfs logs, history of commands executed on storage_node1/node2/node3 and nfs mount
Comment 2 Krishna Srinivas 2012-06-15 03:24:56 EDT
The client translator returns an error to the higher translators; we need to figure out what exactly is happening. This does not look NFS-related, though the issue is not seen in a FUSE setup. Needs more investigation.
Comment 3 shishir gowda 2012-06-15 06:32:37 EDT
This bug seems to be related to bug 815227. This is a case where a non-distribute volume was converted to a distribute volume. Please check whether add-brick/rebalance on a volume that is already distribute also errors out. The fix is in upstream.
Comment 4 Rajesh 2012-08-02 05:24:09 EDT

*** This bug has been marked as a duplicate of bug 815227 ***