Description of problem:
-----------------------
dd on an NFS mount failed with an I/O error when bricks were added to a replicate volume and a rebalance was then performed on that volume.

Version-Release number of selected component (if applicable):
------------------------------------------------------------
3.3.0qa45

Steps to Reproduce:
------------------
1. Create a replicate volume (1x3: brick1, brick2, brick3).
2. Set "write-behind on" and "eager-lock on" on the volume.
3. Start the volume.
4. Create an NFS mount.
5. From the NFS mount, execute:
   echo 3 > /proc/sys/vm/drop_caches ; time dd if=/dev/urandom of=./file bs=2M count=2048

While dd is still in progress, perform the following tasks:
---------------------------------------------------------
6. Bounce bricks "brick2" and "brick3".
7. Bring down "brick1".
8. Add bricks to the volume to change it from replicate to distribute-replicate (2x3).
9. Bring back "brick1".
10. Perform a rebalance.

Actual results:
--------------
[06/14/12 - 01:41:52 root@ARF-Client1 nfsc1]# echo 3>/proc/sys/vm/drop_caches ; time dd if=/dev/urandom of=./file bs=2M count=2048
dd: writing `./file': Input/output error
dd: closing output file `./file': Input/output error

real    8m53.063s
user    0m0.006s
sys     6m51.750s
[06/14/12 - 01:51:17 root@ARF-Client1 nfsc1]#
[06/14/12 - 01:53:34 root@ARF-Client1 nfsc1]# ls
file
[06/14/12 - 01:53:35 root@ARF-Client1 nfsc1]# ls -lh file
ls: cannot access file: Remote I/O error
[06/14/12 - 01:53:44 root@ARF-Client1 nfsc1]# stat file
stat: cannot stat `file': Remote I/O error

Expected results:
----------------
dd should not fail.

Additional info:
----------------
[06/14/12 - 01:51:03 root@AFR-Server1 ~]# gluster v info

Volume Name: vol
Type: Distributed-Replicate
Volume ID: a14bdfdb-c4d7-4794-9924-4fa41a97883d
Status: Started
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 10.16.159.184:/export_b1/dir1
Brick2: 10.16.159.188:/export_b1/dir1
Brick3: 10.16.159.196:/export_b1/dir1
Brick4: 10.16.159.184:/export_c1/dir1
Brick5: 10.16.159.188:/export_c1/dir1
Brick6: 10.16.159.196:/export_c1/dir1
Options Reconfigured:
cluster.eager-lock: on
performance.write-behind: on

nfs log output:
------------------
[2012-06-14 01:51:09.198543] I [client-helpers.c:100:this_fd_set_ctx] 0-vol-client-5: <gfid:eb749521-3dc1-4bee-a5f5-ca251a082180> (eb749521-3dc1-4bee-a5f5-ca251a082180): trying duplicate remote fd set.
[2012-06-14 01:51:09.198675] I [client-helpers.c:100:this_fd_set_ctx] 0-vol-client-3: <gfid:eb749521-3dc1-4bee-a5f5-ca251a082180> (eb749521-3dc1-4bee-a5f5-ca251a082180): trying duplicate remote fd set.
[2012-06-14 01:51:09.199444] I [client-helpers.c:100:this_fd_set_ctx] 0-vol-client-4: <gfid:eb749521-3dc1-4bee-a5f5-ca251a082180> (eb749521-3dc1-4bee-a5f5-ca251a082180): trying duplicate remote fd set.
[2012-06-14 01:51:09.372516] W [client3_1-fops.c:821:client3_1_writev_cbk] 0-vol-client-4: remote operation failed: Bad file descriptor
[2012-06-14 01:51:09.373042] W [client3_1-fops.c:821:client3_1_writev_cbk] 0-vol-client-3: remote operation failed: Bad file descriptor
[2012-06-14 01:51:09.373194] W [client3_1-fops.c:821:client3_1_writev_cbk] 0-vol-client-5: remote operation failed: Bad file descriptor
[2012-06-14 01:51:09.373264] W [nfs3.c:2079:nfs3svc_write_cbk] 0-nfs: 21eed347: <gfid:eb749521-3dc1-4bee-a5f5-ca251a082180> => -1 (Bad file descriptor)
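The steps to reproduce can be sketched as a single shell script, which makes the timing of the brick operations relative to the running dd explicit. This is a minimal sketch, not the reporter's actual command history: the host addresses, brick paths, volume name and option names come from the `gluster v info` output above, while the mount point `/mnt/nfsc1`, passwordless ssh between nodes, the `pkill`-based brick "bounce", restarting bricks with `gluster volume start ... force`, and the 30-second head start for dd are all assumptions. The helper names `run` and `kill_brick` are hypothetical. By default it only prints the commands (`DRY_RUN=1`).

```shell
#!/bin/sh
# Hypothetical reproduction sketch. Hosts, brick paths and option names are
# taken from "gluster v info"; MNT, ssh access, the pkill-based brick bounce
# and the sleep duration are assumptions.
set -eu

H1=10.16.159.184 H2=10.16.159.188 H3=10.16.159.196
VOL=vol
MNT=/mnt/nfsc1          # assumed mount point (client prompt shows "nfsc1")
DRY_RUN=${DRY_RUN:-1}   # 1 = only print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: kill the brick process serving path $2 on host $1.
# Assumes the glusterfsd command line contains the brick path.
kill_brick() {
    run ssh "$1" "pkill -f 'glusterfsd.*$2'"
}

# Steps 1-3: 1x3 replicate volume with write-behind and eager-lock on.
run gluster volume create "$VOL" replica 3 transport tcp \
    "$H1:/export_b1/dir1" "$H2:/export_b1/dir1" "$H3:/export_b1/dir1"
run gluster volume set "$VOL" performance.write-behind on
run gluster volume set "$VOL" cluster.eager-lock on
run gluster volume start "$VOL"

# Step 4: the Gluster NFS server speaks NFSv3.
run mount -t nfs -o vers=3 "$H1:/$VOL" "$MNT"

# Step 5: note the spaces; "echo 3>file" would redirect fd 3 instead of
# writing "3" to the file. dd runs in the background.
run sh -c "echo 3 > /proc/sys/vm/drop_caches"
run dd if=/dev/urandom of="$MNT/file" bs=2M count=2048 &
DD_PID=$!
run sleep 30            # assumed head start so dd is still in progress

# Step 6: bounce brick2 and brick3 (kill, then restart with "start force").
kill_brick "$H2" /export_b1/dir1
kill_brick "$H3" /export_b1/dir1
run gluster volume start "$VOL" force

# Step 7: bring down brick1.
kill_brick "$H1" /export_b1/dir1

# Step 8: add a second replica set, turning 1x3 into 2x3.
run gluster volume add-brick "$VOL" \
    "$H1:/export_c1/dir1" "$H2:/export_c1/dir1" "$H3:/export_c1/dir1"

# Step 9: bring brick1 back.
run gluster volume start "$VOL" force

# Step 10: rebalance, then wait for dd; with "set -e" a failing dd
# makes the script exit non-zero.
run gluster volume rebalance "$VOL" start
wait "$DD_PID"
```

Running it as-is prints the command sequence; setting `DRY_RUN=0` on a disposable test cluster would execute it. Because dd is backgrounded, its dry-run line may be printed out of order relative to the later steps.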
Created attachment 591745: glusterfs logs, history of commands executed on storage_node1/node2/node3 and nfs mount
The client translator returns the error to the higher translators. We need to figure out exactly what is happening. This does not look NFS-related, although the issue is not seen on a FUSE setup. Needs more investigation.
This bug seems to be related to bug 815227. This is a case where a non-distribute volume was converted to a distribute volume. Please check whether add-brick/rebalance on a volume that is already distributed also errors out. The fix is upstream.
*** This bug has been marked as a duplicate of bug 815227 ***