Bug 831940 - dd on nfs mount failed with Input/output error
Keywords:
Status: CLOSED DUPLICATE of bug 815227
Alias: None
Product: GlusterFS
Classification: Community
Component: nfs
Version: 3.3-beta
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Rajesh
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-06-14 06:43 UTC by Shwetha Panduranga
Modified: 2013-07-04 22:44 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-08-02 09:24:09 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
glusterfs logs, history of commands executed on storage_node1/node2/node3 and nfs mount (449.32 KB, application/x-gzip)
2012-06-14 06:46 UTC, Shwetha Panduranga

Description Shwetha Panduranga 2012-06-14 06:43:13 UTC
Description of problem:
-----------------------
dd on an NFS mount failed with "Input/output error" when add-brick was performed on a replicate volume and the volume was subsequently rebalanced.

Version-Release number of selected component (if applicable):
------------------------------------------------------------
3.3.0qa45

Steps to Reproduce:
------------------
1. Create a replicate volume (1x3: brick1, brick2, brick3).

2. Set "write-behind on" and "eager-lock on" on the volume.

3. Start the volume.

4. Create an NFS mount.

5. Execute the command "echo 3>/proc/sys/vm/drop_caches ; time dd if=/dev/urandom of=./file bs=2M count=2048" from the NFS mount.

While dd is still in progress, perform the following tasks:
---------------------------------------------------------
6. Bounce bricks "brick2" and "brick3".

7. Bring down "brick1".

8. Add bricks to the volume to change it from replicate to distribute-replicate (2x3).

9. Bring "brick1" back up.

10. Start a rebalance.
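
The volume-level steps above (1-4, 8 and 10) can be sketched as a dry-run script. This is only an illustration and not the exact command history from the attachment: the host addresses and brick paths are taken from the "gluster v info" output in this report, while the mount point /mnt/nfsc1, the NFS mount options, and the helper names run, repro_setup and repro_expand are assumptions. How the bricks were bounced and brought down (steps 6, 7 and 9) is not stated here, so those steps are left out. By default the script only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch of steps 1-4, 8 and 10. Helper names and the mount point
# are hypothetical; hosts and brick paths come from "gluster v info".
DRY_RUN="${DRY_RUN:-1}"     # 1 = print commands only (default), 0 = execute
VOL=vol
H1=10.16.159.184
H2=10.16.159.188
H3=10.16.159.196
MNT=/mnt/nfsc1              # assumed mount point, not given in the report

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps 1-4: create the 1x3 replicate volume, set options, start, mount.
repro_setup() {
    run gluster volume create "$VOL" replica 3 \
        "$H1:/export_b1/dir1" "$H2:/export_b1/dir1" "$H3:/export_b1/dir1"
    run gluster volume set "$VOL" performance.write-behind on
    run gluster volume set "$VOL" cluster.eager-lock on
    run gluster volume start "$VOL"
    # Mount options are an assumption (NFSv3 over TCP).
    run mount -t nfs -o vers=3,proto=tcp "$H1:/$VOL" "$MNT"
}

# Steps 8 and 10: to be run while dd (step 5) is still writing.
repro_expand() {
    run gluster volume add-brick "$VOL" \
        "$H1:/export_c1/dir1" "$H2:/export_c1/dir1" "$H3:/export_c1/dir1"
    run gluster volume rebalance "$VOL" start
}

repro_setup
repro_expand
```

Setting DRY_RUN=0 would execute the commands instead of printing them; that should only be done on a disposable test cluster.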
  
Actual results:
--------------
[06/14/12 - 01:41:52 root@ARF-Client1 nfsc1]# echo 3>/proc/sys/vm/drop_caches ; time dd if=/dev/urandom of=./file bs=2M count=2048

dd: writing `./file': Input/output error
dd: closing output file `./file': Input/output error

real	8m53.063s
user	0m0.006s
sys	6m51.750s
[06/14/12 - 01:51:17 root@ARF-Client1 nfsc1]# 

[06/14/12 - 01:53:34 root@ARF-Client1 nfsc1]# ls
file

[06/14/12 - 01:53:35 root@ARF-Client1 nfsc1]# ls -lh file
ls: cannot access file: Remote I/O error

[06/14/12 - 01:53:44 root@ARF-Client1 nfsc1]# stat file
stat: cannot stat `file': Remote I/O error


Expected results:
----------------
dd should not fail

Additional info:
----------------
[06/14/12 - 01:51:03 root@AFR-Server1 ~]# gluster v info
 
Volume Name: vol
Type: Distributed-Replicate
Volume ID: a14bdfdb-c4d7-4794-9924-4fa41a97883d
Status: Started
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 10.16.159.184:/export_b1/dir1
Brick2: 10.16.159.188:/export_b1/dir1
Brick3: 10.16.159.196:/export_b1/dir1
Brick4: 10.16.159.184:/export_c1/dir1
Brick5: 10.16.159.188:/export_c1/dir1
Brick6: 10.16.159.196:/export_c1/dir1
Options Reconfigured:
cluster.eager-lock: on
performance.write-behind: on

NFS log output:
------------------

[2012-06-14 01:51:09.198543] I [client-helpers.c:100:this_fd_set_ctx] 0-vol-client-5: <gfid:eb749521-3dc1-4bee-a5f5-ca251a082180> (eb749521-3dc1-4bee-a5f5-ca251a082180): trying duplicate remote fd set. 
[2012-06-14 01:51:09.198675] I [client-helpers.c:100:this_fd_set_ctx] 0-vol-client-3: <gfid:eb749521-3dc1-4bee-a5f5-ca251a082180> (eb749521-3dc1-4bee-a5f5-ca251a082180): trying duplicate remote fd set. 
[2012-06-14 01:51:09.199444] I [client-helpers.c:100:this_fd_set_ctx] 0-vol-client-4: <gfid:eb749521-3dc1-4bee-a5f5-ca251a082180> (eb749521-3dc1-4bee-a5f5-ca251a082180): trying duplicate remote fd set. 


[2012-06-14 01:51:09.372516] W [client3_1-fops.c:821:client3_1_writev_cbk] 0-vol-client-4: remote operation failed: Bad file descriptor
[2012-06-14 01:51:09.373042] W [client3_1-fops.c:821:client3_1_writev_cbk] 0-vol-client-3: remote operation failed: Bad file descriptor
[2012-06-14 01:51:09.373194] W [client3_1-fops.c:821:client3_1_writev_cbk] 0-vol-client-5: remote operation failed: Bad file descriptor
[2012-06-14 01:51:09.373264] W [nfs3.c:2079:nfs3svc_write_cbk] 0-nfs: 21eed347: <gfid:eb749521-3dc1-4bee-a5f5-ca251a082180> => -1 (Bad file descriptor)
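
To see which client translators reported failures, and with which error, the "remote operation failed" warnings in a log like the one above can be tallied per translator. The following is a rough sketch: the function name summarize_failures is hypothetical, the log path in the usage comment is an assumption, and the field positions assume the log line layout shown in this report.

```shell
#!/bin/sh
# Count "remote operation failed" warnings per client translator and error.
# Usage (log path is an assumption):
#   summarize_failures < /var/log/glusterfs/nfs.log
summarize_failures() {
    awk '
    /remote operation failed: / {
        # Field 5 is the translator name, e.g. "0-vol-client-4:".
        name = $5
        sub(/:$/, "", name)
        # The error text is everything after the marker string.
        marker = "remote operation failed: "
        err = substr($0, index($0, marker) + length(marker))
        count[name " " err]++
    }
    END {
        for (key in count)
            print count[key], key
    }' | sort -k2
}
```

Each output line has the form "<count> <translator> <error text>", sorted by translator name.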

Comment 1 Shwetha Panduranga 2012-06-14 06:46:51 UTC
Created attachment 591745
glusterfs logs, history of commands executed on storage_node1/node2/node3 and nfs mount

Comment 2 Krishna Srinivas 2012-06-15 07:24:56 UTC
The client translator returns an error to the higher translators. We need to figure out what exactly is happening. This does not look NFS-related, though the issue is not seen in a FUSE setup. Needs more investigation.

Comment 3 shishir gowda 2012-06-15 10:32:37 UTC
This bug seems to be related to bug 815227. This is a case where a non-distribute volume was converted to a distribute volume. Please check whether add-brick/rebalance on a volume that is already a distribute volume also errors out. The fix is in upstream.

Comment 4 Rajesh 2012-08-02 09:24:09 UTC

*** This bug has been marked as a duplicate of bug 815227 ***

