Bug 808054

Summary: When a new brick is added to a volume, dd fails to create a few files on the FUSE mount
Product: [Community] GlusterFS
Reporter: Shwetha Panduranga <shwetha.h.panduranga>
Component: fuse
Assignee: Raghavendra G <rgowdapp>
Status: CLOSED DUPLICATE
Severity: high
Priority: high
Version: mainline
CC: amarts, gluster-bugs, vbellur, vinaraya
Keywords: Reopened
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2012-05-11 05:52:11 UTC
Attachments:
Description       Flags
Mount log file    none
Brick Log         none

Description Shwetha Panduranga 2012-03-29 12:51:10 UTC
Created attachment 573659
Mount log file

Description of problem:

While dd was running on the FUSE mount, an add-brick operation was performed on the volume. dd failed to create one file, reporting the error below, and after some time resumed creating the remaining files.

Error reported by dd:-
-----------------------
dd: opening `file.5': No such file or directory 

Mount log:-
-----------
[2012-03-29 21:40:43.715716] W [fuse-resolve.c:150:fuse_resolve_gfid_cbk] 0-fuse: 872f5a22-3577-43ec-a767-2ec75034c91b: failed to resolve (Invalid argument)
[2012-03-29 21:40:43.715838] E [fuse-bridge.c:520:fuse_getattr_resume] 0-glusterfs-fuse: 86053: GETATTR 140491251069300 (872f5a22-3577-43ec-a767-2ec75034c91b) resolution failed
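The fuse_resolve_gfid_cbk warning is followed by a failed GETATTR for the same gfid. When re-testing, a small helper like the one below could list which gfids failed to resolve in a mount log and count the GETATTR failures. It is only a sketch that assumes the exact message format shown in this report; other GlusterFS versions may word these messages differently.

```shell
#!/bin/bash
# Sketch: list distinct gfids that the FUSE bridge failed to resolve.
# Assumes the fuse_resolve_gfid_cbk message format seen in this report:
#   ... fuse_resolve_gfid_cbk] 0-fuse: <gfid>: failed to resolve (...)
# Usage: failed_gfids < /path/to/mount.log
failed_gfids() {
    # Keep only the 36-character gfid from matching lines, then dedupe.
    sed -n 's/.*fuse_resolve_gfid_cbk] [^ ]* \([0-9a-f-]\{36\}\): failed to resolve.*/\1/p' | sort -u
}

# Count GETATTR calls that were rejected with "resolution failed".
# Usage: failed_getattrs < /path/to/mount.log
failed_getattrs() {
    grep -c 'GETATTR .* resolution failed'
}
```

Applied to the client log from 3.3.0qa34 later in this report, failed_gfids prints the single gfid ef231ece-a3ac-408d-bdfa-862495feb4f7 and failed_getattrs prints 2.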


Version-Release number of selected component (if applicable):
mainline
 
How reproducible:
often

script1:-
------
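#!/bin/bash
# script1: run from a directory on the FUSE mount (the current directory
# is used as the base). Creates fuse1.1..fuse1.5, each containing
# dir.1..dir.5, and writes file.1..file.5 into every dir.N with dd.

mountpoint=$(pwd)
for i in {1..5}
do
	level1_dir="$mountpoint/fuse1.$i"
	mkdir "$level1_dir"
	cd "$level1_dir"

	for j in {1..5}
	do
		level2_dir="dir.$j"
		mkdir "$level2_dir"
		cd "$level2_dir"

		for k in {1..5}
		do
			# 1024 blocks of k MiB: writes a k GiB file of zeros
			dd if=/dev/zero of="file.$k" bs="$k"M count=1024
		done
		cd "$level1_dir"
	done
	cd "$mountpoint"
done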
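For scale: with bs="$k"M and count=1024, each dd call writes 1024 blocks of k MiB, i.e. k GiB, which matches the byte counts dd reports in the results (1073741824 bytes for file.1, 2147483648 for file.2, and so on). A complete run of script1 over its 5 x 5 directories therefore writes:

```latex
\text{size}(\texttt{file.}k) = 1024 \times k\,\text{MiB} = k\,\text{GiB} = k \times 1073741824\ \text{bytes}

\sum_{k=1}^{5} k\,\text{GiB} = 15\,\text{GiB per dir.}j
\qquad
5 \times 5 \times 15\,\text{GiB} = 375\,\text{GiB in total}
```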

Steps to Reproduce:
1. Create a distributed-replicate volume (2 x 3) and start it.
2. Create FUSE and NFS mounts of the volume on the client.
3. Run script1 from the FUSE mount.
4. Run add-brick on the volume while the script is still in progress.
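A dry-run sketch of steps 1, 2 and 4 with the gluster CLI, for anyone re-creating the setup. Only the volume name dstore comes from the logs; the host names (server1..server3), brick paths and mount points are hypothetical placeholders. By default it prints the commands instead of running them. In a real setup the gluster commands run on a server node, the mount commands run on the client, and the mount directories must already exist. Consecutive bricks on the create line form a replica set, so each set lists one brick per server, and add-brick on a replica 3 volume takes bricks in multiples of three.

```shell
#!/bin/bash
# Dry-run sketch of the reproduction steps. Only VOL (dstore) is taken
# from the logs; hosts, brick paths and mount points are hypothetical.
VOL=dstore
HOSTS=(server1 server2 server3)   # hypothetical: one brick per host per replica set
DRY_RUN=${DRY_RUN:-1}             # 1 = print commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Print one replica set: the same brick path on every host, so the three
# copies of each file land on three different servers.
replica_set() {
    local path=$1 h
    local bricks=()
    for h in "${HOSTS[@]}"; do
        bricks+=("$h:$path")
    done
    echo "${bricks[@]}"
}

repro_steps() {
    # Step 1: 2 x 3 distributed-replicate volume (two replica sets of three).
    # $(replica_set ...) is left unquoted on purpose so it splits into bricks.
    run gluster volume create "$VOL" replica 3 \
        $(replica_set /export1/dstore1) $(replica_set /export2/dstore2)
    run gluster volume start "$VOL"
    # Step 2: FUSE and NFS (v3) mounts on the client.
    run mount -t glusterfs "${HOSTS[0]}:/$VOL" /mnt/fuse
    run mount -t nfs -o vers=3 "${HOSTS[0]}:/$VOL" /mnt/nfs
    # Step 3 (script1 inside /mnt/fuse) runs separately. Step 4, while it
    # runs: add one more replica set, a multiple of the replica count.
    run gluster volume add-brick "$VOL" $(replica_set /export3/dstore3)
}

repro_steps
```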
  
Actual results:
[03/29/12 - 21:34:10 root@APP-CLIENT1 gfsc1]# /gfsc1.sh 
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 90.4418 s, 11.9 MB/s
1024+0 records in
1024+0 records out
2147483648 bytes (2.1 GB) copied, 81.4295 s, 26.4 MB/s
1024+0 records in
1024+0 records out
3221225472 bytes (3.2 GB) copied, 92.606 s, 34.8 MB/s
1024+0 records in
1024+0 records out
4294967296 bytes (4.3 GB) copied, 112.912 s, 38.0 MB/s

dd: opening `file.5': No such file or directory

1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 49.0801 s, 21.9 MB/s
1024+0 records in
1024+0 records out

Expected results:
dd should create all of the files without failing.

Additional Info:-
Brick log:-
-------

[2012-03-29 22:04:42.443695] I [server-handshake.c:571:server_setvolume] 0-dstore-server: accepted client from 192.168.2.35:1023 (version: 3git)
[2012-03-29 22:04:47.574594] I [server-handshake.c:571:server_setvolume] 0-dstore-server: accepted client from 192.168.2.36:1004 (version: 3git)
[2012-03-29 22:04:49.540744] I [server-handshake.c:571:server_setvolume] 0-dstore-server: accepted client from 192.168.2.37:1002 (version: 3git)
[2012-03-29 22:05:06.531019] I [server.c:679:server_rpc_notify] 0-dstore-server: disconnecting connectionfrom APP-SERVER2-24812-2012/03/29-22:04:38:415414-dstore-client-0-0
[2012-03-29 22:05:06.556770] I [server.c:701:server_rpc_notify] 0-dstore-server: starting a grace timer for APP-SERVER2-24812-2012/03/29-22:04:38:415414-dstore-client-0-0
[2012-03-29 22:05:12.832413] E [posix.c:3109:do_xattrop] 0-dstore-posix: getxattr failed on /export1/dstore1/.glusterfs/2b/8f/2b8f474f-3a5e-4c28-be4e-dbf7de9281e3 while doing xattrop: Key:trusted.afr.dstore-client-0 (No such file or directory)
[2012-03-29 22:05:12.832495] I [server3_1-fops.c:1832:server_xattrop_cbk] 0-dstore-server: 293: XATTROP <gfid:2b8f474f-3a5e-4c28-be4e-dbf7de9281e3> (2b8f474f-3a5e-4c28-be4e-dbf7de9281e3) ==> -1 (No such file or directory)

Comment 1 Shwetha Panduranga 2012-03-29 12:59:23 UTC
Created attachment 573666
Brick Log

Comment 2 Raghavendra G 2012-04-03 03:27:50 UTC

*** This bug has been marked as a duplicate of bug 803201 ***

Comment 3 Raghavendra G 2012-04-03 06:46:12 UTC

*** This bug has been marked as a duplicate of bug 802233 ***

Comment 4 Shwetha Panduranga 2012-04-12 05:11:37 UTC
This bug still exists on 3.3.0qa34.

Client log output:-
-------------------

[2012-04-12 15:56:57.203408] I [client-handshake.c:1632:select_server_supported_programs] 1-dstore-client-1: Using Program GlusterFS 3.3.0qa34, Num (1298437), Version (330)
[2012-04-12 15:56:57.206200] I [client-handshake.c:1429:client_setvolume_cbk] 1-dstore-client-1: Connected to 192.168.2.36:24009, attached to remote volume '/export1/dstore1'.
[2012-04-12 15:56:57.206322] I [client-handshake.c:1441:client_setvolume_cbk] 1-dstore-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2012-04-12 15:56:57.224246] I [fuse-bridge.c:4119:fuse_graph_setup] 0-fuse: switched to graph 1
[2012-04-12 15:56:57.224416] I [client-handshake.c:456:client_set_lk_version_cbk] 1-dstore-client-1: Server lk version = 1
[2012-04-12 15:56:57.226690] I [afr-common.c:1877:afr_set_root_inode_on_first_lookup] 1-dstore-replicate-1: added root inode
[2012-04-12 15:56:57.229263] I [afr-common.c:1877:afr_set_root_inode_on_first_lookup] 1-dstore-replicate-0: added root inode
[2012-04-12 15:56:57.844956] I [client.c:2160:notify] 0-dstore-client-0: current graph is no longer active, destroying rpc_client 
[2012-04-12 15:56:57.845135] I [client.c:2160:notify] 0-dstore-client-1: current graph is no longer active, destroying rpc_client 
[2012-04-12 15:56:57.845276] I [client.c:136:client_register_grace_timer] 0-dstore-client-0: Registering a grace timer
[2012-04-12 15:56:57.845368] I [client.c:2099:client_rpc_notify] 0-dstore-client-0: disconnected
[2012-04-12 15:56:57.845493] I [client.c:136:client_register_grace_timer] 0-dstore-client-1: Registering a grace timer
[2012-04-12 15:56:57.845989] I [client.c:2099:client_rpc_notify] 0-dstore-client-1: disconnected
[2012-04-12 15:56:57.846059] E [afr-common.c:3572:afr_notify] 0-dstore-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2012-04-12 15:57:07.901973] W [client.c:112:client_grace_timeout] 0-dstore-client-0: client grace timer expired, updating the lk-version to 2
[2012-04-12 15:57:07.902101] W [client.c:112:client_grace_timeout] 0-dstore-client-1: client grace timer expired, updating the lk-version to 2
[2012-04-12 15:58:50.385619] W [fuse-resolve.c:152:fuse_resolve_gfid_cbk] 0-fuse: ef231ece-a3ac-408d-bdfa-862495feb4f7: failed to resolve (Invalid argument)
[2012-04-12 15:58:50.385782] E [fuse-bridge.c:539:fuse_getattr_resume] 0-glusterfs-fuse: 119441: GETATTR 140379421385568 (ef231ece-a3ac-408d-bdfa-862495feb4f7) resolution failed
[2012-04-12 15:58:50.393669] W [fuse-resolve.c:152:fuse_resolve_gfid_cbk] 0-fuse: ef231ece-a3ac-408d-bdfa-862495feb4f7: failed to resolve (Invalid argument)
[2012-04-12 15:58:50.393791] E [fuse-bridge.c:539:fuse_getattr_resume] 0-glusterfs-fuse: 119447: GETATTR 140379421385568 (ef231ece-a3ac-408d-bdfa-862495feb4f7) resolution failed


dd output:-
----------
[04/12/12 - 15:52:54 root@APP-CLIENT1 gfsc1]# /gfsc1.sh 
Creating File: /dir.1/file.1
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 55.9155 s, 19.2 MB/s
Creating File: /dir.1/file.2
1024+0 records in
1024+0 records out
2147483648 bytes (2.1 GB) copied, 106.658 s, 20.1 MB/s
Creating File: /dir.1/file.3
1024+0 records in
1024+0 records out
3221225472 bytes (3.2 GB) copied, 182.735 s, 17.6 MB/s
Creating File: /dir.1/file.4
dd: opening `file.4': No such file or directory
Creating File: /dir.1/file.5
dd: opening `file.5': No such file or directory

Comment 5 Raghavendra G 2012-04-24 00:17:01 UTC
This bug seems to have a cause similar to that of bug #802233. The patch that is supposed to fix 802233 has not been merged yet. Please retest once a release containing that patch is available.

Comment 6 Raghavendra G 2012-05-01 04:01:24 UTC
Shwetha,

Now that the required patch has gone in, can you please verify whether this issue is fixed?

regards,
Raghavendra.

Comment 7 Amar Tumballi 2012-05-08 11:10:00 UTC
Also, this may be a case of starting with just one brick and converting the volume to a distributed volume with 'add-brick'. We do not support that behavior for now.

Comment 8 Shwetha Panduranga 2012-05-11 05:52:11 UTC
The bug is fixed. Verified on 3.3.0qa40.

*** This bug has been marked as a duplicate of bug 802233 ***