Bug 763996 (GLUSTER-2264)

Summary: setting the volumes options throws error on nfs mount
Product: [Community] GlusterFS
Reporter: Saurabh <saurabh>
Component: nfs
Assignee: Kaushal <kaushal>
Status: CLOSED WONTFIX
Severity: low
Priority: high
Version: 3.1.1
CC: amarts, anush, gluster-bugs, krishna, nuaa_liuben, vijay
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2012-04-27 05:18:11 UTC
Regression: RTNR
Mount Type: nfs
Documentation: DNR

Description Saurabh 2010-12-31 07:57:26 UTC
Hello,

While testing 3.1.2, I found that setting or changing volume options causes a problem on the NFS mount. If an operation is in progress on the NFS mount, it either fails with an "Input/Output" error or gets interrupted.

The problem was found on 3.1.2.qa2 and 3.1.2qa3.

The test case was:
1. Create a dist-replicate volume with rdma as the transport.
2. Mount it on a client using gNFS.
3. On the server, set an option, for example:
   volume set repdist diagnostics.brick-log-level DEBUG
4. Start iozone on the NFS mount.
5. Change the option to TRACE.
6. iozone fails.
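
The option changes in steps 3 and 5 could be scripted roughly as follows while the I/O workload runs on the client. This is only a sketch: `volume_set_command` and `toggle_option` are hypothetical helper names, and it assumes the `gluster` CLI is on the server's PATH and accepts `volume set <volume> <key> <value>` non-interactively. The `run` parameter is injectable so the loop can be dry-run without a GlusterFS install.

```python
import itertools
import subprocess
import time


def volume_set_command(volume, key, value):
    """Build the argv for one `gluster volume set` call (hypothetical helper)."""
    return ["gluster", "volume", "set", volume, key, value]


def toggle_option(volume, key, values, rounds, delay=1.0, run=subprocess.check_call):
    """Cycle `key` through `values` for `rounds` calls, pausing between calls.

    `run` receives each argv list; pass a stub to dry-run the sequence.
    Returns the list of commands that were issued.
    """
    issued = []
    for value in itertools.islice(itertools.cycle(values), rounds):
        cmd = volume_set_command(volume, key, value)
        run(cmd)
        issued.append(cmd)
        time.sleep(delay)
    return issued


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    toggle_option("repdist", "diagnostics.brick-log-level",
                  ["DEBUG", "TRACE"], rounds=4, delay=0, run=print)
```

Running this on the server while iozone is active on the NFS client should correspond to steps 3 to 5 above.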

Note that this is not specific to rdma as the transport: with tcp it fails in a similar fashion, and on a "distribute" volume as well.

I also tried other workloads, such as running the touch command to create 10000 files. The files were created, but an input/output error was displayed on the client while volume options were being changed on the server. I even tried running an operation when no log level had been set, and it eventually failed while I was changing volume options.
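
A file-creation workload like the touch test can be sketched so that it records which errors occurred instead of only showing them on screen. This is a minimal sketch, not the script used in the test; `create_files` is a hypothetical name, and the directory is assumed to be a path on the NFS mount.

```python
import errno
import os


def create_files(directory, count, prefix="f"):
    """Create `count` empty files (similar to `touch`) and collect any OSError.

    Returns (created, errors), where errors is a list of
    (path, errno name) tuples for the calls that raised.
    """
    created = 0
    errors = []
    for i in range(count):
        path = os.path.join(directory, f"{prefix}{i}")
        try:
            # Opening in append mode creates the file if it does not exist.
            with open(path, "a"):
                pass
            created += 1
        except OSError as exc:
            errors.append((path, errno.errorcode.get(exc.errno, str(exc.errno))))
    return created, errors
```

An entry such as `EIO` in the returned error list would correspond to the input/output error seen on the client.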

The problem is not seen on a FUSE mount.

Logs from one of the failures:
gluster> volume set repdist diagnostics.brick-log-level DEBUG
Set volume successful
gluster> volume set repdist diagnostics.brick-log-level TRACE
Set volume successful
[saurabh@client10 nfs-test]$ time sudo /opt/qa/tools/Iozone -i 0 -i 1 -i 2 -i 3 -i 4 -i 5 -i 6 -i 7 -i 8 -i 9 -i 10 -i 11 -i 12 -s 2g -r 22k
Password:
	Iozone: Performance Test of File I/O
	        Version $Revision: 3.326 $
		Compiled for 64 bit mode.
		Build: linux 

	Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
	             Al Slater, Scott Rhine, Mike Wisner, Ken Goss
	             Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
	             Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
	             Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy,
	             Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root.

	Run began: Thu Dec 30 23:21:02 2010

	Selected test not available on the version.
	File size set to 2097152 KB
	Record Size 22 KB
	Command line used: /opt/qa/tools/Iozone -i 0 -i 1 -i 2 -i 3 -i 4 -i 5 -i 6 -i 7 -i 8 -i 9 -i 10 -i 11 -i 12 -s 2g -r 22k
	Output is in Kbytes/sec
	Time Resolution = 0.000001 seconds.
	Processor cache size set to 1024 Kbytes.
	Processor cache line size set to 32 bytes.
	File stride size set to 17 * record size.
                                                            random  random    bkwd   record   stride                                   
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
         2097152      22
Error writing block 38266, fd= 3
write: No such file or directory

iozone: interrupted

exiting iozone



real	1m10.276s
user	0m0.039s
sys	0m1.462s


TCP configuration:
2 CentOS servers with a distribute volume on them
1 Ubuntu client

RDMA configuration:
2 CentOS servers with a dist-rep volume on them
1 CentOS client

Please let me know if more information is needed.

-Saurabh

Comment 1 Anush Shetty 2011-01-04 01:52:04 UTC
I think the problem is that the NFS server gets restarted even when only the log level is being changed. The NFS server shouldn't be restarted for log-level change commands.


[2011-01-04 09:15:52.212992] I [glusterfsd-mgmt.c:59:mgmt_cbk_spec] mgmt: Volume file changed
[2011-01-04 09:15:52.223008] I [server.c:428:server_rpc_notify] test-server: disconnected connection from 127.0.0.1:1016
[2011-01-04 09:15:52.223079] I [server-helpers.c:670:server_connection_destroy] test-server: destroyed connection of pitta-2929-2011/01/04-09:14:51:753085-test-client-1
[2011-01-04 09:15:53.239786] I [xlator.c:1279:is_gf_log_command] glusterfs: setting log level to 8 (old-value=7)
[2011-01-04 09:15:53.239824] D [io-stats.c:1599:reconfigure] /mnt/s2: changing log-level to DEBUG
[2011-01-04 09:15:53.239846] D [xlator.c:974:xlator_reconfigure_rec] /mnt/s2: reconfigured
[2011-01-04 09:15:53.239895] D [server.c:619:reconfigure] : returning 0
[2011-01-04 09:15:53.239916] D [glusterfsd-mgmt.c:373:mgmt_getspec_cbk] glusterfsd-mgmt: No need to re-load volfile, reconfigure done
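
To check for this pattern across a longer log, the lines can be parsed and the relevant events counted. The sketch below assumes every line follows the layout seen in this excerpt (`[timestamp] LEVEL [file:line:function] name: message`); `parse_line` and `summarize` are hypothetical names, and the matching is based only on the function names and messages shown above.

```python
import re

# Layout assumed from the excerpt: [ts] L [file:line:func] name: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<name>[^:]*): (?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def summarize(lines):
    """Count volfile-change notifications and disconnects, and report whether
    the process said it reconfigured without reloading the volfile."""
    events = [e for e in map(parse_line, lines) if e]
    return {
        "volfile_changed": sum(e["func"] == "mgmt_cbk_spec" for e in events),
        "disconnects": sum(
            e["func"] == "server_rpc_notify" and e["msg"].startswith("disconnected")
            for e in events
        ),
        "reconfigured_without_reload": any(
            "No need to re-load volfile" in e["msg"] for e in events
        ),
    }
```

For the excerpt above, this would report one volfile change and one disconnect, with the logging process itself reconfiguring in place.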

Comment 2 Shehjar Tikoo 2011-02-09 07:30:37 UTC
For an explanation of why we can't fix this at the moment, see point (b) in comment 4 for:

http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2161#c4

Comment 3 Shehjar Tikoo 2011-03-16 06:00:45 UTC
Ok, this needs to be fixed. I was being stupid when I set the status to resolved.
Assigning to Kaushik since this needs a change in the command line code for nfs.

Comment 4 Vijay Bellur 2011-09-08 03:50:46 UTC
We need duplicate reply cache support in RPC.
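
For readers unfamiliar with the idea, a duplicate reply cache remembers the reply sent for each request so that a retransmitted request is answered from the cache instead of being executed a second time. The following is a generic illustration only, not GlusterFS RPC code; the class name, the `(client, xid)` key, and the LRU eviction are all assumptions made for the sketch.

```python
from collections import OrderedDict


class DuplicateReplyCache:
    """Remember recent replies keyed by (client, xid); illustrative only."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self._replies = OrderedDict()

    def handle(self, client, xid, execute):
        """Return the cached reply for a retransmission, else run `execute()`."""
        key = (client, xid)
        if key in self._replies:
            # Retransmitted request: do not execute the operation again.
            self._replies.move_to_end(key)
            return self._replies[key]
        reply = execute()
        self._replies[key] = reply
        if len(self._replies) > self.capacity:
            # Evict the least recently used entry.
            self._replies.popitem(last=False)
        return reply
```

This matters most for non-idempotent operations such as file creation or removal, where executing a retransmitted request again can produce a spurious error. The cache here lives in memory only.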

Comment 5 kaushik 2011-10-17 08:29:37 UTC
Krishna is looking into this issue.

*** This bug has been marked as a duplicate of bug 3725 ***

Comment 6 Krishna Srinivas 2011-10-17 08:47:52 UTC
(In reply to comment #5)
> Krishna is looking into this issue.
> 
> *** This bug has been marked as a duplicate of bug 765457 ***

Kaushik, this is a different bug, which needs a duplicate reply cache in RPC (as mentioned above by Vijay).

Comment 7 Kaushal 2012-04-27 05:18:11 UTC
Closing as wontfix. File a new bug if this is required.