Bug 763996 (GLUSTER-2264)

Summary:	setting the volumes options throws error on nfs mount
Product:	[Community] GlusterFS	Reporter:	Saurabh <saurabh>
Component:	nfs	Assignee:	Kaushal <kaushal>
Status:	CLOSED WONTFIX	QA Contact:
Severity:	low	Docs Contact:
Priority:	high
Version:	3.1.1	CC:	amarts, anush, gluster-bugs, krishna, nuaa_liuben, vijay
Target Milestone:	---	Keywords:	Reopened
Target Release:	---
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2012-04-27 05:18:11 UTC	Type:	---
Regression:	RTNR	Mount Type:	nfs
Documentation:	DNR	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Saurabh 2010-12-31 07:57:26 UTC

Hello,

While testing for 3.1.2, I found that setting or changing volume options
creates a problem on the NFS mount. If an operation is in progress over the NFS mount, it either fails with an "Input/output error" or gets interrupted.

The problem was found on 3.1.2.qa2 and 3.1.2qa3.

The test case was:
1. Create a dist-replicate volume with rdma as transport.
2. Mount it on a client using gNFS.
3. On the server, set an option, for example:
volume set repdist diagnostics.brick-log-level DEBUG
4. Start iozone on the NFS mount.
5. Change the option to TRACE.
6. Iozone fails.
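
The steps above can be written down as an ordered plan. This is only a sketch: it builds the command lines and prints them, and it does not run anything. The non-interactive form "gluster volume set ..." is assumed to be equivalent to the "volume set ..." typed at the gluster> prompt, and the "server"/"client" labels are just hypothetical tags for where each command would be issued.

```python
import shlex

VOLUME = "repdist"  # volume name taken from the report
OPTION = "diagnostics.brick-log-level"


def volume_set_cmd(level, volume=VOLUME, option=OPTION):
    """Return the argv for one non-interactive 'gluster volume set' call."""
    return ["gluster", "volume", "set", volume, option, level]


def repro_plan(iozone="/opt/qa/tools/Iozone", size="2g", record="22k"):
    """Return the ordered steps as (host, argv) pairs; nothing is executed."""
    tests = []
    for i in range(13):  # -i 0 ... -i 12, as in the reported command line
        tests += ["-i", str(i)]
    return [
        ("server", volume_set_cmd("DEBUG")),
        ("client", [iozone] + tests + ["-s", size, "-r", record]),
        # Issued on the server while iozone is still running on the client.
        ("server", volume_set_cmd("TRACE")),
    ]


if __name__ == "__main__":
    for host, argv in repro_plan():
        print(f"[{host}] {shlex.join(argv)}")
```

The iozone command has to be started from a directory on the NFS mount; the mount point itself is not given in this report, so it is left out of the plan.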

Also, this is not specific to rdma as the transport; with tcp it fails in a similar fashion, on a "distribute" volume as well.

I have also tried other ways, such as running the touch command to create 10000 files. The files were created, but an input/output error was displayed on the client's screen while volume options were being changed on the server. I have even tried running operations when no log-level was set, and they eventually failed while I was changing volume options.
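
A small workload along the lines of the touch test could look like the sketch below. It is not the script that was actually used: the function name touch_many and the file naming are made up for illustration. It creates the files, tallies any OSError by errno name (EIO would be the "Input/output error" case), and then counts how many files exist afterwards, since in the test above the files were created even though errors were printed.

```python
import errno
import os
import tempfile


def touch_many(directory, count):
    """Touch `count` files; return (files present afterwards, errors by errno name)."""
    errors = {}
    for i in range(count):
        path = os.path.join(directory, f"file{i}")
        try:
            # Equivalent of `touch`: create the file if missing, then update its times.
            with open(path, "a"):
                pass
            os.utime(path, None)
        except OSError as exc:
            name = errno.errorcode.get(exc.errno, str(exc.errno))
            errors[name] = errors.get(name, 0) + 1
    present = sum(
        os.path.exists(os.path.join(directory, f"file{i}")) for i in range(count)
    )
    return present, errors


if __name__ == "__main__":
    # A local temporary directory stands in for the NFS mount here.
    with tempfile.TemporaryDirectory() as d:
        print(touch_many(d, 100))
```

To use it for this bug, the directory would be a path on the NFS mount and the count would be 10000, with volume options being changed on the server while it runs.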

The problem is not seen on fuse mount.

Logs from one of the failures are here:
gluster> volume set repdist diagnostics.brick-log-level DEBUG
Set volume successful
gluster> volume set repdist diagnostics.brick-log-level TRACE
Set volume successful
[saurabh@client10 nfs-test]$ time sudo /opt/qa/tools/Iozone -i 0 -i 1 -i 2 -i 3 -i 4 -i 5 -i 6 -i 7 -i 8 -i 9 -i 10 -i 11 -i 12 -s 2g -r 22k
Password:
Iozone: Performance Test of File I/O
Version $Revision: 3.326 $
Compiled for 64 bit mode.
Build: linux

Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
Al Slater, Scott Rhine, Mike Wisner, Ken Goss
Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy,
Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root.

Run began: Thu Dec 30 23:21:02 2010

Selected test not available on the version.
File size set to 2097152 KB
Record Size 22 KB
Command line used: /opt/qa/tools/Iozone -i 0 -i 1 -i 2 -i 3 -i 4 -i 5 -i 6 -i 7 -i 8 -i 9 -i 10 -i 11 -i 12 -s 2g -r 22k
Output is in Kbytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
random random bkwd record stride
KB reclen write rewrite read reread read write read rewrite read fwrite frewrite fread freread
2097152 22
Error writing block 38266, fd= 3
write: No such file or directory

iozone: interrupted

exiting iozone

real 1m10.276s
user 0m0.039s
sys 0m1.462s

TCP configuration:
2 CentOS servers, with a distribute volume on them
1 Ubuntu client.

RDMA configuration:
2 CentOS servers, with a dist-rep volume on them
1 CentOS client.

Please let me know if more information is needed.

-Saurabh

Comment 1 Anush Shetty 2011-01-04 01:52:04 UTC

I think the problem is that the NFS server gets restarted even when only the log-level is being changed. The NFS server shouldn't be restarted for log-level change commands.


[2011-01-04 09:15:52.212992] I [glusterfsd-mgmt.c:59:mgmt_cbk_spec] mgmt: Volume file changed
[2011-01-04 09:15:52.223008] I [server.c:428:server_rpc_notify] test-server: disconnected connection from 127.0.0.1:1016
[2011-01-04 09:15:52.223079] I [server-helpers.c:670:server_connection_destroy] test-server: destroyed connection of pitta-2929-2011/01/04-09:14:51:753085-test-client-1
[2011-01-04 09:15:53.239786] I [xlator.c:1279:is_gf_log_command] glusterfs: setting log level to 8 (old-value=7)
[2011-01-04 09:15:53.239824] D [io-stats.c:1599:reconfigure] /mnt/s2: changing log-level to DEBUG
[2011-01-04 09:15:53.239846] D [xlator.c:974:xlator_reconfigure_rec] /mnt/s2: reconfigured
[2011-01-04 09:15:53.239895] D [server.c:619:reconfigure] : returning 0
[2011-01-04 09:15:53.239916] D [glusterfsd-mgmt.c:373:mgmt_getspec_cbk] glusterfsd-mgmt: No need to re-load volfile, reconfigure done
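
One way to look for this pattern in a longer log is sketched below. It assumes every line has the layout seen in the excerpt above (timestamp, level letter, [file:line:function], message); the function names and the 5-second window are arbitrary choices for illustration. It reports each "disconnected connection" message that follows a "Volume file changed" message within the window.

```python
import re
from datetime import datetime

LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) \[(?P<loc>[^\]]+)\] (?P<msg>.*)$"
)


def parse(line):
    """Split one log line into (timestamp, level, location, message), or None."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
    return ts, m.group("level"), m.group("loc"), m.group("msg")


def disconnects_after_volfile_change(lines, window_seconds=5.0):
    """Return (delay in seconds, message) for disconnects soon after a volfile change."""
    hits = []
    last_change = None
    for line in lines:
        rec = parse(line)
        if rec is None:
            continue  # skip lines that do not match the assumed layout
        ts, _level, _loc, msg = rec
        if "Volume file changed" in msg:
            last_change = ts
        elif "disconnected connection" in msg and last_change is not None:
            delta = (ts - last_change).total_seconds()
            if 0 <= delta <= window_seconds:
                hits.append((delta, msg))
    return hits


# First three lines of the excerpt above.
SAMPLE = [
    "[2011-01-04 09:15:52.212992] I [glusterfsd-mgmt.c:59:mgmt_cbk_spec] mgmt: Volume file changed",
    "[2011-01-04 09:15:52.223008] I [server.c:428:server_rpc_notify] test-server: disconnected connection from 127.0.0.1:1016",
    "[2011-01-04 09:15:52.223079] I [server-helpers.c:670:server_connection_destroy] test-server: destroyed connection of pitta-2929-2011/01/04-09:14:51:753085-test-client-1",
]

if __name__ == "__main__":
    for delay, message in disconnects_after_volfile_change(SAMPLE):
        print(f"{delay:.6f}s after volfile change: {message}")
```

A hit only shows that a disconnect closely followed a volfile change; it does not by itself prove which process restarted.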

Comment 2 Shehjar Tikoo 2011-02-09 07:30:37 UTC

For an explanation of why we can't fix this at the moment, see point (b) in comment 4 of:

http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2161#c4

Comment 3 Shehjar Tikoo 2011-03-16 06:00:45 UTC

Ok, this needs to be fixed. I was being stupid when I set the status to resolved.
Assigning to Kaushik since this needs a change in the command line code for nfs.

Comment 4 Vijay Bellur 2011-09-08 03:50:46 UTC

We need duplicate reply cache support in RPC.
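
As general background on what such a cache does, here is a minimal sketch. It is not GlusterFS RPC code: the class name, the (client, xid) key, and the toy create operation are assumptions made for illustration. The idea is that a retransmitted request with the same transaction id gets the stored reply back instead of being executed a second time, which matters for non-idempotent operations.

```python
from collections import OrderedDict


class DuplicateReplyCache:
    """Bounded cache of replies keyed by (client, xid); oldest entries are evicted."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self._replies = OrderedDict()

    def handle(self, client, xid, execute):
        """Run `execute` once per (client, xid); replay the stored reply on retransmits."""
        key = (client, xid)
        if key in self._replies:
            self._replies.move_to_end(key)  # mark as recently used
            return self._replies[key]
        reply = execute()
        self._replies[key] = reply
        if len(self._replies) > self.capacity:
            self._replies.popitem(last=False)  # drop the least recently used entry
        return reply


if __name__ == "__main__":
    created = set()

    def create(name):
        # Toy non-idempotent operation: a second execution reports an error.
        if name in created:
            return "EEXIST"
        created.add(name)
        return "OK"

    cache = DuplicateReplyCache()
    print(cache.handle("client10", 7, lambda: create("a")))  # first transmission
    print(cache.handle("client10", 7, lambda: create("a")))  # retransmission, replayed
    print(create("a"))  # same operation without the cache
```

Note that an in-memory cache like this one is lost when the server process exits, so this sketch only covers retransmissions seen by the same running process.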

Comment 5 kaushik 2011-10-17 08:29:37 UTC

Krishna is looking into this issue.

*** This bug has been marked as a duplicate of bug 3725 ***

Comment 6 Krishna Srinivas 2011-10-17 08:47:52 UTC

(In reply to comment #5)
> Krishna is looking into this issue.
> 
> *** This bug has been marked as a duplicate of bug 765457 ***

Koushik, this is a different bug, which needs a duplicate reply cache in RPC (as mentioned above by Vijay).

Comment 7 Kaushal 2012-04-27 05:18:11 UTC

Closing as wontfix. File a new bug if this is required.