Hello,

While testing 3.1.2, I found that setting or changing volume options causes problems on an NFS mount. If an operation is in progress on the NFS mount, it either throws an "Input/output error" or gets interrupted. The problem was found on 3.1.2qa2 and 3.1.2qa3.

The test case was:

1. Create a dist-replicate volume with rdma as transport.
2. Mount it on a client using gNFS.
3. On the server, set an option, e.g. volume set repdist diagnostics.brick-log-level DEBUG
4. Start iozone on the NFS mount.
5. Change the option to TRACE.
6. iozone fails.

This is not specific to rdma as transport: with tcp it fails in a similar fashion, and for a "distribute" volume as well.

I also tried other workloads, such as running the touch command to create 10000 files. The files were created, but an input/output error was displayed on the client while volume options were being changed on the server. I even tried running an operation when no log-level had been set, and the operation eventually failed while I was changing volume options.

The problem is not seen on a fuse mount.

Logs of one of the failures:

gluster> volume set repdist diagnostics.brick-log-level DEBUG
Set volume successful
gluster> volume set repdist diagnostics.brick-log-level TRACE
Set volume successful

[saurabh@client10 nfs-test]$ time sudo /opt/qa/tools/Iozone -i 0 -i 1 -i 2 -i 3 -i 4 -i 5 -i 6 -i 7 -i 8 -i 9 -i 10 -i 11 -i 12 -s 2g -r 22k
Password:
        Iozone: Performance Test of File I/O
                Version $Revision: 3.326 $
                Compiled for 64 bit mode.
                Build: linux

        Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
                     Al Slater, Scott Rhine, Mike Wisner, Ken Goss
                     Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
                     Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
                     Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Erik Habbinga,
                     Kris Strecker, Walter Wong, Joshua Root.

        Run began: Thu Dec 30 23:21:02 2010

        Selected test not available on the version.
        File size set to 2097152 KB
        Record Size 22 KB
        Command line used: /opt/qa/tools/Iozone -i 0 -i 1 -i 2 -i 3 -i 4 -i 5 -i 6 -i 7 -i 8 -i 9 -i 10 -i 11 -i 12 -s 2g -r 22k
        Output is in Kbytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 Kbytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
                                                    random  random    bkwd  record  stride
              KB  reclen   write rewrite    read    reread    read   write    read rewrite    read   fwrite frewrite   fread  freread
         2097152      22
Error writing block 38266, fd= 3
write: No such file or directory

iozone: interrupted

exiting iozone

real 1m10.276s
user 0m0.039s
sys 0m1.462s

TCP configuration: 2 CentOS servers with a distribute volume on them, 1 Ubuntu client.
RDMA configuration: 2 CentOS servers with a dist-rep volume on them, 1 CentOS client.

Please let me know if more information is needed.

-Saurabh
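The touch-style variant of the reproduction can be sketched as a small script. This is only a sketch, not the script used in the report: the mount path /mnt/nfs-test, the file names, and the helper names create_files and volume_set_command are assumptions made for illustration. The workload runs on the client, and the printed gluster commands are meant to be run by hand on the server while the workload is in progress.

```python
import os
import sys


def volume_set_command(volume, key, value):
    """Build the argv for a non-interactive 'gluster volume set' call."""
    return ["gluster", "volume", "set", volume, key, value]


def create_files(directory, count):
    """Create `count` empty files, similar to `touch`, and collect failures.

    Returns a list of (path, error) pairs, one for each file whose creation
    raised an OSError (an input/output error would show up here).
    """
    failures = []
    for i in range(count):
        path = os.path.join(directory, "file%d" % i)
        try:
            with open(path, "a"):
                pass
        except OSError as err:
            failures.append((path, err))
    return failures


if __name__ == "__main__":
    # Hypothetical default mount point; pass the real NFS mount as argv[1].
    mount = sys.argv[1] if len(sys.argv) > 1 else "/mnt/nfs-test"
    print("While this runs, execute on the server:")
    for level in ("DEBUG", "TRACE"):
        cmd = volume_set_command("repdist", "diagnostics.brick-log-level", level)
        print("  " + " ".join(cmd))
    failed = create_files(mount, 10000)
    print("%d of 10000 creations failed" % len(failed))
    for path, err in failed[:10]:
        print("  %s: %s" % (path, err))
```

Collecting the errors instead of stopping at the first one matches the observation that the files were created even though errors were printed on the client.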
I think this is a problem with the NFS server getting restarted even when only the log-level is being changed. The NFS server shouldn't be restarted for log-level change commands.

[2011-01-04 09:15:52.212992] I [glusterfsd-mgmt.c:59:mgmt_cbk_spec] mgmt: Volume file changed
[2011-01-04 09:15:52.223008] I [server.c:428:server_rpc_notify] test-server: disconnected connection from 127.0.0.1:1016
[2011-01-04 09:15:52.223079] I [server-helpers.c:670:server_connection_destroy] test-server: destroyed connection of pitta-2929-2011/01/04-09:14:51:753085-test-client-1
[2011-01-04 09:15:53.239786] I [xlator.c:1279:is_gf_log_command] glusterfs: setting log level to 8 (old-value=7)
[2011-01-04 09:15:53.239824] D [io-stats.c:1599:reconfigure] /mnt/s2: changing log-level to DEBUG
[2011-01-04 09:15:53.239846] D [xlator.c:974:xlator_reconfigure_rec] /mnt/s2: reconfigured
[2011-01-04 09:15:53.239895] D [server.c:619:reconfigure] : returning 0
[2011-01-04 09:15:53.239916] D [glusterfsd-mgmt.c:373:mgmt_getspec_cbk] glusterfsd-mgmt: No need to re-load volfile, reconfigure done
For an explanation of why we can't fix this at the moment, see point (b) in comment 4 of: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2161#c4
Ok, this needs to be fixed. I was being stupid when I set the status to resolved. Assigning to Kaushik since this needs a change in the command line code for nfs.
We need duplicate reply cache (DRC) support in RPC.
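For readers unfamiliar with the term, here is a minimal sketch of what a duplicate reply cache does in general. It is not the GlusterFS RPC code, and the class name, the handle method, and the capacity parameter are all hypothetical. The server remembers the reply it sent for each (client, XID) pair, so a retransmitted non-idempotent request gets the original reply back instead of being executed a second time. Real implementations typically also match on procedure and a request checksum and track in-progress requests, none of which is modeled here.

```python
from collections import OrderedDict


class DuplicateReplyCache:
    """Bounded cache of replies keyed by (client, xid); oldest entry evicted first."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self._entries = OrderedDict()

    def handle(self, client, xid, request, execute):
        """Return the cached reply for a retransmission, else execute and cache."""
        key = (client, xid)
        if key in self._entries:
            # Retransmission: replay the stored reply, do not re-execute.
            self._entries.move_to_end(key)
            return self._entries[key]
        reply = execute(request)
        self._entries[key] = reply
        if len(self._entries) > self.capacity:
            # Drop the least recently used entry to bound memory.
            self._entries.popitem(last=False)
        return reply


def make_exclusive_create():
    """Toy non-idempotent operation: creating an existing name fails."""
    existing = set()

    def create(name):
        if name in existing:
            return "EEXIST"
        existing.add(name)
        return "OK"

    return create


if __name__ == "__main__":
    create = make_exclusive_create()
    drc = DuplicateReplyCache()
    print(drc.handle("client10", 1, "file0", create))  # first transmission
    print(drc.handle("client10", 1, "file0", create))  # retransmission, same XID
    print(create("file0"))                             # without the cache
```

In the demo, the retransmitted create with the same XID receives the original "OK", while calling the operation directly a second time reports "EEXIST", which is the kind of spurious client-side error a reply cache is meant to avoid.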
Krishna is looking into this issue.

*** This bug has been marked as a duplicate of bug 3725 ***
(In reply to comment #5)
> Krishna is looking into this issue.
>
> *** This bug has been marked as a duplicate of bug 765457 ***

Koushik, this is a different bug, which needs a duplicate reply cache in RPC (as mentioned above by Vijay).
Closing as wontfix. File a new bug if this is required.