Created attachment 1372733 [details]
Test script and log for this issue

Description of problem:
Write two long variables plus other text into a new file stored on a GlusterFS filesystem, then read the values back and check whether they match what was just written. After roughly 100 iterations against different new files, it is easy to reproduce a case where the value read back does not match the value written. Running the same procedure on an EXT4 filesystem does not reproduce the issue.

Version-Release number of selected component (if applicable):
3.12.3

How reproducible:

Steps to Reproduce:
1. Create a gluster volume named "test" and mount it at "/mnt/test". (We use replicate mode, with two SN nodes, SN-0 and SN-1, and one client node.)
2. Copy the shell script _loop.sh and file_test.c from the attachment.
3. Compile file_test.c to the binary "file_test" (e.g. with the command "make file_test").
4. Run _loop.sh; usage is: _loop.sh <try_times> <save_to_glusterwhere>

Actual results:
/mnt/test is mounted from gluster storage.

[root@mn-0:/home/robot]
# ./_loop.sh 100 /mnt/test/test__1
Loop completed 100 times without errors.
[root@mn-0:/home/robot]
# ./_loop.sh 100 /mnt/test/test__2
Read values do not match with written data: -2064763152 and -1074501584
Error 6 returned on loop 3
[root@mn-0:/home/robot]
# ./_loop.sh 100 /mnt/test/test__3
Read values do not match with written data: -24283408 and -1440483296
Error 6 returned on loop 39
[root@mn-0:/home/robot]
# ./_loop.sh 100 /mnt/test/test__4
Loop completed 100 times without errors.
[root@mn-0:/home/robot]
# ./_loop.sh 100 /mnt/test/test__5
Read values do not match with written data: 438253296 and -2074572768
Error 6 returned on loop 34
[root@mn-0:/home/robot]
# ./_loop.sh 100 /mnt/test/test__6
Read values do not match with written data: -972917008 and -231886816
Error 6 returned on loop 3
[root@mn-0:/home/robot]
# ./_loop.sh 100 /mnt/test/test__7
Read values do not match with written data: -434657552 and 906727456
Error 6 returned on loop 6

Expected results:
Expect the same result as running the command on ext4 ("/mnt/tmp" is an EXT4 filesystem):

[root@mn-0:/home/robot]
# ./_loop.sh 100 /mnt/tmp/test__1
Loop completed 100 times without errors.
[root@mn-0:/home/robot]
# ./_loop.sh 100 /mnt/tmp/test__2
Loop completed 100 times without errors.
[root@mn-0:/home/robot]
# ./_loop.sh 100 /mnt/tmp/test__3
Loop completed 100 times without errors.
[root@mn-0:/home/robot]
# ./_loop.sh 100 /mnt/tmp/test__4
Loop completed 100 times without errors.
[root@mn-0:/home/robot]
# ./_loop.sh 100 /mnt/tmp/test__5
Loop completed 100 times without errors.
[root@mn-0:/home/robot]
# ./_loop.sh 100 /mnt/tmp/test__6
Loop completed 100 times without errors.
[root@mn-0:/home/robot]
# ./_loop.sh 100 /mnt/tmp/test__7
Loop completed 100 times without errors.
[root@mn-0:/home/robot]

Additional info:
With the performance-related translators below disabled, the issue still exists:

gluster volume set test stat-prefetch off
gluster volume set test read-ahead off
gluster volume set test write-behind off
gluster volume set test io-cache off
gluster volume set test quick-read off
gluster volume set test open-behind off

# gluster v info test

Volume Name: test
Type: Replicate
Volume ID: 2f602a72-2836-45c5-a14f-aaedab65f95d
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: sn-0.local:/mnt/bricks/test/brick
Brick2: sn-1.local:/mnt/bricks/test/brick
Options Reconfigured:
diagnostics.brick-log-level: TRACE
diagnostics.client-log-level: TRACE
performance.open-behind: off
performance.quick-read: off
performance.io-cache: off
performance.write-behind: off
performance.read-ahead: off
performance.stat-prefetch: off
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.server-quorum-ratio: 51%

I have also attached the debug log from the test run FYI; below is an excerpt from that attachment which I think is useful:

[2017-12-27 06:21:04.332524] T [MSGID: 0] [afr-inode-read.c:286:afr_fstat_cbk] 12-stack-trace: stack-address: 0x7fb5ec010680, test-replicate-0 returned 0
[2017-12-27 06:21:04.332724] T [MSGID: 0] [syncop.c:1715:syncop_fgetxattr] 12-stack-trace: stack-address: 0x7fb5fc0618d0, winding from test-dht to test-replicate-0
[2017-12-27 06:21:04.332768] D [MSGID: 0] [afr-read-txn.c:220:afr_read_txn] 12-test-replicate-0: 70e0f7ee-2f0a-4364-a38e-96e7031eee7b: generation now vs cached: 2, 2
[2017-12-27 06:21:04.332791] T [MSGID: 0] [afr-inode-read.c:1728:afr_fgetxattr_wind] 12-stack-trace: stack-address: 0x7fb5fc0618d0, winding from test-replicate-0 to test-client-0
[2017-12-27 06:21:04.332817] T [rpc-clnt.c:1496:rpc_clnt_record] 12-test-client-0: Auth Info: pid: 5257, uid: 0, gid: 0, owner: 0000000000000000
[2017-12-27 06:21:04.332837] T [rpc-clnt.c:1353:rpc_clnt_record_build_header] 12-rpc-clnt: Request fraglen 140, payload: 64, rpc hdr: 76
[2017-12-27 06:21:04.332862] T [rpc-clnt.c:1699:rpc_clnt_submit] 12-rpc-clnt: submitted request (XID: 0x2681 Program: GlusterFS 3.3, ProgVers: 330, Proc: 35) to rpc-transport (test-client-0)
[2017-12-27 06:21:04.333943] T [rpc-clnt.c:675:rpc_clnt_reply_init] 12-test-client-0: received rpc message (RPC XID: 0x2681 Program: GlusterFS 3.3, ProgVers: 330, Proc: 35) from rpc-transport (test-client-0)
[2017-12-27 06:21:04.334013] D [MSGID: 0] [client-rpc-fops.c:1143:client3_3_fgetxattr_cbk] 12-test-client-0: remote operation failed: No data available
[2017-12-27 06:21:04.334036] D [MSGID: 0] [client-rpc-fops.c:1151:client3_3_fgetxattr_cbk] 12-stack-trace: stack-address: 0x7fb5fc0618d0, test-client-0 returned -1 error: No data available [No data available]
[2017-12-27 06:21:04.334114] T [MSGID: 0] [afr-common.c:1306:afr_inode_refresh_subvol_with_fstat] 12-stack-trace: stack-address: 0x7fb5fc0618d0, winding from test-replicate-0 to test-client-0
[2017-12-27 06:21:04.334174] T [rpc-clnt.c:1496:rpc_clnt_record] 12-test-client-0: Auth Info: pid: 5257, uid: 0, gid: 0, owner: 0000000000000000
[2017-12-27 06:21:04.334196] T [rpc-clnt.c:1353:rpc_clnt_record_build_header] 12-rpc-clnt: Request fraglen 336, payload: 260, rpc hdr: 76
[2017-12-27 06:21:04.334229] T [rpc-clnt.c:1699:rpc_clnt_submit] 12-rpc-clnt: submitted request (XID: 0x2682 Program: GlusterFS 3.3, ProgVers: 330, Proc: 25) to rpc-transport (test-client-0)
[2017-12-27 06:21:04.334280] T [MSGID: 0] [afr-common.c:1306:afr_inode_refresh_subvol_with_fstat] 12-stack-trace: stack-address: 0x7fb5fc0618d0, winding from test-replicate-0 to test-client-1
[2017-12-27 06:21:04.334311] T [rpc-clnt.c:1496:rpc_clnt_record] 12-test-client-1: Auth Info: pid: 5257, uid: 0, gid: 0, owner: 0000000000000000
[2017-12-27 06:21:04.334330] T [rpc-clnt.c:1353:rpc_clnt_record_build_header] 12-rpc-clnt: Request fraglen 336, payload: 260, rpc hdr: 76
[2017-12-27 06:21:04.334354] T [rpc-clnt.c:1699:rpc_clnt_submit] 12-rpc-clnt: submitted request (XID: 0x252c Program: GlusterFS 3.3, ProgVers: 330, Proc: 25) to rpc-transport (test-client-1)
[2017-12-27 06:21:04.334904] T [rpc-clnt.c:675:rpc_clnt_reply_init] 12-test-client-1: received rpc message (RPC XID: 0x252c Program: GlusterFS 3.3, ProgVers: 330, Proc: 25) from rpc-transport (test-client-1)
[2017-12-27 06:21:04.334983] T [MSGID: 0] [client-rpc-fops.c:1462:client3_3_fstat_cbk] 12-stack-trace: stack-address: 0x7fb5fc0618d0, test-client-1 returned 0
[2017-12-27 06:21:04.335128] T [rpc-clnt.c:675:rpc_clnt_reply_init] 12-test-client-0: received rpc message (RPC XID: 0x2682 Program: GlusterFS 3.3, ProgVers: 330, Proc: 25) from rpc-transport (test-client-0)
[2017-12-27 06:21:04.335165] T [MSGID: 0] [client-rpc-fops.c:1462:client3_3_fstat_cbk] 12-stack-trace: stack-address: 0x7fb5fc0618d0, test-client-0 returned 0
[2017-12-27 06:21:04.335193] T [MSGID: 0] [afr-inode-read.c:1728:afr_fgetxattr_wind] 12-stack-trace: stack-address: 0x7fb5fc0618d0, winding from test-replicate-0 to test-client-1
[2017-12-27 06:21:04.335224] T [rpc-clnt.c:1496:rpc_clnt_record] 12-test-client-1: Auth Info: pid: 5257, uid: 0, gid: 0, owner: 0000000000000000
[2017-12-27 06:21:04.335253] T [rpc-clnt.c:1353:rpc_clnt_record_build_header] 12-rpc-clnt: Request fraglen 140, payload: 64, rpc hdr: 76
[2017-12-27 06:21:04.335284] T [rpc-clnt.c:1699:rpc_clnt_submit] 12-rpc-clnt: submitted request (XID: 0x252d Program: GlusterFS 3.3, ProgVers: 330, Proc: 35) to rpc-transport (test-client-1)
[2017-12-27 06:21:04.335870] T [rpc-clnt.c:675:rpc_clnt_reply_init] 12-test-client-1: received rpc message (RPC XID: 0x252d Program: GlusterFS 3.3, ProgVers: 330, Proc: 35) from rpc-transport (test-client-1)
[2017-12-27 06:21:04.335902] D [MSGID: 0] [client-rpc-fops.c:1143:client3_3_fgetxattr_cbk] 12-test-client-1: remote operation failed: No data available
[2017-12-27 06:21:04.335922] D [MSGID: 0] [client-rpc-fops.c:1151:client3_3_fgetxattr_cbk] 12-stack-trace: stack-address: 0x7fb5fc0618d0, test-client-1 returned -1 error: No data available [No data available]
[2017-12-27 06:21:04.335950] D [MSGID: 0] [afr-inode-read.c:1720:afr_fgetxattr_wind] 12-stack-trace: stack-address: 0x7fb5fc0618d0, test-replicate-0 returned -1 error: No data available [No data available]
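For context, the write-then-verify procedure from the reproduction steps can be sketched roughly as follows. This is a minimal sketch, NOT the attached file_test.c: the function name, the file path, the sample values, and the error code 6 are all illustrative.

```c
/* Minimal sketch of a write-then-read-back check, loosely modelled on the
 * reproduction steps above. Not the attached file_test.c; names and the
 * error code are illustrative. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Write two long values to a fresh file, read them back through a second
 * descriptor, and return 0 only if the values match what was written. */
int write_and_verify(const char *path)
{
    long out[2] = { 12345L, 54321L };
    long in[2] = { 0L, 0L };

    int wfd = open(path, O_CREAT | O_RDWR | O_TRUNC, S_IRUSR | S_IWUSR);
    if (wfd < 0)
        return -1;
    if (write(wfd, out, sizeof(out)) != (ssize_t)sizeof(out)) {
        close(wfd);
        return -1;
    }
    close(wfd);

    int rfd = open(path, O_RDONLY);
    if (rfd < 0)
        return -1;
    if (read(rfd, in, sizeof(in)) != (ssize_t)sizeof(in)) {
        close(rfd);
        return -1;
    }
    close(rfd);

    /* The "Read values do not match with written data" failure in the
     * transcripts above corresponds to this comparison failing when
     * the path is on the gluster mount. */
    return (in[0] == out[0] && in[1] == out[1]) ? 0 : 6;
}
```

On a local filesystem this check should always return 0; the bug is that on the gluster mount it intermittently does not.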
Sorry, there is an error in the test code in the attachment: the open system call is invoked incorrectly:

open(argv[1], O_CREAT|O_RDWR|S_IRUSR|S_IWUSR)

should be:

open(argv[1], O_CREAT|O_RDWR, S_IRUSR|S_IWUSR)

But even with that fixed, there still seems to be an issue when the file is accessed from the client with root privileges: on the brick the file can be read, but on the client the file appears to have no content at all, although its reported size is 120.

On the sn-0 storage brick the file can be accessed:

# hexdump testfile.39
0000000 3039 0000 0000 0000 d431 0000 0000 0000
0000010 6d2f 746e 742f 7365 2f74 5f61 325f 742f
0000020 7365 6674 6c69 2e65 3933 0000 0000 0000
0000030 0000 0000 0000 0000 0000 0000 0000 0000
*
0000070 0000 0000 7ffc 0000
0000078
[root@sn-0:/mnt/bricks/test/brick/a__2]

But on the mounted gluster fs on the client, the file cannot be read:

[root@mn-0:/mnt/test/a__2]
# hexdump testfile.39
[root@mn-0:/mnt/test/a__2]
# echo $?
0
[root@mn-0:/mnt/test/a__2]
# ls -l testfile.39
---------T 1 root root 120 Dec 28 07:16 testfile.39
[root@mn-0:/mnt/test/a__2]
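To make the flagged open() mistake concrete: in the broken call the permission bits are OR'ed into the flags argument and no third (mode) argument is passed, so with O_CREAT the mode is read from uninitialized variadic arguments (and on Linux/glibc the S_IRUSR|S_IWUSR bits happen to alias other O_* flags). That may explain files appearing with no permission bits, as in the testfile.39 listing above. A minimal sketch of the corrected call (the path and function name are illustrative, not from the attachment):

```c
/* Sketch contrasting the broken and fixed open() calls discussed above.
 *
 * Broken:  open(path, O_CREAT|O_RDWR|S_IRUSR|S_IWUSR);
 *   - permission bits are misplaced into the flags argument, and
 *   - O_CREAT is used without a mode argument, so the new file's mode
 *     comes from uninitialized varargs (undefined behavior).
 *
 * Fixed: mode is passed as the separate third argument. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int create_with_mode(const char *path)
{
    /* Correct: O_CREAT with an explicit mode of rw------- (0600). */
    int fd = open(path, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
    if (fd < 0) {
        perror("open");
        return -1;
    }
    close(fd);
    return 0;
}
```

With the corrected call the created file reliably gets mode 0600 (subject to the process umask), instead of whatever garbage the missing mode argument happened to contain.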
Release 3.12 has been EOLed and this bug was still in the NEW state, hence moving the version to mainline so that it can be triaged and appropriate action taken.
This bug has been moved to https://github.com/gluster/glusterfs/issues/966 and will be tracked there from now on. Visit the GitHub issue URL for further details.