+++ This bug was initially created as a clone of Bug #840394 +++ +++ This bug was initially created as a clone of Bug #833727 +++ +++ This bug was initially created as a clone of Bug #832305 +++ Description of problem: ------------------------ appending data continuously to a file using "echo" command on a fuse mount reports EIO after sometime. [06/15/12 - 01:47:31 root@ARF-Client1 ~]# ls -lh /mnt/gfsc1/output_file.20120615010422 -rw-r--r--. 1 root root 14M Jun 15 01:27 /mnt/gfsc1/output_file.20120615010422 Version-Release number of selected component (if applicable): ------------------------------------------------------------ 3.3.0qa45 How reproducible: ----------------- Often append_data.sh:- --------------- #!/bin/bash path=$1 start_time=$(date +%s) append=`date '+%Y%m%d%H%M%S'` for i in {1..100000};do echo -e "Date: `date`\n" >> $path/output_file.$append echo -e "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n" >> $path/output_file.$append echo -e "Calender:\n" >> $path/output_file.$append echo -e "`cal`\n" >> $path/output_file.$append echo -e "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n" >> $path/output_file.$append done finish_time=$(date +%s) echo "Time duration: $((finish_time - start_time)) secs." >> $path/output_file.$append Steps to Reproduce: ------------------- 1.Create a replicate volume (1x3) 2.Set "write-behind on" and "eager-lock on" on the volume 3.Start the volume 4.Create a fuse mount. 5.execute the script "append_data <mount_point>" . mount_point as argument to the script. 6.After sometime the script fails with Input/Output Error Actual results: --------------- [2012-06-15 01:27:51.461317] W [afr-self-heal-data.c:831:afr_lookup_select_read_child_by_txn_type] 0-vol-replicate-0: /output_file.20120615010422: Possible split-brain [2012-06-15 01:27:51.461524] W [afr-common.c:1226:afr_detect_self_heal_by_lookup_status] 0-vol-replicate-0: split brain detected during lookup of /output_file.20120615010422. [2012-06-15 01:27:51.461736] W [afr-open.c:213:afr_open] 0-vol-replicate-0: failed to open as split brain seen, returning EIO [2012-06-15 01:27:51.461972] W [fuse-bridge.c:713:fuse_fd_cbk] 0-glusterfs-fuse: 2565915: OPEN() /output_file.20120615010422 => -1 (Input/output error) [2012-06-15 01:27:51.462492] W [afr-open.c:213:afr_open] 0-vol-replicate-0: failed to open as split brain seen, returning EIO [2012-06-15 01:27:51.462560] W [fuse-bridge.c:713:fuse_fd_cbk] 0-glusterfs-fuse: 2565916: OPEN() /output_file.20120615010422 => -1 (Input/output error) [2012-06-15 01:27:51.465513] W [afr-open.c:213:afr_open] 0-vol-replicate-0: failed to open as split brain seen, returning EIO [2012-06-15 01:27:51.465558] W [fuse-bridge.c:713:fuse_fd_cbk] 0-glusterfs-fuse: 2565917: OPEN() /output_file.20120615010422 => -1 (Input/output error) [2012-06-15 01:27:51.466011] W [afr-open.c:213:afr_open] 0-vol-replicate-0: failed to open as split brain seen, returning EIO [2012-06-15 01:27:51.466056] W [fuse-bridge.c:713:fuse_fd_cbk] 0-glusterfs-fuse: 2565918: OPEN() /output_file.20120615010422 => -1 (Input/output error) [2012-06-15 01:27:51.468327] W [afr-open.c:213:afr_open] 0-vol-replicate-0: failed to open as split brain seen, returning EIO Output of stat on file after the script failure: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [06/15/12 - 01:40:28 root@ARF-Client1 ~]# stat /mnt/gfsc1/output_file.20120615010422 File: `/mnt/gfsc1/output_file.20120615010422' Size: 13812273 Blocks: 32776 IO Block: 131072 regular file Device: 15h/21d Inode: 9581628177895579223 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2012-06-15 01:04:27.966005057 -0400 Modify: 2012-06-15 01:27:51.459004464 -0400 Change: 2012-06-15 01:27:51.460121389 -0400 Fuse Mount log when executed stat:- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [2012-06-15 01:47:31.337301] W [afr-common.c:1226:afr_detect_self_heal_by_lookup_status] 0-vol-replicate-0: split brain detected during lookup of /output_file.20120615010422. [2012-06-15 01:47:31.337387] I [afr-common.c:1340:afr_launch_self_heal] 0-vol-replicate-0: background data gfid self-heal triggered. path: /output_file.20120615010422, reason: lookup detected pending operations [2012-06-15 01:47:31.339249] I [afr-self-heal-common.c:1189:afr_sh_missing_entry_call_impunge_recreate] 0-vol-replicate-0: no missing files - /output_file.20120615010422. proceeding to metadata check [2012-06-15 01:47:31.340112] I [afr-self-heal-common.c:994:afr_sh_missing_entries_done] 0-vol-replicate-0: split brain found, aborting selfheal of /output_file.20120615010422 [2012-06-15 01:47:31.340185] E [afr-self-heal-common.c:2156:afr_self_heal_completion_cbk] 0-vol-replicate-0: background data gfid self-heal failed on /output_file.20120615010422 Xattrs of file after split-brain reported:- ------------------------------------------ brick3: ~~~~~~~ [06/15/12 - 01:41:18 root@AFR-Server3 ~]# getfattr -d -m. -ehex /export_b1/dir1/output_file.20120615010422 getfattr: Removing leading '/' from absolute path names # file: export_b1/dir1/output_file.20120615010422 security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000 trusted.afr.vol-client-0=0x000000000000000000000000 trusted.afr.vol-client-1=0x000000000000000000000000 trusted.afr.vol-client-2=0x000000000000000000000000 trusted.gfid=0x8438a813ad7646c684f8c80fe956de57 brick2: ~~~~~~~~ [06/15/12 - 01:41:13 root@AFR-Server2 ~]# getfattr -d -m. -ehex /export_b1/dir1/output_file.20120615010422 getfattr: Removing leading '/' from absolute path names # file: export_b1/dir1/output_file.20120615010422 security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000 trusted.afr.vol-client-0=0x000000000000000000000000 trusted.afr.vol-client-1=0x000000000000000000000000 trusted.afr.vol-client-2=0x000000000000000000000000 trusted.gfid=0x8438a813ad7646c684f8c80fe956de57 brick1: ~~~~~~~~ [06/15/12 - 01:41:03 root@AFR-Server1 ~]# getfattr -d -m. -ehex /export_b1/dir1/output_file.20120615010422 getfattr: Removing leading '/' from absolute path names # file: export_b1/dir1/output_file.20120615010422 security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000 trusted.afr.vol-client-0=0x000000000000000000000000 trusted.afr.vol-client-1=0x000000000000000000000000 trusted.afr.vol-client-2=0x000000000000000000000000 trusted.gfid=0x8438a813ad7646c684f8c80fe956de57 --- Additional comment from shwetha.h.panduranga on 2012-06-15 02:16:48 EDT --- Created attachment 592039 [details] glusterfs logs, history of commands executed on storage_node1 and fuse mount log --- Additional comment from shwetha.h.panduranga on 2012-06-15 02:31:19 EDT --- md5sum on brick1:- ----------------- [06/15/12 - 02:28:59 root@AFR-Server1 ~]# arequal-checksum -p /export_b1/dir1/output_file.20120615010422 Entry counts Regular files : 1 Directories : 0 Symbolic links : 0 Other : 0 Total : 1 Metadata checksums Regular files : 486e85 Directories : 3e9 Symbolic links : 3e9 Other : 3e9 Checksums Regular files : b05e703fbeac795f934ec0aa043387c9 Directories : 0 Symbolic links : 0 Other : 0 Total : 2310b095ba9ffe96 md5sum on brick2:- ------------------ [06/15/12 - 02:29:04 root@AFR-Server2 ~]# arequal-checksum -p /export_b1/dir1/output_file.20120615010422 Entry counts Regular files : 1 Directories : 0 Symbolic links : 0 Other : 0 Total : 1 Metadata checksums Regular files : 486e85 Directories : 3e9 Symbolic links : 3e9 Other : 3e9 Checksums Regular files : b05e703fbeac795f934ec0aa043387c9 Directories : 0 Symbolic links : 0 Other : 0 Total : 2310b095ba9ffe96 md5sum on brick3:- ------------------ [06/15/12 - 02:30:48 root@AFR-Server3 ~]# arequal-checksum -p /export_b1/dir1/output_file.20120615010422 Entry counts Regular files : 1 Directories : 0 Symbolic links : 0 Other : 0 Total : 1 Metadata checksums Regular files : 486e85 Directories : 3e9 Symbolic links : 3e9 Other : 3e9 Checksums Regular files : b05e703fbeac795f934ec0aa043387c9 Directories : 0 Symbolic links : 0 Other : 0 Total : 2310b095ba9ffe96 --- Additional comment from shwetha.h.panduranga on 2012-06-22 02:52:58 EDT --- There are 2 parts to the bug. Case1) For some reason split-brain flag is set, we haven't figured out why that happened. Case2) Once the flag is set, there is no way to reset the flag. This fixes the second part. This bug is tracking Case1 Bug 832305 tracks Case2 and Case2 is already resolved. --- Additional comment from aavati on 2012-06-22 13:31:17 EDT --- The patch which was committed has removed the setting into inode context altogether. We can close this bug as it is not applicable anymore. --- Additional comment from pkarampu on 2012-06-26 01:43:51 EDT --- There is still one place where the decision to set split-brain flag is taken outside inodelk. Re-opening the bug. --- Additional comment from vinaraya on 2012-07-16 04:57:42 EDT --- Need this fixed in 2.0.z
Verified this bug on build: [root@flea:/mnt/vol_rep_2_mount1] Mar-04-2013 15:13:52 $ glusterfs --version glusterfs 3.3.0.5rhs built on Feb 19 2013 05:31:58 Repository revision: git://git.gluster.com/glusterfs.git Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com> GlusterFS comes with ABSOLUTELY NO WARRANTY. You may redistribute copies of GlusterFS under the terms of the GNU General Public License. [root@flea:/mnt/vol_rep_2_mount1] Mar-04-2013 15:13:59 $ rpm -qa | grep gluster glusterfs-3.3.0.5rhs-43.el6rhs.x86_64 glusterfs-fuse-3.3.0.5rhs-43.el6rhs.x86_64