Bug 840820 - appending data to a file on fuse mount fails with EIO
appending data to a file on fuse mount fails with EIO
Status: CLOSED CURRENTRELEASE
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterfs (Show other bugs)
2.0
Unspecified Unspecified
high Severity urgent
: ---
: ---
Assigned To: Pranith Kumar K
spandura
: Reopened
Depends On: 832305 833727 840394
Blocks:
  Show dependency treegraph
 
Reported: 2012-07-17 05:45 EDT by Scott Haines
Modified: 2015-08-10 03:47 EDT (History)
6 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 840394
Environment:
Last Closed: 2015-08-10 03:47:27 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Scott Haines 2012-07-17 05:45:52 EDT
+++ This bug was initially created as a clone of Bug #840394 +++

+++ This bug was initially created as a clone of Bug #833727 +++

+++ This bug was initially created as a clone of Bug #832305 +++

Description of problem:
------------------------
appending data continuously to a file using "echo" command on a fuse mount  reports EIO after sometime. 

[06/15/12 - 01:47:31 root@ARF-Client1 ~]# ls -lh /mnt/gfsc1/output_file.20120615010422 
-rw-r--r--. 1 root root 14M Jun 15 01:27 /mnt/gfsc1/output_file.20120615010422


Version-Release number of selected component (if applicable):
------------------------------------------------------------
3.3.0qa45

How reproducible:
-----------------
Often

append_data.sh:-
---------------
#!/bin/bash

path=$1
start_time=$(date +%s)
append=`date '+%Y%m%d%H%M%S'`

for i in {1..100000};do
        echo -e "Date: `date`\n" >> $path/output_file.$append
        echo -e "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n" >> $path/output_file.$append
	echo -e "Calender:\n" >> $path/output_file.$append
	echo -e "`cal`\n" >> $path/output_file.$append
	echo -e "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n" >> $path/output_file.$append
done
finish_time=$(date +%s)
echo "Time duration: $((finish_time - start_time)) secs." >> $path/output_file.$append

Steps to Reproduce:
-------------------
1.Create a replicate volume (1x3)

2.Set "write-behind on" and "eager-lock on" on the volume

3.Start the volume

4.Create a fuse mount.

5.execute the script "append_data <mount_point>" . mount_point as argument to the script. 

6.After sometime the script fails with Input/Output Error

Actual results:
---------------
[2012-06-15 01:27:51.461317] W [afr-self-heal-data.c:831:afr_lookup_select_read_child_by_txn_type] 0-vol-replicate-0: /output_file.20120615010422: Possible split-brain
[2012-06-15 01:27:51.461524] W [afr-common.c:1226:afr_detect_self_heal_by_lookup_status] 0-vol-replicate-0: split brain detected during lookup of /output_file.20120615010422.
[2012-06-15 01:27:51.461736] W [afr-open.c:213:afr_open] 0-vol-replicate-0: failed to open as split brain seen, returning EIO
[2012-06-15 01:27:51.461972] W [fuse-bridge.c:713:fuse_fd_cbk] 0-glusterfs-fuse: 2565915: OPEN() /output_file.20120615010422 => -1 (Input/output error)
[2012-06-15 01:27:51.462492] W [afr-open.c:213:afr_open] 0-vol-replicate-0: failed to open as split brain seen, returning EIO
[2012-06-15 01:27:51.462560] W [fuse-bridge.c:713:fuse_fd_cbk] 0-glusterfs-fuse: 2565916: OPEN() /output_file.20120615010422 => -1 (Input/output error)
[2012-06-15 01:27:51.465513] W [afr-open.c:213:afr_open] 0-vol-replicate-0: failed to open as split brain seen, returning EIO
[2012-06-15 01:27:51.465558] W [fuse-bridge.c:713:fuse_fd_cbk] 0-glusterfs-fuse: 2565917: OPEN() /output_file.20120615010422 => -1 (Input/output error)
[2012-06-15 01:27:51.466011] W [afr-open.c:213:afr_open] 0-vol-replicate-0: failed to open as split brain seen, returning EIO
[2012-06-15 01:27:51.466056] W [fuse-bridge.c:713:fuse_fd_cbk] 0-glusterfs-fuse: 2565918: OPEN() /output_file.20120615010422 => -1 (Input/output error)
[2012-06-15 01:27:51.468327] W [afr-open.c:213:afr_open] 0-vol-replicate-0: failed to open as split brain seen, returning EIO

Output of stat on file after the script failure:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[06/15/12 - 01:40:28 root@ARF-Client1 ~]# stat /mnt/gfsc1/output_file.20120615010422
  File: `/mnt/gfsc1/output_file.20120615010422'
  Size: 13812273  	Blocks: 32776      IO Block: 131072 regular file
Device: 15h/21d	Inode: 9581628177895579223  Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2012-06-15 01:04:27.966005057 -0400
Modify: 2012-06-15 01:27:51.459004464 -0400
Change: 2012-06-15 01:27:51.460121389 -0400


Fuse Mount log when executed stat:-
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[2012-06-15 01:47:31.337301] W [afr-common.c:1226:afr_detect_self_heal_by_lookup_status] 0-vol-replicate-0: split brain detected during lookup of /output_file.20120615010422.
[2012-06-15 01:47:31.337387] I [afr-common.c:1340:afr_launch_self_heal] 0-vol-replicate-0: background  data gfid self-heal triggered. path: /output_file.20120615010422, reason: lookup detected pending operations
[2012-06-15 01:47:31.339249] I [afr-self-heal-common.c:1189:afr_sh_missing_entry_call_impunge_recreate] 0-vol-replicate-0: no missing files - /output_file.20120615010422. proceeding to metadata check
[2012-06-15 01:47:31.340112] I [afr-self-heal-common.c:994:afr_sh_missing_entries_done] 0-vol-replicate-0: split brain found, aborting selfheal of /output_file.20120615010422
[2012-06-15 01:47:31.340185] E [afr-self-heal-common.c:2156:afr_self_heal_completion_cbk] 0-vol-replicate-0: background  data gfid self-heal failed on /output_file.20120615010422


Xattrs of file after split-brain reported:-
------------------------------------------
brick3:
~~~~~~~
[06/15/12 - 01:41:18 root@AFR-Server3 ~]# getfattr -d -m. -ehex /export_b1/dir1/output_file.20120615010422
getfattr: Removing leading '/' from absolute path names
# file: export_b1/dir1/output_file.20120615010422
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.vol-client-0=0x000000000000000000000000
trusted.afr.vol-client-1=0x000000000000000000000000
trusted.afr.vol-client-2=0x000000000000000000000000
trusted.gfid=0x8438a813ad7646c684f8c80fe956de57

brick2:
~~~~~~~~
[06/15/12 - 01:41:13 root@AFR-Server2 ~]# getfattr -d -m. -ehex /export_b1/dir1/output_file.20120615010422
getfattr: Removing leading '/' from absolute path names
# file: export_b1/dir1/output_file.20120615010422
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.vol-client-0=0x000000000000000000000000
trusted.afr.vol-client-1=0x000000000000000000000000
trusted.afr.vol-client-2=0x000000000000000000000000
trusted.gfid=0x8438a813ad7646c684f8c80fe956de57

brick1:
~~~~~~~~
[06/15/12 - 01:41:03 root@AFR-Server1 ~]# getfattr -d -m. -ehex /export_b1/dir1/output_file.20120615010422
getfattr: Removing leading '/' from absolute path names
# file: export_b1/dir1/output_file.20120615010422
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.vol-client-0=0x000000000000000000000000
trusted.afr.vol-client-1=0x000000000000000000000000
trusted.afr.vol-client-2=0x000000000000000000000000
trusted.gfid=0x8438a813ad7646c684f8c80fe956de57

--- Additional comment from shwetha.h.panduranga@redhat.com on 2012-06-15 02:16:48 EDT ---

Created attachment 592039 [details]
glusterfs logs, history of commands executed on storage_node1 and fuse mount log

--- Additional comment from shwetha.h.panduranga@redhat.com on 2012-06-15 02:31:19 EDT ---

md5sum on brick1:-
-----------------

[06/15/12 - 02:28:59 root@AFR-Server1 ~]# arequal-checksum -p /export_b1/dir1/output_file.20120615010422

Entry counts
Regular files   : 1
Directories     : 0
Symbolic links  : 0
Other           : 0
Total           : 1

Metadata checksums
Regular files   : 486e85
Directories     : 3e9
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : b05e703fbeac795f934ec0aa043387c9
Directories     : 0
Symbolic links  : 0
Other           : 0
Total           : 2310b095ba9ffe96

md5sum on brick2:-
------------------
[06/15/12 - 02:29:04 root@AFR-Server2 ~]# arequal-checksum -p /export_b1/dir1/output_file.20120615010422 

Entry counts
Regular files   : 1
Directories     : 0
Symbolic links  : 0
Other           : 0
Total           : 1

Metadata checksums
Regular files   : 486e85
Directories     : 3e9
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : b05e703fbeac795f934ec0aa043387c9
Directories     : 0
Symbolic links  : 0
Other           : 0
Total           : 2310b095ba9ffe96

md5sum on brick3:-
------------------
[06/15/12 - 02:30:48 root@AFR-Server3 ~]# arequal-checksum -p /export_b1/dir1/output_file.20120615010422 

Entry counts
Regular files   : 1
Directories     : 0
Symbolic links  : 0
Other           : 0
Total           : 1

Metadata checksums
Regular files   : 486e85
Directories     : 3e9
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : b05e703fbeac795f934ec0aa043387c9
Directories     : 0
Symbolic links  : 0
Other           : 0
Total           : 2310b095ba9ffe96

--- Additional comment from shwetha.h.panduranga@redhat.com on 2012-06-22 02:52:58 EDT ---

There are 2 parts to the bug. Case1) For some reason split-brain flag is set, we haven't figured out why that happened. Case2) Once the flag is set, there is no way to reset the flag. This fixes the second part. 

This bug is tracking Case1

Bug 832305 tracks Case2 and Case2 is already resolved.

--- Additional comment from aavati@redhat.com on 2012-06-22 13:31:17 EDT ---

The patch which was committed has removed the setting into inode context altogether. We can close this bug as it is not applicable anymore.

--- Additional comment from pkarampu@redhat.com on 2012-06-26 01:43:51 EDT ---

There is still one place where the decision to set split-brain flag is taken outside inodelk. Re-opening the bug.

--- Additional comment from vinaraya@redhat.com on 2012-07-16 04:57:42 EDT ---

Need this fixed in 2.0.z
Comment 3 spandura 2013-03-04 04:49:19 EST
Verified this bug on build: 


[root@flea:/mnt/vol_rep_2_mount1] Mar-04-2013 15:13:52 $ glusterfs --version
glusterfs 3.3.0.5rhs built on Feb 19 2013 05:31:58
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

[root@flea:/mnt/vol_rep_2_mount1] Mar-04-2013 15:13:59 $ rpm -qa | grep gluster
glusterfs-3.3.0.5rhs-43.el6rhs.x86_64
glusterfs-fuse-3.3.0.5rhs-43.el6rhs.x86_64

Note You need to log in before you can comment on or make changes to this bug.