Bug 840394 - appending data to a file on fuse mount fails with EIO
appending data to a file on fuse mount fails with EIO
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterfs (Show other bugs)
2.0
Unspecified Unspecified
urgent Severity urgent
: ---
: ---
Assigned To: Pranith Kumar K
Sudhir D
: Reopened
Depends On: 832305 833727
Blocks: 840820
  Show dependency treegraph
 
Reported: 2012-07-16 04:56 EDT by Vidya Sakar
Modified: 2013-03-03 21:06 EST (History)
7 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 833727
: 840820 (view as bug list)
Environment:
Last Closed: 2012-09-11 10:23:33 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Vidya Sakar 2012-07-16 04:56:50 EDT
+++ This bug was initially created as a clone of Bug #833727 +++

+++ This bug was initially created as a clone of Bug #832305 +++

Description of problem:
------------------------
appending data continuously to a file using "echo" command on a fuse mount  reports EIO after sometime. 

[06/15/12 - 01:47:31 root@ARF-Client1 ~]# ls -lh /mnt/gfsc1/output_file.20120615010422 
-rw-r--r--. 1 root root 14M Jun 15 01:27 /mnt/gfsc1/output_file.20120615010422


Version-Release number of selected component (if applicable):
------------------------------------------------------------
3.3.0qa45

How reproducible:
-----------------
Often

append_data.sh:-
---------------
#!/bin/bash

path=$1
start_time=$(date +%s)
append=`date '+%Y%m%d%H%M%S'`

for i in {1..100000};do
        echo -e "Date: `date`\n" >> $path/output_file.$append
        echo -e "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n" >> $path/output_file.$append
	echo -e "Calender:\n" >> $path/output_file.$append
	echo -e "`cal`\n" >> $path/output_file.$append
	echo -e "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n" >> $path/output_file.$append
done
finish_time=$(date +%s)
echo "Time duration: $((finish_time - start_time)) secs." >> $path/output_file.$append

Steps to Reproduce:
-------------------
1.Create a replicate volume (1x3)

2.Set "write-behind on" and "eager-lock on" on the volume

3.Start the volume

4.Create a fuse mount.

5.execute the script "append_data <mount_point>" . mount_point as argument to the script. 

6.After sometime the script fails with Input/Output Error

Actual results:
---------------
[2012-06-15 01:27:51.461317] W [afr-self-heal-data.c:831:afr_lookup_select_read_child_by_txn_type] 0-vol-replicate-0: /output_file.20120615010422: Possible split-brain
[2012-06-15 01:27:51.461524] W [afr-common.c:1226:afr_detect_self_heal_by_lookup_status] 0-vol-replicate-0: split brain detected during lookup of /output_file.20120615010422.
[2012-06-15 01:27:51.461736] W [afr-open.c:213:afr_open] 0-vol-replicate-0: failed to open as split brain seen, returning EIO
[2012-06-15 01:27:51.461972] W [fuse-bridge.c:713:fuse_fd_cbk] 0-glusterfs-fuse: 2565915: OPEN() /output_file.20120615010422 => -1 (Input/output error)
[2012-06-15 01:27:51.462492] W [afr-open.c:213:afr_open] 0-vol-replicate-0: failed to open as split brain seen, returning EIO
[2012-06-15 01:27:51.462560] W [fuse-bridge.c:713:fuse_fd_cbk] 0-glusterfs-fuse: 2565916: OPEN() /output_file.20120615010422 => -1 (Input/output error)
[2012-06-15 01:27:51.465513] W [afr-open.c:213:afr_open] 0-vol-replicate-0: failed to open as split brain seen, returning EIO
[2012-06-15 01:27:51.465558] W [fuse-bridge.c:713:fuse_fd_cbk] 0-glusterfs-fuse: 2565917: OPEN() /output_file.20120615010422 => -1 (Input/output error)
[2012-06-15 01:27:51.466011] W [afr-open.c:213:afr_open] 0-vol-replicate-0: failed to open as split brain seen, returning EIO
[2012-06-15 01:27:51.466056] W [fuse-bridge.c:713:fuse_fd_cbk] 0-glusterfs-fuse: 2565918: OPEN() /output_file.20120615010422 => -1 (Input/output error)
[2012-06-15 01:27:51.468327] W [afr-open.c:213:afr_open] 0-vol-replicate-0: failed to open as split brain seen, returning EIO

Output of stat on file after the script failure:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[06/15/12 - 01:40:28 root@ARF-Client1 ~]# stat /mnt/gfsc1/output_file.20120615010422
  File: `/mnt/gfsc1/output_file.20120615010422'
  Size: 13812273  	Blocks: 32776      IO Block: 131072 regular file
Device: 15h/21d	Inode: 9581628177895579223  Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2012-06-15 01:04:27.966005057 -0400
Modify: 2012-06-15 01:27:51.459004464 -0400
Change: 2012-06-15 01:27:51.460121389 -0400


Fuse Mount log when executed stat:-
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[2012-06-15 01:47:31.337301] W [afr-common.c:1226:afr_detect_self_heal_by_lookup_status] 0-vol-replicate-0: split brain detected during lookup of /output_file.20120615010422.
[2012-06-15 01:47:31.337387] I [afr-common.c:1340:afr_launch_self_heal] 0-vol-replicate-0: background  data gfid self-heal triggered. path: /output_file.20120615010422, reason: lookup detected pending operations
[2012-06-15 01:47:31.339249] I [afr-self-heal-common.c:1189:afr_sh_missing_entry_call_impunge_recreate] 0-vol-replicate-0: no missing files - /output_file.20120615010422. proceeding to metadata check
[2012-06-15 01:47:31.340112] I [afr-self-heal-common.c:994:afr_sh_missing_entries_done] 0-vol-replicate-0: split brain found, aborting selfheal of /output_file.20120615010422
[2012-06-15 01:47:31.340185] E [afr-self-heal-common.c:2156:afr_self_heal_completion_cbk] 0-vol-replicate-0: background  data gfid self-heal failed on /output_file.20120615010422


Xattrs of file after split-brain reported:-
------------------------------------------
brick3:
~~~~~~~
[06/15/12 - 01:41:18 root@AFR-Server3 ~]# getfattr -d -m. -ehex /export_b1/dir1/output_file.20120615010422
getfattr: Removing leading '/' from absolute path names
# file: export_b1/dir1/output_file.20120615010422
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.vol-client-0=0x000000000000000000000000
trusted.afr.vol-client-1=0x000000000000000000000000
trusted.afr.vol-client-2=0x000000000000000000000000
trusted.gfid=0x8438a813ad7646c684f8c80fe956de57

brick2:
~~~~~~~~
[06/15/12 - 01:41:13 root@AFR-Server2 ~]# getfattr -d -m. -ehex /export_b1/dir1/output_file.20120615010422
getfattr: Removing leading '/' from absolute path names
# file: export_b1/dir1/output_file.20120615010422
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.vol-client-0=0x000000000000000000000000
trusted.afr.vol-client-1=0x000000000000000000000000
trusted.afr.vol-client-2=0x000000000000000000000000
trusted.gfid=0x8438a813ad7646c684f8c80fe956de57

brick1:
~~~~~~~~
[06/15/12 - 01:41:03 root@AFR-Server1 ~]# getfattr -d -m. -ehex /export_b1/dir1/output_file.20120615010422
getfattr: Removing leading '/' from absolute path names
# file: export_b1/dir1/output_file.20120615010422
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.vol-client-0=0x000000000000000000000000
trusted.afr.vol-client-1=0x000000000000000000000000
trusted.afr.vol-client-2=0x000000000000000000000000
trusted.gfid=0x8438a813ad7646c684f8c80fe956de57

--- Additional comment from shwetha.h.panduranga@redhat.com on 2012-06-15 02:16:48 EDT ---

Created attachment 592039 [details]
glusterfs logs, history of commands executed on storage_node1 and fuse mount log

--- Additional comment from shwetha.h.panduranga@redhat.com on 2012-06-15 02:31:19 EDT ---

md5sum on brick1:-
-----------------

[06/15/12 - 02:28:59 root@AFR-Server1 ~]# arequal-checksum -p /export_b1/dir1/output_file.20120615010422

Entry counts
Regular files   : 1
Directories     : 0
Symbolic links  : 0
Other           : 0
Total           : 1

Metadata checksums
Regular files   : 486e85
Directories     : 3e9
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : b05e703fbeac795f934ec0aa043387c9
Directories     : 0
Symbolic links  : 0
Other           : 0
Total           : 2310b095ba9ffe96

md5sum on brick2:-
------------------
[06/15/12 - 02:29:04 root@AFR-Server2 ~]# arequal-checksum -p /export_b1/dir1/output_file.20120615010422 

Entry counts
Regular files   : 1
Directories     : 0
Symbolic links  : 0
Other           : 0
Total           : 1

Metadata checksums
Regular files   : 486e85
Directories     : 3e9
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : b05e703fbeac795f934ec0aa043387c9
Directories     : 0
Symbolic links  : 0
Other           : 0
Total           : 2310b095ba9ffe96

md5sum on brick3:-
------------------
[06/15/12 - 02:30:48 root@AFR-Server3 ~]# arequal-checksum -p /export_b1/dir1/output_file.20120615010422 

Entry counts
Regular files   : 1
Directories     : 0
Symbolic links  : 0
Other           : 0
Total           : 1

Metadata checksums
Regular files   : 486e85
Directories     : 3e9
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : b05e703fbeac795f934ec0aa043387c9
Directories     : 0
Symbolic links  : 0
Other           : 0
Total           : 2310b095ba9ffe96

--- Additional comment from shwetha.h.panduranga@redhat.com on 2012-06-22 02:52:58 EDT ---

There are 2 parts to the bug. Case1) For some reason split-brain flag is set, we haven't figured out why that happened. Case2) Once the flag is set, there is no way to reset the flag. This fixes the second part. 

This bug is tracking Case1

Bug 832305 tracks Case2 and Case2 is already resolved.

--- Additional comment from aavati@redhat.com on 2012-06-22 13:31:17 EDT ---

The patch which was committed has removed the setting into inode context altogether. We can close this bug as it is not applicable anymore.

--- Additional comment from pkarampu@redhat.com on 2012-06-26 01:43:51 EDT ---

There is still one place where the decision to set split-brain flag is taken outside inodelk. Re-opening the bug.
Comment 1 Vidya Sakar 2012-07-16 04:57:42 EDT
Need this fixed in 2.0.z
Comment 2 spandura 2012-08-28 03:02:34 EDT
Executed the steps to recreate the bug and Verified the fix on glusterfs 3.3.0rhs built on Aug 17 2012 07:06:59.

The Bug is fixed.
Comment 4 errata-xmlrpc 2012-09-11 10:23:33 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-1253.html

Note You need to log in before you can comment on or make changes to this bug.