Bug 991009

Summary: AFR: changelogs not cleared even after successful writes
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: spandura
Component: replicate
Assignee: Anuradha <atalur>
Status: CLOSED EOL
QA Contact: spandura
Severity: medium
Docs Contact:
Priority: medium
Version: 2.1
CC: nsathyan, pkarampu, rhs-bugs, smohan, storage-qa-internal, surs, vagarwal, vbellur
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-12-03 17:11:01 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description spandura 2013-08-01 11:52:14 UTC
Description of problem:
=======================
In a replicate volume (1 x 2), a brick is replaced by bringing the brick process offline, un-mounting, formatting and re-mounting the brick directory, and bringing the brick back online. A "heal full" is then triggered on the volume to self-heal the files/dirs, and the heal completes successfully.

From the mount point, when we write data to a file, the write succeeds on both bricks, but the changelogs for the replaced brick are not cleared after the write op.

This bug was found while testing bug 853684.

Version-Release number of selected component (if applicable):
===============================================================
root@king [Aug-01-2013-16:29:33] >rpm -qa | grep glusterfs-server
glusterfs-server-3.4.0.14rhs-1.el6rhs.x86_64

root@king [Aug-01-2013-16:29:39] >gluster --version
glusterfs 3.4.0.14rhs built on Jul 30 2013 09:09:36

How reproducible:
==================
Often

Steps to Reproduce:
===================
1. Create replica volume 1 x 2

2. Start the volume

3. Create a fuse mount

4. From the fuse mount execute: "exec 5>>test_file" (to close the fd later, use: "exec 5>&-")

5. Kill all gluster processes on storage_node1 (killall glusterfs glusterfsd glusterd)

6. Get the extended attributes of the brick1 directory on storage_node1 (getfattr -d -e hex -m . <path_to_brick1>)

7. Remove the brick1 directory on storage_node1 (rm -rf <path_to_brick1>)

8. Create the brick1 directory on storage_node1 (mkdir <path_to_brick1>)

9. Set the extended attribute "trusted.glusterfs.volume-id" on brick1 on storage_node1 to the value captured at step 6.

10. Start glusterd on storage_node1. (service glusterd start)

11. Execute "gluster volume heal <volume_name> full" from any storage node. This will self-heal the file "test_file" from brick0 to brick1.

12. From the mount point execute: for i in `seq 1 10`; do echo "Hello World" >&5; done

13. Check that the data is written on both bricks (cat test_file on both brick0 and brick1; it should contain 10 lines of "Hello World").

14. Check the extended attributes of the file on both the bricks. (Steps 5-10 are consolidated in the sketch below.)
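
The brick-replacement portion of the steps above (5 through 10) can be consolidated into a rough bash sketch. This is only a sketch: the brick path is a placeholder for <path_to_brick1>, and the volume-id handling assumes the hex-encoded getfattr/setfattr usage shown in step 6.

# Run on the node hosting the brick being replaced.
BRICK=/rhs/bricks/b1        # substitute <path_to_brick1>

# Step 5: stop all gluster processes on this node.
killall glusterfs glusterfsd glusterd

# Step 6: capture the hex-encoded volume-id xattr of the old brick directory.
VOLID=$(getfattr -n trusted.glusterfs.volume-id -e hex "$BRICK" | awk -F= '/volume-id/ {print $2}')

# Steps 7-8: remove and recreate the brick directory.
rm -rf "$BRICK"
mkdir "$BRICK"

# Step 9: restore the volume-id so the new, empty directory is accepted as the brick.
setfattr -n trusted.glusterfs.volume-id -v "$VOLID" "$BRICK"

# Step 10: bring glusterd (and with it the brick process) back up.
service glusterd start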

Actual results:
===============
The changelogs for brick1 (trusted.afr.vol_rep-client-1) are not cleared after the writes; both bricks still show pending operations.

storage_node1:
===============
root@king [Aug-01-2013-16:50:27] >getfattr -d -e hex -m . /rhs/bricks/b0/test_file 
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/b0/test_file
trusted.afr.vol_rep-client-0=0x000000000000000000000000
trusted.afr.vol_rep-client-1=0x000000010000000000000000
trusted.gfid=0x23473c17877643f49ee39a26e3a6c982


storage_node2:
=============
root@hicks [Aug-01-2013-16:49:54] >getfattr -d -e hex -m . /rhs/bricks/b1/test_file 
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/b1/test_file
trusted.afr.vol_rep-client-0=0x000000000000000000000000
trusted.afr.vol_rep-client-1=0x000000010000000000000000
trusted.gfid=0x23473c17877643f49ee39a26e3a6c982
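
Each trusted.afr.<volume>-client-N value above packs three 32-bit big-endian counters: pending data, metadata and entry operations recorded against client N. Assuming bash, the client-1 value shown on both bricks (0x prefix stripped) can be decoded as follows:

val=000000010000000000000000
printf 'data=%d metadata=%d entry=%d\n' $((16#${val:0:8})) $((16#${val:8:8})) $((16#${val:16:8}))
# prints: data=1 metadata=0 entry=0

i.e. both bricks still blame client-1 (the replaced brick) for one pending data operation even though the write succeeded on both.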

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
After performing more writes on test_file from the mount point (for i in `seq 1 310`; do echo "Hello World" >&5; sleep 1; done), the extended attributes of the file are as follows:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Storage_node1:
================
root@king [Aug-01-2013-16:52:40] >getfattr -d -e hex -m . /rhs/bricks/b0/test_file 
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/b0/test_file
trusted.afr.vol_rep-client-0=0x000000000000000000000000
trusted.afr.vol_rep-client-1=0x000001480000000000000000
trusted.gfid=0x23473c17877643f49ee39a26e3a6c982

Storage_node2:
=================
root@hicks [Aug-01-2013-16:52:34] >getfattr -d -e hex -m . /rhs/bricks/b1/test_file 
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/b1/test_file
trusted.afr.vol_rep-client-0=0x000000000000000000000000
trusted.afr.vol_rep-client-1=0x000001480000000000000000
trusted.gfid=0x23473c17877643f49ee39a26e3a6c982
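
Applying the same decode as above to the new value shows the data counter has grown rather than being cleared:

val=000001480000000000000000
echo $((16#${val:0:8}))   # -> 328 pending data operations still blamed on client-1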

Expected results:
================
The changelogs should be cleared after the writes are successful.

Additional info:
================

root@king [Aug-01-2013-17:13:03] >gluster v info
 
Volume Name: vol_rep
Type: Replicate
Volume ID: c449b61f-f57d-4114-ac22-777d9d7f8e44
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: king:/rhs/bricks/b0
Brick2: hicks:/rhs/bricks/b1
Options Reconfigured:
cluster.self-heal-daemon: on

Comment 2 Pranith Kumar K 2013-08-01 17:38:56 UTC
Shwetha,
      Could you attach the logs please? I suspect the changelogs remain because the fsyncs fail: protocol/client uses anonymous fds to perform writes/fxattrop until the actual fd is re-opened, whereas fsyncs are done using the fd that is yet to be re-opened, so they fail with EBADFD. Using anonymous fds for fsyncs, just as we do for write/fxattrop, does not work because files are opened/closed on the brick without the mount process' knowledge (I am waiting for confirmation of this from Avati). It is correct behavior to keep the pending xattrs if fsyncs fail, because there is a chance the data did not reach disk. Let's decide whether this should be treated as a bug based on Avati's response. Sorry I couldn't find the reason for the dirty xattrs while we were debugging; it didn't occur to me then. IMO this is *NOT* a severe bug. I guess the functionality worked fine, didn't it?
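
One way to watch whether the pending changelog xattrs get cleared after each write is to monitor a brick copy of the file while the writes from step 12 are running (a sketch, using the brick path from this report):

watch -n 1 "getfattr -d -e hex -m trusted.afr /rhs/bricks/b1/test_file"

# If the changelog were being cleared after each successful write+fsync,
# the trusted.afr.* counters would drop back to all zeroes between writes;
# in this bug they keep growing instead.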

Pranith.

Comment 3 Pranith Kumar K 2013-08-02 08:19:29 UTC
Shwetha,
     Avati has replied that it can be done. The man page of fsync says the following:
"       fsync() transfers ("flushes") all modified in-core data of (i.e., modified buffer cache pages for) the file referred to by the file descriptor fd to the disk device (or other permanent storage device)
       so that all changed information can be retrieved even after the system crashed or was rebooted.  This includes writing through or flushing a disk cache if present.  The call blocks  until  the  device
       reports that the transfer has completed.  It also flushes metadata information associated with the file (see stat(2))."

So it does not matter through which fd of the inode fsync is issued; the data will still be written to disk.

Pranith.

Comment 6 Vivek Agarwal 2015-12-03 17:11:01 UTC
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release you asked us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/

If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.