Bug 991009 - AFR: changelogs not cleared even after successful writes
Status: CLOSED EOL
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: replicate
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Assigned To: Anuradha
QA Contact: spandura
Depends On:
Blocks:
Reported: 2013-08-01 07:52 EDT by spandura
Modified: 2016-09-19 22:00 EDT
CC List: 8 users

Doc Type: Bug Fix
Last Closed: 2015-12-03 12:11:01 EST
Type: Bug

Description spandura 2013-08-01 07:52:14 EDT
Description of problem:
=======================
In a replicate volume (1 x 2), a brick is replaced by bringing the brick process offline, un-mounting, formatting, and re-mounting the brick directory, and then bringing the brick online. "heal full" is triggered on the volume to self-heal the files/dirs. Heal completes successfully.

From the mount point, when we write data to a file, the write succeeds on both bricks, but the changelog for the replaced brick is not cleared after the write operation.

This bug was found while testing bug 853684.

Version-Release number of selected component (if applicable):
===============================================================
root@king [Aug-01-2013-16:29:33] >rpm -qa | grep glusterfs-server
glusterfs-server-3.4.0.14rhs-1.el6rhs.x86_64

root@king [Aug-01-2013-16:29:39] >gluster --version
glusterfs 3.4.0.14rhs built on Jul 30 2013 09:09:36

How reproducible:
==================
Often

Steps to Reproduce:
===================
1. Create replica volume 1 x 2

2. Start the volume

3. Create a fuse mount

4. From the fuse mount execute: "exec 5>>test_file" (to close the fd later, use: exec 5>&-)

5. Kill all gluster process on storage_node1 (killall glusterfs glusterfsd glusterd)

6. Get the extended attribute of the brick1 directory on storage_node1 (getfattr -d -e hex -m . <path_to_brick1>)

7. Remove the brick1 directory on storage_node1(rm -rf <path_to_brick1>)

8. Create the brick1 directory on storage_node1(mkdir <path_to_brick1>)

9. Set the extended attribute "trusted.glusterfs.volume-id" to the value captured at step 6 for brick1 on storage_node1 (a condensed sketch of these commands follows the list).

10. Start glusterd on storage_node1. (service glusterd start)

11. Execute "gluster volume heal <volume_name> full" from any of the storage nodes. This will self-heal the file "test_file" from brick0 to brick1.

12. From the mount point execute: for i in `seq 1 10` ; do echo "Hello World" >&5 ; done

13. Check that the data is written to both bricks (cat test_file on both brick0 and brick1; it should contain 10 lines of "Hello World")

14. Check the extended attributes of the file on both the bricks. 
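
For reference, a condensed sketch of steps 6-10 on storage_node1 (placeholders such as <path_to_brick1> and <volume_id_hex> are stand-ins, not actual values from this setup):

getfattr -d -e hex -m . <path_to_brick1>    # note the trusted.glusterfs.volume-id value
rm -rf <path_to_brick1>
mkdir <path_to_brick1>
setfattr -n trusted.glusterfs.volume-id -v <volume_id_hex> <path_to_brick1>
service glusterd start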

Actual results:
===============
The changelogs for brick1 are not cleared after the writes.

storage_node1:
===============
root@king [Aug-01-2013-16:50:27] >getfattr -d -e hex -m . /rhs/bricks/b0/test_file 
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/b0/test_file
trusted.afr.vol_rep-client-0=0x000000000000000000000000
trusted.afr.vol_rep-client-1=0x000000010000000000000000
trusted.gfid=0x23473c17877643f49ee39a26e3a6c982


storage_node2:
=============
root@hicks [Aug-01-2013-16:49:54] >getfattr -d -e hex -m . /rhs/bricks/b1/test_file 
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/b1/test_file
trusted.afr.vol_rep-client-0=0x000000000000000000000000
trusted.afr.vol_rep-client-1=0x000000010000000000000000
trusted.gfid=0x23473c17877643f49ee39a26e3a6c982
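
For readers unfamiliar with the AFR changelog format: assuming the usual layout of three 32-bit big-endian counters per trusted.afr.* value (data, metadata and entry pending operations, in that order), the value above can be decoded with a quick bash sketch:

xattr=000000010000000000000000
echo "data=$((16#${xattr:0:8})) metadata=$((16#${xattr:8:8})) entry=$((16#${xattr:16:8}))"
# prints: data=1 metadata=0 entry=0

i.e. both bricks still record one pending data operation against the replaced brick (client-1) even though the writes succeeded.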

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
After performing more writes on test_file from the mount ( for i in `seq 1 310`; do echo "Hello World" >&5 ; sleep 1 ; done ), the extended attributes of the file are as follows:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Storage_node1:
================
root@king [Aug-01-2013-16:52:40] >getfattr -d -e hex -m . /rhs/bricks/b0/test_file 
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/b0/test_file
trusted.afr.vol_rep-client-0=0x000000000000000000000000
trusted.afr.vol_rep-client-1=0x000001480000000000000000
trusted.gfid=0x23473c17877643f49ee39a26e3a6c982

Storage_node2:
=================
root@hicks [Aug-01-2013-16:52:34] >getfattr -d -e hex -m . /rhs/bricks/b1/test_file 
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/b1/test_file
trusted.afr.vol_rep-client-0=0x000000000000000000000000
trusted.afr.vol_rep-client-1=0x000001480000000000000000
trusted.gfid=0x23473c17877643f49ee39a26e3a6c982
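
Using the same decode as above, $((16#00000148)) = 328, so the pending data counter against client-1 has kept growing with the additional writes instead of being reset to zero.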

Expected results:
================
The changelogs should be cleared after the writes are successful.

Additional info:
================

root@king [Aug-01-2013-17:13:03] >gluster v info
 
Volume Name: vol_rep
Type: Replicate
Volume ID: c449b61f-f57d-4114-ac22-777d9d7f8e44
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: king:/rhs/bricks/b0
Brick2: hicks:/rhs/bricks/b1
Options Reconfigured:
cluster.self-heal-daemon: on
Comment 2 Pranith Kumar K 2013-08-01 13:38:56 EDT
Shwetha,
      Could you attach the logs, please? I guess the changelogs remain because the fsyncs would fail: protocol/client uses anonymous fds to perform writes/fxattrop until the actual fd is re-opened, whereas fsyncs are done using the fd that is yet to be re-opened (EBADFD). Using anonymous fds for fsyncs, just like we do for write/fxattrop, does not work because the files are opened/closed on the brick without the mount process' knowledge (I am waiting for a response from Avati to confirm this). It is correct behavior to keep the pending xattrs if fsyncs fail, because there is a chance of the data not reaching disk. Let's decide whether this should be treated as a bug or not based on Avati's response. Sorry I couldn't find the reason for the dirty xattrs while we were debugging; it didn't occur to me then. IMO this is *NOT* a severe bug. I guess the functionality worked fine, didn't it?

Pranith.
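
(One quick way to check whether fsyncs are indeed failing during the run, assuming the default client log location, would be something along the lines of:

grep -i fsync /var/log/glusterfs/<mount_log_file>.log

while the writes from step 12 are in progress; <mount_log_file> here is a placeholder for whatever log file corresponds to the fuse mount.)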
Comment 3 Pranith Kumar K 2013-08-02 04:19:29 EDT
Shwetha,
     Avati confirmed that it can be done. The man page of fsync says the following:
"       fsync() transfers ("flushes") all modified in-core data of (i.e., modified buffer cache pages for) the file referred to by the file descriptor fd to the disk device (or other permanent storage device) so that all changed information can be retrieved even after the system crashed or was rebooted. This includes writing through or flushing a disk cache if present. The call blocks until the device reports that the transfer has completed. It also flushes metadata information associated with the file (see stat(2))."

So it does not matter which fd of the inode the fsync is issued on; the data will still be written to disk.

Pranith.
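
(A rough shell illustration of that point, assuming a local file named test_file: the data is appended through fd 5, while the flush is issued through a separate descriptor that dd opens on the same file, and the fsync still persists the appended data because both descriptors refer to the same inode.

exec 5>>test_file
echo "Hello World" >&5
dd if=/dev/null of=test_file conv=notrunc,fsync 2>/dev/null
exec 5>&-
)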
Comment 6 Vivek Agarwal 2015-12-03 12:11:01 EST
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release you requested us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/

If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.
