Bug 1361518 - Files not able to heal after arbiter and data bricks were rebooted
Status: CLOSED WONTFIX
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: arbiter
Version: 3.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assigned To: Ravishankar N
QA Contact: Karan Sandha
Keywords: Triaged, ZStream
Depends On: 1340032
Blocks: 1351530
Reported: 2016-07-29 05:21 EDT by Pranith Kumar K
Modified: 2018-08-14 07:15 EDT
CC List: 9 users

See Also:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
If a file create is wound to all bricks and succeeds only on the arbiter, the application gets a failure, but during self-heal the file gets created on the data bricks with the arbiter marked as source. Since data self-heal can never happen from the arbiter, 'heal info' will list the entries forever.

Workaround: If 'gluster vol heal <volname> info' shows pending heals for a file forever, check whether the issue is the one described above by:
i) checking that the trusted.afr.<volname>-client-* xattrs are zero on the data bricks
ii) checking that the trusted.afr.<volname>-client-* xattrs are non-zero on the arbiter brick *only* for the data part (first 4 bytes)

Example:
# getfattr -d -m . -e hex /bricks/arbiterbrick/file | grep trusted.afr.testvol*
getfattr: Removing leading '/' from absolute path names
trusted.afr.testvol-client-0=0x000000540000000000000000
trusted.afr.testvol-client-1=0x000000540000000000000000

If it is in this state, then delete the xattrs:
# for i in $(getfattr -d -m . -e hex /bricks/arbiterbrick/file | grep trusted.afr.testvol* | cut -f1 -d'='); do setfattr -x $i file; done
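The same workaround can be scripted. The following is a minimal sketch only, reusing the assumed names from the example above (volume testvol, arbiter brick at /bricks/arbiterbrick, a file named 'file'); substitute the real volume name, brick path and file before running.

#!/bin/bash
# Sketch of the workaround above; names below are the assumed example values,
# not values from a real deployment.
ARBITER_BRICK=/bricks/arbiterbrick
VOLNAME=testvol
FILE="$ARBITER_BRICK/file"

# Show the pending AFR xattrs on the arbiter copy of the file.
getfattr -d -m . -e hex "$FILE" | grep "trusted.afr.${VOLNAME}-client"

# Remove each trusted.afr.<volname>-client-* xattr from the arbiter copy,
# mirroring the one-liner above but using the full path to the file.
for xattr in $(getfattr -d -m . -e hex "$FILE" 2>/dev/null \
        | grep "trusted.afr.${VOLNAME}-client" | cut -f1 -d'='); do
    setfattr -x "$xattr" "$FILE"
done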
Story Points: ---
Clone Of: 1340032
Environment:
Last Closed: 2018-04-16 14:16:27 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Pranith Kumar K 2016-07-29 05:21:00 EDT
+++ This bug was initially created as a clone of Bug #1340032 +++

Description of problem:
Created a script that writes 50 files of 5 MB each. During the creation, rebooted two nodes (the arbiter and one data brick) while the other data brick stayed up.

Version-Release number of selected component (if applicable):


How reproducible:
1 time

Steps to Reproduce:
1. Created a 1x3 arbiter volume named core.
2. Mounted it on the client using fuse at /mnt/core.
3. Ran this script for 1 min from the mount point:
for (( i=1;i<=50;i++ )) 
do 
dd if=/dev/urandom of=corefile$i bs=5M count=5 status=progress
done

4. Rebooted the arbiter node and one of the data brick nodes.
5. Files 15 to 26 were only touched; no data was written (0-byte files).
6. Files on the arbiter and data brick weren't able to heal (a check for this state is sketched at the end of this comment):

[root@dhcp43-192 core]# gluster volume heal core info
Brick dhcp43-157.lab.eng.blr.redhat.com:/rhs/brick1/core
Status: Connected
Number of entries: 0

Brick dhcp43-192.lab.eng.blr.redhat.com:/rhs/brick1/core
/corefile16 
/corefile17 
/corefile18 
/corefile19 
/corefile20 
/corefile21 
/corefile22 
/corefile23 
/corefile24 
/corefile25 
/corefile26 
Status: Connected
Number of entries: 11

Brick dhcp43-153.lab.eng.blr.redhat.com:/rhs/brick1/core
/corefile16 
/corefile17 
/corefile18 
/corefile19 
/corefile20 
/corefile21 
/corefile22 
/corefile23 
/corefile24 
/corefile25 
/corefile26 
Status: Connected
Number of entries: 11



Actual results:
The files weren't healed.

Expected results:
The files should have been healed.

Additional info:
logs kept at rhsqe-repo.lab.eng.blr.redhat.com:/var/www/html/sosreports/<bug>
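A minimal check that can be run on each brick node to confirm the state described above (a sketch only): the brick path is taken from the heal info output, but the trusted.afr.core-client-* xattr names are an assumption based on the volume name and the standard AFR xattr naming.

#!/bin/bash
# Sketch: on a brick node, report the size and AFR xattrs of the files that
# heal info lists as pending. Brick path assumed from the output above;
# the exact xattr names depend on the volume's client IDs.
BRICK=/rhs/brick1/core

for i in $(seq 16 26); do
    f="$BRICK/corefile$i"
    [ -e "$f" ] || continue
    echo "== corefile$i (size: $(stat -c %s "$f") bytes) =="
    getfattr -d -m . -e hex "$f" 2>/dev/null | grep trusted.afr.core-client
done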

--- Additional comment from Karan Sandha on 2016-06-07 08:38:33 EDT ---

Steps To Reproduce:
1) Create a 1x3 arbiter volume with bricks B1, B2, B3 (arbiter).
2) Bring down B1.
3) Create 50 files of 500 MB each on the fuse mount from the client.
4) After about 30 files have been created, bring up B1 and bring down B3.

Then check 'gluster volume heal info'. Listing the files on the bricks shows multiple 0-byte files, and heal info shows multiple files to be healed. A rough sketch of this sequence follows.
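The sketch below is an illustration only, not the exact commands used here: it assumes one brick per node at /rhs/brick1/core and a volume named core mounted at /mnt/core, and it simulates a brick going down by killing its glusterfsd process instead of rebooting the node.

# On the node hosting B1: take the brick offline (assumes a single brick
# process matching this path on the node).
kill $(pgrep -f 'glusterfsd.*rhs/brick1/core')

# On the client: write 50 files of 500 MB each to the fuse mount
# (run in a separate terminal so the bricks can be toggled mid-write).
for i in $(seq 1 50); do
    dd if=/dev/urandom of=/mnt/core/corefile$i bs=5M count=100
done

# After roughly 30 files have been written: restart the dead brick and
# take the arbiter (B3) down.
gluster volume start core force                     # on any server node
kill $(pgrep -f 'glusterfsd.*rhs/brick1/core')      # on the arbiter node

# Check the result.
gluster volume heal core info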

--- Additional comment from Vijay Bellur on 2016-06-20 04:18:01 EDT ---

REVIEW: http://review.gluster.org/14769 (afr: Do not mark arbiter as data source during newentry_mark) posted (#1) for review on master by Ravishankar N (ravishankar@redhat.com)

--- Additional comment from Vijay Bellur on 2016-06-24 07:42:49 EDT ---

REVIEW: http://review.gluster.org/14769 (afr: Do not mark arbiter as data source during newentry_mark) posted (#2) for review on master by Ravishankar N (ravishankar@redhat.com)

--- Additional comment from Ravishankar N on 2016-06-24 07:44:56 EDT ---

Moved BZ state by mistake
Comment 3 Atin Mukherjee 2016-08-30 01:34:04 EDT
http://review.gluster.org/14769 posted upstream for review.
Comment 4 Karan Sandha 2016-09-30 05:16:52 EDT
Increasing the priority of this bug as I am hitting this issue in pretty much every brick-down scenario.
