Bug 1361518 - Files not able to heal after arbiter and data bricks were rebooted
Summary: Files not able to heal after arbiter and data bricks were rebooted
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: arbiter
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Ravishankar N
QA Contact: Karan Sandha
URL:
Whiteboard:
Depends On: 1340032
Blocks: 1351530
 
Reported: 2016-07-29 09:21 UTC by Pranith Kumar K
Modified: 2018-11-08 12:16 UTC
CC List: 11 users

Fixed In Version:
Doc Type: Known Issue
Doc Text:
If a file create is wound to all bricks, and it succeeds only on arbiter, the application will get a failure. But during self-heal, the file gets created on the data bricks with arbiter marked as source. Since data self-heal can never happen from arbiter, 'heal-info' will list the entries forever. Workaround: If 'gluster vol heal <volname> info` shows the pending heals for a file forever, then check if the issue is the same as mentioned above by: i) checking that trusted.afr.volname-client* xattrs are zero on the data bricks ii)checking that trusted.afr.volname-client* xattrs is non-zero on the arbiter brick *only* for the data part (first 4 bytes) Example: #getfattr -d -m . -e hex /bricks/arbiterbrick/file |grep trusted.afr.testvol* getfattr: Removing leading '/' from absolute path names trusted.afr.testvol-client-0=0x000000540000000000000000 trusted.afr.testvol-client-1=0x000000540000000000000000 If it is in this state, then delete the xattr: #for i in $(getfattr -d -m . -e hex /bricks/arbiterbrick/file |grep trusted.afr.testvol*|cut -f1 -d'='); do setfattr -x $i file; done
Clone Of: 1340032
Environment:
Last Closed: 2018-04-16 18:16:27 UTC
Embargoed:


Attachments

Description Pranith Kumar K 2016-07-29 09:21:00 UTC
+++ This bug was initially created as a clone of Bug #1340032 +++

Description of problem:
Created a script to write 50 files of 5 MB each; during the creation, rebooted 2 nodes (the arbiter and a data brick) while one data brick remained alive.

Version-Release number of selected component (if applicable):


How reproducible:
1 time

Steps to Reproduce:
1. Created a 1x3 arbiter volume named 'core'.
2. Mounted it on the client using FUSE at /mnt/core.
3. Ran this script for 1 min (see the consolidated sketch after step 6):
for (( i=1; i<=50; i++ ))
do
    dd if=/dev/urandom of=corefile$i bs=5M count=5 status=progress
done

4. Rebooted the arbiter and one of the data bricks.
5. Files 15 to 26 were only touched; no data was written (0-byte files).
6. Files on the arbiter and the data brick were not able to heal.
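
A consolidated sketch of steps 1-3; the hostnames node1/node2/node3 are hypothetical (node3 hosting the arbiter brick) and the brick path is taken from the heal info output further below:

gluster volume create core replica 3 arbiter 1 \
    node1:/rhs/brick1/core node2:/rhs/brick1/core node3:/rhs/brick1/core
gluster volume start core
# On the client:
mount -t glusterfs node1:/core /mnt/core
cd /mnt/core
for (( i=1; i<=50; i++ ))
do
    dd if=/dev/urandom of=corefile$i bs=5M count=5 status=progress
done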

[root@dhcp43-192 core]# gluster volume heal core info
Brick dhcp43-157.lab.eng.blr.redhat.com:/rhs/brick1/core
Status: Connected
Number of entries: 0

Brick dhcp43-192.lab.eng.blr.redhat.com:/rhs/brick1/core
/corefile16 
/corefile17 
/corefile18 
/corefile19 
/corefile20 
/corefile21 
/corefile22 
/corefile23 
/corefile24 
/corefile25 
/corefile26 
Status: Connected
Number of entries: 11

Brick dhcp43-153.lab.eng.blr.redhat.com:/rhs/brick1/core
/corefile16 
/corefile17 
/corefile18 
/corefile19 
/corefile20 
/corefile21 
/corefile22 
/corefile23 
/corefile24 
/corefile25 
/corefile26 
Status: Connected
Number of entries: 11



Actual results:
The files weren't healed.

Expected results:
The files should have been healed.

Additional info:
logs kept at rhsqe-repo.lab.eng.blr.redhat.com:/var/www/html/sosreports/<bug>

--- Additional comment from Karan Sandha on 2016-05-31 02:18 EDT ---



--- Additional comment from Karan Sandha on 2016-05-31 02:19 EDT ---



--- Additional comment from Karan Sandha on 2016-05-31 02:20 EDT ---



--- Additional comment from Karan Sandha on 2016-05-31 02:22 EDT ---



--- Additional comment from Karan Sandha on 2016-06-07 08:11 EDT ---



--- Additional comment from Karan Sandha on 2016-06-07 08:38:33 EDT ---

Steps To Reproduce:
1) Create a 1x3 arbiter volume with bricks B1, B2, and B3 (arbiter).
2) Bring down B1.
3) Create 50 files of 500 MB each on the FUSE mount from the client.
4) After 30 files are created, bring up B1 and bring down B3 (see the sketch below).

Check 'gluster volume heal <volname> info'.
An ls of the files on the bricks will show multiple 0-byte files, and heal info will show multiple files to be healed.
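
One way to script the brick down/up transitions without full node reboots, assuming the volume is named 'core'; the brick PIDs come from 'gluster volume status' and the angle-bracket placeholders are to be filled in by hand:

# Bring down B1: on the node hosting B1, kill its brick (glusterfsd) process.
gluster volume status core        # note the PID listed for the B1 brick
kill <PID-of-B1-brick>
# ... create the 50 files from the client ...
# Bring B1 back up; 'start ... force' restarts any brick processes that are down.
gluster volume start core force
# Bring down B3 (the arbiter) the same way on its node, then check pending heals:
gluster volume heal core info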

--- Additional comment from Vijay Bellur on 2016-06-20 04:18:01 EDT ---

REVIEW: http://review.gluster.org/14769 (afr: Do not mark arbiter as data source during newentry_mark) posted (#1) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Vijay Bellur on 2016-06-24 07:42:49 EDT ---

REVIEW: http://review.gluster.org/14769 (afr: Do not mark arbiter as data source during newentry_mark) posted (#2) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Ravishankar N on 2016-06-24 07:44:56 EDT ---

Moved BZ state by mistake

Comment 3 Atin Mukherjee 2016-08-30 05:34:04 UTC
http://review.gluster.org/14769 posted upstream for review.

Comment 4 Karan Sandha 2016-09-30 09:16:52 UTC
Increasing the priority of this bug as I am hitting this issue in pretty much every brick-down scenario.

Comment 20 Amar Tumballi 2018-11-06 10:15:02 UTC
Bipin, the idea is not that we always WONTFIX; that was done after looking at the activity on the bugzilla, and we had not picked up this particular bug for the previous 2 releases.

We will keep it as an open bug upstream and fix it there, and will bring it downstream when it lands in releases as backports.

