Bug 1486063

Summary: 0 kByte file not self-healing with replica 2 + arbiter
Product: [Community] GlusterFS
Component: replicate
Version: 3.8
Reporter: mabi
Assignee: Ravishankar N <ravishankar>
CC: bugs, ravishankar
Status: CLOSED WONTFIX
Severity: medium
Priority: unspecified
Keywords: Triaged
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Type: Bug
Last Closed: 2017-10-03 04:49:04 UTC

Attachments:
self-heal daemon log file of all 3 nodes

Description mabi 2017-08-28 21:34:44 UTC
Created attachment 1319232 [details]
self-heal daemon log file of all 3 nodes

Description of problem:
I have a 3-node replica 2 + arbiter setup where a single 0 kByte file is stuck in self-heal and never gets healed. The whole issue has been extensively discussed and described on the gluster-users mailing list here:

http://lists.gluster.org/pipermail/gluster-users/2017-August/032105.html

For convenience I have pasted the relevant information below, starting with the heal info output:

Brick node1.domain.tld:/data/myvolume/brick
/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
Status: Connected
Number of entries: 1

Brick node2.domain.tld:/data/myvolume/brick
/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
Status: Connected
Number of entries: 1

Brick node3.domain.tld:/srv/glusterfs/myvolume/brick
/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
Status: Connected
Number of entries: 1

`stat` and `getfattr` output for the file on each brick:

NODE1:

STAT:
  File: ‘/data/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png’
  Size: 0         Blocks: 38         IO Block: 131072 regular empty file
Device: 24h/36d Inode: 10033884    Links: 2
Access: (0644/-rw-r--r--)  Uid: (   33/www-data)   Gid: (   33/www-data)
Access: 2017-08-14 17:04:55.530681000 +0200
Modify: 2017-08-14 17:11:46.407404779 +0200
Change: 2017-08-14 17:11:46.407404779 +0200
Birth: -

GETFATTR:
trusted.afr.dirty=0sAAAAAQAAAAAAAAAA
trusted.bit-rot.version=0sAgAAAAAAAABZhuknAAlJAg==
trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==
trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOyo=

NODE2:

STAT:
  File: ‘/data/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png’
  Size: 0         Blocks: 38         IO Block: 131072 regular empty file
Device: 26h/38d Inode: 10031330    Links: 2
Access: (0644/-rw-r--r--)  Uid: (   33/www-data)   Gid: (   33/www-data)
Access: 2017-08-14 17:04:55.530681000 +0200
Modify: 2017-08-14 17:11:46.403704181 +0200
Change: 2017-08-14 17:11:46.403704181 +0200
Birth: -

GETFATTR:
trusted.afr.dirty=0sAAAAAQAAAAAAAAAA
trusted.bit-rot.version=0sAgAAAAAAAABZhu6wAA8Hpw==
trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==
trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOVE=

NODE3:
STAT:
  File: /srv/glusterfs/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
  Size: 0         Blocks: 0          IO Block: 4096   regular empty file
Device: ca11h/51729d Inode: 405208959   Links: 2
Access: (0644/-rw-r--r--)  Uid: (   33/www-data)   Gid: (   33/www-data)
Access: 2017-08-14 17:04:55.530681000 +0200
Modify: 2017-08-14 17:04:55.530681000 +0200
Change: 2017-08-14 17:11:46.604380051 +0200
Birth: -

GETFATTR:
trusted.afr.dirty=0sAAAAAQAAAAAAAAAA
trusted.bit-rot.version=0sAgAAAAAAAABZe6ejAAKPAg==
trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==
trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOc4=

CLIENT GLUSTER MOUNT:
STAT:
  File: '/mnt/myvolume/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png'
  Size: 0         Blocks: 0          IO Block: 131072 regular empty file
Device: 1eh/30d Inode: 11897049013408443114  Links: 1
Access: (0644/-rw-r--r--)  Uid: (   33/www-data)   Gid: (   33/www-data)
Access: 2017-08-14 17:04:55.530681000 +0200
Modify: 2017-08-14 17:11:46.407404779 +0200
Change: 2017-08-14 17:11:46.407404779 +0200
Birth: -
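The `trusted.afr.dirty` values in the `getfattr` output above are base64-encoded (the `0s` prefix is getfattr's base64 marker) and hold a triplet of big-endian 32-bit counters in AFR's data/metadata/entry order. A quick decode (a Python sketch; the helper name is mine) shows the same state on all three bricks: the data-dirty counter is 1 and nothing else is pending:

```python
import base64
import struct

def decode_afr(xattr_value):
    """Decode a getfattr-style AFR xattr value ("0s" prefix = base64)."""
    if xattr_value.startswith("0s"):
        xattr_value = xattr_value[2:]
    raw = base64.b64decode(xattr_value)
    # Three big-endian uint32 counters: data, metadata, entry
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}

# The same value appears on all three bricks above
print(decode_afr("0sAAAAAQAAAAAAAAAA"))  # {'data': 1, 'metadata': 0, 'entry': 0}
```

This matches the "dirty bit set and no pending bits" state described in the reproduction steps in comment 1.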

Version-Release number of selected component (if applicable):
GlusterFS 3.8.11 on Debian 8

How reproducible:
AFAIK Ravishankar managed to reproduce the problem.

Steps to Reproduce:
1. Ask Ravi

Actual results:
The file stays listed in `heal info` on all three bricks and is never healed.

Expected results:
The self-heal daemon heals the file so that it no longer appears in `heal info`.

Additional info:

Comment 1 Ravishankar N 2017-09-14 12:00:56 UTC
> Steps to Reproduce:
> 1. Ask Ravi

1. Create a zero-byte file (`touch file`) on an arbiter volume via a FUSE mount.
2. Attach gdb to the mount process and set breakpoints before the pre-op and the write are wound.
3. `echo "hello" >> file`
4. After the pre-op has been wound on all bricks, run `pkill gluster && pkill gdb`.
5. The dirty bit is now set (with no pending bits) for the file on all bricks, and an entry exists inside .glusterfs/indices/dirty.
6. Restart all gluster processes.
7. `heal info` shows the entry as needing heal.
8. The heal never completes, because the data self-heal algorithm picks the brick with the latest ctime as the source. Here the arbiter is likely chosen, since the pre-op in step 4 happened last on the arbiter, and an arbiter brick holds no file data to heal from.
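Step 8 can be illustrated with a small sketch (Python; the brick layout and Change timestamps are taken from the stat output in the description, and the "latest ctime wins" rule is a deliberate simplification of the source selection when every brick is dirty with no pending counters):

```python
# Change (ctime) values from the stat output above; node3 is the arbiter.
bricks = [
    {"name": "node1", "ctime": "2017-08-14 17:11:46.407404779", "arbiter": False},
    {"name": "node2", "ctime": "2017-08-14 17:11:46.403704181", "arbiter": False},
    {"name": "node3", "ctime": "2017-08-14 17:11:46.604380051", "arbiter": True},
]

# With the dirty bit set everywhere and no pending counters, the simplified
# tie-break is "latest ctime wins" -- which selects the arbiter here.
# (ISO timestamps compare correctly as strings.)
source = max(bricks, key=lambda b: b["ctime"])
print(source["name"])  # node3

# An arbiter brick stores only metadata, never file data, so picking it as
# the data source leaves the heal stuck.
```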

Comment 2 Ravishankar N 2017-09-14 12:12:48 UTC
Sent patch https://review.gluster.org/#/c/18283/ against BZ 1491670.

Comment 3 Ravishankar N 2017-10-03 04:49:04 UTC
(In reply to Ravishankar N from comment #2)
> Sent patch https://review.gluster.org/#/c/18283/ against BZ 1491670.

The addendum patch to address review comments in 18283 has also been merged in master: https://review.gluster.org/#/c/18391/

Note: I'm backporting these patches only to the 3.12 and 3.8 branches. glusterfs-3.8 is EOL, hence moving this bug to CLOSED.