Bug 1658870 - Files pending heal in Distribute-Replicate (Arbiter) volume.
Summary: Files pending heal in Distribute-Replicate (Arbiter) volume.
Keywords:
Status: CLOSED DUPLICATE of bug 1640148
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: arbiter
Version: rhgs-3.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Ravishankar N
QA Contact: Anees Patel
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-12-13 04:30 UTC by Anees Patel
Modified: 2018-12-13 09:57 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-12-13 09:57:17 UTC
Embargoed:



Description Anees Patel 2018-12-13 04:30:19 UTC
Description of problem:

The automated test case fails with files pending heal.
Test-case name: test_entry_self_heal_heal_command
Protocol used: NFS
Volume type: 2 x (2+1)

Retried the automated test run on a local cluster and observed the same result: heal is pending for a few files.

Version-Release number of selected component (if applicable):
# rpm -qa | grep gluster
glusterfs-client-xlators-3.12.2-31.el7rhgs.x86_64
glusterfs-debuginfo-3.12.2-31.el7rhgs.x86_64
glusterfs-cli-3.12.2-31.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.3.x86_64
glusterfs-libs-3.12.2-31.el7rhgs.x86_64
glusterfs-api-3.12.2-31.el7rhgs.x86_64


How reproducible:
2/2

Steps to Reproduce:
1. Create a 2 x (2+1) volume.
2. NFS-mount the volume.
3. Disable client-side healing (metadata, data and entry).
4. Write data: create directories and files from the mount point.
5. Set self-heal-daemon to off.
6. Bring down one brick from each replica set, for example b2 and b4.
7. Modify data from the client (create, mv and cp).
8. Bring all the bricks back up.
9. Set self-heal-daemon to on.
10. Check that all bricks are up and that the self-heal daemon is running on all nodes.
11. Issue heal.
12. Heal should complete, with no files pending heal and no files in split-brain.
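
A rough command-level sketch of the steps above, for reference. Host names (h1..h6), brick paths, the mount point and the volume name are placeholders rather than the exact test-bed values, and the bricks in step 6 are assumed to be taken down by killing their glusterfsd processes:

gluster volume create testvol replica 3 arbiter 1 \
    h1:/bricks/b0 h2:/bricks/b1 h3:/bricks/b2 \
    h4:/bricks/b3 h5:/bricks/b4 h6:/bricks/b5
gluster volume set testvol nfs.disable off                  # gNFS, as in the volume options below
gluster volume start testvol
mkdir -p /mnt/testvol
mount -t nfs -o vers=3 h1:/testvol /mnt/testvol             # step 2
gluster volume set testvol cluster.data-self-heal off       # step 3
gluster volume set testvol cluster.metadata-self-heal off
gluster volume set testvol cluster.entry-self-heal off
# step 4: create directories and files under /mnt/testvol
gluster volume set testvol cluster.self-heal-daemon off     # step 5
# step 6: kill the glusterfsd process of one brick per replica set (e.g. the second data brick of each set)
# step 7: modify data from the client (create, mv, cp)
gluster volume start testvol force                          # step 8: restart the killed bricks
gluster volume set testvol cluster.self-heal-daemon on      # step 9
gluster volume status testvol                               # step 10: bricks and Self-heal Daemon
gluster volume heal testvol                                 # step 11
gluster volume heal testvol info                            # step 12: expect 0 entries everywhere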


Actual results:

At Step 12, heal is pending for a few files.

Expected results:

Heal should be completed for all files/dirs
Additional info:

# gluster v info testvol_distributed-replicated
 
Volume Name: testvol_distributed-replicated
Type: Distributed-Replicate
Volume ID: 2dad8909-862f-42c9-923d-8eafdfd1e50c
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.43.62:/bricks/brick5/testvol_distributed-replicated_brick0
Brick2: 10.70.42.103:/bricks/brick5/testvol_distributed-replicated_brick1
Brick3: 10.70.41.187:/bricks/brick5/testvol_distributed-replicated_brick2 (arbiter)
Brick4: 10.70.41.216:/bricks/brick9/testvol_distributed-replicated_brick3
Brick5: 10.70.42.104:/bricks/brick9/testvol_distributed-replicated_brick4
Brick6: 10.70.43.64:/bricks/brick9/testvol_distributed-replicated_brick5 (arbiter)
Options Reconfigured:
cluster.self-heal-daemon: on
cluster.data-self-heal: off
cluster.metadata-self-heal: off
cluster.entry-self-heal: off
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: off
performance.client-io-threads: off
cluster.server-quorum-ratio: 51


# gluster v heal testvol_distributed-replicated info
Brick 10.70.43.62:/bricks/brick5/testvol_distributed-replicated_brick0
Status: Connected
Number of entries: 0

Brick 10.70.42.103:/bricks/brick5/testvol_distributed-replicated_brick1
Status: Connected
Number of entries: 0

Brick 10.70.41.187:/bricks/brick5/testvol_distributed-replicated_brick2
Status: Connected
Number of entries: 0

Brick 10.70.41.216:/bricks/brick9/testvol_distributed-replicated_brick3
/files/user2_a/dir0_a/dir0_a
/files/user2_a/dir0_a
Status: Connected
Number of entries: 2

Brick 10.70.42.104:/bricks/brick9/testvol_distributed-replicated_brick4
<gfid:d00340d5-f1a1-4e5d-8e78-a8fe7ad93e78>/user2_a/dir0_a/dir0_a
<gfid:d00340d5-f1a1-4e5d-8e78-a8fe7ad93e78>/user2_a/dir0_a
Status: Connected
Number of entries: 2

Brick 10.70.43.64:/bricks/brick9/testvol_distributed-replicated_brick5
/files/user2_a/dir0_a/dir0_a
/files/user2_a/dir0_a
Status: Connected
Number of entries: 2
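
For a quicker view of the pending counts without listing paths, the heal statistics sub-command can also be used (a sketch against the same volume):

# gluster v heal testvol_distributed-replicated statistics heal-count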


Change-logs for directory: dir0_a

[root@dhcp43-64 ~]# getfattr -de hex -m . /bricks/brick9/testvol_distributed-replicated_brick5/files/user2_a/dir0_a/
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick9/testvol_distributed-replicated_brick5/files/user2_a/dir0_a/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.testvol_distributed-replicated-client-4=0x000000000000000e00000001
trusted.gfid=0xd5e24a6225434db68e46b9e261641e9c
trusted.glusterfs.dht=0x00000000000000007fffffffffffffff
trusted.glusterfs.dht.mds=0x00000000

[root@dhcp41-216 ~]# getfattr -de hex -m . /bricks/brick9/testvol_distributed-replicated_brick3/files/user2_a/dir0_a/
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick9/testvol_distributed-replicated_brick3/files/user2_a/dir0_a/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.testvol_distributed-replicated-client-4=0x000000000000000e00000001
trusted.gfid=0xd5e24a6225434db68e46b9e261641e9c
trusted.glusterfs.dht=0x00000000000000007fffffffffffffff
trusted.glusterfs.dht.mds=0x00000000

[root@dhcp42-104 ~]# getfattr -de hex -m . /bricks/brick9/testvol_distributed-replicated_brick4/files/user2_a/dir0_a/
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick9/testvol_distributed-replicated_brick4/files/user2_a/dir0_a/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.gfid=0xd5e24a6225434db68e46b9e261641e9c

Change-logs for directory: dir0_a/dir0_a

[root@dhcp43-64 ~]# getfattr -de hex -m . /bricks/brick9/testvol_distributed-replicated_brick5/files/user2_a/dir0_a/dir0_a/
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick9/testvol_distributed-replicated_brick5/files/user2_a/dir0_a/dir0_a/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.testvol_distributed-replicated-client-4=0x000000000000000e00000001
trusted.gfid=0x197c24ca87ad49668d2773e8af8e8684
trusted.glusterfs.dht=0x0000000000000000000000007ffffffe


[root@dhcp41-216 ~]# getfattr -de hex -m . /bricks/brick9/testvol_distributed-replicated_brick3/files/user2_a/dir0_a/dir0_a/
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick9/testvol_distributed-replicated_brick3/files/user2_a/dir0_a/dir0_a/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.testvol_distributed-replicated-client-4=0x000000000000000600000001
trusted.gfid=0x197c24ca87ad49668d2773e8af8e8684
trusted.glusterfs.dht=0x0000000000000000000000007ffffffe

No entry for the directory dir0_a/dir0_a is present on the back-end brick on 10.70.42.104 (the data brick).
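
For reference, a hedged reading of the pending xattr above, using the standard AFR change-log layout (24 hex digits = three 32-bit counters: data, metadata, entry):

trusted.afr.testvol_distributed-replicated-client-4 = 0x 00000000 0000000e 00000001
                                                         data=0 | metadata=0xe | entry=0x1

client-4 is the fifth brick of the volume (0-based client index), i.e. 10.70.42.104:/bricks/brick9/testvol_distributed-replicated_brick4, so the other two bricks of the second replica set are blaming that brick.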

Comment 3 Ravishankar N 2018-12-13 05:01:57 UTC
Hi Anees,
The afr pending xattrs indicate that entry self-heal is pending on '10.70.42.104:/bricks/brick9/testvol_distributed-replicated_brick4' but is being hindered. Can you check whether this is a duplicate of BZ 1640148, where entry heal cannot proceed because the gfid symlink for the directory is missing inside .glusterfs? If yes, we can close this as a duplicate.
-Ravi
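
(For reference, one hedged way to check for such a symlink on the affected brick: the two-level gfid fan-out under .glusterfs is the standard back-end layout, and the gfid below is dir0_a's gfid taken from the getfattr output above; for a directory this path should be a symlink into the parent directory's gfid handle.)

[root@dhcp42-104 ~]# ls -l /bricks/brick9/testvol_distributed-replicated_brick4/.glusterfs/d5/e2/d5e24a62-2543-4db6-8e46-b9e261641e9c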

Comment 4 Anees Patel 2018-12-13 09:57:17 UTC
Closing as a duplicate

*** This bug has been marked as a duplicate of bug 1640148 ***

