Bug 1332194 - gluster volume heal info throwing duplicate file or gfid entries
Summary: gluster volume heal info throwing duplicate file or gfid entries
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: replicate
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Pranith Kumar K
QA Contact: Anees Patel
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-05-02 13:10 UTC by Nag Pavan Chilakam
Modified: 2019-07-17 04:39 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-02-14 08:59:01 UTC
Target Upstream Version:



Description Nag Pavan Chilakam 2016-05-02 13:10:01 UTC
Description of problem:
=====================
Heal info sometimes lists duplicate entries for the same file or gfid:

[root@dhcp35-191 glusterfs]# gluster v heal tinker info
Brick 10.70.35.191:/rhs/brick1/tinker
/newdata - Possibly undergoing heal

/newdata - Possibly undergoing heal

Number of entries: 2

Brick 10.70.35.27:/rhs/brick1/tinker
/newdata - Possibly undergoing heal

Number of entries: 1

Brick 10.70.35.191:/rhs/brick2/tinker
Status: Transport endpoint is not connected

[root@dhcp35-191 glusterfs]# gluster v status tinker
Status of volume: tinker
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.191:/rhs/brick1/tinker       49187     0          Y       5597 
Brick 10.70.35.27:/rhs/brick1/tinker        49166     0          Y       29082
Brick 10.70.35.191:/rhs/brick2/tinker       N/A       N/A        N       N/A  
NFS Server on localhost                     2049      0          Y       5617 
Self-heal Daemon on localhost               N/A       N/A        Y       5625 
NFS Server on 10.70.35.44                   2049      0          Y       25745
Self-heal Daemon on 10.70.35.44             N/A       N/A        Y       25753
NFS Server on 10.70.35.64                   2049      0          Y       1523 
Self-heal Daemon on 10.70.35.64             N/A       N/A        Y       1531 
NFS Server on 10.70.35.98                   2049      0          Y       31548
Self-heal Daemon on 10.70.35.98             N/A       N/A        Y       31556
NFS Server on 10.70.35.27                   2049      0          Y       30478
Self-heal Daemon on 10.70.35.27             N/A       N/A        Y       30487
NFS Server on 10.70.35.114                  2049      0          Y       30209
Self-heal Daemon on 10.70.35.114            N/A       N/A        Y       30217
 
Task Status of Volume tinker
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@dhcp35-191 glusterfs]# gluster v info tinker
 
Volume Name: tinker
Type: Replicate
Volume ID: 5f00ff1a-410f-4a9b-82a0-91d5cfae89c3
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.35.191:/rhs/brick1/tinker
Brick2: 10.70.35.27:/rhs/brick1/tinker
Brick3: 10.70.35.191:/rhs/brick2/tinker
Options Reconfigured:
performance.readdir-ahead: on


########another instance###################
[root@dhcp35-191 glusterfs]# gluster v heal tinker
Launching heal operation to perform index self heal on volume tinker has been unsuccessful on bricks that are down. Please check if all brick processes are running.
[root@dhcp35-191 glusterfs]# gluster v heal tinker info;
Brick 10.70.35.191:/rhs/brick1/tinker
Status: Transport endpoint is not connected

Brick 10.70.35.27:/rhs/brick1/tinker
/newdata - Possibly undergoing heal

/newdata - Possibly undergoing heal

Number of entries: 2

Brick 10.70.35.191:/rhs/brick2/tinker
<gfid:d08eb8a1-7ae6-4994-bc79-c14830b5d7d8> - Possibly undergoing heal

<gfid:d08eb8a1-7ae6-4994-bc79-c14830b5d7d8> - Possibly undergoing heal

Number of entries: 2
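
The duplicates in the two outputs above can also be spotted mechanically. This is a minimal sketch, assuming the heal info output keeps the layout shown here (a "Brick" header line, entry lines, then "Status:" or "Number of entries:"). The function name heal_info_duplicates is made up for illustration and is not part of gluster.

```shell
# Print every heal-info entry that is listed more than once for the same brick.
# Reads heal info output on stdin; prints "<brick>: <entry line>" per duplicate.
heal_info_duplicates() {
    awk '
        /^Brick /                      { brick = $2; next }  # start of a new brick section
        /^(Status|Number of entries):/ { next }              # per-brick summary lines
        NF == 0                        { next }              # blank separators
        # Report an entry once, the second time it is seen for this brick.
        { if (++seen[brick SUBSEP $0] == 2) print brick ": " $0 }
    '
}
```

Usage, for the volume in this report: gluster volume heal tinker info | heal_info_duplicates
An empty result means no entry line was repeated within a brick section.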



Version-Release number of selected component (if applicable):
==========
3.7.9-2



Steps to Reproduce:
1. Create a 1x3 volume (in my case the 1st and 3rd bricks are hosted on the same node).
2. Mount the volume over FUSE and write a 1 GB file.
3. Keep writing to the file and bring down the first brick.
4. Keep the brick down until at least 2-3 GB of data is pending heal.
5. With writes still in progress, force start the volume to bring the brick back up.
6. Kill the 3rd brick while I/O continues.
7. Issue a heal command to heal brick1, since brick2 is still available as a source.
8. Issue a heal info command.
The same file is listed twice in the heal info output.

This behavior stops after some time.
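
The steps above can be sketched as a script. This is only an illustration under stated assumptions: the mount point /mnt/tinker, the dd write size, and the sleep durations are guesses and not taken from this report, and the brick PIDs must be read by hand from the Pid column of "gluster v status tinker". By default the script only prints the commands (DRY_RUN=1); nothing is executed unless DRY_RUN=0 is set.

```shell
VOL=tinker
MNT=/mnt/tinker                       # hypothetical FUSE mount point
B1=10.70.35.191:/rhs/brick1/tinker
B2=10.70.35.27:/rhs/brick1/tinker
B3=10.70.35.191:/rhs/brick2/tinker

# Print the command in dry-run mode (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# Step 1 and the mount from step 2. "force" is passed because two bricks
# of the replica set live on the same node.
setup_volume() {
    run gluster volume create "$VOL" replica 3 "$B1" "$B2" "$B3" force
    run gluster volume start "$VOL"
    run mkdir -p "$MNT"
    run mount -t glusterfs "${B1%%:*}:/$VOL" "$MNT"
}

# Steps 2-8. $1 = PID of brick 1, $2 = PID of brick 3.
trigger_duplicates() {
    run dd if=/dev/zero of="$MNT/newdata" bs=1M count=10240 &   # continuous writer
    writer=$!
    run sleep 10                          # assumption: about 1 GB written by now
    run kill "$1"                         # step 3: bring down the first brick
    run sleep 30                          # assumption: 2-3 GB now pending heal
    run gluster volume start "$VOL" force # step 5: bring the brick back up
    run kill "$2"                         # step 6: kill the 3rd brick
    run gluster volume heal "$VOL"        # step 7
    run gluster volume heal "$VOL" info   # step 8
    wait "$writer"
}
```

Typical use would be setup_volume first, then trigger_duplicates with the two PIDs once the bricks are online. The timing is the fragile part, so the sleeps will likely need tuning per setup.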



#############################################################

File and brick xattrs  info

##while duplicate entries are getting listed########
[root@dhcp35-191 ~]#  ll /rhs/brick*/tinker/;du -sh /rhs/brick*/tinker/*
/rhs/brick1/tinker/:
total 6430956
-rw-r--r--. 2 root root 8841353216 May  2 17:52 newdata

/rhs/brick2/tinker/:
total 9771904
-rw-r--r--. 2 root root 8841353216 May  2 17:52 newdata
6.2G	/rhs/brick1/tinker/newdata
9.4G	/rhs/brick2/tinker/newdata
[root@dhcp35-191 ~]#  ll /rhs/brick*/tinker/;du -sh /rhs/brick*/tinker/*;getfattr -d -m . -e hex /rhs/brick*/tinker/
/rhs/brick1/tinker/:
total 6511580
-rw-r--r--. 2 root root 9028352000 May  2 17:53 newdata

/rhs/brick2/tinker/:
total 9771904
-rw-r--r--. 2 root root 8914724864 May  2 17:52 newdata
6.3G	/rhs/brick1/tinker/newdata
9.4G	/rhs/brick2/tinker/newdata
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/tinker/
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x5f00ff1a410f4a9b82a091d5cfae89c3

# file: rhs/brick2/tinker/
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x5f00ff1a410f4a9b82a091d5cfae89c3

[root@dhcp35-191 ~]#  ll /rhs/brick*/tinker/;du -sh /rhs/brick*/tinker/*;getfattr -d -m . -e hex /rhs/brick*/tinker/newdata
/rhs/brick1/tinker/:
total 6511580
-rw-r--r--. 2 root root 9028352000 May  2 17:53 newdata

/rhs/brick2/tinker/:
total 9771904
-rw-r--r--. 2 root root 8914724864 May  2 17:52 newdata
6.3G	/rhs/brick1/tinker/newdata
9.4G	/rhs/brick2/tinker/newdata
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/tinker/newdata
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000010000000000000000
trusted.afr.tinker-client-2=0x000002710000000000000000
trusted.bit-rot.version=0x05000000000000005727467200057c36
trusted.gfid=0xd08eb8a17ae64994bc79c14830b5d7d8

# file: rhs/brick2/tinker/newdata
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.tinker-client-0=0x0000001c0000000000000000
trusted.bit-rot.version=0x030000000000000057273cbd0003f1da
trusted.gfid=0xd08eb8a17ae64994bc79c14830b5d7d8

[root@dhcp35-191 ~]#  ll /rhs/brick*/tinker/;du -sh /rhs/brick*/tinker/*;getfattr -d -m . -e hex /rhs/brick*/tinker/newdata; ll /rhs/brick*/tinker/.glusterfs/
/rhs/brick1/tinker/:
total 6511580
-rw-r--r--. 2 root root 9028352000 May  2 17:53 newdata

/rhs/brick2/tinker/:
total 9771904
-rw-r--r--. 2 root root 8914724864 May  2 17:52 newdata
6.3G	/rhs/brick1/tinker/newdata
9.4G	/rhs/brick2/tinker/newdata
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/tinker/newdata
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000010000000000000000
trusted.afr.tinker-client-2=0x000002710000000000000000
trusted.bit-rot.version=0x05000000000000005727467200057c36
trusted.gfid=0xd08eb8a17ae64994bc79c14830b5d7d8

# file: rhs/brick2/tinker/newdata
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.tinker-client-0=0x0000001c0000000000000000
trusted.bit-rot.version=0x030000000000000057273cbd0003f1da
trusted.gfid=0xd08eb8a17ae64994bc79c14830b5d7d8

/rhs/brick1/tinker/.glusterfs/:
total 64
drwx------. 3 root root    15 May  2 16:57 00
drw-------. 4 root root    30 May  2 16:57 changelogs
drwx------. 3 root root    15 May  2 17:01 d0
-rw-r--r--. 1 root root    19 May  2 17:53 health_check
drw-------. 4 root root    32 May  2 16:57 indices
drwxr-xr-x. 2 root root     6 May  2 17:52 landfill
drw-------. 2 root root    54 May  2 16:57 quanrantine
-rw-r--r--. 1 root root  4096 May  2 16:57 tinker.db
-rw-r--r--. 1 root root 32768 May  2 16:57 tinker.db-shm
-rw-r--r--. 1 root root 20632 May  2 16:57 tinker.db-wal
drw-------. 2 root root     6 May  2 17:52 unlink

/rhs/brick2/tinker/.glusterfs/:
total 64
drwx------. 3 root root    15 May  2 16:57 00
drw-------. 4 root root    30 May  2 16:57 changelogs
drwx------. 3 root root    15 May  2 17:01 d0
-rw-r--r--. 1 root root    19 May  2 17:52 health_check
drw-------. 4 root root    32 May  2 16:57 indices
drwxr-xr-x. 2 root root     6 May  2 17:10 landfill
drw-------. 2 root root    54 May  2 16:57 quanrantine
-rw-r--r--. 1 root root  4096 May  2 16:57 tinker.db
-rw-r--r--. 1 root root 32768 May  2 16:57 tinker.db-shm
-rw-r--r--. 1 root root 20632 May  2 16:57 tinker.db-wal
drw-------. 2 root root     6 May  2 17:10 unlink
[root@dhcp35-191 ~]#  ll /rhs/brick*/tinker/;du -sh /rhs/brick*/tinker/*;getfattr -d -m . -e hex /rhs/brick*/tinker/newdata; ll /rhs/brick*/tinker/.glusterfs/indices
/rhs/brick1/tinker/:
total 6511580
-rw-r--r--. 2 root root 9028352000 May  2 17:53 newdata

/rhs/brick2/tinker/:
total 9771904
-rw-r--r--. 2 root root 8914724864 May  2 17:52 newdata
6.3G	/rhs/brick1/tinker/newdata
9.4G	/rhs/brick2/tinker/newdata
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/tinker/newdata
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000010000000000000000
trusted.afr.tinker-client-2=0x000002710000000000000000
trusted.bit-rot.version=0x05000000000000005727467200057c36
trusted.gfid=0xd08eb8a17ae64994bc79c14830b5d7d8

# file: rhs/brick2/tinker/newdata
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.tinker-client-0=0x0000001c0000000000000000
trusted.bit-rot.version=0x030000000000000057273cbd0003f1da
trusted.gfid=0xd08eb8a17ae64994bc79c14830b5d7d8

/rhs/brick1/tinker/.glusterfs/indices:
total 0
drw-------. 2 root root 147 May  2 17:52 dirty
drw-------. 2 root root 100 May  2 17:52 xattrop

/rhs/brick2/tinker/.glusterfs/indices:
total 0
drw-------. 2 root root  55 May  2 17:52 dirty
drw-------. 2 root root 100 May  2 17:52 xattrop
[root@dhcp35-191 ~]#  ll /rhs/brick*/tinker/;du -sh /rhs/brick*/tinker/*;getfattr -d -m . -e hex /rhs/brick*/tinker/newdata; ll /rhs/brick*/tinker/.glusterfs/indices/dirty
/rhs/brick1/tinker/:
total 6511580
-rw-r--r--. 2 root root 9028352000 May  2 17:53 newdata

/rhs/brick2/tinker/:
total 9771904
-rw-r--r--. 2 root root 8914724864 May  2 17:52 newdata
6.3G	/rhs/brick1/tinker/newdata
9.4G	/rhs/brick2/tinker/newdata
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/tinker/newdata
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000010000000000000000
trusted.afr.tinker-client-2=0x000002710000000000000000
trusted.bit-rot.version=0x05000000000000005727467200057c36
trusted.gfid=0xd08eb8a17ae64994bc79c14830b5d7d8

# file: rhs/brick2/tinker/newdata
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.tinker-client-0=0x0000001c0000000000000000
trusted.bit-rot.version=0x030000000000000057273cbd0003f1da
trusted.gfid=0xd08eb8a17ae64994bc79c14830b5d7d8

/rhs/brick1/tinker/.glusterfs/indices/dirty:
total 0
----------. 2 root root 0 May  2 17:31 d08eb8a1-7ae6-4994-bc79-c14830b5d7d8
----------. 2 root root 0 May  2 17:31 dirty-51e107a3-0073-4167-ad39-a31a6b21ec68
----------. 1 root root 0 May  2 17:52 dirty-b35a1d07-0707-48b4-ba32-41c5c3f94f87

/rhs/brick2/tinker/.glusterfs/indices/dirty:
total 0
----------. 1 root root 0 May  2 17:13 dirty-298b589f-529e-40f3-bf08-898cf246941d
[root@dhcp35-191 ~]#  ll /rhs/brick*/tinker/;du -sh /rhs/brick*/tinker/*;getfattr -d -m . -e hex /rhs/brick*/tinker/newdata; ll /rhs/brick*/tinker/.glusterfs/indices/xat*
/rhs/brick1/tinker/:
total 6511580
-rw-r--r--. 2 root root 9028352000 May  2 17:53 newdata

/rhs/brick2/tinker/:
total 9771904
-rw-r--r--. 2 root root 8914724864 May  2 17:52 newdata
6.3G	/rhs/brick1/tinker/newdata
9.4G	/rhs/brick2/tinker/newdata
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/tinker/newdata
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000010000000000000000
trusted.afr.tinker-client-2=0x000002710000000000000000
trusted.bit-rot.version=0x05000000000000005727467200057c36
trusted.gfid=0xd08eb8a17ae64994bc79c14830b5d7d8

# file: rhs/brick2/tinker/newdata
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.tinker-client-0=0x0000001c0000000000000000
trusted.bit-rot.version=0x030000000000000057273cbd0003f1da
trusted.gfid=0xd08eb8a17ae64994bc79c14830b5d7d8

/rhs/brick1/tinker/.glusterfs/indices/xattrop:
total 0
----------. 2 root root 0 May  2 17:52 d08eb8a1-7ae6-4994-bc79-c14830b5d7d8
----------. 2 root root 0 May  2 17:52 xattrop-b35a1d07-0707-48b4-ba32-41c5c3f94f87

/rhs/brick2/tinker/.glusterfs/indices/xattrop:
total 0
----------. 2 root root 0 May  2 17:10 d08eb8a1-7ae6-4994-bc79-c14830b5d7d8
----------. 2 root root 0 May  2 17:10 xattrop-298b589f-529e-40f3-bf08-898cf246941d

#########no more duplicate entries seen################

[root@dhcp35-191 ~]#  ll /rhs/brick*/tinker/;du -sh /rhs/brick*/tinker/*;getfattr -d -m . -e hex /rhs/brick*/tinker/newdata; ll /rhs/brick*/tinker/.glusterfs/indices/xat*;ll /rhs/brick*/tinker/.glusterfs/indices/dirt*;getfattr -d -m . -e hex /rhs/brick*/tinker
/rhs/brick1/tinker/:
total 8816752
-rw-r--r--. 2 root root 9028352000 May  2 17:53 newdata

/rhs/brick2/tinker/:
total 8705788
-rw-r--r--. 2 root root 8914724864 May  2 17:52 newdata
8.5G	/rhs/brick1/tinker/newdata
8.4G	/rhs/brick2/tinker/newdata
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/tinker/newdata
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.tinker-client-2=0x000002720000000000000000
trusted.bit-rot.version=0x05000000000000005727467200057c36
trusted.gfid=0xd08eb8a17ae64994bc79c14830b5d7d8

# file: rhs/brick2/tinker/newdata
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.tinker-client-0=0x0000001c0000000000000000
trusted.bit-rot.version=0x030000000000000057273cbd0003f1da
trusted.gfid=0xd08eb8a17ae64994bc79c14830b5d7d8

/rhs/brick1/tinker/.glusterfs/indices/xattrop:
total 0
----------. 2 root root 0 May  2 17:52 d08eb8a1-7ae6-4994-bc79-c14830b5d7d8
----------. 2 root root 0 May  2 17:52 xattrop-b35a1d07-0707-48b4-ba32-41c5c3f94f87

/rhs/brick2/tinker/.glusterfs/indices/xattrop:
total 0
----------. 2 root root 0 May  2 17:10 d08eb8a1-7ae6-4994-bc79-c14830b5d7d8
----------. 2 root root 0 May  2 17:10 xattrop-298b589f-529e-40f3-bf08-898cf246941d
/rhs/brick1/tinker/.glusterfs/indices/dirty:
total 0
----------. 1 root root 0 May  2 17:52 dirty-b35a1d07-0707-48b4-ba32-41c5c3f94f87

/rhs/brick2/tinker/.glusterfs/indices/dirty:
total 0
----------. 1 root root 0 May  2 17:13 dirty-298b589f-529e-40f3-bf08-898cf246941d
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/tinker
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x5f00ff1a410f4a9b82a091d5cfae89c3

# file: rhs/brick2/tinker
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x5f00ff1a410f4a9b82a091d5cfae89c3
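
For reading the getfattr dumps above: each trusted.afr.* value is 12 bytes holding three 32-bit big-endian counters, in the order data, metadata, entry. A small helper to decode one value might look like the sketch below. The function name decode_afr_xattr is hypothetical and not part of gluster.

```shell
# Decode a 12-byte AFR changelog xattr given as hex (with or without "0x")
# into its data / metadata / entry pending counters.
decode_afr_xattr() {
    hex=${1#0x}
    if [ "${#hex}" -ne 24 ]; then
        echo "expected 24 hex digits, got ${#hex}" >&2
        return 1
    fi
    # Split into three 8-digit (32-bit) fields.
    data=$(printf '%s' "$hex" | cut -c1-8)
    meta=$(printf '%s' "$hex" | cut -c9-16)
    entry=$(printf '%s' "$hex" | cut -c17-24)
    printf 'data=%d metadata=%d entry=%d\n' "0x$data" "0x$meta" "0x$entry"
}
```

For example, decode_afr_xattr 0x000002710000000000000000 (the trusted.afr.tinker-client-2 value on brick1) prints data=625 metadata=0 entry=0, and the trusted.afr.tinker-client-0 value 0x0000001c0000000000000000 on brick2 decodes to data=28. A non-zero counter means that brick records pending operations against the named client.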

Comment 2 Nag Pavan Chilakam 2016-05-02 13:21:43 UTC
sosreports@nchilaka@rhsqe-repo bug.1332194]$ pwd
/home/repo/sosreports/nchilaka/bug.1332194

Comment 3 Pranith Kumar K 2016-05-03 05:29:04 UTC
Anuradha,
     Could you check whether this is caused by the known issue where an index entry exists in both the dirty and xattrop indices? I have not looked at the logs.

Pranith
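
One way to check the condition described in this comment is to list the gfids that appear in both index directories of a brick. This is a minimal sketch, assuming the .glusterfs/indices layout shown in the description, where the base files are named dirty-<uuid> and xattrop-<uuid>. The function name gfids_in_both_indices is made up for illustration.

```shell
# List gfid entries present in both indices/dirty and indices/xattrop
# of the brick rooted at $1.
gfids_in_both_indices() {
    indices=$1/.glusterfs/indices
    for f in "$indices"/dirty/*; do
        name=${f##*/}
        # Skip the per-directory base file and an unmatched glob.
        case $name in dirty-*|'*') continue ;; esac
        if [ -e "$indices/xattrop/$name" ]; then
            echo "$name"
        fi
    done
}
```

Usage: gfids_in_both_indices /rhs/brick1/tinker
In the listings from the description, brick1 had d08eb8a1-7ae6-4994-bc79-c14830b5d7d8 in both directories while duplicates were being shown, and only in xattrop once they disappeared. That is consistent with this hypothesis but does not confirm it.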

Comment 5 Ravishankar N 2017-11-28 10:28:54 UTC
Hi Vijay, can you check whether this issue can be reproduced?

