Bug 2073919 - Pending heals in gluster arbiter volume and gfid2path not returning output for pending entries
Summary: Pending heals in gluster arbiter volume and gfid2path not returning output fo...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: arbiter
Version: rhgs-3.5
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Karthik U S
QA Contact: Vinayak Papnoi
URL:
Whiteboard:
Depends On:
Blocks: 2015551
 
Reported: 2022-04-11 06:39 UTC by Vinayak Papnoi
Modified: 2023-09-18 04:35 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-10-12 09:48:41 UTC
Embargoed:



Description Vinayak Papnoi 2022-04-11 06:39:52 UTC
Before you record your issue, ensure you are using the latest version of Gluster.


Provide version-Release number of selected component (if applicable):
---------------------------------------------------------------------

# rpm -qa | grep glusterfs
glusterfs-libs-6.0-62.el7rhgs.x86_64
glusterfs-6.0-62.el7rhgs.x86_64
glusterfs-fuse-6.0-62.el7rhgs.x86_64
glusterfs-cli-6.0-62.el7rhgs.x86_64
glusterfs-geo-replication-6.0-62.el7rhgs.x86_64
glusterfs-client-xlators-6.0-62.el7rhgs.x86_64
glusterfs-api-6.0-62.el7rhgs.x86_64
glusterfs-server-6.0-62.el7rhgs.x86_64


 
Have you searched the Bugzilla archives for the same or similar issues?



Did you run an SoS report with the Insights tool?



Have you discovered any workarounds?
If not, read the troubleshooting documentation to help solve your issue:
https://mojo.redhat.com/groups/gss-gluster (Gluster features and their troubleshooting) and
https://access.redhat.com/articles/1365073 (specific debug data that needs to be collected for GlusterFS troubleshooting)



Please provide the below Mandatory Information:
-----------------------------------------------

1 - gluster v <volname> info
# gluster v info
 
Volume Name: nas
Type: Distributed-Replicate
Volume ID: 524e80d9-a063-46ed-9446-56b1e47356c3
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: birdman.lab.eng.blr.redhat.com:/brick/brick1/nas-b1
Brick2: tettnang.lab.eng.blr.redhat.com:/brick/brick1/nas-b2
Brick3: transformers.lab.eng.blr.redhat.com:/brick/brick1/nas-b3 (arbiter)
Brick4: birdman.lab.eng.blr.redhat.com:/brick/brick2/nas-b4
Brick5: tettnang.lab.eng.blr.redhat.com:/brick/brick2/nas-b5
Brick6: transformers.lab.eng.blr.redhat.com:/brick/brick2/nas-b6 (arbiter)
Brick7: birdman.lab.eng.blr.redhat.com:/brick/brick3/nas-b7
Brick8: tettnang.lab.eng.blr.redhat.com:/brick/brick3/nas-b8
Brick9: transformers.lab.eng.blr.redhat.com:/brick/brick3/nas-b9 (arbiter)
Options Reconfigured:
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off


2 - gluster v <volname> heal info

# gluster v heal nas info
Brick birdman.lab.eng.blr.redhat.com:/brick/brick1/nas-b1
Status: Connected
Number of entries: 0

Brick tettnang.lab.eng.blr.redhat.com:/brick/brick1/nas-b2
<gfid:808911c0-e1fa-43c0-a566-1171c49b715b> 
<gfid:129bc49f-752a-4969-ad28-7724ba00f243> 
<gfid:941cdcce-9d78-4813-910e-56925b6141e0> 
<gfid:7c3e11d5-b6ea-445d-8a39-6d83bb78b26a> 
<gfid:f240221e-e296-4354-9973-5b9b7de1b400> 
<gfid:135e498c-73df-417c-8345-0adc8dc02dc9> 
Status: Connected
Number of entries: 6

Brick transformers.lab.eng.blr.redhat.com:/brick/brick1/nas-b3
<gfid:129bc49f-752a-4969-ad28-7724ba00f243> 
<gfid:808911c0-e1fa-43c0-a566-1171c49b715b> 
<gfid:941cdcce-9d78-4813-910e-56925b6141e0> 
<gfid:7c3e11d5-b6ea-445d-8a39-6d83bb78b26a> 
<gfid:f240221e-e296-4354-9973-5b9b7de1b400> 
<gfid:135e498c-73df-417c-8345-0adc8dc02dc9> 
Status: Connected
Number of entries: 6

Brick birdman.lab.eng.blr.redhat.com:/brick/brick2/nas-b4
Status: Connected
Number of entries: 0

Brick tettnang.lab.eng.blr.redhat.com:/brick/brick2/nas-b5
Status: Connected
Number of entries: 0

Brick transformers.lab.eng.blr.redhat.com:/brick/brick2/nas-b6
Status: Connected
Number of entries: 0

Brick birdman.lab.eng.blr.redhat.com:/brick/brick3/nas-b7
Status: Connected
Number of entries: 0

Brick tettnang.lab.eng.blr.redhat.com:/brick/brick3/nas-b8
<gfid:37e0451d-6e94-46df-92f7-744e18051891> 
<gfid:1a974215-7e35-4703-b020-8df042989274> 
<gfid:6435c64f-8d87-4217-ac4a-c5bb33f72b33> 
<gfid:35d96a31-1bbc-42e0-9990-c516e9fe3e97> 
<gfid:cdfaef44-9468-4243-80b1-415f2b8ef9b2> 
<gfid:17ca6e92-9f94-42c9-9da7-d6862fe72f30> 
<gfid:f34c5377-b340-44e8-865a-6ad56eff1e23> 
<gfid:135e498c-73df-417c-8345-0adc8dc02dc9> 
Status: Connected
Number of entries: 8

Brick transformers.lab.eng.blr.redhat.com:/brick/brick3/nas-b9
<gfid:37e0451d-6e94-46df-92f7-744e18051891> 
<gfid:1a974215-7e35-4703-b020-8df042989274> 
<gfid:6435c64f-8d87-4217-ac4a-c5bb33f72b33> 
<gfid:35d96a31-1bbc-42e0-9990-c516e9fe3e97> 
<gfid:cdfaef44-9468-4243-80b1-415f2b8ef9b2> 
<gfid:17ca6e92-9f94-42c9-9da7-d6862fe72f30> 
<gfid:f34c5377-b340-44e8-865a-6ad56eff1e23> 
<gfid:135e498c-73df-417c-8345-0adc8dc02dc9> 
Status: Connected
Number of entries: 8
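
For reference, the pending AFR xattrs of any gfid listed above can be read straight off the backend bricks. A minimal sketch, assuming the standard .glusterfs/<xx>/<yy>/<gfid> backend layout and using one of the gfids from the nas-b2/nas-b3 list; run on the respective brick servers:

GFID=808911c0-e1fa-43c0-a566-1171c49b715b
# on tettnang (data brick):
getfattr -d -m . -e hex /brick/brick1/nas-b2/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID
# on transformers (arbiter brick):
getfattr -d -m . -e hex /brick/brick1/nas-b3/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID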


3 - gluster v <volname> status

# gluster v status nas
Status of volume: nas
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick birdman.lab.eng.blr.redhat.com:/brick
/brick1/nas-b1                              49152     0          Y       11911
Brick tettnang.lab.eng.blr.redhat.com:/bric
k/brick1/nas-b2                             49152     0          Y       31259
Brick transformers.lab.eng.blr.redhat.com:/
brick/brick1/nas-b3                         49152     0          Y       41997
Brick birdman.lab.eng.blr.redhat.com:/brick
/brick2/nas-b4                              49153     0          Y       11912
Brick tettnang.lab.eng.blr.redhat.com:/bric
k/brick2/nas-b5                             49153     0          Y       31274
Brick transformers.lab.eng.blr.redhat.com:/
brick/brick2/nas-b6                         49153     0          Y       42012
Brick birdman.lab.eng.blr.redhat.com:/brick
/brick3/nas-b7                              49154     0          Y       11923
Brick tettnang.lab.eng.blr.redhat.com:/bric
k/brick3/nas-b8                             49154     0          Y       31289
Brick transformers.lab.eng.blr.redhat.com:/
brick/brick3/nas-b9                         49154     0          Y       42027
Self-heal Daemon on localhost               N/A       N/A        Y       11937
Self-heal Daemon on transformers.lab.eng.bl
r.redhat.com                                N/A       N/A        Y       42043
Self-heal Daemon on tettnang.lab.eng.blr.re
dhat.com                                    N/A       N/A        Y       3816 
 
Task Status of Volume nas
------------------------------------------------------------------------------
There are no active volume tasks
 

4 - Fuse Mount

# df -hT
Filesystem                          Type            Size  Used Avail Use% Mounted on
devtmpfs                            devtmpfs        7.8G     0  7.8G   0% /dev
tmpfs                               tmpfs           7.8G     0  7.8G   0% /dev/shm
tmpfs                               tmpfs           7.8G  9.6M  7.8G   1% /run
tmpfs                               tmpfs           7.8G     0  7.8G   0% /sys/fs/cgroup
/dev/mapper/rhel_rhs--client21-root xfs              50G  6.1G   44G  13% /
/dev/sda1                           xfs            1014M  240M  775M  24% /boot
/dev/mapper/rhel_rhs--client21-home xfs             1.8T   33M  1.8T   1% /home
tmpfs                               tmpfs           1.6G     0  1.6G   0% /run/user/0
birdman.lab.eng.blr.redhat.com:/nas fuse.glusterfs  1.2T  485G  709G  41% /mnt/nas




Describe the issue:(please be detailed as possible and provide log snippets)
[Provide TimeStamp when the issue is seen]
-------------------

While performing a node reboot scenario (with I/O running) on the above-mentioned volume, multiple files are left pending heal.

The gfid2path.sh script is also not returning any output for the above files and appears to be hung.

Running gfid2path.sh for a healthy file returns the expected output:

# cat newgfid.txt | ./gfid2path.sh /brick/brick2/nas-b6/
0688e672-be12-48c3-aa54-b094214d7e9a /dir.1/linux-5.4.180/kernel/relay.c
  File: ‘/brick/brick2/nas-b6///dir.1/linux-5.4.180/kernel/relay.c’
  Size: 0             Blocks: 0          IO Block: 4096   regular empty file
Device: fd2ah/64810d    Inode: 537295365   Links: 2
Access: (0664/-rw-rw-r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:glusterd_brick_t:s0
Access: 2022-04-07 07:17:55.744280014 +0530
Modify: 2022-02-16 17:22:54.000000000 +0530
Change: 2022-04-07 07:17:54.461976963 +0530
 Birth: -
getfattr: Removing leading '/' from absolute path names
# file: brick/brick2/nas-b6///dir.1/linux-5.4.180/kernel/relay.c
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x0688e672be1248c3aa54b094214d7e9a
trusted.gfid2path.63c34589695e7154=0x64613134613161382d643736322d343565312d623762322d6662336133353630313131352f72656c61792e63
trusted.glusterfs.dht=0x00000000000000000000000055555554

-rw-rw-r--. 2 root root 0 Feb 16 17:22 /brick/brick2/nas-b6///dir.1/linux-5.4.180/kernel/relay.c
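
Since gfid2path.sh hangs for the pending entries, a fallback sketch for resolving one of those gfids to a brick path, assuming the standard .glusterfs backend layout and that the entry still has a hard link under the brick (the gfid below is one of the pending entries on nas-b8):

# run on tettnang for /brick/brick3/nas-b8
GFID=37e0451d-6e94-46df-92f7-744e18051891
BRICK=/brick/brick3/nas-b8

# dump the gfid2path and AFR xattrs straight from the backend inode
getfattr -d -m . -e hex "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"

# for regular files, locate the user-visible path via the shared inode
find "$BRICK" -path "$BRICK/.glusterfs" -prune -o \
     -samefile "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID" -print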




Is this issue reproducible? If yes, share more details:
1/1

Steps to Reproduce:
-------------------

1. Create a 5-node cluster with RHEL 7.9 + RHGS 3.5.7
2. Create a 3 x (2 + 1) arbiter volume
3. Mount the volume on 2 clients using node1 and start I/O (kernel untar, dd, rm, ls -lRt, renames)
4. Reboot node1 and trigger a heal
5. Check the heal info (a command-level sketch of these steps follows below)
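
A command-level sketch of these steps (hostnames and the exact I/O mix are placeholders; the brick layout matches the volume info output above):

# step 2: 3 x (2 + 1) arbiter volume - every third brick is the arbiter
gluster volume create nas replica 3 arbiter 1 \
    node1:/brick/brick1/nas-b1 node2:/brick/brick1/nas-b2 node3:/brick/brick1/nas-b3 \
    node1:/brick/brick2/nas-b4 node2:/brick/brick2/nas-b5 node3:/brick/brick2/nas-b6 \
    node1:/brick/brick3/nas-b7 node2:/brick/brick3/nas-b8 node3:/brick/brick3/nas-b9
gluster volume start nas

# step 3: mount on the clients through node1 and start the I/O
mount -t glusterfs node1:/nas /mnt/nas

# steps 4-5: after rebooting node1, trigger an index heal and check the result
gluster volume heal nas
gluster volume heal nas info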


Actual results:
---------------

Heal info shows pending heals as listed above.


Expected results:
-----------------

Heal info must show 0 files pending heal.
 

Any Additional info:
--------------------

# cat /mnt/nas/.meta/graphs/active/nas-client-*/private | egrep -i 'connected'
connected = 1
connected = 1
connected = 1
connected = 1
connected = 1
connected = 1
connected = 1
connected = 1
connected = 1
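
With all clients reporting connected = 1, the standard CLI follow-ups for a stuck heal are sketched below (standard gluster heal sub-commands; not claiming they clear the entries in this case):

# per-brick count of entries pending heal
gluster volume heal nas info summary
# trigger an index heal; fall back to a full heal if the index heal makes no progress
gluster volume heal nas
gluster volume heal nas full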

Comment 27 Nicole Yancey 2022-05-17 14:39:01 UTC
The bug will remain open until https://bugzilla.redhat.com/show_bug.cgi?id=2015551 is discussed.

Comment 29 Red Hat Bugzilla 2023-09-18 04:35:11 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

