+++ This bug was initially created as a clone of Bug #1749322 +++
+++ This bug was initially created as a clone of Bug #1740968 +++

Description of problem:

[root@mn-0:/home/robot]
# gluster v heal services info
Brick mn-0.local:/mnt/bricks/services/brick
/db/upgrade
Status: Connected
Number of entries: 1

Brick mn-1.local:/mnt/bricks/services/brick
/db/upgrade
Status: Connected
Number of entries: 1

Brick dbm-0.local:/mnt/bricks/services/brick
Status: Connected
Number of entries: 0

These entries keep showing up in the output of "gluster v heal info". According to the glustershd log, each time glustershd processes this entry, nothing is actually done. From gdb, shd cannot determine healed_sinks, so no repair happens in any heal round.

[Env info]
Three bricks: mn-0, mn-1, dbm-0

[root@mn-1:/mnt/bricks/services/brick/db/upgrade]
# gluster v info services
Volume Name: services
Type: Replicate
Volume ID: 062748ce-0876-46f6-9936-d9ff3a2b110a
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: mn-0.local:/mnt/bricks/services/brick
Brick2: mn-1.local:/mnt/bricks/services/brick
Brick3: dbm-0.local:/mnt/bricks/services/brick
Options Reconfigured:
cluster.heal-timeout: 60
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
cluster.server-quorum-type: none
cluster.quorum-type: auto
cluster.quorum-reads: true
cluster.consistent-metadata: on
server.allow-insecure: on
network.ping-timeout: 42
cluster.favorite-child-policy: mtime
client.ssl: on
server.ssl: on
ssl.private-key: /var/opt/nokia/certs/glusterfs/glusterfs.key
ssl.own-cert: /var/opt/nokia/certs/glusterfs/glusterfs.pem
ssl.ca-list: /var/opt/nokia/certs/glusterfs/glusterfs.ca
cluster.server-quorum-ratio: 51%

[debug info]

[root@mn-0:/mnt/bricks/services/brick/db]
# getfattr -m . -d -e hex upgrade/
# file: upgrade/
system.posix_acl_access=0x0200000001000700ffffffff04000500ffffffff08000700d302000008000700d402000010000700ffffffff20000500ffffffff
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.services-client-1=0x000000000000000000000015
trusted.afr.services-client-2=0x000000000000000000000000
trusted.gfid=0xf9ebed9856fb4e26987c3a890ed5203c
trusted.glusterfs.dht=0x000000010000000000000000ffffffff

[root@mn-1:/mnt/bricks/services/brick/db/upgrade]
# getfattr -m . -d -e hex .
# file: .
system.posix_acl_access=0x0200000001000700ffffffff04000500ffffffff08000700d302000008000700d402000010000700ffffffff20000500ffffffff
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.services-client-0=0x000000000000000000000003
trusted.afr.services-client-2=0x000000000000000000000000
trusted.gfid=0xf9ebed9856fb4e26987c3a890ed5203c
trusted.glusterfs.dht=0x000000010000000000000000ffffffff

[root@dbm-0:/mnt/bricks/services/brick/db/upgrade]
# getfattr -m . -d -e hex .
# file: .
system.posix_acl_access=0x0200000001000700ffffffff04000500ffffffff08000700d302000008000700d402000010000700ffffffff20000500ffffffff
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.services-client-0=0x000000000000000000000000
trusted.afr.services-client-1=0x000000000000000000000000
trusted.gfid=0xf9ebed9856fb4e26987c3a890ed5203c
trusted.glusterfs.dht=0x000000010000000000000000ffffffff

gdb attached to the glustershd process on mn-0:

Thread 14 "glustershdheal" hit Breakpoint 10, __afr_selfheal_entry_prepare (frame=frame@entry=0x7f54840321e0, this=this@entry=0x7f548c016980, inode=<optimized out>, locked_on=locked_on@entry=0x7f545effc780 "\001\001\001dT\177", sources=sources@entry=0x7f545effc7c0 "", sinks=sinks@entry=0x7f545effc7b0 "", healed_sinks=<optimized out>, replies=<optimized out>, source_p=<optimized out>, pflag=<optimized out>) at afr-self-heal-entry.c:546
546     in afr-self-heal-entry.c
(gdb) print heald_sinks[0]
No symbol "heald_sinks" in current context.
(gdb) print healed_sinks[0]
value has been optimized out
(gdb) print source
$12 = 2
(gdb) print sinks[0]
$13 = 0 '\000'
(gdb) print sinks[1]
$14 = 0 '\000'
(gdb) print sinks[2]
$15 = 0 '\000'
(gdb) print locked_on[0]
$16 = 1 '\001'
(gdb) print locked_on[1]
$17 = 1 '\001'
(gdb) print locked_on[2]
$18 = 1 '\001'

According to the code in __afr_selfheal_entry, healed_sinks is all zeros in every heal round, so the check "if (AFR_COUNT(healed_sinks, priv->child_count) == 0)" jumps to unlock and skips that round of heal. As a result, /db/upgrade keeps showing up in the "volume heal info" output.

It seems the current gluster shd code does not handle this situation, but an entry that is reported indefinitely is not ideal. Any ideas on how to improve this?
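For readers following the xattr dumps above, here is a rough sketch of how the trusted.afr.* values can be read, together with a toy model of the "no healed_sinks, skip the round" check. It assumes the usual AFR changelog layout of three big-endian 32-bit counters (data, metadata, entry) and the usual convention that services-client-N refers to the N-th brick in the volume info, counting from 0. The helper names (decode_afr_changelog, pending_entry_blame, afr_count) are made up for illustration; this is not GlusterFS code and does not reproduce how AFR actually selects sources and sinks.

```python
import struct


def decode_afr_changelog(hex_value):
    """Split a 12-byte trusted.afr.* value into its three pending counters.

    Assumed layout: data, metadata, entry, each a big-endian 32-bit integer.
    """
    text = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(text)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


# Values copied from the getfattr output in this report.
XATTRS = {
    "mn-0": {
        "services-client-1": "0x000000000000000000000015",
        "services-client-2": "0x000000000000000000000000",
    },
    "mn-1": {
        "services-client-0": "0x000000000000000000000003",
        "services-client-2": "0x000000000000000000000000",
    },
    "dbm-0": {
        "services-client-0": "0x000000000000000000000000",
        "services-client-1": "0x000000000000000000000000",
    },
}


def pending_entry_blame(xattrs):
    """Return {(brick, blamed_client): count} for non-zero entry counters."""
    blame = {}
    for brick, attrs in xattrs.items():
        for client, value in attrs.items():
            count = decode_afr_changelog(value)["entry"]
            if count:
                blame[(brick, client)] = count
    return blame


def afr_count(flags):
    """Toy stand-in for AFR_COUNT: number of non-zero flags in the array."""
    return sum(1 for flag in flags if flag)


if __name__ == "__main__":
    for (brick, client), count in pending_entry_blame(XATTRS).items():
        print("%s blames %s for %d pending entry operation(s)" % (brick, client, count))

    # Hypothetical all-zero array, modelling the situation described in the
    # report where the count check is hit on every heal round.
    healed_sinks = [0, 0, 0]
    if afr_count(healed_sinks) == 0:
        print("no healed_sinks: this heal round would be skipped")
```

Under those assumptions, the dumps decode to mn-0 holding 21 (0x15) pending entry operations against services-client-1 and mn-1 holding 3 against services-client-0, while dbm-0 blames nobody. That fits the gdb output, where source is 2 but the sinks array is empty.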
REVIEW: https://review.gluster.org/23541 (cluster/afr: Heal entries when there is a source & no healed_sinks) posted (#1) for review on release-7 by Karthik U S
REVIEW: https://review.gluster.org/23541 (cluster/afr: Heal entries when there is a source & no healed_sinks) merged (#2) on release-7 by hari gowtham