Bug 1761350

Summary: Directories are not healed when dirs are created on the backend bricks and a lookup is performed from the mount path.
Product: [Community] GlusterFS
Component: replicate
Version: 6
Hardware: x86_64
OS: Linux
Status: CLOSED UPSTREAM
Severity: medium
Priority: unspecified
Reporter: milind <mwaykole>
Assignee: bugs <bugs>
CC: bugs, pasik
Type: Bug
Last Closed: 2020-03-12 12:16:09 UTC

Description milind 2019-10-14 08:46:40 UTC
[afr] Heal does not complete in a (1 x 3) replicated volume after enabling the client-side healing options; some files are always left unhealed.

"metadata-self-heal": "on",
"entry-self-heal": "on",
"data-self-heal": "on"

Steps (a shell sketch of steps 4-7 follows the list):

        1) Create a replicated volume (1 x 3).
        2) Test the case with the default AFR options.
        3) Test the case with the volume option 'self-heal-daemon'.
        4) Create dirs directly on the backend bricks, say dir1, dir2 and dir3.
        5) From the mount point:
            echo "hi" > dir1  --> must fail
            touch dir2        --> must pass
            mkdir dir3        --> must fail
        6) From the mount point, 'ls -l' and 'find' must list dir1, dir2 and dir3.
        7) Check that dir1, dir2 and dir3 exist on all backend bricks.
        8) Heal info should show zero entries, and the gfid and other attributes
           must exist on the backend copies.
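
A minimal reproduction sketch for steps 4-7, assuming a FUSE mount at /mnt/testvol_replicated (hypothetical path) and the brick path /bricks/brick1/testvol_replicated_brick0 shown in the heal info output below; the mkdir on the backend has to be repeated on every brick host:

        # Step 4: on each brick host, create the directories directly on the
        # backend brick, bypassing gluster, so no trusted.gfid xattr is assigned.
        mkdir /bricks/brick1/testvol_replicated_brick0/dir{1,2,3}

        # Step 5: from the client mount, trigger lookups on the three names.
        cd /mnt/testvol_replicated
        echo "hi" > dir1    # must fail: dir1 is a directory ("Is a directory")
        touch dir2          # must pass: only updates the directory's timestamps
        mkdir dir3          # must fail: dir3 already exists (EEXIST)

        # Step 6: both listings must show dir1, dir2 and dir3.
        ls -l
        find . -maxdepth 1 -type d

        # Step 7: verify the directories now exist on every backend brick,
        # then check heal info (step 8).
        ls -l /bricks/brick1/testvol_replicated_brick0
        gluster volume heal testvol_replicated info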



Actual result:

# gluster volume heal testvol_replicated info
Brick server:/bricks/brick1/testvol_replicated_brick0
Status: Connected
Number of entries: 1

Brick server2:/bricks/brick1/testvol_replicated_brick1
Status: Connected
Number of entries: 1

Brick server3:/bricks/brick1/testvol_replicated_brick2
Status: Connected
Number of entries: 1

Expected result:

Brick server:/bricks/brick1/testvol_replicated_brick0
Status: Connected
Number of entries: 0

Brick server2:/bricks/brick1/testvol_replicated_brick1
Status: Connected
Number of entries: 0

Brick server3:/bricks/brick1/testvol_replicated_brick2
Status: Connected
Number of entries: 0
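
For step 8, the gfid and AFR changelog xattrs on the backend copies would typically be inspected with getfattr; a minimal sketch, run on each brick host with the brick path from the heal info output above:

        # Dump the trusted.* xattrs of dir1 on the backend brick. After a
        # successful heal, trusted.gfid must be present and identical on all
        # three bricks, and any trusted.afr.* changelog xattrs should be
        # absent or all-zero.
        getfattr -d -m . -e hex /bricks/brick1/testvol_replicated_brick0/dir1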



Additional info:

[2019-10-09 09:15:36.822052] W [socket.c:774:__socket_rwv] 0-testvol_replicated-client-0: readv on 10.70.35.132:49152 failed (No data available)
The message "I [MSGID: 100040] [glusterfsd-mgmt.c:106:mgmt_process_volfile] 0-glusterfs: No change in volfile, continuing" repeated 2 times between [2019-10-09 09:15:35.989751] and [2019-10-09 09:15:36.303521]
[2019-10-09 09:15:36.822109] I [MSGID: 114018] [client.c:2398:client_rpc_notify] 0-testvol_replicated-client-0: disconnected from testvol_replicated-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2019-10-09 09:15:38.859761] W [socket.c:774:__socket_rwv] 0-testvol_replicated-client-1: readv on 10.70.35.216:49152 failed (No data available)
[2019-10-09 09:15:38.859805] I [MSGID: 114018] [client.c:2398:client_rpc_notify] 0-testvol_replicated-client-1: disconnected from testvol_replicated-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2019-10-09 09:15:38.859834] W [MSGID: 108001] [afr-common.c:5653:afr_notify] 0-testvol_replicated-replicate-0: Client-quorum is not met
[2019-10-09 09:15:38.860994] W [socket.c:774:__socket_rwv] 0-testvol_replicated-client-2: readv on 10.70.35.80:49152 failed (No data available)
[2019-10-09 09:15:38.861025] I [MSGID: 114018] [client.c:2398:client_rpc_notify] 0-testvol_replicated-client-2: disconnected from testvol_replicated-client-2. Client process will keep trying to connect to glusterd until brick's port is available
[2019-10-09 09:15:38.861046] E [MSGID: 108006] [afr-common.c:5357:__afr_handle_child_down_event] 0-testvol_replicated-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
[2019-10-09 09:15:39.827168] E [MSGID: 114058] [client-handshake.c:1268:client_query_portmap_cbk] 0-testvol_replicated-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2019-10-09 09:15:39.827274] I [MSGID: 114018] [client.c:2398:client_rpc_notify] 0-testvol_replicated-client-0: disconnected from testvol_replicated-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2019-10-09 09:15:39.881864] W [glusterfsd.c:1645:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fd028181dd5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x56243aebc805] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x56243aebc66b] ) 0-: received signum (15), shutting down



######################### latest logs
[2019-10-11 11:25:29.749047] I [rpc-clnt.c:1967:rpc_clnt_reconfig] 5-testvol_replicated-client-1: changing port to 49155 (from 0)
[2019-10-11 11:25:29.754160] I [rpc-clnt.c:1967:rpc_clnt_reconfig] 5-testvol_replicated-client-2: changing port to 49155 (from 0)
[2019-10-11 11:25:29.754806] I [MSGID: 114057] [client-handshake.c:1188:select_server_supported_programs] 5-testvol_replicated-client-1: Using Program GlusterFS 4.x v1, Num (1298437), Version (400) 
[2019-10-11 11:25:29.756036] I [MSGID: 114046] [client-handshake.c:904:client_setvolume_cbk] 5-testvol_replicated-client-1: Connected to testvol_replicated-client-1, attached to remote volume '/bricks/brick1/testvol_replicated_brick1'. 
[2019-10-11 11:25:29.756076] I [MSGID: 108002] [afr-common.c:5648:afr_notify] 5-testvol_replicated-replicate-0: Client-quorum is met 
[2019-10-11 11:25:29.758143] I [MSGID: 114057] [client-handshake.c:1188:select_server_supported_programs] 5-testvol_replicated-client-2: Using Program GlusterFS 4.x v1, Num (1298437), Version (400) 
[2019-10-11 11:25:29.759918] I [MSGID: 114046] [client-handshake.c:904:client_setvolume_cbk] 5-testvol_replicated-client-2: Connected to testvol_replicated-client-2, attached to remote volume '/bricks/brick1/testvol_replicated_brick2'. 
[2019-10-11 11:25:30.778455] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 5-testvol_replicated-replicate-0: performing metadata selfheal on fb2f6540-41eb-4ed6-9fe1-e821f02bda9e 
[2019-10-11 11:25:30.793172] I [MSGID: 108026] [afr-self-heal-common.c:1750:afr_log_selfheal] 5-testvol_replicated-replicate-0: Completed metadata selfheal on fb2f6540-41eb-4ed6-9fe1-e821f02bda9e. sources=[0]  sinks=1 2  
[2019-10-11 11:25:30.797468] I [MSGID: 108026] [afr-self-heal-entry.c:916:afr_selfheal_entry_do] 5-testvol_replicated-replicate-0: performing entry selfheal on fb2f6540-41eb-4ed6-9fe1-e821f02bda9e 
[2019-10-11 11:25:30.812701] I [MSGID: 108026] [afr-self-heal-common.c:1750:afr_log_selfheal] 5-testvol_replicated-replicate-0: Completed entry selfheal on fb2f6540-41eb-4ed6-9fe1-e821f02bda9e. sources=[0]  sinks=1 2  
Ending Test: functional.afr.test_gfid_assignment_on_lookup.AssignGfidOnLookup_cplex_replicated_glusterfs.test_gfid_assignment_on_lookup : 16_55_11_10_2019
[2019-10-11 11:25:31.572199] W [socket.c:774:__socket_rwv] 5-testvol_replicated-client-0: readv on 10.70.35.132:49155 failed (No data available)
[2019-10-11 11:25:31.572250] I [MSGID: 114018] [client.c:2398:client_rpc_notify] 5-testvol_replicated-client-0: disconnected from testvol_replicated-client-0. Client process will keep trying to connect to glusterd until brick's port is available 
[2019-10-11 11:25:31.820309] W [MSGID: 114031] [client-rpc-fops_v2.c:911:client4_0_getxattr_cbk] 5-testvol_replicated-client-0: remote operation failed. [{path=/}, {gfid=00000000-0000-0000-0000-000000000001}, {key=glusterfs.xattrop_index_gfid}, {errno=107}, {error=Transport endpoint is not connected}] 
[2019-10-11 11:25:31.820350] W [MSGID: 114029] [client-rpc-fops_v2.c:4467:client4_0_getxattr] 5-testvol_replicated-client-0: failed to send the fop 
[2019-10-11 11:25:31.820366] W [MSGID: 108034] [afr-self-heald.c:463:afr_shd_index_sweep] 5-testvol_replicated-replicate-0: unable to get index-dir on testvol_replicated-client-0 
[2019-10-11 11:25:32.601159] I [MSGID: 101218] [graph.c:1522:glusterfs_process_svc_detach] 0-mgmt: detaching child shd/testvol_replicated 
[2019-10-11 11:25:32.601338] I [MSGID: 114021] [client.c:2498:notify] 5-testvol_replicated-client-0: current graph is no longer active, destroying rpc_client  
[2019-10-11 11:25:32.601377] I [MSGID: 114021] [client.c:2498:notify] 5-testvol_replicated-client-1: current graph is no longer active, destroying rpc_client  
[2019-10-11 11:25:32.601663] I [MSGID: 114018] [client.c:2398:client_rpc_notify] 5-testvol_replicated-client-1: disconnected from testvol_replicated-client-1. Client process will keep trying to connect to glusterd until brick's port is available 
[2019-10-11 11:25:32.601691] W [MSGID: 108001] [afr-common.c:5654:afr_notify] 5-testvol_replicated-replicate-0: Client-quorum is not met 
[2019-10-11 11:25:32.601600] I [MSGID: 114021] [client.c:2498:notify] 5-testvol_replicated-client-2: current graph is no longer active, destroying rpc_client  
[2019-10-11 11:25:32.602242] I [MSGID: 114018] [client.c:2398:client_rpc_notify] 5-testvol_replicated-client-2: disconnected from testvol_replicated-client-2. Client process will keep trying to connect to glusterd until brick's port is available 
[2019-10-11 11:25:32.602273] E [MSGID: 108006] [afr-common.c:5358:__afr_handle_child_down_event] 5-testvol_replicated-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up. 
[2019-10-11 11:25:32.602649] I [io-stats.c:4047:fini] 0-testvol_replicated: io-stats translator

Comment 1 Worker Ant 2020-03-12 12:16:09 UTC
This bug has been moved to https://github.com/gluster/glusterfs/issues/856 and will be tracked there from now on. Visit the GitHub issue URL for further details.