Bug 1761350 - Directories are not healed when dirs are created on the backend bricks and a lookup is performed from the mount path.
Summary: Directories are not healed, when dirs are created on the backend bricks and p...
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: 6
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-10-14 08:46 UTC by milind
Modified: 2020-03-12 12:16 UTC
CC: 2 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2020-03-12 12:16:09 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description milind 2019-10-14 08:46:40 UTC
[afr] Heal is not completed on a (1*3) replicated volume after enabling client-side healing options.
 Some files are always left unhealed.

"metadata-self-heal": "on",
"entry-self-heal": "on",
"data-self-heal": "on"
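The client-side heal options above can be enabled per volume with the gluster CLI; the volume name testvol_replicated is taken from the heal-info output below (a sketch of the commands, assuming a running cluster):

```sh
# Enable client-side self-heal for the volume under test
gluster volume set testvol_replicated metadata-self-heal on
gluster volume set testvol_replicated entry-self-heal on
gluster volume set testvol_replicated data-self-heal on
```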

steps:
 
        1) Create a replicated volume (1 * 3).
        2) Test the case with the default afr options.
        3) Test the case with the volume option 'self-heal-daemon'.
        4) Create dirs directly on the backend bricks, let's say dir1, dir2 and dir3.
        5) From the mount point:
            echo "hi" > dir1  --> must fail
            touch dir2        --> must pass
            mkdir dir3        --> must fail
        6) From the mount point, ls -l and find must list dir1, dir2 and dir3.
        7) Check that dir1, dir2 and dir3 exist on all backend bricks.
        8) Heal info should show zero entries, and the gfid and other
           attributes must exist.
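The steps above can be sketched as shell commands (a hypothetical layout, assuming the brick paths from the heal-info output below and a mount at /mnt/testvol; these require a live cluster):

```sh
# Step 4: create the directory directly on each backend brick (repeat per server)
mkdir /bricks/brick1/testvol_replicated_brick0/dir1

# Step 5: from the mount point
echo "hi" > /mnt/testvol/dir1   # must fail: dir1 is a directory
touch /mnt/testvol/dir2          # must pass: updates timestamps on the dir
mkdir /mnt/testvol/dir3          # must fail: dir3 already exists

# Steps 6-7: listing triggers a lookup, which should assign gfids and heal
ls -l /mnt/testvol

# Step 8: the trusted.gfid xattr should now exist on every brick's copy
getfattr -d -m . -e hex /bricks/brick1/testvol_replicated_brick0/dir1
```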



Actual result:

# gluster volume heal testvol_replicated info
Brick server:/bricks/brick1/testvol_replicated_brick0
Status: Connected
Number of entries: 1

Brick server2:/bricks/brick1/testvol_replicated_brick1
Status: Connected
Number of entries: 1

Brick server3:/bricks/brick1/testvol_replicated_brick2
Status: Connected
Number of entries: 1

Expected result:

Brick server:/bricks/brick1/testvol_replicated_brick0
Status: Connected
Number of entries: 0

Brick server2:/bricks/brick1/testvol_replicated_brick1
Status: Connected
Number of entries: 0

Brick server3:/bricks/brick1/testvol_replicated_brick2
Status: Connected
Number of entries: 0
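The actual and expected outputs above differ only in the "Number of entries" counts. A test can check this by parsing the heal-info output; a minimal sketch (the function name and sample string are illustrative, not part of gluster):

```python
import re

def heal_entry_counts(heal_info_output):
    """Map each brick to its 'Number of entries' count from the
    output of `gluster volume heal <vol> info`."""
    counts = {}
    brick = None
    for line in heal_info_output.splitlines():
        m = re.match(r"Brick\s+(\S+)", line)
        if m:
            brick = m.group(1)
            continue
        m = re.match(r"Number of entries:\s+(\d+)", line)
        if m and brick:
            counts[brick] = int(m.group(1))
            brick = None
    return counts

sample = """\
Brick server:/bricks/brick1/testvol_replicated_brick0
Status: Connected
Number of entries: 1

Brick server2:/bricks/brick1/testvol_replicated_brick1
Status: Connected
Number of entries: 1
"""

# A fully healed volume would report 0 for every brick.
print(heal_entry_counts(sample))
```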



Additional info:

[2019-10-09 09:15:36.822052] W [socket.c:774:__socket_rwv] 0-testvol_replicated-client-0: readv on 10.70.35.132:49152 failed (No data available)
The message "I [MSGID: 100040] [glusterfsd-mgmt.c:106:mgmt_process_volfile] 0-glusterfs: No change in volfile, continuing" repeated 2 times between [2019-10-09 09:15:35.989751] and [2019-10-09 09:15:36.303521]
[2019-10-09 09:15:36.822109] I [MSGID: 114018] [client.c:2398:client_rpc_notify] 0-testvol_replicated-client-0: disconnected from testvol_replicated-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2019-10-09 09:15:38.859761] W [socket.c:774:__socket_rwv] 0-testvol_replicated-client-1: readv on 10.70.35.216:49152 failed (No data available)
[2019-10-09 09:15:38.859805] I [MSGID: 114018] [client.c:2398:client_rpc_notify] 0-testvol_replicated-client-1: disconnected from testvol_replicated-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2019-10-09 09:15:38.859834] W [MSGID: 108001] [afr-common.c:5653:afr_notify] 0-testvol_replicated-replicate-0: Client-quorum is not met
[2019-10-09 09:15:38.860994] W [socket.c:774:__socket_rwv] 0-testvol_replicated-client-2: readv on 10.70.35.80:49152 failed (No data available)
[2019-10-09 09:15:38.861025] I [MSGID: 114018] [client.c:2398:client_rpc_notify] 0-testvol_replicated-client-2: disconnected from testvol_replicated-client-2. Client process will keep trying to connect to glusterd until brick's port is available
[2019-10-09 09:15:38.861046] E [MSGID: 108006] [afr-common.c:5357:__afr_handle_child_down_event] 0-testvol_replicated-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
[2019-10-09 09:15:39.827168] E [MSGID: 114058] [client-handshake.c:1268:client_query_portmap_cbk] 0-testvol_replicated-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2019-10-09 09:15:39.827274] I [MSGID: 114018] [client.c:2398:client_rpc_notify] 0-testvol_replicated-client-0: disconnected from testvol_replicated-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2019-10-09 09:15:39.881864] W [glusterfsd.c:1645:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fd028181dd5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x56243aebc805] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x56243aebc66b] ) 0-: received signum (15), shutting down



######################### latest logs
[2019-10-11 11:25:29.749047] I [rpc-clnt.c:1967:rpc_clnt_reconfig] 5-testvol_replicated-client-1: changing port to 49155 (from 0)
[2019-10-11 11:25:29.754160] I [rpc-clnt.c:1967:rpc_clnt_reconfig] 5-testvol_replicated-client-2: changing port to 49155 (from 0)
[2019-10-11 11:25:29.754806] I [MSGID: 114057] [client-handshake.c:1188:select_server_supported_programs] 5-testvol_replicated-client-1: Using Program GlusterFS 4.x v1, Num (1298437), Version (400) 
[2019-10-11 11:25:29.756036] I [MSGID: 114046] [client-handshake.c:904:client_setvolume_cbk] 5-testvol_replicated-client-1: Connected to testvol_replicated-client-1, attached to remote volume '/bricks/brick1/testvol_replicated_brick1'. 
[2019-10-11 11:25:29.756076] I [MSGID: 108002] [afr-common.c:5648:afr_notify] 5-testvol_replicated-replicate-0: Client-quorum is met 
[2019-10-11 11:25:29.758143] I [MSGID: 114057] [client-handshake.c:1188:select_server_supported_programs] 5-testvol_replicated-client-2: Using Program GlusterFS 4.x v1, Num (1298437), Version (400) 
[2019-10-11 11:25:29.759918] I [MSGID: 114046] [client-handshake.c:904:client_setvolume_cbk] 5-testvol_replicated-client-2: Connected to testvol_replicated-client-2, attached to remote volume '/bricks/brick1/testvol_replicated_brick2'. 
[2019-10-11 11:25:30.778455] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 5-testvol_replicated-replicate-0: performing metadata selfheal on fb2f6540-41eb-4ed6-9fe1-e821f02bda9e 
[2019-10-11 11:25:30.793172] I [MSGID: 108026] [afr-self-heal-common.c:1750:afr_log_selfheal] 5-testvol_replicated-replicate-0: Completed metadata selfheal on fb2f6540-41eb-4ed6-9fe1-e821f02bda9e. sources=[0]  sinks=1 2  
[2019-10-11 11:25:30.797468] I [MSGID: 108026] [afr-self-heal-entry.c:916:afr_selfheal_entry_do] 5-testvol_replicated-replicate-0: performing entry selfheal on fb2f6540-41eb-4ed6-9fe1-e821f02bda9e 
[2019-10-11 11:25:30.812701] I [MSGID: 108026] [afr-self-heal-common.c:1750:afr_log_selfheal] 5-testvol_replicated-replicate-0: Completed entry selfheal on fb2f6540-41eb-4ed6-9fe1-e821f02bda9e. sources=[0]  sinks=1 2  
Ending Test: functional.afr.test_gfid_assignment_on_lookup.AssignGfidOnLookup_cplex_replicated_glusterfs.test_gfid_assignment_on_lookup : 16_55_11_10_2019
[2019-10-11 11:25:31.572199] W [socket.c:774:__socket_rwv] 5-testvol_replicated-client-0: readv on 10.70.35.132:49155 failed (No data available)
[2019-10-11 11:25:31.572250] I [MSGID: 114018] [client.c:2398:client_rpc_notify] 5-testvol_replicated-client-0: disconnected from testvol_replicated-client-0. Client process will keep trying to connect to glusterd until brick's port is available 
[2019-10-11 11:25:31.820309] W [MSGID: 114031] [client-rpc-fops_v2.c:911:client4_0_getxattr_cbk] 5-testvol_replicated-client-0: remote operation failed. [{path=/}, {gfid=00000000-0000-0000-0000-000000000001}, {key=glusterfs.xattrop_index_gfid}, {errno=107}, {error=Transport endpoint is not connected}] 
[2019-10-11 11:25:31.820350] W [MSGID: 114029] [client-rpc-fops_v2.c:4467:client4_0_getxattr] 5-testvol_replicated-client-0: failed to send the fop 
[2019-10-11 11:25:31.820366] W [MSGID: 108034] [afr-self-heald.c:463:afr_shd_index_sweep] 5-testvol_replicated-replicate-0: unable to get index-dir on testvol_replicated-client-0 
[2019-10-11 11:25:32.601159] I [MSGID: 101218] [graph.c:1522:glusterfs_process_svc_detach] 0-mgmt: detaching child shd/testvol_replicated 
[2019-10-11 11:25:32.601338] I [MSGID: 114021] [client.c:2498:notify] 5-testvol_replicated-client-0: current graph is no longer active, destroying rpc_client  
[2019-10-11 11:25:32.601377] I [MSGID: 114021] [client.c:2498:notify] 5-testvol_replicated-client-1: current graph is no longer active, destroying rpc_client  
[2019-10-11 11:25:32.601663] I [MSGID: 114018] [client.c:2398:client_rpc_notify] 5-testvol_replicated-client-1: disconnected from testvol_replicated-client-1. Client process will keep trying to connect to glusterd until brick's port is available 
[2019-10-11 11:25:32.601691] W [MSGID: 108001] [afr-common.c:5654:afr_notify] 5-testvol_replicated-replicate-0: Client-quorum is not met 
[2019-10-11 11:25:32.601600] I [MSGID: 114021] [client.c:2498:notify] 5-testvol_replicated-client-2: current graph is no longer active, destroying rpc_client  
[2019-10-11 11:25:32.602242] I [MSGID: 114018] [client.c:2398:client_rpc_notify] 5-testvol_replicated-client-2: disconnected from testvol_replicated-client-2. Client process will keep trying to connect to glusterd until brick's port is available 
[2019-10-11 11:25:32.602273] E [MSGID: 108006] [afr-common.c:5358:__afr_handle_child_down_event] 5-testvol_replicated-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up. 
[2019-10-11 11:25:32.602649] I [io-stats.c:4047:fini] 0-testvol_replicated: io-stats translator

Comment 1 Worker Ant 2020-03-12 12:16:09 UTC
This bug is moved to https://github.com/gluster/glusterfs/issues/856, and will be tracked there from now on. Visit the GitHub issue URL for further details.

