Bug 765175 (GLUSTER-3443)

Summary: afr-open.c reporting warning message "None of the files are open" even though the files are already opened.
Product: [Community] GlusterFS Reporter: Shwetha H Panduranga <shwetha>
Component: replicateAssignee: Kaushal <kaushal>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: mainlineCC: gluster-bugs, vijay
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: 3.2.5qa6 Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Shwetha H Panduranga 2011-08-19 06:06:21 UTC
Found the Issue while executing the following steps:

1) Create a replicate volume (replicate_volume) with 2 bricks ("brick1" and "brick2")
2) mount the replicate_volume from client
3) execute `dd if=/dev/zero of=file1G.tmp bs=1M count=15` on mount point (creating a file on the mount point)
4) Bring down one of the bricks ("brick1" or "brick2") while "dd" in progress 
5) Bring back the brick while the "dd"in progress. 

Even though the fd's are already open, afr-open.c reports a warning "None of the files are open" which is quite misleading. 

Expected: Do not print the warning message when files are already open.

Comment 1 Kaushal 2011-10-13 03:54:38 UTC
afr has had a lot of changes since this bug was reported. This bug doesn't occur now.

Comment 2 Shwetha H Panduranga 2011-10-14 03:09:16 UTC
The Bug Still exits on 3.2.4 

Client Log:- 
------------
[2011-10-14 01:56:56.884450] W [socket.c:1494:__socket_proto_state_machine] 0-replicate_vol-client-0: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.11.111:24009)
[2011-10-14 01:56:56.884641] I [client.c:1883:client_rpc_notify] 0-replicate_vol-client-0: disconnected
[2011-10-14 01:57:10.757365] E [socket.c:1685:socket_connect_finish] 0-replicate_vol-client-0: connection to 10.1.11.111:24009 failed (Connection refused)
[2011-10-14 01:57:25.777427] I [client-handshake.c:1090:select_server_supported_programs] 0-replicate_vol-client-0: Using Program GlusterFS 3.2.4, Num (1298437), Version (310)
[2011-10-14 01:57:25.777780] I [client-handshake.c:913:client_setvolume_cbk] 0-replicate_vol-client-0: Connected to 10.1.11.111:24009, attached to remote volume '/home/export1'.
[2011-10-14 01:57:25.777815] I [client-handshake.c:779:client_post_handshake] 0-replicate_vol-client-0: 1 fds open - Delaying child_up until they are re-opened
[2011-10-14 01:57:25.778385] I [client-lk.c:617:decrement_reopen_fd_count] 0-replicate_vol-client-0: last fd open'd/lock-self-heal'd - notifying CHILD-UP

###############################################################################
[2011-10-14 01:57:25.799111] W [afr-open.c:624:afr_openfd_flush] 0-replicate_vol-replicate-0: fd not open on any subvolume 0x7f488cfd2024 (/file3G.tmp)
###############################################################################

[2011-10-14 01:57:25.799564] I [afr-open.c:435:afr_openfd_sh] 0-replicate_vol-replicate-0:  data self-heal triggered. path: /file3G.tmp, reason: Replicate up down flush, data lock is held
[2011-10-14 01:57:25.800204] I [afr-self-heal-common.c:1233:sh_missing_entries_create] 0-replicate_vol-replicate-0: no missing files - /file3G.tmp. proceeding to metadata check
[2011-10-14 01:57:25.899580] W [dict.c:418:dict_unref] (-->/opt/glusterfs/3.2.4/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7f4891997365] (-->/opt/glusterfs/3.2.4/lib64/glusterfs/3.2.4/xlator/protocol/client.so(client3_1_fstat_cbk+0x33b) [0x7f488ea5fa9b] (-->/opt/glusterfs/3.2.4/lib64/glusterfs/3.2.4/xlator/cluster/replicate.so(afr_sh_data_fstat_cbk+0x131) [0x7f488e8094b1]))) 0-dict: dict is NULL
[2011-10-14 01:57:32.348810] I [afr-self-heal-algorithm.c:520:sh_diff_loop_driver_done] 0-replicate_vol-replicate-0: diff self-heal on /file3G.tmp: completed. (1728 blocks of 3512 were different (49.20%))
[2011-10-14 01:57:32.716880] W [afr-common.c:122:afr_set_split_brain] (-->/opt/glusterfs/3.2.4/lib64/glusterfs/3.2.4/xlator/cluster/replicate.so(afr_sh_data_flush_cbk+0x79) [0x7f488e809749] (-->/opt/glusterfs/3.2.4/lib64/glusterfs/3.2.4/xlator/cluster/replicate.so(afr_sh_data_done+0x50) [0x7f488e807b20] (-->/opt/glusterfs/3.2.4/lib64/glusterfs/3.2.4/xlator/cluster/replicate.so(afr_self_heal_completion_cbk+0x271) [0x7f488e80d981]))) 0-replicate_vol-replicate-0: invalid argument: inode
[2011-10-14 01:57:32.716982] I [afr-self-heal-common.c:1557:afr_self_heal_completion_cbk] 0-replicate_vol-replicate-0: background  data data self-heal completed on /file3G.tmp

Comment 3 Shwetha H Panduranga 2011-10-21 03:15:39 UTC
Bug still exists on 3.2.5qa2

[2011-10-21 01:56:36.568442] I [client-handshake.c:913:client_setvolume_cbk] 0-replicate-client-1: Connected to 10.1.11.111:24013, attached to remote volume '/export1'.
[2011-10-21 01:56:36.568496] I [client-handshake.c:779:client_post_handshake] 0-replicate-client-1: 1 fds open - Delaying child_up until they are re-opened
[2011-10-21 01:56:36.587672] I [client-lk.c:617:decrement_reopen_fd_count] 0-replicate-client-1: last fd open'd/lock-self-heal'd - notifying CHILD-UP

#############################################################################
[2011-10-21 01:56:36.589563] W [afr-open.c:624:afr_openfd_flush] 0-replicate-replicate-0: fd not open on any subvolume 0x7fe909075120 (/file1G.tmp)

[2011-10-21 01:56:36.589651] W [afr-open.c:624:afr_openfd_flush] 0-replicate-replicate-0: fd not open on any subvolume 0x7fe909075120 (/file1G.tmp)

[2011-10-21 01:56:36.589751] W [afr-open.c:624:afr_openfd_flush] 0-replicate-replicate-0: fd not open on any subvolume 0x7fe909075120 (/file1G.tmp)
##############################################################################
[2011-10-21 01:56:36.590450] I [afr-open.c:435:afr_openfd_sh] 0-replicate-replicate-0:  data missing-entry gfid self-heal triggered. path: /file1G.tmp, reason: Replicate up down flush, data lock is held
[2011-10-21 01:56:36.591137] I [afr-self-heal-common.c:1184:sh_missing_entries_create] 0-replicate-replicate-0: no missing files - /file1G.tmp. proceeding to metadata check
[2011-10-21 01:56:36.610543] W [dict.c:418:dict_unref] (-->/opt/glusterfs/3.2.5qa2/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa2) [0x7fe90da38502] (-->/opt/glusterfs/3.2.5qa2/lib64/glusterfs/3.2.5qa2/xlator/protocol/client.so(client3_1_fstat_cbk+0x2d9) [0x7fe90ab04949] (-->/opt/glusterfs/3.2.5qa2/lib64/glusterfs/3.2.5qa2/xlator/cluster/replicate.so(afr_sh_data_fstat_cbk+0x19a) [0x7fe90a8a9dca]))) 0-dict: dict is NULL
[2011-10-21 01:56:45.833143] I [afr-self-heal-algorithm.c:520:sh_diff_loop_driver_done] 0-replicate-replicate-0: diff self-heal on /file1G.tmp: completed. (1885 blocks of 3685 were different (51.15%))
[2011-10-21 01:56:49.348103] I [afr-self-heal-common.c:2044:afr_self_heal_completion_cbk] 0-replicate-replicate-0: background  data missing-entry gfid self-heal completed on /file1G.tmp

###########################################################################
[2011-10-21 01:56:49.348197] W [afr-open.c:327:afr_openfd_sh_unwind] 0-replicate-replicate-0: fd not open on any subvolume 0x7fe909075120 (/file1G.tmp)
###########################################################################

[2011-10-21 01:56:49.348478] I [afr-open.c:435:afr_openfd_sh] 0-replicate-replicate-0:  data missing-entry gfid self-heal triggered. path: /file1G.tmp, reason: Replicate up down flush, data lock is held
[2011-10-21 01:56:49.390116] I [afr-self-heal-common.c:1184:sh_missing_entries_create] 0-replicate-replicate-0: no missing files - /file1G.tmp. proceeding to metadata check
[2011-10-21 01:56:49.391195] I [afr-self-heal-common.c:2044:afr_self_heal_completion_cbk] 0-replicate-replicate-0: background  data missing-entry gfid self-heal completed on /file1G.tmp

###########################################################################
[2011-10-21 01:56:49.391229] W [afr-open.c:327:afr_openfd_sh_unwind] 0-replicate-replicate-0: fd not open on any subvolume 0x7fe909075120 (/file1G.tmp)
#############################################################################

[2011-10-21 01:56:49.391508] I [afr-open.c:435:afr_openfd_sh] 0-replicate-replicate-0:  data missing-entry gfid self-heal triggered. path: /file1G.tmp, reason: Replicate up down flush, data lock is held
[2011-10-21 01:56:49.432683] I [afr-self-heal-common.c:1184:sh_missing_entries_create] 0-replicate-replicate-0: no missing files - /file1G.tmp. proceeding to metadata check
[2011-10-21 01:56:49.440167] I [afr-self-heal-common.c:2044:afr_self_heal_completion_cbk] 0-replicate-replicate-0: background  data missing-entry gfid self-heal completed on /file1G.tmp

#############################################################################
[2011-10-21 01:56:49.440203] W [afr-open.c:327:afr_openfd_sh_unwind] 0-replicate-replicate-0: fd not open on any subvolume 0x7fe909075120 (/file1G.tmp)
##############################################################################

Comment 4 Anand Avati 2011-11-03 06:04:53 UTC
CHANGE: http://review.gluster.com/648 (Removes a wrong and unneeded log message in afr-open.c) merged in release-3.2 by Vijay Bellur (vijay)

Comment 5 Anand Avati 2011-11-04 13:55:34 UTC
CHANGE: http://review.gluster.com/674 (Currently if no open is required on any child of replicate, we are) merged in release-3.2 by Vijay Bellur (vijay)

Comment 6 Shwetha H Panduranga 2011-11-09 01:50:25 UTC
Verified on 3.2.5qa6 and it is working.