Bug 765175 (GLUSTER-3443) - afr-open.c reports the warning message "None of the files are open" even though the files are already open.
Summary: afr-open.c reports the warning message "None of the files are open" even though the files are already open.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-3443
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Kaushal
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-08-19 06:06 UTC by Shwetha H Panduranga
Modified: 2011-11-09 04:50 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: 3.2.5qa6



Description Shwetha H Panduranga 2011-08-19 06:06:21 UTC
Found the issue while executing the following steps (a command-line sketch follows below):

1) Create a replicate volume (replicate_volume) with 2 bricks ("brick1" and "brick2")
2) Mount replicate_volume from a client
3) Execute `dd if=/dev/zero of=file1G.tmp bs=1M count=15` on the mount point (creating a file on the mount point)
4) Bring down one of the bricks ("brick1" or "brick2") while "dd" is in progress
5) Bring the brick back while "dd" is still in progress.

Even though the fds are already open, afr-open.c reports the warning "None of the files are open", which is quite misleading.

Expected: Do not print the warning message when the files are already open.
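
For reference, a minimal command-line sketch of the steps above (the host name "server1", the brick paths and the mount point are placeholders, not taken from the original setup):

# 1) Create and start a 2-brick replicate volume.
gluster volume create replicate_volume replica 2 server1:/export/brick1 server1:/export/brick2
gluster volume start replicate_volume

# 2) Mount the volume from a client.
mount -t glusterfs server1:/replicate_volume /mnt/replicate_volume

# 3) Create a file on the mount point; run dd in the background so the
#    brick can be manipulated while the write is still in progress.
dd if=/dev/zero of=/mnt/replicate_volume/file1G.tmp bs=1M count=15 &

# 4) While dd is running, bring one brick down by killing its glusterfsd
#    process on the server (look up the PID with ps).
kill <pid-of-brick1-glusterfsd>

# 5) Bring the brick back while dd is still running ("volume start force"
#    on recent releases; restarting the brick's glusterfsd process by hand
#    also works).
gluster volume start replicate_volume force

# The client mount log then shows the misleading warning, e.g.:
#   W [afr-open.c:624:afr_openfd_flush] ... fd not open on any subvolume ... (/file1G.tmp)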

Comment 1 Kaushal 2011-10-13 03:54:38 UTC
AFR has undergone many changes since this bug was reported, and the bug no longer occurs.

Comment 2 Shwetha H Panduranga 2011-10-14 03:09:16 UTC
The bug still exists on 3.2.4.

Client Log:- 
------------
[2011-10-14 01:56:56.884450] W [socket.c:1494:__socket_proto_state_machine] 0-replicate_vol-client-0: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.11.111:24009)
[2011-10-14 01:56:56.884641] I [client.c:1883:client_rpc_notify] 0-replicate_vol-client-0: disconnected
[2011-10-14 01:57:10.757365] E [socket.c:1685:socket_connect_finish] 0-replicate_vol-client-0: connection to 10.1.11.111:24009 failed (Connection refused)
[2011-10-14 01:57:25.777427] I [client-handshake.c:1090:select_server_supported_programs] 0-replicate_vol-client-0: Using Program GlusterFS 3.2.4, Num (1298437), Version (310)
[2011-10-14 01:57:25.777780] I [client-handshake.c:913:client_setvolume_cbk] 0-replicate_vol-client-0: Connected to 10.1.11.111:24009, attached to remote volume '/home/export1'.
[2011-10-14 01:57:25.777815] I [client-handshake.c:779:client_post_handshake] 0-replicate_vol-client-0: 1 fds open - Delaying child_up until they are re-opened
[2011-10-14 01:57:25.778385] I [client-lk.c:617:decrement_reopen_fd_count] 0-replicate_vol-client-0: last fd open'd/lock-self-heal'd - notifying CHILD-UP

###############################################################################
[2011-10-14 01:57:25.799111] W [afr-open.c:624:afr_openfd_flush] 0-replicate_vol-replicate-0: fd not open on any subvolume 0x7f488cfd2024 (/file3G.tmp)
###############################################################################

[2011-10-14 01:57:25.799564] I [afr-open.c:435:afr_openfd_sh] 0-replicate_vol-replicate-0:  data self-heal triggered. path: /file3G.tmp, reason: Replicate up down flush, data lock is held
[2011-10-14 01:57:25.800204] I [afr-self-heal-common.c:1233:sh_missing_entries_create] 0-replicate_vol-replicate-0: no missing files - /file3G.tmp. proceeding to metadata check
[2011-10-14 01:57:25.899580] W [dict.c:418:dict_unref] (-->/opt/glusterfs/3.2.4/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7f4891997365] (-->/opt/glusterfs/3.2.4/lib64/glusterfs/3.2.4/xlator/protocol/client.so(client3_1_fstat_cbk+0x33b) [0x7f488ea5fa9b] (-->/opt/glusterfs/3.2.4/lib64/glusterfs/3.2.4/xlator/cluster/replicate.so(afr_sh_data_fstat_cbk+0x131) [0x7f488e8094b1]))) 0-dict: dict is NULL
[2011-10-14 01:57:32.348810] I [afr-self-heal-algorithm.c:520:sh_diff_loop_driver_done] 0-replicate_vol-replicate-0: diff self-heal on /file3G.tmp: completed. (1728 blocks of 3512 were different (49.20%))
[2011-10-14 01:57:32.716880] W [afr-common.c:122:afr_set_split_brain] (-->/opt/glusterfs/3.2.4/lib64/glusterfs/3.2.4/xlator/cluster/replicate.so(afr_sh_data_flush_cbk+0x79) [0x7f488e809749] (-->/opt/glusterfs/3.2.4/lib64/glusterfs/3.2.4/xlator/cluster/replicate.so(afr_sh_data_done+0x50) [0x7f488e807b20] (-->/opt/glusterfs/3.2.4/lib64/glusterfs/3.2.4/xlator/cluster/replicate.so(afr_self_heal_completion_cbk+0x271) [0x7f488e80d981]))) 0-replicate_vol-replicate-0: invalid argument: inode
[2011-10-14 01:57:32.716982] I [afr-self-heal-common.c:1557:afr_self_heal_completion_cbk] 0-replicate_vol-replicate-0: background  data data self-heal completed on /file3G.tmp
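
For anyone re-checking this, the warning can be pulled out of the client mount log with a plain grep (the log file name below is a placeholder; glusterfs names the client log under /var/log/glusterfs/ after the mount point):

grep -n "fd not open on any subvolume" /var/log/glusterfs/mnt-replicate_vol.log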

Comment 3 Shwetha H Panduranga 2011-10-21 03:15:39 UTC
The bug still exists on 3.2.5qa2.

[2011-10-21 01:56:36.568442] I [client-handshake.c:913:client_setvolume_cbk] 0-replicate-client-1: Connected to 10.1.11.111:24013, attached to remote volume '/export1'.
[2011-10-21 01:56:36.568496] I [client-handshake.c:779:client_post_handshake] 0-replicate-client-1: 1 fds open - Delaying child_up until they are re-opened
[2011-10-21 01:56:36.587672] I [client-lk.c:617:decrement_reopen_fd_count] 0-replicate-client-1: last fd open'd/lock-self-heal'd - notifying CHILD-UP

#############################################################################
[2011-10-21 01:56:36.589563] W [afr-open.c:624:afr_openfd_flush] 0-replicate-replicate-0: fd not open on any subvolume 0x7fe909075120 (/file1G.tmp)

[2011-10-21 01:56:36.589651] W [afr-open.c:624:afr_openfd_flush] 0-replicate-replicate-0: fd not open on any subvolume 0x7fe909075120 (/file1G.tmp)

[2011-10-21 01:56:36.589751] W [afr-open.c:624:afr_openfd_flush] 0-replicate-replicate-0: fd not open on any subvolume 0x7fe909075120 (/file1G.tmp)
##############################################################################
[2011-10-21 01:56:36.590450] I [afr-open.c:435:afr_openfd_sh] 0-replicate-replicate-0:  data missing-entry gfid self-heal triggered. path: /file1G.tmp, reason: Replicate up down flush, data lock is held
[2011-10-21 01:56:36.591137] I [afr-self-heal-common.c:1184:sh_missing_entries_create] 0-replicate-replicate-0: no missing files - /file1G.tmp. proceeding to metadata check
[2011-10-21 01:56:36.610543] W [dict.c:418:dict_unref] (-->/opt/glusterfs/3.2.5qa2/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa2) [0x7fe90da38502] (-->/opt/glusterfs/3.2.5qa2/lib64/glusterfs/3.2.5qa2/xlator/protocol/client.so(client3_1_fstat_cbk+0x2d9) [0x7fe90ab04949] (-->/opt/glusterfs/3.2.5qa2/lib64/glusterfs/3.2.5qa2/xlator/cluster/replicate.so(afr_sh_data_fstat_cbk+0x19a) [0x7fe90a8a9dca]))) 0-dict: dict is NULL
[2011-10-21 01:56:45.833143] I [afr-self-heal-algorithm.c:520:sh_diff_loop_driver_done] 0-replicate-replicate-0: diff self-heal on /file1G.tmp: completed. (1885 blocks of 3685 were different (51.15%))
[2011-10-21 01:56:49.348103] I [afr-self-heal-common.c:2044:afr_self_heal_completion_cbk] 0-replicate-replicate-0: background  data missing-entry gfid self-heal completed on /file1G.tmp

###########################################################################
[2011-10-21 01:56:49.348197] W [afr-open.c:327:afr_openfd_sh_unwind] 0-replicate-replicate-0: fd not open on any subvolume 0x7fe909075120 (/file1G.tmp)
###########################################################################

[2011-10-21 01:56:49.348478] I [afr-open.c:435:afr_openfd_sh] 0-replicate-replicate-0:  data missing-entry gfid self-heal triggered. path: /file1G.tmp, reason: Replicate up down flush, data lock is held
[2011-10-21 01:56:49.390116] I [afr-self-heal-common.c:1184:sh_missing_entries_create] 0-replicate-replicate-0: no missing files - /file1G.tmp. proceeding to metadata check
[2011-10-21 01:56:49.391195] I [afr-self-heal-common.c:2044:afr_self_heal_completion_cbk] 0-replicate-replicate-0: background  data missing-entry gfid self-heal completed on /file1G.tmp

###########################################################################
[2011-10-21 01:56:49.391229] W [afr-open.c:327:afr_openfd_sh_unwind] 0-replicate-replicate-0: fd not open on any subvolume 0x7fe909075120 (/file1G.tmp)
#############################################################################

[2011-10-21 01:56:49.391508] I [afr-open.c:435:afr_openfd_sh] 0-replicate-replicate-0:  data missing-entry gfid self-heal triggered. path: /file1G.tmp, reason: Replicate up down flush, data lock is held
[2011-10-21 01:56:49.432683] I [afr-self-heal-common.c:1184:sh_missing_entries_create] 0-replicate-replicate-0: no missing files - /file1G.tmp. proceeding to metadata check
[2011-10-21 01:56:49.440167] I [afr-self-heal-common.c:2044:afr_self_heal_completion_cbk] 0-replicate-replicate-0: background  data missing-entry gfid self-heal completed on /file1G.tmp

#############################################################################
[2011-10-21 01:56:49.440203] W [afr-open.c:327:afr_openfd_sh_unwind] 0-replicate-replicate-0: fd not open on any subvolume 0x7fe909075120 (/file1G.tmp)
##############################################################################

Comment 4 Anand Avati 2011-11-03 06:04:53 UTC
CHANGE: http://review.gluster.com/648 (Removes a wrong and unneeded log message in afr-open.c) merged in release-3.2 by Vijay Bellur (vijay)

Comment 5 Anand Avati 2011-11-04 13:55:34 UTC
CHANGE: http://review.gluster.com/674 (Currently if no open is required on any child of replicate, we are) merged in release-3.2 by Vijay Bellur (vijay)

Comment 6 Shwetha H Panduranga 2011-11-09 01:50:25 UTC
Verified on 3.2.5qa6 and it is working.

