| Summary: | afr-open.c reports the warning message "None of the files are open" even though the files are already open. | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Shwetha H Panduranga <shwetha> |
| Component: | replicate | Assignee: | Kaushal <kaushal> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | mainline | CC: | gluster-bugs, vijay |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | 3.2.5qa6 | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
afr has had a lot of changes since this bug was reported. This bug doesn't occur now.

The bug still exists on 3.2.4. Client log:

```
[2011-10-14 01:56:56.884450] W [socket.c:1494:__socket_proto_state_machine] 0-replicate_vol-client-0: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.11.111:24009)
[2011-10-14 01:56:56.884641] I [client.c:1883:client_rpc_notify] 0-replicate_vol-client-0: disconnected
[2011-10-14 01:57:10.757365] E [socket.c:1685:socket_connect_finish] 0-replicate_vol-client-0: connection to 10.1.11.111:24009 failed (Connection refused)
[2011-10-14 01:57:25.777427] I [client-handshake.c:1090:select_server_supported_programs] 0-replicate_vol-client-0: Using Program GlusterFS 3.2.4, Num (1298437), Version (310)
[2011-10-14 01:57:25.777780] I [client-handshake.c:913:client_setvolume_cbk] 0-replicate_vol-client-0: Connected to 10.1.11.111:24009, attached to remote volume '/home/export1'.
[2011-10-14 01:57:25.777815] I [client-handshake.c:779:client_post_handshake] 0-replicate_vol-client-0: 1 fds open - Delaying child_up until they are re-opened
[2011-10-14 01:57:25.778385] I [client-lk.c:617:decrement_reopen_fd_count] 0-replicate_vol-client-0: last fd open'd/lock-self-heal'd - notifying CHILD-UP
###############################################################################
[2011-10-14 01:57:25.799111] W [afr-open.c:624:afr_openfd_flush] 0-replicate_vol-replicate-0: fd not open on any subvolume 0x7f488cfd2024 (/file3G.tmp)
###############################################################################
[2011-10-14 01:57:25.799564] I [afr-open.c:435:afr_openfd_sh] 0-replicate_vol-replicate-0: data self-heal triggered. path: /file3G.tmp, reason: Replicate up down flush, data lock is held
[2011-10-14 01:57:25.800204] I [afr-self-heal-common.c:1233:sh_missing_entries_create] 0-replicate_vol-replicate-0: no missing files - /file3G.tmp. proceeding to metadata check
[2011-10-14 01:57:25.899580] W [dict.c:418:dict_unref] (-->/opt/glusterfs/3.2.4/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7f4891997365] (-->/opt/glusterfs/3.2.4/lib64/glusterfs/3.2.4/xlator/protocol/client.so(client3_1_fstat_cbk+0x33b) [0x7f488ea5fa9b] (-->/opt/glusterfs/3.2.4/lib64/glusterfs/3.2.4/xlator/cluster/replicate.so(afr_sh_data_fstat_cbk+0x131) [0x7f488e8094b1]))) 0-dict: dict is NULL
[2011-10-14 01:57:32.348810] I [afr-self-heal-algorithm.c:520:sh_diff_loop_driver_done] 0-replicate_vol-replicate-0: diff self-heal on /file3G.tmp: completed. (1728 blocks of 3512 were different (49.20%))
[2011-10-14 01:57:32.716880] W [afr-common.c:122:afr_set_split_brain] (-->/opt/glusterfs/3.2.4/lib64/glusterfs/3.2.4/xlator/cluster/replicate.so(afr_sh_data_flush_cbk+0x79) [0x7f488e809749] (-->/opt/glusterfs/3.2.4/lib64/glusterfs/3.2.4/xlator/cluster/replicate.so(afr_sh_data_done+0x50) [0x7f488e807b20] (-->/opt/glusterfs/3.2.4/lib64/glusterfs/3.2.4/xlator/cluster/replicate.so(afr_self_heal_completion_cbk+0x271) [0x7f488e80d981]))) 0-replicate_vol-replicate-0: invalid argument: inode
[2011-10-14 01:57:32.716982] I [afr-self-heal-common.c:1557:afr_self_heal_completion_cbk] 0-replicate_vol-replicate-0: background data data self-heal completed on /file3G.tmp
```

The bug still exists on 3.2.5qa2:

```
[2011-10-21 01:56:36.568442] I [client-handshake.c:913:client_setvolume_cbk] 0-replicate-client-1: Connected to 10.1.11.111:24013, attached to remote volume '/export1'.
[2011-10-21 01:56:36.568496] I [client-handshake.c:779:client_post_handshake] 0-replicate-client-1: 1 fds open - Delaying child_up until they are re-opened
[2011-10-21 01:56:36.587672] I [client-lk.c:617:decrement_reopen_fd_count] 0-replicate-client-1: last fd open'd/lock-self-heal'd - notifying CHILD-UP
#############################################################################
[2011-10-21 01:56:36.589563] W [afr-open.c:624:afr_openfd_flush] 0-replicate-replicate-0: fd not open on any subvolume 0x7fe909075120 (/file1G.tmp)
[2011-10-21 01:56:36.589651] W [afr-open.c:624:afr_openfd_flush] 0-replicate-replicate-0: fd not open on any subvolume 0x7fe909075120 (/file1G.tmp)
[2011-10-21 01:56:36.589751] W [afr-open.c:624:afr_openfd_flush] 0-replicate-replicate-0: fd not open on any subvolume 0x7fe909075120 (/file1G.tmp)
##############################################################################
[2011-10-21 01:56:36.590450] I [afr-open.c:435:afr_openfd_sh] 0-replicate-replicate-0: data missing-entry gfid self-heal triggered. path: /file1G.tmp, reason: Replicate up down flush, data lock is held
[2011-10-21 01:56:36.591137] I [afr-self-heal-common.c:1184:sh_missing_entries_create] 0-replicate-replicate-0: no missing files - /file1G.tmp. proceeding to metadata check
[2011-10-21 01:56:36.610543] W [dict.c:418:dict_unref] (-->/opt/glusterfs/3.2.5qa2/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa2) [0x7fe90da38502] (-->/opt/glusterfs/3.2.5qa2/lib64/glusterfs/3.2.5qa2/xlator/protocol/client.so(client3_1_fstat_cbk+0x2d9) [0x7fe90ab04949] (-->/opt/glusterfs/3.2.5qa2/lib64/glusterfs/3.2.5qa2/xlator/cluster/replicate.so(afr_sh_data_fstat_cbk+0x19a) [0x7fe90a8a9dca]))) 0-dict: dict is NULL
[2011-10-21 01:56:45.833143] I [afr-self-heal-algorithm.c:520:sh_diff_loop_driver_done] 0-replicate-replicate-0: diff self-heal on /file1G.tmp: completed. (1885 blocks of 3685 were different (51.15%))
[2011-10-21 01:56:49.348103] I [afr-self-heal-common.c:2044:afr_self_heal_completion_cbk] 0-replicate-replicate-0: background data missing-entry gfid self-heal completed on /file1G.tmp
###########################################################################
[2011-10-21 01:56:49.348197] W [afr-open.c:327:afr_openfd_sh_unwind] 0-replicate-replicate-0: fd not open on any subvolume 0x7fe909075120 (/file1G.tmp)
###########################################################################
[2011-10-21 01:56:49.348478] I [afr-open.c:435:afr_openfd_sh] 0-replicate-replicate-0: data missing-entry gfid self-heal triggered. path: /file1G.tmp, reason: Replicate up down flush, data lock is held
[2011-10-21 01:56:49.390116] I [afr-self-heal-common.c:1184:sh_missing_entries_create] 0-replicate-replicate-0: no missing files - /file1G.tmp. proceeding to metadata check
[2011-10-21 01:56:49.391195] I [afr-self-heal-common.c:2044:afr_self_heal_completion_cbk] 0-replicate-replicate-0: background data missing-entry gfid self-heal completed on /file1G.tmp
###########################################################################
[2011-10-21 01:56:49.391229] W [afr-open.c:327:afr_openfd_sh_unwind] 0-replicate-replicate-0: fd not open on any subvolume 0x7fe909075120 (/file1G.tmp)
#############################################################################
[2011-10-21 01:56:49.391508] I [afr-open.c:435:afr_openfd_sh] 0-replicate-replicate-0: data missing-entry gfid self-heal triggered. path: /file1G.tmp, reason: Replicate up down flush, data lock is held
[2011-10-21 01:56:49.432683] I [afr-self-heal-common.c:1184:sh_missing_entries_create] 0-replicate-replicate-0: no missing files - /file1G.tmp. proceeding to metadata check
[2011-10-21 01:56:49.440167] I [afr-self-heal-common.c:2044:afr_self_heal_completion_cbk] 0-replicate-replicate-0: background data missing-entry gfid self-heal completed on /file1G.tmp
#############################################################################
[2011-10-21 01:56:49.440203] W [afr-open.c:327:afr_openfd_sh_unwind] 0-replicate-replicate-0: fd not open on any subvolume 0x7fe909075120 (/file1G.tmp)
##############################################################################
```

CHANGE: http://review.gluster.com/648 (Removes a wrong and unneeded log message in afr-open.c) merged in release-3.2 by Vijay Bellur (vijay)

CHANGE: http://review.gluster.com/674 (Currently if no open is required on any child of replicate, we are) merged in release-3.2 by Vijay Bellur (vijay)

Verified on 3.2.5qa6 and it is working.
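A quick way to check a fixed build is to grep the client mount log for the spurious warning and confirm the count is zero after a brick bounce. The demo below greps an inline sample line rather than a real log; the actual client log path would be something like `/var/log/glusterfs/<mountpoint>.log` (an assumption, adjust to your setup):

```shell
# Demo: count occurrences of the spurious afr-open.c warning in a log.
# sample-client.log is a stand-in for the real client mount log
# (e.g. /var/log/glusterfs/<mountpoint>.log -- hypothetical path).
printf '%s\n' \
  "[2011-10-21 01:56:36.589563] W [afr-open.c:624:afr_openfd_flush] 0-replicate-replicate-0: fd not open on any subvolume 0x7fe909075120 (/file1G.tmp)" \
  > sample-client.log

# On a fixed build this count should be 0 after bringing a brick
# down and back up during I/O.
count=$(grep -c "fd not open on any subvolume" sample-client.log)
echo "spurious warnings: $count"
```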
Found the issue while executing the following steps:

1) Create a replicate volume ("replicate_volume") with 2 bricks ("brick1" and "brick2")
2) Mount replicate_volume from a client
3) Execute `dd if=/dev/zero of=file1G.tmp bs=1M count=15` on the mount point (creating a file on the mount point)
4) Bring down one of the bricks ("brick1" or "brick2") while "dd" is in progress
5) Bring the brick back while "dd" is still in progress

Even though the fds are already open, afr-open.c reports a warning that the files are not open ("fd not open on any subvolume"), which is quite misleading.

Expected: do not print the warning message when the files are already open.
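The steps above can be sketched as a shell script. The hostnames, brick paths, mount point, and `BRICK_PID` below are hypothetical placeholders, and the script defaults to a dry run that only echoes each command; treat it as a sketch of the reproduction, not a verified recipe:

```shell
#!/bin/sh
# Sketch of the reproduction steps. server1/server2, the brick paths,
# the mount point, and BRICK_PID are hypothetical placeholders.
# DRYRUN=1 (the default) echoes each command instead of running it.
DRYRUN=${DRYRUN:-1}
run() {
    if [ "$DRYRUN" = "1" ]; then
        echo "+ $*"        # dry run: print the command only
    else
        "$@"
    fi
}

# 1) Create a 2-brick replicate volume and start it
run gluster volume create replicate_volume replica 2 \
    server1:/export/brick1 server2:/export/brick2
run gluster volume start replicate_volume

# 2) Mount the volume from a client
run mount -t glusterfs server1:/replicate_volume /mnt/replicate_volume

# 3) Write a file on the mount point (in a real run, background this
#    so steps 4 and 5 happen while dd is in progress)
run dd if=/dev/zero of=/mnt/replicate_volume/file1G.tmp bs=1M count=15

# 4) Bring one brick down while dd runs, by killing its glusterfsd
#    process (find its PID with ps, for example)
BRICK_PID=12345   # hypothetical placeholder
run kill -KILL "$BRICK_PID"

# 5) Bring the brick back while dd is still running
run gluster volume start replicate_volume force

# Then watch the client log for the spurious
# "fd not open on any subvolume" warning from afr-open.c
```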