Bug 763637 (GLUSTER-1905)
Summary: | Mounting volume is not working when any one server is down | | |
---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Dhandapani <dgopal> |
Component: | glusterd | Assignee: | Amar Tumballi <amarts> |
Status: | CLOSED DUPLICATE | QA Contact: | |
Severity: | medium | Docs Contact: | |
Priority: | low | | |
Version: | pre-2.0 | CC: | amarts, gluster-bugs, lakshmipathi, vijay, vraman |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | All | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Dhandapani
2010-10-11 11:07:54 UTC
The client seems to retry the portmap query against the downed server every few seconds, causing the mount operation to hang.

(gdb) bt
#0 client_query_portmap_cbk (req=0x7faecf6d9280, iov=0x7faecf6d92c0, count=1, myframe=0x7faed210f0c4) at client-handshake.c:746
#1 0x00007faed377b6f8 in rpc_clnt_handle_reply (clnt=0xccca58, pollin=0xcb8238) at rpc-clnt.c:752
#2 0x00007faed377ba57 in rpc_clnt_notify (trans=0xcccc78, mydata=0xccca88, event=RPC_TRANSPORT_MSG_RECEIVED, data=0xcb8238) at rpc-clnt.c:865
#3 0x00007faed3778e68 in rpc_transport_notify (this=0xcccc78, event=RPC_TRANSPORT_MSG_RECEIVED, data=0xcb8238) at rpc-transport.c:1142
#4 0x00007faed11fae90 in socket_event_poll_in (this=0xcccc78) at socket.c:1619
#5 0x00007faed11fb243 in socket_event_handler (fd=10, idx=3, data=0xcccc78, poll_in=1, poll_out=0, poll_err=0) at socket.c:1733
#6 0x00007faed39ccc2b in event_dispatch_epoll_handler (event_pool=0xcb2b48, events=0xcb7538, i=0) at event.c:812
#7 0x00007faed39cce3b in event_dispatch_epoll (event_pool=0xcb2b48) at event.c:876
#8 0x00007faed39cd1a3 in event_dispatch (event_pool=0xcb2b48) at event.c:984
#9 0x00000000004066fc in main (argc=5, argv=0x7fff11118db8) at glusterfsd.c:1410

The error messages:

[2010-10-14 14:47:48.563727] E [client-handshake.c:773:client_query_portmap_cbk] new-client-2: failed to get the port number for remote subvolume
[2010-10-14 14:47:52.556309] E [client-handshake.c:773:client_query_portmap_cbk] new-client-2: failed to get the port number for remote subvolume
[2010-10-14 14:47:52.556855] I [client-handshake.c:699:select_server_supported_programs] new-client-1: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2010-10-14 14:47:52.557519] I [client-handshake.c:535:client_setvolume_cbk] new-client-1: Connected to 127.0.1.1:24018, attached to remote volume '/export/dir2'.
[2010-10-14 14:47:52.557772] I [client-handshake.c:699:select_server_supported_programs] new-client-0: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2010-10-14 14:47:52.558290] I [client-handshake.c:535:client_setvolume_cbk] new-client-0: Connected to 127.0.1.1:24017, attached to remote volume '/export/dir1'.
[2010-10-14 14:47:55.558037] E [client-handshake.c:773:client_query_portmap_cbk] new-client-2: failed to get the port number for remote subvolume
[2010-10-14 14:47:58.559514] E [client-handshake.c:773:client_query_portmap_cbk] new-client-2: failed to get the port number for remote subvolume
[2010-10-14 14:48:02.560978] E [client-handshake.c:773:client_query_portmap_cbk] new-client-2: failed to get the port number for remote subvolume
[2010-10-14 14:48:06.562368] E [client-handshake.c:773:client_query_portmap_cbk] new-client-2: failed to get the port number for remote subvolume
[2010-10-14 14:48:10.563552] E [client-handshake.c:773:client_query_portmap_cbk] new-client-2: failed to get the port number for remote subvolume
[2010-10-14 14:48:14.565056] E [client-handshake.c:773:client_query_portmap_cbk] new-client-2: failed to get the port number for remote subvolume
[2010-10-14 14:48:18.566797] E [client-handshake.c:773:client_query_portmap_cbk] new-client-2: failed to get the port number for remote subvolume
[2010-10-14 14:48:22.568069] E [client-handshake.c:773:client_query_portmap_cbk] new-client-2: failed to get the port number for remote subvolume
[2010-10-14 14:48:26.569469] E [client-handshake.c:773:client_query_portmap_cbk] new-client-2: failed to get the port number for remote subvolume
[2010-10-14 14:48:30.570912] E [client-handshake.c:773:client_query_portmap_cbk] new-client-2: failed to get the port number for remote subvolume

PATCH: http://patches.gluster.com/patch/5713 in master (protocol/client: skip notify if query portmap is successful)

Testing with 3.1.1qa9: created a DHT volume with 4 bricks and mounted it on a client.
Then stopped the volume and rebooted brick2. Starting the volume again does not work.

brick1# gluster volume start qa9
brick1#ps aux | grep gluste
root 21893 0.1 0.2 68252 17344 ? Ssl 03:57 0:00 glusterd
#gluster volume info
Volume Name: qa9
Type: Distribute
Status: Stopped
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: 10.209.59.112:/mnt/310
Brick2: 10.208.47.224:/mnt/310
Brick3: 10.209.163.191:/mnt/310
Brick4: 10.208.186.47:/mnt/310
#gluster volume start qa9 force
Starting volume qa9 has been unsuccessful

Log file:

[2010-11-21 03:58:39.164973] I [glusterd-handler.c:936:glusterd_handle_cli_start_volume] glusterd: Received start vol reqfor volume qa9
[2010-11-21 03:58:39.165028] I [glusterd-utils.c:232:glusterd_lock] glusterd: Cluster lock held by e8f5aa99-84d8-4bdb-be10-f79ce4e2734c
[2010-11-21 03:58:39.165044] I [glusterd-handler.c:2835:glusterd_op_txn_begin] glusterd: Acquired local lock
[2010-11-21 03:58:39.165149] I [glusterd3_1-mops.c:1091:glusterd3_1_cluster_lock] glusterd: Sent lock req to 3 peers
[2010-11-21 03:58:39.165678] I [glusterd3_1-mops.c:395:glusterd3_1_cluster_lock_cbk] glusterd: Received ACC from uuid: 3588f8a7-244e-4b2e-890f-ddfed8a9bf84
[2010-11-21 03:58:39.165703] I [glusterd-utils.c:2062:glusterd_friend_find_by_uuid] glusterd: Friend found.. state: Peer in Cluster
[2010-11-21 03:58:39.165743] I [glusterd3_1-mops.c:395:glusterd3_1_cluster_lock_cbk] glusterd: Received ACC from uuid: 049df669-aca0-4182-9de7-10afc1bc1122
[2010-11-21 03:58:39.165755] I [glusterd-utils.c:2062:glusterd_friend_find_by_uuid] glusterd: Friend found.. state: Peer in Cluster
[2010-11-21 03:58:39.165776] I [glusterd3_1-mops.c:395:glusterd3_1_cluster_lock_cbk] glusterd: Received ACC from uuid: 3f484f92-f435-4bda-9179-be1f1d68ea41
[2010-11-21 03:58:39.165788] I [glusterd-utils.c:2062:glusterd_friend_find_by_uuid] glusterd: Friend found.. state: Peer in Cluster
[2010-11-21 03:58:39.166768] I [glusterd-utils.c:2101:glusterd_friend_find_by_hostname] glusterd: Friend 10.208.47.224 found.. state: 3
[2010-11-21 03:58:39.166787] I [glusterd-utils.c:2101:glusterd_friend_find_by_hostname] glusterd: Friend 10.209.163.191 found.. state: 3
[2010-11-21 03:58:39.166800] I [glusterd-utils.c:2101:glusterd_friend_find_by_hostname] glusterd: Friend 10.208.186.47 found.. state: 3
[2010-11-21 03:58:39.166855] I [glusterd3_1-mops.c:1233:glusterd3_1_stage_op] glusterd: Sent op req to 3 peers
[2010-11-21 03:58:39.168007] I [glusterd3_1-mops.c:594:glusterd3_1_stage_op_cbk] glusterd: Received ACC from uuid: 049df669-aca0-4182-9de7-10afc1bc1122
[2010-11-21 03:58:39.168025] I [glusterd-utils.c:2062:glusterd_friend_find_by_uuid] glusterd: Friend found.. state: Peer in Cluster
[2010-11-21 03:58:39.168057] I [glusterd3_1-mops.c:594:glusterd3_1_stage_op_cbk] glusterd: Received ACC from uuid: 3f484f92-f435-4bda-9179-be1f1d68ea41
[2010-11-21 03:58:39.168072] I [glusterd-utils.c:2062:glusterd_friend_find_by_uuid] glusterd: Friend found.. state: Peer in Cluster
[2010-11-21 03:58:39.168234] I [glusterd3_1-mops.c:594:glusterd3_1_stage_op_cbk] glusterd: Received ACC from uuid: 3588f8a7-244e-4b2e-890f-ddfed8a9bf84
[2010-11-21 03:58:39.168250] I [glusterd-utils.c:2062:glusterd_friend_find_by_uuid] glusterd: Friend found.. state: Peer in Cluster
[2010-11-21 03:58:39.168323] I [glusterd-utils.c:971:glusterd_volume_start_glusterfs] : About to start glusterfs for brick 10.209.59.112:/mnt/310
[2010-11-21 03:58:39.286443] I [glusterd3_1-mops.c:1323:glusterd3_1_commit_op] glusterd: Sent op req to 3 peers
[2010-11-21 03:58:39.288953] I [glusterd-pmap.c:237:pmap_registry_bind] pmap: adding brick /mnt/310 on port 24010
[2010-11-21 03:58:39.417515] I [glusterd3_1-mops.c:717:glusterd3_1_commit_op_cbk] glusterd: Received ACC from uuid: 3588f8a7-244e-4b2e-890f-ddfed8a9bf84
[2010-11-21 03:58:39.417539] I [glusterd-utils.c:2062:glusterd_friend_find_by_uuid] glusterd: Friend found.. state: Peer in Cluster
[2010-11-21 03:58:39.426714] I [glusterd3_1-mops.c:717:glusterd3_1_commit_op_cbk] glusterd: Received ACC from uuid: 049df669-aca0-4182-9de7-10afc1bc1122
[2010-11-21 03:58:39.426736] I [glusterd-utils.c:2062:glusterd_friend_find_by_uuid] glusterd: Friend found.. state: Peer in Cluster
[2010-11-21 03:58:39.443323] I [glusterd3_1-mops.c:717:glusterd3_1_commit_op_cbk] glusterd: Received ACC from uuid: 3f484f92-f435-4bda-9179-be1f1d68ea41
[2010-11-21 03:58:39.443367] I [glusterd-utils.c:2062:glusterd_friend_find_by_uuid] glusterd: Friend found.. state: Peer in Cluster
[2010-11-21 03:58:39.443433] I [glusterd3_1-mops.c:1145:glusterd3_1_cluster_unlock] glusterd: Sent unlock req to 3 peers
[2010-11-21 03:58:39.443712] I [glusterd3_1-mops.c:456:glusterd3_1_cluster_unlock_cbk] glusterd: Received ACC from uuid: 3f484f92-f435-4bda-9179-be1f1d68ea41
[2010-11-21 03:58:39.443730] I [glusterd-utils.c:2062:glusterd_friend_find_by_uuid] glusterd: Friend found.. state: Peer in Cluster
[2010-11-21 03:58:39.444085] I [glusterd3_1-mops.c:456:glusterd3_1_cluster_unlock_cbk] glusterd: Received ACC from uuid: 049df669-aca0-4182-9de7-10afc1bc1122
[2010-11-21 03:58:39.444103] I [glusterd-utils.c:2062:glusterd_friend_find_by_uuid] glusterd: Friend found.. state: Peer in Cluster
[2010-11-21 03:58:39.444132] I [glusterd3_1-mops.c:456:glusterd3_1_cluster_unlock_cbk] glusterd: Received ACC from uuid: 3588f8a7-244e-4b2e-890f-ddfed8a9bf84
[2010-11-21 03:58:39.444148] I [glusterd-utils.c:2062:glusterd_friend_find_by_uuid] glusterd: Friend found.. state: Peer in Cluster
[2010-11-21 03:58:39.444167] I [glusterd-op-sm.c:4736:glusterd_op_txn_complete] glusterd: Cleared local lock
[2010-11-21 03:59:55.232350] I [glusterd-handler.c:965:glusterd_handle_cli_stop_volume] glusterd: Received stop vol reqfor volume qa9
[2010-11-21 03:59:55.232411] I [glusterd-utils.c:232:glusterd_lock] glusterd: Cluster lock held by e8f5aa99-84d8-4bdb-be10-f79ce4e2734c
[2010-11-21 03:59:55.232425] I [glusterd-handler.c:2835:glusterd_op_txn_begin] glusterd: Acquired local lock
[2010-11-21 03:59:55.232522] I [glusterd3_1-mops.c:1091:glusterd3_1_cluster_lock] glusterd: Sent lock req to 3 peers
[2010-11-21 03:59:55.233061] I [glusterd3_1-mops.c:395:glusterd3_1_cluster_lock_cbk] glusterd: Received ACC from uuid: 049df669-aca0-4182-9de7-10afc1bc1122
[2010-11-21 03:59:55.233084] I [glusterd-utils.c:2062:glusterd_friend_find_by_uuid] glusterd: Friend found.. state: Peer in Cluster
[2010-11-21 03:59:55.233217] I [glusterd3_1-mops.c:395:glusterd3_1_cluster_lock_cbk] glusterd: Received ACC from uuid: 3f484f92-f435-4bda-9179-be1f1d68ea41
[2010-11-21 03:59:55.233239] I [glusterd-utils.c:2062:glusterd_friend_find_by_uuid] glusterd: Friend found.. state: Peer in Cluster
[2010-11-21 03:59:55.233261] I [glusterd3_1-mops.c:395:glusterd3_1_cluster_lock_cbk] glusterd: Received ACC from uuid: 3588f8a7-244e-4b2e-890f-ddfed8a9bf84
[2010-11-21 03:59:55.233273] I [glusterd-utils.c:2062:glusterd_friend_find_by_uuid] glusterd: Friend found.. state: Peer in Cluster
[2010-11-21 03:59:55.233343] I [glusterd3_1-mops.c:1233:glusterd3_1_stage_op] glusterd: Sent op req to 3 peers
[2010-11-21 03:59:55.233706] I [glusterd3_1-mops.c:594:glusterd3_1_stage_op_cbk] glusterd: Received ACC from uuid: 049df669-aca0-4182-9de7-10afc1bc1122
[2010-11-21 03:59:55.233724] I [glusterd-utils.c:2062:glusterd_friend_find_by_uuid] glusterd: Friend found.. state: Peer in Cluster
[2010-11-21 03:59:55.233755] I [glusterd3_1-mops.c:594:glusterd3_1_stage_op_cbk] glusterd: Received ACC from uuid: 3588f8a7-244e-4b2e-890f-ddfed8a9bf84
[2010-11-21 03:59:55.233771] I [glusterd-utils.c:2062:glusterd_friend_find_by_uuid] glusterd: Friend found.. state: Peer in Cluster
[2010-11-21 03:59:55.233813] I [glusterd3_1-mops.c:594:glusterd3_1_stage_op_cbk] glusterd: Received ACC from uuid: 3f484f92-f435-4bda-9179-be1f1d68ea41
[2010-11-21 03:59:55.233825] I [glusterd-utils.c:2062:glusterd_friend_find_by_uuid] glusterd: Friend found.. state: Peer in Cluster
[2010-11-21 03:59:55.233842] I [glusterd-utils.c:2219:glusterd_brick_stop] : About to stop glusterfs for brick 10.209.59.112:/mnt/310
[2010-11-21 03:59:55.233912] I [glusterd-utils.c:854:glusterd_service_stop] : Stopping gluster brick running in pid: 21937
[2010-11-21 03:59:55.242369] I [glusterd-utils.c:854:glusterd_service_stop] : Stopping gluster nfsd running in pid: 21942
[2010-11-21 03:59:56.248626] I [glusterd3_1-mops.c:1323:glusterd3_1_commit_op] glusterd: Sent op req to 3 peers
[2010-11-21 03:59:56.248780] I [glusterd-pmap.c:281:pmap_registry_remove] pmap: removing brick /mnt/310 on port 24010
[2010-11-21 03:59:57.257197] I [glusterd3_1-mops.c:717:glusterd3_1_commit_op_cbk] glusterd: Received ACC from uuid: 049df669-aca0-4182-9de7-10afc1bc1122
[2010-11-21 03:59:57.257227] I [glusterd-utils.c:2062:glusterd_friend_find_by_uuid] glusterd: Friend found.. state: Peer in Cluster
[2010-11-21 03:59:57.258344] I [glusterd3_1-mops.c:717:glusterd3_1_commit_op_cbk] glusterd: Received ACC from uuid: 3f484f92-f435-4bda-9179-be1f1d68ea41
[2010-11-21 03:59:57.258362] I [glusterd-utils.c:2062:glusterd_friend_find_by_uuid] glusterd: Friend found.. state: Peer in Cluster
[2010-11-21 03:59:57.262791] I [glusterd3_1-mops.c:717:glusterd3_1_commit_op_cbk] glusterd: Received ACC from uuid: 3588f8a7-244e-4b2e-890f-ddfed8a9bf84
[2010-11-21 03:59:57.262812] I [glusterd-utils.c:2062:glusterd_friend_find_by_uuid] glusterd: Friend found.. state: Peer in Cluster
[2010-11-21 03:59:57.262876] I [glusterd3_1-mops.c:1145:glusterd3_1_cluster_unlock] glusterd: Sent unlock req to 3 peers
[2010-11-21 03:59:57.263214] I [glusterd3_1-mops.c:456:glusterd3_1_cluster_unlock_cbk] glusterd: Received ACC from uuid: 049df669-aca0-4182-9de7-10afc1bc1122
[2010-11-21 03:59:57.263232] I [glusterd-utils.c:2062:glusterd_friend_find_by_uuid] glusterd: Friend found.. state: Peer in Cluster
[2010-11-21 03:59:57.263302] I [glusterd3_1-mops.c:456:glusterd3_1_cluster_unlock_cbk] glusterd: Received ACC from uuid: 3588f8a7-244e-4b2e-890f-ddfed8a9bf84
[2010-11-21 03:59:57.263315] I [glusterd-utils.c:2062:glusterd_friend_find_by_uuid] glusterd: Friend found.. state: Peer in Cluster
[2010-11-21 03:59:57.263402] I [glusterd3_1-mops.c:456:glusterd3_1_cluster_unlock_cbk] glusterd: Received ACC from uuid: 3f484f92-f435-4bda-9179-be1f1d68ea41
[2010-11-21 03:59:57.263418] I [glusterd-utils.c:2062:glusterd_friend_find_by_uuid] glusterd: Friend found.. state: Peer in Cluster
[2010-11-21 03:59:57.263437] I [glusterd-op-sm.c:4736:glusterd_op_txn_complete] glusterd: Cleared local lock
[2010-11-21 04:00:11.54797] I [glusterd-handler.c:716:glusterd_handle_cli_get_volume] glusterd: Received get vol req
[2010-11-21 04:00:11.55503] I [glusterd-handler.c:716:glusterd_handle_cli_get_volume] glusterd: Received get vol req
[2010-11-21 04:00:20.628679] I [glusterd-handler.c:936:glusterd_handle_cli_start_volume] glusterd: Received start vol reqfor volume qa9
[2010-11-21 04:00:20.628732] I [glusterd-utils.c:232:glusterd_lock] glusterd: Cluster lock held by e8f5aa99-84d8-4bdb-be10-f79ce4e2734c
[2010-11-21 04:00:20.628746] I [glusterd-handler.c:2835:glusterd_op_txn_begin] glusterd: Acquired local lock
[2010-11-21 04:00:20.628845] I [glusterd3_1-mops.c:1091:glusterd3_1_cluster_lock] glusterd: Sent lock req to 3 peers
[2010-11-21 04:00:20.629769] I [glusterd3_1-mops.c:395:glusterd3_1_cluster_lock_cbk] glusterd: Received ACC from uuid: 049df669-aca0-4182-9de7-10afc1bc1122
[2010-11-21 04:00:20.629788] I [glusterd-utils.c:2062:glusterd_friend_find_by_uuid] glusterd: Friend found.. state: Peer in Cluster
[2010-11-21 04:00:20.629812] I [glusterd3_1-mops.c:395:glusterd3_1_cluster_lock_cbk] glusterd: Received ACC from uuid: 3f484f92-f435-4bda-9179-be1f1d68ea41
[2010-11-21 04:00:20.629824] I [glusterd-utils.c:2062:glusterd_friend_find_by_uuid] glusterd: Friend found.. state: Peer in Cluster
[2010-11-21 04:00:48.569291] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x2af151fd8579] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x2af151fd7d2e] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x2af151fd7c9e]))) rpc-clnt: forced unwinding frame type(Mgmt 3.1) op(--(3)) called at 2010-11-21 04:00:20.628829
[2010-11-21 04:00:52.268747] E [socket.c:1657:socket_connect_finish] management: connection to 10.208.186.47:24007 failed (Connection refused)

But after running killall glusterd on brick1/brick3/brick4 and starting glusterd again, it worked.

#gluster volume info
Volume Name: qa9
Type: Distribute
Status: Stopped
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: 10.209.59.112:/mnt/310
Brick2: 10.208.47.224:/mnt/310
Brick3: 10.209.163.191:/mnt/310
Brick4: 10.208.186.47:/mnt/310
[root@epsilon ~] gluster volume start qa9
Starting volume qa9 has been successful
[root@epsilon ~] gluster volume stop qa9
Stopping volume will make its data inaccessible. Do you want to Continue? (y/n) y
Stopping volume qa9 has been successful

here--> rebooted brick2

[root@epsilon ~] gluster volume start qa9
[root@epsilon ~] gluster volume start qa9 force
Starting volume qa9 has been unsuccessful

Kill glusterd on all remaining bricks (1, 3, 4):

[root@epsilon ~] killall glusterd

and start again:

[root@epsilon ~] glusterd
[root@epsilon ~] gluster volume start qa9
Starting volume qa9 has been successful
[root@epsilon ~] gluster volume info
Volume Name: qa9
Type: Distribute
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: 10.209.59.112:/mnt/310
Brick2: 10.208.47.224:/mnt/310
Brick3: 10.209.163.191:/mnt/310
Brick4: 10.208.186.47:/mnt/310
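
The description says the client retries the portmap query "every few seconds". One way to check that cadence against a client log is sketched below. This is only an illustration, not part of GlusterFS: the function name `portmap_failure_gaps`, the regular expression, and the `SAMPLE` list are assumptions made for this example, and the sample lines are copied from the client log quoted in the description.

```python
import re
from datetime import datetime

# Assumed layout of a client log line, based on the entries quoted in this report:
# [timestamp] LEVEL [file:line:function] xlator-name: message
LOG_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) \[(?P<src>[^\]]+)\] (?P<msg>.*)"
)


def portmap_failure_gaps(lines, needle="failed to get the port number"):
    """Return, per client xlator, the seconds between consecutive portmap failures."""
    last_seen = {}
    gaps = {}
    for line in lines:
        match = LOG_RE.match(line.strip())
        if not match or needle not in match.group("msg"):
            continue  # skip unrelated or unparseable lines
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
        client = match.group("msg").split(":", 1)[0]
        if client in last_seen:
            delta = (ts - last_seen[client]).total_seconds()
            gaps.setdefault(client, []).append(delta)
        last_seen[client] = ts
    return gaps


# Lines copied from the client log in the description.
SAMPLE = [
    "[2010-10-14 14:47:48.563727] E [client-handshake.c:773:client_query_portmap_cbk] new-client-2: failed to get the port number for remote subvolume",
    "[2010-10-14 14:47:52.556309] E [client-handshake.c:773:client_query_portmap_cbk] new-client-2: failed to get the port number for remote subvolume",
    "[2010-10-14 14:47:52.557519] I [client-handshake.c:535:client_setvolume_cbk] new-client-1: Connected to 127.0.1.1:24018, attached to remote volume '/export/dir2'.",
    "[2010-10-14 14:47:55.558037] E [client-handshake.c:773:client_query_portmap_cbk] new-client-2: failed to get the port number for remote subvolume",
]

if __name__ == "__main__":
    for client, values in portmap_failure_gaps(SAMPLE).items():
        print(client, [round(v, 3) for v in values])
```

On the quoted log, the gaps for new-client-2 come out at roughly 3 to 4 seconds, which matches the retry behaviour described at the top of the report.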