Bug 1065623 - "Gluster volume status" command doesn't return to prompt if peer netwok is down
Summary: "Gluster volume status" command doesn't return to prompt if peer netwok is down
Keywords:
Status: CLOSED DUPLICATE of bug 1038261
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: pre-release
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: krishnan parthasarathi
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-02-15 08:13 UTC by Sankar Ramalingam
Modified: 2015-11-03 23:06 UTC
3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-02-17 08:49:17 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Sankar Ramalingam 2014-02-15 08:13:44 UTC
Description of problem: "gluster volume status" fails to return to the prompt and waits forever. Even when I try to kill the command, it keeps running in the background. When I run it a second time, it shows...

 gluster volume status
Another transaction is in progress. Please try again after sometime.
 

Version-Release number of selected component (if applicable): glusterfs-server-3.5.0-0.5.beta3.fc20.x86_64


How reproducible: Consistently


Steps to Reproduce:
1. Create XFS bricks on two peers: brick1 on Server1, brick2 on Server2.
2. Make sure the peers are reachable. Run "gluster peer probe Server2" on Server1, and "gluster peer probe Server1" on Server2.
3. Set up the GlusterFS volume: gluster volume create gv0 replica 2 192.168.0.71:/brick1/Share/ 192.168.0.72:/brick2/Share
4. Start the volume: gluster volume start gv0
5. Run "gluster volume status" to check the status. It shows the right information and the command exits.
6. Stop the network on Server1: service network stop
7. Run "gluster volume status" again. The command hangs and never returns to the prompt. (The steps are condensed into a script below.)

Actual results: If the peer network is down, "gluster volume status" hangs and never returns to the prompt, printing no status information.

 
Expected results: The command should exit and print an error saying that the peer is not reachable.


Additional info:

When trying to kill the command, it still runs in the background.

[root@desktop72 mnt]# gluster volume status
Another transaction is in progress. Please try again after sometime.
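
As a side note (a general workaround, not verified on this setup): the cluster-wide lock glusterd takes for a transaction is held in memory, so restarting glusterd on the node that still holds the stale lock normally releases it:

# Workaround sketch, on the node reporting "Another transaction is in
# progress" (assumes the stale lock is held locally, as the glusterd log
# in Comment 1 suggests - the lock is held by the node's own uuid):
service glusterd restart
gluster volume status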

Comment 1 Sankar Ramalingam 2014-02-15 10:46:10 UTC
A few lines from the logs. Note the call_bail entry in the glusterd log below: the lock request sent at 10:37:44 bails out only at 10:47:54, when the 600-second frame timeout expires, and by then the CLI connection has already dropped, so the reply submission fails as well.

[root@desktop72 ~]# tail -f /var/log/glusterfs/glustershd.log 
[2014-02-15 09:42:16.363072] I [client-handshake.c:450:client_set_lk_version_cbk] 0-gv0-client-0: Server lk version = 1
[2014-02-15 09:42:16.365709] E [afr-self-heald.c:1189:afr_crawl_build_start_loc] 0-gv0-replicate-0: lookup failed on index dir on gv0-client-1 - (Stale file handle)
[2014-02-15 09:52:12.453880] E [afr-self-heald.c:1189:afr_crawl_build_start_loc] 0-gv0-replicate-0: lookup failed on index dir on gv0-client-1 - (Stale file handle)
[2014-02-15 10:37:59.740454] W [socket.c:522:__socket_rwv] 0-gv0-client-0: readv on 192.168.0.71:49153 failed (Connection timed out)
[2014-02-15 10:37:59.740526] I [client.c:2208:client_rpc_notify] 0-gv0-client-0: disconnected from 192.168.0.71:49153. Client process will keep trying to connect to glusterd until brick's port is available
[2014-02-15 10:38:11.876606] E [socket.c:2161:socket_connect_finish] 0-gv0-client-0: connection to 192.168.0.71:24007 failed (No route to host)


[root@desktop72 glusterfs]# tail -f /var/log/glusterfs/etc-glusterfs-glusterd.vol.log 
[2014-02-15 10:37:39.905258] I [glusterd-handler.c:1169:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2014-02-15 10:37:39.905970] I [glusterd-handler.c:1169:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2014-02-15 10:40:03.392695] E [glusterd-utils.c:153:glusterd_lock] 0-management: Unable to get lock for uuid: c65f4430-970b-4f5c-a7ad-1aa1fb5f8bad, lock held by: c65f4430-970b-4f5c-a7ad-1aa1fb5f8bad
[2014-02-15 10:40:03.392738] E [glusterd-syncop.c:1221:gd_sync_task_begin] 0-management: Unable to acquire lock
[2014-02-15 10:47:54.127134] E [rpc-clnt.c:208:call_bail] 0-management: bailing out frame type(glusterd mgmt) op(--(1)) xid = 0x20 sent = 2014-02-15 10:37:44.172713. timeout = 600 for 192.168.0.71:24007
[2014-02-15 10:47:54.127202] E [glusterd-syncop.c:101:gd_collate_errors] 0-: Locking failed on 192.168.0.71. Please check log file for details.
[2014-02-15 10:47:54.127325] E [glusterd-syncop.c:863:gd_lock_op_phase] 0-management: Failed to acquire lock
[2014-02-15 10:47:54.127414] I [socket.c:3134:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2014-02-15 10:47:54.127431] E [rpcsvc.c:1206:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: GlusterD svc cli, ProgVers: 2, Proc: 27) to rpc-transport (socket.management)
[2014-02-15 10:47:54.127455] E [glusterd-utils.c:407:glusterd_submit_reply] 0-: Reply submission failed

Comment 2 SATHEESARAN 2014-02-17 08:49:17 UTC
This bug, as far as I know, is fixed by the ping-timer implementation.
There is also an upstream bug for the same issue - https://bugzilla.redhat.com/show_bug.cgi?id=1038261
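
For reference, a minimal sketch of the tunable that the ping-timer change introduces, assuming a release that carries the fix (the option name matches later glusterd.vol files; the 30-second value is illustrative, not from this report):

# /etc/glusterfs/glusterd.vol
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket,rdma
    # Declare an unresponsive peer dead after 30s instead of waiting for
    # the 600s frame timeout seen in the logs above (0 disables the timer).
    option ping-timeout 30
end-volume

With the ping timer armed, glusterd should notice the dead peer within the timeout and fail the transaction cleanly instead of leaving the CLI hanging until the frame bails out.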

So marking this bug as DUP

*** This bug has been marked as a duplicate of bug 1038261 ***

