Bug 816822

Summary: [bb55a0c967a829a0b5eb5a4883d86540511a9d1c]: dbench keeps on running on nfs mount
Product: [Community] GlusterFS
Reporter: Raghavendra Bhat <rabhat>
Component: nfs
Assignee: Rajesh <rajesh>
Status: CLOSED NOTABUG
Severity: unspecified
Priority: urgent
Version: mainline
CC: gluster-bugs, vagarwal, vbellur
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Doc Type: Bug Fix
Last Closed: 2012-04-27 11:55:55 UTC
Type: Bug

Description Raghavendra Bhat 2012-04-27 05:55:12 UTC
Description of problem:

A replicate volume with replica count 3 was mounted via NFS. The POSIX compliance tests were run on the mount, and one of the bricks was brought down while they were running. After the POSIX compliance tests finished, dbench was started.

dbench never exits: it stays in the cleanup phase because it is not able to release all of its clients.

These are the logs of the nfs server when dbench entered the cleanup phase.

[2012-04-27 11:14:43.380407] D [socket.c:1807:socket_event_handler] 0-transport: disconnecting now
[2012-04-27 11:14:46.380490] D [name.c:158:client_fill_address_family] 0-mirror-client-0: address-family not specified, guessing it to be inet/inet6
[2012-04-27 11:14:46.381174] D [common-utils.c:160:gf_resolve_ip6] 0-resolver: returning ip-127.0.0.1 (port-24007) for hostname: hyperspace and port: 24007
[2012-04-27 11:14:46.381362] D [socket.c:289:__socket_disconnect] 0-mirror-client-0: shutdown() returned -1. Transport endpoint is not connected
[2012-04-27 11:14:46.381442] D [socket.c:193:__socket_rwv] 0-mirror-client-0: EOF from peer 127.0.0.1:24009
[2012-04-27 11:14:46.381480] D [socket.c:1521:__socket_proto_state_machine] 0-mirror-client-0: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.0.1:24009)
[2012-04-27 11:14:46.381512] D [socket.c:1807:socket_event_handler] 0-transport: disconnecting now
[2012-04-27 11:14:49.381640] D [name.c:158:client_fill_address_family] 0-mirror-client-0: address-family not specified, guessing it to be inet/inet6
[2012-04-27 11:14:49.382522] D [common-utils.c:160:gf_resolve_ip6] 0-resolver: returning ip-127.0.0.1 (port-24007) for hostname: hyperspace and port: 24007
[2012-04-27 11:14:49.382650] D [socket.c:289:__socket_disconnect] 0-mirror-client-0: shutdown() returned -1. Transport endpoint is not connected
[2012-04-27 11:14:49.382676] D [socket.c:193:__socket_rwv] 0-mirror-client-0: EOF from peer 127.0.0.1:24009
[2012-04-27 11:14:49.382699] D [socket.c:1521:__socket_proto_state_machine] 0-mirror-client-0: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.0.1:24009)
[2012-04-27 11:14:49.382744] D [socket.c:1807:socket_event_handler] 0-transport: disconnecting now
[2012-04-27 11:14:51.272873] D [client.c:189:client_submit_request] 0-mirror-client-0: connection in disconnected state
[2012-04-27 11:14:51.272999] W [client3_1-fops.c:2173:client3_1_lk_cbk] 0-mirror-client-0: remote operation failed: Transport endpoint is not connected
[2012-04-27 11:14:51.273829] D [fd-lk.c:465:fd_lk_insert_and_merge] 0-fd-lk: new lock requrest: owner = 3040687970657273-70616365, fl_type = F_WRLCK, fs_start = 2147483538, fs_end = 2147483538, user_flock: l_type = F_WRLCK, l_start = 2147483538, l_len = 1
[2012-04-27 11:14:51.273915] D [fd-lk.c:428:print_lock_list] 0-fd-lk: lock list:
[2012-04-27 11:14:51.273953] D [fd-lk.c:440:print_lock_list] 0-fd-lk: owner = 3040687970657273-70616365, cmd = F_SETLKW fl_type = F_WRLCK, fs_start = 2147483538, fs_end = 2147483538, user_flock: l_type = F_WRLCK, l_start = 2147483538, l_len = 1, 
[2012-04-27 11:14:51.274143] E [rpcsvc.c:212:rpcsvc_program_actor] 0-rpc-service: RPC Program procedure not available for procedure 5 in NLM4
[2012-04-27 11:14:51.274255] D [rpcsvc.c:1119:rpcsvc_error_reply] (-->/usr/local/lib/libgfrpc.so.0(rpc_transport_notify+0x115) [0x7ffda60a0025] (-->/usr/local/lib/libgfrpc.so.0(rpcsvc_notify+0x16a) [0x7ffda609a4de] (-->/usr/local/lib/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x37b) [0x7ffda609a17b]))) 0-: sending a RPC error reply
[2012-04-27 11:14:52.382938] D [name.c:158:client_fill_address_family] 0-mirror-client-0: address-family not specified, guessing it to be inet/inet6
[2012-04-27 11:14:52.383655] D [common-utils.c:160:gf_resolve_ip6] 0-resolver: returning ip-127.0.0.1 (port-24007) for hostname: hyperspace and port: 24007
[2012-04-27 11:14:52.383867] D [socket.c:289:__socket_disconnect] 0-mirror-client-0: shutdown() returned -1. Transport endpoint is not connected
[2012-04-27 11:14:52.383957] D [socket.c:193:__socket_rwv] 0-mirror-client-0: EOF from peer 127.0.0.1:24009
[2012-04-27 11:14:52.383997] D [socket.c:1521:__socket_proto_state_machine] 0-mirror-client-0: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.0.1:24009)
[2012-04-27 11:14:52.384029] D [socket.c:1807:socket_event_handler] 0-transport: disconnecting now
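
To separate the few warning and error entries from the repeated reconnect noise in a log like the one above, something along these lines may help. It is only a sketch: the line layout ([timestamp] LEVEL [file:line:function] message) is inferred from this excerpt rather than taken from GlusterFS documentation, and the names parse_line, summarize and SAMPLE are made up for illustration.

```python
import re
from collections import Counter

# Layout assumed from the excerpt in this report:
#   [timestamp] LEVEL [file:line:function] message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<rest>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def summarize(lines):
    """Count entries per log level and collect the W (warning) and E (error) entries."""
    levels = Counter()
    problems = []
    for raw in lines:
        entry = parse_line(raw)
        if entry is None:
            continue  # skip blank lines and anything that is not a log entry
        levels[entry["level"]] += 1
        if entry["level"] in ("W", "E"):
            problems.append(entry)
    return levels, problems


# Three lines copied from the log in this report.
SAMPLE = [
    "[2012-04-27 11:14:51.272873] D [client.c:189:client_submit_request] "
    "0-mirror-client-0: connection in disconnected state",
    "[2012-04-27 11:14:51.272999] W [client3_1-fops.c:2173:client3_1_lk_cbk] "
    "0-mirror-client-0: remote operation failed: Transport endpoint is not connected",
    "[2012-04-27 11:14:51.274143] E [rpcsvc.c:212:rpcsvc_program_actor] "
    "0-rpc-service: RPC Program procedure not available for procedure 5 in NLM4",
]

if __name__ == "__main__":
    levels, problems = summarize(SAMPLE)
    print(dict(levels))
    for entry in problems:
        print(entry["ts"], entry["level"], entry["func"], entry["rest"])
```

Run against the full log, this would leave only the failed lk callback on mirror-client-0 and the NLM4 "procedure 5" error, which are the two entries that stand out from the reconnect attempts.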

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create a replicate volume and mount it via nfs
2. Start running the POSIX compliance tests and bring one of the bricks down while they are running.
3. Start running dbench (in my case options were dbench -t 100 10)
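
The steps above could be scripted roughly as follows. This is a sketch under assumptions, not the exact commands used for this report: the mount point /mnt/nfs and the vers=3 mount option are guesses, the helper name reproduction_commands is hypothetical, and the brick still has to be brought down by hand while the tests run. The function only builds the command lines; it does not execute anything.

```python
import shlex


def reproduction_commands(host, volume, bricks, mount_point):
    """Build, without running them, the commands for the reproduction steps.

    host, volume and bricks come from the 'gluster volume info' output in
    this report; mount_point is an assumed local directory.
    """
    brick_args = [f"{host}:{path}" for path in bricks]
    return [
        # Step 1: create and start a replicate volume, then mount it via NFS.
        ["gluster", "volume", "create", volume,
         "replica", str(len(bricks)), *brick_args],
        ["gluster", "volume", "start", volume],
        # vers=3 is an assumption; the report only says "mounted via nfs".
        ["mount", "-t", "nfs", "-o", "vers=3", f"{host}:/{volume}", mount_point],
        # Step 2 (manual): run the tests and bring one brick down meanwhile.
        # Step 3: dbench with the options quoted in the report, started from
        # inside the mount point.
        ["dbench", "-t", "100", "10"],
    ]


if __name__ == "__main__":
    commands = reproduction_commands(
        host="hyperspace",
        volume="mirror",
        bricks=["/mnt/sda7/export3", "/mnt/sda8/export3", "/mnt/sda10/export3"],
        mount_point="/mnt/nfs",  # hypothetical mount point
    )
    for cmd in commands:
        print(shlex.join(cmd))
```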
  
Actual results:

dbench keeps running because it is not able to release all of its clients.

Expected results:

dbench should release all of its clients in the cleanup phase and exit successfully.

Additional info:
Tried the same tests (i.e. the POSIX compliance tests followed by dbench) on a distribute volume mounted via NFS. Both tests ran successfully.


gluster volume info
 
Volume Name: mirror
Type: Replicate
Volume ID: e68ec23f-140e-46fd-9d21-e2662dc175f9
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: hyperspace:/mnt/sda7/export3
Brick2: hyperspace:/mnt/sda8/export3
Brick3: hyperspace:/mnt/sda10/export3
Options Reconfigured:
diagnostics.client-log-level: DEBUG
 
Volume Name: vol
Type: Distribute
Volume ID: 5f8f1099-eee2-4596-89a4-66cd03036a25
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: hyperspace:/mnt/sda7/export4
Brick2: hyperspace:/mnt/sda8/export4

Comment 1 Vijay Bellur 2012-04-27 11:55:55 UTC
The problem is due to the client and server being on the same machine. Please re-open if you can reproduce this with the client and server on different machines.