Gluster CLI commands fail with "operation failed" when a node is off the network, and keep failing until the glusterd service is restarted.

When the communication path between two Gluster nodes is disrupted and any CLI command that changes the configuration is attempted, it fails with "operation failed". This situation persists until the local 'glusterd' service is restarted.

To reproduce, start with a two-way replica volume and disrupt communication between the two Gluster nodes by downing the network interface on one of them, then attempt to change a volume option. After a 2-minute timeout, the command fails with "operation failed". Repeating the command fails immediately with the same message but a different exit status. Re-establish network communication by bringing the interface back up on the remote Gluster node. Attempting to change a volume option still fails, and continues to fail even though the network path to all nodes is back up. CLI commands keep failing until the local 'glusterd' service is restarted.

A work-around is known:
Can't set volume options: "operation failed"
http://comments.gmane.org/gmane.comp.file-systems.gluster.user/7652
But the work-around (restarting glusterd) interrupts client I/O to all volumes.
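The two failure modes above can be told apart by exit status in a script. This is a minimal sketch: exit status 110 (ETIMEDOUT) is the code observed after the 2-minute timeout in the transcript below; treating any other nonzero code as the immediate "operation failed" path is an assumption, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: classify a gluster CLI result by exit status.
# 110 (ETIMEDOUT) is the observed code after the ~2-minute peer timeout;
# mapping all other nonzero codes to the immediate failure is an assumption.
classify_failure() {
    rc=$1
    case "$rc" in
        0)   echo "ok" ;;
        110) echo "peer-timeout" ;;
        *)   echo "operation-failed" ;;
    esac
}

# Hypothetical usage:
#   gluster volume set nas-volume-0001 performance.stat-prefetch on
#   classify_failure $?
```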
Step-by-step below:

[root@S-100]# gluster peer status
Number of Peers: 1

Hostname: 192.168.5.99
Uuid: c0efccec-b0a0-a091-e517-00114331c257
State: Peer in Cluster (Connected)

[root@S-100]# gluster volume info nas-volume-0001

Volume Name: nas-volume-0001
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 192.168.5.100:/exports/nas-segment-0004/nas-volume-0001
Brick2: 192.168.5.99:/exports/nas-segment-0007/nas-volume-0001
Options Reconfigured:
performance.read-ahead: off
performance.write-behind: off
performance.stat-prefetch: on
nfs.disable: on
nfs.addr-namelookup: off

[root@S-100]# time gluster volume set nas-volume-0001 performance.stat-prefetch off
Set volume successful

real    0m2.244s

[root@S-99]# ifdown eth1

[root@S-100]# ping 192.168.5.99
PING 192.168.5.99 (192.168.5.99) 56(84) bytes of data.
From 192.168.5.100 icmp_seq=1 Destination Host Unreachable

[root@S-100]# time gluster volume set nas-volume-0001 performance.stat-prefetch on

real    2m0.144s

[root@S-100]# echo $?
110

[root@S-100]# tail -f /usr/local/var/log/glusterfs/usr-local-etc-glusterfs-glusterd.vol.log
[2012-02-08 11:37:29.987084] I [glusterd-utils.c:243:glusterd_lock] 0-glusterd: Cluster lock held by c0efccec-b0a0-a091-e517-00114331bf4c
[2012-02-08 11:37:29.987117] I [glusterd-handler.c:420:glusterd_op_txn_begin] 0-glusterd: Acquired local lock
[2012-02-08 11:37:41.950195] W [socket.c:204:__socket_rwv] 0-socket.management: readv failed (Connection timed out)
[2012-02-08 11:37:41.950282] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Connection timed out), peer (192.168.5.99:1019)
[2012-02-08 11:39:30.8670] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.0.1:952)

[root@S-100]# time gluster volume set nas-volume-0001 performance.stat-prefetch on
operation failed

real    0m0.142s

[root@S-100]# tail -f /usr/local/var/log/glusterfs/usr-local-etc-glusterfs-glusterd.vol.log
[2012-02-08 11:41:35.627701] E [glusterd-handler.c:1905:glusterd_handle_set_volume] 0-: Unable to set cli op: 16
[2012-02-08 11:41:35.628143] E [glusterd-rpc-ops.c:61:glusterd_op_send_cli_response] (-->/usr/local/lib/libgfrpc.so.0(rpcsvc_notify+0x16c) [0x2b7f31e7737c] (-->/usr/local/lib/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x291) [0x2b7f31e77171] (-->/usr/local/lib/glusterfs/3.2.5-SF.1/xlator/mgmt/glusterd.so(glusterd_handle_set_volume+0x105) [0x2aaaaaacc3a5]))) 0-: Assertion failed: op_ctx
[2012-02-08 11:41:35.630258] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.0.1:1015)

[root@S-100]# gluster volume info nas-volume-0001

Volume Name: nas-volume-0001
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 192.168.5.100:/exports/nas-segment-0004/nas-volume-0001
Brick2: 192.168.5.99:/exports/nas-segment-0007/nas-volume-0001
Options Reconfigured:
performance.read-ahead: off
performance.write-behind: off
performance.stat-prefetch: off
nfs.disable: on
nfs.addr-namelookup: off

[root@S-100]# gluster volume stop nas-volume-0001
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
operation failed

[root@S-100]# gluster peer status
Number of Peers: 1

Hostname: 192.168.5.99
Uuid: c0efccec-b0a0-a091-e517-00114331c257
State: Peer in Cluster (Connected)

[root@S-99]# ifup eth1

[root@S-100]# ping 192.168.5.99
PING 192.168.5.99 (192.168.5.99) 56(84) bytes of data.
64 bytes from 192.168.5.99: icmp_seq=1 ttl=64 time=0.121 ms

[root@S-100]# gluster peer status
Number of Peers: 1

Hostname: 192.168.5.99
Uuid: c0efccec-b0a0-a091-e517-00114331c257
State: Peer in Cluster (Connected)

[root@S-100]# time gluster volume set nas-volume-0001 performance.stat-prefetch off
operation failed

real    0m0.124s

[root@S-100]# tail -f /usr/local/var/log/glusterfs/usr-local-etc-glusterfs-glusterd.vol.log
[2012-02-08 11:44:02.822372] E [glusterd-handler.c:1905:glusterd_handle_set_volume] 0-: Unable to set cli op: 16
[2012-02-08 11:44:02.822523] E [glusterd-rpc-ops.c:61:glusterd_op_send_cli_response] (-->/usr/local/lib/libgfrpc.so.0(rpcsvc_notify+0x16c) [0x2b7f31e7737c] (-->/usr/local/lib/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x291) [0x2b7f31e77171] (-->/usr/local/lib/glusterfs/3.2.5-SF.1/xlator/mgmt/glusterd.so(glusterd_handle_set_volume+0x105) [0x2aaaaaacc3a5]))) 0-: Assertion failed: op_ctx
[2012-02-08 11:44:02.824656] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.0.1:1009)

[root@S-100]# /etc/init.d/glusterd restart
Stopping glusterd:                                         [  OK  ]
Starting glusterd:                                         [  OK  ]

[root@S-100]# time gluster volume set nas-volume-0001 performance.stat-prefetch on
Set volume successful

real    0m2.293s

[root@S-100]# glusterd --version
glusterfs 3.2.5 built on Jan 6 2012 12:57:01
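The known work-around (restarting glusterd, which interrupts client I/O to all volumes) can be scripted so it only fires when a CLI command actually fails. This is a minimal sketch, not part of the report: the `gluster_retry` function and the overridable `RESTART_CMD` variable are introduced here for illustration; the default restart command matches the init script used in the transcript.

```shell
#!/bin/sh
# Sketch of the known work-around: if a gluster CLI command fails,
# restart glusterd and retry once. WARNING: restarting glusterd
# interrupts client I/O to all volumes, which is why this is only
# a work-around and not a fix.
RESTART_CMD="${RESTART_CMD:-/etc/init.d/glusterd restart}"

gluster_retry() {
    # Run the command as given; succeed immediately if it succeeds.
    "$@" && return 0
    rc=$?
    echo "command failed (exit $rc); restarting glusterd and retrying" >&2
    # Word splitting on RESTART_CMD is intentional (command + argument).
    $RESTART_CMD || return $?
    "$@"
}

# Hypothetical usage, with the command that failed above:
#   gluster_retry gluster volume set nas-volume-0001 performance.stat-prefetch on
```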
We need to check the behavior on the master branch. Let's not focus much on the 3.2.x branch, but make sure this does not happen on master.
Doesn't happen anymore on master. Will check once again and update the bug.
This doesn't happen anymore on the master branch.