Description of problem:
glusterd updates the volume info with the new brick even when the replace-brick command returns a failure ("Commit failed on ..."). This makes it impossible for the caller to reliably determine the outcome of the replace-brick operation.

Command and output:
gluster --mode=script volume replace-brick vol_9480cf14bde1c564b1d67d8e364fe18f 10.70.47.166:/var/lib/heketi/mounts/vg_2acb109d758154a8d12de3b90e084f0c/brick_32967d19fad73bae52c45bdc3e7574a1/brick 10.70.47.166:/var/lib/heketi/mounts/vg_ea53cfb78199251ba13668b8ff16a350/brick_7cad9f700851411d3ae9b687a7a1f308/brick commit force
volume replace-brick: failed: Commit failed on 10.70.47.162. Please check log file for details.

Original brick list from gluster:
10.70.47.162:/var/lib/heketi/mounts/vg_94138d2abb10d88c1d83ba18c881b389/brick_da6dd5bfff1415b786f8ce88446a0846/brick
10.70.46.45:/var/lib/heketi/mounts/vg_4c39fa5f5ccdd94712b471fc46b92760/brick_8140f263b15c19b5e31cd4f5632a369d/brick
10.70.47.166:/var/lib/heketi/mounts/vg_2acb109d758154a8d12de3b90e084f0c/brick_32967d19fad73bae52c45bdc3e7574a1/brick

New brick list from gluster:
10.70.47.162:/var/lib/heketi/mounts/vg_94138d2abb10d88c1d83ba18c881b389/brick_da6dd5bfff1415b786f8ce88446a0846/brick
10.70.46.45:/var/lib/heketi/mounts/vg_4c39fa5f5ccdd94712b471fc46b92760/brick_8140f263b15c19b5e31cd4f5632a369d/brick
10.70.47.166:/var/lib/heketi/mounts/vg_ea53cfb78199251ba13668b8ff16a350/brick_7cad9f700851411d3ae9b687a7a1f308/brick

Steps to Reproduce:
1. Create a replica 3 volume.
2. Execute the replace-brick command and, in parallel, bring down a node in the cluster that owns neither the old brick nor the new brick. This is a race condition and may not be hit every time; retry until the command fails with a "Commit failed on ..." error. (A reproduction sketch is included under Additional info below.)

Actual results:
The volume info is updated with the new brick.

Expected results:
The volume info should retain the old bricks.

Additional info:
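A minimal shell sketch of the reproduction flow described in the steps above. The volume name, hostnames, and brick paths are placeholders (not the ones from this report), and the exact timing is only indicative, since the failure is a race:

# On one node: create and start a replica 3 volume (placeholder bricks)
gluster volume create testvol replica 3 \
    node1:/bricks/b1 node2:/bricks/b2 node3:/bricks/b3 force
gluster volume start testvol

# Record the brick list before the replace
gluster volume info testvol | grep '^Brick'

# Start replacing node3's brick ...
gluster --mode=script volume replace-brick testvol \
    node3:/bricks/b3 node3:/bricks/b3_new commit force &

# ... and, in parallel, stop glusterd on node2, which owns neither the old
# brick nor the new brick (timing dependent; retry until the CLI fails with
# "Commit failed on ...")
ssh node2 systemctl stop glusterd
wait

# Compare the brick list after the failure; with this bug the new brick
# already appears even though the commit failed
gluster volume info testvol | grep '^Brick'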
Relevant glusterd log:

[2018-04-02 11:30:58.108514] W [socket.c:593:__socket_rwv] 0-management: readv on 192.168.10.103:24007 failed (No data available)
[2018-04-02 11:30:58.108550] I [MSGID: 106004] [glusterd-handler.c:6317:__glusterd_peer_rpc_notify] 0-management: Peer <192.168.10.103> (<cd9b911d-c537-4ef4-82a1-1f8fd26a8e7a>), in state <Peer in Cluster>, has disconnected from glusterd.
[2018-04-02 11:30:58.108694] W [glusterd-locks.c:854:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.12.6/xlator/mgmt/glusterd.so(+0x2322a) [0x7fb0f80de22a] -->/usr/lib64/glusterfs/3.12.6/xlator/mgmt/glusterd.so(+0x2d198) [0x7fb0f80e8198] -->/usr/lib64/glusterfs/3.12.6/xlator/mgmt/glusterd.so(+0xe458c) [0x7fb0f819f58c] ) 0-management: Lock owner mismatch. Lock for vol vol_1c15c1637ac1e325f03d56fdd41f75f5 held by 0eafd295-aa2d-4170-a7b8-3ce98fbff0fc
[2018-04-02 11:30:58.108704] W [MSGID: 106118] [glusterd-handler.c:6342:__glusterd_peer_rpc_notify] 0-management: Lock not released for vol_1c15c1637ac1e325f03d56fdd41f75f5
[2018-04-02 11:30:58.108722] W [glusterd-locks.c:843:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.12.6/xlator/mgmt/glusterd.so(+0x2322a) [0x7fb0f80de22a] -->/usr/lib64/glusterfs/3.12.6/xlator/mgmt/glusterd.so(+0x2d198) [0x7fb0f80e8198] -->/usr/lib64/glusterfs/3.12.6/xlator/mgmt/glusterd.so(+0xe4765) [0x7fb0f819f765] ) 0-management: Lock for vol vol_a12b9272cc1f4cb70c17b0ccbbc57633 not held
[2018-04-02 11:30:58.108729] W [MSGID: 106118] [glusterd-handler.c:6342:__glusterd_peer_rpc_notify] 0-management: Lock not released for vol_a12b9272cc1f4cb70c17b0ccbbc57633
[2018-04-02 11:30:58.108743] W [glusterd-locks.c:843:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.12.6/xlator/mgmt/glusterd.so(+0x2322a) [0x7fb0f80de22a] -->/usr/lib64/glusterfs/3.12.6/xlator/mgmt/glusterd.so(+0x2d198) [0x7fb0f80e8198] -->/usr/lib64/glusterfs/3.12.6/xlator/mgmt/glusterd.so(+0xe4765) [0x7fb0f819f765] ) 0-management: Lock for vol vol_b2b243820c1277c994178a58d465db25 not held
[2018-04-02 11:30:58.108749] W [MSGID: 106118] [glusterd-handler.c:6342:__glusterd_peer_rpc_notify] 0-management: Lock not released for vol_b2b243820c1277c994178a58d465db25
[2018-04-02 11:30:58.108762] W [glusterd-locks.c:843:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.12.6/xlator/mgmt/glusterd.so(+0x2322a) [0x7fb0f80de22a] -->/usr/lib64/glusterfs/3.12.6/xlator/mgmt/glusterd.so(+0x2d198) [0x7fb0f80e8198] -->/usr/lib64/glusterfs/3.12.6/xlator/mgmt/glusterd.so(+0xe4765) [0x7fb0f819f765] ) 0-management: Lock for vol vol_d1a96444079258dae3e28dd4a8b6fd81 not held
[2018-04-02 11:30:58.108768] W [MSGID: 106118] [glusterd-handler.c:6342:__glusterd_peer_rpc_notify] 0-management: Lock not released for vol_d1a96444079258dae3e28dd4a8b6fd81
[2018-04-02 11:30:58.109188] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fb0fd46cedb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb0fd231e6e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb0fd231f8e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7fb0fd233710] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7fb0fd234200] ))))) 0-management: forced unwinding frame type(glusterd mgmt v3) op(--(4)) called at 2018-04-02 11:30:58.107966 (xid=0x13)
[2018-04-02 11:30:58.109236] E [MSGID: 106116] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Commit failed on 192.168.10.103. Please check log file for details.
[2018-04-02 11:30:58.109372] I [MSGID: 106144] [glusterd-pmap.c:396:pmap_registry_remove] 0-pmap: removing brick /var/lib/heketi/mounts/vg_e08f107a4127b1b05d577654773be4cc/brick_168912fe5c0deb6c23772ea7acd96550/brick on port 49154
[2018-04-02 11:30:58.109405] I [socket.c:2474:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2018-04-02 11:31:00.164657] E [MSGID: 106123] [glusterd-mgmt.c:1677:glusterd_mgmt_v3_commit] 0-management: Commit failed on peers
[2018-04-02 11:31:00.164697] E [MSGID: 106123] [glusterd-replace-brick.c:669:glusterd_mgmt_v3_initiate_replace_brick_cmd_phases] 0-management: Commit Op Failed
[2018-04-02 11:32:14.112970] E [socket.c:2369:socket_connect_finish] 0-management: connection to 192.168.10.103:24007 failed (No route to host); disconnecting socket
This issue stems from the lack of a rollback mechanism in glusterd's transaction engine. Since replace-brick is a heavyweight command with multiple steps spread across different phases, the probability of partial failures is high. I am not confident this can be solved in glusterd's code space at all unless a robust rollback mechanism is written, which itself requires significant engineering effort and bandwidth. GlusterD2 should be able to address this. In the meantime, I am moving this bug to Karthik to see whether any of the individual steps in the replace-brick command can be rolled back to reduce the probability of such failures.
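For illustration only, a hypothetical shell sketch of the kind of per-step rollback bookkeeping being described here; the step and undo commands are placeholders, not glusterd code:

#!/bin/bash
# Each step that succeeds registers an undo action; a failing step unwinds
# every step that already completed, so no partial state is left behind.
undo_stack=()

run_step() {
    local do_cmd="$1" undo_cmd="$2"
    if eval "$do_cmd"; then
        undo_stack=("$undo_cmd" "${undo_stack[@]}")   # newest undo first
        return 0
    fi
    local undo
    for undo in "${undo_stack[@]}"; do                # partial failure: roll back
        eval "$undo"
    done
    return 1
}

# Placeholder steps mirroring this bug: the local volinfo update succeeds,
# the commit on a peer fails, so the volinfo change is undone.
run_step "echo 'add new brick to volinfo'" "echo 'restore old brick in volinfo'" &&
run_step "false" "true" ||
echo "replace-brick aborted; volinfo rolled back"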
Karthik/Ashish - Can this bug be looked at on a priority basis, to see whether we can reduce the overall probability of partial failures?
Hi Atin, I don't think this can be fixed easily in GD1, as it requires the rollback mechanism to be implemented, and getting that correct is a further challenge. We currently have no immediate plan to do this in GD1. Since rollback for such transactions is going to be supported in GD2, I think it is better to defer and close this. Thanks, Karthik