Bug 1631664 - glusterd updates volume info with new brick when replace brick command returns failure
Summary: glusterd updates volume info with new brick when replace brick command returns failure
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Karthik U S
QA Contact: Bala Konda Reddy M
URL:
Whiteboard: ocs-dependency-issue
Depends On:
Blocks: 1562484 1582526 1598340
 
Reported: 2018-09-21 08:44 UTC by Raghavendra Talur
Modified: 2020-08-12 14:53 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-12-02 07:51:37 UTC
Embargoed:



Description Raghavendra Talur 2018-09-21 08:44:10 UTC
Description of problem:

glusterd updates the volume info with the new brick even when the replace-brick command returns a failure ("Commit failed on ..."). This makes it difficult for the caller to determine the actual outcome of the replace-brick operation from the return status alone.

gluster --mode=script volume replace-brick vol_9480cf14bde1c564b1d67d8e364fe18f 10.70.47.166:/var/lib/heketi/mounts/vg_2acb109d758154a8d12de3b90e084f0c/brick_32967d19fad73bae52c45bdc3e7574a1/brick 10.70.47.166:/var/lib/heketi/mounts/vg_ea53cfb78199251ba13668b8ff16a350/brick_7cad9f700851411d3ae9b687a7a1f308/brick commit force
volume replace-brick: failed: Commit failed on 10.70.47.162. Please check log file for details.




original brick list from gluster:
10.70.47.162:/var/lib/heketi/mounts/vg_94138d2abb10d88c1d83ba18c881b389/brick_da6dd5bfff1415b786f8ce88446a0846/brick
10.70.46.45:/var/lib/heketi/mounts/vg_4c39fa5f5ccdd94712b471fc46b92760/brick_8140f263b15c19b5e31cd4f5632a369d/brick
10.70.47.166:/var/lib/heketi/mounts/vg_2acb109d758154a8d12de3b90e084f0c/brick_32967d19fad73bae52c45bdc3e7574a1/brick

new brick list from gluster:
10.70.47.162:/var/lib/heketi/mounts/vg_94138d2abb10d88c1d83ba18c881b389/brick_da6dd5bfff1415b786f8ce88446a0846/brick
10.70.46.45:/var/lib/heketi/mounts/vg_4c39fa5f5ccdd94712b471fc46b92760/brick_8140f263b15c19b5e31cd4f5632a369d/brick
10.70.47.166:/var/lib/heketi/mounts/vg_ea53cfb78199251ba13668b8ff16a350/brick_7cad9f700851411d3ae9b687a7a1f308/brick
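
Because the return status and the stored brick list can disagree, a caller such as heketi cannot trust the exit code alone. Below is a minimal caller-side sketch that re-reads the volume info after a failure; the volume name and brick paths are the ones from above, while the grep-based check is only an assumption about how a caller might verify the outcome:

#!/bin/bash
# Sketch only: after a failed replace-brick, re-read the volume info to see
# whether the new brick was committed anyway. Volume and brick paths are the
# ones from this report.
VOLNAME=vol_9480cf14bde1c564b1d67d8e364fe18f
OLD_BRICK=10.70.47.166:/var/lib/heketi/mounts/vg_2acb109d758154a8d12de3b90e084f0c/brick_32967d19fad73bae52c45bdc3e7574a1/brick
NEW_BRICK=10.70.47.166:/var/lib/heketi/mounts/vg_ea53cfb78199251ba13668b8ff16a350/brick_7cad9f700851411d3ae9b687a7a1f308/brick

if ! gluster --mode=script volume replace-brick "$VOLNAME" "$OLD_BRICK" "$NEW_BRICK" commit force; then
    # The CLI reported failure, but glusterd may already have swapped the brick.
    if gluster volume info "$VOLNAME" | grep -qF "$NEW_BRICK"; then
        echo "replace-brick failed, but volume info already lists the new brick" >&2
    fi
fi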



Steps to Reproduce:
1. Create a replica 3 volume.
2. Execute the replace-brick command and, in parallel, bring down the node in the cluster that owns neither the old brick nor the new brick.

This is a race condition and may not be hit every time. Retry the steps until the command fails with a "Commit failed on ..." error (see the sketch below).
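
A rough sketch of these steps, with placeholder names for the volume, bricks, and node, and with the assumption that stopping glusterd over ssh is an acceptable way to bring the third node down:

#!/bin/bash
# Hypothetical reproduction loop; names below are placeholders, not from the report.
# THIRD_NODE is the peer that owns neither the old nor the new brick.
VOLNAME=testvol
OLD_BRICK=host1:/bricks/old_brick
NEW_BRICK=host1:/bricks/new_brick
THIRD_NODE=host3

while :; do
    # Take the third node down while the replace-brick commit is in flight.
    ssh "$THIRD_NODE" 'systemctl stop glusterd' &
    gluster --mode=script volume replace-brick "$VOLNAME" "$OLD_BRICK" "$NEW_BRICK" commit force
    rc=$?
    wait
    ssh "$THIRD_NODE" 'systemctl start glusterd'
    [ "$rc" -ne 0 ] && break   # stop once we hit the "Commit failed on ..." case
    # On success, swap the bricks back so the next attempt starts from the same state.
    gluster --mode=script volume replace-brick "$VOLNAME" "$NEW_BRICK" "$OLD_BRICK" commit force
done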

Actual results:
Volume info is updated with the new brick.

Expected results:
Volume info should retain the old brick list.

Additional info:

Comment 2 Raghavendra Talur 2018-09-21 08:45:37 UTC
Relevant glusterd log:


[2018-04-02 11:30:58.108514] W [socket.c:593:__socket_rwv] 0-management: readv on 192.168.10.103:24007 failed (No data available)
[2018-04-02 11:30:58.108550] I [MSGID: 106004] [glusterd-handler.c:6317:__glusterd_peer_rpc_notify] 0-management: Peer <192.168.10.103> (<cd9b911d-c537-4ef4-82a1-1f8fd26a8e7a>), in state <Peer in Cluster>, has disconnected from glusterd.
[2018-04-02 11:30:58.108694] W [glusterd-locks.c:854:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.12.6/xlator/mgmt/glusterd.so(+0x2322a) [0x7fb0f80de22a] -->/usr/lib64/glusterfs/3.12.6/xlator/mgmt/glusterd.so(+0x2d198) [0x7fb0f80e8198] -->/usr/lib64/glusterfs/3.12.6/xlator/mgmt/glusterd.so(+0xe458c) [0x7fb0f819f58c] ) 0-management: Lock owner mismatch. Lock for vol vol_1c15c1637ac1e325f03d56fdd41f75f5 held by 0eafd295-aa2d-4170-a7b8-3ce98fbff0fc
[2018-04-02 11:30:58.108704] W [MSGID: 106118] [glusterd-handler.c:6342:__glusterd_peer_rpc_notify] 0-management: Lock not released for vol_1c15c1637ac1e325f03d56fdd41f75f5
[2018-04-02 11:30:58.108722] W [glusterd-locks.c:843:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.12.6/xlator/mgmt/glusterd.so(+0x2322a) [0x7fb0f80de22a] -->/usr/lib64/glusterfs/3.12.6/xlator/mgmt/glusterd.so(+0x2d198) [0x7fb0f80e8198] -->/usr/lib64/glusterfs/3.12.6/xlator/mgmt/glusterd.so(+0xe4765) [0x7fb0f819f765] ) 0-management: Lock for vol vol_a12b9272cc1f4cb70c17b0ccbbc57633 not held
[2018-04-02 11:30:58.108729] W [MSGID: 106118] [glusterd-handler.c:6342:__glusterd_peer_rpc_notify] 0-management: Lock not released for vol_a12b9272cc1f4cb70c17b0ccbbc57633
[2018-04-02 11:30:58.108743] W [glusterd-locks.c:843:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.12.6/xlator/mgmt/glusterd.so(+0x2322a) [0x7fb0f80de22a] -->/usr/lib64/glusterfs/3.12.6/xlator/mgmt/glusterd.so(+0x2d198) [0x7fb0f80e8198] -->/usr/lib64/glusterfs/3.12.6/xlator/mgmt/glusterd.so(+0xe4765) [0x7fb0f819f765] ) 0-management: Lock for vol vol_b2b243820c1277c994178a58d465db25 not held
[2018-04-02 11:30:58.108749] W [MSGID: 106118] [glusterd-handler.c:6342:__glusterd_peer_rpc_notify] 0-management: Lock not released for vol_b2b243820c1277c994178a58d465db25
[2018-04-02 11:30:58.108762] W [glusterd-locks.c:843:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.12.6/xlator/mgmt/glusterd.so(+0x2322a) [0x7fb0f80de22a] -->/usr/lib64/glusterfs/3.12.6/xlator/mgmt/glusterd.so(+0x2d198) [0x7fb0f80e8198] -->/usr/lib64/glusterfs/3.12.6/xlator/mgmt/glusterd.so(+0xe4765) [0x7fb0f819f765] ) 0-management: Lock for vol vol_d1a96444079258dae3e28dd4a8b6fd81 not held
[2018-04-02 11:30:58.108768] W [MSGID: 106118] [glusterd-handler.c:6342:__glusterd_peer_rpc_notify] 0-management: Lock not released for vol_d1a96444079258dae3e28dd4a8b6fd81
[2018-04-02 11:30:58.109188] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fb0fd46cedb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb0fd231e6e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb0fd231f8e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7fb0fd233710] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7fb0fd234200] ))))) 0-management: forced unwinding frame type(glusterd mgmt v3) op(--(4)) called at 2018-04-02 11:30:58.107966 (xid=0x13)
[2018-04-02 11:30:58.109236] E [MSGID: 106116] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Commit failed on 192.168.10.103. Please check log file for details.
[2018-04-02 11:30:58.109372] I [MSGID: 106144] [glusterd-pmap.c:396:pmap_registry_remove] 0-pmap: removing brick /var/lib/heketi/mounts/vg_e08f107a4127b1b05d577654773be4cc/brick_168912fe5c0deb6c23772ea7acd96550/brick on port 49154
[2018-04-02 11:30:58.109405] I [socket.c:2474:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2018-04-02 11:31:00.164657] E [MSGID: 106123] [glusterd-mgmt.c:1677:glusterd_mgmt_v3_commit] 0-management: Commit failed on peers
[2018-04-02 11:31:00.164697] E [MSGID: 106123] [glusterd-replace-brick.c:669:glusterd_mgmt_v3_initiate_replace_brick_cmd_phases] 0-management: Commit Op Failed
[2018-04-02 11:32:14.112970] E [socket.c:2369:socket_connect_finish] 0-management: connection to 192.168.10.103:24007 failed (No route to host); disconnecting socket
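
When the commit fails on only one peer, that peer can be left reporting a brick list that differs from the rest of the cluster. A small diagnostic sketch for comparing what each glusterd reports; the volume name is taken from the lock messages above, only 192.168.10.103 appears in the log (the other peer addresses are hypothetical), and passwordless ssh to the peers is assumed:

# Compare the brick list each peer's glusterd reports; a peer that missed the
# commit will still show the old brick. 192.168.10.101/102 are hypothetical peers.
VOLNAME=vol_1c15c1637ac1e325f03d56fdd41f75f5
for peer in 192.168.10.101 192.168.10.102 192.168.10.103; do
    echo "== $peer =="
    ssh "$peer" "gluster volume info $VOLNAME | grep '^Brick'"
done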

Comment 4 Atin Mukherjee 2018-09-22 05:01:54 UTC
This issue is due to the lack of a rollback mechanism in glusterd's transaction engine. Since replace-brick is a heavyweight command with multiple steps across its different phases, the probability of partial failures is high.

I'm not confident this can be solved in glusterd's code space at all unless a robust rollback mechanism is written, which itself requires significant engineering effort and bandwidth. GlusterD2 should be able to address this.

In the meantime, I'm moving this bug to Karthik to see whether any of the individual steps in the replace-brick command can be rolled back to reduce the probability of such failures.

Comment 5 Atin Mukherjee 2018-11-01 15:39:17 UTC
Karthik/Ashish - Can this bug be looked at on a priority basis, to see if we can reduce the overall probability of partial failures?

Comment 6 Karthik U S 2018-11-19 05:38:33 UTC
Hi Atin,

I don't think this can be fixed easily in GD1, as it needs a rollback mechanism to be implemented, and getting that right is another challenge. Currently we don't have any immediate plans to do this in GD1.
Since we are going to support rollback for such transactions in GD2, I think it is better to defer and close this.

Thanks,
Karthik

