Description of problem:
=======================
Removing or replacing a bad brick that is part of a volume fails with the following CLI errors.

When tried remove-brick:
------------------------
# gluster volume remove-brick Dis IP:/bricks/brick0/xz0 start
volume remove-brick start: failed: Incorrect brick IP:/bricks/brick0/xz0 for volume Dis
[root@ ~]#

# gluster volume remove-brick Dis IP:/bricks/brick2/xz0 force
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit force: failed: Incorrect brick IP:/bricks/brick2/xz0 for volume Dis
[root@~]#

When tried replace-brick:
-------------------------
]# gluster volume replace-brick Dis IP:/bricks/brick0/xz0 IP:/bricks/brick2/xz2 commit force
volume replace-brick: failed: brick: IP:/bricks/brick0/xz0 does not exist in volume: Dis
[root@dhcp42-77 ~]#

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.9-4

How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Create a simple volume using a one- or two-node cluster.
2. Crash the underlying filesystem of one of the volume's bricks (to make the brick bad).
3. Try to remove/replace the crashed brick.

Actual results:
===============
Removing/replacing the bad brick of the volume fails.

Expected results:
=================
Remove/replace of a bad brick should work.

Additional info:
[2016-05-12 05:03:27.268957] I [MSGID: 106484] [glusterd-brick-ops.c:837:__glusterd_handle_remove_brick] 0-management: Received rem brick req
[2016-05-12 05:03:27.269042] C [MSGID: 106425] [glusterd-utils.c:1125:glusterd_brickinfo_new_from_brick] 0-management: realpath () failed for brick /bricks/brick0/xz0. The underlying filesystem may be in bad state [Input/output error]
[2016-05-12 05:03:27.269086] E [MSGID: 106256] [glusterd-brick-ops.c:1049:__glusterd_handle_remove_brick] 0-management: Incorrect brick 10.70.42.77:/bricks/brick0/xz0 for volume Dis [Invalid argument]
[2016-05-12 05:03:27.269100] E [MSGID: 106265] [glusterd-brick-ops.c:1090:__glusterd_handle_remove_brick] 0-management: Incorrect brick 10.70.42.77:/bricks/brick0/xz0 for volume Dis
The message "I [MSGID: 106499] [glusterd-handler.c:4330:__glusterd_handle_status_volume] 0-management: Received status volume req for volume Dis" repeated 10 times between [2016-05-12 05:02:28.616648] and [2016-05-12 05:02:43.870331]
The message "I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/xz0 has disconnected from glusterd." repeated 31 times between [2016-05-12 05:02:39.702516] and [2016-05-12 05:04:10.231004]
[2016-05-12 05:04:13.231728] I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/xz0 has disconnected from glusterd.
[2016-05-12 05:04:22.660291] I [MSGID: 106484] [glusterd-brick-ops.c:837:__glusterd_handle_remove_brick] 0-management: Received rem brick req
[2016-05-12 05:04:22.660947] E [MSGID: 106256] [glusterd-brick-ops.c:1049:__glusterd_handle_remove_brick] 0-management: Incorrect brick 10.70.42.77:/bricks/brick2/xz0 for volume Dis [Invalid argument]
[2016-05-12 05:04:22.660984] E [MSGID: 106265] [glusterd-brick-ops.c:1090:__glusterd_handle_remove_brick] 0-management: Incorrect brick 10.70.42.77:/bricks/brick2/xz0 for volume Dis
[2016-05-12 05:04:43.236001] W [socket.c:701:__socket_rwv] 0-management: readv on /var/run/gluster/2a73608302e3994566d9bae2ed12a2eb.socket failed (Invalid argument)
[2016-05-12 05:05:34.114705] I [MSGID: 106484] [glusterd-brick-ops.c:837:__glusterd_handle_remove_brick] 0-management: Received rem brick req
[2016-05-12 05:05:34.115142] E [MSGID: 106256] [glusterd-brick-ops.c:1049:__glusterd_handle_remove_brick] 0-management: Incorrect brick 10.70.42.77:/bricks/brick2/xz0 for volume Dis [Invalid argument]
[2016-05-12 05:05:34.115144] E [MSGID: 106265] [glusterd-brick-ops.c:1090:__glusterd_handle_remove_brick] 0-management: Incorrect brick 10.70.42.77:/bricks/brick2/xz0 for volume Dis
The message "I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/xz0 has disconnected from glusterd." repeated 39 times between [2016-05-12 05:04:13.231728] and [2016-05-12 05:06:10.249082]
[2016-05-12 05:06:13.249694] I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/xz0 has disconnected from glusterd.
[2016-05-12 05:06:49.254924] W [socket.c:701:__socket_rwv] 0-management: readv on /var/run/gluster/2a73608302e3994566d9bae2ed12a2eb.socket failed (Invalid argument)
[2016-05-12 05:07:33.633100] I [MSGID: 106505] [glusterd-replace-brick.c:76:__glusterd_handle_replace_brick] 0-management: Received replace brick req
[2016-05-12 05:07:33.633157] I [MSGID: 106503] [glusterd-replace-brick.c:136:__glusterd_handle_replace_brick] 0-management: Received replace brick commit-force request operation
[2016-05-12 05:07:33.635942] C [MSGID: 106425] [glusterd-utils.c:1125:glusterd_brickinfo_new_from_brick] 0-management: realpath () failed for brick /bricks/brick0/xz0. The underlying filesystem may be in bad state [Input/output error]
[2016-05-12 05:07:33.636000] W [MSGID: 106122] [glusterd-mgmt.c:179:gd_mgmt_v3_pre_validate_fn] 0-management: Replace-brick prevalidation failed.
[2016-05-12 05:07:33.636013] E [MSGID: 106122] [glusterd-mgmt.c:879:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed for operation Replace brick on local node
[2016-05-12 05:07:33.636022] E [MSGID: 106122] [glusterd-replace-brick.c:851:glusterd_mgmt_v3_initiate_replace_brick_cmd_phases] 0-management: Pre Validation Failed
The message "I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/xz0 has disconnected from glusterd." repeated 39 times between [2016-05-12 05:06:13.249694] and [2016-05-12 05:08:10.267819]
[2016-05-12 05:08:13.268731] I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/xz0 has disconnected from glusterd.
[2016-05-12 05:08:55.275005] W [socket.c:701:__socket_rwv] 0-management: readv on /var/run/gluster/2a73608302e3994566d9bae2ed12a2eb.socket failed (Invalid argument)
The message "I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/xz0 has disconnected from glusterd." repeated 39 times between [2016-05-12 05:08:13.268731] and [2016-05-12 05:10:10.286403]
[2016-05-12 05:10:13.286896] I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/xz0 has disconnected from glusterd.
[2016-05-12 05:11:01.294075] W [socket.c:701:__socket_rwv] 0-management: readv on /var/run/gluster/2a73608302e3994566d9bae2ed12a2eb.socket failed (Invalid argument)
The message "I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/xz0 has disconnected from glusterd." repeated 39 times between [2016-05-12 05:10:13.286896] and [2016-05-12 05:12:10.305179]
[2016-05-12 05:12:13.305858] I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/xz0 has disconnected from glusterd.
[2016-05-12 05:13:07.313875] W [socket.c:701:__socket_rwv] 0-management: readv on /var/run/gluster/2a73608302e3994566d9bae2ed12a2eb.socket failed (Invalid argument)
The message "I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/xz0 has disconnected from glusterd." repeated 39 times between [2016-05-12 05:12:13.305858] and [2016-05-12 05:14:10.323962]
[2016-05-12 05:14:13.324396] I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/xz0 has disconnected from glusterd.
[2016-05-12 05:15:13.333714] W [socket.c:701:__socket_rwv] 0-management: readv on /var/run/gluster/2a73608302e3994566d9bae2ed12a2eb.socket failed (Invalid argument)
The message "I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/xz0 has disconnected from glusterd." repeated 39 times between [2016-05-12 05:14:13.324396] and [2016-05-12 05:16:10.344242]
[2016-05-12 05:16:13.344693] I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/xz0 has disconnected from glusterd.
[2016-05-12 05:17:19.354251] W [socket.c:701:__socket_rwv] 0-management: readv on /var/run/gluster/2a73608302e3994566d9bae2ed12a2eb.socket failed (Invalid argument)
The message "I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/xz0 has disconnected from glusterd." repeated 39 times between [2016-05-12 05:16:13.344693] and [2016-05-12 05:18:10.361976]
[2016-05-12 05:18:13.362516] I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/xz0 has disconnected from glusterd.
(In reply to Byreddy from comment #2)

Log info from glusterd
RCA: glusterd makes a realpath () call for the brick that is to be removed. The call is unnecessary here, and it fails if the underlying filesystem of that brick has crashed.
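To illustrate the RCA, here is a minimal Python sketch, not the actual glusterd code or patch. It contrasts a brick lookup that resolves the path on disk first with one that only compares against the stored brick list. The function names, host names, and brick list are hypothetical, and a nonexistent path stands in for a crashed filesystem (the real failure was an Input/output error).

```python
from pathlib import Path


def find_brick_resolving(brick, bricks):
    """Look up a brick after resolving its path on disk.

    Raises OSError when the path cannot be resolved, which is the
    kind of failure a crashed underlying filesystem would trigger.
    """
    host, path = brick.split(":", 1)
    # strict=True forces a filesystem access, similar in spirit to realpath().
    resolved = Path(path).resolve(strict=True)
    key = f"{host}:{resolved}"
    return key if key in bricks else None


def find_brick_by_name(brick, bricks):
    """Look up a brick purely by its stored host:path string.

    Never touches the filesystem, so it still works when the brick's
    filesystem is unreadable.
    """
    return brick if brick in bricks else None
```

With a brick whose path cannot be resolved, `find_brick_by_name` still finds the entry in the stored list, while `find_brick_resolving` raises `OSError` before the comparison is ever made.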
The fix for BZ 1335357 will take care of this issue too, hence moving the state to POST.
Even with this fix, you will not be able to remove a brick whose brick process is down, because a validation for that was introduced in 3.1.3 (BZ 1201205). The validation makes sense, since otherwise data migration from the brick cannot happen. With this fix, however, you should no longer see realpath () failures, and their absence can be used to prove the validity of this fix.
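One way to check for the realpath () failures while verifying is to scan the glusterd log for MSGID 106425, the message ID seen in the log above. A minimal Python sketch follows; the function name and the regular expression are assumptions based only on the log lines pasted in this bug, not on any gluster tooling.

```python
import re

# Matches the critical message seen in this bug, e.g.
# "... C [MSGID: 106425] ... realpath () failed for brick /bricks/brick0/xz0. The ..."
REALPATH_FAILURE = re.compile(
    r"\[MSGID: 106425\].*realpath \(\) failed for brick (\S+)\.(?:\s|$)"
)


def realpath_failures(log_text):
    """Return the brick paths for which a realpath () failure was logged."""
    failed = []
    for line in log_text.splitlines():
        match = REALPATH_FAILURE.search(line)
        if match:
            failed.append(match.group(1))
    return failed
```

Running this over the glusterd log after a remove-brick or replace-brick attempt on the fixed build should give an empty list if the realpath () call is no longer made for the bad brick.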
Downstream patch : https://code.engineering.redhat.com/gerrit/#/c/74663/

Upstream patches:
mainline : http://review.gluster.org/#/c/14306
release-3.7 : http://review.gluster.org/#/c/14410
release-3.8 : http://review.gluster.org/#/c/14411
Verified this bug using the build "glusterfs-3.7.9-6".

Replacing an offline brick works, and removing an offline brick is not allowed because of the new conditions added ( http://review.gluster.org/#/c/13306/ for bug-1201205 ).

Based on the above details, moving to the VERIFIED state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1240