Bug 1335367
Summary: | Failing to remove/replace the bad brick part of the volume. | | |
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Byreddy <bsrirama> |
Component: | glusterd | Assignee: | Atin Mukherjee <amukherj> |
Status: | CLOSED ERRATA | QA Contact: | Byreddy <bsrirama> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | | |
Version: | rhgs-3.1 | CC: | asrivast, rcyriac, rhinduja, rhs-bugs, storage-qa-internal, vbellur |
Target Milestone: | --- | Keywords: | Regression, ZStream |
Target Release: | RHGS 3.1.3 | | |
Hardware: | x86_64 | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | glusterfs-3.7.9-6 | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2016-06-23 05:23:01 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | 1335357 | | |
Bug Blocks: | 1311817 | | |
Description
Byreddy
2016-05-12 05:20:57 UTC
[2016-05-12 05:03:27.268957] I [MSGID: 106484] [glusterd-brick-ops.c:837:__glusterd_handle_remove_brick] 0-management: Received rem brick req
[2016-05-12 05:03:27.269042] C [MSGID: 106425] [glusterd-utils.c:1125:glusterd_brickinfo_new_from_brick] 0-management: realpath () failed for brick /bricks/brick0/xz0. The underlying filesystem may be in bad state [Input/output error]
[2016-05-12 05:03:27.269086] E [MSGID: 106256] [glusterd-brick-ops.c:1049:__glusterd_handle_remove_brick] 0-management: Incorrect brick 10.70.42.77:/bricks/brick0/xz0 for volume Dis [Invalid argument]
[2016-05-12 05:03:27.269100] E [MSGID: 106265] [glusterd-brick-ops.c:1090:__glusterd_handle_remove_brick] 0-management: Incorrect brick 10.70.42.77:/bricks/brick0/xz0 for volume Dis
The message "I [MSGID: 106499] [glusterd-handler.c:4330:__glusterd_handle_status_volume] 0-management: Received status volume req for volume Dis" repeated 10 times between [2016-05-12 05:02:28.616648] and [2016-05-12 05:02:43.870331]
The message "I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/xz0 has disconnected from glusterd." repeated 31 times between [2016-05-12 05:02:39.702516] and [2016-05-12 05:04:10.231004]
[2016-05-12 05:04:13.231728] I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/xz0 has disconnected from glusterd.
[2016-05-12 05:04:22.660291] I [MSGID: 106484] [glusterd-brick-ops.c:837:__glusterd_handle_remove_brick] 0-management: Received rem brick req
[2016-05-12 05:04:22.660947] E [MSGID: 106256] [glusterd-brick-ops.c:1049:__glusterd_handle_remove_brick] 0-management: Incorrect brick 10.70.42.77:/bricks/brick2/xz0 for volume Dis [Invalid argument]
[2016-05-12 05:04:22.660984] E [MSGID: 106265] [glusterd-brick-ops.c:1090:__glusterd_handle_remove_brick] 0-management: Incorrect brick 10.70.42.77:/bricks/brick2/xz0 for volume Dis
[2016-05-12 05:04:43.236001] W [socket.c:701:__socket_rwv] 0-management: readv on /var/run/gluster/2a73608302e3994566d9bae2ed12a2eb.socket failed (Invalid argument)
[2016-05-12 05:05:34.114705] I [MSGID: 106484] [glusterd-brick-ops.c:837:__glusterd_handle_remove_brick] 0-management: Received rem brick req
[2016-05-12 05:05:34.115142] E [MSGID: 106256] [glusterd-brick-ops.c:1049:__glusterd_handle_remove_brick] 0-management: Incorrect brick 10.70.42.77:/bricks/brick2/xz0 for volume Dis [Invalid argument]
[2016-05-12 05:05:34.115144] E [MSGID: 106265] [glusterd-brick-ops.c:1090:__glusterd_handle_remove_brick] 0-management: Incorrect brick 10.70.42.77:/bricks/brick2/xz0 for volume Dis
The message "I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/xz0 has disconnected from glusterd." repeated 39 times between [2016-05-12 05:04:13.231728] and [2016-05-12 05:06:10.249082]
[2016-05-12 05:06:13.249694] I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/xz0 has disconnected from glusterd.
[2016-05-12 05:06:49.254924] W [socket.c:701:__socket_rwv] 0-management: readv on /var/run/gluster/2a73608302e3994566d9bae2ed12a2eb.socket failed (Invalid argument)
[2016-05-12 05:07:33.633100] I [MSGID: 106505] [glusterd-replace-brick.c:76:__glusterd_handle_replace_brick] 0-management: Received replace brick req
[2016-05-12 05:07:33.633157] I [MSGID: 106503] [glusterd-replace-brick.c:136:__glusterd_handle_replace_brick] 0-management: Received replace brick commit-force request operation
[2016-05-12 05:07:33.635942] C [MSGID: 106425] [glusterd-utils.c:1125:glusterd_brickinfo_new_from_brick] 0-management: realpath () failed for brick /bricks/brick0/xz0. The underlying filesystem may be in bad state [Input/output error]
[2016-05-12 05:07:33.636000] W [MSGID: 106122] [glusterd-mgmt.c:179:gd_mgmt_v3_pre_validate_fn] 0-management: Replace-brick prevalidation failed.
[2016-05-12 05:07:33.636013] E [MSGID: 106122] [glusterd-mgmt.c:879:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed for operation Replace brick on local node
[2016-05-12 05:07:33.636022] E [MSGID: 106122] [glusterd-replace-brick.c:851:glusterd_mgmt_v3_initiate_replace_brick_cmd_phases] 0-management: Pre Validation Failed
The message "I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/xz0 has disconnected from glusterd." repeated 39 times between [2016-05-12 05:06:13.249694] and [2016-05-12 05:08:10.267819]
[2016-05-12 05:08:13.268731] I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/xz0 has disconnected from glusterd.
[2016-05-12 05:08:55.275005] W [socket.c:701:__socket_rwv] 0-management: readv on /var/run/gluster/2a73608302e3994566d9bae2ed12a2eb.socket failed (Invalid argument)
The message "I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/xz0 has disconnected from glusterd." repeated 39 times between [2016-05-12 05:08:13.268731] and [2016-05-12 05:10:10.286403]
[2016-05-12 05:10:13.286896] I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/xz0 has disconnected from glusterd.
[2016-05-12 05:11:01.294075] W [socket.c:701:__socket_rwv] 0-management: readv on /var/run/gluster/2a73608302e3994566d9bae2ed12a2eb.socket failed (Invalid argument)
The message "I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/xz0 has disconnected from glusterd." repeated 39 times between [2016-05-12 05:10:13.286896] and [2016-05-12 05:12:10.305179]
[2016-05-12 05:12:13.305858] I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/xz0 has disconnected from glusterd.
[2016-05-12 05:13:07.313875] W [socket.c:701:__socket_rwv] 0-management: readv on /var/run/gluster/2a73608302e3994566d9bae2ed12a2eb.socket failed (Invalid argument)
The message "I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/xz0 has disconnected from glusterd." repeated 39 times between [2016-05-12 05:12:13.305858] and [2016-05-12 05:14:10.323962]
[2016-05-12 05:14:13.324396] I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/xz0 has disconnected from glusterd.
[2016-05-12 05:15:13.333714] W [socket.c:701:__socket_rwv] 0-management: readv on /var/run/gluster/2a73608302e3994566d9bae2ed12a2eb.socket failed (Invalid argument)
The message "I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/xz0 has disconnected from glusterd." repeated 39 times between [2016-05-12 05:14:13.324396] and [2016-05-12 05:16:10.344242]
[2016-05-12 05:16:13.344693] I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/xz0 has disconnected from glusterd.
[2016-05-12 05:17:19.354251] W [socket.c:701:__socket_rwv] 0-management: readv on /var/run/gluster/2a73608302e3994566d9bae2ed12a2eb.socket failed (Invalid argument)
The message "I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/xz0 has disconnected from glusterd." repeated 39 times between [2016-05-12 05:16:13.344693] and [2016-05-12 05:18:10.361976]
[2016-05-12 05:18:13.362516] I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/xz0 has disconnected from glusterd.

(In reply to Byreddy from comment #2)
Log info from glusterd

RCA: glusterd makes a realpath () call for the brick that is to be removed. The call is unnecessary here and can fail if the underlying file system of that brick has crashed. The fix for BZ 1335357 will take care of this issue too, hence moving the state to Post.
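
The RCA can be illustrated with a minimal Python sketch. It is not glusterd's code (glusterd is written in C), and `new_brickinfo`, `find_brick_for_removal` and the `resolve_path` flag are hypothetical names chosen for illustration. `Path.resolve(strict=True)` stands in for realpath (): it has to touch the filesystem, so it raises `OSError` when the path cannot be resolved. Matching the brick against the volume's stored brick list without resolving avoids that dependency.

```python
from pathlib import Path


def new_brickinfo(brick, resolve_path=True):
    """Build a brick record from a "host:/path" string (hypothetical helper)."""
    host, _, path = brick.partition(":")
    if resolve_path:
        # Stand-in for realpath(): needs a working filesystem and raises
        # OSError (e.g. FileNotFoundError) if the path cannot be resolved.
        path = str(Path(path).resolve(strict=True))
    return {"host": host, "path": path}


def find_brick_for_removal(brick, volume_bricks):
    """Match a brick against the stored brick list without touching the disk."""
    wanted = new_brickinfo(brick, resolve_path=False)
    for info in volume_bricks:
        if info == wanted:
            return info
    raise ValueError("Incorrect brick %s for volume" % brick)


if __name__ == "__main__":
    stored = [{"host": "10.70.42.77", "path": "/bricks/brick0/xz0"}]
    # Succeeds even though /bricks/brick0/xz0 is not readable on this machine.
    print(find_brick_for_removal("10.70.42.77:/bricks/brick0/xz0", stored))
```

With `resolve_path=True` the same lookup would fail before the comparison whenever the brick's filesystem is unavailable, which is the kind of failure the "realpath () failed" log entries report.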
Even if we fix this issue, you will still not be able to remove a brick whose brick process is down, because that validation was introduced in 3.1.3 (BZ 1201205). The validation makes sense, as otherwise the data migration from the brick can't happen. However, with this fix you will no longer see realpath () failures, and their absence can serve as the source of truth to prove the validity of this fix.

Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/74663/

Upstream patches:
mainline: http://review.gluster.org/#/c/14306
release-3.7: http://review.gluster.org/#/c/14410
release-3.8: http://review.gluster.org/#/c/14411

Verified this bug using the build "glusterfs-3.7.9-6". Replacing an offline brick works, and removing an offline brick is not allowed because of the new conditions added (http://review.gluster.org/#/c/13306/ for bug 1201205).

Based on the above details, moving to verified state.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240