Bug 1335357
| Field | Value |
|---|---|
| Summary | Modified volume options are not syncing once glusterd comes up. |
| Product | [Red Hat Storage] Red Hat Gluster Storage |
| Reporter | Byreddy <bsrirama> |
| Component | glusterd |
| Assignee | Atin Mukherjee <amukherj> |
| Status | CLOSED ERRATA |
| QA Contact | Byreddy <bsrirama> |
| Severity | high |
| Docs Contact | |
| Priority | unspecified |
| Version | rhgs-3.1 |
| CC | asrivast, rhinduja, rhs-bugs, storage-qa-internal, vbellur |
| Target Milestone | --- |
| Keywords | Regression, ZStream |
| Target Release | RHGS 3.1.3 |
| Hardware | x86_64 |
| OS | Linux |
| Whiteboard | |
| Fixed In Version | glusterfs-3.7.9-6 |
| Doc Type | Bug Fix |
| Doc Text | |
| Story Points | --- |
| Clone Of | |
| | 1335531 (view as bug list) |
| Environment | |
| Last Closed | 2016-06-23 05:22:40 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | --- |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Bug Depends On | |
| Bug Blocks | 1311817, 1335359, 1335367, 1335531, 1337113, 1337114 |
Description (Byreddy, 2016-05-12 04:08:08 UTC)
I will provide the logs.

glusterd log from node-1:
=========================
[2016-05-12 05:55:30.764115] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2016-05-12 05:55:30.766892] I [MSGID: 106163] [glusterd-handshake.c:1194:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30707
[2016-05-12 05:55:30.825808] I [MSGID: 106490] [glusterd-handler.c:2600:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: f6dea3b5-a249-4108-9ac8-0d0319e9ed5d
[2016-05-12 05:55:30.827399] I [MSGID: 106009] [glusterd-utils.c:2824:glusterd_compare_friend_volume] 0-management: Version of volume Dis differ. local version = 2, remote version = 3 on peer 10.70.43.151
[2016-05-12 05:55:30.827750] C [MSGID: 106425] [glusterd-utils.c:3134:glusterd_import_new_brick] 0-management: realpath() failed for brick /bricks/brick0/h0. The underlying file system may be in bad state [Input/output error]
[2016-05-12 05:55:30.827838] E [MSGID: 106376] [glusterd-sm.c:1401:glusterd_friend_sm] 0-glusterd: handler returned: -1
[2016-05-12 05:55:30.834463] I [MSGID: 106493] [glusterd-rpc-ops.c:481:__glusterd_friend_add_cbk] 0-glusterd: Received ACC from uuid: f6dea3b5-a249-4108-9ac8-0d0319e9ed5d, host: 10.70.43.151, port: 0
[2016-05-12 05:55:30.890545] I [MSGID: 106492] [glusterd-handler.c:2776:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: f6dea3b5-a249-4108-9ac8-0d0319e9ed5d
[2016-05-12 05:55:30.890595] I [MSGID: 106502] [glusterd-handler.c:2821:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2016-05-12 05:55:30.891211] I [rpc-clnt.c:984:rpc_clnt_connection_init] 0-nfs: setting frame-timeout to 600
[2016-05-12 05:55:31.892768] I [MSGID: 106540] [glusterd-utils.c:4384:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered MOUNTV3 successfully
[2016-05-12 05:55:31.893132] I [MSGID: 106540] [glusterd-utils.c:4393:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered MOUNTV1 successfully
[2016-05-12 05:55:31.893455] I [MSGID: 106540] [glusterd-utils.c:4402:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered NFSV3 successfully
[2016-05-12 05:55:31.893808] I [MSGID: 106540] [glusterd-utils.c:4411:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered NLM v4 successfully
[2016-05-12 05:55:31.894266] I [MSGID: 106540] [glusterd-utils.c:4420:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered NLM v1 successfully
[2016-05-12 05:55:31.894626] I [MSGID: 106540] [glusterd-utils.c:4429:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered ACL v3 successfully
[2016-05-12 05:55:31.900965] W [socket.c:3133:socket_connect] 0-nfs: Ignore failed connection attempt on , (No such file or directory)
[2016-05-12 05:55:31.901072] I [rpc-clnt.c:984:rpc_clnt_connection_init] 0-glustershd: setting frame-timeout to 600
[2016-05-12 05:55:31.901196] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: glustershd already stopped
[2016-05-12 05:55:31.901274] I [rpc-clnt.c:984:rpc_clnt_connection_init] 0-quotad: setting frame-timeout to 600
[2016-05-12 05:55:31.901408] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: quotad already stopped
[2016-05-12 05:55:31.901463] I [rpc-clnt.c:984:rpc_clnt_connection_init] 0-bitd: setting frame-timeout to 600
[2016-05-12 05:55:31.901589] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped
[2016-05-12 05:55:31.901636] I [rpc-clnt.c:984:rpc_clnt_connection_init] 0-scrub: setting frame-timeout to 600
[2016-05-12 05:55:31.901745] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped
[2016-05-12 05:55:31.909398] I [rpc-clnt.c:984:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2016-05-12 05:55:31.909624] I [rpc-clnt.c:984:rpc_clnt_connection_init] 0-snapd: setting frame-timeout to 600
[2016-05-12 05:55:31.909847] I [MSGID: 106493] [glusterd-rpc-ops.c:696:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: f6dea3b5-a249-4108-9ac8-0d0319e9ed5d
[2016-05-12 05:55:31.910016] W [socket.c:701:__socket_rwv] 0-nfs: readv on /var/run/gluster/b7f9049f47b3ab44cdbaf600d7f5c6f8.socket failed (Invalid argument)
[2016-05-12 05:55:31.910034] I [MSGID: 106006] [glusterd-svc-mgmt.c:323:glusterd_svc_common_rpc_notify] 0-management: nfs has disconnected from glusterd.
[2016-05-12 05:55:31.910296] W [socket.c:701:__socket_rwv] 0-management: readv on /var/run/gluster/c1eec530a1c811faf8e3d20e6c09c320.socket failed (Invalid argument)
[2016-05-12 05:55:31.910527] I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/h0 has disconnected from glusterd.
[2016-05-12 05:55:34.993921] I [MSGID: 106499] [glusterd-handler.c:4330:__glusterd_handle_status_volume] 0-management: Received status volume req for volume Dis
[2016-05-12 05:55:35.742207] I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/h0 has disconnected from glusterd.
[2016-05-12 05:55:36.189912] I [MSGID: 106499] [glusterd-handler.c:4330:__glusterd_handle_status_volume] 0-management: Received status volume req for volume Dis
[2016-05-12 05:55:38.742298] I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/h0 has disconnected from glusterd.
[2016-05-12 05:55:41.742760] I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/h0 has disconnected from glusterd.
[2016-05-12 05:55:41.962352] I [MSGID: 106488] [glusterd-handler.c:1533:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2016-05-12 05:55:41.963934] I [MSGID: 106488] [glusterd-handler.c:1533:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2016-05-12 05:55:44.743559] I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/h0 has disconnected from glusterd.

glusterd log from node-2:
=========================
[2016-05-12 05:55:27.968405] W [socket.c:984:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 7, Invalid argument
[2016-05-12 05:55:27.968450] E [socket.c:3089:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
The message "I [MSGID: 106004] [glusterd-handler.c:5192:__glusterd_peer_rpc_notify] 0-management: Peer <dhcp42-77.lab.eng.blr.redhat.com> (<f6f03f6a-dc88-4592-95b4-76482b1742d5>), in state <Peer in Cluster>, has disconnected from glusterd." repeated 11 times between [2016-05-12 05:54:44.009808] and [2016-05-12 05:55:24.962631]
[2016-05-12 05:55:30.769240] I [MSGID: 106163] [glusterd-handshake.c:1194:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30707
[2016-05-12 05:55:30.828060] I [MSGID: 106490] [glusterd-handler.c:2600:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: f6f03f6a-dc88-4592-95b4-76482b1742d5
[2016-05-12 05:55:30.836792] I [MSGID: 106493] [glusterd-handler.c:3842:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to dhcp42-77.lab.eng.blr.redhat.com (0), ret: 0
[2016-05-12 05:55:30.894602] I [MSGID: 106493] [glusterd-rpc-ops.c:696:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: f6f03f6a-dc88-4592-95b4-76482b1742d5
[2016-05-12 05:55:30.894652] I [MSGID: 106492] [glusterd-handler.c:2776:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: f6f03f6a-dc88-4592-95b4-76482b1742d5
[2016-05-12 05:55:30.894683] I [MSGID: 106502] [glusterd-handler.c:2821:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
The message "I [MSGID: 106488] [glusterd-handler.c:1533:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req" repeated 3 times between [2016-05-12 05:54:20.746141] and [2016-05-12 05:55:18.238141]

RCA: The volume configuration was changed on the glusterd instance on Node 1 while glusterd was down on Node 2, so when glusterd restarts on Node 2 it imports the updated volume. During this import the brickinfo objects are created afresh, and because the underlying file system had crashed, the realpath() call in glusterd_import_new_brick() fails. As a result the volume import fails and the volume data goes inconsistent between the peers (a simplified sketch of this code path is included at the end of this report).

An upstream patch, http://review.gluster.org/14306, has been posted for review.

Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/74663/

Upstream patches:
mainline: http://review.gluster.org/#/c/14306
release-3.7: http://review.gluster.org/#/c/14410
release-3.8: http://review.gluster.org/#/c/14411

Verified this bug using the build "glusterfs-3.7.9-6" and found that the fix is working as expected. Steps done: repeated the reproduction steps mentioned in the description section. Moving to the verified state.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240
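For context on the code path named in the RCA above, the following is a minimal, self-contained C sketch of the failure mode it describes. It is not the actual glusterd source and does not claim to reproduce the linked patches: import_brick_path() and old_real_path are illustrative names only. The sketch shows how insisting on a fresh realpath() during import makes the whole operation fail when the brick's file system returns an error, and how falling back to a previously resolved path would let the import proceed.

```c
/*
 * Illustrative sketch only (not glusterd code): resolving a brick path while
 * importing an updated volume definition, with a fallback when realpath()
 * fails because the underlying file system is in a bad state.
 */
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Resolve the on-disk path for an imported brick.
 * old_real_path may carry a path resolved before the file system went bad;
 * pass NULL when no prior resolution exists. Returns 0 on success, -1 on
 * failure (which, in the bug scenario, aborts the whole volume import). */
static int
import_brick_path(const char *brick_path, const char *old_real_path,
                  char *out, size_t out_len)
{
    char resolved[PATH_MAX];

    if (realpath(brick_path, resolved)) {
        snprintf(out, out_len, "%s", resolved);
        return 0;
    }

    /* realpath() failed, e.g. with EIO from a crashed file system. Instead
     * of failing the import (and leaving the volume data inconsistent),
     * reuse the previously known path if one is available. */
    if (old_real_path && *old_real_path) {
        snprintf(out, out_len, "%s", old_real_path);
        return 0;
    }

    fprintf(stderr, "realpath() failed for brick %s: %s\n",
            brick_path, strerror(errno));
    return -1;
}

int
main(void)
{
    char real_path[PATH_MAX];

    /* Healthy path: realpath() succeeds. */
    if (import_brick_path("/tmp", NULL, real_path, sizeof(real_path)) == 0)
        printf("imported brick path: %s\n", real_path);

    /* Bad path with a previously resolved value to fall back on. */
    if (import_brick_path("/bricks/brick0/h0", "/bricks/brick0/h0",
                          real_path, sizeof(real_path)) == 0)
        printf("imported brick path (fallback): %s\n", real_path);

    return 0;
}
```

The exact behaviour implemented for this bug is the one in the patches linked above (http://review.gluster.org/14306 and its release branch backports), not necessarily this fallback.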