Description of problem:
Peers go into the "Peer Rejected" state after a reboot of one node when quota is enabled on a cloned volume.

Version-Release number of selected component (if applicable):
glusterfs-3.7.5-19

How reproducible:
2/2

Steps to Reproduce:
1. Create a tiered volume, start it, enable quota, and attach a tier to the volume.

Volume Name: tiervolume
Type: Tier
Volume ID: ec14de5c-45dc-4a3a-80f1-b5f5b569fab2
Status: Started
Number of Bricks: 15
Transport-type: tcp
Hot Tier :
Hot Tier Type : Replicate
Number of Bricks: 1 x 3 = 3
Brick1: 10.70.35.142:/bricks/brick3/b3
Brick2: 10.70.35.141:/bricks/brick3/b3
Brick3: 10.70.35.228:/bricks/brick3/b3
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Brick4: 10.70.35.228:/bricks/brick0/b0
Brick5: 10.70.35.141:/bricks/brick0/b0
Brick6: 10.70.35.142:/bricks/brick0/b0
Brick7: 10.70.35.140:/bricks/brick0/b0
Brick8: 10.70.35.228:/bricks/brick1/b1
Brick9: 10.70.35.141:/bricks/brick1/b1
Brick10: 10.70.35.142:/bricks/brick1/b1
Brick11: 10.70.35.140:/bricks/brick1/b1
Brick12: 10.70.35.228:/bricks/brick2/b2
Brick13: 10.70.35.141:/bricks/brick2/b2
Brick14: 10.70.35.142:/bricks/brick2/b2
Brick15: 10.70.35.140:/bricks/brick2/b2
Options Reconfigured:
features.barrier: disable
cluster.tier-mode: cache
features.ctr-enabled: on
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
performance.readdir-ahead: on
cluster.enable-shared-storage: enable

2. Create a snapshot of this volume and activate it.

3. Create a clone of this snapshot and start it. Observe in gluster volume info that quota is enabled on the cloned volume.

Volume Name: clone1
Type: Tier
Volume ID: 3dbf687c-c2cf-46c0-af4e-ca542c5bae0d
Status: Started
Number of Bricks: 15
Transport-type: tcp
Hot Tier :
Hot Tier Type : Replicate
Number of Bricks: 1 x 3 = 3
Brick1: 10.70.35.142:/run/gluster/snaps/clone1/brick1/b3
Brick2: 10.70.35.141:/run/gluster/snaps/clone1/brick2/b3
Brick3: 10.70.35.228:/run/gluster/snaps/clone1/brick3/b3
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Brick4: 10.70.35.228:/run/gluster/snaps/clone1/brick4/b0
Brick5: 10.70.35.141:/run/gluster/snaps/clone1/brick5/b0
Brick6: 10.70.35.142:/run/gluster/snaps/clone1/brick6/b0
Brick7: 10.70.35.140:/run/gluster/snaps/clone1/brick7/b0
Brick8: 10.70.35.228:/run/gluster/snaps/clone1/brick8/b1
Brick9: 10.70.35.141:/run/gluster/snaps/clone1/brick9/b1
Brick10: 10.70.35.142:/run/gluster/snaps/clone1/brick10/b1
Brick11: 10.70.35.140:/run/gluster/snaps/clone1/brick11/b1
Brick12: 10.70.35.228:/run/gluster/snaps/clone1/brick12/b2
Brick13: 10.70.35.141:/run/gluster/snaps/clone1/brick13/b2
Brick14: 10.70.35.142:/run/gluster/snaps/clone1/brick14/b2
Brick15: 10.70.35.140:/run/gluster/snaps/clone1/brick15/b2
Options Reconfigured:
cluster.tier-mode: cache
features.ctr-enabled: on
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
performance.readdir-ahead: on
cluster.enable-shared-storage: enable
4. Reboot one of the nodes and, when it comes back up, observe that peer status shows the other nodes in the "Peer Rejected" state:

[root@dhcp35-140 ~]# gluster peer status
Number of Peers: 3

Hostname: dhcp35-228.lab.eng.blr.redhat.com
Uuid: 66d2c49c-dd6c-4ba3-8840-e57e34dbaf3a
State: Peer Rejected (Connected)

Hostname: 10.70.35.141
Uuid: a7e0bb8a-d7bb-4d61-83e9-67de349bd250
State: Peer Rejected (Connected)

Hostname: 10.70.35.142
Uuid: 8ea53341-1055-4288-8692-b1adc8244168
State: Peer Rejected (Connected)

5. The following messages are observed in the glusterd logs:

[2016-02-16 08:53:28.318532] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2016-02-16 08:53:28.320218] I [MSGID: 106163] [glusterd-handshake.c:1194:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30707
The message "I [MSGID: 106163] [glusterd-handshake.c:1194:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30707" repeated 2 times between [2016-02-16 08:53:28.320218] and [2016-02-16 08:53:28.353309]
[2016-02-16 08:53:28.415908] I [MSGID: 106490] [glusterd-handler.c:2539:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: a7e0bb8a-d7bb-4d61-83e9-67de349bd250
[2016-02-16 08:53:28.418308] E [MSGID: 106012] [glusterd-utils.c:2845:glusterd_compare_friend_volume] 0-management: Cksums of quota configuration of volume clone1 differ. local cksum = 1405646976, remote cksum = 0 on peer 10.70.35.141
[2016-02-16 08:53:28.418453] I [MSGID: 106493] [glusterd-handler.c:3780:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to 10.70.35.141 (0), ret: 0
[2016-02-16 08:53:28.435497] I [MSGID: 106490] [glusterd-handler.c:2539:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 8ea53341-1055-4288-8692-b1adc8244168
[2016-02-16 08:53:28.436686] E [MSGID: 106012] [glusterd-utils.c:2845:glusterd_compare_friend_volume] 0-management: Cksums of quota configuration of volume clone1 differ. local cksum = 1405646976, remote cksum = 0 on peer 10.70.35.142
[2016-02-16 08:53:28.436797] I [MSGID: 106493] [glusterd-handler.c:3780:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to 10.70.35.142 (0), ret: 0
[2016-02-16 08:53:28.460544] I [MSGID: 106490] [glusterd-handler.c:2539:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 66d2c49c-dd6c-4ba3-8840-e57e34dbaf3a
[2016-02-16 08:53:28.461872] E [MSGID: 106012] [glusterd-utils.c:2845:glusterd_compare_friend_volume] 0-management: Cksums of quota configuration of volume clone1 differ. local cksum = 1405646976, remote cksum = 0 on peer dhcp35-228.lab.eng.blr.redhat.com
[2016-02-16 08:53:28.461972] I [MSGID: 106493] [glusterd-handler.c:3780:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to dhcp35-228.lab.eng.blr.redhat.com (0), ret: 0
[2016-02-16 08:53:28.475055] I [MSGID: 106493] [glusterd-rpc-ops.c:481:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: 8ea53341-1055-4288-8692-b1adc8244168, host: 10.70.35.142, port: 0
[2016-02-16 08:53:28.478578] I [MSGID: 106493] [glusterd-rpc-ops.c:481:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: 66d2c49c-dd6c-4ba3-8840-e57e34dbaf3a, host: dhcp35-228.lab.eng.blr.redhat.com, port: 0
[2016-02-16 08:53:28.482286] I [MSGID: 106493] [glusterd-rpc-ops.c:481:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: a7e0bb8a-d7bb-4d61-83e9-67de349bd250, host: 10.70.35.141, port: 0

6. Checking under /var/lib/glusterd/snaps/<snap-name>/<snap-id> and /var/lib/glusterd/vols/clone1 shows that the "quota.cksum" file is missing, even though it is present for tiervolume under /var/lib/glusterd/vols/tiervolume:

[root@dhcp35-228 tiervolume]# ls
bricks
cksum
info
node_state.info
quota.cksum
quota.conf
run
snapd.info
tier
tiervolume.10.70.35.140.bricks-brick0-b0.vol
tiervolume.10.70.35.140.bricks-brick1-b1.vol
tiervolume.10.70.35.140.bricks-brick2-b2.vol
tiervolume.10.70.35.141.bricks-brick0-b0.vol
tiervolume.10.70.35.141.bricks-brick1-b1.vol
tiervolume.10.70.35.141.bricks-brick2-b2.vol
tiervolume.10.70.35.141.bricks-brick3-b3.vol
tiervolume.10.70.35.142.bricks-brick0-b0.vol
tiervolume.10.70.35.142.bricks-brick1-b1.vol
tiervolume.10.70.35.142.bricks-brick2-b2.vol
tiervolume.10.70.35.142.bricks-brick3-b3.vol
tiervolume.10.70.35.228.bricks-brick0-b0.vol
tiervolume.10.70.35.228.bricks-brick1-b1.vol
tiervolume.10.70.35.228.bricks-brick2-b2.vol
tiervolume.10.70.35.228.bricks-brick3-b3.vol
tiervolume-rebalance.vol
tiervolume.tcp-fuse.vol
trusted-tiervolume.tcp-

Actual results:
The quota.cksum file is not copied when the snapshot and clone are created.

Expected results:
The quota.cksum file should also be copied when the snapshot and clone are created, and the peers should not go into the "Peer Rejected" state after a node reboots.

Additional info:
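For quick reference, steps 1-3 and the check in step 6 boil down to roughly the following commands. This is a sketch, not a verbatim transcript: the snapshot name snap1 is illustrative, and the hot-tier bricks are the ones listed in the volume info above.

# on any node of the 4-node cluster, with tiervolume created and started
gluster volume quota tiervolume enable
gluster volume attach-tier tiervolume replica 3 \
        10.70.35.142:/bricks/brick3/b3 \
        10.70.35.141:/bricks/brick3/b3 \
        10.70.35.228:/bricks/brick3/b3
gluster snapshot create snap1 tiervolume no-timestamp
gluster snapshot activate snap1
gluster snapshot clone clone1 snap1
gluster volume start clone1
gluster volume info clone1    # shows the quota options carried over to the clone

# step 6: the parent volume has a quota checksum file, the clone does not,
# which is what produces "remote cksum = 0" in the handshake logs above
ls /var/lib/glusterd/vols/tiervolume/quota.cksum
ls /var/lib/glusterd/vols/clone1/quota.cksum    # fails: No such file or directory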
It looks great. Thanks, Laura.
Master URL: http://review.gluster.org/#/c/13760/ (IN REVIEW)
This bug was accidentally moved from POST to MODIFIED via an error in automation, please see mmccune with any questions
Patches:
Upstream master: http://review.gluster.org/13760
Upstream release-3.7: http://review.gluster.org/14047
Downstream: https://code.engineering.redhat.com/gerrit/73092
Volume Name: vol
Type: Distributed-Replicate
Volume ID: b1b9fcd8-8025-4884-8d7a-f99a346f3a18
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.46.4:/rhs/brick1/b1
Brick2: 10.70.47.46:/rhs/brick1/b2
Brick3: 10.70.46.213:/rhs/brick1/b3
Brick4: 10.70.46.148:/rhs/brick1/b4
Options Reconfigured:
performance.readdir-ahead: on
features.quota: on
features.inode-quota: on
features.quota-deem-statfs: on
features.bitrot: on
features.scrub: Active
features.barrier: disable
=========================
[root@dhcp46-4 ~]# gluster v attach-tier vol replica 2 10.70.46.4:/rhs/brick2/b1 10.70.47.46:/rhs/brick2/b2 10.70.46.213:/rhs/brick2/b3 10.70.46.148:/rhs/brick2/b4
volume attach-tier: success
Tiering Migration Functionality: vol: success: Attach tier is successful on vol. use tier status to check the status.
ID: 0f504264-81c8-46e4-85b8-348ea76123b3
================================================
[root@dhcp46-4 ~]# gluster v info vol

Volume Name: vol
Type: Tier
Volume ID: b1b9fcd8-8025-4884-8d7a-f99a346f3a18
Status: Started
Number of Bricks: 8
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: 10.70.46.148:/rhs/brick2/b4
Brick2: 10.70.46.213:/rhs/brick2/b3
Brick3: 10.70.47.46:/rhs/brick2/b2
Brick4: 10.70.46.4:/rhs/brick2/b1
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick5: 10.70.46.4:/rhs/brick1/b1
Brick6: 10.70.47.46:/rhs/brick1/b2
Brick7: 10.70.46.213:/rhs/brick1/b3
Brick8: 10.70.46.148:/rhs/brick1/b4
Options Reconfigured:
cluster.tier-mode: cache
features.ctr-enabled: on
performance.readdir-ahead: on
features.quota: on
features.inode-quota: on
features.quota-deem-statfs: on
features.bitrot: on
features.scrub: Active
features.barrier: disable
[root@dhcp46-4 ~]#
===========================================
[root@dhcp46-4 ~]# gluster peer status
Number of Peers: 3

Hostname: 10.70.47.46
Uuid: 112df27c-d246-4b89-9b24-f52536da263c
State: Peer in Cluster (Connected)

Hostname: 10.70.46.213
Uuid: 0e6f19f6-3dde-487c-a10f-c1c53b37ed2b
State: Peer in Cluster (Connected)

Hostname: 10.70.46.148
Uuid: fc406ac0-2cd5-4aef-ab21-77707f7a17d0
State: Peer in Cluster (Connected)
===========================================
[root@dhcp46-4 ~]# gluster snapshot create snap2 vol no-timestamp
snapshot create: success: Snap snap2 created successfully
[root@dhcp46-4 ~]# gluster snapshot activate snap2
Snapshot activate: snap2: Snap activated successfully
[root@dhcp46-4 ~]# gluster snapshot clone clone2 snap2
snapshot clone: success: Clone clone2 created successfully
============================================
[root@dhcp46-4 ~]# init 6
Connection to 10.70.46.4 closed by remote host.
Connection to 10.70.46.4 closed.
[ashah@localhost ~]$ ssh root@10.70.46.4
root@10.70.46.4's password:
Last login: Tue May 3 20:37:13 2016 from dhcp-0-50.blr.redhat.com
[root@dhcp46-4 ~]# gluster peer status
Number of Peers: 3

Hostname: 10.70.47.46
Uuid: 112df27c-d246-4b89-9b24-f52536da263c
State: Peer in Cluster (Connected)

Hostname: 10.70.46.213
Uuid: 0e6f19f6-3dde-487c-a10f-c1c53b37ed2b
State: Peer in Cluster (Connected)

Hostname: 10.70.46.148
Uuid: fc406ac0-2cd5-4aef-ab21-77707f7a17d0
State: Peer in Cluster (Connected)

Bug verified on build glusterfs-3.7.9-3.el7rhgs.x86_64.
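As an additional sanity check (a hedged suggestion, not part of the captured output above): on the fixed build the clone's glusterd directory should also carry the quota checksum, so the post-reboot friend handshake no longer sees "remote cksum = 0".

# expected to succeed on every node once the fix is in place
ls /var/lib/glusterd/vols/clone2/quota.cksum    # file now exists for the clone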
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1240