Description of problem:
Peer goes to the Rejected state after peer probing a new node into an RHGS 3.1.1 cluster. The cluster already has 16 nodes and multiple volumes.
State: Peer Rejected (Connected)
The glusterd log on the node from which the peer probe was issued contained the following messages:
[2016-08-10 09:08:55.536503] I [MSGID: 106490] [glusterd-handler.c:2530:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: e80d1a60-12ec-4fb3-a9d2-d19cf70f3dfb
[2016-08-10 09:09:00.267679] E [MSGID: 106012] [glusterd-utils.c:2686:glusterd_compare_friend_volume] 0-management: Cksums of quota configuration of volume volname differ. local cksum = 1271429249, remote cksum = 1405647489 on peer newnode
[2016-08-10 09:09:00.267982] I [MSGID: 106493] [glusterd-handler.c:3771:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to newnode (0), ret: 0
[2016-08-10 09:09:01.088596] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 78, Invalid argument
[2016-08-10 09:09:01.088672] E [socket.c:3019:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2016-08-10 09:09:01.383764] I [MSGID: 106493] [glusterd-rpc-ops.c:478:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: e80d1a60-12ec-4fb3-a9d2-d19cf70f3dfb, host: newnode, port: 0
[2016-08-10 09:07:43.925602] I [MSGID: 106490] [glusterd-handler.c:2875:__glusterd_handle_probe_query] 0-glusterd: Received probe from uuid: e80d1a60-12ec-4fb3-a9d2-d19cf70f3dfb
[2016-08-10 09:07:43.925754] I [MSGID: 106493] [glusterd-handler.c:2938:__glusterd_handle_probe_query] 0-glusterd: Responded to newnode, op_ret: 0, op_errno: 0, ret: 0
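The rejection stems from the quota-configuration checksum comparison glusterd performs while handling the incoming friend request. As a hedged illustration (the real check lives in C inside glusterd; this is only a simplified model, not the actual implementation), the comparison named in the MSGID 106012 log line above reduces to:

```python
# Simplified model (an assumption for illustration, not glusterd's actual C
# code) of the quota checksum comparison done in
# glusterd_compare_friend_volume(), the function named in the log above.
def quota_cksums_match(local_cksum: int, remote_cksum: int) -> bool:
    """A mismatch makes glusterd send RJT back to the probing peer,
    leaving it in the "Peer Rejected (Connected)" state."""
    return local_cksum == remote_cksum

# Checksum values taken from the MSGID 106012 log line above.
local, remote = 1271429249, 1405647489
if not quota_cksums_match(local, remote):
    print(f"Cksums of quota configuration differ: "
          f"local cksum = {local}, remote cksum = {remote}")
```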
1) We checked the following with the customer, based on the existing KCS articles and the BZs we found:
a) The node from which the peer probe was done and the new node run the same RHGS version, 3.1.1.
b) Neither the new node nor the old node was upgraded recently.
c) The volume in question has quota enabled, and the quota version and quota checksum differ between the two nodes. No other difference in the volfile info was noticed.
d) Confirmed with the customer that the volume in question is not a clone of any snapshot.
2) We asked the customer to follow the actions mentioned in the knowledge base articles below, but it didn't help.
3) We took a remote session and performed the following steps:
i) Tried the resolutions in https://access.redhat.com/solutions/1354563 and https://access.redhat.com/solutions/2041033; neither worked.
ii) We then performed the steps below:
a) Did a peer detach of the new node from the old node.
b) Stopped glusterd on the new node.
c) Copied quota.conf and quota.cksum from the old node.
d) Started glusterd.
e) Did the peer probe again.
This also didn't help.
Version-Release number of selected component (if applicable):
RHGS 3.1.1

How reproducible:
Not always reproducible.

Actual results:
Peer goes to the Rejected state.

Expected results:
Peer should be in the "Peer in Cluster (Connected)" state.
So our RCA is correct: the quota.conf version mismatch between the existing nodes and the newly installed node causes the checksum difference.
http://review.gluster.org/15352 posted upstream for review.
upstream mainline : http://review.gluster.org/15352
upstream 3.8 : http://review.gluster.org/15791
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/89554
Verified this bug using the build glusterfs-3.8.4-5; the fix is working as expected.
Confirmed the fix with the following steps:
1. Set up two nodes with the 3.0.4 build.
2. Created a simple volume with quota enabled.
3. Updated to the 3.1.1 build and performed the op-version bump-up.
4. Probed a newly installed 3.1.1 RHGS node.
Result of step 4: peer status was in the Rejected state; the quota.conf version on the updated nodes was v1.1, while on the newly installed node it was v1.2.
(Reported issue is reproduced.)
5. Updated both 3.1.1 nodes to 3.2 and performed the op-version bump-up.
6. Checked the quota.conf version; it changed from v1.1 to v1.2.
7. Probed a new 3.2 node; the probe was successful and peer status displayed the correct result.
Moving to the Verified state based on the above results.
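The v1.1/v1.2 difference observed in step 4 is what produces the checksum mismatch. A hedged sketch (the header strings and the CRC32 stand-in are assumptions for illustration; glusterd uses its own file format and internal checksum): two nodes whose quota.conf files carry different conf-version headers can never agree on the checksum, even when the quota limits themselves are identical.

```python
import zlib

# Assumed quota.conf header strings for illustration; glusterd's actual file
# format and checksum algorithm differ, but the mechanism is the same: the
# conf version is part of the checksummed content.
V11_HEADER = b"GlusterFS Quota conf | version: v1.1\n"
V12_HEADER = b"GlusterFS Quota conf | version: v1.2\n"
LIMITS = b"<identical quota limit entries on both nodes>"

def quota_cksum(header: bytes, body: bytes) -> int:
    # zlib.crc32 is a stand-in for glusterd's internal file checksum.
    return zlib.crc32(header + body)

upgraded_node = quota_cksum(V11_HEADER, LIMITS)  # node upgraded from 3.0.4
fresh_node = quota_cksum(V12_HEADER, LIMITS)     # freshly installed node
print(upgraded_node != fresh_node)  # True: same limits, different versions
```

This matches the result of steps 5-7: once the op-version bump makes every node write quota.conf in the v1.2 format, the checksums agree and the probe succeeds.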
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.