Bug 839213

Summary: A node that was offline is not getting volume changes applied to it when it rejoins the cluster
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Patric Uebele <puebele>
Component: glusterd
Assignee: krishnan parthasarathi <kparthas>
Status: CLOSED WONTFIX
QA Contact: Sudhir D <sdharane>
Severity: high
Docs Contact:
Priority: medium
Version: 2.0
CC: amarts, gluster-bugs, jschrode, nsathyan
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
A volume deleted in the absence of one of the peers is not removed from the cluster's list of volumes. This is because the 'import' logic for peers that rejoin the cluster cannot differentiate between volumes deleted and volumes added in the absence of the other (conflicting) peers. For now, the condition has to be detected manually, which may involve analysing the CLI command logs to reconstruct the cluster-wide view of the volumes that 'ought' to be present. Once that picture is established, volume-sync can be used to reconcile the skewed view of volumes in the cluster.
Story Points: ---
Clone Of:
: 858419 (view as bug list)
Environment:
Last Closed: 2013-01-03 10:54:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 858419

Description Patric Uebele 2012-07-11 09:00:36 UTC
Description of problem:
When you make configuration changes (adding/removing volumes, adding bricks to a volume) while a node in the trusted storage pool is down (e.g. for maintenance), these changes are not reflected on the "down" node once it rejoins the pool.

Version-Release number of selected component (if applicable):
2.0


How reproducible: Reproducible


Steps to Reproduce:
1. Shut down one node of the pool (rhs1-3).
2. On one of the remaining nodes, change the volume configuration (remove a volume, add a volume, add a brick to a volume); example commands are shown below.
3. Start the node that was down.
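
For illustration, the configuration changes in step 2 could be made on rhs1-2 with commands along these lines; the brick path used for the new volume is only a placeholder, while the add-brick target matches the volume info output below:

[root@rhs1-2 ~]# gluster volume stop testvol1
[root@rhs1-2 ~]# gluster volume delete testvol1
[root@rhs1-2 ~]# gluster volume create testvol2 rhs1-2:/export3
[root@rhs1-2 ~]# gluster volume start testvol2
[root@rhs1-2 ~]# gluster volume add-brick distvol1 rhs1-1:/export5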
  
Actual results:
[root@rhs1-2 ~]# gluster volume list
repvol1
distvol1
testvol2  <-- this volume was added while rhs1-3 was down

[root@rhs1-3 ~]# gluster volume list
distvol1
repvol1
testvol1  <-- this volume was deleted while rhs1-3 was down, but rhs1-3 still lists it


[root@rhs1-2 ~]# gluster volume info distvol1
 
Volume Name: distvol1
Type: Distribute
Volume ID: 186892db-e18e-4c85-9dd3-38f248e26c02
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: rhs1-1:/export2
Brick2: rhs1-2:/export2
Brick3: rhs1-3:/export2
Brick4: rhs1-1:/export5  <-- this brick was added while rhs1-3 was down

[root@rhs1-3 ~]# gluster volume info distvol1
 
Volume Name: distvol1
Type: Distribute
Volume ID: 186892db-e18e-4c85-9dd3-38f248e26c02
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: rhs1-1:/export2
Brick2: rhs1-2:/export2
Brick3: rhs1-3:/export2



Expected results:
Consistent volume information across all nodes; the configuration should be synced automatically once a node rejoins the pool.


Additional info:

Comment 2 Amar Tumballi 2012-07-11 10:32:33 UTC
As far as I know, the syncing should have happened fine.

Kaushal, when you get a chance, can you have a look at this?

Comment 3 Patric Uebele 2012-07-11 12:20:58 UTC
Btw., /etc/fstab and /etc/samba/smb.conf don't get synced during rejoin either.

Comment 4 krishnan parthasarathi 2012-11-29 10:01:28 UTC
A volume deleted in the absence of one of the peers wouldn't be removed from the cluster's list of volumes. This is because the 'import' logic of peers that rejoin the cluster is not capable of differentiating between volumes deleted and volumes added in the absence of the other (conflicting) peers.

For now, we intend to detect this manually, which may involve analysing the CLI command logs to reconstruct the cluster-wide view of the volumes that 'ought' to be present. Once we arrive at this picture, we could use volume-sync to reconcile the skewed view of volumes in the cluster.

Bricks added to or removed from a volume while some of the peers were down/unreachable are 'imported' as those peers rejoin the cluster. This works fine on upstream/master.
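
For reference, once the correct set of volumes has been established, the reconciliation step could be a volume sync run on the out-of-sync peer, e.g. (assuming rhs1-2 still holds the correct view of the volumes; exact restrictions on the command may vary by release):

[root@rhs1-3 ~]# gluster volume sync rhs1-2 all

This pulls the volume configuration from rhs1-2 onto rhs1-3.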

Comment 5 Amar Tumballi 2012-11-29 10:08:14 UTC
Marking it for known issues (for the 3.4.0 release?). Patric, let us know if comment #4 has provided you with the right information. We would like to close it after documenting, as WORKSFORME in that case.

Comment 6 krishnan parthasarathi 2012-12-14 06:38:52 UTC
Patric, could you let us know if you are in agreement with comment #4?

Comment 7 Patric Uebele 2012-12-20 09:12:21 UTC
Hi, thank you. Yes, I'm in agreement with comment #4; please document the behaviour for the time being.

Best regards,

Patric

Comment 8 Patric Uebele 2013-01-25 09:27:31 UTC
Btw., the behaviour is the same if you change volume tunables on a volume while a node is down. This leads to quite nasty inconsistencies.
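
For example (a hypothetical option change, purely to illustrate the same reproduction path), setting a tunable on rhs1-2 while rhs1-3 is down:

[root@rhs1-2 ~]# gluster volume set distvol1 performance.cache-size 256MB

After rhs1-3 rejoins, 'gluster volume info distvol1' would presumably show the option under "Options Reconfigured:" on rhs1-2 but not on rhs1-3.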

Please document this, too.

Patric