Bug 1005043 - Inconsistent volumes in peers
Summary: Inconsistent volumes in peers
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 3.4.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kaushal
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-09-06 06:11 UTC by Kaushal
Modified: 2014-04-17 11:47 UTC

Fixed In Version: glusterfs-3.5.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-04-17 11:47:27 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Kaushal 2013-09-06 06:11:19 UTC
Volume op-versions are calculated at three points: during a volume set/reset, when reading a volume from disk, and when importing a volume during probe or volume sync. The calculation depends on the cluster's op-version, because some features are enabled automatically depending on the cluster's op-version. We also don't store the volume op-versions persistently and don't export them during sync. As a result, the same volume can end up in an inconsistent state on different peers. One such case is described below.

Consider a cluster made up of 3 peers, P1, P2 and P3, operating at op-version N. The cluster has two volumes, V1 and V2, both with volume op-version N (a volume op-version cannot be greater than the cluster op-version). We have,
 Cluster op-version = N
 V1 op-version = N
 V2 op-version = N
A set operation on V1 causes the cluster's op-version to be bumped up to N+1. Assume that some features are automatically enabled at op-version N+1. The op-version of V2 remains at N, as no operation has been performed on it. So,
 Cluster op-version = N+1
 V1 op-version = N+1
 V2 op-version = N
Now, we probe a new peer P4. On the new peer we will have the following op-versions,
 Cluster op-version = N+1
 V1 op-version = N+1
 V2 op-version = N+1 
This happens because we don't send volume op-versions during the sync after probe. P4 freshly calculates the op-version of V2 as N+1, assuming that features have been auto-enabled because the cluster op-version is N+1.
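
The probe scenario above can be modelled in a few lines. This is only an illustrative sketch, not glusterd code (glusterd is written in C): the function names, the concrete value chosen for N, and the simplified rule that any recalculation at cluster op-version N+1 assumes the auto-enabled features are all assumptions made for the example.

```python
# Illustrative model only; names and values are hypothetical.
N = 2                            # placeholder for the starting op-version
AUTO_FEATURE_OP_VERSION = N + 1  # op-version at which features auto-enable


def calculate_volume_op_version(options_op_version, cluster_op_version):
    """Recalculate a volume op-version from scratch.

    If the cluster op-version is high enough for auto-enabled features,
    the recalculation assumes the volume has them.
    """
    if cluster_op_version >= AUTO_FEATURE_OP_VERSION:
        return max(options_op_version, AUTO_FEATURE_OP_VERSION)
    return options_op_version


def import_without_op_versions(volume_options, cluster_op_version):
    """Model of a sync that does not carry volume op-versions."""
    return {
        name: calculate_volume_op_version(opts, cluster_op_version)
        for name, opts in volume_options.items()
    }


# State on P1 after the set operation on V1 bumped the cluster to N+1.
cluster_op_version = N + 1
p1_view = {"V1": N + 1, "V2": N}

# P4 receives only the explicitly set options and recalculates everything.
explicit_options = {"V1": N + 1, "V2": N}
p4_view = import_without_op_versions(explicit_options, cluster_op_version)

mismatched = sorted(v for v in p1_view if p1_view[v] != p4_view[v])
print("P1:", p1_view)
print("P4:", p4_view)
print("Inconsistent volumes:", mismatched)
```

In this model only V2 differs between P1 and P4, which matches the case described above.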

Another case is when glusterd on a peer restarts. Assume P3 is restarted. glusterd will recalculate the volume op-versions while restoring state. Again, the op-version of V2 will be calculated as N+1, on the assumption that the auto-enabled features are present. This leads to an inconsistency between the volume's in-memory representation and what is on disk: glusterd assumes the volume contains the auto-enabled features, but the volfiles don't contain them because they were not regenerated.

These kinds of issues can be solved by persisting the volume op-versions and sharing them during sync.
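
A minimal sketch of that idea, under stated assumptions: the JSON payload, the "op-version" key and the helper names below are hypothetical and do not reflect glusterd's actual store or wire format. The point is only that the op-version travels with the volume and is read back as-is, rather than being recalculated on import or restart.

```python
import json

# Illustrative model only; format and names are hypothetical.
N = 2  # placeholder for the starting op-version


def export_volume(name, op_version):
    """Serialize a volume together with its op-version."""
    return json.dumps({"name": name, "op-version": op_version})


def import_volume(payload):
    """Read the op-version from the payload instead of recalculating it."""
    data = json.loads(payload)
    return data["name"], data["op-version"]


# State on P1: V1 was bumped by a set operation, V2 was never touched.
p1_volumes = {"V1": N + 1, "V2": N}

# The same serialized form stands in for both the persisted copy
# and what is sent to a newly probed peer during sync.
payloads = [export_volume(name, ver) for name, ver in p1_volumes.items()]

# Probe: P4 imports the volumes with the op-versions it was sent.
p4_volumes = dict(import_volume(p) for p in payloads)

# Restart: P3 restores the stored op-versions rather than recalculating.
p3_restored = dict(import_volume(p) for p in payloads)

print("P4:", p4_volumes)
print("P3 after restart:", p3_restored)
```

With the op-version carried explicitly, V2 stays at N on every peer until a set/reset operation actually changes it.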

Comment 1 Anand Avati 2013-09-06 06:13:37 UTC
REVIEW: http://review.gluster.org/5568 (glusterd: Calculate volume op-versions only on set) posted (#2) for review on master by Kaushal M (kaushal)

Comment 2 Anand Avati 2013-09-06 08:43:08 UTC
REVIEW: http://review.gluster.org/5832 (glusterd: Calculate volume op-versions only on set) posted (#1) for review on release-3.4 by Kaushal M (kaushal)

Comment 3 Anand Avati 2013-09-10 06:45:30 UTC
REVIEW: http://review.gluster.org/5568 (glusterd: Calculate volume op-versions only on set) posted (#3) for review on master by Kaushal M (kaushal)

Comment 4 Anand Avati 2013-09-13 06:32:28 UTC
REVIEW: http://review.gluster.org/5906 (glusterd: Calculate volume op-versions only on set) posted (#1) for review on release-3.4 by Kaushal M (kaushal)

Comment 5 Anand Avati 2013-09-13 07:17:18 UTC
REVIEW: http://review.gluster.org/5568 (glusterd: Calculate volume op-versions only on set/reset) posted (#4) for review on master by Kaushal M (kaushal)

Comment 6 Anand Avati 2013-09-13 07:21:21 UTC
REVIEW: http://review.gluster.org/5832 (glusterd: Calculate volume op-versions only on set/reset) posted (#2) for review on release-3.4 by Kaushal M (kaushal)

Comment 7 Anand Avati 2013-09-13 19:13:00 UTC
COMMIT: http://review.gluster.org/5832 committed in release-3.4 by Anand Avati (avati) 
------
commit 536eccde0bbda0166ca2a2769069e6b9f7ecbf89
Author: Kaushal M <kaushal>
Date:   Mon Aug 12 10:43:52 2013 +0530

    glusterd: Calculate volume op-versions only on set/reset
    
      Backport of http://review.gluster.org/5568
    
    The volume op-versions are calculated during a volume set/reset, reading a
    volume from disk and importing a volume during probe or volume sync. The
    calculation of the volume op-version depends on the clusters op-version as some
    features are enabled automatically depending on the clusters op-version. We
    also don't store the volume op-versions persistently and don't export the
    volume op-versions during sync. Due to this, there can occur cases which will
    lead to inconsistencies in volumes in different peers. One such case is below,
    Consider, a cluster made up 3 peers P1, P2 and P3, operating at op-version N.
    The cluster has two volumes V1 and V2, which have volume op-versions N (since
    volume op-version cannot be greater than cluster op-version). We have,
     Cluster-op-version = N
     V1 op-version = N
     V2 op-version = N
    A set operation on V1 causes the clusters op-version to be bumped up to N+1.
    Assume that there exist some features that are automatically enabled on
    op-version N+1. The op-version of V2 remains at N as no operation has been
    performed on it. So,
     Cluster op-version = N+1
     V1 op-version = N+1
     V2 op-version = N
    Now, we probe a new peer P4. On the new peer we will have the following
    op-versions,
     Cluster op-version = N+1
     V1 op-version = N+1
     V2 op-version = N+1
    This happens because we don't send volume op-versions during the sync after
    probe. P4 will freshly calculate the op-version of V2 (assuming features have
    been auto enabled due to the cluster op-version being N+1) as N+1.
    Another case is when glusterd on a peer restarts. Assume P3 was restarted,
    glusterd will recalculate the volume op-versions during the restore state.
    Again, op-version of V2 will be calculated as N+1 assuming auto enabled
    features. This will lead to inconsistency in the volume representation in
    memory and on disk, as glusterd will assume the volume contains auto enabled
    features, but the volfiles don't contain them as they were not regenrated.
    These kind of issues can be solved by calculating the volume op-version only
    when features are enabled and disabled (ie. during volume set/reset),
    persisting the volume-op-versions and exporting/importing them.
    
    BUG: 1005043
    Change-Id: Id8bb05ba2a77e510739b3b1833f98b4d6d1fa4d7
    Signed-off-by: Kaushal M <kaushal>
    Reviewed-on: http://review.gluster.org/5832
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Anand Avati <avati>

Comment 8 Anand Avati 2013-09-30 04:40:24 UTC
COMMIT: http://review.gluster.org/5568 committed in master by Anand Avati (avati) 
------
commit 9b286d5937e7c78fd17185e9afe25e809153a265
Author: Kaushal M <kaushal>
Date:   Mon Aug 12 10:43:52 2013 +0530

    glusterd: Calculate volume op-versions only on set/reset
    
    The volume op-versions are calculated during a volume set/reset, reading a
    volume from disk and importing a volume during probe or volume sync. The
    calculation of the volume op-version depends on the clusters op-version as some
    features are enabled automatically depending on the clusters op-version. We
    also don't store the volume op-versions persistently and don't export the
    volume op-versions during sync. Due to this, there can occur cases which will
    lead to inconsistencies in volumes in different peers. One such case is below,
    
    Consider, a cluster made up 3 peers P1, P2 and P3, operating at op-version N.
    The cluster has two volumes V1 and V2, which have volume op-versions N (since
    volume op-version cannot be greater than cluster op-version). We have,
     Cluster-op-version = N
     V1 op-version = N
     V2 op-version = N
    A set operation on V1 causes the clusters op-version to be bumped up to N+1.
    Assume that there exist some features that are automatically enabled on
    op-version N+1. The op-version of V2 remains at N as no operation has been
    performed on it. So,
     Cluster op-version = N+1
     V1 op-version = N+1
     V2 op-version = N
    Now, we probe a new peer P4. On the new peer we will have the following
    op-versions,
     Cluster op-version = N+1
     V1 op-version = N+1
     V2 op-version = N+1
    This happens because we don't send volume op-versions during the sync after
    probe. P4 will freshly calculate the op-version of V2 (assuming features have
    been auto enabled due to the cluster op-version being N+1) as N+1.
    
    Another case is when glusterd on a peer restarts. Assume P3 was restarted,
    glusterd will recalculate the volume op-versions during the restore state.
    Again, op-version of V2 will be calculated as N+1 assuming auto enabled
    features. This will lead to inconsistency in the volume representation in
    memory and on disk, as glusterd will assume the volume contains auto enabled
    features, but the volfiles don't contain them as they were not regenrated.
    
    These kind of issues can be solved by calculating the volume op-version only
    when features are enabled and disabled (ie. during volume set/reset),
    persisting the volume-op-versions and exporting/importing them.
    
    Change-Id: I52de0668c92628622e85f4588fb28829a7231132
    BUG: 1005043
    Signed-off-by: Kaushal M <kaushal>
    Reviewed-on: http://review.gluster.org/5568
    Reviewed-by: Amar Tumballi <amarts>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Anand Avati <avati>

Comment 9 Niels de Vos 2014-04-17 11:47:27 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.5.0, please reopen this bug report.

glusterfs-3.5.0 has been announced on the Gluster Developers mailing list [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6137
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
