Bug 1632889 - 'df' shows half as much space on volume after upgrade to RHGS 3.4
Summary: 'df' shows half as much space on volume after upgrade to RHGS 3.4
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: Unspecified
OS: Unspecified
Importance: high urgent
Target Milestone: ---
Assignee: Sanju
QA Contact:
URL:
Whiteboard:
Duplicates: 1744950 (view as bug list)
Depends On:
Blocks: 1630997 1633242 1633479
 
Reported: 2018-09-25 18:43 UTC by Sanju
Modified: 2019-09-19 10:52 UTC (History)
13 users (show)

Fixed In Version: glusterfs-6.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1630997
Clones: 1633242 1633479 (view as bug list)
Environment:
Last Closed: 2019-03-25 16:30:59 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Comment 1 Sanju 2018-09-25 18:47:27 UTC
Comment from Atin Mukherjee:

More detailed RCA:

https://review.gluster.org/#/c/glusterfs/+/19484/2/xlators/mgmt/glusterd/src/glusterd-store.c ensured that when a cluster is upgraded with volumes that predate the shared-brick-count feature, the upgrade is handled and this field is populated while volumes and bricks are restored during the glusterd initialization phase. However, that patch assumed that brickinfo->uuid would already be populated by the time control reached this flow, which was a wrong assumption. The brick's uuid is first restored from the store (which is available by default for volumes created on glusterfs-3.12.2 and later), but since the volume was carried forward from glusterfs-3.8.4 or earlier, the brickinfo file had no uuid. As a result, statfs_fsid of every brickinfo stays at zero even with this patch, and the gd_set_shared_brick_count() logic goes wrong:

static void
gd_set_shared_brick_count(glusterd_volinfo_t *volinfo)
{
    glusterd_brickinfo_t *brickinfo = NULL;
    glusterd_brickinfo_t *trav = NULL;

    cds_list_for_each_entry(brickinfo, &volinfo->bricks, brick_list)
    {
        if (gf_uuid_compare(brickinfo->uuid, MY_UUID))
            continue;
        brickinfo->fs_share_count = 0;
        cds_list_for_each_entry(trav, &volinfo->bricks, brick_list)
        {
            /* brickinfo->statfs_fsid is always zero here, so this
             * comparison matches every local brick and fs_share_count
             * ends up equal to the number of bricks the node hosts for
             * this volume. */
            if (!gf_uuid_compare(trav->uuid, MY_UUID) &&
                (trav->statfs_fsid == brickinfo->statfs_fsid)) {
                brickinfo->fs_share_count++;
            }
        }
    }

    return;
}

Now, if fs_share_count is set to n instead of 1, df reports the size as x/n, where x is the legitimate disk space df should report.
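
To make the effect concrete, here is a small standalone illustration (not gluster code; the brick count and block values are hypothetical) of how zeroed statfs_fsid values inflate fs_share_count to the number of local bricks, which in turn shrinks the size df reports to x/n:

#include <stdio.h>

int main(void)
{
    /* Hypothetical node hosting 2 bricks of the volume on two different
     * file systems; after the broken upgrade path both fsids stay 0. */
    unsigned long long fsid[2] = {0, 0};
    int nbricks = 2;
    unsigned long long real_blocks = 26214400; /* blocks on one brick fs */

    for (int i = 0; i < nbricks; i++) {
        int share_count = 0;
        for (int j = 0; j < nbricks; j++)
            if (fsid[j] == fsid[i])
                share_count++;
        /* share_count should be 1 here, but comes out as nbricks, so
         * the exported size becomes real_blocks / nbricks, i.e. x/n. */
        printf("brick %d: fs_share_count = %d, exported blocks = %llu\n",
               i, share_count, real_blocks / share_count);
    }
    return 0;
}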


While debugging this problem we noticed that if we restart glusterd and then run a gluster volume set operation, the issue goes away. Restarting glusterd ensures brickinfo->statfs_fsid is populated correctly, because by then the uuid for the brickinfo has been persisted. Running gluster volume set then ensures fs_share_count is updated correctly and the volfiles are regenerated and propagated back to the client.

Comment 3 Sanju 2018-09-25 18:54:21 UTC
upstream patch: https://review.gluster.org/#/c/glusterfs/+/21278

Comment 4 Worker Ant 2018-09-26 12:22:46 UTC
COMMIT: https://review.gluster.org/21278 committed in master by "Atin Mukherjee" <amukherj> with a commit message- glusterd: make sure that brickinfo->uuid is not null

Problem: After an upgrade from a version where the shared-brick-count
option is not present to a version which introduced this option,
the size of the volume reported at the mount point is reduced by a
factor of the shared-brick-count value.

Cause: shared-brick-count is equal to the number of bricks that
are sharing the file system. gd_set_shared_brick_count() calculates
the shared-brick-count value based on the uuid of the node and the
fsid of the brick. https://review.gluster.org/#/c/glusterfs/+/19484
handles setting the fsid properly during an upgrade path, but it
assumed that brickinfo->uuid is non-null when that code path is
reached. In fact brickinfo->uuid is null for all the bricks, and
because the uuid is null the code path added by
https://review.gluster.org/#/c/glusterfs/+/19484 to set the fsid is
never reached. So we had fsid as 0 for all bricks, which made
gd_set_shared_brick_count() calculate shared-brick-count in a wrong
way; i.e., the logic in gd_set_shared_brick_count() didn't work as
expected since the fsid is 0.

Solution: Before control reaches the code path introduced by
https://review.gluster.org/#/c/glusterfs/+/19484, add a check for
whether brickinfo->uuid is null; if it is, calling
glusterd_resolve_brick will set brickinfo->uuid to a proper value.
Once we have a proper uuid, the fsid for the bricks is set properly
and the shared-brick-count value is calculated correctly.
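
For illustration only, a minimal sketch of the check described above, as
it would sit in the glusterd brick-restore path before the fsid handling
from https://review.gluster.org/#/c/glusterfs/+/19484 (identifiers are
those used in the glusterd sources; this is not the verbatim patch):

    /* If the store had no uuid for this brick (volumes carried forward
     * from glusterfs-3.8.4 or earlier), resolve the brick first so the
     * fsid can be populated and shared-brick-count computed correctly. */
    if (gf_uuid_is_null(brickinfo->uuid))
        (void)glusterd_resolve_brick(brickinfo);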

Please take a look at the bug https://bugzilla.redhat.com/show_bug.cgi?id=1632889
for complete RCA

Steps followed to test the fix:
1. Created a 2-node cluster running a binary which doesn't have the
shared-brick-count option
2. Created a 2x(2+1) volume and started it
3. Mounted the volume and checked the size of the volume using df
4. Upgraded to a version where shared-brick-count is introduced
(upgraded the nodes one by one, i.e., stop glusterd, upgrade the node
and start glusterd)
5. After upgrading both nodes, bumped up the cluster.op-version
6. At the mount point, df shows the correct size for the volume.

fixes: bz#1632889
Change-Id: Ib9f078aafb15e899a01086eae113270657ea916b
Signed-off-by: Sanju Rakonde <srakonde>

Comment 5 Worker Ant 2018-09-27 06:19:59 UTC
REVISION POSTED: https://review.gluster.org/21285 (glusterd: make sure that brickinfo->uuid is not null) posted (#4) for review on release-5 by Sanju Rakonde

Comment 6 Worker Ant 2018-09-27 11:04:02 UTC
REVISION POSTED: https://review.gluster.org/21288 (glusterd: make sure that brickinfo->uuid is not null) posted (#2) for review on release-4.1 by Sanju Rakonde

Comment 7 Shyamsundar 2019-03-25 16:30:59 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-6.0, please open a new bug report.

glusterfs-6.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-March/000120.html
[2] https://www.gluster.org/pipermail/gluster-users/

Comment 8 Sanju 2019-09-19 10:52:37 UTC
*** Bug 1744950 has been marked as a duplicate of this bug. ***

