Bug 1196029 - [dht]: Failed to rebalance files when a replica-brick-set was removed
Summary: [dht]: Failed to rebalance files when a replica-brick-set was removed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: distribute
Version: rhgs-3.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: ---
Target Release: RHGS 3.0.4
Assignee: Raghavendra G
QA Contact: Sweta Anandpara
URL:
Whiteboard:
Depends On:
Blocks: 1182947 1196615 1196775
 
Reported: 2015-02-25 06:36 UTC by Sweta Anandpara
Modified: 2015-05-13 17:53 UTC (History)
CC: 5 users

Fixed In Version: glusterfs-3.6.0.49-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1196615 1196775
Environment:
Last Closed: 2015-03-26 06:36:29 UTC


Attachments (Terms of Use)
Detailed logs (31.90 KB, text/plain)
2015-03-11 11:59 UTC, Sweta Anandpara


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2015:0682 0 normal SHIPPED_LIVE Red Hat Storage 3.0 enhancement and bug fix update #4 2015-03-26 10:32:55 UTC

Description Sweta Anandpara 2015-02-25 06:36:22 UTC
Description of problem:
Hit this on build 3.6.0.46-1.
Had a 4*2 distribute-replicate volume 'master' in a geo-rep relationship with another 4*2 distribute-replicate volume 'slave'. A replica brick set was removed to make 'master' a 3*2 volume, but none of the files on the removed replica set were rebalanced to the other bricks; the remove-brick 'status' shows 'rebalanced-files' as 0.

Version-Release number of selected component (if applicable):
3.6.0.46-1

How reproducible:
Hit it once. Have not tried it again.

Steps that I followed (a condensed shell sketch of the same procedure follows the list):
1. Have a 4*2 distribute-replicate volume 'master' in geo-rep relationship with volume 'slave'

2. Stop the geo-rep session
gluster volume geo-rep master dhcp42-130::slave stop

3. Remove one of the replica pairs
gluster v remove-brick master replica 2 dhcp43-154:/rhs/brick2/d1 dhcp43-72:/rhs/brick2/d1 start

4. Check the status of the remove-brick operation
gluster v remove-brick master replica 2 dhcp43-154:/rhs/brick2/d1 dhcp43-72:/rhs/brick2/d1 status

5. Commit the remove-brick operation so that the volume configuration reflects the removal
gluster v remove-brick master replica 2 dhcp43-154:/rhs/brick2/d1 dhcp43-72:/rhs/brick2/d1 commit
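
The same steps, condensed into a single shell sketch for convenience. This is not an exact reproduction script from the report; the volume name, hostnames and brick paths are the ones used in this bug and should be treated as placeholders.

# Condensed reproduction sketch (assumes the 4*2 'master' volume and the
# geo-rep session to dhcp42-130::slave already exist, as in step 1).
gluster volume geo-replication master dhcp42-130::slave stop

gluster volume remove-brick master replica 2 \
    dhcp43-154:/rhs/brick2/d1 dhcp43-72:/rhs/brick2/d1 start

# Re-run status until it reports 'completed', then inspect the
# Rebalanced-files / scanned / skipped counters (step 4).
gluster volume remove-brick master replica 2 \
    dhcp43-154:/rhs/brick2/d1 dhcp43-72:/rhs/brick2/d1 status

# Commit only after migration has completed (step 5).
gluster volume remove-brick master replica 2 \
    dhcp43-154:/rhs/brick2/d1 dhcp43-72:/rhs/brick2/d1 commit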


Expected results:

At step 4, the files present on the removed replica set should have been rebalanced to the remaining bricks, and the status command should have shown correct values for 'rebalanced files', 'scanned files' and 'files skipped'.
Step 5 should have resulted in the migrated files being removed from the local brick mountpoint.

Actual results:

Step 4 does not show the expected movement of files for the remove-brick operation; 'rebalanced-files' stays at 0.
Step 5 leaves all the previously existing files on the local removed-brick mountpoint.
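
A quick way to check whether the migration actually happened, sketched as shell below. This is not taken from the original report; the mount point /mnt/master and the brick hosts/paths are the ones used in the transcript further down, so adjust them for other setups.

# File count as seen by clients should stay the same throughout the operation.
ls /mnt/master | wc -l

# Regular files left behind on the decommissioned bricks; once remove-brick
# has really migrated the data this should drop towards 0 (DHT-internal
# entries under .glusterfs are excluded).
for brick in dhcp43-154:/rhs/brick2/d1 dhcp43-72:/rhs/brick2/d1; do
    host=${brick%%:*}; path=${brick#*:}
    ssh "$host" "find $path -path '*/.glusterfs' -prune -o -type f -print | wc -l"
done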

Additional info:

[root@dhcp43-154 ~]# gluster system:: execute gsec_create
Common secret pub file present at /var/lib/glusterd/geo-replication/common_secret.pem.pub
[root@dhcp43-154 ~]# gluster v geo-rep master dhcp42-130::slave create push-pem
dhcp42-130::slave is not empty. Please delete existing files in dhcp42-130::slave and retry, or use force to continue without deleting the existing files.
geo-replication command failed
[root@dhcp43-154 ~]# gluster v geo-rep master dhcp42-130::slave create push-pem force
Creating geo-replication session between master & dhcp42-130::slave has been successful
[root@dhcp43-154 ~]# gluster v geo-rep master dhcp42-130::slave status
 
MASTER NODE                          MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                STATUS         CHECKPOINT STATUS    CRAWL STATUS       
--------------------------------------------------------------------------------------------------------------------------------------------------------
dhcp43-154.lab.eng.blr.redhat.com    master        /rhs/brick1/d1    root          dhcp42-130::slave    Not Started    N/A                  N/A                
dhcp43-154.lab.eng.blr.redhat.com    master        /rhs/brick2/d1    root          dhcp42-130::slave    Not Started    N/A                  N/A                
dhcp42-182.lab.eng.blr.redhat.com    master        /rhs/brick1/d1    root          dhcp42-130::slave    Not Started    N/A                  N/A                
dhcp42-182.lab.eng.blr.redhat.com    master        /rhs/brick2/d1    root          dhcp42-130::slave    Not Started    N/A                  N/A                
dhcp42-74.lab.eng.blr.redhat.com     master        /rhs/brick1/d1    root          dhcp42-130::slave    Not Started    N/A                  N/A                
dhcp42-74.lab.eng.blr.redhat.com     master        /rhs/brick2/d1    root          dhcp42-130::slave    Not Started    N/A                  N/A                
dhcp43-72.lab.eng.blr.redhat.com     master        /rhs/brick1/d1    root          dhcp42-130::slave    Not Started    N/A                  N/A                
dhcp43-72.lab.eng.blr.redhat.com     master        /rhs/brick2/d1    root          dhcp42-130::slave    Not Started    N/A                  N/A                
[root@dhcp43-154 ~]# 
[root@dhcp43-154 ~]# gluster v geo-rep master dhcp42-130::slave start
Starting geo-replication session between master & dhcp42-130::slave has been successful
[root@dhcp43-154 ~]# gluster v geo-rep master dhcp42-130::slave status
 
MASTER NODE                          MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                STATUS             CHECKPOINT STATUS    CRAWL STATUS       
------------------------------------------------------------------------------------------------------------------------------------------------------------
dhcp43-154.lab.eng.blr.redhat.com    master        /rhs/brick1/d1    root          dhcp42-130::slave    Initializing...    N/A                  N/A                
dhcp43-154.lab.eng.blr.redhat.com    master        /rhs/brick2/d1    root          dhcp42-130::slave    Initializing...    N/A                  N/A                
dhcp42-182.lab.eng.blr.redhat.com    master        /rhs/brick1/d1    root          dhcp42-130::slave    Initializing...    N/A                  N/A                
dhcp42-182.lab.eng.blr.redhat.com    master        /rhs/brick2/d1    root          dhcp42-130::slave    Initializing...    N/A                  N/A                
dhcp43-72.lab.eng.blr.redhat.com     master        /rhs/brick1/d1    root          dhcp42-130::slave    Initializing...    N/A                  N/A                
dhcp43-72.lab.eng.blr.redhat.com     master        /rhs/brick2/d1    root          dhcp42-130::slave    Initializing...    N/A                  N/A                
dhcp42-74.lab.eng.blr.redhat.com     master        /rhs/brick1/d1    root          dhcp42-130::slave    Initializing...    N/A                  N/A                
dhcp42-74.lab.eng.blr.redhat.com     master        /rhs/brick2/d1    root          dhcp42-130::slave    Initializing...    N/A                  N/A                
[root@dhcp43-154 ~]# gluster v geo-rep master dhcp42-130::slave status
 
MASTER NODE                          MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                STATUS     CHECKPOINT STATUS    CRAWL STATUS          
-------------------------------------------------------------------------------------------------------------------------------------------------------
dhcp43-154.lab.eng.blr.redhat.com    master        /rhs/brick1/d1    root          dhcp43-93::slave     Active     N/A                  Changelog Crawl       
dhcp43-154.lab.eng.blr.redhat.com    master        /rhs/brick2/d1    root          dhcp43-93::slave     Active     N/A                  Changelog Crawl       
dhcp42-182.lab.eng.blr.redhat.com    master        /rhs/brick1/d1    root          dhcp42-19::slave     Passive    N/A                  N/A                   
dhcp42-182.lab.eng.blr.redhat.com    master        /rhs/brick2/d1    root          dhcp42-19::slave     Passive    N/A                  N/A                   
dhcp43-72.lab.eng.blr.redhat.com     master        /rhs/brick1/d1    root          dhcp42-210::slave    Passive    N/A                  N/A                   
dhcp43-72.lab.eng.blr.redhat.com     master        /rhs/brick2/d1    root          dhcp42-210::slave    Passive    N/A                  N/A                   
dhcp42-74.lab.eng.blr.redhat.com     master        /rhs/brick1/d1    root          dhcp42-130::slave    Active     N/A                  Changelog Crawl       
dhcp42-74.lab.eng.blr.redhat.com     master        /rhs/brick2/d1    root          dhcp42-130::slave    Active     N/A                  Changelog Crawl       
[root@dhcp43-154 ~]# gluster v i
 
Volume Name: master
Type: Distributed-Replicate
Volume ID: fcf732d1-81d6-42d1-8915-cc2107fd72f2
Status: Started
Snap Volume: no
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: dhcp43-154:/rhs/brick1/d1
Brick2: dhcp43-72:/rhs/brick1/d1
Brick3: dhcp42-74:/rhs/brick1/d1
Brick4: dhcp42-182:/rhs/brick1/d1
Brick5: dhcp43-154:/rhs/brick2/d1
Brick6: dhcp43-72:/rhs/brick2/d1
Brick7: dhcp42-74:/rhs/brick2/d1
Brick8: dhcp42-182:/rhs/brick2/d1
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
performance.readdir-ahead: on
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable
[root@dhcp43-154 ~]# 
[root@dhcp43-154 ~]# 
[root@dhcp43-154 ~]# mount
/dev/mapper/vg_dhcp43154-lv_root on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/vda1 on /boot type ext4 (rw)
/dev/mapper/RHS_vg1-RHS_lv1 on /rhs/brick1 type xfs (rw,noatime,nodiratime,inode64)
/dev/mapper/RHS_vg2-RHS_lv2 on /rhs/brick2 type xfs (rw,noatime,nodiratime,inode64)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
[root@dhcp43-154 ~]# mount -t glusterfs dhcp43-154:/master /mnt/master
ERROR: Mount point does not exist
Please specify a mount point
Usage:
man 8 /sbin/mount.glusterfs
[root@dhcp43-154 ~]# mkdir /mnt/master
[root@dhcp43-154 ~]# mount -t glusterfs dhcp43-154:/master /mnt/master
[root@dhcp43-154 ~]# cd /mnt/master
[root@dhcp43-154 master]# ls -a
.     a15  a23  a31  a4   a48  a56  a64  a72  a80  a89  a97   b14  b22  b30  b39  b47  b55  b63  b71  b8   b88  b96   c19  c39  c59  c76  c91
..    a16  a24  a32  a40  a49  a57  a65  a73  a81  a9   a98   b15  b23  b31  b4   b48  b56  b64  b72  b80  b89  b97   c2   c40  c62  c79  c92
a1    a17  a25  a33  a41  a5   a58  a66  a74  a82  a90  a99   b16  b24  b32  b40  b49  b57  b65  b73  b81  b9   b98   c20  c45  c63  c80  c96
a10   a18  a26  a34  a42  a50  a59  a67  a75  a83  a91  b1    b17  b25  b33  b41  b5   b58  b66  b74  b82  b90  b99   c22  c47  c64  c81  c97
a100  a19  a27  a35  a43  a51  a6   a68  a76  a84  a92  b10   b18  b26  b34  b42  b50  b59  b67  b75  b83  b91  c100  c25  c48  c65  c82  c98
a11   a2   a28  a36  a44  a52  a60  a69  a77  a85  a93  b100  b19  b27  b35  b43  b51  b6   b68  b76  b84  b92  c12   c28  c5   c68  c85  c99
a12   a20  a29  a37  a45  a53  a61  a7   a78  a86  a94  b11   b2   b28  b36  b44  b52  b60  b69  b77  b85  b93  c13   c29  c51  c70  c86
a13   a21  a3   a38  a46  a54  a62  a70  a79  a87  a95  b12   b20  b29  b37  b45  b53  b61  b7   b78  b86  b94  c16   c30  c53  c72  c88
a14   a22  a30  a39  a47  a55  a63  a71  a8   a88  a96  b13   b21  b3   b38  b46  b54  b62  b70  b79  b87  b95  c18   c36  c54  c74  c90
[root@dhcp43-154 master]# ls -l | wc -l
248
[root@dhcp43-154 master]# cd /rh
rhev/ rhs/  
[root@dhcp43-154 master]# cd /rh
rhev/ rhs/  
[root@dhcp43-154 master]# ls -l  /rhs/brick1/d1/ | wc -l
75
[root@dhcp43-154 master]# ls -l  /rhs/brick2/d1/ | wc -l
83
[root@dhcp43-154 master]#
[root@dhcp43-154 master]# ls -l  /rhs/brick2/d1/
total 0
-rw-r--r-- 2 root root 0 Feb 17 20:00 a100
-rw-r--r-- 2 root root 0 Feb 17 20:00 a13
-rw-r--r-- 2 root root 0 Feb 17 20:00 a19
-rw-r--r-- 2 root root 0 Feb 17 20:00 a2
-rw-r--r-- 2 root root 0 Feb 17 20:00 a23
-rw-r--r-- 2 root root 0 Feb 17 20:00 a30
-rw-r--r-- 2 root root 0 Feb 17 20:00 a31
-rw-r--r-- 2 root root 0 Feb 17 20:00 a35
-rw-r--r-- 2 root root 0 Feb 17 20:00 a38
-rw-r--r-- 2 root root 0 Feb 17 20:00 a41
-rw-r--r-- 2 root root 0 Feb 17 20:00 a46
-rw-r--r-- 2 root root 0 Feb 17 20:00 a47
-rw-r--r-- 2 root root 0 Feb 17 20:00 a49
-rw-r--r-- 2 root root 0 Feb 17 20:00 a50
-rw-r--r-- 2 root root 0 Feb 17 20:00 a54
-rw-r--r-- 2 root root 0 Feb 17 20:00 a56
-rw-r--r-- 2 root root 0 Feb 17 20:00 a67
-rw-r--r-- 2 root root 0 Feb 17 20:00 a69
-rw-r--r-- 2 root root 0 Feb 17 20:00 a7
-rw-r--r-- 2 root root 0 Feb 17 20:00 a93
-rw-r--r-- 2 root root 0 Feb 17 20:00 a95
-rw-r--r-- 2 root root 0 Feb 17 20:00 a99
-rw-r--r-- 2 root root 0 Feb 19 15:16 b10
-rw-r--r-- 2 root root 0 Feb 19 15:16 b11
-rw-r--r-- 2 root root 0 Feb 19 15:16 b13
-rw-r--r-- 2 root root 0 Feb 19 15:16 b19
-rw-r--r-- 2 root root 0 Feb 19 15:16 b20
-rw-r--r-- 2 root root 0 Feb 19 15:16 b25
-rw-r--r-- 2 root root 0 Feb 19 15:16 b28
-rw-r--r-- 2 root root 0 Feb 19 15:16 b33
-rw-r--r-- 2 root root 0 Feb 19 15:16 b38
-rw-r--r-- 2 root root 0 Feb 19 15:16 b40
-rw-r--r-- 2 root root 0 Feb 19 15:16 b41
-rw-r--r-- 2 root root 0 Feb 19 15:16 b42
-rw-r--r-- 2 root root 0 Feb 19 15:16 b5
-rw-r--r-- 2 root root 0 Feb 19 15:16 b50
-rw-r--r-- 2 root root 0 Feb 19 15:16 b52
-rw-r--r-- 2 root root 0 Feb 19 15:16 b53
-rw-r--r-- 2 root root 0 Feb 19 15:16 b54
-rw-r--r-- 2 root root 0 Feb 19 15:16 b58
-rw-r--r-- 2 root root 0 Feb 19 15:16 b6
-rw-r--r-- 2 root root 0 Feb 19 15:16 b7
-rw-r--r-- 2 root root 0 Feb 19 15:16 b70
-rw-r--r-- 2 root root 0 Feb 19 15:16 b72
-rw-r--r-- 2 root root 0 Feb 19 15:16 b74
-rw-r--r-- 2 root root 0 Feb 19 15:16 b75
-rw-r--r-- 2 root root 0 Feb 19 15:16 b77
-rw-r--r-- 2 root root 0 Feb 19 15:16 b79
-rw-r--r-- 2 root root 0 Feb 19 15:16 b8
-rw-r--r-- 2 root root 0 Feb 19 15:16 b80
-rw-r--r-- 2 root root 0 Feb 19 15:16 b81
-rw-r--r-- 2 root root 0 Feb 19 15:16 b82
-rw-r--r-- 2 root root 0 Feb 19 15:16 b83
-rw-r--r-- 2 root root 0 Feb 19 15:16 b84
-rw-r--r-- 2 root root 0 Feb 19 15:16 b86
-rw-r--r-- 2 root root 0 Feb 19 15:16 b87
-rw-r--r-- 2 root root 0 Feb 19 15:16 b95
-rw-r--r-- 2 root root 0 Feb 19 15:16 b96
-rw-r--r-- 2 root root 0 Feb 19 16:37 c12
-rw-r--r-- 2 root root 0 Feb 19 16:37 c19
-rw-r--r-- 2 root root 0 Feb 19 16:37 c22
-rw-r--r-- 2 root root 0 Feb 19 16:37 c25
-rw-r--r-- 2 root root 0 Feb 19 16:37 c28
-rw-r--r-- 2 root root 0 Feb 19 16:37 c29
-rw-r--r-- 2 root root 0 Feb 19 16:37 c39
-rw-r--r-- 2 root root 0 Feb 19 16:37 c5
-rw-r--r-- 2 root root 0 Feb 19 16:37 c51
-rw-r--r-- 2 root root 0 Feb 19 16:37 c53
-rw-r--r-- 2 root root 0 Feb 19 16:37 c54
-rw-r--r-- 2 root root 0 Feb 19 16:37 c59
-rw-r--r-- 2 root root 0 Feb 19 16:37 c62
-rw-r--r-- 2 root root 0 Feb 19 16:37 c65
-rw-r--r-- 2 root root 0 Feb 19 16:37 c70
-rw-r--r-- 2 root root 0 Feb 19 16:37 c72
-rw-r--r-- 2 root root 0 Feb 19 16:37 c82
-rw-r--r-- 2 root root 0 Feb 19 16:37 c85
-rw-r--r-- 2 root root 0 Feb 19 16:37 c88
-rw-r--r-- 2 root root 0 Feb 19 16:37 c90
-rw-r--r-- 2 root root 0 Feb 19 16:37 c91
-rw-r--r-- 2 root root 0 Feb 19 16:37 c92
-rw-r--r-- 2 root root 0 Feb 19 16:37 c97
-rw-r--r-- 2 root root 0 Feb 19 16:37 c98
[root@dhcp43-154 master]#

[root@dhcp43-154 master]# gluster v i
 
Volume Name: master
Type: Distributed-Replicate
Volume ID: fcf732d1-81d6-42d1-8915-cc2107fd72f2
Status: Started
Snap Volume: no
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: dhcp43-154:/rhs/brick1/d1
Brick2: dhcp43-72:/rhs/brick1/d1
Brick3: dhcp42-74:/rhs/brick1/d1
Brick4: dhcp42-182:/rhs/brick1/d1
Brick5: dhcp43-154:/rhs/brick2/d1
Brick6: dhcp43-72:/rhs/brick2/d1
Brick7: dhcp42-74:/rhs/brick2/d1
Brick8: dhcp42-182:/rhs/brick2/d1
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
performance.readdir-ahead: on
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable
[root@dhcp43-154 master]# 
[root@dhcp43-154 master]# gluster v remove-brick master replica 2 dhcp43-154:/rhs/brick2/d1 dhcp43-72:/rhs/brick2/d1 start
volume remove-brick start: success
ID: 0af990dd-684b-4851-91c7-571615d21f08
[root@dhcp43-154 master]# gluster v remove-brick master replica 2 dhcp43-154:/rhs/brick2/d1 dhcp43-72:/rhs/brick2/d1 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes           247             0             0            completed               1.00
                               dhcp43-72                0        0Bytes           247             0             0            completed               1.00
[root@dhcp43-154 master]# ls /rhs/brick2/d1 | wc -l
82
[root@dhcp43-154 master]# ls /rhs/brick2/d1 | wc -l
82
[root@dhcp43-154 master]# gluster v i
 
Volume Name: master
Type: Distributed-Replicate
Volume ID: fcf732d1-81d6-42d1-8915-cc2107fd72f2
Status: Started
Snap Volume: no
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: dhcp43-154:/rhs/brick1/d1
Brick2: dhcp43-72:/rhs/brick1/d1
Brick3: dhcp42-74:/rhs/brick1/d1
Brick4: dhcp42-182:/rhs/brick1/d1
Brick5: dhcp43-154:/rhs/brick2/d1
Brick6: dhcp43-72:/rhs/brick2/d1
Brick7: dhcp42-74:/rhs/brick2/d1
Brick8: dhcp42-182:/rhs/brick2/d1
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
performance.readdir-ahead: on
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable
[root@dhcp43-154 master]# gluster v remove-brick master replica 2 dhcp43-154:/rhs/brick2/d1 dhcp43-72:/rhs/brick2/d1 commit
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit: failed: geo-replication sessions are active for the volume master.
Stop geo-replication sessions involved in this volume. Use 'volume geo-replication status' command for more info.
[root@dhcp43-154 master]# gluster v geo-rep master dhcp42-130::slave stop
Stopping geo-replication session between master & dhcp42-130::slave has been successful
[root@dhcp43-154 master]# gluster v remove-brick master replica 2 dhcp43-154:/rhs/brick2/d1 dhcp43-72:/rhs/brick2/d1 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes           247             0             0            completed               1.00
                               dhcp43-72                0        0Bytes           247             0             0            completed               1.00
[root@dhcp43-154 master]# gluster v remove-brick master replica 2 dhcp43-154:/rhs/brick2/d1 dhcp43-72:/rhs/brick2/d1 commit
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit: success
Check the removed bricks to ensure all files are migrated.
If files with data are found on the brick path, copy them via a gluster mount point before re-purposing the removed brick. 
[root@dhcp43-154 master]# 
[root@dhcp43-154 master]# 
[root@dhcp43-154 master]# gluster v i
 
Volume Name: master
Type: Distributed-Replicate
Volume ID: fcf732d1-81d6-42d1-8915-cc2107fd72f2
Status: Started
Snap Volume: no
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: dhcp43-154:/rhs/brick1/d1
Brick2: dhcp43-72:/rhs/brick1/d1
Brick3: dhcp42-74:/rhs/brick1/d1
Brick4: dhcp42-182:/rhs/brick1/d1
Brick5: dhcp42-74:/rhs/brick2/d1
Brick6: dhcp42-182:/rhs/brick2/d1
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
performance.readdir-ahead: on
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable
[root@dhcp43-154 master]# ls -l /rhs/brick2/d1 | wc -l
83
[root@dhcp43-154 master]# ssh dhcp42-72 'ls -l /rhs/brick2/d1 | wc -l'
ssh: connect to host dhcp42-72 port 22: No route to host
[root@dhcp43-154 master]# ssh dhcp43-72 'ls -l /rhs/brick2/d1 | wc -l'
The authenticity of host 'dhcp43-72 (10.70.43.72)' can't be established.
RSA key fingerprint is 73:fd:3a:81:fb:46:ce:d2:ba:38:d3:87:ac:02:6b:ac.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'dhcp43-72' (RSA) to the list of known hosts.
root@dhcp43-72's password: 
83
[root@dhcp43-154 master]# 
[root@dhcp43-154 master]# ls -l /rhs/brick1/d1 | wc -l
75
[root@dhcp43-154 master]# ssh 10.70.42.74 'ls -l /rhs/brick2/d1 | wc -l'
root.42.74's password: 
49
[root@dhcp43-154 master]# ssh 10.70.42.182 'ls -l /rhs/brick2/d1 | wc -l'
root.42.182's password: 
49
[root@dhcp43-154 master]# ssh 10.70.42.74 'ls -l /rhs/brick1/d1 | wc -l'
root.42.74's password: 
44
[root@dhcp43-154 master]#
[root@dhcp43-154 master]# 
[root@dhcp43-154 master]# ls -l | wc -l
166
[root@dhcp43-154 master]#
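
The remove-brick commit output above warns that unmigrated files may still be left on the removed bricks and should be copied back in through a gluster mount point before the bricks are re-purposed. A minimal sketch of that clean-up, assuming the removed-brick path /rhs/brick2/d1 and the client mount /mnt/master from this session (the sketch itself is not part of the original report):

# Copy leftover regular files from the removed brick back through the client
# mount so DHT re-places them on the remaining bricks. In a real clean-up you
# would also skip DHT linkto placeholders (zero-byte files with the sticky bit).
cd /rhs/brick2/d1
find . -path ./.glusterfs -prune -o -type f -print | while read -r f; do
    cp --parents "$f" /mnt/master/
done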

Comment 1 Sweta Anandpara 2015-02-25 07:28:12 UTC
SOS reports copied to: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1196029/

Comment 4 Triveni Rao 2015-02-26 10:15:43 UTC
A similar issue is seen with a dist-rep volume, but not on a pure distribute volume.

Output below:


[root@rhsauto032 mnt]# gluster v info test2

Volume Name: test2
Type: Distribute
Volume ID: e963a35c-037f-48dc-ab7d-ce28fc1b65e0
Status: Started
Snap Volume: no
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/t2
Brick2: rhsauto034.lab.eng.blr.redhat.com:/rhs/brick1/t2
Brick3: rhsauto032.lab.eng.blr.redhat.com:/rhs/brick2/t2
Brick4: rhsauto034.lab.eng.blr.redhat.com:/rhs/brick2/t2
Options Reconfigured:
performance.readdir-ahead: on
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable
[root@rhsauto032 mnt]# gluster v remove-brick test2 rhsauto034.lab.eng.blr.redhat.com:/rhs/brick2/t2 start
volume remove-brick start: success
ID: 1b779bad-38dc-483f-bfff-b3956ce90d71
[root@rhsauto032 mnt]#
[root@rhsauto032 mnt]# gluster v remove-brick test2 rhsauto034.lab.eng.blr.redhat.com:/rhs/brick2/t2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
       rhsauto034.lab.eng.blr.redhat.com                7        70.0MB            22             0             0          in progress               6.00
[root@rhsauto032 mnt]# gluster v remove-brick test2 rhsauto034.lab.eng.blr.redhat.com:/rhs/brick2/t2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
       rhsauto034.lab.eng.blr.redhat.com               24       240.0MB            60             0             0            completed              13.00
[root@rhsauto032 mnt]#


The issue is not seen on the distribute volume.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Remove-brick on a distributed-replicate (dist-rep) volume reproduced the issue.


[root@rhsauto032 ~]# gluster v create dist-rep replica 2 `hostname`:/rhs/brick1/d1 rhsauto034.lab.eng.blr.redhat.com:/rhs/brick1/d1 `hostname`:/rhs/brick1/d2 rhsauto034.lab.eng.blr.redhat.com:/rhs/brick1/d2 `hostname`:/rhs/brick1/d3 rhsauto034.lab.eng.blr.redhat.com:/rhs/brick1/d3
volume create: dist-rep: success: please start the volume to access data
[root@rhsauto032 ~]# gluster v start dist-rep
volume start: dist-rep: success
[root@rhsauto032 ~]# df
Filesystem           1K-blocks    Used Available Use% Mounted on
/dev/mapper/vg_rhsauto032-lv_root
                      17938864 1860484  15160468  11% /
tmpfs                  2027372       0   2027372   0% /dev/shm
/dev/vda1               487652   28454    433598   7% /boot
/dev/mapper/RHS_vg1-RHS_lv1
                      52135040  207888  51927152   1% /rhs/brick1
/dev/mapper/RHS_vg2-RHS_lv2
                      52135040 2296852  49838188   5% /rhs/brick2
/dev/mapper/RHS_vg3-RHS_lv3
                      52135040   33808  52101232   1% /rhs/brick3
/dev/mapper/RHS_vg4-RHS_lv4
                      52135040   33616  52101424   1% /rhs/brick4
/dev/mapper/RHS_vg5-RHS_lv5
                      52135040   33616  52101424   1% /rhs/brick5
rhsauto032.lab.eng.blr.redhat.com:test2
                     208540160 3719296 204820864   2% /mnt
[root@rhsauto032 ~]# umount /mnt
[root@rhsauto032 ~]# gluster v info dist-repo
Volume dist-repo does not exist
[root@rhsauto032 ~]# gluster v info dist-rep

Volume Name: dist-rep
Type: Distributed-Replicate
Volume ID: 8578bce5-0c66-4388-8c6d-176f17aaf83e
Status: Started
Snap Volume: no
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d1
Brick2: rhsauto034.lab.eng.blr.redhat.com:/rhs/brick1/d1
Brick3: rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d2
Brick4: rhsauto034.lab.eng.blr.redhat.com:/rhs/brick1/d2
Brick5: rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d3
Brick6: rhsauto034.lab.eng.blr.redhat.com:/rhs/brick1/d3
Options Reconfigured:
performance.readdir-ahead: on
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable
[root@rhsauto032 ~]# df
Filesystem           1K-blocks    Used Available Use% Mounted on
/dev/mapper/vg_rhsauto032-lv_root
                      17938864 1860456  15160496  11% /
tmpfs                  2027372       0   2027372   0% /dev/shm
/dev/vda1               487652   28454    433598   7% /boot
/dev/mapper/RHS_vg1-RHS_lv1
                      52135040  207888  51927152   1% /rhs/brick1
/dev/mapper/RHS_vg2-RHS_lv2
                      52135040 2296852  49838188   5% /rhs/brick2
/dev/mapper/RHS_vg3-RHS_lv3
                      52135040   33808  52101232   1% /rhs/brick3
/dev/mapper/RHS_vg4-RHS_lv4
                      52135040   33616  52101424   1% /rhs/brick4
/dev/mapper/RHS_vg5-RHS_lv5
                      52135040   33616  52101424   1% /rhs/brick5
[root@rhsauto032 ~]# mount -t glusterfs `hostname`:dist-rep /mnt
[root@rhsauto032 ~]# df
Filesystem           1K-blocks    Used Available Use% Mounted on
/dev/mapper/vg_rhsauto032-lv_root
                      17938864 1860416  15160536  11% /
tmpfs                  2027372       0   2027372   0% /dev/shm
/dev/vda1               487652   28454    433598   7% /boot
/dev/mapper/RHS_vg1-RHS_lv1
                      52135040  207888  51927152   1% /rhs/brick1
/dev/mapper/RHS_vg2-RHS_lv2
                      52135040 2296852  49838188   5% /rhs/brick2
/dev/mapper/RHS_vg3-RHS_lv3
                      52135040   33808  52101232   1% /rhs/brick3
/dev/mapper/RHS_vg4-RHS_lv4
                      52135040   33616  52101424   1% /rhs/brick4
/dev/mapper/RHS_vg5-RHS_lv5
                      52135040   33616  52101424   1% /rhs/brick5
rhsauto032.lab.eng.blr.redhat.com:dist-rep
                     156405120 3542144 152862976   3% /mnt
[root@rhsauto032 ~]# cd /mnt
[root@rhsauto032 mnt]# ls
[root@rhsauto032 mnt]# for i in {1..100}
> do
> dd if=/dev/urandom of=$i bs=1M count=1
> done
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.228197 s, 4.6 MB/s
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.213948 s, 4.9 MB/s
1+0 records in
1+0 records out

[root@rhsauto032 mnt]# gluster v info dist-rep

Volume Name: dist-rep
Type: Distributed-Replicate
Volume ID: 8578bce5-0c66-4388-8c6d-176f17aaf83e
Status: Started
Snap Volume: no
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d1
Brick2: rhsauto034.lab.eng.blr.redhat.com:/rhs/brick1/d1
Brick3: rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d2
Brick4: rhsauto034.lab.eng.blr.redhat.com:/rhs/brick1/d2
Brick5: rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d3
Brick6: rhsauto034.lab.eng.blr.redhat.com:/rhs/brick1/d3
Options Reconfigured:
performance.readdir-ahead: on
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable
[root@rhsauto032 mnt]# gluster v remove-brick dist-rep rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d3 rhsauto034.lab.eng.blr.redhat.com:/rhs/brick1/d3 start
volume remove-brick start: success
ID: 2898eaa5-0ed1-4951-a3ea-b5490c029c32
[root@rhsauto032 mnt]# gluster v remove-brick dist-rep rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d3 rhsauto034.lab.eng.blr.redhat.com:/rhs/brick1/d3 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                                localhost                0        0Bytes           100             0             0            completed               0.00
       rhsauto034.lab.eng.blr.redhat.com                0        0Bytes           100             0             0            completed               1.00
[root@rhsauto032 mnt]# gluster v remove-brick dist-rep rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d3 rhsauto034.lab.eng.blr.redhat.com:/rhs/brick1/d3 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                                localhost                0        0Bytes           100             0             0            completed               0.00
       rhsauto034.lab.eng.blr.redhat.com                0        0Bytes           100             0             0            completed               1.00
[root@rhsauto032 mnt]# gluster v remove-brick dist-rep rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d3 rhsauto034.lab.eng.blr.redhat.com:/rhs/brick1/d3 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                                localhost                0        0Bytes           100             0             0            completed               0.00
       rhsauto034.lab.eng.blr.redhat.com                0        0Bytes           100             0             0            completed               1.00
[root@rhsauto032 mnt]#


[root@rhsauto032 mnt]# ll /rhs/brick1/d3
total 29696
-rw-r--r-- 2 root root 1048576 Feb 26 00:36 16
-rw-r--r-- 2 root root 1048576 Feb 26 00:36 18
-rw-r--r-- 2 root root 1048576 Feb 26 00:36 19
-rw-r--r-- 2 root root 1048576 Feb 26 00:36 22
-rw-r--r-- 2 root root 1048576 Feb 26 00:36 23
-rw-r--r-- 2 root root 1048576 Feb 26 00:36 27
-rw-r--r-- 2 root root 1048576 Feb 26 00:36 28
-rw-r--r-- 2 root root 1048576 Feb 26 00:36 32
-rw-r--r-- 2 root root 1048576 Feb 26 00:36 34
-rw-r--r-- 2 root root 1048576 Feb 26 00:36 37
-rw-r--r-- 2 root root 1048576 Feb 26 00:36 38
-rw-r--r-- 2 root root 1048576 Feb 26 00:36 41
-rw-r--r-- 2 root root 1048576 Feb 26 00:36 43
-rw-r--r-- 2 root root 1048576 Feb 26 00:36 44
-rw-r--r-- 2 root root 1048576 Feb 26 00:36 56
-rw-r--r-- 2 root root 1048576 Feb 26 00:36 57
-rw-r--r-- 2 root root 1048576 Feb 26 00:36 60
-rw-r--r-- 2 root root 1048576 Feb 26 00:36 63
-rw-r--r-- 2 root root 1048576 Feb 26 00:36 69
-rw-r--r-- 2 root root 1048576 Feb 26 00:36 7
-rw-r--r-- 2 root root 1048576 Feb 26 00:36 71
-rw-r--r-- 2 root root 1048576 Feb 26 00:36 78
-rw-r--r-- 2 root root 1048576 Feb 26 00:36 80
-rw-r--r-- 2 root root 1048576 Feb 26 00:36 82
-rw-r--r-- 2 root root 1048576 Feb 26 00:36 83
-rw-r--r-- 2 root root 1048576 Feb 26 00:36 86
-rw-r--r-- 2 root root 1048576 Feb 26 00:36 89
-rw-r--r-- 2 root root 1048576 Feb 26 00:36 92
-rw-r--r-- 2 root root 1048576 Feb 26 00:36 99
[root@rhsauto032 mnt]#

Comment 7 Sweta Anandpara 2015-03-11 11:58:40 UTC
Verified the above bug on the build glusterfs-3.6.0.50-1

Had a 4*2 distribute-replicate volume set up in a geo-rep relationship with another 2*2 volume. 

Removed one of the replica pairs on the master to make the volume 3*2, and it succeeded: the files got synced to the other bricks.

Moving the bug to fixed in 3.0.4. Detailed logs are attached.

Comment 8 Sweta Anandpara 2015-03-11 11:59:07 UTC
Created attachment 1000354 [details]
Detailed logs

Comment 10 errata-xmlrpc 2015-03-26 06:36:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0682.html

