Bug 765308 (GLUSTER-3576)

Summary: Rebalance on Distributed-Replicate fails and consumes more space than before
Product: [Community] GlusterFS
Reporter: jaw171
Component: glusterd
Assignee: Amar Tumballi <amarts>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Version: 3.2.3
CC: amarts, gluster-bugs, jdarcy, shylesh, vraman
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Verified Versions: 3.2.5qa4
Attachments:
Logs from gfs-dev-01b
Logs from gfs-dev-02b
Logs from gfs-dev-03b

Description jaw171 2011-09-16 14:23:20 UTC
Created attachment 663

Comment 1 jaw171 2011-09-16 14:23:48 UTC
Created attachment 664

Comment 2 jaw171 2011-09-16 14:24:10 UTC
Created attachment 665

Comment 3 jaw171 2011-09-16 17:22:22 UTC
I created a volume with 3 replicated subvolumes and filled it to 50% with 10MB files of random data.  I then added another replicated subvolume and issued a rebalance.  At first the df output shows the usage of the first 3 bricks go down and the fourth go up, but then something goes wrong and the usage of the first 3 bricks climbs beyond what they originally had.  The volume gets filled with hidden files named after each real file and its file size, as .REALNAME.FILESIZE (see below).  If I delete all of those hidden files, the disk usage on each subvolume goes back to exactly what it was before the rebalance.

Logs show "Transport endpoint is not connected".  It appears that one node (not always the same one) can no longer talk to itself on 127.0.0.1, and other nodes show the same entries with the real IP of that node.  I can reproduce the problem every time.  If you need more verbose logs or debug info, please provide details on how to get them.
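A sketch of the fill step (the file count and dd parameters are assumptions reconstructed from the "10MB files" and the ~50% usage shown below; the FUSE mount point is the one that appears later in this report):

for i in $(seq 1 750); do
    dd if=/dev/urandom of=/mnt/fuse/vol_mainhome/foo.$i bs=1M count=10 2>/dev/null
done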

RHEL 6.1 x64
2.6.32-131.12.1.el6.x86_64
Bricks are all 5GB with xfs.
glusterfs 3.2.3 built on Aug 23 2011 05:40:51  # Installed via the RPMs hosted on gluster.com

Volume Name: vol_mainhome
Type: Distributed-Replicate
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: gfs-dev-01b.cssd.pitt.edu:/bricks/lv_brick_01_001a
Brick2: gfs-dev-03b.cssd.pitt.edu:/bricks/lv_brick_03_001b
Brick3: gfs-dev-01b.cssd.pitt.edu:/bricks/lv_brick_01_002a
Brick4: gfs-dev-03b.cssd.pitt.edu:/bricks/lv_brick_03_002b
Brick5: gfs-dev-02b.cssd.pitt.edu:/bricks/lv_brick_02_003a
Brick6: gfs-dev-03b.cssd.pitt.edu:/bricks/lv_brick_03_003b
Brick7: gfs-dev-02b.cssd.pitt.edu:/bricks/lv_brick_02_004a # Was added with add-brick
Brick8: gfs-dev-03b.cssd.pitt.edu:/bricks/lv_brick_03_004b # Was added with add-brick
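
For reference, Brick7 and Brick8 would have been added with something like this (a sketch assuming the 3.2 CLI syntax; the brick paths are taken from the list above):

[root@gfs-dev-02b ~]# gluster volume add-brick vol_mainhome \
    gfs-dev-02b.cssd.pitt.edu:/bricks/lv_brick_02_004a \
    gfs-dev-03b.cssd.pitt.edu:/bricks/lv_brick_03_004b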

[root@gfs-dev-02b ~]# gluster volume rebalance vol_mainhome start

[root@gfs-dev-02b ~]# gluster volume rebalance vol_mainhome status
rebalance completed: rebalanced 70 files of size 716800000 (total files scanned 1010)

[root@gfs-dev-03b ~]# tail /var/log/glusterfs/etc-glusterfs-glusterd.vol.log 
[2011-09-16 12:15:05.86955] W [glusterd-rebalance.c:242:gf_glusterd_rebalance_move_data] 0-glusterfs: file got changed after we started copying /etc/glusterd/mount/vol_mainhome/foo.238
[2011-09-16 12:15:05.749901] W [glusterd-rebalance.c:251:gf_glusterd_rebalance_move_data] 0-glusterfs: file sizes are not same : /etc/glusterd/mount/vol_mainhome/foo.240
[2011-09-16 12:15:06.384116] W [glusterd-rebalance.c:242:gf_glusterd_rebalance_move_data] 0-glusterfs: file got changed after we started copying /etc/glusterd/mount/vol_mainhome/foo.490
[2011-09-16 12:15:07.30771] W [glusterd-rebalance.c:242:gf_glusterd_rebalance_move_data] 0-glusterfs: file got changed after we started copying /etc/glusterd/mount/vol_mainhome/foo.491
[2011-09-16 12:15:07.809444] W [glusterd-rebalance.c:242:gf_glusterd_rebalance_move_data] 0-glusterfs: file got changed after we started copying /etc/glusterd/mount/vol_mainhome/foo.492
[2011-09-16 12:15:08.404402] W [glusterd-rebalance.c:242:gf_glusterd_rebalance_move_data] 0-glusterfs: file got changed after we started copying /etc/glusterd/mount/vol_mainhome/foo.499
[2011-09-16 12:15:15.798743] I [glusterd-rebalance.c:460:glusterd_defrag_start] 0-rebalance: rebalance on /etc/glusterd/mount/vol_mainhome complete
[2011-09-16 12:15:16.273285] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.0.1:996)
[2011-09-16 12:27:48.287927] I [glusterd-rebalance.c:603:glusterd_rebalance_cmd_attempted_log] 0-glusterd: Received rebalance volume 3 on vol_mainhome
[2011-09-16 12:27:48.303323] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.0.1:1021)

Before:
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_system-lv_brick_03_001b  5.0G  2.6G  2.5G  51% /bricks/lv_brick_03_001b
/dev/mapper/vg_system-lv_brick_03_002b  5.0G  2.4G  2.7G  47% /bricks/lv_brick_03_002b
/dev/mapper/vg_system-lv_brick_03_003b  5.0G  2.5G  2.6G  50% /bricks/lv_brick_03_003b
/dev/mapper/vg_system-lv_brick_03_004b  5.0G   33M  5.0G   1% /bricks/lv_brick_03_004b
During:
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_system-lv_brick_03_001b  5.0G  2.4G  2.6G  49% /bricks/lv_brick_03_001b
/dev/mapper/vg_system-lv_brick_03_002b  5.0G  2.4G  2.7G  47% /bricks/lv_brick_03_002b
/dev/mapper/vg_system-lv_brick_03_003b  5.0G  2.8G  2.3G  55% /bricks/lv_brick_03_003b
/dev/mapper/vg_system-lv_brick_03_004b  5.0G  961M  4.1G  19% /bricks/lv_brick_03_004b
After:
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_system-lv_brick_03_001b  5.0G  4.1G  916M  83% /bricks/lv_brick_03_001b
/dev/mapper/vg_system-lv_brick_03_002b  5.0G  3.2G  1.9G  64% /bricks/lv_brick_03_002b
/dev/mapper/vg_system-lv_brick_03_003b  5.0G  3.0G  2.1G  60% /bricks/lv_brick_03_003b
/dev/mapper/vg_system-lv_brick_03_004b  5.0G  1.9G  3.2G  38% /bricks/lv_brick_03_004b

[root@gfs-dev-04b ~]# ls -la /mnt/fuse/vol_mainhome/
-rw-r--r-- 1 root root 10240000 Sep 16 12:56 .foo.75.gfs10240000
-rw-r--r-- 1 root root 10240000 Sep 16 10:58 foo.76
-rw-r--r-- 1 root root 10240000 Sep 16 12:13 .foo.76.gfs10240000
-rw-r--r-- 1 root root 10240000 Sep 16 11:43 foo.77
-rw-r--r-- 1 root root 10240000 Sep 16 12:07 .foo.77.gfs10240000
-rw-r--r-- 1 root root 10240000 Sep 16 10:58 foo.78
-rw-r--r-- 1 root root 10240000 Sep 16 12:56 .foo.78.gfs10240000
-rw-r--r-- 1 root root 10240000 Sep 16 10:58 foo.79
-rw-r--r-- 1 root root 10240000 Sep 16 12:56 .foo.79.gfs10240000
-rw-r--r-- 1 root root 10240000 Sep 16 12:13 .foo.7.gfs10240000
-rw-r--r-- 1 root root 10240000 Sep 16 10:52 foo.8
-rw-r--r-- 1 root root 10240000 Sep 16 10:58 foo.80
-rw-r--r-- 1 root root 10240000 Sep 16 12:56 .foo.80.gfs10240000
-rw-r--r-- 1 root root 10240000 Sep 16 10:58 foo.81
-rw-r--r-- 1 root root 10240000 Sep 16 10:58 foo.82
-rw-r--r-- 1 root root 10240000 Sep 16 10:58 foo.83
-rw-r--r-- 1 root root 10240000 Sep 16 12:56 .foo.83.gfs10240000
-rw-r--r-- 1 root root 10240000 Sep 16 10:58 foo.84
-rw-r--r-- 1 root root 10240000 Sep 16 12:56 .foo.84.gfs10240000
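
Since deleting the hidden files restores the original usage, the cleanup can be sketched as follows (the .<name>.gfs<size> pattern is inferred from the listing above, so list the matches before removing anything):

[root@gfs-dev-04b ~]# find /mnt/fuse/vol_mainhome -name '.*.gfs*'                  # dry run: list the stray files
[root@gfs-dev-04b ~]# find /mnt/fuse/vol_mainhome -name '.*.gfs*' -exec rm -f {} +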

The log entries about the file changing after the copy started are wrong; the data did not change.  I have also tried to run a self-heal (using find on a FUSE mount) between adding the new bricks and doing the rebalance, but I get the same result.
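
The self-heal trigger referred to above is typically run on a 3.x FUSE mount as something like this (a sketch of the commonly documented form, using the mount point from this report):

[root@gfs-dev-04b ~]# find /mnt/fuse/vol_mainhome -print0 | xargs -0 stat > /dev/null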

I'll attach the /var/log/glusterfs directory from each server.  Note I issued the rebalance on gfs-dev-03b.

Comment 4 Amar Tumballi 2011-10-02 05:15:35 UTC
Hi,

Do you happen to still have the log files on the machine where you issued the rebalance operation?

I want to check the log file '/var/log/glusterfs/etc-glusterd-mount-<volname>.log'. Please share the warning/error messages from that log file; that would help to see what happened.

Regards,
Amar

Comment 5 jaw171 2011-10-03 10:41:52 UTC
(In reply to comment #4)
> Hi,
> 
> Do you happen to still have the log files on the machine where you issued the
> rebalance operation?
> 
> I want to check the log file
> '/var/log/glusterfs/etc-glusterd-mount-<volname>.log'. Please share the
> warning/error messages from that log file; that would help to see what
> happened.
> 
> Regards,
> Amar

I do not have the logs anymore but what I already uploaded should be the entire /var/log/glusterfs of each server.

Comment 6 Anand Avati 2011-10-10 07:57:58 UTC
CHANGE: http://review.gluster.com/552 (Change-Id: Ie13b0f2748b15c0359b7868460d4bc65ec05a217) merged in release-3.2 by Vijay Bellur (vijay)

Comment 7 Amar Tumballi 2011-10-13 07:38:19 UTC
Found the issue to be write-behind. You can disable write-behind on the volume and try the rebalance again; it should now work fine.
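
Concretely, that would be something like this (performance.write-behind is the standard volume option name; vol_mainhome is the volume from this report):

[root@gfs-dev-02b ~]# gluster volume set vol_mainhome performance.write-behind off
[root@gfs-dev-02b ~]# gluster volume rebalance vol_mainhome start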

Comment 8 Amar Tumballi 2011-10-25 04:03:26 UTC
Bug 765476 should fix these issues in the release-3.2 branch. The fixes are already in the master branch.

Patches at: http://review.gluster.com/613 and http://review.gluster.com/614

Comment 9 Amar Tumballi 2011-10-28 10:00:25 UTC
This particular bug, which is about consuming more space after a rebalance, is fixed as per commit http://review.gluster.com/552.

More fixes will follow with bug 765476.

Comment 10 shylesh 2011-11-02 08:53:58 UTC
After a rebalance, no new stray files are created now.