Created attachment 663
Created attachment 664
Created attachment 665
I created a volume with 3 replicated subvolumes and filled it to 50% with 10MB files of random data. I then added another replicated subvolume and issued a rebalance. At first the df output shows the usage of the first 3 bricks going down and the fourth going up, but then something goes wrong and the usage of the first 3 bricks climbs beyond what they originally held. The volume fills with hidden files named after each real file and its size, as .REALNAME.gfsFILESIZE (see the listing below). If I delete all of those hidden files, the disk usage on each subvolume goes back to exactly what it was before the rebalance.

Logs show "Transport endpoint is not connected". It appears that one node (not always the same one) can no longer talk to itself on 127.0.0.1, and the other nodes show the same entries with the real IP of that node. I can reproduce the problem every time. If you need more verbose logs or debug info, please provide details on how to get them.

RHEL 6.1 x64, kernel 2.6.32-131.12.1.el6.x86_64. Bricks are all 5GB with xfs.
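For reference, the reproduction described above can be sketched roughly as the following shell session. This is a hypothetical reconstruction, not the reporter's exact commands: the hostnames, brick paths, volume name, and mount point are taken from this report, while the file count and loop are illustrative.

```shell
#!/bin/sh
# Sketch of the reproduction steps from this report (assumptions noted above).

# 1. Fill the FUSE-mounted volume to ~50% with 10MB files of random data.
for i in $(seq 1 250); do
    dd if=/dev/urandom of=/mnt/fuse/vol_mainhome/foo.$i bs=1M count=10
done

# 2. Add the new replicated brick pair (bricks 7 and 8 in the volume info below).
gluster volume add-brick vol_mainhome \
    gfs-dev-02b.cssd.pitt.edu:/bricks/lv_brick_02_004a \
    gfs-dev-03b.cssd.pitt.edu:/bricks/lv_brick_03_004b

# 3. Trigger the rebalance and poll its status.
gluster volume rebalance vol_mainhome start
gluster volume rebalance vol_mainhome status
```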
glusterfs 3.2.3 built on Aug 23 2011 05:40:51
# Installed via the RPMs hosted on gluster.com

Volume Name: vol_mainhome
Type: Distributed-Replicate
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: gfs-dev-01b.cssd.pitt.edu:/bricks/lv_brick_01_001a
Brick2: gfs-dev-03b.cssd.pitt.edu:/bricks/lv_brick_03_001b
Brick3: gfs-dev-01b.cssd.pitt.edu:/bricks/lv_brick_01_002a
Brick4: gfs-dev-03b.cssd.pitt.edu:/bricks/lv_brick_03_002b
Brick5: gfs-dev-02b.cssd.pitt.edu:/bricks/lv_brick_02_003a
Brick6: gfs-dev-03b.cssd.pitt.edu:/bricks/lv_brick_03_003b
Brick7: gfs-dev-02b.cssd.pitt.edu:/bricks/lv_brick_02_004a  # Was added with add-brick
Brick8: gfs-dev-03b.cssd.pitt.edu:/bricks/lv_brick_03_004b  # Was added with add-brick

[root@gfs-dev-02b ~]# gluster volume rebalance vol_mainhome start
[root@gfs-dev-02b ~]# gluster volume rebalance vol_mainhome status
rebalance completed: rebalanced 70 files of size 716800000 (total files scanned 1010)

[root@gfs-dev-03b ~]# tail /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
[2011-09-16 12:15:05.86955] W [glusterd-rebalance.c:242:gf_glusterd_rebalance_move_data] 0-glusterfs: file got changed after we started copying /etc/glusterd/mount/vol_mainhome/foo.238
[2011-09-16 12:15:05.749901] W [glusterd-rebalance.c:251:gf_glusterd_rebalance_move_data] 0-glusterfs: file sizes are not same : /etc/glusterd/mount/vol_mainhome/foo.240
[2011-09-16 12:15:06.384116] W [glusterd-rebalance.c:242:gf_glusterd_rebalance_move_data] 0-glusterfs: file got changed after we started copying /etc/glusterd/mount/vol_mainhome/foo.490
[2011-09-16 12:15:07.30771] W [glusterd-rebalance.c:242:gf_glusterd_rebalance_move_data] 0-glusterfs: file got changed after we started copying /etc/glusterd/mount/vol_mainhome/foo.491
[2011-09-16 12:15:07.809444] W [glusterd-rebalance.c:242:gf_glusterd_rebalance_move_data] 0-glusterfs: file got changed after we started copying /etc/glusterd/mount/vol_mainhome/foo.492
[2011-09-16 12:15:08.404402] W [glusterd-rebalance.c:242:gf_glusterd_rebalance_move_data] 0-glusterfs: file got changed after we started copying /etc/glusterd/mount/vol_mainhome/foo.499
[2011-09-16 12:15:15.798743] I [glusterd-rebalance.c:460:glusterd_defrag_start] 0-rebalance: rebalance on /etc/glusterd/mount/vol_mainhome complete
[2011-09-16 12:15:16.273285] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.0.1:996)
[2011-09-16 12:27:48.287927] I [glusterd-rebalance.c:603:glusterd_rebalance_cmd_attempted_log] 0-glusterd: Received rebalance volume 3 on vol_mainhome
[2011-09-16 12:27:48.303323] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.0.1:1021)

Before:
Filesystem                              Size  Used Avail Use% Mounted on
/dev/mapper/vg_system-lv_brick_03_001b  5.0G  2.6G  2.5G  51% /bricks/lv_brick_03_001b
/dev/mapper/vg_system-lv_brick_03_002b  5.0G  2.4G  2.7G  47% /bricks/lv_brick_03_002b
/dev/mapper/vg_system-lv_brick_03_003b  5.0G  2.5G  2.6G  50% /bricks/lv_brick_03_003b
/dev/mapper/vg_system-lv_brick_03_004b  5.0G   33M  5.0G   1% /bricks/lv_brick_03_004b

During:
Filesystem                              Size  Used Avail Use% Mounted on
/dev/mapper/vg_system-lv_brick_03_001b  5.0G  2.4G  2.6G  49% /bricks/lv_brick_03_001b
/dev/mapper/vg_system-lv_brick_03_002b  5.0G  2.4G  2.7G  47% /bricks/lv_brick_03_002b
/dev/mapper/vg_system-lv_brick_03_003b  5.0G  2.8G  2.3G  55% /bricks/lv_brick_03_003b
/dev/mapper/vg_system-lv_brick_03_004b  5.0G  961M  4.1G  19% /bricks/lv_brick_03_004

After:
Filesystem                              Size  Used Avail Use% Mounted on
/dev/mapper/vg_system-lv_brick_03_001b  5.0G  4.1G  916M  83% /bricks/lv_brick_03_001b
/dev/mapper/vg_system-lv_brick_03_002b  5.0G  3.2G  1.9G  64% /bricks/lv_brick_03_002b
/dev/mapper/vg_system-lv_brick_03_003b  5.0G  3.0G  2.1G  60% /bricks/lv_brick_03_003b
/dev/mapper/vg_system-lv_brick_03_004b  5.0G  1.9G  3.2G  38% /bricks/lv_brick_03_004b

[root@gfs-dev-04b ~]# ls -la /mnt/fuse/vol_mainhome/
-rw-r--r-- 1 root root 10240000 Sep 16 12:56 .foo.75.gfs10240000
-rw-r--r-- 1 root root 10240000 Sep 16 10:58 foo.76
-rw-r--r-- 1 root root 10240000 Sep 16 12:13 .foo.76.gfs10240000
-rw-r--r-- 1 root root 10240000 Sep 16 11:43 foo.77
-rw-r--r-- 1 root root 10240000 Sep 16 12:07 .foo.77.gfs10240000
-rw-r--r-- 1 root root 10240000 Sep 16 10:58 foo.78
-rw-r--r-- 1 root root 10240000 Sep 16 12:56 .foo.78.gfs10240000
-rw-r--r-- 1 root root 10240000 Sep 16 10:58 foo.79
-rw-r--r-- 1 root root 10240000 Sep 16 12:56 .foo.79.gfs10240000
-rw-r--r-- 1 root root 10240000 Sep 16 12:13 .foo.7.gfs10240000
-rw-r--r-- 1 root root 10240000 Sep 16 10:52 foo.8
-rw-r--r-- 1 root root 10240000 Sep 16 10:58 foo.80
-rw-r--r-- 1 root root 10240000 Sep 16 12:56 .foo.80.gfs10240000
-rw-r--r-- 1 root root 10240000 Sep 16 10:58 foo.81
-rw-r--r-- 1 root root 10240000 Sep 16 10:58 foo.82
-rw-r--r-- 1 root root 10240000 Sep 16 10:58 foo.83
-rw-r--r-- 1 root root 10240000 Sep 16 12:56 .foo.83.gfs10240000
-rw-r--r-- 1 root root 10240000 Sep 16 10:58 foo.84
-rw-r--r-- 1 root root 10240000 Sep 16 12:56 .foo.84.gfs10240000

The log entries about the file changing after the copy started are wrong; the data did not change. I have also tried running a self-heal (using find on a FUSE mount) between adding the new bricks and starting the rebalance, but I get the same result. I'll attach the /var/log/glusterfs directory from each server. Note that I issued the rebalance on gfs-dev-03b.
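For anyone wanting to reclaim the space manually, the leftover temporary files follow the .NAME.gfs&lt;size&gt; pattern visible in the listing above. A minimal sketch, assuming GNU find and the FUSE mount point from this report (the MOUNT variable and the directory-exists guard are my additions):

```shell
#!/bin/sh
# List the leftover rebalance temp files (.NAME.gfs<size>) before deleting.
# MOUNT defaults to this report's FUSE mount point; override it as needed.
MOUNT=${MOUNT:-/mnt/fuse/vol_mainhome}
if [ -d "$MOUNT" ]; then
    find "$MOUNT" -maxdepth 1 -type f -name '.*.gfs[0-9]*' -print
    # After verifying the listing, delete them (GNU find extension):
    # find "$MOUNT" -maxdepth 1 -type f -name '.*.gfs[0-9]*' -delete
fi
```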
Hi,

Do you happen to still have the log files on the machine where you issued the rebalance operation? I want to check the log file '/var/log/glusterfs/etc-glusterd-mount-<volname>.log'. Please share the warning/error messages from that log; that would help to see what happened.

Regards,
Amar
(In reply to comment #4)
> Hi,
>
> Do you happen to still have log files on machine where you issued the
> rebalance operation?
>
> I want to check the log file
> '/var/log/glusterfs/etc-glusterd-mount-<volname>.log'. Please share the
> 'warning/error' messages in the log file please. That would help to see what
> would have happened.
>
> Regards,
> Amar

I do not have the logs anymore, but what I already uploaded should be the entire /var/log/glusterfs of each server.
CHANGE: http://review.gluster.com/552 (Change-Id: Ie13b0f2748b15c0359b7868460d4bc65ec05a217) merged in release-3.2 by Vijay Bellur (vijay)
Found the issue to be write-behind. You can disable write-behind on the volume and try the rebalance again; it should now work fine.
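The suggested workaround can be expressed with the gluster CLI as follows. This is a sketch using the volume name from this report; the performance.write-behind option toggles the write-behind translator (set it back to "on" once a fixed release is installed):

```shell
# Workaround sketch: turn off the write-behind translator, then retry.
gluster volume set vol_mainhome performance.write-behind off
gluster volume rebalance vol_mainhome start
gluster volume rebalance vol_mainhome status
```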
Bug 765476 should fix these issues in the release-3.2 branch. The fixes are already in the master branch. Patches at: http://review.gluster.com/613 and http://review.gluster.com/614
This particular bug, which is about consuming more space after a rebalance, is fixed by the commit http://review.gluster.com/552. More fixes will follow with bug 765476.
After a rebalance, no new files are created now.