Bug 763121 (GLUSTER-1389) - scale-n-defrag fails to work as expected in 3.0.5
Summary: scale-n-defrag fails to work as expected in 3.0.5
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-1389
Product: GlusterFS
Classification: Community
Component: distribute
Version: 3.0.5
Hardware: All
OS: Linux
Priority: urgent
Severity: medium
Target Milestone: ---
Assignee: Amar Tumballi
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-08-17 22:08 UTC by Harshavardhana
Modified: 2015-12-01 16:45 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Harshavardhana 2010-08-17 19:26:57 UTC
New insights:

With distribute having only a single top-level directory, scale-n-defrag doesn't work at all.

With distribute having multiple top-level directories, scale-n-defrag works.

In that case it works with both 3.0.4 and 3.0.5.

This seems like odd behaviour, but even stranger is that no extended attributes are written to the backend on client13.

Comment 1 Harshavardhana 2010-08-17 22:08:45 UTC
Adding additional nodes to a distribute volume and running scale-n-defrag doesn't work on CentOS 5 nodes.

The new server clearly shows its back-end being hosed, as the scaled extended attribute is never actually written to it.

Existing servers (2 servers, one client, "dht") after running defrag:

[root@client14 ~]# getfattr -d -m. /mnt/gfs2/ -e hex
getfattr: Removing leading '/' from absolute path names
# file: mnt/gfs2
trusted.glusterfs.dht=0x0000000100000000000000007ffffffe
trusted.posix1.gen=0x4c4f942d00000001

[root@client15 ~]# getfattr -d -m. /mnt/gfs1 -e hex
getfattr: Removing leading '/' from absolute path names
# file: mnt/gfs1
trusted.glusterfs.dht=0x00000001000000007fffffffffffffff
trusted.posix1.gen=0x4c4f944400000001
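
For reference, the two trusted.glusterfs.dht values above can be decoded to see how the hash space is split between the existing servers. This is only a minimal sketch: it assumes the value is four big-endian 32-bit integers (count, type, range start, range stop), which is an assumption about the on-disk format and is not stated in this report. The function name decode_dht_layout is made up for illustration.

```python
import struct


def decode_dht_layout(hex_value):
    """Decode a trusted.glusterfs.dht value as printed by `getfattr -e hex`."""
    text = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(text)
    if len(raw) != 16:
        raise ValueError("expected 16 bytes, got %d" % len(raw))
    # ASSUMPTION: count, type, start, stop stored as big-endian uint32.
    count, layout_type, start, stop = struct.unpack(">IIII", raw)
    return {"count": count, "type": layout_type, "start": start, "stop": stop}


# Values copied from the getfattr output of the two existing servers.
samples = [
    ("client14", "0x0000000100000000000000007ffffffe"),
    ("client15", "0x00000001000000007fffffffffffffff"),
]
for host, value in samples:
    layout = decode_dht_layout(value)
    print("%s: 0x%08x - 0x%08x" % (host, layout["start"], layout["stop"]))
```

Under that assumption the two existing servers together cover 0x00000000-0x7ffffffe and 0x7fffffff-0xffffffff, i.e. the whole 32-bit range is still split two ways.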


New server after running defrag; it should have got another trusted.glusterfs.dht:

[root@client13 ~]# getfattr -d -m. /mnt/gfs2/ -e hex
getfattr: Removing leading '/' from absolute path names
# file: mnt/gfs2
trusted.posix1.gen=0x4c6aed8f00000001
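
The missing layout attribute can also be spotted mechanically when checking many back-ends. The following is a rough sketch that parses the text printed by `getfattr -d -m. -e hex` and reports whether trusted.glusterfs.dht is present. The helper names parse_getfattr and has_dht_layout are hypothetical, and the parsing assumes the simple name=value output format seen in this report.

```python
def parse_getfattr(output):
    """Parse `getfattr -d -e hex` text output into a {name: value} dict."""
    attrs = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, "# file: ..." headers and getfattr notices.
        if not line or line.startswith("#") or line.startswith("getfattr:"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            attrs[name] = value
    return attrs


def has_dht_layout(output):
    """Return True if the output contains a trusted.glusterfs.dht entry."""
    return "trusted.glusterfs.dht" in parse_getfattr(output)


# Output of the new server (client13), copied from this report.
CLIENT13 = """getfattr: Removing leading '/' from absolute path names
# file: mnt/gfs2
trusted.posix1.gen=0x4c6aed8f00000001
"""

print("client13 has dht layout:", has_dht_layout(CLIENT13))
```

Running the same check against the output of client14 or client15 would report the attribute as present.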

Only the directory structure is sanitized; files are not moved at all.

Options used in distribute for defragmenting were:

"option lookup-unhashed on"
"option unhashed-sticky-bit on"

Works with 3.0.4

Comment 2 Amar Tumballi 2010-09-01 09:21:38 UTC
The scale-n-defrag script doesn't work if the sticky bit (mode 01000) is not shown in the stat information of the file. This can happen if 'stat-prefetch' is present in the volume file.

Need to check that.
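
One way to check this from the client side is to look at the mode bits that stat actually reports through the mount. A minimal sketch, assuming a hypothetical helper name has_sticky_bit; it only reads the mode and does not depend on any GlusterFS API.

```python
import os
import stat
import sys


def has_sticky_bit(path):
    """Return True if lstat() reports the sticky bit (mode 01000) for path."""
    return bool(os.lstat(path).st_mode & stat.S_ISVTX)


if __name__ == "__main__":
    # Usage: python check_sticky.py /mnt/glusterfs/some/file ...
    for target in sys.argv[1:]:
        print("%s: sticky=%s" % (target, has_sticky_bit(target)))
```

If a file is expected to carry the sticky bit but this reports False when run against the mount point, the stat information is not passing the bit through, which matches the failure described above.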

Comment 3 Amar Tumballi 2010-10-04 02:11:49 UTC
Anyway, with the 3.1.0 release, 'gluster volume rebalance' seems to have stabilized. Do we still need to address this in the 3.0.x versions?

Comment 4 Harshavardhana 2010-10-15 22:32:55 UTC
(In reply to comment #3)
> Anyways, with 3.1.0 release, 'gluster volume rebalance' seems to be stabilized.
> Do we need to address this anymore in 3.0.x version?

Yes, we need to until all the customers have migrated to 3.1; I don't think we are doing end of life for 3.0.x yet. The recommendation should be 3.1, but if they are willing to stick with 3.0.x, then why not.

Comment 5 Amar Tumballi 2010-10-19 05:59:31 UTC
Just noticed a few behaviors of the scale-n-defrag script:

One should not give the mount point itself as 'directory_to_be_scaled', because the 'setfattr -x' (i.e., removexattr()) on the root causes a stale mount point; in 3.0.x releases, recovering the mount point then requires remounting.

If the directory to be scaled is a subdirectory inside the mount point, everything works smoothly.
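
A wrapper could refuse to run when the target is the mount point itself. This is only a sketch of such a guard, not part of the scale-n-defrag script; the function name is_safe_scale_target is hypothetical.

```python
import os


def is_safe_scale_target(directory):
    """Return True if directory exists and is not itself a mount point."""
    real = os.path.realpath(directory)
    return os.path.isdir(real) and not os.path.ismount(real)


if __name__ == "__main__":
    import sys

    # Usage: python check_target.py /mnt/glusterfs/subdir
    for target in sys.argv[1:]:
        if is_safe_scale_target(target):
            print("%s: ok, subdirectory inside a mount" % target)
        else:
            print("%s: refuse, missing or a mount point" % target)
```

The check relies on os.path.ismount, so the mount root is rejected while any subdirectory below it is accepted.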

