I added a brick using add-brick and ran fix-layout + rebalance, thinking that new files would be load-balanced onto the new brick. But they weren't. I looked at the client's pathinfo xattr and it seemed not to have changed.

I then remounted the client and that fixed the issue. So it looks like when one adds a brick, the client doesn't know about it. This means you can't dynamically add a brick and make it usable; one needs to take downtime and remount the gluster fs.

It would be good to sync the changes, or do periodic syncs from client to server, or maybe server to clients. You know better :)
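For reference, a sketch of the command sequence involved (the volume name `myvol` and server/brick paths below are illustrative placeholders, not my actual setup):

```shell
# Expand the volume with a new brick (names are placeholders)
gluster volume add-brick myvol server3:/export/brick1

# Recompute directory layouts so new files can hash onto the new brick
gluster volume rebalance myvol fix-layout start

# Migrate existing files according to the new layout
gluster volume rebalance myvol start

# Watch progress
gluster volume rebalance myvol status
```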
Venky, can you look into this while you're at it, fixing pathinfo at various levels?

Avati
(In reply to comment #0)
> I added a brick using add-brick and ran fix-layout + rebalance, thinking
> that new files would be load-balanced onto the new brick. But they weren't.
> I looked at the client's pathinfo xattr and it seemed not to have changed.
>
> I then remounted the client and that fixed the issue. So it looks like when
> one adds a brick, the client doesn't know about it. This means you can't
> dynamically add a brick and make it usable; one needs to take downtime and
> remount the gluster fs.
>
> It would be good to sync the changes, or do periodic syncs from client to
> server, or maybe server to clients. You know better :)

Hey Mohit,

What is your setup? Did you run migrate-data?

I ran a 2x2 distribute-replicate setup, then added a third brick (pair), then ran fix-layout and migrate-data, and I could see the files getting spread out in the dht layout.

-Venky
Venky: this started with an IRC conversation between myself and Mohit, so I might be able to clarify.

The issue seems to be not that the rebalance doesn't migrate data properly - as you point out, it does - but that clients might retain stale layout-map data until they unmount/remount. IIRC, the initial symptom was that a client was using this stale data for hashing, and would initially "guess wrong" about which brick held a particular file. It would still find the file eventually, but it would also create a superfluous linkfile from the "wrong" brick X to the "right" brick Y - superfluous because anyone with the current layout map would know to look on Y in the first place. (In general, it might be nice to have a way to prune linkfiles after reconfiguration, but that's not the subject of this bug.)

It was only after he noticed the extra linkfiles that Mohit examined trusted.glusterfs.pathinfo on the directory and saw that the info it returned was out of date.

Mohit, did I manage to get that right?
(In reply to comment #3)
> The issue seems to be not that the rebalance doesn't migrate data properly -
> as you point out, it does - but that clients might retain stale layout-map
> data until they unmount/remount. IIRC, the initial symptom was that a client
> was using this stale data for hashing, and would initially "guess wrong"
> about which brick held a particular file. It would still find the file
> eventually, but it would also create a superfluous linkfile from the "wrong"
> brick X to the "right" brick Y - superfluous because anyone with the current
> layout map would know to look on Y in the first place.
>
> Mohit, did I manage to get that right?

Thanks Jeff!! That's perfect. I couldn't have explained it better. Yes, that is the problem I had: it started creating link files. I had to umount/mount the client in order to make it sane, and then I ran rebalance, which corrected everything. This is painful, because most likely you will add new bricks as you go along.
> Thanks Jeff!! That's perfect. I couldn't have explained it better. Yes, that
> is the problem I had: it started creating link files. I had to umount/mount
> the client in order to make it sane, and then I ran rebalance, which
> corrected everything. This is painful, because most likely you will add new
> bricks as you go along.

You get the link files because, after adding a brick (and running fix-layout/rebalance), a file may hash to a new location in the layout; but the client remembers the subvolume to which that file hashed earlier, and sends out two requests to the back-end if the cached subvolume and the hashed subvolume (which the file will now hash to) differ - this is when you get the 'Link:' in the pathinfo xattr.

Ideally, the rebalance should keep only a single copy, and the client should update itself with the new location of the file in the layout - as Jeff pointed out. I'll look into this while I fix other bits of pathinfo.
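For anyone reproducing this, the xattrs discussed above can be inspected directly. A sketch, with placeholder mount and brick paths:

```shell
# On the client mount: ask DHT where it thinks the file lives.
# A '<LINK: ...>' entry in the output means the lookup went through
# a link file rather than straight to the hashed subvolume.
getfattr -n trusted.glusterfs.pathinfo /mnt/glusterfs/somefile

# On a brick: link files are the zero-byte, mode ---------T entries;
# their linkto xattr names the subvolume holding the real data.
getfattr -n trusted.glusterfs.dht.linkto -e text /export/brick1/somefile
```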
Mohit, can you try out the patch mentioned in bug 764013 (comment #15 for 3.1)? That should fix the stale entries that you see in the pathinfo xattr on the client.

For the link files, I have more insight into it now. We create the link files when you run rebalance + migrate-data; the link file gets created on the newly hashed subvolume for the file. Then, for all files that have a link file entry in the file system, we copy the data from the original file to a temp file and then 'rename' the temp file to the original one. But we have a check that prevents data movement from a node with more free disk space to a node with less free disk space. This is where both the link file and the original file end up existing in the FS.

You can override this disk-space check using the 'force' option, if you are sure you have enough disk space on the nodes that were in the layout before you added the new brick (in case a file gets hashed onto those).
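For completeness, a sketch of the force variant (the volume name is a placeholder; whether 'force' is accepted depends on your release):

```shell
# Skip the free-space check during migrate-data; only do this if the
# destination bricks really have room for the migrated files.
gluster volume rebalance myvol start force
```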
(In reply to comment #6)
> Mohit, can you try out the patch mentioned in bug 764013 (comment #15 for
> 3.1)? That should fix the stale entries that you see in the pathinfo xattr
> on the client.
>
> You can override this disk-space check using the 'force' option, if you are
> sure you have enough disk space on the nodes that were in the layout before
> you added the new brick (in case a file gets hashed onto those)

Thanks! Regarding the linked files: I was testing against only 10-20 files of a few bytes each, and each brick has 100s of gigs of space.
(In reply to comment #7)
> (In reply to comment #6)
> > Mohit, can you try out the patch mentioned in bug 764013 (comment #15 for
> > 3.1)? That should fix the stale entries that you see in the pathinfo
> > xattr on the client.
>
> Thanks! Regarding the linked files: I was testing against only 10-20 files
> of a few bytes each, and each brick has 100s of gigs of space.

Is this also going to be fixed? I have 100 times more disk space. Also, is that patch part of 3.2.2?
With the latest codebase (3.4.0*+) and the 'rebalance force' option, we never hit this issue.