Bug 762672 (GLUSTER-940)

Summary: ESTALE on / when one of the servers is restarted
Product: [Community] GlusterFS
Component: unclassified
Version: 3.0.4
Hardware: All
OS: All
Status: CLOSED NOTABUG
Severity: medium
Priority: high
Reporter: Vikas Gorur <vikas>
Assignee: shishir gowda <sgowda>
CC: amarts, gluster-bugs, Niu.ZGlinux, nsathyan, rabhat, vijay
Doc Type: Bug Fix

Description Vikas Gorur 2010-05-21 22:10:44 UTC
This bug has been seen a couple of times in the wild.

Scenario #1:

A pure-distribute setup with 6 servers. One of the server machines goes down and another machine assumes its responsibility: it starts its own GlusterFS server process and exports the same LUNs that the now-dead server was exporting. The client starts seeing LOOKUP / => ESTALE.

Scenario #2:

A 4-server distribute+replicate setup. One of the servers is shut down, its disk is taken out and replaced with a blank one. GlusterFS is started again, and self-heal is triggered from the client. The client starts seeing LOOKUP / => ESTALE.

Comment 1 Vikas Gorur 2010-06-11 18:16:15 UTC
I happened to reproduce this inadvertently myself. The client volume file had a mistake: distribute had only two subvolumes, and both were identical. Mounting and remounting multiple times still led to ESTALE.

Comment 2 Niu Zhenguo 2010-08-30 04:31:35 UTC
GlusterFS will get a different inode number from the new disk. In the lookup callback, the new inode number is checked against the cached one; if they do not match, errno is set to ESTALE. So if you want to replace a machine or a disk, you should flush the cache before doing so.

Comment 3 Niu Zhenguo 2010-08-31 00:47:13 UTC
Sorry, it's not the inode number; it is st_dev that changed. client_lookup_cbk checks it.
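
To make the check described in the last two comments concrete, here is a minimal sketch of a revalidating lookup that compares the cached st_dev/st_ino against the freshly returned stat and fails with ESTALE on a mismatch. This is an illustration only, not the actual client_lookup_cbk() code; the cached_inode struct and revalidate_lookup() helper are hypothetical names.

/*
 * Hypothetical sketch, not GlusterFS source: on a revalidating lookup the
 * client compares the stat returned by the server with the identity it
 * cached on the first lookup.  If the backend was replaced (new disk, or a
 * new server exporting the same path), st_dev and/or st_ino no longer
 * match and the lookup is failed with ESTALE.
 */
#include <errno.h>
#include <sys/stat.h>

struct cached_inode {
        dev_t dev;   /* st_dev remembered from the first successful lookup */
        ino_t ino;   /* st_ino remembered from the first successful lookup */
};

/* Returns 0 if the cached identity still matches, -ESTALE otherwise. */
static int
revalidate_lookup (const struct cached_inode *cached, const struct stat *fresh)
{
        if (cached->dev != fresh->st_dev || cached->ino != fresh->st_ino)
                return -ESTALE;   /* backend changed underneath: stale handle */

        return 0;
}

A replaced disk or server returns a different st_dev (and usually a different st_ino) for the same path, so every revalidation fails this way until the client's cached identity is dropped, which is why flushing the cache (or remounting) is suggested above.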

Comment 4 shishir gowda 2010-10-05 09:22:43 UTC
The gfid changes invalidate this bug.