Bug 763869 (GLUSTER-2137)

Summary: dhtafr - self heal after renaming directory
Product: [Community] GlusterFS
Reporter: Lakshmipathi G <lakshmipathi>
Component: replicate
Assignee: shishir gowda <sgowda>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: low
Version: 3.1.1
CC: gluster-bugs, nsathyan, rabhat, vijay, vs
Hardware: All
OS: Linux
Doc Type: Bug Fix
Mount Type: fuse

Description Lakshmipathi G 2010-11-22 08:33:42 UTC
From the user mailing list (3.1.1qa9 has this issue too):
----------------
I am using the distribute translator over replicate. My setup is as
follows.



Servers: S1, S2, S3, S4

Replicate R1 over S1 and S2.

Replicate R2 over S3 and S4.

Distribute D1 over R1 and R2



Clients run on two machines, C1 and C2, with the above setup.



During tests I am facing a specific issue. The scenario is described below.



- Create a directory /Dir1 in the root of the filesystem on the client.
- Create a file (I used touch) /Dir1/a.txt.
- Shut down server S1.
- Move the directory /Dir1 to /Dir2.
- Create a new directory /Dir1.
- Create a new file /Dir1/b.txt.
- Start server S1 again.
- Do an "ls -la" on the client to initiate the afr self-heal.
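
For anyone trying to reproduce this, the client-side steps above could be scripted roughly as follows. This is only a sketch: the mount path and the stop_s1/start_s1 hooks are hypothetical placeholders for however S1 is taken down and brought back in your setup, and walking the tree is assumed to issue the same lookups that "ls -la" does.

```python
import os
from typing import Callable, List


def reproduce(mount: str,
              stop_s1: Callable[[], None],
              start_s1: Callable[[], None]) -> List[str]:
    """Run the reported steps on a client mount and return the files seen.

    `mount`, `stop_s1` and `start_s1` are assumptions for illustration;
    they are not part of the original report.
    """
    dir1 = os.path.join(mount, "Dir1")
    dir2 = os.path.join(mount, "Dir2")

    os.mkdir(dir1)                                   # create /Dir1
    open(os.path.join(dir1, "a.txt"), "a").close()   # touch /Dir1/a.txt

    stop_s1()                                        # shut down server S1

    os.rename(dir1, dir2)                            # mv /Dir1 /Dir2
    os.mkdir(dir1)                                   # create a new /Dir1
    open(os.path.join(dir1, "b.txt"), "a").close()   # touch /Dir1/b.txt

    start_s1()                                       # start server S1 again

    # List everything under the mount, as a recursive "ls -la" would.
    found = []
    for root, _dirs, files in os.walk(mount):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), mount)
            found.append(rel.replace(os.sep, "/"))
    return sorted(found)
```

On a healthy volume the returned list should be ["Dir1/b.txt", "Dir2/a.txt"]; anything else matches the problem described here.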


Below are the results I am observing:

- Sometimes the new directory Dir1 shows both a.txt and b.txt. Expected is just b.txt.
- Sometimes the new directory Dir1 shows only a.txt. The b.txt file is entirely missing.



Which of these two results I get is random.



My expected result is:

                /Dir1/b.txt

                /Dir2/a.txt
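
A small check like the one below could be run against the client mount after the self-heal to compare what is there with the expected layout. It is a sketch only; the mount path passed in and the helper name layout_mismatches are hypothetical.

```python
import os
from typing import Dict, List, Optional

# Expected layout from the report: /Dir1/b.txt and /Dir2/a.txt.
EXPECTED = {"Dir1": ["b.txt"], "Dir2": ["a.txt"]}


def layout_mismatches(mount: str) -> Dict[str, Optional[List[str]]]:
    """Return the directories whose contents differ from EXPECTED.

    Each value is the sorted listing actually found, or None if the
    directory does not exist. An empty dict means the layout is correct.
    """
    mismatches: Dict[str, Optional[List[str]]] = {}
    for directory, expected in EXPECTED.items():
        path = os.path.join(mount, directory)
        actual = sorted(os.listdir(path)) if os.path.isdir(path) else None
        if actual != expected:
            mismatches[directory] = actual
    return mismatches
```

For the first failure mode described above (Dir1 showing both files), this would report Dir1 with ["a.txt", "b.txt"]; for the second it would report Dir1 with ["a.txt"].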



Please have a look. It seems that autoheal is not able to figure out the actual
sequence of events. Also, do let me know if my configs are wrong.



Tx

Vikas



Client Config file:
--------------------

volume 172.26.98.24-1
    type protocol/client
    option transport-type tcp
    option remote-host 172.26.98.24
    option transport.socket.nodelay on
    option transport.remote-port 6996
    option remote-subvolume brickex
end-volume

volume 172.26.98.25-1
    type protocol/client
    option transport-type tcp
    option remote-host 172.26.98.25
    option transport.socket.nodelay on
    option transport.remote-port 6996
    option remote-subvolume brickex
end-volume

volume 172.26.98.26-1
    type protocol/client
    option transport-type tcp
    option remote-host 172.26.98.26
    option transport.socket.nodelay on
    option transport.remote-port 6996
    option remote-subvolume brickex
end-volume

volume 172.26.98.27-1
    type protocol/client
    option transport-type tcp
    option remote-host 172.26.98.27
    option transport.socket.nodelay on
    option transport.remote-port 6996
    option remote-subvolume brickex
end-volume

volume replicate-1
    type cluster/replicate
    subvolumes 172.26.98.24-1 172.26.98.25-1
end-volume

volume replicate-2
    type cluster/replicate
    subvolumes 172.26.98.26-1 172.26.98.27-1
end-volume

volume distribute-1
    type cluster/distribute
    lookup-unhashed yes
    subvolumes replicate-1 replicate-2
end-volume

#volume stripe
#    type cluster/stripe
#    option block-size 1MB
#    subvolumes replicate-1 replicate-2 replicate-3
#end-volume

volume writebehind
    type performance/write-behind
    option cache-size 4MB
    subvolumes distribute-1
end-volume


=====================

Comment 1 Vijay Bellur 2011-03-25 08:11:52 UTC
PATCH: http://patches.gluster.com/patch/6552 in master (Process dir/link from other subvol if error in dht_readdir)

Comment 2 Raghavendra Bhat 2011-04-13 08:39:20 UTC
Checked with 3.1.3: dir1 was empty after self-heal. With master, dir1 showed b.txt and dir2 showed a.txt, which is the expected result.