Bug 762616 (GLUSTER-884)

Summary: I/O errors after fixed split brain and successfully completed self heal
Product: [Community] GlusterFS
Component: replicate
Version: 3.0.3
Hardware: All
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: low
Reporter: Simone Gotti <simone.gotti>
Assignee: Pavan Vilas Sondur <pavan>
CC: gluster-bugs, pavan
Regression: RTP
Doc Type: Bug Fix

Description Simone Gotti 2010-05-04 22:01:07 UTC
Hi,

I was testing a split brain recovery scenario with a simple replicate.

I noticed that after creating a split brain situation and fixing it by removing the file from one of the servers, self heal started and completed, but on subsequent file accesses I was still getting I/O errors, as before.

Doing various experiments, I noticed that it's probably related to inode caching in FUSE.

This seems confirmed by the following:

*) After unmounting and remounting the client, the errors go away.

*) After dropping the client's kernel VM caches, the errors go away (echo 2 > /proc/sys/vm/drop_caches, or echo 3 > /proc/sys/vm/drop_caches to also drop the page cache).


Looking at the code, it seems that when there's a split brain situation the AFR_ICTX_SPLIT_BRAIN_MASK flag is set in the inode ctx for this xlator, but it is never cleared after self heal completes successfully.

I wrote a simple test patch against the master branch that fixes this bug, but I'm not an expert in GlusterFS internals, so I may be missing something about the patch's behavior and consequences.
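
To illustrate the idea (this is only a sketch of the kind of change, not the actual patch: the helper name afr_clear_split_brain_flag is made up here, and it assumes the AFR inode ctx is a uint64_t bitmask kept via __inode_ctx_get()/__inode_ctx_put(), with AFR_ICTX_SPLIT_BRAIN_MASK being the flag set when split brain is detected):

static void
afr_clear_split_brain_flag (xlator_t *this, inode_t *inode)
{
        uint64_t ctx = 0;

        LOCK (&inode->lock);
        {
                __inode_ctx_get (inode, this, &ctx);
                ctx &= ~AFR_ICTX_SPLIT_BRAIN_MASK;   /* drop the stale split-brain flag */
                __inode_ctx_put (inode, this, ctx);
        }
        UNLOCK (&inode->lock);
}

Such a call would go in afr_self_heal_completion_cbk(), only on the path where self heal finished without errors, so that later opens on the healed inode stop returning EIO.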


== How to reproduce ==

2 servers: server1 exporting brick1, server2 exporting brick2.
On the client, a single replicate volume over these two bricks and nothing else, to avoid other kinds of caching.

volume client1
  type protocol/client
  option transport-type tcp 
  option remote-host server1
  option transport.socket.remote-port 6996
  option remote-subvolume brick1
end-volume

volume client2
  type protocol/client
  option transport-type tcp
  option remote-host server2
  option transport.socket.remote-port 6996
  option remote-subvolume brick2
end-volume

volume replicate
  type cluster/replicate
  subvolumes client1 client2
end-volume
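
For completeness, the matching server-side volfile on server1 would look roughly like this (the brick directory path and the features/locks layer are assumptions on my part; server2 is identical with brick2, and 6996 is the port the client volumes above point to):

volume posix1
  type storage/posix
  option directory /export/brick1
end-volume

volume brick1
  type features/locks
  subvolumes posix1
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option transport.socket.listen-port 6996
  option auth.addr.brick1.allow *
  subvolumes brick1
end-volume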


=== Create a split brain situation ===

*) start server1 and server2.

*) Create a file (echo "something" > file1)

*) Stop server1

*) echo "something" > file1

*) Start server1, Stop server2

*) echo "something" > file1

*) Start server2

*) echo "something" > file1 => you'll get I/O errors due to split brain.


=== Fix split brain removing the file on server1 ===

*) Stop server1

*) remove file1 on server1

*) Start server1


=== Test if split brain is fixed ===

*) echo "something" > file1 => From the logs you'll see that self heal is triggered and started.

Note: You'll get I/O errors as the first open happens before self heal completion, is this the wanted and right behavior?

*) Wait for self heal to complete

*) echo "something" > file1 => you'll get I/O errors again; the traces show self heal starting again (even though there is nothing left to fix, since everything is already consistent)

Try again and again and you'll keep getting I/O errors (as long as the cached inodes aren't evicted by memory reclaim).


=== Flush the inode cache ===

*) echo 2 > /proc/sys/vm/drop_caches 

*) echo "something" > file1 => No I/O errors (OK!)



Note: this happens on release-3.0 and master (release-2.0 not tested).

Comment 1 Anand Avati 2010-05-10 01:41:49 UTC
PATCH: http://patches.gluster.com/patch/3231 in master (Unset split-brain flags in afr_self_heal_completion_cbk if self heal completes successfully.)

Comment 2 Anand Avati 2010-06-01 04:24:13 UTC
PATCH: http://patches.gluster.com/patch/3352 in release-3.0 (Unset split-brain flags in afr_self_heal_completion_cbk if self heal completes successfully.)