Bug 762616 (GLUSTER-884) - I/O errors after fixed split brain and successfully completed self heal
Summary: I/O errors after fixed split brain and successfully completed self heal
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-884
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.0.3
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Pavan Vilas Sondur
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-05-04 22:01 UTC by Simone Gotti
Modified: 2015-12-01 16:45 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: RTP
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Simone Gotti 2010-05-04 22:01:07 UTC
Hi,

I was testing a split brain recovery scenario with a simple replicate.

I noticed that after creating a split brain situation and fixing it by removing the file from one of the servers, self heal started and completed, but subsequent accesses to the file still returned I/O errors, just as before.

After various experiments I concluded that it's probably related to inode caching in FUSE.

This seems confirmed by the following:

*) Unmounting and remounting the client makes the errors go away.

*) Dropping the client VM caches makes the errors go away (echo 2 > /proc/sys/vm/drop_caches, or echo 3 > /proc/sys/vm/drop_caches to also drop the page cache).


Looking at the code, it seems that when a split brain situation is detected the AFR_ICTX_SPLIT_BRAIN_MASK flag is set in the inode ctx for this xlator, but it is never cleared after self heal completes successfully.

I wrote a simple test patch against the master branch that fixes this bug, but I'm not an expert in glusterfs internals, so I may have missed something about the patch's behavior and consequences.
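
For illustration, below is a minimal, self-contained toy model (in C) of the mechanism described above. It is not GlusterFS code: the struct and function names are invented for the example; only the flag name AFR_ICTX_SPLIT_BRAIN_MASK and the idea of clearing it when self heal completes successfully (as the patches in the comments below do in afr_self_heal_completion_cbk) come from this report.

/* Toy model of the reported problem: a per-inode context bitmask in which a
 * split-brain flag is set on detection but, before the fix, never cleared
 * after a successful self heal.  NOT GlusterFS code; names are illustrative. */
#include <stdint.h>
#include <stdio.h>

#define AFR_ICTX_SPLIT_BRAIN_MASK 0x1ULL  /* flag name from the report; value arbitrary */

struct toy_inode_ctx {
        uint64_t flags;   /* stands in for the per-xlator inode ctx */
};

/* Split brain detected: mark the inode so later opens fail with EIO. */
static void mark_split_brain (struct toy_inode_ctx *ctx)
{
        ctx->flags |= AFR_ICTX_SPLIT_BRAIN_MASK;
}

/* Self heal completion callback.  The fix is the "clear on success" branch;
 * without it the stale flag keeps forcing EIO until the inode is dropped
 * from the cache (remount or drop_caches). */
static void self_heal_completion_cbk (struct toy_inode_ctx *ctx, int sh_failed)
{
        if (!sh_failed)
                ctx->flags &= ~AFR_ICTX_SPLIT_BRAIN_MASK;
}

static int open_would_fail (const struct toy_inode_ctx *ctx)
{
        return (ctx->flags & AFR_ICTX_SPLIT_BRAIN_MASK) ? 1 : 0;
}

int main (void)
{
        struct toy_inode_ctx ctx = { 0 };

        mark_split_brain (&ctx);
        printf ("after split brain:     open fails = %d\n", open_would_fail (&ctx));

        self_heal_completion_cbk (&ctx, 0 /* self heal succeeded */);
        printf ("after successful heal: open fails = %d\n", open_would_fail (&ctx));

        return 0;
}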


== How to reproduce ==

2 servers: server1 providing brick1, server2 providing brick2.
On the client, a single replicate volume over these two bricks, and nothing else, to avoid other kinds of caching.

volume client1
  type protocol/client
  option transport-type tcp 
  option remote-host server1
  option transport.socket.remote-port 6996
  option remote-subvolume brick1
end-volume

volume client2
  type protocol/client
  option transport-type tcp
  option remote-host server2
  option transport.socket.remote-port 6996
  option remote-subvolume brick2
end-volume

volume replicate
  type cluster/replicate
  subvolumes client1 client2
end-volume
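
For completeness, here is a rough sketch of a matching server-side volfile (one per server; on server2 replace brick1 with brick2). The export directory is an assumption, as is the wide-open auth rule; only the brick name and port 6996 come from the client volumes above.

# assumed export path; adjust to the actual brick directory
volume posix
  type storage/posix
  option directory /data/export/brick1
end-volume

volume brick1
  type features/locks
  subvolumes posix
end-volume

# allow * is for this test setup only
volume server
  type protocol/server
  option transport-type tcp
  option transport.socket.listen-port 6996
  option auth.addr.brick1.allow *
  subvolumes brick1
end-volume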


=== Create a split brain situation ===

*) start server1 and server2.

*) Create a file (echo "something" > file1)

*) Stop server1

*) echo "something" > file1

*) Start server1, Stop server2

*) echo "something" > file1

*) Start server2

*) echo "something" > file1 => you'll get I/O errors due to split brain.


=== Fix split brain removing the file on server1 ===

*) Stop server1

*) Remove file1 on server1

*) Start server1


=== Test if split brain is fixed ===

*) echo "something" > file1 => From the logs you'll see that self heal is triggered and started.

Note: You'll get an I/O error here because this first open happens before self heal completes. Is this the intended behavior?

*) Wait for self heal to complete

*) echo "something" > file1 => From the logs you'll get I/O errors, from traces self heal starts again (without fixing nothing as everything is ok)

try again and again and you'll get I/O errors (if the caches aren't removed due to memory reclaim).


=== Flush inodes cache ===

*) echo 2 > /proc/sys/vm/drop_caches 

*) echo "something" > file1 => No I/O errors (OK!)



Note: this happens on release-3.0 and master (release-2.0 not tested).

Comment 1 Anand Avati 2010-05-10 01:41:49 UTC
PATCH: http://patches.gluster.com/patch/3231 in master (Unset split-brain flags in afr_self_heal_completion_cbk if self heal completes successfully.)

Comment 2 Anand Avati 2010-06-01 04:24:13 UTC
PATCH: http://patches.gluster.com/patch/3352 in release-3.0 (Unset split-brain flags in afr_self_heal_completion_cbk if self heal completes successfully.)

