Bug 1291538

Summary: GlusterFS is not durable against power outage
Product: [Community] GlusterFS
Component: replicate
Version: 3.7.12
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Reporter: Stanislav German-Evtushenko <ginermail>
Assignee: Pranith Kumar K <pkarampu>
CC: amukherj, bugs, pkarampu, ravishankar
Keywords: Reopened, Triaged
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2016-06-22 10:30:04 UTC

Description Stanislav German-Evtushenko 2015-12-15 05:41:37 UTC
Description of problem:
- GlusterFS is not durable against power outage. A power loss on one node always leads to split-brain.

How reproducible:
- Always


Steps to Reproduce:
1. Create cluster: 3 nodes, 1 volume, 3 bricks, 3 replicas.
2. Mount the volume somewhere and start writing many files.
3. Switch power off for one of the nodes (force-stop it if it is a virtual machine).
4. Start the node again.
5. Run: gluster volume heal volumename full
6. Check: gluster volume heal volumename info split-brain

Actual results:
- cat: /mnt/gluster-volume/filename.log: Input/output error

Expected results:
- With 3 replicas, if 2 of them are identical (a quorum), the file should be healed automatically instead of returning an Input/output error.


Additional info:

- Check file size on all nodes:

$ ls -ln /data/glusterfs/volume/filename.log
gluster.1: -rw-r--r-- 2 106 114 9869 Dec 15 04:47 /data/glusterfs/volume/filename.log
gluster.2: -rw-r--r-- 2 106 114 10008 Dec 15 04:49 /data/glusterfs/volume/filename.log
gluster.3: -rw-r--r-- 2 106 114 10008 Dec 15 04:49 /data/glusterfs/volume/filename.log


- Get xattrs on all nodes:

$ getfattr -m . -d -e hex /data/glusterfs/volume/filename.log

gluster.1:  # file: /data/glusterfs/volume/filename.log
            trusted.afr.rpaas-client-21=0x000000000000000000000000
            trusted.afr.rpaas-client-22=0x000000000000000000000000
            trusted.afr.rpaas-client-23=0x000000000000000000000000
            trusted.gfid=0x21b7709eca5e481ab2b9e5d73e219b03
 
gluster.2:  # file: /data/glusterfs/volume/filename.log
            trusted.afr.rpaas-client-21=0x000000000000000000000000
            trusted.afr.rpaas-client-22=0x000000000000000000000000
            trusted.afr.rpaas-client-23=0x000000000000000000000000
            trusted.gfid=0x21b7709eca5e481ab2b9e5d73e219b03
 
gluster.3:  # file: /data/glusterfs/volume/filename.log
            trusted.afr.rpaas-client-21=0x000000000000000000000000
            trusted.afr.rpaas-client-22=0x000000000000000000000000
            trusted.afr.rpaas-client-23=0x000000000000000000000000
            trusted.gfid=0x21b7709eca5e481ab2b9e5d73e219b03
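
For reference, each `trusted.afr.*` changelog value is 12 bytes holding three big-endian counters: pending data, metadata, and entry operations. A small Python sketch (the helper name `decode_afr_xattr` is ours, not a GlusterFS API) decodes the hex values shown above:

```python
# Decode a trusted.afr.* changelog xattr value (as printed by getfattr -e hex).
# Layout: 12 bytes, big-endian -- pending data, metadata, and entry counters.
def decode_afr_xattr(hex_value):
    raw = bytes.fromhex(hex_value.removeprefix("0x"))
    data = int.from_bytes(raw[0:4], "big")
    metadata = int.from_bytes(raw[4:8], "big")
    entry = int.from_bytes(raw[8:12], "big")
    return data, metadata, entry

# Every counter above is zero on every brick: AFR recorded no pending
# operations, so it has no changelog evidence to pick a heal source from.
print(decode_afr_xattr("0x000000000000000000000000"))  # (0, 0, 0)
```

All-zero changelogs combined with a size mismatch are exactly the state that the affected release reports as split-brain (EIO).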

Comment 1 Stanislav German-Evtushenko 2016-02-12 01:56:52 UTC
This never happens when a host is rebooted cleanly (sudo reboot); it only happens when a host stops suddenly due to a crash or power outage.

Comment 2 Niels de Vos 2016-06-17 15:57:15 UTC
This bug is being closed because version 3.5 is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.

Comment 3 Pranith Kumar K 2016-06-22 10:13:09 UTC
We would like to confirm that this is fixed in afr-v2. It seems to be fixed, but we will close the bug after confirmation.

Comment 4 Ravishankar N 2016-06-22 10:30:04 UTC
In afr-v2 (available from glusterfs-3.6 onwards), if a file has no pending afr xattrs but the replicas' sizes mismatch, AFR chooses the bigger file as the heal source and triggers heals instead of returning EIO. Hence closing this bug.
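
For illustration, the afr-v2 source-selection rule described above can be sketched in Python (a hedged sketch; `pick_heal_source` and its arguments are hypothetical names, not GlusterFS internals):

```python
# Illustrative sketch (not GlusterFS code) of the afr-v2 rule described above:
# when no pending afr xattrs exist but replica sizes differ, heal from the
# biggest replica instead of returning EIO to the client.
def pick_heal_source(sizes, pending):
    if any(pending.values()):
        return None  # changelog xattrs decide the source; size is not consulted
    if len(set(sizes.values())) == 1:
        return None  # replicas agree, nothing to heal
    return max(sizes, key=sizes.get)  # bigger file wins

# Sizes from the `ls -ln` output in the report:
sizes = {"gluster.1": 9869, "gluster.2": 10008, "gluster.3": 10008}
pending = {brick: False for brick in sizes}
print(pick_heal_source(sizes, pending))  # gluster.2 (first full-size replica)
```

Under this rule the truncated copy on gluster.1 would be overwritten from one of the two 10008-byte replicas rather than flagged as split-brain.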