Bug 1291538 - GlusterFS is not durable against power outage
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.7.12
Assigned To: Pranith Kumar K
Keywords: Reopened, Triaged
Reported: 2015-12-15 00:41 EST by Stanislav German-Evtushenko
Modified: 2016-06-22 06:30 EDT

Doc Type: Bug Fix
Last Closed: 2016-06-22 06:30:04 EDT
Type: Bug


Description Stanislav German-Evtushenko 2015-12-15 00:41:37 EST
Description of problem:
- GlusterFS is not durable against a power outage. A power loss always leads to split-brain.

How reproducible:
- Always


Steps to Reproduce:
1. Create cluster: 3 nodes, 1 volume, 3 bricks, 3 replicas.
2. Mount the volume somewhere and start writing many files.
3. Switch power off for one of the nodes (stop it forcefully in the case of a virtual machine).
4. Start the node again.
5. Run: gluster volume heal volumename full
6. Check: gluster volume heal volumename info split-brain
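
A command-level sketch of these steps, reusing the volume name, brick path, and mount point that appear later in this report; the exact hostnames and the way the node is killed are assumptions, not part of the original report:

$ gluster volume create volumename replica 3 \
      gluster.1:/data/glusterfs/volume \
      gluster.2:/data/glusterfs/volume \
      gluster.3:/data/glusterfs/volume
$ gluster volume start volumename
$ mount -t glusterfs gluster.1:/volumename /mnt/gluster-volume

# Keep appending to a file while power is cut on one node.
$ while true; do date >> /mnt/gluster-volume/filename.log; done &

# On the hypervisor (if the node is a VM), force it off, e.g.:
$ virsh destroy gluster-node-2    # hypothetical domain name

# After the node is back up:
$ gluster volume heal volumename full
$ gluster volume heal volumename info split-brain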

Actual results:
- cat: /mnt/gluster-volume/filename.log: Input/output error

Expected results:
- If we have 3 replicas and 2 of them are identical (a quorum), the file should be healed automatically instead of returning an Input/output error.


Additional info:

- Check file size on all nodes:

$ ls -ln /data/glusterfs/volume/filename.log
gluster.1: -rw-r--r-- 2 106 114 9869 Dec 15 04:47 /data/glusterfs/volume/filename.log
gluster.2: -rw-r--r-- 2 106 114 10008 Dec 15 04:49 /data/glusterfs/volume/filename.log
gluster.3: -rw-r--r-- 2 106 114 10008 Dec 15 04:49 /data/glusterfs/volume/filename.log


- Get attr on all nodes:

$ getfattr -m . -d -e hex /data/glusterfs/volume/filename.log

gluster.1:  # file: /data/glusterfs/volume/filename.log
            trusted.afr.rpaas-client-21=0x000000000000000000000000
            trusted.afr.rpaas-client-22=0x000000000000000000000000
            trusted.afr.rpaas-client-23=0x000000000000000000000000
            trusted.gfid=0x21b7709eca5e481ab2b9e5d73e219b03
 
gluster.2:  # file: /data/glusterfs/volume/filename.log
            trusted.afr.rpaas-client-21=0x000000000000000000000000
            trusted.afr.rpaas-client-22=0x000000000000000000000000
            trusted.afr.rpaas-client-23=0x000000000000000000000000
            trusted.gfid=0x21b7709eca5e481ab2b9e5d73e219b03
 
gluster.3:  # file: /data/glusterfs/volume/filename.log
            trusted.afr.rpaas-client-21=0x000000000000000000000000
            trusted.afr.rpaas-client-22=0x000000000000000000000000
            trusted.afr.rpaas-client-23=0x000000000000000000000000
            trusted.gfid=0x21b7709eca5e481ab2b9e5d73e219b03
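
A possible follow-up check (the loop is an illustration, not part of the original report): compare the bricks' copies directly to confirm that gluster.1 holds the shorter, diverged copy even though every AFR changelog xattr above is zero.

$ for h in gluster.1 gluster.2 gluster.3; do ssh $h md5sum /data/glusterfs/volume/filename.log; done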
Comment 1 Stanislav German-Evtushenko 2016-02-11 20:56:52 EST
This never happens if a host is rebooted (sudo reboot); it only happens when a host experiences a sudden stop due to a crash or power outage.
Comment 2 Niels de Vos 2016-06-17 11:57:15 EDT
This bug is being closed because the 3.5 release is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.
Comment 3 Pranith Kumar K 2016-06-22 06:13:09 EDT
We would like to confirm that this is fixed in afr-v2. It seems to be fixed, but we will close it after confirmation.
Comment 4 Ravishankar N 2016-06-22 06:30:04 EDT
In afr-v2 (which is available in glusterfs-3.6 onwards), if there are no pending afr xattrs on a file but there is a size mismatch, it will choose the bigger file as the source and trigger heals instead of returning EIO. Hence, closing this bug.
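
A minimal sketch of how that behaviour could be re-checked on glusterfs-3.6 or later, reusing the names from this report; the commands are illustrative, not taken from the comment:

# On each brick: xattrs all-zero, sizes differ (as in the listings above).
$ getfattr -m . -d -e hex /data/glusterfs/volume/filename.log
$ stat -c %s /data/glusterfs/volume/filename.log

# Trigger a heal and confirm nothing is reported as split-brain.
$ gluster volume heal volumename full
$ gluster volume heal volumename info split-brain

# Reading through the mount should now succeed instead of returning EIO.
$ cat /mnt/gluster-volume/filename.log > /dev/null && echo OK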
