Bug 1291538 - GlusterFS is not durable against power outage
Summary: GlusterFS is not durable against power outage
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.7.12
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-12-15 05:41 UTC by Stanislav German-Evtushenko
Modified: 2016-06-22 10:30 UTC
CC: 4 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2016-06-22 10:30:04 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Stanislav German-Evtushenko 2015-12-15 05:41:37 UTC
Description of problem:
- GlusterFS is not durable against a power outage. A sudden power loss consistently leaves files in split-brain.

How reproducible:
- Always


Steps to Reproduce:
1. Create cluster: 3 nodes, 1 volume, 3 bricks, 3 replicas.
2. Mount the volume somewhere and start writing many files.
3. Switch power off for one of the nodes (force-stop it in the case of a virtual machine).
4. Start the node again.
5. Run: gluster volume heal volumename full
6. Check: gluster volume heal volumename info split-brain

Actual results:
- cat: /mnt/gluster-volume/filename.log: Input/output error

Expected results:
- If we have 3 replicas and 2 of them are identical (a quorum), the file should be healed automatically rather than returning an Input/output error.


Additional info:

- Check file size on all nodes:

$ ls -ln /data/glusterfs/volume/filename.log
gluster.1: -rw-r--r-- 2 106 114 9869 Dec 15 04:47 /data/glusterfs/volume/filename.log
gluster.2: -rw-r--r-- 2 106 114 10008 Dec 15 04:49 /data/glusterfs/volume/filename.log
gluster.3: -rw-r--r-- 2 106 114 10008 Dec 15 04:49 /data/glusterfs/volume/filename.log


- Get attr on all nodes:

$ getfattr -m . -d -e hex /data/glusterfs/volume/filename.log

gluster.1:  # file: /data/glusterfs/volume/filename.log
            trusted.afr.rpaas-client-21=0x000000000000000000000000
            trusted.afr.rpaas-client-22=0x000000000000000000000000
            trusted.afr.rpaas-client-23=0x000000000000000000000000
            trusted.gfid=0x21b7709eca5e481ab2b9e5d73e219b03
 
gluster.2:  # file: /data/glusterfs/volume/filename.log
            trusted.afr.rpaas-client-21=0x000000000000000000000000
            trusted.afr.rpaas-client-22=0x000000000000000000000000
            trusted.afr.rpaas-client-23=0x000000000000000000000000
            trusted.gfid=0x21b7709eca5e481ab2b9e5d73e219b03
 
gluster.3:  # file: /data/glusterfs/volume/filename.log
            trusted.afr.rpaas-client-21=0x000000000000000000000000
            trusted.afr.rpaas-client-22=0x000000000000000000000000
            trusted.afr.rpaas-client-23=0x000000000000000000000000
            trusted.gfid=0x21b7709eca5e481ab2b9e5d73e219b03
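For context on the getfattr output above: each trusted.afr.* value is (in the AFR changelog format) a 12-byte blob of three big-endian 32-bit pending-operation counters, for data, metadata, and entry operations. A minimal decoding helper, written here purely as an illustration (decode_afr_xattr is a hypothetical name, not a GlusterFS function):

```python
# Hypothetical helper (not part of GlusterFS): decode a trusted.afr.* xattr
# value as printed by `getfattr -e hex`.
import struct

def decode_afr_xattr(hex_value: str) -> dict:
    """Decode e.g. '0x000000000000000000000000' into pending-op counters."""
    raw = bytes.fromhex(hex_value.removeprefix("0x"))
    # Three network-byte-order uint32s: data, metadata, entry pending counts.
    data, metadata, entry = struct.unpack(">III", raw[:12])
    return {"data": data, "metadata": metadata, "entry": entry}

# All-zero values, as on all three bricks above, mean no pending operations
# were recorded, so AFR has no changelog witness to pick a heal source from
# even though the file sizes differ.
pending = decode_afr_xattr("0x000000000000000000000000")
assert all(count == 0 for count in pending.values())
```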

Comment 1 Stanislav German-Evtushenko 2016-02-12 01:56:52 UTC
This never happens if a host is rebooted cleanly (sudo reboot); it only happens when a host experiences a sudden stop due to a crash or power outage.

Comment 2 Niels de Vos 2016-06-17 15:57:15 UTC
This bug is being closed because the 3.5 release is marked End-Of-Life. There will be no further updates to this version. If you still face this issue in a more current release, please open a new bug against a version that still receives bugfixes.

Comment 3 Pranith Kumar K 2016-06-22 10:13:09 UTC
We would like to confirm that this is fixed in afr-v2. It seems to be fixed, but we will close the bug only after confirmation.

Comment 4 Ravishankar N 2016-06-22 10:30:04 UTC
In afr-v2 (available in glusterfs-3.6 onwards), if there are no pending afr xattrs on a file but there is a size mismatch, it will choose the bigger file as the heal source and trigger a heal instead of returning EIO. Hence closing this bug.

