Bug 810233 - Metadata split brain while running tests on cifs mount
Summary: Metadata split brain while running tests on cifs mount
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: pre-2.0
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-04-05 12:38 UTC by Ujjwala
Modified: 2015-12-01 16:45 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-05-04 12:27:16 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments:
Mount log (1.20 MB, application/x-gzip)
2012-04-05 12:38 UTC, Ujjwala

Description Ujjwala 2012-04-05 12:38:51 UTC
Created attachment 575393
Mount log

Description of problem:
On a dis-rep (distributed-replicate) volume, when I run bonnie or iozone on the
CIFS mount, the replicate bricks go into metadata split-brain.

The mount log is attached.

Version-Release number of selected component (if applicable):
3.3.0 qa33


How reproducible:
Always


Steps to Reproduce:
1. Create a dis-rep volume with 4 bricks.
2. Configure Samba on the client machine:
Install Samba.
Add the following to /etc/samba/smb.conf:
[dis-rep]
comment = Samba config for replicate volume
printable = no
writable = yes
path = /mnt/test
guest ok = yes
browseable = yes
read only = no

Set the SMB password for the root user using the smbpasswd command.

3. Start the Samba service: service smb start
4. On the client machine:
Do a FUSE mount of the dis-rep volume on /mnt/test (the path given in smb.conf).
Do a CIFS mount using the following command:
mount -t cifs client_IP:/dis-rep /mnt/cifs
It will prompt for a password; enter the one you set in step 2.

5. Run the bonnie tool on the CIFS mount /mnt/cifs and check the FUSE mount log (mnt-test.log).
  
Actual results:
Brick1:
[root@gqac001 ~]# getfattr -d -m . -e hex /home/bricks/test/b1
getfattr: Removing leading '/' from absolute path names
# file: home/bricks/test/b1
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.test-client-0=0x000000000000000000000000
trusted.afr.test-client-1=0x000000000000000200000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000000000007ffffffe
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size=0x0000001401319000
trusted.glusterfs.volume-id=0x9c6055b0dba94c5586e1a6008d9c6ef5

Brick2 (replicate):
[root@gqac002 ~]# getfattr -d -m . -e hex /home/bricks/test/b1
getfattr: Removing leading '/' from absolute path names
# file: home/bricks/test/b1
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.test-client-0=0x000000000000000200000000
trusted.afr.test-client-1=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000000000007ffffffe
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size=0x0000001601309000
trusted.glusterfs.volume-id=0x9c6055b0dba94c5586e1a6008d9c6ef5

Brick3:
[root@gqac001 ~]# getfattr -d -m . -e hex /home/bricks/test/b2
getfattr: Removing leading '/' from absolute path names
# file: home/bricks/test/b2
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.test-client-2=0x000000000000000000000000
trusted.afr.test-client-3=0x000000000000000200000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000007fffffffffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size=0x00000012d2ebf000
trusted.glusterfs.volume-id=0x9c6055b0dba94c5586e1a6008d9c6ef5

Brick4 (replicate):
[root@gqac002 ~]# getfattr -d -m . -e hex /home/bricks/test/b2
getfattr: Removing leading '/' from absolute path names
# file: home/bricks/test/b2
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.test-client-2=0x000000000000000200000000
trusted.afr.test-client-3=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000007fffffffffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size=0x00000012d2ebf000
trusted.glusterfs.volume-id=0x9c6055b0dba94c5586e1a6008d9c6ef5
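
Reading the xattrs above: each trusted.afr.* value is 12 bytes holding three
big-endian 32-bit pending counters, in the order data, metadata, entry. In both
replica pairs, each brick records a metadata count of 2 against its partner, so
the two copies blame each other. The following is a minimal Python sketch for
decoding the values, added for illustration and not part of the original
report; the function names are hypothetical.

```python
import struct


def decode_afr_changelog(hex_value):
    """Split a 12-byte trusted.afr.* value into its three pending counters."""
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    # Three unsigned 32-bit integers in network (big-endian) byte order.
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


def is_metadata_split_brain(a_blames_b, b_blames_a):
    """True when both replicas hold pending metadata counts against each other."""
    return a_blames_b["metadata"] > 0 and b_blames_a["metadata"] > 0


# Values copied from the getfattr output for the first replica pair.
brick1_about_client1 = decode_afr_changelog("0x000000000000000200000000")
brick2_about_client0 = decode_afr_changelog("0x000000000000000200000000")

print(brick1_about_client1)
print(is_metadata_split_brain(brick1_about_client1, brick2_about_client0))
```

A brick's counter for its own client (for example trusted.afr.test-client-0 on
Brick1) is all zeros here, which decodes to no pending operations.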



Expected results:


Additional info:
Mount Log:
[2012-04-05 17:21:15.172337] I [afr-common.c:1329:afr_launch_self_heal] 0-test-replicate-1: background  meta-data self-heal triggered. path: /, reason: lookup detected pending operations
[2012-04-05 17:21:15.172941] I [afr-common.c:1329:afr_launch_self_heal] 0-test-replicate-0: background  meta-data self-heal triggered. path: /, reason: lookup detected pending operations
[2012-04-05 17:21:15.173227] E [afr-self-heal-metadata.c:490:afr_sh_metadata_fix] 0-test-replicate-1: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2012-04-05 17:21:15.173548] I [afr-self-heal-metadata.c:65:afr_sh_metadata_done] 0-test-replicate-1: split-brain detected, aborting selfheal of /
[2012-04-05 17:21:15.173568] E [afr-self-heal-common.c:2042:afr_self_heal_completion_cbk] 0-test-replicate-1: background  meta-data self-heal failed on /
[2012-04-05 17:21:15.177423] E [afr-self-heal-metadata.c:490:afr_sh_metadata_fix] 0-test-replicate-0: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2012-04-05 17:21:15.184934] I [afr-self-heal-metadata.c:65:afr_sh_metadata_done] 0-test-replicate-0: split-brain detected, aborting selfheal of /
[2012-04-05 17:21:15.184956] E [afr-self-heal-common.c:2042:afr_self_heal_completion_cbk] 0-test-replicate-0: background  meta-data self-heal failed on /
[2012-04-05 17:21:16.218548] I [afr-common.c:1329:afr_launch_self_heal] 0-test-replicate-1: background  meta-data self-heal triggered. path: /, reason: lookup detected pending operations
[2012-04-05 17:21:16.219266] I [afr-common.c:1329:afr_launch_self_heal] 0-test-replicate-0: background  meta-data self-heal triggered. path: /, reason: lookup detected pending operations
[2012-04-05 17:21:16.219500] E [afr-self-heal-metadata.c:490:afr_sh_metadata_fix] 0-test-replicate-1: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2012-04-05 17:21:16.219942] I [afr-self-heal-metadata.c:65:afr_sh_metadata_done] 0-test-replicate-1: split-brain detected, aborting selfheal of /
[2012-04-05 17:21:16.219963] E [afr-self-heal-common.c:2042:afr_self_heal_completion_cbk] 0-test-replicate-1: background  meta-data self-heal failed on /
[2012-04-05 17:21:16.235920] E [afr-self-heal-metadata.c:490:afr_sh_metadata_fix] 0-test-replicate-0: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2012-04-05 17:21:16.237513] I [afr-self-heal-metadata.c:65:afr_sh_metadata_done] 0-test-replicate-0: split-brain detected, aborting selfheal of /
[2012-04-05 17:21:16.237554] E [afr-self-heal-common.c:2042:afr_self_heal_completion_cbk] 0-test-replicate-0: background  meta-data self-heal failed on /
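
To summarize a large mount log, the split-brain errors can be tallied per
replicate subvolume. The following is a minimal Python sketch, assuming the
line layout seen in the excerpt above (timestamp, level letter, source
location, subvolume name, message). The regular expression and function name
are illustrative and are not part of GlusterFS.

```python
import re
from collections import Counter

# Layout assumed from the excerpt: [timestamp] LEVEL [file:line:func] subvol: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<src>[^\]]+)\]\s+(?P<subvol>\S+):\s+(?P<msg>.*)$"
)


def count_split_brain_errors(lines):
    """Count error-level lines that mention split-brain, keyed by subvolume."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "E" and "split-brain" in match.group("msg"):
            counts[match.group("subvol")] += 1
    return counts


# Three lines copied from the mount log excerpt.
sample = [
    "[2012-04-05 17:21:15.173227] E [afr-self-heal-metadata.c:490:afr_sh_metadata_fix] 0-test-replicate-1: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes",
    "[2012-04-05 17:21:15.173548] I [afr-self-heal-metadata.c:65:afr_sh_metadata_done] 0-test-replicate-1: split-brain detected, aborting selfheal of /",
    "[2012-04-05 17:21:15.177423] E [afr-self-heal-metadata.c:490:afr_sh_metadata_fix] 0-test-replicate-0: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes",
]
print(dict(count_split_brain_errors(sample)))
```

For a real log file, pass an open file object instead of the sample list.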

Comment 1 Louis Zuckerman 2012-04-05 12:49:46 UTC
Please verify the following...

1) The permissions, owner, and group of the brick directories
(/home/bricks/test/b*) are identical on every brick.

2) If there is a lost+found subdirectory in the brick directory please delete
it.
(This folder is created by mkfs when formatting a new ext3/4 filesystem and can
be safely removed.  It will be automatically recreated by e2fsck if/when
needed.)


After ensuring both of the above items please try the test again & report
results here.
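
Check 1 can be scripted. The following is a minimal Python sketch, under the
assumption that brick_ownership() is run on each server against its own brick
directory and the results are gathered by hand; the function names and the
host:path labels in the example are illustrative, and the tuples shown are
made-up values, not readings from the reporter's systems.

```python
import os
import stat


def brick_ownership(path):
    """Return (permission bits, uid, gid) for a directory on the local host."""
    st = os.stat(path)
    return (stat.S_IMODE(st.st_mode), st.st_uid, st.st_gid)


def find_mismatches(reports):
    """Given {label: (mode, uid, gid)}, list labels that differ from the first entry."""
    labels = list(reports)
    reference = reports[labels[0]]
    return [label for label in labels[1:] if reports[label] != reference]


# Hypothetical values collected from two servers, for illustration only.
collected = {
    "gqac001:/home/bricks/test/b1": (0o755, 0, 0),
    "gqac002:/home/bricks/test/b1": (0o700, 0, 0),
}
print(find_mismatches(collected))
```

An empty list means every reported brick directory matches the first one.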

Comment 2 Ujjwala 2012-04-25 08:26:32 UTC
Tested it on the qa38 build and it is working fine. I don't see any split-brain.

