Created attachment 575393
Mount log

Description of problem:
On a distributed-replicate (dis-rep) volume, running bonnie or iozone on the CIFS mount puts the replicate bricks into metadata split-brain. The mount log is attached.

Version-Release number of selected component (if applicable):
3.3.0 qa33

How reproducible:
Always

Steps to Reproduce:
1. Create a dis-rep volume with 4 bricks.
2. Configure Samba on the client machine:
   Install samba.
   Add the following to /etc/samba/smb.conf:

   [dis-rep]
   comment = Samba config for replicate volume
   printable = no
   writable = yes
   path = /mnt/test
   guest ok = yes
   browseable = yes
   read only = no

   Set the SMB password for the root user using the smbpasswd command.
3. Start the samba service: service smb start
4. On the client machine, do a FUSE mount of the dis-rep volume on /mnt/test (the path given in smb.conf).
   Then do a CIFS mount with the following command:
   mount -t cifs client_IP:/dis-rep /mnt/cifs
   It will prompt for a password; enter the one you set in step 2.
5. Run bonnie on the CIFS mount /mnt/cifs and check the FUSE mount log (mnt-test.log).

Actual results:

Brick1:
[root@gqac001 ~]# getfattr -d -m . -e hex /home/bricks/test/b1
getfattr: Removing leading '/' from absolute path names
# file: home/bricks/test/b1
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.test-client-0=0x000000000000000000000000
trusted.afr.test-client-1=0x000000000000000200000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000000000007ffffffe
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size=0x0000001401319000
trusted.glusterfs.volume-id=0x9c6055b0dba94c5586e1a6008d9c6ef5

Brick2 (replica of Brick1):
[root@gqac002 ~]# getfattr -d -m . -e hex /home/bricks/test/b1
getfattr: Removing leading '/' from absolute path names
# file: home/bricks/test/b1
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.test-client-0=0x000000000000000200000000
trusted.afr.test-client-1=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000000000007ffffffe
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size=0x0000001601309000
trusted.glusterfs.volume-id=0x9c6055b0dba94c5586e1a6008d9c6ef5

Brick3:
[root@gqac001 ~]# getfattr -d -m . -e hex /home/bricks/test/b2
getfattr: Removing leading '/' from absolute path names
# file: home/bricks/test/b2
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.test-client-2=0x000000000000000000000000
trusted.afr.test-client-3=0x000000000000000200000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000007fffffffffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size=0x00000012d2ebf000
trusted.glusterfs.volume-id=0x9c6055b0dba94c5586e1a6008d9c6ef5

Brick4 (replica of Brick3):
[root@gqac002 ~]# getfattr -d -m . -e hex /home/bricks/test/b2
getfattr: Removing leading '/' from absolute path names
# file: home/bricks/test/b2
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.test-client-2=0x000000000000000200000000
trusted.afr.test-client-3=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000007fffffffffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size=0x00000012d2ebf000
trusted.glusterfs.volume-id=0x9c6055b0dba94c5586e1a6008d9c6ef5

Expected results:

Additional info:

Mount log:
[2012-04-05 17:21:15.172337] I [afr-common.c:1329:afr_launch_self_heal] 0-test-replicate-1: background meta-data self-heal triggered. path: /, reason: lookup detected pending operations
[2012-04-05 17:21:15.172941] I [afr-common.c:1329:afr_launch_self_heal] 0-test-replicate-0: background meta-data self-heal triggered. path: /, reason: lookup detected pending operations
[2012-04-05 17:21:15.173227] E [afr-self-heal-metadata.c:490:afr_sh_metadata_fix] 0-test-replicate-1: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2012-04-05 17:21:15.173548] I [afr-self-heal-metadata.c:65:afr_sh_metadata_done] 0-test-replicate-1: split-brain detected, aborting selfheal of /
[2012-04-05 17:21:15.173568] E [afr-self-heal-common.c:2042:afr_self_heal_completion_cbk] 0-test-replicate-1: background meta-data self-heal failed on /
[2012-04-05 17:21:15.177423] E [afr-self-heal-metadata.c:490:afr_sh_metadata_fix] 0-test-replicate-0: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2012-04-05 17:21:15.184934] I [afr-self-heal-metadata.c:65:afr_sh_metadata_done] 0-test-replicate-0: split-brain detected, aborting selfheal of /
[2012-04-05 17:21:15.184956] E [afr-self-heal-common.c:2042:afr_self_heal_completion_cbk] 0-test-replicate-0: background meta-data self-heal failed on /
[2012-04-05 17:21:16.218548] I [afr-common.c:1329:afr_launch_self_heal] 0-test-replicate-1: background meta-data self-heal triggered. path: /, reason: lookup detected pending operations
[2012-04-05 17:21:16.219266] I [afr-common.c:1329:afr_launch_self_heal] 0-test-replicate-0: background meta-data self-heal triggered. path: /, reason: lookup detected pending operations
[2012-04-05 17:21:16.219500] E [afr-self-heal-metadata.c:490:afr_sh_metadata_fix] 0-test-replicate-1: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2012-04-05 17:21:16.219942] I [afr-self-heal-metadata.c:65:afr_sh_metadata_done] 0-test-replicate-1: split-brain detected, aborting selfheal of /
[2012-04-05 17:21:16.219963] E [afr-self-heal-common.c:2042:afr_self_heal_completion_cbk] 0-test-replicate-1: background meta-data self-heal failed on /
[2012-04-05 17:21:16.235920] E [afr-self-heal-metadata.c:490:afr_sh_metadata_fix] 0-test-replicate-0: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2012-04-05 17:21:16.237513] I [afr-self-heal-metadata.c:65:afr_sh_metadata_done] 0-test-replicate-0: split-brain detected, aborting selfheal of /
[2012-04-05 17:21:16.237554] E [afr-self-heal-common.c:2042:afr_self_heal_completion_cbk] 0-test-replicate-0: background meta-data self-heal failed on /
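The trusted.afr.* values in the brick xattrs can be read as pending-operation counters, which is what makes the split-brain visible. A minimal Python sketch, assuming the AFR changelog value is 12 bytes holding three 32-bit big-endian counters in the order data, metadata, entry; the function names are hypothetical helpers, not part of GlusterFS.

```python
import struct


def decode_afr_changelog(value):
    """Split a trusted.afr.* hex value into its pending-operation counters.

    Assumption: 12 bytes holding three 32-bit big-endian counters,
    in the order data, metadata, entry.
    """
    hexdigits = value[2:] if value.lower().startswith("0x") else value
    data, metadata, entry = struct.unpack(">III", bytes.fromhex(hexdigits))
    return {"data": data, "metadata": metadata, "entry": entry}


def is_metadata_split_brain(xattrs_a, xattrs_b, key_a, key_b):
    """Return True if each replica records pending metadata ops against the other.

    xattrs_a / xattrs_b map xattr names to hex strings as printed by
    `getfattr -d -m . -e hex`; key_a / key_b are the trusted.afr.* names
    that refer to brick A and brick B respectively.
    """
    a_blames_b = decode_afr_changelog(xattrs_a[key_b])["metadata"] > 0
    b_blames_a = decode_afr_changelog(xattrs_b[key_a])["metadata"] > 0
    return a_blames_b and b_blames_a


# Values copied from Brick1 and Brick2 in the report.
BRICK1 = {
    "trusted.afr.test-client-0": "0x000000000000000000000000",
    "trusted.afr.test-client-1": "0x000000000000000200000000",
}
BRICK2 = {
    "trusted.afr.test-client-0": "0x000000000000000200000000",
    "trusted.afr.test-client-1": "0x000000000000000000000000",
}

print(decode_afr_changelog(BRICK1["trusted.afr.test-client-1"]))
# {'data': 0, 'metadata': 2, 'entry': 0}
print(is_metadata_split_brain(BRICK1, BRICK2,
                              "trusted.afr.test-client-0",
                              "trusted.afr.test-client-1"))
# True
```

Under that layout, each brick of the Brick1/Brick2 pair holds a metadata count of 2 against its peer, so neither copy can serve as the heal source, consistent with the "split-brain detected" messages in the log. Brick3/Brick4 show the same pattern for test-client-2/test-client-3.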
Please verify the following:
1) The permissions, owner, and group of the brick directories (/home/bricks/test/b*) are identical on all servers.
2) If there is a lost+found subdirectory in a brick directory, delete it. (This directory is created by mkfs when formatting a new ext3/ext4 filesystem and can be safely removed; e2fsck recreates it automatically if and when needed.)
After ensuring both of the above, please run the test again and report the results here.
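Both checks can be scripted. A minimal Python sketch, assuming it is run separately on each brick server (gqac001, gqac002) with that server's brick paths as arguments and the printed lines compared across servers; check_bricks and the script name are hypothetical, and the sketch compares numeric uid/gid rather than user and group names.

```python
import os
import stat
import sys


def brick_metadata(path):
    """Return (permission bits, uid, gid) of a brick root directory."""
    st = os.stat(path)
    return stat.S_IMODE(st.st_mode), st.st_uid, st.st_gid


def check_bricks(paths):
    """Report each brick's permissions/ownership and any lost+found directory.

    Returns True only if all given bricks share the same mode, uid and gid
    and none of them contains a lost+found subdirectory.
    """
    ok = True
    seen = set()
    for path in paths:
        mode, uid, gid = brick_metadata(path)
        seen.add((mode, uid, gid))
        print(f"{path}: mode={mode:o} uid={uid} gid={gid}")
        if os.path.isdir(os.path.join(path, "lost+found")):
            print(f"{path}: contains lost+found")
            ok = False
    if len(seen) > 1:
        print("brick directories differ in permissions/ownership")
        ok = False
    return ok


# Only act as a command when brick paths are passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(0 if check_bricks(sys.argv[1:]) else 1)
```

For example, `python3 check_bricks.py /home/bricks/test/b1 /home/bricks/test/b2` on each server; since each server here hosts one brick from each replica pair, the output from gqac001 and gqac002 should also match.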
Tested on the qa38 build and it works fine. I don't see any split-brain.