Bug 763664 (GLUSTER-1932)

Summary: Self healing did not happen
Product: [Community] GlusterFS
Reporter: Jacob Shucart <jacob>
Component: core
Assignee: Amar Tumballi <amarts>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: urgent
Version: 3.1-beta
CC: gluster-bugs, platform, vraman
Hardware: All
OS: Windows
Doc Type: Bug Fix

Description Jacob Shucart 2010-10-12 15:37:44 UTC
As an additional piece of information: I went to the Gluster mount on the client system and ran "rm" to clean up the directory.  On the GSP side, I noticed that diskA is now empty, diskB has all of the data, and the client-side mount is now hung.

Comment 1 Jacob Shucart 2010-10-12 15:51:14 UTC
I unmounted and remounted the volume using NFS.  I started up my script again.  Data now goes to:

diskA
diskC
diskD

Data should be mirrored between diskA and diskB, but no data gets written to diskB (it is still written to diskA).

Now when I go into Platform to look at my volume, it shows up as "red", and when I try to open it, I get a timeout.

I rebooted the GSP, remounted the volume on the client system, and now everything seems to work as expected.

Comment 2 Jacob Shucart 2010-10-12 18:35:53 UTC
Servers in a mirror pair have different data following dynamic expansion of the volume.  Self-healing did not occur as expected.

1. Created a VM with 4 hard disks.
2. Created a Gluster volume in a distributed mirror setup with 2 of the hard disks.
3. Started a script that creates 100MB files for testing on a client system that mounted the Gluster volume using Gluster's native NFS.
4. While the script was running, I added the other 2 servers to the distribute+mirror setup and hit update (the equivalent gluster CLI steps are sketched after this list).
5. The NFS mount point was hung and did not resume.
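
For reference, a rough sketch of the gluster CLI steps that the Platform operations above correspond to; the volume name is taken from the xattrs below, but the hostname and brick paths are illustrative, since the actual changes were made through the GSP interface:

# create the initial 2-way mirror across the first two disks
gluster volume create jacobtest1 replica 2 gsp1:/export/diskA gsp1:/export/diskB
gluster volume start jacobtest1

# later, while the client workload is running, expand to distribute+mirror
gluster volume add-brick jacobtest1 gsp1:/export/diskC gsp1:/export/diskD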

I then looked at the filesystem in GSP and saw:

diskA: -rw-r--r-- 1 root root 100M Oct 12 18:16 test4
diskB: -rw-r--r-- 1 root root  71M Oct 12 18:15 test4

Their extended attributes appear the same:

[root@jacobgsp310 sys2]# attr -l test4
Attribute "gfid" has a 16 byte value for test4
Attribute "afr.jacobtest1-client-0" has a 12 byte value for test4
Attribute "afr.jacobtest1-client-1" has a 12 byte value for test4

[root@jacobgsp310 sys1]# attr -l test4
Attribute "gfid" has a 16 byte value for test4
Attribute "afr.jacobtest1-client-0" has a 12 byte value for test4
Attribute "afr.jacobtest1-client-1" has a 12 byte value for test4

I would have thought that looking at the file through the Gluster mount point would have triggered a self-heal and fixed the file, but it did not.  Is there any other information that you need from me?
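
In case it is useful, my understanding is that a heal is triggered by forcing a lookup of the file from a client mount; a minimal sketch, assuming the volume is mounted at /mnt/glusterfs on the client (the mount point is illustrative):

# stat every file through the mount to force lookups, which should trigger self-heal
find /mnt/glusterfs -noleaf -print0 | xargs --null stat > /dev/null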

Comment 3 Balamurugan Arumugam 2010-10-27 09:10:15 UTC
This happened only in an interim QA build and did not happen with the 3.1.0 GA. It needs to be marked as resolved.