Description of problem:
When a file/dir has been created on host A while host B was down, bringing up B (with proactive full self-healing enabled) replicates the object, but not its contents.

Version-Release number of selected component (if applicable):
3.3.1 (F18 3.3.1-10)

How reproducible:
Always

Steps to Reproduce (a condensed shell sketch follows below):
1. Create a glusterfs volume on two empty directories: gv0 replica 2 A:/replica B:/replica
2. Stop the daemon on host B
3. On host A: mount -t glusterfs A:gv0 /mnt
4. On host A: mkdir /mnt/dir
5. On host A: echo test > /mnt/dir/file
6. On host B: start the daemon, do not mount the volume
7. On host B: sleep 10; ls /replica --> dir (OK)
8. On host B: ls /replica/dir --> empty (KO)
9. On host B: gluster volume heal gv0; sleep 10; ls /replica/dir --> file (OK)
10. On host B: ls -l /replica/dir --> -rw-r--r-- 2 root root 0 Mar 4 15:15 file --> (KO: file is empty)
11. On host B: gluster volume heal gv0; sleep 10; ls -l /replica/dir --> -rw-r--r-- 2 root root 5 Mar 4 15:16 file --> (OK)

Actual results:
See above.

Expected results:
Initial self-healing properly replicates all objects and their contents recursively, including newly replicated (sub)dirs/files.

Additional info:
- This problem matters because it increases the chances of losing data: new files are not replicated as soon as the host comes back up, as one would expect from a replication scheme. This is particularly true if host B does not "work" with the new data for a long time after power-up.
- In the above "Steps to Reproduce", waiting longer between actions does not alter the results.
- The above example shows that each consecutive healing request replicates a single level of new contents.

Thanks for investigating.
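For convenience, here is the same reproduction condensed into a shell sketch. The volume-create and volume-start commands are my assumption of what step 1 implies; hostnames A/B and the /replica brick paths are the same placeholders as above:

    # One-time setup (assumed from step 1; run once while both daemons are up):
    gluster volume create gv0 replica 2 A:/replica B:/replica
    gluster volume start gv0

    # Step 2: stop glusterd on host B, then on host A:
    mount -t glusterfs A:gv0 /mnt
    mkdir /mnt/dir
    echo test > /mnt/dir/file

    # Step 6: start glusterd on host B again (do not mount), then on host B:
    sleep 10; ls /replica          # dir is listed (OK)
    ls /replica/dir                # empty (KO)
    gluster volume heal gv0
    sleep 10; ls -l /replica/dir   # file is listed but 0 bytes (KO)
    gluster volume heal gv0
    sleep 10; ls -l /replica/dir   # file now has its 5 bytes (OK)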
Is this the same as bug https://bugzilla.redhat.com/show_bug.cgi?id=852741 ?
I don't think so: 852741 is about self-healing not starting. This one assumes self-healing starts, updates changed objects, and creates new ones, but does not process the contents of new objects.
Hi Patrick,

There are two types of self-heal done by the self-heal daemon. One is the full crawl, which is similar to running the find command on the mount point. The other heals only the files that need healing. Basically, the bricks remember all the files/dirs that need self-heal, but there is no way to figure out in which order the heal needs to happen.

Say one creates a dir 'd' and a file 'f' inside 'd' with some data in it. To heal this correctly, self-heal on 'd' must be attempted before self-heal on 'f'. If self-heal on 'f' is attempted before 'd', only 'd' is self-healed, and the file 'f' is scheduled for healing after 10 minutes, after which you should see that 'f' is also healed properly. So in general, if the directory depth is x, we need at least x+1 attempts of self-heal by the self-heal daemon to completely heal the data. These attempts are made automatically.

If you want to trigger the heal just once and make sure the data is healed completely, please execute 'volume heal <volname> full' manually.

We are going to document this behavior and close the bug for now. Please feel free to ask if you need more information. Thanks for logging the bug.
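For reference, assuming the volume name gv0 from the reproduction steps above, the one-shot full heal plus a check of what still needs healing would look like this (a sketch; I have not re-verified it on 3.3.1):

    gluster volume heal gv0 full    # full crawl, heals everything regardless of directory depth
    gluster volume heal gv0 info    # list entries still pending self-heal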
Understood, thanks for the explanation. But this means that, if not a bug, this "feature" is at least a caveat and leaves glusterfs not really usable for RAID-like replication, I'm afraid. I tried to look through the code to find a solution, but it is too "unsequential" to be completely understood without some documentation/explanation. I wonder, though, whether it would be possible to automatically reschedule an "immediate" self-heal (i.e. before the 10 minutes are up) when an object cannot be healed during a pass (for the underlying reason you describe), or whether the unhealed objects could be queued for later processing?
BTW, would it be possible to have a configuration parameter that schedules a full crawl at daemon start-up? This could actually be a new feature request :-)
Patrick, a full crawl is extremely expensive. So instead we are coming up with algorithms to decrease the number of crawls, as in https://bugzilla.redhat.com/show_bug.cgi?id=969384. Let me know if that would suffice.

Pranith
This already seems better! In addition, would it be possible, as proposed in comment 4, to flag an "incomplete" crawl so that the next one is scheduled immediately after completion (i.e. without waiting 10 minutes)? I think such a solution might be satisfactory. Thanks for your support.
That "not waiting for 10 minutes" behaviour is already present upstream and should be released in 3.6 if we don't find any issues.
Thanks a lot :-)
The version that this bug has been reported against does not get any updates from the Gluster Community anymore. Please verify whether this report is still valid against a current (3.4, 3.5 or 3.6) release and update the version, or close this bug. If there has been no update before 9 December 2014, this bug will get automatically closed.