Bug 1115748
| Summary: | Bricks are out of sync after recovery even though heal says everything is fine |
|---|---|
| Product: | [Community] GlusterFS |
| Component: | replicate |
| Version: | 3.5.1 |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Status: | CLOSED DUPLICATE |
| Severity: | unspecified |
| Priority: | unspecified |
| Reporter: | Miloš Kozák <milos.kozak> |
| Assignee: | Ravishankar N <ravishankar> |
| QA Contact: | |
| Docs Contact: | |
| CC: | gluster-bugs, pkarampu, ravishankar |
| Target Milestone: | --- |
| Target Release: | --- |
| Whiteboard: | |
| Fixed In Version: | |
| Doc Type: | Bug Fix |
| Doc Text: | |
| Story Points: | --- |
| Clone Of: | |
| Environment: | |
| Last Closed: | 2014-07-22 05:14:56 UTC |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| CRM: | |
| Verified Versions: | |
| Category: | --- |
| oVirt Team: | --- |
| RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- |
| Target Upstream Version: | |
| Embargoed: | |
| Attachments: | |
Description
Miloš Kozák
2014-07-03 03:36:08 UTC
Thanks for the bug report, Milos. I think the attachments were missed out; could you please upload them? Also, some questions if you don't mind:

1. "gluster announces that everything is healed" -- so `gluster volume heal volname info` now shows zero entries (as opposed to the 'possibly undergoing heal' you previously noticed in the mails)?
2. Could you also check the extended attributes of the file on the bricks? You can paste the output of the below command from both bricks:

getfattr -d -m . -e hex /dist1/brick/fs/<dd_file_name>

Created attachment 914444 [details]
Logs
Hi,

1. You are right, it no longer indicates 'possibly undergoing healing' after a while. Right after the VMs are reconnected it definitely does indicate that. In other words, everything appears to be OK, but the data on the disks are inconsistent. This situation surprised me yesterday when I ran these tests quite a lot; the previous day it behaved as in my email. The only change I can see is that I cleaned everything up and rebooted the systems.

2.
[root@node1 ~]# getfattr -d -m . -e hex /dist1/brick/fs/break
getfattr: Removing leading '/' from absolute path names
# file: dist1/brick/fs/break
trusted.afr.vg0-client-0=0x000000000000000000000000
trusted.afr.vg0-client-1=0x000000000000000000000000
trusted.gfid=0x051ca514aeb54942ae15ffa05de59a82

[root@node2 ~]# getfattr -d -m . -e hex /dist1/brick/fs/break
getfattr: Removing leading '/' from absolute path names
# file: dist1/brick/fs/break
trusted.afr.vg0-client-0=0x000000000000000000000000
trusted.afr.vg0-client-1=0x000000000000000000000000
trusted.gfid=0x051ca514aeb54942ae15ffa05de59a82

Hi Milos,
How are you confirming that the data on the disks are inconsistent?
Pranith

Hi, I first noticed it with the df command, which made me dig a bit deeper. ls -l shows the proper length, but du does not.
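For reference, each trusted.afr.* changelog value shown by getfattr is 12 bytes: the first 4 bytes count pending data operations, the next 4 pending metadata operations, and the last 4 pending entry operations, so an all-zero value means nothing is pending on that brick. A minimal bash sketch for decoding such a value (the non-zero input below is hypothetical, purely for illustration; it is not from this bug):

```shell
# Decode a trusted.afr.* changelog value as printed by 'getfattr -e hex'.
# Layout: 4 bytes data pending | 4 bytes metadata pending | 4 bytes entry pending.
xattr="0x000000020000000000000000"   # hypothetical value: 2 pending data ops

hex=${xattr#0x}                      # strip the leading "0x"
data=$((16#${hex:0:8}))              # bytes 0-3: data changelog
meta=$((16#${hex:8:8}))              # bytes 4-7: metadata changelog
entry=$((16#${hex:16:8}))            # bytes 8-11: entry changelog

echo "data=$data metadata=$meta entry=$entry"
```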
In response to your question, I calculated the md5 sums:

[root@node1 ~]# md5sum /dist1/brick/fs/*
d8b61b2c0025919d5321461045c8226f  /dist1/brick/fs/break
d8b61b2c0025919d5321461045c8226f  /dist1/brick/fs/node1
d8b61b2c0025919d5321461045c8226f  /dist1/brick/fs/node2
[root@node1 ~]# ls -l /dist1/brick/fs/*
-rw-r--r-- 2 root root 524288000 Jul  3 05:16 /dist1/brick/fs/break
-rw-r--r-- 2 root root 524288000 Jul  3 05:10 /dist1/brick/fs/node1
-rw-r--r-- 2 root root 524288000 Jul  3 05:10 /dist1/brick/fs/node2
[root@node1 ~]# du /dist1/brick/fs/*
116352  /dist1/brick/fs/break
512000  /dist1/brick/fs/node1
512000  /dist1/brick/fs/node2

[root@node2 ~]# md5sum /dist1/brick/fs/*
d8b61b2c0025919d5321461045c8226f  /dist1/brick/fs/break
d8b61b2c0025919d5321461045c8226f  /dist1/brick/fs/node1
d8b61b2c0025919d5321461045c8226f  /dist1/brick/fs/node2
[root@node2 ~]# ls -l /dist1/brick/fs/*
-rw-r--r-- 2 root root 524288000 Jul  3 05:16 /dist1/brick/fs/break
-rw-r--r-- 2 root root 524288000 Jul  3 05:10 /dist1/brick/fs/node1
-rw-r--r-- 2 root root 524288000 Jul  3 05:10 /dist1/brick/fs/node2
[root@node2 ~]# du /dist1/brick/fs/*
512000  /dist1/brick/fs/break
512000  /dist1/brick/fs/node1
512000  /dist1/brick/fs/node2

Oh snap... how can the md5 sums be the same when du does not match? Do you think this is only a du-related (synthetic test) problem?

Now you are talking. I think this is a sparse file, i.e. a file with holes. So basically, when a file with holes is healed, the holes are not being healed properly. Detailed steps to re-create the issue, mainly the kind of images (qcow2?) you used for creating them, would help.

Should I test it with fallocate, or do you have any other suggestion for how to allocate a non-sparse file?

There is no harm done to data consistency. Because of this, we are just using extra disk space, since self-heal is not retaining the holes. All I wanted to know is: how did you create the VM file?
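The du vs. ls -l discrepancy above is exactly what a sparse file looks like on disk: ls -l (and md5sum, which reads holes back as zeros) sees the apparent size, while du reports only the blocks actually allocated. A quick self-contained illustration (the file is a temporary one created for the demo, not a path from this bug):

```shell
# Create a 500 MB sparse file: truncate extends the apparent size
# without allocating any data blocks, so the hole occupies no disk space.
f=$(mktemp)
truncate -s 500M "$f"

apparent=$(stat -c %s "$f")          # apparent size in bytes
allocated=$(du -k "$f" | cut -f1)    # blocks actually allocated, in KiB

echo "apparent=${apparent}B allocated=${allocated}KiB"
rm -f "$f"
```

On the bricks above, 'break' on node1 kept its holes (116352 KiB allocated for a 524288000-byte file), while the healed copy on node2 was written out fully (512000 KiB), which is the wasted space being discussed.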
The VM files were created using the truncate -s XXG command. The files for the data mounted as /dist1/brick (/dev/sdb) were created the same way.

Created attachment 914626 [details]
another test

Test with regular files, not sparse ones.
I am not sure I understood your last comment properly, but how can the container of a VM influence the healing process? Anyway, I could reproduce this error with a copy over ssh. The procedure is basically the same as originally described, but the difference is that the heal gets stuck, as mentioned on the mailing list. I have enclosed logs and dumps as well.

This doesn't look like a sparse file issue: during my tests, a sparse file created on node 1 (from the mount point) while node 2 was down was successfully healed to the latter. There is not much information in the test1 and test2 logs; the glustershd log in test2:node1 actually indicates a successful heal. Some observations from the test3 logs:

1. prints.txt shows pending counts for the AFR extended attributes of both files. The file named 'node2' with gfid=0x04f688ebda444739bc2e932000dab387 is the one that shows 'possibly under heal'.
2. Both nodes' glustershd logs show this repeatedly:
I [afr-self-heald.c:1687:afr_dir_exclusive_crawl] 0-vg0-replicate-0: Another crawl is in progress for vg0-client-0
3. The brick log contains multiple warnings:
[2014-07-03 23:30:19.816435] W [inodelk.c:392:pl_inodelk_log_cleanup] 0-vg0-server: releasing lock on 04f688eb-da44-4739-bc2e-932000dab387 held by {client=0xd00180, pid=-1 lk-owner=08449eb0ec7f0000}

Hi, thank you for the insight. However, I am not sure what I can deduce from it. Does it mean this is a different bug? What makes me wonder is that you write that, according to the logs, the file was healed, but gluster volume heal indicates otherwise. Do you want me to generate more tests?

The bug seems to be what you described in the mail, i.e. the heal is not able to proceed. Only the test2 logs show the heal completed; the test3 logs don't. Can you generate statedumps when the heal happens to be hung?
I see from the command history logs that you have run the statedump command, but here's a quick howto anyway:

1. mkdir -p /var/run/gluster (if the directory does not exist)
2. gluster volume statedump <volume name> (gives a statedump of the bricks)
3. kill -USR1 <pid of the self-heal daemon process>

Do steps 2 and 3 twice, with an interval of about 1 minute in between. Do this while heal info shows "possibly undergoing...." and attach the dumps that are created. Also, an strace of the self-heal daemon would be helpful (you had provided an strace of the brick process earlier), as that is the process that performs the self-heal.

Milos, never mind the statedump. I reproduced the issue on the 3.5.1 release, and it happens that the fix http://review.gluster.org/#/c/8187/ (BZ# 1113894) has recently been merged in the 3.5 branch; it should be available in the 3.5.2 release. With this fix, the heal no longer hangs. Test procedure: same as in the bug description, except I used iptables rules to block traffic between the 2 nodes. After unblocking the traffic, heal info shows "possibly undergoing heal". If you can test with the fix and confirm, I'll close the bug. Thanks!

Closing the bug as it is a manifestation of the same issue described in BZ 1113894.

Note on how I tested:
1. Create a 1x2 replica on 2 nodes, fuse-mount it on node 1.
2. Create a file using dd from the fuse mount on node 1.
3. On node 1, while dd is going on, block all traffic from/to node 2:
iptables -A INPUT -p all -s <node 2's IP> -j REJECT
iptables -A OUTPUT -p all -d <node 2's IP> -j REJECT
4. Stop the dd process.
5. Unblock traffic: iptables -F
6. Run `gluster v heal <volname> info`; it should not give 'possibly undergoing heal'.

*** This bug has been marked as a duplicate of bug 1113894 ***
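The traffic-blocking part of the reproduction above can be sketched as a script. This is a dry-run sketch only: the commands are echoed rather than executed, since they require root and a live two-node replica volume, and NODE2_IP and VOLNAME are placeholders, not values from this bug.

```shell
# Dry-run sketch of the reproduction: commands are printed, not executed.
NODE2_IP="192.0.2.2"   # placeholder for node 2's address
VOLNAME="vg0"          # placeholder volume name

run() { echo "+ $*"; }  # swap the echo for "$@" to really execute on a test cluster

# While dd writes through the fuse mount on node 1, cut off node 2:
run iptables -A INPUT  -p all -s "$NODE2_IP" -j REJECT
run iptables -A OUTPUT -p all -d "$NODE2_IP" -j REJECT

# Stop the dd process, then restore traffic:
run iptables -F

# With the 3.5.2 fix, heal info should not keep reporting 'possibly undergoing heal':
run gluster volume heal "$VOLNAME" info
```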