Bug 802030
Summary: self-heal is not triggered by glustershd even when all the bricks are online

| Field | Value | Field | Value |
|---|---|---|---|
| Product | [Community] GlusterFS | Reporter | Shwetha Panduranga <shwetha.h.panduranga> |
| Component | replicate | Assignee | Pranith Kumar K <pkarampu> |
| Status | CLOSED CANTFIX | Severity | high |
| Version | mainline | Priority | high |
| Hardware | Unspecified | OS | Unspecified |
| CC | gluster-bugs | Doc Type | Bug Fix |
| Target Milestone | --- | Target Release | --- |
| Last Closed | 2012-04-01 06:23:45 UTC | Mount Type | --- |
Shwetha reproduced the bug. Self-heal completed successfully, the xattrs were cleared, and even the md5sums match on both files, but the source file has a du of 49G whereas the sink has a du of 6.5G; ls -l shows 49G on both source and sink. This happens because self-heal first does the equivalent of 'truncate -s 49G' on the sink file and then performs the diff self-heal. Since AFR cannot distinguish a hole on the sink from all-'0' data on the source, it does not write anything to the sink file, so at the end of the self-heal the sink file's du is still 6.5G and no extra data is synced from source to sink. There is no way to find out whether a region of the file is a hole or zero-filled data, so we cannot fix this issue. Since it does not affect functionality, it should be fine.
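The hole-versus-zeros ambiguity described above is easy to demonstrate outside of GlusterFS. A minimal sketch (filenames are illustrative; assumes a filesystem with sparse-file support, e.g. ext4 or XFS): two files with identical size, contents, and checksum, but very different disk usage, just like the source and sink here.

```shell
truncate -s 10M hole.img                                  # 10M hole: no data blocks written
dd if=/dev/zero of=zeros.img bs=1M count=10 2>/dev/null   # 10M of explicitly written zeros
ls -l hole.img zeros.img      # both report 10485760 bytes
du -h hole.img zeros.img      # hole.img ~0, zeros.img ~10M
md5sum hole.img zeros.img     # identical checksums
```

Any reader, including a diff self-heal comparing checksums, sees the same bytes from both files; only the allocated block count (stat's %b) reveals the difference.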
Created attachment 569079 [details]
glustershd log

glustershd reports "no active sinks for performing self-heal on file" even when all the bricks are online; hence self-heal is not triggered.

Version-Release number of selected component (if applicable): 3.3.0qa26

How reproducible: often

Steps to Reproduce:
1. Create a distribute-replicate volume (2 x 3, 200GB total available space). Start the volume.
2. Create 2 gluster mounts and one NFS mount.
3. Start dd on one of the gluster mounts and on the NFS mount.
4. On the gluster mount: dd if=/dev/zero of=gfsf1 bs=1M count=51200
5. On the NFS mount: dd if=/dev/zero of=nfsf1 bs=1M count=51200
6. Enable quota on the volume.
7. Set the quota limit-usage to some value (here it was set to 150GB).
8. Bring down 2 bricks (one from each replica pair).
9. dd is still in progress on both mounts.
10. Bring the bricks back online.

Actual results:
glustershd did not trigger self-heal on the files, and after some time (approximately 30 minutes) the xattrs of the file became all zeros, but the file size still differs across the bricks.
[03/10/12 - 21:58:57 root@APP-SERVER1 ~]# gluster volume start dstore1 force
Starting volume dstore1 has been successful

[03/10/12 - 22:00:30 root@APP-SERVER1 ~]# gluster volume status dstore1
Status of volume: dstore1
Brick                    Port    Online  Pid
------------------------------------------------------------------------------
192.168.2.35:/export1    24009   Y       8806
192.168.2.36:/export1    24009   Y       23655
192.168.2.37:/export1    24009   Y       13195
192.168.2.35:/export2    24010   Y       8811
192.168.2.36:/export2    24010   Y       23660
192.168.2.37:/export2    24010   Y       13200

Brick1:
-------
# file: export1/nfsf1
trusted.afr.dstore1-client-0=0x000000000000000000000000
trusted.afr.dstore1-client-1=0x000000000000000000000000
trusted.afr.dstore1-client-2=0x000000000000000000000000
trusted.gfid=0x0e53a4d9c45340f0b9bd085c70a6af41
trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri=0x000000019b411000

Brick2:
-------
# file: export1/nfsf1
trusted.afr.dstore1-client-0=0x000569e30000000000000000
trusted.afr.dstore1-client-1=0x000000000000000000000000
trusted.afr.dstore1-client-2=0x000000000000000000000000
trusted.gfid=0x0e53a4d9c45340f0b9bd085c70a6af41
trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri=0x0000000c16982000

Brick3:
-------
# file: export1/nfsf1
trusted.afr.dstore1-client-0=0x000569e30000000000000000
trusted.afr.dstore1-client-1=0x000000000000000000000000
trusted.afr.dstore1-client-2=0x000000000000000000000000
trusted.gfid=0x0e53a4d9c45340f0b9bd085c70a6af41
trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri=0x0000000c16982000

[03/10/12 - 22:54:21 root@APP-SERVER3 ~]# du -h /export1/nfsf1
49G     /export1/nfsf1
[03/10/12 - 22:55:11 root@APP-SERVER3 ~]# getfattr -m . -d -e hex /export1/nfsf1
getfattr: Removing leading '/' from absolute path names
# file: export1/nfsf1
trusted.afr.dstore1-client-0=0x000000000000000000000000
trusted.afr.dstore1-client-1=0x000000000000000000000000
trusted.afr.dstore1-client-2=0x000000000000000000000000
trusted.gfid=0x0e53a4d9c45340f0b9bd085c70a6af41
trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri=0x0000000c16982000

[03/10/12 - 22:54:37 root@APP-SERVER2 ~]# du -h /export1/nfsf1
49G     /export1/nfsf1
[03/10/12 - 22:58:28 root@APP-SERVER2 ~]# getfattr -m . -d -e hex /export1/nfsf1
getfattr: Removing leading '/' from absolute path names
# file: export1/nfsf1
trusted.afr.dstore1-client-0=0x000000000000000000000000
trusted.afr.dstore1-client-1=0x000000000000000000000000
trusted.afr.dstore1-client-2=0x000000000000000000000000
trusted.gfid=0x0e53a4d9c45340f0b9bd085c70a6af41
trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri=0x0000000c16982000

[03/10/12 - 22:54:47 root@APP-SERVER1 ~]# du -h /export1/nfsf1
6.5G    /export1/nfsf1
[03/10/12 - 22:54:53 root@APP-SERVER1 ~]# getfattr -m . -d -e hex /export1/nfsf1
getfattr: Removing leading '/' from absolute path names
# file: export1/nfsf1
trusted.afr.dstore1-client-0=0x000000000000000000000000
trusted.afr.dstore1-client-1=0x000000000000000000000000
trusted.afr.dstore1-client-2=0x000000000000000000000000
trusted.gfid=0x0e53a4d9c45340f0b9bd085c70a6af41
trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri=0x000000019b411000

glustershd logs:
----------------
[2012-03-10 22:00:33.613995] I [client-handshake.c:1308:client_setvolume_cbk] 0-dstore1-client-0: clnt-lk-version = 1, server-lk-version = 0
[2012-03-10 22:00:33.614040] I [client-handshake.c:1334:client_setvolume_cbk] 0-dstore1-client-0: Connected to 192.168.2.35:24009, attached to remote volume '/export1'.
[2012-03-10 22:00:33.614141] I [afr-self-heald.c:1048:afr_start_crawl] 0-dstore1-replicate-0: starting crawl 1 for dstore1-client-0
[2012-03-10 22:00:33.623052] W [client.c:1990:client_rpc_notify] 0-dstore1-client-3: Cancelling the grace timer
[2012-03-10 22:00:33.623224] I [afr-self-heald.c:949:afr_dir_crawl] 0-dstore1-replicate-1: Crawl completed on dstore1-client-5
[2012-03-10 22:00:33.623621] I [client-handshake.c:1533:select_server_supported_programs] 0-dstore1-client-3: Using Program GlusterFS 3.3.0qa26, Num (1298437), Version (330)
[2012-03-10 22:00:33.627940] I [client-handshake.c:1308:client_setvolume_cbk] 0-dstore1-client-3: clnt-lk-version = 1, server-lk-version = 0
[2012-03-10 22:00:33.627971] I [client-handshake.c:1334:client_setvolume_cbk] 0-dstore1-client-3: Connected to 192.168.2.35:24010, attached to remote volume '/export2'.
[2012-03-10 22:00:33.628201] I [afr-self-heald.c:1048:afr_start_crawl] 0-dstore1-replicate-1: starting crawl 1 for dstore1-client-3
[2012-03-10 22:00:33.637846] I [afr-self-heald.c:890:afr_find_child_position] 0-dstore1-replicate-0: child dstore1-client-0 is remote
[2012-03-10 22:00:33.638076] W [client.c:1990:client_rpc_notify] 0-dstore1-client-1: Cancelling the grace timer
[2012-03-10 22:00:33.639141] I [client-handshake.c:1533:select_server_supported_programs] 0-dstore1-client-1: Using Program GlusterFS 3.3.0qa26, Num (1298437), Version (330)
[2012-03-10 22:00:33.640539] I [afr-self-heald.c:890:afr_find_child_position] 0-dstore1-replicate-1: child dstore1-client-3 is remote
[2012-03-10 22:00:33.641984] I [client-handshake.c:1308:client_setvolume_cbk] 0-dstore1-client-1: clnt-lk-version = 1, server-lk-version = 0
[2012-03-10 22:00:33.642028] I [client-handshake.c:1334:client_setvolume_cbk] 0-dstore1-client-1: Connected to 192.168.2.36:24009, attached to remote volume '/export1'.
[2012-03-10 22:00:33.642128] I [afr-self-heald.c:1048:afr_start_crawl] 0-dstore1-replicate-0: starting crawl 1 for dstore1-client-1
[2012-03-10 22:00:33.645176] W [client.c:1990:client_rpc_notify] 0-dstore1-client-4: Cancelling the grace timer
[2012-03-10 22:00:33.647412] I [client-handshake.c:1533:select_server_supported_programs] 0-dstore1-client-4: Using Program GlusterFS 3.3.0qa26, Num (1298437), Version (330)
[2012-03-10 22:00:33.648519] I [afr-self-heald.c:890:afr_find_child_position] 0-dstore1-replicate-0: child dstore1-client-1 is remote
[2012-03-10 22:00:33.648788] I [client-handshake.c:1308:client_setvolume_cbk] 0-dstore1-client-4: clnt-lk-version = 1, server-lk-version = 0
[2012-03-10 22:00:33.648823] I [client-handshake.c:1334:client_setvolume_cbk] 0-dstore1-client-4: Connected to 192.168.2.36:24010, attached to remote volume '/export2'.
[2012-03-10 22:00:33.648912] I [afr-self-heald.c:1048:afr_start_crawl] 0-dstore1-replicate-1: starting crawl 1 for dstore1-client-4
[2012-03-10 22:00:33.654426] I [afr-self-heald.c:890:afr_find_child_position] 0-dstore1-replicate-1: child dstore1-client-4 is remote
[2012-03-10 22:00:33.664607] I [afr-self-heald.c:949:afr_dir_crawl] 0-dstore1-replicate-0: Crawl completed on dstore1-client-2
[2012-03-10 22:10:33.744321] I [afr-self-heald.c:1048:afr_start_crawl] 0-dstore1-replicate-1: starting crawl 1 for dstore1-client-5
[2012-03-10 22:10:33.744787] I [afr-self-heald.c:1048:afr_start_crawl] 0-dstore1-replicate-0: starting crawl 1 for dstore1-client-2
[2012-03-10 22:10:33.744861] I [afr-self-heald.c:1048:afr_start_crawl] 0-dstore1-replicate-0: starting crawl 1 for dstore1-client-0
[2012-03-10 22:10:33.744939] I [afr-self-heald.c:1048:afr_start_crawl] 0-dstore1-replicate-1: starting crawl 1 for dstore1-client-3
[2012-03-10 22:10:33.745160] I [afr-self-heald.c:1048:afr_start_crawl] 0-dstore1-replicate-0: starting crawl 1 for dstore1-client-1
[2012-03-10 22:10:33.745265] I [afr-self-heald.c:1048:afr_start_crawl] 0-dstore1-replicate-1: starting crawl 1 for dstore1-client-4
[2012-03-10 22:10:33.745474] I [afr-self-heald.c:890:afr_find_child_position] 0-dstore1-replicate-0: child dstore1-client-2 is local
[2012-03-10 22:10:33.745678] I [afr-self-heald.c:890:afr_find_child_position] 0-dstore1-replicate-1: child dstore1-client-5 is local
[2012-03-10 22:10:33.748841] I [afr-common.c:1313:afr_launch_self_heal] 0-dstore1-replicate-1: background data self-heal triggered. path: , reason: lookup detected pending operations
[2012-03-10 22:10:33.750670] I [afr-common.c:1313:afr_launch_self_heal] 0-dstore1-replicate-0: background data self-heal triggered. path: , reason: lookup detected pending operations
[2012-03-10 22:10:33.754028] I [afr-self-heal-data.c:738:afr_sh_data_fix] 0-dstore1-replicate-1: no active sinks for performing self-heal on file
[2012-03-10 22:10:33.755123] I [afr-self-heal-common.c:2037:afr_self_heal_completion_cbk] 0-dstore1-replicate-1: background data self-heal completed on
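For reference, the hex xattr values in the brick dumps above can be decoded by hand. A trusted.afr.<client> changelog value holds three big-endian 32-bit counters (pending data, metadata, and entry operations), and the quota contri xattr is a byte count (my reading of the on-disk format; treat as a sketch, not authoritative). A minimal bash sketch using the values seen above:

```shell
# trusted.afr.dstore1-client-0 as seen on Brick2 and Brick3:
# nonzero data counter means Brick1 (client-0) has pending data heals.
afr=000569e30000000000000000
echo "pending data ops:     $((16#${afr:0:8}))"
echo "pending metadata ops: $((16#${afr:8:8}))"
echo "pending entry ops:    $((16#${afr:16:8}))"

# quota contri xattrs: byte counts matching the du output above
# (about 6.5G on the stale brick, about 49G on the healthy ones).
for contri in 000000019b411000 0000000c16982000; do
    bytes=$((16#$contri))
    echo "$contri -> $bytes bytes (~$((bytes / 1024 / 1024 / 1024)) GiB)"
done
```

The decode shows why the report's Actual results are alarming: once the afr counters drop to all zeros, glustershd considers the file healed even though the contri values (and du) still disagree across the replicas.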