Bug 821056 - file does not remain in sync after self-heal
Summary: file does not remain in sync after self-heal
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.1.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 853684 858501 895528
 
Reported: 2012-05-11 17:28 UTC by Joe Julian
Modified: 2015-12-01 16:45 UTC
CC: 4 users

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 853684 (view as bug list)
Environment:
Last Closed: 2013-07-24 17:35:53 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
client state dump (21.61 KB, application/octet-stream), 2012-05-11 22:10 UTC, Joe Julian
ewcs2:/var/share/glusterd/d_vmimages state dump (112.62 KB, application/octet-stream), 2012-05-11 22:10 UTC, Joe Julian
ewcs4:/var/share/glusterd/d_vmimages state dump (100.14 KB, application/octet-stream), 2012-05-11 22:11 UTC, Joe Julian
ewcs7:/var/share/glusterd/d_vmimages state dump (93.32 KB, application/octet-stream), 2012-05-11 22:11 UTC, Joe Julian

Description Joe Julian 2012-05-11 17:28:06 UTC
I host raw VM images on this volume:
Volume Name: vmimages
Type: Distributed-Replicate
Status: Started
Number of Bricks: 4 x 3 = 12
Transport-type: tcp
Bricks:
Brick1: ewcs2:/var/spool/glusterfs/a_vmimages
Brick2: ewcs4:/var/spool/glusterfs/a_vmimages
Brick3: ewcs7:/var/spool/glusterfs/a_vmimages
Brick4: ewcs2:/var/spool/glusterfs/b_vmimages
Brick5: ewcs4:/var/spool/glusterfs/b_vmimages
Brick6: ewcs7:/var/spool/glusterfs/b_vmimages
Brick7: ewcs2:/var/spool/glusterfs/c_vmimages
Brick8: ewcs4:/var/spool/glusterfs/c_vmimages
Brick9: ewcs7:/var/spool/glusterfs/c_vmimages
Brick10: ewcs2:/var/spool/glusterfs/d_vmimages
Brick11: ewcs4:/var/spool/glusterfs/d_vmimages
Brick12: ewcs7:/var/spool/glusterfs/d_vmimages

After replacing ewcs7:/var/spool/glusterfs/c_vmimages due to a disk failure, I ran stat() against the files (VM images) in that volume. These VMs were live at the time. The background self-heal was scheduled and completed, but the extended attributes still showed pending operations.

before:
/var/spool/glusterfs/d_vmimages/intranet.img
trusted.afr.vmimages-client-11=0x0000accd0000000000000000

log:
[2012-05-11 08:24:28.756410] I [afr-common.c:790:afr_lookup_done] 1-vmimages-replicate-3: background  data self-heal triggered. path: /intranet.img
[2012-05-11 08:30:10.230581] I [afr-self-heal-algorithm.c:532:sh_diff_loop_driver_done] 1-vmimages-replicate-3: diff self-heal on /intranet.img: 56 blocks of 49152 were different (0.11%)
[2012-05-11 08:30:10.459175] I [afr-self-heal-common.c:1536:afr_self_heal_completion_cbk] 1-vmimages-replicate-3: background  data self-heal completed on /intranet.img

after:
/var/spool/glusterfs/d_vmimages/intranet.img
trusted.afr.vmimages-client-11=0x000000f80000000000000000

Trying again with the client log level set to debug led me to this anomaly:
[2012-05-11 09:35:04.900303] D [afr-self-heal-data.c:573:afr_sh_data_sync_prepare] 1-vmimages-replicate-3: self-healing file /intranet.img from subvolume vmimages-client-9 to 1 other

Why is it self-healing from vmimages-client-9 when it's vmimages-client-11 that's dirty?
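For context on the hex values above: the trusted.afr changelog xattr packs three big-endian 32-bit counters, the pending data, metadata, and entry operations recorded against the named brick. A minimal decoding sketch, assuming that layout (decode_afr is an illustrative helper, not a gluster tool):

```python
def decode_afr(value):
    """Split an AFR changelog xattr (hex string) into its three 32-bit
    big-endian counters: pending data, metadata, and entry operations.
    Layout assumed from the values in this report; illustrative only."""
    v = value[2:] if value.startswith("0x") else value
    return tuple(int(v[i:i + 8], 16) for i in (0, 8, 16))

# The "before" value on intranet.img: 0xaccd pending data operations
print(decode_afr("0x0000accd0000000000000000"))  # (44237, 0, 0)
```

Read this way, the before/after values show the data-pending counter dropping from 0xaccd to 0xf8 but never reaching zero, which matches the observation that the file stays out of sync.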

Comment 1 Joe Julian 2012-05-11 22:10:00 UTC
Created attachment 583940 [details]
client state dump

Comment 2 Joe Julian 2012-05-11 22:10:54 UTC
Created attachment 583941 [details]
ewcs2:/var/share/glusterd/d_vmimages state dump

Comment 3 Joe Julian 2012-05-11 22:11:31 UTC
Created attachment 583942 [details]
ewcs4:/var/share/glusterd/d_vmimages state dump

Comment 4 Joe Julian 2012-05-11 22:11:57 UTC
Created attachment 583943 [details]
ewcs7:/var/share/glusterd/d_vmimages state dump

Comment 5 Joe Julian 2012-05-11 22:20:06 UTC
The replaced drive may be a red herring. I didn't start looking for dirty attributes until after the replacement, but I did take ewcs7 down for an upgrade on the 3rd, so it's possible that's when this started.

intranet.img isn't the only file that's showing this behavior. Here are the non-zero xattrs remaining:

ewcs{2,4}:
# file: var/spool/glusterfs/a_vmimages/blog.img
trusted.afr.vmimages-client-2=0x000036910000000000000000

# file: var/spool/glusterfs/a_vmimages/mysql1.img
trusted.afr.vmimages-client-2=0x0000ab3b0000000000000000

# file: var/spool/glusterfs/c_vmimages/bazaar.img
trusted.afr.vmimages-client-8=0x00002d720000000000000000

# file: var/spool/glusterfs/d_vmimages/intranet.img
trusted.afr.vmimages-client-11=0x0000273a0000000000000000

# file: var/spool/glusterfs/d_vmimages/squid1.img
trusted.afr.vmimages-client-11=0x00009d510000000000000000

# file: var/spool/glusterfs/d_vmimages/web2.img
trusted.afr.vmimages-client-11=0x00009ff00000000000000000

The changelog xattrs on ewcs7 are all zero.
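A listing like the one above can be gathered with a short script run on each brick server. This is a sketch only; dirty_afr_xattrs and is_dirty are hypothetical helpers, not gluster tools, and reading the trusted.* namespace requires root:

```python
import os

def is_dirty(raw):
    """True when an AFR changelog value has any non-zero pending counter."""
    return any(raw)

def dirty_afr_xattrs(brick_root, prefix="trusted.afr"):
    """Walk a brick and yield (path, attr, hex value) for every
    non-zero AFR changelog xattr found on a file."""
    for dirpath, _, files in os.walk(brick_root):
        for name in files:
            path = os.path.join(dirpath, name)
            for attr in os.listxattr(path):
                if attr.startswith(prefix):
                    raw = os.getxattr(path, attr)
                    if is_dirty(raw):
                        yield path, attr, "0x" + raw.hex()
```

For example, `list(dirty_afr_xattrs("/var/spool/glusterfs/d_vmimages"))` on ewcs2 would be expected to return the intranet.img, squid1.img, and web2.img entries shown above.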

I also realized that these log entries may be relevant:
[2012-05-11 09:32:55.844205] D [client3_1-fops.c:4573:client3_1_finodelk] 0-vmimages-client-2: (144): failed to get fd ctx. EBADFD
[2012-05-11 09:32:55.844807] D [client3_1-fops.c:4573:client3_1_finodelk] 0-vmimages-client-2: (156): failed to get fd ctx. EBADFD
[2012-05-11 09:32:55.844916] D [afr-lk-common.c:410:transaction_lk_op] 0-vmimages-replicate-0: lk op is for a transaction
[2012-05-11 09:32:55.845259] D [afr-lk-common.c:410:transaction_lk_op] 0-vmimages-replicate-0: lk op is for a transaction
[2012-05-11 09:32:55.882763] D [afr-lk-common.c:410:transaction_lk_op] 0-vmimages-replicate-0: lk op is for a transaction
[2012-05-11 09:32:55.882928] D [afr-transaction.c:973:afr_post_nonblocking_inodelk_cbk] 0-vmimages-replicate-0: Non blocking inodelks failed. Proceeding to blocking
[2012-05-11 09:32:55.882981] D [afr-transaction.c:973:afr_post_nonblocking_inodelk_cbk] 0-vmimages-replicate-0: Non blocking inodelks failed. Proceeding to blocking
[2012-05-11 09:32:55.883365] D [client3_1-fops.c:4573:client3_1_finodelk] 0-vmimages-client-2: (144): failed to get fd ctx. EBADFD
[2012-05-11 09:32:55.883386] D [afr-lk-common.c:987:afr_lock_blocking] 0-vmimages-replicate-0: we're done locking
[2012-05-11 09:32:55.883402] D [afr-transaction.c:953:afr_post_blocking_inodelk_cbk] 0-vmimages-replicate-0: Blocking inodelks done. Proceeding to FOP
[2012-05-11 09:32:56.119427] D [client3_1-fops.c:4573:client3_1_finodelk] 0-vmimages-client-11: (135): failed to get fd ctx. EBADFD
[2012-05-11 09:32:56.119641] D [afr-lk-common.c:410:transaction_lk_op] 0-vmimages-replicate-0: lk op is for a transaction
[2012-05-11 09:32:56.120151] D [afr-lk-common.c:410:transaction_lk_op] 0-vmimages-replicate-3: lk op is for a transaction
[2012-05-11 09:32:56.120334] D [afr-lk-common.c:410:transaction_lk_op] 0-vmimages-replicate-3: lk op is for a transaction
[2012-05-11 09:32:56.120450] D [client3_1-fops.c:4573:client3_1_finodelk] 0-vmimages-client-2: (156): failed to get fd ctx. EBADFD
[2012-05-11 09:32:56.120487] D [afr-lk-common.c:987:afr_lock_blocking] 0-vmimages-replicate-0: we're done locking
[2012-05-11 09:32:56.120503] D [afr-transaction.c:953:afr_post_blocking_inodelk_cbk] 0-vmimages-replicate-0: Blocking inodelks done. Proceeding to FOP
[2012-05-11 09:32:56.120721] D [afr-transaction.c:973:afr_post_nonblocking_inodelk_cbk] 0-vmimages-replicate-3: Non blocking inodelks failed. Proceeding to blocking
[2012-05-11 09:32:56.121838] D [client3_1-fops.c:4573:client3_1_finodelk] 0-vmimages-client-11: (135): failed to get fd ctx. EBADFD
[2012-05-11 09:32:56.121894] D [afr-lk-common.c:987:afr_lock_blocking] 0-vmimages-replicate-3: we're done locking
[2012-05-11 09:32:56.121919] D [afr-transaction.c:953:afr_post_blocking_inodelk_cbk] 0-vmimages-replicate-3: Blocking inodelks done. Proceeding to FOP
[2012-05-11 09:32:56.252114] D [client3_1-fops.c:4573:client3_1_finodelk] 0-vmimages-client-11: (135): failed to get fd ctx. EBADFD
[2012-05-11 09:32:56.252441] D [afr-lk-common.c:410:transaction_lk_op] 0-vmimages-replicate-3: lk op is for a transaction
[2012-05-11 09:32:56.252876] D [afr-transaction.c:973:afr_post_nonblocking_inodelk_cbk] 0-vmimages-replicate-3: Non blocking inodelks failed. Proceeding to blocking
[2012-05-11 09:32:56.252958] D [afr-lk-common.c:410:transaction_lk_op] 0-vmimages-replicate-3: lk op is for a transaction
[2012-05-11 09:32:56.253320] D [client3_1-fops.c:4573:client3_1_finodelk] 0-vmimages-client-11: (135): failed to get fd ctx. EBADFD
[2012-05-11 09:32:56.253343] D [afr-lk-common.c:987:afr_lock_blocking] 0-vmimages-replicate-3: we're done locking
[2012-05-11 09:32:56.253367] D [afr-transaction.c:953:afr_post_blocking_inodelk_cbk] 0-vmimages-replicate-3: Blocking inodelks done. Proceeding to FOP
[2012-05-11 09:32:56.314070] D [client3_1-fops.c:4573:client3_1_finodelk] 0-vmimages-client-11: (159): failed to get fd ctx. EBADFD
[2012-05-11 09:32:56.314354] D [afr-lk-common.c:410:transaction_lk_op] 0-vmimages-replicate-3: lk op is for a transaction
[2012-05-11 09:32:56.314827] D [afr-transaction.c:973:afr_post_nonblocking_inodelk_cbk] 0-vmimages-replicate-3: Non blocking inodelks failed. Proceeding to blocking
[2012-05-11 09:32:56.315231] D [client3_1-fops.c:4573:client3_1_finodelk] 0-vmimages-client-11: (159): failed to get fd ctx. EBADFD
[2012-05-11 09:32:56.315253] D [afr-lk-common.c:987:afr_lock_blocking] 0-vmimages-replicate-3: we're done locking
[2012-05-11 09:32:56.315269] D [afr-transaction.c:953:afr_post_blocking_inodelk_cbk] 0-vmimages-replicate-3: Blocking inodelks done. Proceeding to FOP

Comment 6 Pranith Kumar K 2012-06-13 08:56:36 UTC
It looks like the re-open of the fd after you brought the brick back up did not succeed; it keeps returning EBADFD for that fd. That could be why the sync never completes: writes are not happening on the new brick. How did you perform the disk replacement? Did you bring the brick down?

Comment 7 Pranith Kumar K 2012-06-13 09:29:17 UTC
I just found that there is a problem in the client xlator with re-opening a file after a disconnect and reconnect. If the file is not present on the brick at the time of the re-open after the brick reconnects (since the disk was replaced, I am assuming no files were present on the disk when the brick was brought up), the re-open fails, and further operations on the file using that fd happen on only one brick (assuming a 2-way replica). So the file will never be in sync as long as the fd is open. For hosting VMs this is a problem.
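The failure mode described in comments 6 and 7, and the fix merged later (reviews 4357 and 4358), can be illustrated with a toy model. All names here are invented for illustration; the behaviour is inferred from the discussion, not taken from the GlusterFS source:

```python
class BrickClient:
    """Toy model of per-brick fd bookkeeping in the client xlator."""

    def __init__(self, files):
        self.files = set(files)   # files present on the brick's disk
        self.saved_fds = {}       # fds that writes are replayed on
        self.failed = {}          # fds whose re-open failed

    def reconnect(self, open_fds):
        """Re-open every fd that was open before the disconnect.

        Buggy behaviour: when the file is missing on the fresh disk
        the re-open fails, the fd is dropped, and later writes silently
        skip this brick, so the replica never converges."""
        for fd, path in open_fds.items():
            if path in self.files:
                self.saved_fds[fd] = path
            else:
                self.failed[fd] = path

    def retry_reopens(self):
        """The fix: keep failed fds and periodically retry, so once
        self-heal recreates the file the fd works on this brick again."""
        for fd in list(self.failed):
            if self.failed[fd] in self.files:
                self.saved_fds[fd] = self.failed.pop(fd)


brick = BrickClient(files=[])            # replaced disk starts empty
brick.reconnect({5: "/intranet.img"})    # the VM still holds fd 5 open
assert 5 not in brick.saved_fds          # writes now miss this brick

brick.files.add("/intranet.img")         # self-heal recreates the file
brick.retry_reopens()
assert 5 in brick.saved_fds              # the fixed client catches up
```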

Comment 8 Joe Julian 2012-06-13 14:09:11 UTC
That sounds like it. When the drive failed, I killed glusterfsd for that brick, replaced the drive (partitioned, formatted, etc.), then restarted the brick with "gluster volume start vmimages force".

Comment 9 Vijay Bellur 2013-02-03 19:34:14 UTC
CHANGE: http://review.gluster.org/4357 (protocol/client: Add fdctx back to saved-list after reopen) merged in master by Anand Avati (avati)

Comment 10 Vijay Bellur 2013-02-03 19:34:31 UTC
CHANGE: http://review.gluster.org/4358 (protocol/client: Periodically attempt reopens) merged in master by Anand Avati (avati)

Comment 11 Vijay Bellur 2013-02-03 19:35:09 UTC
CHANGE: http://review.gluster.org/4386 (tests: Added util functions) merged in master by Anand Avati (avati)

Comment 12 Vijay Bellur 2013-02-03 19:36:03 UTC
CHANGE: http://review.gluster.org/4387 (test: test re-opens of open-fds) merged in master by Anand Avati (avati)

Comment 13 Vijay Bellur 2013-02-05 06:12:05 UTC
CHANGE: http://review.gluster.org/4464 (protocol/client: Avoid double free of frame) merged in master by Anand Avati (avati)

Comment 14 Joe Julian 2013-02-18 17:30:46 UTC
Please backport to release-3.3

Comment 15 Vijay Bellur 2013-02-20 00:09:22 UTC
CHANGE: http://review.gluster.org/4540 (Tests: Disable open-behind) merged in master by Anand Avati (avati)

