Bug 872703 - sticky-pointer with no trusted.dht.linkto after a replace-brick commit force, heal full migration
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-11-02 19:05 UTC by Joe Julian
Modified: 2014-12-14 19:40 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-12-14 19:40:29 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Description Joe Julian 2012-11-02 19:05:36 UTC
This is actually 3.3.1. There is no option for 3.3.1 in the version list.

After replacing a brick using the proposed replace-brick...commit force followed by a heal...full, the client showed two dentries for each file that had a DHT sticky pointer. On the new brick, the zero-length mode 1000 files did not have the trusted.glusterfs.dht.linkto attribute.

Deleting these broken stickies did not resolve the problem as they were just healed back in the same broken way.

This is a replica 3 volume (4x3) and the migration was for the 2nd replica subvolume in the 3rd distribute subvolume.
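
The condition described above (a zero-length, mode 1000 file on a brick with no linkto xattr) can be expressed as a small check. This is a minimal sketch, not GlusterFS code: the function names and the three result labels are hypothetical, and the xattr name is taken from the getfattr output elsewhere in this report.

```python
import stat

# Xattr name as it appears in the getfattr output in this report.
LINKTO_XATTR = "trusted.glusterfs.dht.linkto"


def is_sticky_pointer(mode: int, size: int) -> bool:
    """True for an empty regular file whose only mode bit is sticky (---------T)."""
    return (
        stat.S_ISREG(mode)
        and size == 0
        and stat.S_IMODE(mode) == stat.S_ISVTX
    )


def classify(mode: int, size: int, xattr_names) -> str:
    """Label a brick file; the labels are illustrative, not GlusterFS terms."""
    if not is_sticky_pointer(mode, size):
        return "regular"
    if LINKTO_XATTR in xattr_names:
        return "linkfile"
    # Looks like a DHT pointer but has nothing to point with.
    return "broken-linkfile"
```

On a Linux brick the inputs could come from os.lstat(path) and os.listxattr(path); reading trusted.* xattrs normally requires root.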

Comment 1 Pranith Kumar K 2012-11-05 07:01:37 UTC
Joe,
    I tried re-creating the issue as per our discussion. It is not happening for me :-(. We must be missing some detail. Let me know if I did something wrong or if you find any more information about this issue.

[root@pranithk-laptop r2]# getfattr -d -m . -e hex /gfs/r2_?/10
getfattr: Removing leading '/' from absolute path names
# file: gfs/r2_0/10
trusted.afr.r2-client-0=0x000000000000000000000000
trusted.afr.r2-client-1=0x000000000000000000000000
trusted.afr.r2-client-2=0x000000000000000000000000
trusted.gfid=0x5fd4bd0068e540bba53097f545325af3

# file: gfs/r2_1/10
trusted.afr.r2-client-0=0x000000000000000000000000
trusted.afr.r2-client-1=0x000000000000000000000000
trusted.afr.r2-client-2=0x000000000000000000000000
trusted.gfid=0x5fd4bd0068e540bba53097f545325af3

# file: gfs/r2_2/10
trusted.afr.r2-client-0=0x000000000000000000000000
trusted.afr.r2-client-1=0x000000000000000000000000
trusted.afr.r2-client-2=0x000000000000000000000000
trusted.gfid=0x5fd4bd0068e540bba53097f545325af3

# file: gfs/r2_3/10
trusted.afr.r2-client-3=0x000000000000000000000000
trusted.afr.r2-client-4=0x000000000000000000000000
trusted.afr.r2-client-5=0x000000000000000000000000
trusted.gfid=0x5fd4bd0068e540bba53097f545325af3
trusted.glusterfs.dht.linkto=0x72322d7265706c69636174652d3000

# file: gfs/r2_4/10
trusted.afr.r2-client-3=0x000000000000000000000000
trusted.afr.r2-client-4=0x000000000000000000000000
trusted.afr.r2-client-5=0x000000000000000000000000
trusted.gfid=0x5fd4bd0068e540bba53097f545325af3
trusted.glusterfs.dht.linkto=0x72322d7265706c69636174652d3000

# file: gfs/r2_5/10
trusted.afr.r2-client-3=0x000000000000000000000000
trusted.afr.r2-client-4=0x000000000000000000000000
trusted.afr.r2-client-5=0x000000000000000000000000
trusted.gfid=0x5fd4bd0068e540bba53097f545325af3
trusted.glusterfs.dht.linkto=0x72322d7265706c69636174652d3000

[root@pranithk-laptop r2]# rm -f /gfs/r2_3/10
[root@pranithk-laptop r2]# getfattr -d -m . -e hex /gfs/r2_?/10
getfattr: Removing leading '/' from absolute path names
# file: gfs/r2_0/10
trusted.afr.r2-client-0=0x000000000000000000000000
trusted.afr.r2-client-1=0x000000000000000000000000
trusted.afr.r2-client-2=0x000000000000000000000000
trusted.gfid=0x5fd4bd0068e540bba53097f545325af3

# file: gfs/r2_1/10
trusted.afr.r2-client-0=0x000000000000000000000000
trusted.afr.r2-client-1=0x000000000000000000000000
trusted.afr.r2-client-2=0x000000000000000000000000
trusted.gfid=0x5fd4bd0068e540bba53097f545325af3

# file: gfs/r2_2/10
trusted.afr.r2-client-0=0x000000000000000000000000
trusted.afr.r2-client-1=0x000000000000000000000000
trusted.afr.r2-client-2=0x000000000000000000000000
trusted.gfid=0x5fd4bd0068e540bba53097f545325af3

# file: gfs/r2_4/10
trusted.afr.r2-client-3=0x000000000000000000000000
trusted.afr.r2-client-4=0x000000000000000000000000
trusted.afr.r2-client-5=0x000000000000000000000000
trusted.gfid=0x5fd4bd0068e540bba53097f545325af3
trusted.glusterfs.dht.linkto=0x72322d7265706c69636174652d3000

# file: gfs/r2_5/10
trusted.afr.r2-client-3=0x000000000000000000000000
trusted.afr.r2-client-4=0x000000000000000000000000
trusted.afr.r2-client-5=0x000000000000000000000000
trusted.gfid=0x5fd4bd0068e540bba53097f545325af3
trusted.glusterfs.dht.linkto=0x72322d7265706c69636174652d3000

[root@pranithk-laptop r2]# gluster volume heal r2 full
Heal operation on volume r2 has been successful
[root@pranithk-laptop r2]# getfattr -d -m . -e hex /gfs/r2_?/10
getfattr: Removing leading '/' from absolute path names
# file: gfs/r2_0/10
trusted.afr.r2-client-0=0x000000000000000000000000
trusted.afr.r2-client-1=0x000000000000000000000000
trusted.afr.r2-client-2=0x000000000000000000000000
trusted.gfid=0x5fd4bd0068e540bba53097f545325af3

# file: gfs/r2_1/10
trusted.afr.r2-client-0=0x000000000000000000000000
trusted.afr.r2-client-1=0x000000000000000000000000
trusted.afr.r2-client-2=0x000000000000000000000000
trusted.gfid=0x5fd4bd0068e540bba53097f545325af3

# file: gfs/r2_2/10
trusted.afr.r2-client-0=0x000000000000000000000000
trusted.afr.r2-client-1=0x000000000000000000000000
trusted.afr.r2-client-2=0x000000000000000000000000
trusted.gfid=0x5fd4bd0068e540bba53097f545325af3

# file: gfs/r2_3/10
trusted.afr.r2-client-3=0x000000000000000000000000
trusted.afr.r2-client-4=0x000000000000000000000000
trusted.afr.r2-client-5=0x000000000000000000000000
trusted.gfid=0x5fd4bd0068e540bba53097f545325af3
trusted.glusterfs.dht.linkto=0x72322d7265706c69636174652d3000

# file: gfs/r2_4/10
trusted.afr.r2-client-3=0x000000000000000000000000
trusted.afr.r2-client-4=0x000000000000000000000000
trusted.afr.r2-client-5=0x000000000000000000000000
trusted.gfid=0x5fd4bd0068e540bba53097f545325af3
trusted.glusterfs.dht.linkto=0x72322d7265706c69636174652d3000

# file: gfs/r2_5/10
trusted.afr.r2-client-3=0x000000000000000000000000
trusted.afr.r2-client-4=0x000000000000000000000000
trusted.afr.r2-client-5=0x000000000000000000000000
trusted.gfid=0x5fd4bd0068e540bba53097f545325af3
trusted.glusterfs.dht.linkto=0x72322d7265706c69636174652d3000
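
The hex values printed by getfattr -e hex above are easier to compare once decoded. The sketch below is only a reading aid with hypothetical helper names. It assumes the linkto value is a NUL-terminated subvolume name, and that each trusted.afr.* value is three big-endian 32-bit pending counters (data, metadata, entry), which is the commonly described AFR changelog layout.

```python
import struct


def decode_hex_xattr(value: str) -> bytes:
    """Convert getfattr's '-e hex' form ('0x...') into raw bytes."""
    if not value.startswith("0x"):
        raise ValueError("expected a 0x-prefixed hex string")
    return bytes.fromhex(value[2:])


def decode_linkto(value: str) -> str:
    """Decode a linkto value, dropping the trailing NUL terminator."""
    return decode_hex_xattr(value).rstrip(b"\x00").decode("ascii")


def decode_afr_changelog(value: str) -> dict:
    """Split a 12-byte trusted.afr.* value into its assumed three counters."""
    data, metadata, entry = struct.unpack(">III", decode_hex_xattr(value))
    return {"data": data, "metadata": metadata, "entry": entry}


print(decode_linkto("0x72322d7265706c69636174652d3000"))
print(decode_afr_changelog("0x000000000000000000000000"))
```

Under those assumptions, the linkto value in this transcript decodes to r2-replicate-0, and the all-zero trusted.afr.* values indicate no pending operations.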

Comment 2 Pranith Kumar K 2012-11-05 08:38:04 UTC
Joe,
    In the attempt in the previous comment I had forgotten to delete the hardlink. I tried again, deleting the hardlink as well, and it worked fine.

Pranith.

Comment 3 Joe Julian 2012-11-07 08:01:08 UTC
Volume Name: share1
Type: Distributed-Replicate
Volume ID: 9fbd655f-e060-41b4-9597-3a1ec2e41509
Status: Started
Number of Bricks: 4 x 3 = 12
Transport-type: tcp
Bricks:
Brick1: ewcs2:/var/spool/glusterfs/a_share1
Brick2: ewcs10:/data/glusterfs/share1/a
Brick3: ewcs7:/var/spool/glusterfs/a_share1
Brick4: ewcs2:/var/spool/glusterfs/b_share1
Brick5: ewcs10:/data/glusterfs/share1/b
Brick6: ewcs7:/var/spool/glusterfs/b_share1
Brick7: ewcs2:/var/spool/glusterfs/c_share1
Brick8: ewcs10:/data/glusterfs/share1/c
Brick9: ewcs7:/var/spool/glusterfs/c_share1
Brick10: ewcs2:/var/spool/glusterfs/d_share1
Brick11: ewcs10:/data/glusterfs/share1/d
Brick12: ewcs7:/var/spool/glusterfs/d_share1
Options Reconfigured:
performance.cache-size: 8MB
performance.io-cache: off
nfs.disable: on
nfs.rpc-auth-allow: on

# ls -l public/Installs/openssl-0.9.7e/crypto/cast
total 125
---------T 1 root root     0 Nov  3 12:36 casttest.c
---------T 1 root root     0 Nov  3 12:36 casttest.c

ewcs10:
# file: /data/share1/c/public/Installs/openssl-0.9.7e/crypto/cast/casttest.c
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a7661725f73706f6f6c5f743a733000
trusted.afr.share1-client-6=0x000000000000000000000000
trusted.afr.share1-client-7=0x000000000000000000000000
trusted.afr.share1-client-8=0x000000000000000000000000
trusted.gfid=0x732e66cd9496413da32bad8a47a1b6c7

# file: /data/share1/d/public/Installs/openssl-0.9.7e/crypto/cast/casttest.c
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a64656661756c745f743a733000
trusted.afr.share1-client-10=0x000000000000000000000000
trusted.afr.share1-client-11=0x000000000000000000000000
trusted.afr.share1-client-9=0x000000000000000000000000
trusted.afr.share1-io-threads=0x000000000000000000000000
trusted.afr.share1-replace-brick=0x000000000000000000000000
trusted.gfid=0x732e66cd9496413da32bad8a47a1b6c7
trusted.share1-posix.gen=0x4de74e2900000540

# file: c/.glusterfs/73/2e/732e66cd-9496-413d-a32b-ad8a47a1b6c7
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a7661725f73706f6f6c5f743a733000
trusted.afr.share1-client-6=0x000000000000000000000000
trusted.afr.share1-client-7=0x000000000000000000000000
trusted.afr.share1-client-8=0x000000000000000000000000
trusted.gfid=0x732e66cd9496413da32bad8a47a1b6c7

# file: d/.glusterfs/73/2e/732e66cd-9496-413d-a32b-ad8a47a1b6c7
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a64656661756c745f743a733000
trusted.afr.share1-client-10=0x000000000000000000000000
trusted.afr.share1-client-11=0x000000000000000000000000
trusted.afr.share1-client-9=0x000000000000000000000000
trusted.afr.share1-io-threads=0x000000000000000000000000
trusted.afr.share1-replace-brick=0x000000000000000000000000
trusted.gfid=0x732e66cd9496413da32bad8a47a1b6c7
trusted.share1-posix.gen=0x4de74e2900000540

ewcs2 and ewcs7 match with:
# file: c_share1/public/Installs/openssl-0.9.7e/crypto/cast/casttest.c
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a7661725f73706f6f6c5f743a733000
trusted.afr.share1-client-6=0x000000000000000000000000
trusted.afr.share1-client-7=0x000000000000000000000000
trusted.afr.share1-client-8=0x000000000000000000000000
trusted.gfid=0x732e66cd9496413da32bad8a47a1b6c7

# file: d_share1/public/Installs/openssl-0.9.7e/crypto/cast/casttest.c
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a7661725f73706f6f6c5f743a733000
trusted.afr.share1-client-10=0x000000000000000000000000
trusted.afr.share1-client-11=0x000000000000000000000000
trusted.afr.share1-client-9=0x000000000000000000000000
trusted.gfid=0x732e66cd9496413da32bad8a47a1b6c7
trusted.share1-posix.gen=0x4de74e2900000540

# file: c_share1/.glusterfs/73/2e/732e66cd-9496-413d-a32b-ad8a47a1b6c7
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a7661725f73706f6f6c5f743a733000
trusted.afr.share1-client-6=0x000000000000000000000000
trusted.afr.share1-client-7=0x000000000000000000000000
trusted.afr.share1-client-8=0x000000000000000000000000
trusted.gfid=0x732e66cd9496413da32bad8a47a1b6c7

# file: d_share1/.glusterfs/73/2e/732e66cd-9496-413d-a32b-ad8a47a1b6c7
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a7661725f73706f6f6c5f743a733000
trusted.afr.share1-client-10=0x000000000000000000000000
trusted.afr.share1-client-11=0x000000000000000000000000
trusted.afr.share1-client-9=0x000000000000000000000000
trusted.gfid=0x732e66cd9496413da32bad8a47a1b6c7
trusted.share1-posix.gen=0x4de74e2900000540
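
The .glusterfs paths listed above follow from the trusted.gfid value: the first two pairs of hex digits become directory levels and the full GFID, in UUID form, becomes the file name. A minimal sketch of that mapping, inferred only from the paths shown in this comment (the function name is hypothetical):

```python
import uuid
from pathlib import PurePosixPath


def gfid_backend_path(gfid_hex: str) -> PurePosixPath:
    """Map a trusted.gfid hex value to its path relative to the brick root."""
    raw = gfid_hex[2:] if gfid_hex.startswith("0x") else gfid_hex
    gfid = str(uuid.UUID(hex=raw))  # canonical lowercase, hyphenated form
    return PurePosixPath(".glusterfs") / gfid[0:2] / gfid[2:4] / gfid


print(gfid_backend_path("0x732e66cd9496413da32bad8a47a1b6c7"))
```

For the GFID in this comment the result is .glusterfs/73/2e/732e66cd-9496-413d-a32b-ad8a47a1b6c7, matching the paths inspected on the bricks.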

Comment 4 Joe Julian 2012-11-07 08:06:47 UTC
Almost forgot to indicate which ones are which (identical on all three replicas):
  File: `c/public/Installs/openssl-0.9.7e/crypto/cast/casttest.c'
  Size: 0         	Blocks: 0          IO Block: 4096   regular empty file
Device: fd37h/64823d	Inode: 1251        Links: 2
Access: (1000/---------T)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2011-12-28 01:48:32.648356000 -0800
Modify: 2012-11-03 12:36:37.896017000 -0700
Change: 2012-11-03 12:36:37.899813131 -0700
  File: `d/public/Installs/openssl-0.9.7e/crypto/cast/casttest.c'
  Size: 7496      	Blocks: 16         IO Block: 4096   regular file
Device: fd19h/64793d	Inode: 6292381     Links: 2
Access: (0664/-rw-rw-r--)  Uid: (    0/    root)   Gid: (    3/     sys)
Access: 2012-11-01 01:23:19.828815123 -0700
Modify: 2012-10-01 15:02:10.489055000 -0700
Change: 2012-10-19 06:20:58.713815692 -0700

Comment 5 Joe Julian 2012-11-07 08:18:55 UTC
Log entries for that file:
bricks/data-glusterfs-share1-c.log.1352028876:[2012-11-03 12:37:06.728311] I [server3_1-fops.c:1183:server_link_cbk] 0-share1-server: 63354: LINK /public/Installs/openssl-0.9.7e/test/casttest.c (b26b9d9a-e1f6-4df4-bc84-23dbcac91a2f) ==> -1 (File exists)
bricks/data-glusterfs-share1-c.log.1352028876:[2012-11-03 12:37:06.822605] I [server3_1-fops.c:1183:server_link_cbk] 0-share1-server: 63549: LINK /public/Installs/openssl-0.9.7e/test/casttest.c (b26b9d9a-e1f6-4df4-bc84-23dbcac91a2f) ==> -1 (File exists)
glustershd.log:[2012-10-29 16:49:12.182185] W [client3_1-fops.c:2457:client3_1_link_cbk] 0-share1-client-8: remote operation failed: File exists (00000000-0000-0000-0000-000000000000 -> /public/Installs/openssl-0.9.7e/test/casttest.c)

That last entry was from the heal...full after the replace-brick...commit force.
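
To pull the fields out of log entries like the ones above, a regular expression is enough. This is a sketch whose pattern is inferred only from the three lines quoted in this comment, so it may not cover every GlusterFS log format; the group names are my own.

```python
import re

# Pattern inferred from the quoted lines:
# [timestamp] LEVEL [source.c:line:function] domain: message
LOG_LINE = re.compile(
    r"\[(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) "
    r"\[(?P<source>[^:\]]+):(?P<line>\d+):(?P<function>[^\]]+)\] "
    r"(?P<domain>\S+): (?P<message>.*)"
)


def parse_log_line(text: str):
    """Return the named fields of one log line, or None if it does not match."""
    # search() rather than match() so a grep-style 'filename:' prefix is skipped.
    found = LOG_LINE.search(text)
    return found.groupdict() if found else None
```

Filtering the parsed entries on the function and message fields makes it easy to collect every failed LINK for a given path across brick and glustershd logs.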

Comment 6 Pranith Kumar K 2013-01-08 09:00:47 UTC
Joe,
   Since casttest.c is missing the dht.linkto xattr on all replicas of 'c/public/Installs/openssl-0.9.7e/crypto/cast/casttest.c', the problem is not with replace-brick/self-heal. Somehow we got into a situation where the files were not set with the dht xattr. Please let us know if you have any information about this.

Pranith.

Comment 7 Niels de Vos 2014-11-27 14:54:04 UTC
The version that this bug has been reported against does not get any updates from the Gluster Community anymore. Please verify if this report is still valid against a current (3.4, 3.5 or 3.6) release and update the version, or close this bug.

If there has been no update before 9 December 2014, this bug will get automatically closed.
