Bug 1056550 - Dist-geo-rep : after hardlink syncing to slave, arequal on slave mount failed with short read for some files.
Summary: Dist-geo-rep : after hardlink syncing to slave, arequal on slave mount failed with short read for some files.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: geo-replication
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.0.0
Assignee: Venky Shankar
QA Contact: Bhaskar Bandari
URL:
Whiteboard:
Depends On:
Blocks: 1126368
 
Reported: 2014-01-22 12:51 UTC by Vijaykumar Koppad
Modified: 2015-05-13 16:59 UTC
CC: 10 users

Fixed In Version: glusterfs-3.6.0.18-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Cloned As: 1126368
Environment:
Last Closed: 2014-09-22 19:32:03 UTC


Attachments
client log file on which the arequal failed. (74 bytes, text/plain)
2014-01-22 12:56 UTC, Vijaykumar Koppad
sosreports of all the machines involved in the volume. (60 bytes, text/plain)
2014-01-22 15:17 UTC, Vijaykumar Koppad
slave brick log file. (1.11 MB, text/x-log)
2014-01-31 09:24 UTC, Vijaykumar Koppad


Links
System ID: Red Hat Product Errata RHEA-2014:1278
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Storage Server 3.0 bug fix and enhancement update
Last Updated: 2014-09-22 23:26:55 UTC

Description Vijaykumar Koppad 2014-01-22 12:51:47 UTC
Description of problem: after hardlinks synced to the slave, arequal on the slave mount failed with a short read for some files.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
# ./arequal-checksum  /mnt/slave
md5sum: /mnt/slave/level00/level10/level20/level30/level40/level50/level60/level70/level80/level90/hardlink_to_files/52df9997%%J1LWI41EU7: No data available
/mnt/slave/level00/level10/level20/level30/level40/level50/level60/level70/level80/level90/hardlink_to_files/52df9997%%J1LWI41EU7: short read
ftw (/mnt/slave) returned -1 (Success), terminating
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Geo-rep client logs for the file 52df9997%%J1LWI41EU7 on the slave:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
# grep 52df9997%%J1LWI41EU7  /var/log/glusterfs/geo-replication-slaves/205daa96-ff92-45be-96bc-3e6bd5c0f631\:gluster%3A%2F%2F127.0.0.1%3Aslave.gluster.log
[2014-01-22 10:31:06.323927] W [client-rpc-fops.c:256:client3_3_mknod_cbk] 0-slave-client-0: remote operation failed: File exists. Path: <gfid:1c6e53ad-56ac-40b6-bf01-4994c0648493>/52df9997%%J1LWI41EU7
[2014-01-22 10:31:06.324255] W [client-rpc-fops.c:256:client3_3_mknod_cbk] 0-slave-client-1: remote operation failed: File exists. Path: <gfid:1c6e53ad-56ac-40b6-bf01-4994c0648493>/52df9997%%J1LWI41EU7
[2014-01-22 10:31:06.324300] I [fuse-bridge.c:3516:fuse_auxgfid_newentry_cbk] 0-fuse-aux-gfid-mount: failed to create the entry <gfid:1c6e53ad-56ac-40b6-bf01-4994c0648493>/52df9997%%J1LWI41EU7 with gfid (a69bd8bf-8d1f-4b9f-95d9-d6296ed7befb): File exists
[2014-01-22 10:31:06.324329] W [fuse-bridge.c:1628:fuse_err_cbk] 0-glusterfs-fuse: 1046: MKNOD() <gfid:1c6e53ad-56ac-40b6-bf01-4994c0648493>/52df9997%%J1LWI41EU7 => -1 (File exists)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>




Version-Release number of selected component (if applicable): glusterfs-server-3.4.0.57rhs-1


How reproducible: Didn't try to reproduce. 


Steps to Reproduce:
1. create and start a geo-rep session between master (dist-rep) and slave (dist-rep); see the CLI sketch after these steps
2. create some data on master using the command "./crefi.py -n 10 --multi -b 10 -d 10 --random --max=500K --min=10 /mnt/master/" and let it sync
3. create symlinks to those files with "./crefi.py -n 10 --multi -b 10 -d 10 --random --max=500K --min=10 --fop=symlink /mnt/master/" and let them sync
4. stop the geo-rep session
5. create hardlinks to the regular files with "./crefi.py -n 10 --multi -b 10 -d 10 --random --max=500K --min=10 --fop=hardlink /mnt/master/"
6. start the geo-rep session
7. check the geo-rep log files
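For reference, a minimal sketch of the session commands behind steps 1, 4 and 6, assuming hypothetical names (master volume "master", slave volume "slave", slave host "slavehost"); exact create options vary by release:

gluster volume geo-replication master slavehost::slave create push-pem   # step 1: create the session
gluster volume geo-replication master slavehost::slave start             # step 1: start syncing
gluster volume geo-replication master slavehost::slave stop              # step 4: stop before creating hardlinks
gluster volume geo-replication master slavehost::slave start             # step 6: restart the session
gluster volume geo-replication master slavehost::slave status            # check worker state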


Actual results: arequal on the slave mount failed with a short read for some files.


Expected results: reads on the slave mount should not fail.


Additional info:

Comment 1 Vijaykumar Koppad 2014-01-22 12:56:11 UTC
Created attachment 853859 [details]
client log file on which the arequal failed.

Comment 3 Vijaykumar Koppad 2014-01-22 15:17:06 UTC
Created attachment 853917 [details]
sosreports of all the machines involved in the volume.

Comment 4 Vijaykumar Koppad 2014-01-29 08:47:26 UTC
This happens consistently. It happened again with build glusterfs-server-3.4.0.58rhs-1.

Comment 5 Vijaykumar Koppad 2014-01-31 07:32:58 UTC
This also happens in both hybrid crawl and changelog crawl.

This was the arequal error log:
ftw (-p) returned -1 (No data available), terminating


Client logs:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2014-01-31 07:17:36.210805] I [fuse-bridge.c:4811:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.13
[2014-01-31 07:17:36.210905] I [client-handshake.c:450:client_set_lk_version_cbk] 0-slave-client-11: Server lk version = 1
[2014-01-31 07:18:30.338235] E [dht-helper.c:777:dht_migration_complete_check_task] 0-slave-dht: /level05/level15/level25/level35/level45/level55/level65/level75/level85/52ea540f%%QMDHDDXP81: failed to get the 'linkto' xattr No data available
[2014-01-31 07:18:30.338365] W [fuse-bridge.c:1134:fuse_attr_cbk] 0-glusterfs-fuse: 21456: STAT() /level05/level15/level25/level35/level45/level55/level65/level75/level85/52ea540f%%QMDHDDXP81 => -1 (No data available)

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Steps:

1. create and start a geo-rep session between master (dist-rep) and slave (dist-rep)
2. create some data on master using the command "./crefi.py -n 10 --multi -b 10 -d 10 --random --max=500K --min=10 /mnt/master/" and let it sync
3. create symlinks to those files with "./crefi.py -n 10 --multi -b 10 -d 10 --random --max=500K --min=10 --fop=symlink /mnt/master/" and let them sync
4. create hardlinks to the regular files with "./crefi.py -n 10 --multi -b 10 -d 10 --random --max=500K --min=10 --fop=hardlink /mnt/master/"
5. run arequal-checksum on the slave mount point (see the sketch below)
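A minimal sketch of step 5, assuming a hypothetical slave host/volume and mount point; arequal-checksum walks the whole tree and checksums every file, so an unreadable linkfile surfaces as a short read:

mount -t glusterfs slavehost:/slave /mnt/slave   # mount the slave volume
./arequal-checksum /mnt/slave                    # fails with "short read" on affected files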

Comment 6 Venky Shankar 2014-01-31 09:00:06 UTC
Vijaykumar,

Was rebalance invoked on the volume by any chance? Also, can you please upload/paste the brick logs where the file in question resides?

Comment 7 Vijaykumar Koppad 2014-01-31 09:21:49 UTC
No rebalance was running. It was a plain start of the geo-rep session, syncing regular files and hardlinks.

Comment 8 Vijaykumar Koppad 2014-01-31 09:24:08 UTC
Created attachment 857753 [details]
slave brick log file.

Attaching the slave brick log file from the brick where the file in question resides.

Comment 9 Venky Shankar 2014-01-31 10:02:35 UTC
Thanks Vijaykumar,

I suspect we synced the sticky bit file (not sure). For hardlinks, in a DHT volume, we create a sticky bit file (with the linkto attribute pointing to the correct subvolume when the hashed and cached subvolumes differ for that file). During hybrid crawl, there would be a race between syncing the actual hardlink and the sticky bit file. If the sticky bit file gets synced (which is what I see in the logs you've given: [failed to get the 'linkto' xattr No data available]), it may result in this issue.

Can you confirm whether there are sticky bit files on the bricks without the linkto xattr?
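(A hedged sketch of such a check on one brick; the brick path is an example from this setup, and trusted.glusterfs.dht.linkto is the xattr DHT stores on its link files:)

# list sticky-bit files on the brick, skipping the .glusterfs metadata tree,
# and flag any that lack the DHT linkto xattr
find /bricks/slave_brick1 -type f -perm -1000 ! -path '*/.glusterfs/*' |
while read -r f; do
    getfattr -n trusted.glusterfs.dht.linkto -e text "$f" >/dev/null 2>&1 \
        || echo "sticky file without linkto: $f"
done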

Comment 10 Venky Shankar 2014-02-02 23:59:32 UTC
Looks like it's the issue mentioned in Comment #9.

Master:
# ls -l /bricks/master_brick*/level04/level14/level24/level34/level44/level54/level64/level74/level84/level94/hardlink_to_files/52ea580b%%65F0TGQK1P
---------T 2 41940 5065    0 Jan 31 12:46 /bricks/master_brick1/level04/level14/level24/level34/level44/level54/level64/level74/level84/level94/hardlink_to_files/52ea580b%%65F0TGQK1P
-rwx--xrwx 3 41940 5065 1767 Jan 31 12:29 /bricks/master_brick9/level04/level14/level24/level34/level44/level54/level64/level74/level84/level94/hardlink_to_files/52ea580b%%65F0TGQK1P

# ls -l /bricks/master_brick*/level04/level14/level24/level34/level44/level54/level64/level74/level84/level94/hardlink_to_files/52ea580b%%65F0TGQK1P
-rwx--xrwx 3 41940 5065 1767 Jan 31 12:29 /bricks/master_brick10/level04/level14/level24/level34/level44/level54/level64/level74/level84/level94/hardlink_to_files/52ea580b%%65F0TGQK1P
---------T 2 41940 5065    0 Jan 31 12:46 /bricks/master_brick2/level04/level14/level24/level34/level44/level54/level64/level74/level84/level94/hardlink_to_files/52ea580b%%65F0TGQK1P

Slave:
[root@redlemon ~]# ls -l /bricks/slave_brick*/level04/level14/level24/level34/level44/level54/level64/level74/level84/level94/hardlink_to_files/52ea580b%%65F0TGQK1P
---------T 2 41940 41940 0 Jan 31 12:46 /bricks/slave_brick11/level04/level14/level24/level34/level44/level54/level64/level74/level84/level94/hardlink_to_files/52ea580b%%65F0TGQK1P

# ls -l /bricks/slave_brick*/level04/level14/level24/level34/level44/level54/level64/level74/level84/level94/hardlink_to_files/52ea580b%%65F0TGQK1P
---------T 2 41940 41940 0 Jan 31 12:46 /bricks/slave_brick12/level04/level14/level24/level34/level44/level54/level64/level74/level84/level94/hardlink_to_files/52ea580b%%65F0TGQK1P


Vijaykumar,

Can you confirm that this does _not_ happen in changelog mode? I see the file names coming up in the xsync changelog, which implies some files were synced in xsync mode (which has this issue).

If it happens only in xsync mode, then the fix is here: http://review.gluster.org/#/c/6792/
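(As a side note, which crawl a session is using can be cross-checked from the session configuration; a sketch with hypothetical volume/host names, assuming the gsyncd option change_detector is what selects between changelog and xsync:)

gluster volume geo-replication master slavehost::slave config change_detector   # prints "changelog" or "xsync"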

Comment 11 Vijaykumar Koppad 2014-02-03 09:05:41 UTC
If I try it in changelog mode, I hit Bug 1003020, which crashes gsyncd; the restarted gsyncd then syncs the rest of the hardlinks through hybrid crawl (xsync), which results in the above issue. I can hit this consistently with a 6x2 volume, but the behavior is inconsistent with a 2x2 volume.

Comment 12 Venky Shankar 2014-02-03 09:22:13 UTC
Vijaykumar,

Maybe you can try this in a pure replicated volume on the slave.

Comment 13 Vijaykumar Koppad 2014-06-14 12:29:45 UTC
This has happened in build glusterfs-3.6.0.16-1.el6rhs, with a 6x2 volume.

Comment 14 Vijaykumar Koppad 2014-07-18 07:39:11 UTC
This has happened in a cascaded setup, on the slave level-2 volume, while syncing hardlinks. There are more files on the slave level-2 volume than on the master and the slave level-1 volume. It has happened on the slave level-2 volume only.
===============================================================================
file count on master is 17456
file count on slave is 17489
===============================================================================
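(For reference, counts like the above can be taken with a plain file walk on each mount; the mount points here are examples. The 33 extra files on the slave would be consistent with leaked sticky bit linkfiles like the ones listed below.)

find /mnt/master -type f | wc -l   # 17456 on the master
find /mnt/slave -type f | wc -l    # 17489 on the level-2 slave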

There was an error while calculating the md5sum:
===============================================================================
Calculating  slave checksum ...

Failed to get the checksum of slave with following error
md5sum: /tmp/tmpZUlbzy/thread3/level01/level11/53c7ad33%%TI64COMAMS: No data available
/tmp/tmpZUlbzy/thread3/level01/level11/53c7ad33%%TI64COMAMS: short read
ftw (-p) returned -1 (Success), terminating
===============================================================================

There are a few files with 2 entries in the directory, and we can also see sticky bit files on the mount point.
===============================================================================
# ls /mnt/slave/thread0/level02/level12/level22/level32/hardlink_to_files/ -l
total 8
---------T 1 root  root     0 Jul 17 18:08 53c7c386%%0OUTYNSNBL
-r-------- 2 60664  2735 1266 Jul 17 16:32 53c7c386%%5UI8FJ3P3V
---------T 1 root  root     0 Jul 17 18:08 53c7c386%%7323VONN1K
-rw--wxrwx 2 50486 51232 1461 Jul 17 16:41 53c7c386%%OZV5T9I51D
---------T 1 root  root     0 Jul 17 18:08 53c7c387%%1M171U4F6V
---------T 1 root  root     0 Jul 17 18:08 53c7c387%%2O0FVVBHUZ
--wx-wx--x 2 42173 37786 1222 Jul 17 16:32 53c7c387%%67QTB5HYS3
---xr-xrwx 2  7886 62050 1514 Jul 17 16:41 53c7c387%%7B9NWNYBGV
---xr-xrwx 2  7886 62050 1514 Jul 17 16:41 53c7c387%%7B9NWNYBGV
---------T 1 root  root     0 Jul 17 18:08 53c7c387%%9F3CMK6ZLX
---------T 1 root  root     0 Jul 17 18:08 53c7c387%%SM0CONAEGX

# ls /mnt/slave/thread0/level02/level12/level22/level32/hardlink_to_files/53c7c387%%7B9NWNYBGV -l
---------T 1 root root 0 Jul 17 18:08 /mnt/slave/thread0/level02/level12/level22/level32/hardlink_to_files/53c7c387%%7B9NWNYBGV
===============================================================================

In the above paste, the file "53c7c387%%7B9NWNYBGV" has 2 entries, and there are also some files with the sticky bit.


In the intermediate master (slave level-1 volume), the active node that has the sticky bit file for 53c7c386%%0OUTYNSNBL has a changelog entry like this:
=============================================================================
# grep -r "d90aff2a-d55f-454f-9794-df4eefd1b82d" *
1f8a8e6b046b00c682675ebf692f5968/.processed/CHANGELOG.1405600673:E d90aff2a-d55f-454f-9794-df4eefd1b82d MKNOD 33280 0 0 28571791-a541-4ab2-8e38-ca5924308b57%2F53c7c386%25%250OUTYNSNBL
1f8a8e6b046b00c682675ebf692f5968/.processed/CHANGELOG.1405600673:M d90aff2a-d55f-454f-9794-df4eefd1b82d NULL
==============================================================================

This changelog entry shouldn't be present on the node that has the sticky bit file for 53c7c386%%0OUTYNSNBL.
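Decoding the mode field of that MKNOD record supports this, assuming the usual changelog entry layout (E <gfid> <fop> <mode> <uid> <gid> <pgfid>%2F<basename>):

printf '%o\n' 33280   # prints 101000

0101000 is S_IFREG plus the sticky bit and no permission bits, i.e. exactly a ---------T linkfile, and the uid/gid of 0 matches the root-owned sticky files in the listing above. So the intermediate master logged the creation of the linkfile itself, which the level-2 slave then replayed as a plain MKNOD.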

Comment 15 Vijaykumar Koppad 2014-07-18 09:44:27 UTC
It seems the issue mentioned in comment 14 is caused by some other issue, though the effects are the same. Hence it is being tracked in Bug 1121059.

Comment 16 Venky Shankar 2014-08-14 06:40:18 UTC
The fix got merged before 3.0 was branched, so it was already present in the 3.0 branch.

Comment 17 Vijaykumar Koppad 2014-08-19 08:33:43 UTC
Verified on build glusterfs-3.6.0.27. Tried a couple of times; didn't observe any of the issues mentioned in the description.

Comment 21 errata-xmlrpc 2014-09-22 19:32:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html

