Description of problem:
Creating hardlinks on the master, combined with taking snapshots while geo-rep was syncing, resulted in a few hardlinks giving read errors on the slave side.

arequal-checksum reported this error:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Calculating slave checksum ...
md5sum: /tmp/tmpyfYjDH/thread0/level00/level10/level20/level30/level40/level50/level60/level70/level80/level90/hardlink_to_files/539c1fb9%%S7NZ3IENGZ: No data available
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

The corresponding client logs say:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2014-06-14 10:39:52.011137] W [client-rpc-fops.c:1155:client3_3_fgetxattr_cbk] 0-slave-client-9: remote operation failed: No data available
[2014-06-14 10:39:52.011207] E [dht-helper.c:778:dht_migration_complete_check_task] 0-slave-dht: (null): failed to get the 'linkto' xattr No data available
[2014-06-14 10:39:52.011283] W [fuse-bridge.c:2157:fuse_readv_cbk] 0-glusterfs-fuse: 390: READ => -1 (No data available)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Version-Release number of selected component (if applicable):
glusterfs-3.6.0.16-1.el6rhs

How reproducible:
Didn't try to reproduce.

Steps to Reproduce:
1. Create a geo-rep relationship between master and slave.
2. Create data on the master with "crefi -T 10 -n 5 --multi -b 10 -d 10 --random --min=1K --max=10K /mnt/master/"
3. Truncate all the data with "crefi -T 10 -n 5 --multi -b 10 -d 10 --random --min=1K --max=10K --fop=truncate /mnt/master/"
4. While the truncated data is syncing, pause geo-rep, take snapshots of the slave and master, and resume geo-rep.
5. After syncing completes, create hardlinks with "crefi -T 10 -n 5 --multi -b 10 -d 10 --random --min=1K --max=10K --fop=hardlink /mnt/master/"
6. While the hardlinks are syncing, pause geo-rep, take snapshots of the slave and master, and resume geo-rep.
7. Check the checksums of master and slave.

Actual results:
Reads on a few of the hardlinks failed with "No data available".

Expected results:
Hardlinks should not give read errors after syncing to the slave.

Additional info:
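The pause/snapshot/resume sequence used in steps 4 and 6 can be sketched with the gluster CLI. This is a dry run: the volume and host names (MASTERVOL, SLAVEVOL, slavehost) and snapshot names are placeholders, not the ones from this setup, and each command is echoed rather than executed.

```shell
#!/bin/sh
# Dry-run sketch of the pause/snapshot/resume sequence (steps 4 and 6).
# MASTERVOL, SLAVEVOL, SLAVEHOST are hypothetical names for illustration.
MASTERVOL=mastervol
SLAVEVOL=slavevol
SLAVEHOST=slavehost

# Echo instead of execute; drop the wrapper to run the commands for real.
run() { echo "+ $*"; }

run gluster volume geo-replication $MASTERVOL $SLAVEHOST::$SLAVEVOL pause
run gluster snapshot create snap_master $MASTERVOL   # run on the master cluster
run gluster snapshot create snap_slave  $SLAVEVOL    # run on the slave cluster
run gluster volume geo-replication $MASTERVOL $SLAVEHOST::$SLAVEVOL resume
```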
Created attachment 908743 [details]
sosreport of the master and slave nodes.
Stat of the file from the mount point:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
stat /tmp/tmpyfYjDH/thread0/level00/level10/level20/level30/level40/level50/level60/level70/level80/level90/hardlink_to_files/539c1fb9%%S7NZ3IENGZ
  File: `/tmp/tmpyfYjDH/thread0/level00/level10/level20/level30/level40/level50/level60/level70/level80/level90/hardlink_to_files/539c1fb9%%S7NZ3IENGZ'
  Size: 0          Blocks: 0          IO Block: 131072 regular empty file
Device: 22h/34d    Inode: 11524165653306002252  Links: 1
Access: (1000/---------T)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2014-06-14 15:45:12.164004818 +0530
Modify: 2014-06-14 15:45:12.164004818 +0530
Change: 2014-06-14 15:45:12.164004818 +0530

getfattr of the file in question from the slave backend bricks:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[root@redmoon ~]# find /bricks/ | grep 539c1fb9%%S7NZ3IENGZ
/bricks/brick3/slave_b9/thread0/level00/level10/level20/level30/level40/level50/level60/level70/level80/level90/hardlink_to_files/539c1fb9%%S7NZ3IENGZ
[root@redmoon ~]# getfattr -d -m . -e hex /bricks/brick3/slave_b9/thread0/level00/level10/level20/level30/level40/level50/level60/level70/level80/level90/hardlink_to_files/539c1fb9%%S7NZ3IENGZ
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick3/slave_b9/thread0/level00/level10/level20/level30/level40/level50/level60/level70/level80/level90/hardlink_to_files/539c1fb9%%S7NZ3IENGZ
trusted.gfid=0xb18d20e85e734e2f9fee0f9aa20fcb4c
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[root@redcloud ~]# find /bricks/ | grep 539c1fb9%%S7NZ3IENGZ
/bricks/brick3/slave_b10/thread0/level00/level10/level20/level30/level40/level50/level60/level70/level80/level90/hardlink_to_files/539c1fb9%%S7NZ3IENGZ
[root@redcloud ~]# getfattr -d -m . -e hex bricks/brick3/slave_b10/thread0/level00/level10/level20/level30/level40/level50/level60/level70/level80/level90/hardlink_to_files/539c1fb9%%S7NZ3IENGZ
getfattr: bricks/brick3/slave_b10/thread0/level00/level10/level20/level30/level40/level50/level60/level70/level80/level90/hardlink_to_files/539c1fb9%%S7NZ3IENGZ: No such file or directory
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
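The stat output above (zero size, mode 1000/---------T) is the on-disk signature of a DHT linkto stub, and the brick's getfattr shows only trusted.gfid with no trusted.glusterfs.dht.linkto xattr, which matches the "failed to get the 'linkto' xattr" error in the client log. A small sketch for spotting such candidates on a brick (the helper name is mine, not from any Gluster tool; confirm a hit with getfattr):

```shell
#!/bin/sh
# Hypothetical helper: a file is a *candidate* DHT link file if it is
# zero-byte with only the sticky bit set (mode 1000, shown as ---------T).
# A healthy link file would also carry trusted.glusterfs.dht.linkto;
# the broken file in this report has only trusted.gfid.
is_linkto_candidate() {
    f=$1
    [ "$(stat -c '%a' "$f")" = "1000" ] && [ "$(stat -c '%s' "$f")" = "0" ]
}
```

On a brick, a candidate can then be checked with `getfattr -n trusted.glusterfs.dht.linkto -e text <file>`; here that xattr is absent, hence ENODATA on read through DHT.
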
This looks like a side effect of capturing mknod() even when it's an internal fop. I see these in the logs:
------------------------------------------------------------------------------
vshankar@h3ckers-pride ~/sos/slave/redmoon-2014061416251402743305/var/log/glusterfs/geo-replication-slaves % grep -r '539c1fb9%%S7NZ3IENGZ' *
4d739b65-cd7b-49f3-902a-439653061bc8:gluster%3A%2F%2F127.0.0.1%3Aslave.gluster.log:[2014-06-14 10:15:30.633623] W [client-rpc-fops.c:240:client3_3_mknod_cbk] 0-slave-client-8: remote operation failed: File exists. Path: <gfid:ef718d6a-1b4e-4b3a-9000-9262500b5b23>/539c1fb9%%S7NZ3IENGZ
4d739b65-cd7b-49f3-902a-439653061bc8:gluster%3A%2F%2F127.0.0.1%3Aslave.gluster.log:[2014-06-14 10:15:30.634099] W [client-rpc-fops.c:240:client3_3_mknod_cbk] 0-slave-client-9: remote operation failed: File exists. Path: <gfid:ef718d6a-1b4e-4b3a-9000-9262500b5b23>/539c1fb9%%S7NZ3IENGZ
4d739b65-cd7b-49f3-902a-439653061bc8:gluster%3A%2F%2F127.0.0.1%3Aslave.gluster.log:[2014-06-14 10:17:02.033865] W [client-rpc-fops.c:240:client3_3_mknod_cbk] 0-slave-client-8: remote operation failed: File exists. Path: <gfid:ef718d6a-1b4e-4b3a-9000-9262500b5b23>/539c1fb9%%S7NZ3IENGZ
4d739b65-cd7b-49f3-902a-439653061bc8:gluster%3A%2F%2F127.0.0.1%3Aslave.gluster.log:[2014-06-14 10:17:02.034364] W [client-rpc-fops.c:240:client3_3_mknod_cbk] 0-slave-client-9: remote operation failed: File exists. Path: <gfid:ef718d6a-1b4e-4b3a-9000-9262500b5b23>/539c1fb9%%S7NZ3IENGZ
------------------------------------------------------------------------------
The file "539c1fb9%%S7NZ3IENGZ" is a hardlink, but the slave log shows it arriving as mknod(). Although these are "File exists" failures, the first mknod() would have succeeded. Kotresh's patch to capture self-heal traffic ignores mknod() if it's an internal fop, so only the rename() call gets captured in the changelog.
I tried once with the build glusterfs-3.6.0.22-1.el6rhs and was not able to hit it. I'll try some more runs with 22 and update.