Bug 994957 - Dist-geo-rep: deletes (rm -rf) are not synced to slave volume.
Summary: Dist-geo-rep: deletes (rm -rf) are not synced to slave volume.
Keywords:
Status: CLOSED DUPLICATE of bug 996132
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: geo-replication
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Venky Shankar
QA Contact: Sudhir D
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-08-08 09:59 UTC by M S Vishwanath Bhat
Modified: 2016-06-01 01:56 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-08-19 02:56:16 UTC
Embargoed:


Attachments

Description M S Vishwanath Bhat 2013-08-08 09:59:02 UTC
Description of problem:
I was running some tests for gfid verification, and after I was done with the tests I executed rm -rf * on the master volume. These deletes were not synced to the slave even after about a day. The status detail always shows the same number of deletes pending; they are never actually propagated to the slave.

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.17rhs-1.el6rhs.x86_64

How reproducible:
Hit once. Not sure if 100% reproducible.

Steps to Reproduce:
1. Create and start a geo-rep session between a 2*2 dist-rep master volume and a 2*2 dist-rep slave volume (see the sketch after these steps).
2. I ran some gfid-verification tests, which involved lots of create/delete/rename operations: create, delete and recreate with the same name, rename to a new location, mv onto an existing file, and so on.
3. After all that, I deleted the entire contents of the master: "rm -rf /mnt/master/*"
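
For reference, step 1 and the pending-deletes check look roughly like this; the volume names (mastervol/slavevol), hosts and mount point are placeholders, and the exact CLI options may differ on this build:

# create and start the geo-rep session between the two volumes (step 1)
gluster volume geo-replication mastervol slavehost::slavevol create push-pem
gluster volume geo-replication mastervol slavehost::slavevol start

# mount the master volume and remove everything on it (step 3)
mount -t glusterfs masterhost:/mastervol /mnt/master
rm -rf /mnt/master/*

# check how many deletes are still pending for the session
gluster volume geo-replication mastervol slavehost::slavevol status detail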

Actual results:
The deletes are not propagated/synced to the slave and appear to be deadlocked; status detail always keeps showing the same number of deletes pending.


Expected results:
Deletes should be propagated and synced to the slave.


Additional info:
I don't see any error messages in the geo-replication logs, but on the slave, where the syncing happens, I saw a lot of these warning messages in the auxiliary mount log file.


[2013-08-07 16:36:33.248637] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 0-slave-client-2: remote operation failed: No such file or directory
[2013-08-07 16:36:33.248987] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 0-slave-client-3: remote operation failed: No such file or directory
[2013-08-07 16:36:33.249042] W [fuse-bridge.c:1688:fuse_unlink_cbk] 0-glusterfs-fuse: 1155916: RMDIR() <gfid:6acc4466-6209-4f2d-ad70-a1eaee5ff53a>/batman-adv => -1 (No such file or directory)
[2013-08-07 16:36:33.263543] W [fuse-bridge.c:1688:fuse_unlink_cbk] 0-glusterfs-fuse: 1155921: UNLINK() <gfid:11daca56-c10d-4d1e-a39b-46e167e7a6e7>/bnep.h_rename => -1 (No such file or directory)
[2013-08-07 16:36:33.278817] W [fuse-bridge.c:1688:fuse_unlink_cbk] 0-glusterfs-fuse: 1155924: UNLINK() <gfid:11daca56-c10d-4d1e-a39b-46e167e7a6e7>/core.c_rename => -1 (No such file or directory)
[2013-08-07 16:36:33.288741] W [fuse-bridge.c:1688:fuse_unlink_cbk] 0-glusterfs-fuse: 1155927: UNLINK() <gfid:11daca56-c10d-4d1e-a39b-46e167e7a6e7>/Kconfig_rename => -1 (No such file or directory)
[2013-08-07 16:36:33.301303] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-slave-client-2: remote operation failed: No such file or directory. Path: <gfid:84501683-829d-401d-aa63-7e6946c50b07>/bnep (11daca56-c10d-4d1e-a39b-46e167e7a6e7)
[2013-08-07 16:36:33.302156] W [client-rpc-fops.c:2252:client3_3_readdir_cbk] 0-slave-client-2: remote operation failed: Operation not permitted remote_fd = -2
[2013-08-07 16:36:33.302193] I [afr-dir-read.c:117:afr_examine_dir_readdir_cbk] 0-slave-replicate-1: <gfid:84501683-829d-401d-aa63-7e6946c50b07>/bnep: failed to do opendir on slave-client-2
[2013-08-07 16:36:33.303324] W [client-rpc-fops.c:2316:client3_3_readdirp_cbk] 0-slave-client-2: remote operation failed: Operation not permitted
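
These warnings come from the slave's auxiliary mount log; a grep along these lines on the slave nodes pulls them out, assuming the usual geo-rep slave mount log location:

# on each slave node: failed RMDIR/UNLINK/OPENDIR callbacks in the aux mount logs
grep -E "rmdir_cbk|unlink_cbk|opendir_cbk" /var/log/glusterfs/geo-replication-slaves/*.log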



I have attached this auxiliary mount log file from the slave.

I will archive the sosreport + geo-rep working dir from all the relevant nodes.

Comment 1 M S Vishwanath Bhat 2013-08-08 10:00:12 UTC
Unable to add attachment because of size limits. Will archive all the logs.

Comment 3 M S Vishwanath Bhat 2013-08-13 10:42:31 UTC
I have hit this once again.

This time I started rm -rf, then stopped the session and restarted it immediately. Now the deletes are not being synced at all, and status detail even shows 0 deletes pending.
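
The sequence was roughly the following (volume/host names are placeholders):

# start removing everything on the master mount
rm -rf /mnt/master/* &

# stop and immediately restart the geo-rep session
gluster volume geo-replication mastervol slavehost::slavevol stop
gluster volume geo-replication mastervol slavehost::slavevol start

# deletes pending now shows 0, but nothing is actually removed on the slave
gluster volume geo-replication mastervol slavehost::slavevol status detail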


IIRC this used to work before. Should I set the regression flag?

Comment 4 Venky Shankar 2013-08-13 16:51:03 UTC
MS,

Let me have a look at this now. As you said, this used to work without any issues.
Can you provide the machine names where I can have a look? Looks like VK also ran into a similar issue.

Comment 5 Venky Shankar 2013-08-19 02:56:16 UTC

*** This bug has been marked as a duplicate of bug 996132 ***

