Description of problem:
I was running some tests for gfid verification, and after finishing them I executed `rm -rf *` on the master volume. The deletes were not synced to the slave even after about a day. "status detail" always shows the same number of pending deletes; they are never actually propagated to the slave.

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.17rhs-1.el6rhs.x86_64

How reproducible:
Hit once. Not sure if it is 100% reproducible.

Steps to Reproduce:
1. Create and start a geo-rep session between a 2x2 dist-rep master volume and a 2x2 dist-rep slave volume.
2. Run a series of gfid-verification tests involving many create/delete/rename operations: create and delete, recreate with the same name, rename to a new location, mv onto an existing file, etc.
3. After all that, delete all contents of the master: "rm -rf /mnt/master/*"

Actual results:
The deletes are not propagated/synced to the slave; they appear to be deadlocked. "status detail" keeps showing the same number of pending deletes.

Expected results:
Deletes should be propagated and synced to the slave.

Additional info:
I don't see any error messages in the geo-replication logs, but on the slave where the syncing happens, I saw many of these warning messages in the auxiliary mount log file:
[2013-08-07 16:36:33.248637] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 0-slave-client-2: remote operation failed: No such file or directory
[2013-08-07 16:36:33.248987] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 0-slave-client-3: remote operation failed: No such file or directory
[2013-08-07 16:36:33.249042] W [fuse-bridge.c:1688:fuse_unlink_cbk] 0-glusterfs-fuse: 1155916: RMDIR() <gfid:6acc4466-6209-4f2d-ad70-a1eaee5ff53a>/batman-adv => -1 (No such file or directory)
[2013-08-07 16:36:33.263543] W [fuse-bridge.c:1688:fuse_unlink_cbk] 0-glusterfs-fuse: 1155921: UNLINK() <gfid:11daca56-c10d-4d1e-a39b-46e167e7a6e7>/bnep.h_rename => -1 (No such file or directory)
[2013-08-07 16:36:33.278817] W [fuse-bridge.c:1688:fuse_unlink_cbk] 0-glusterfs-fuse: 1155924: UNLINK() <gfid:11daca56-c10d-4d1e-a39b-46e167e7a6e7>/core.c_rename => -1 (No such file or directory)
[2013-08-07 16:36:33.288741] W [fuse-bridge.c:1688:fuse_unlink_cbk] 0-glusterfs-fuse: 1155927: UNLINK() <gfid:11daca56-c10d-4d1e-a39b-46e167e7a6e7>/Kconfig_rename => -1 (No such file or directory)
[2013-08-07 16:36:33.301303] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-slave-client-2: remote operation failed: No such file or directory. Path: <gfid:84501683-829d-401d-aa63-7e6946c50b07>/bnep (11daca56-c10d-4d1e-a39b-46e167e7a6e7)
[2013-08-07 16:36:33.302156] W [client-rpc-fops.c:2252:client3_3_readdir_cbk] 0-slave-client-2: remote operation failed: Operation not permitted remote_fd = -2
[2013-08-07 16:36:33.302193] I [afr-dir-read.c:117:afr_examine_dir_readdir_cbk] 0-slave-replicate-1: <gfid:84501683-829d-401d-aa63-7e6946c50b07>/bnep: failed to do opendir on slave-client-2
[2013-08-07 16:36:33.303324] W [client-rpc-fops.c:2316:client3_3_readdirp_cbk] 0-slave-client-2: remote operation failed: Operation not permitted

I have attached this auxiliary mount log file from the slave. I will archive the sosreport + geo-rep working dir from all the relevant nodes.
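For reference, the reproduction steps above can be sketched as CLI commands. This is a minimal sketch, not a verified transcript: the volume name "master", slave address "slavehost::slave", and mount point are placeholders matching the report, and the exact geo-replication CLI syntax varies between glusterfs releases.

```shell
# Hypothetical names: master volume "master", slave "slavehost::slave".
# (Exact geo-rep CLI syntax differs across glusterfs versions.)

# Step 1: start the geo-rep session between the two 2x2 dist-rep volumes.
gluster volume geo-replication master slavehost::slave start

# Step 2: run the gfid-verification workload on the master mount
# (create/delete, recreate with the same name, renames, mv onto existing files).

# Step 3: delete everything on the master, then watch the pending-deletes counter.
rm -rf /mnt/master/*
gluster volume geo-replication master slavehost::slave status detail
```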
Unable to add attachment because of size limits. Will archive all the logs.
I have hit this once again. This time I started rm -rf, then stopped the session and restarted it immediately. Now the deletes are not being synced at all, and even "status detail" shows 0 deletes pending. IIRC this used to work before. Should I set the regression flag?
MS, let me have a look at this now. As you said, this used to work without any issues. Can you provide the machine names where I can take a look? Looks like VK also ran into a similar issue.
*** This bug has been marked as a duplicate of bug 996132 ***