Description of problem:
=======================
Tried deleting all the files from the master using rm -rf *; some of the files in the root of the slave did not get deleted. While rm -rf was running on the master there was a switch between active and passive bricks: for an unknown reason the active bricks went down and the passive bricks became active. The bricks which went down eventually came back online and became passive.

Files on Master:
================
[root@wingo master]# pwd
/mnt/master
[root@wingo master]# ls
[root@wingo master]#

Files on Slave:
===============
[root@wingo slave]# ls
101  37  60  92  environment              localtime     nsswitch.conf
104  41  71  asound.conf              fprintd.conf  mail.rc        sudo-ldap.conf
16   43  80  cgrules.conf             gshadow-      motd           updatedb.conf
22   56  83  csh.cshrc                kdump.conf    my.cnf
34   57  89  DIR_COLORS.lightbgcolor  krb5.conf     networks
[root@wingo slave]#

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.1-6.el6rhs.x86_64

How reproducible:
=================
Tried once

Steps to Reproduce:
===================
1. Create the master and slave clusters.
2. Create and start the master volume (2x2).
3. Create and start the slave volume (2x2).
4. Create and start the meta volume (1x2).
5. Create and start the geo-rep session.
6. Mount the master and slave volumes (FUSE and NFS).
7. Create a huge data set from the master FUSE and NFS mounts.
8. Let the sync to the slave complete; confirm using arequal.
9. Add 2 bricks to the master volume (2x2 => 3x2).
10. Start rebalance.
11. Once rebalance is complete, check arequal on master and slave; they should match.
12. Perform rm -rf * from the master mount. (A hedged command-line sketch of these steps follows the Expected results section below.)

Actual results:
===============
Some of the files never got deleted from the slave's root.

Expected results:
=================
All files should be deleted from the slave too.
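A minimal command-line sketch of the reproduction flow above, assuming hypothetical host names (master1..master6, slave1..slave4), brick paths, and volume names; these are not the exact commands or names used on this setup, and the meta-volume creation (step 4) is omitted for brevity:

# master volume, 2x2 (steps 1-2)
gluster volume create master replica 2 \
    master1:/rhs/brick1/b1 master2:/rhs/brick1/b1 \
    master3:/rhs/brick1/b1 master4:/rhs/brick1/b1
gluster volume start master

# slave volume, 2x2, on the slave cluster (step 3)
gluster volume create slave replica 2 \
    slave1:/rhs/brick1/b1 slave2:/rhs/brick1/b1 \
    slave3:/rhs/brick1/b1 slave4:/rhs/brick1/b1
gluster volume start slave

# geo-rep session using the meta volume (steps 4-5)
gluster system:: execute gsec_create
gluster volume geo-replication master slave1::slave create push-pem
gluster volume geo-replication master slave1::slave config use_meta_volume true
gluster volume geo-replication master slave1::slave start

# mount the master, create data, and compare checksums (steps 6-8)
mount -t glusterfs master1:/master /mnt/master
# ... create the data set, then run arequal-checksum on /mnt/master and /mnt/slave ...

# expand the master 2x2 -> 3x2 and rebalance (steps 9-11)
gluster volume add-brick master replica 2 \
    master5:/rhs/brick1/b1 master6:/rhs/brick1/b1
gluster volume rebalance master start

# once rebalance and the arequal checks are done, delete everything (step 12)
cd /mnt/master && rm -rf *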
From the logs we cannot say concretely what happened, but the following can be deduced from the changelogs that were processed, also taking into account the bricks going offline and coming back online. For one of the files that was not removed from the slave (asound.conf), the entries in the workers' .processed/.processing directories are:

georep1:
./764586b145d7206a154a778f64bd2f50/.processed/CHANGELOG.1435606117:E 988e518b-0766-473a-990c-4427577cc413 CREATE 384 0 0 00000000-0000-0000-0000-000000000001%2Fasound.conf
./764586b145d7206a154a778f64bd2f50/.processed/CHANGELOG.1435675375:E c2edfe25-3ba4-4d09-9f27-c9cd5e4c0bfe UNLINK 00000000-0000-0000-0000-000000000001%2Fasound.conf

georep2:
[root@georep2 ssh%3A%2F%2Froot%4010.70.46.101%3Agluster%3A%2F%2F127.0.0.1%3Aslave]# find . | xargs grep 00000000-0000-0000-0000-000000000001%2Fasound 2>/dev/null
./764586b145d7206a154a778f64bd2f50/.processing/CHANGELOG.1435675373:E c2edfe25-3ba4-4d09-9f27-c9cd5e4c0bfe UNLINK 00000000-0000-0000-0000-000000000001%2Fasound.conf

georep3:
[root@georep3 ssh%3A%2F%2Froot%4010.70.46.101%3Agluster%3A%2F%2F127.0.0.1%3Aslave]# find . | xargs grep 00000000-0000-0000-0000-000000000001%2Fasound 2>/dev/null
Binary file ./764586b145d7206a154a778f64bd2f50/xsync/archive_201506.tar matches
./764586b145d7206a154a778f64bd2f50/xsync/XSYNC-CHANGELOG.1435666036:E c2edfe25-3ba4-4d09-9f27-c9cd5e4c0bfe MKNOD 33188 0 0 00000000-0000-0000-0000-000000000001%2Fasound.conf

georep4:
[root@georep4 ssh%3A%2F%2Froot%4010.70.46.101%3Agluster%3A%2F%2F127.0.0.1%3Aslave]# find . | xargs grep 00000000-0000-0000-0000-000000000001%2Fasound 2>/dev/null
Binary file ./c19b89ac45352ab8c894d210d136dd56/.history/.processed/archive_201506.tar matches
./c19b89ac45352ab8c894d210d136dd56/.history/.processed/CHANGELOG.1435675319:E c2edfe25-3ba4-4d09-9f27-c9cd5e4c0bfe CREATE 384 0 0 00000000-0000-0000-0000-000000000001%2Fasound.conf

=========================================================================
Chronology of those entries for asound.conf:

CHANGELOG.1435606117        CREATE  georep1: b2  normal changelog  (.processed)
XSYNC-CHANGELOG.1435666036  MKNOD   georep3: b2  xsync             (likely added as part of replace brick)
CHANGELOG.1435675319        CREATE  georep4: b1  history           (.processed; the active worker flipped from georep3:/rhs/brick2/b2 to georep4:/rhs/brick1/b1, so the history crawl picked up and re-created the file)
CHANGELOG.1435675373        UNLINK  georep2: b2  normal changelog  (.processing)
CHANGELOG.1435675375        UNLINK  georep1: b2  normal changelog  (.processed)

Because of the ping-pong nature of bricks going down and coming back online, and because the stime query from the mount point returns the maximum across a replica set and the minimum across the distribute set, changelog processing can overlap: a worker may process a changelog that was already processed on another brick that subsequently went down, and if that worker also goes down while re-processing it, the sync can end up inconsistent. In the case above, if CHANGELOG.1435675319 (the CREATE picked up by the history crawl) was processed last, the file would remain on the slave, which is possibly what happened here. (A sketch for inspecting the per-brick stime xattrs follows below.)
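The stime bookkeeping referred to above is kept as an extended attribute on each brick root. A hedged sketch for inspecting it per brick: the exact xattr key embeds the master and slave volume UUIDs (the trusted.glusterfs.<MASTER_UUID>.<SLAVE_UUID>.stime form in the comment is an assumption), and the brick paths are the ones mentioned in this report:

# On each master node, dump any *.stime xattrs from the brick root.
# Assumed key form: trusted.glusterfs.<MASTER_UUID>.<SLAVE_UUID>.stime;
# the -m pattern matches it without spelling the UUIDs out.
getfattr -d -m '.*stime' -e hex /rhs/brick2/b2    # e.g. on georep3
getfattr -d -m '.*stime' -e hex /rhs/brick1/b1    # e.g. on georep4

# Comparing the decoded (sec, nsec) values across the replica pairs shows which
# worker believes it has synced further. With max-of-replica / min-of-distribute
# aggregation at the mount point, a lagging passive brick that becomes active can
# end up re-processing changelogs its partner had already handled.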
This bug was accidentally moved from POST to MODIFIED by an error in automation; please see mmccune with any questions.
*** This bug has been marked as a duplicate of bug 1400198 ***