Description of problem:
===============
rm -rf is taking a very long time

Version-Release number of selected component (if applicable):
================
glusterfs-server-3.7.5-9

How reproducible:

Steps to Reproduce:
==============
1. Create a 2x2 volume, mount it on a client using FUSE, create a directory, and create 50k files in it
2. Attach 2x2 hot bricks to the volume, then create a new directory and create around 20k files in it
3. Kill all the brick processes and restart the volume using the force option
4. While files are being demoted, run rm -rf *; it took more than 2 hours

Actual results:

Expected results:
==========
Should not take this much time

Additional info:
=============
[root@rhs-client19 test_tier-tier-dht]# gluster vol info test_tier

Volume Name: test_tier
Type: Tier
Volume ID: 9bca8ffb-d47c-4636-95ab-2cfc58da422e
Status: Started
Number of Bricks: 8
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: rhs-client19.lab.eng.blr.redhat.com:/rhs/brick5/test_tier_hot4
Brick2: rhs-client18.lab.eng.blr.redhat.com:/rhs/brick5/test_tier_hot4
Brick3: rhs-client19.lab.eng.blr.redhat.com:/rhs/brick4/test_tier_hot3
Brick4: rhs-client18.lab.eng.blr.redhat.com:/rhs/brick4/test_tier_hot3
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick5: rhs-client18.lab.eng.blr.redhat.com:/rhs/brick7/test_tier_hot1
Brick6: rhs-client19.lab.eng.blr.redhat.com:/rhs/brick7/test_tier_hot1
Brick7: rhs-client18.lab.eng.blr.redhat.com:/rhs/brick6/test_tier_hot2
Brick8: rhs-client19.lab.eng.blr.redhat.com:/rhs/brick6/test_tier_hot2
Options Reconfigured:
cluster.tier-mode: test
features.ctr-enabled: on
performance.readdir-ahead: on

Client name: vertigo.lab.eng.blr.redhat.com
Mount: /mnt/test_tier
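For anyone re-running the client-side part of the steps above, here is a minimal sketch of the file creation and the timed removal. It assumes the volume is already mounted (the report uses /mnt/test_tier); the function names, the `file.N` naming, and the `dir1` directory in the usage comment are hypothetical and not taken from the original test run. It does not cover creating the volume, attaching the tier, or restarting bricks.

```shell
#!/bin/sh
# Hypothetical helpers; names and file layout are illustrative assumptions.

# create_files DIR COUNT: create DIR and fill it with COUNT empty files.
create_files() (
    dir=$1
    count=$2
    mkdir -p "$dir" || exit 1
    i=1
    while [ "$i" -le "$count" ]; do
        : > "$dir/file.$i"
        i=$((i + 1))
    done
)

# timed_remove DIR: run `rm -rf *` inside DIR (as in step 4) and print
# the elapsed wall-clock time in whole seconds.
timed_remove() (
    cd "$1" || exit 1
    start=$(date +%s)
    rm -rf -- *
    end=$(date +%s)
    echo "$((end - start))"
)

# Example (not executed here):
#   create_files /mnt/test_tier/dir1 50000
#   timed_remove /mnt/test_tier/dir1
```

The elapsed time is reported in seconds only, which is coarse but sufficient to tell a run of minutes from a run of hours.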
sosreport available @/home/repo/sosreports/bug.1288509 on rhsqe-repo.lab.eng.blr.redhat.com
Able to reproduce the issue without step 3 (kill all the brick processes and restart the volume using the force option).
I was not running rename operations
http://review.gluster.org/12972
https://code.engineering.redhat.com/gerrit/#/c/64372/1

Options provided for good performance:

gluster vol set <VOLNAME> features.ctr-sql-db-wal-autocheckpoint 25000
gluster vol set <VOLNAME> features.ctr-sql-db-cachesize 12500

See gluster vol set help for details.
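A small sketch for applying the two options above to a given volume. The function name `ctr_tuning_cmds` is hypothetical; it only prints the `gluster volume set` commands with the values from this comment, so they can be reviewed before being run.

```shell
#!/bin/sh
# ctr_tuning_cmds VOLNAME: print the two tuning commands for VOLNAME.
# Hypothetical helper; the option names and values come from the comment above.
ctr_tuning_cmds() (
    vol=${1:?usage: ctr_tuning_cmds VOLNAME}
    printf 'gluster volume set %s features.ctr-sql-db-wal-autocheckpoint 25000\n' "$vol"
    printf 'gluster volume set %s features.ctr-sql-db-cachesize 12500\n' "$vol"
)

# Example (not executed here): review the output, then pipe it to sh to apply.
#   ctr_tuning_cmds test_tier
#   ctr_tuning_cmds test_tier | sh
```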
Deleting 50k files took more than two hours.
Can you turn off ctr and rerun the test?

gluster volume set <VOLNAME> features.ctr-enabled off
Tested with build glusterfs-server-3.7.5-13. After setting features.ctr-sql-db-cachesize to 12500 and features.ctr-sql-db-wal-autocheckpoint to 25000, removal of 50k files took 6m, which is clearly good performance.

I need to repeat the same tests with a build that has the above settings as default values, so please move this bug to ON_QA once a build with the proper default values is available.
Sure. Thanks :)
Sure
https://code.engineering.redhat.com/gerrit/64642
Tested with the 3.7.5-14 build. After creating a new tiered volume, I checked the default values of features.ctr-sql-db-wal-autocheckpoint and features.ctr-sql-db-cachesize. The defaults have not been changed, so marking this bug as failed QA.

[root@tettnang afr1x2_tier_bug]# rpm -qa | grep glusterfs
glusterfs-3.7.5-14.el7rhgs.x86_64
glusterfs-geo-replication-3.7.5-14.el7rhgs.x86_64
glusterfs-fuse-3.7.5-14.el7rhgs.x86_64
glusterfs-debuginfo-3.7.5-14.el7rhgs.x86_64
glusterfs-api-3.7.5-14.el7rhgs.x86_64
glusterfs-rdma-3.7.5-14.el7rhgs.x86_64
glusterfs-client-xlators-3.7.5-14.el7rhgs.x86_64
glusterfs-server-3.7.5-14.el7rhgs.x86_64
glusterfs-ganesha-3.7.5-14.el7rhgs.x86_64
glusterfs-cli-3.7.5-14.el7rhgs.x86_64
glusterfs-libs-3.7.5-14.el7rhgs.x86_64
glusterfs-api-devel-3.7.5-14.el7rhgs.x86_64
glusterfs-devel-3.7.5-14.el7rhgs.x86_64
[root@tettnang afr1x2_tier_bug]# gluster vol get perform_create all | grep cachesize
features.ctr-sql-db-cachesize 1000
[root@tettnang afr1x2_tier_bug]# gluster vol get perform_create all | grep ctr-sql-db-wal-autocheckpoint
features.ctr-sql-db-wal-autocheckpoint 1000
[root@tettnang afr1x2_tier_bug]#
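The default-value check above could be scripted roughly as follows. This is a sketch that assumes `gluster vol get <VOLNAME> all` prints one option per line as "name value", as in the output shown in this comment. The function name `check_ctr_defaults` is hypothetical, and the expected values 12500 and 25000 are the ones proposed earlier in this bug.

```shell
#!/bin/sh
# check_ctr_defaults: read `gluster vol get <VOLNAME> all` output on stdin and
# exit non-zero if either CTR option is missing or differs from the proposed
# value. Hypothetical helper, not part of the gluster CLI.
check_ctr_defaults() {
    awk '
        $1 == "features.ctr-sql-db-cachesize"          { seen++; want = 12500 }
        $1 == "features.ctr-sql-db-wal-autocheckpoint" { seen++; want = 25000 }
        want != "" {
            # Report a mismatch for the option matched on this line.
            if ($2 != want) { print $1 " is " $2 ", expected " want; bad = 1 }
            want = ""
        }
        END {
            if (seen < 2) { print "expected both CTR options in the input"; bad = 1 }
            exit bad
        }
    '
}

# Example (not executed here):
#   gluster vol get perform_create all | check_ctr_defaults
```

Fed the two lines shown above (both at 1000), the function prints one mismatch line per option and returns a non-zero status.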
https://code.engineering.redhat.com/gerrit/64971
Deletion of 50k files took 2m, and by default features.ctr-sql-db-cachesize and features.ctr-sql-db-wal-autocheckpoint are now set to 12500 and 25000 respectively, so marking this bug as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0193.html