Created attachment 1610643 [details]
Detailed step, volume info, option and log

Description of problem:
Rebalance is incomplete when the volume option performance.parallel-readdir is on: directories are not synced after the rebalance status reports complete.

Version-Release number of selected component (if applicable):
Release: 4.1.8

How reproducible:
100%, if the volume is configured with the options in the attachment (option list).

Steps to Reproduce:
1. Create a distribute volume (with 5 bricks).
2. Set the specific options (options in attachment).
3. Make directories and files, e.g.:
   mkdir /mnt/volume_01/dir_1
   mkdir /mnt/volume_01/dir_1/dir_2
   mkdir /mnt/volume_01/dir_1/dir_2/dir_3
   mkdir /mnt/volume_01/dir_1/dir_2/dir_3/dir_4
   touch /mnt/volume_01/dir_1/dir_2/file{1..100}
   touch /mnt/volume_01/dir_1/dir_2/dir_3/file{101..200}
   touch /mnt/volume_01/dir_1/dir_2/dir_3/dir_4/a{201..300}
4. Add 5 more bricks to the volume (a sketch of the add-brick/rebalance commands follows the command output below).
5. Rebalance the volume.
6. Check that the rebalance status is "completed" (using gluster v status).
7. Check the directories and files on every brick.

Actual results:
Only dir_1 and dir_2 are synced onto the 5 newly added bricks (dir_3 and dir_4 are not synced).

Expected results:
All four directories should be synced onto the 5 newly added bricks.

Additional info:
Detailed steps, volume info, options, and logs are in the attachment.

[root@K1 glusterfs]# gluster v get volume_01 all
Option  Value
------  -----
cluster.lookup-unhashed  on
cluster.lookup-optimize  on
cluster.min-free-disk  10%
cluster.min-free-inodes  5%
cluster.rebalance-stats  off
cluster.subvols-per-directory  (null)
cluster.readdir-optimize  off
cluster.rsync-hash-regex  (null)
cluster.extra-hash-regex  (null)
cluster.dht-xattr-name  trusted.glusterfs.dht
cluster.randomize-hash-range-by-gfid  off
cluster.rebal-throttle  normal
cluster.lock-migration  off
cluster.force-migration  off
cluster.local-volume-name  (null)
cluster.weighted-rebalance  on
cluster.switch-pattern  (null)
cluster.entry-change-log  on
cluster.read-subvolume  (null)
cluster.read-subvolume-index  -1
cluster.read-hash-mode  1
cluster.background-self-heal-count  8
cluster.metadata-self-heal  on
cluster.data-self-heal  on
cluster.entry-self-heal  on
cluster.self-heal-daemon  on
cluster.heal-timeout  600
cluster.self-heal-window-size  1
cluster.data-change-log  on
cluster.metadata-change-log  on
cluster.data-self-heal-algorithm  (null)
cluster.eager-lock  on
disperse.eager-lock  on
disperse.other-eager-lock  on
disperse.eager-lock-timeout  1
disperse.other-eager-lock-timeout  1
cluster.quorum-type  none
cluster.quorum-count  (null)
cluster.choose-local  true
cluster.self-heal-readdir-size  1KB
cluster.post-op-delay-secs  1
cluster.ensure-durability  on
cluster.consistent-metadata  no
cluster.heal-wait-queue-length  128
cluster.favorite-child-policy  none
cluster.full-lock  yes
cluster.stripe-block-size  128KB
cluster.stripe-coalesce  true
diagnostics.latency-measurement  on
diagnostics.dump-fd-stats  off
diagnostics.count-fop-hits  on
diagnostics.brick-log-level  ERROR
diagnostics.client-log-level  ERROR
diagnostics.brick-sys-log-level  CRITICAL
diagnostics.client-sys-log-level  CRITICAL
diagnostics.brick-logger  (null)
diagnostics.client-logger  (null)
diagnostics.brick-log-format  (null)
diagnostics.client-log-format  (null)
diagnostics.brick-log-buf-size  5
diagnostics.client-log-buf-size  5
diagnostics.brick-log-flush-timeout  120
diagnostics.client-log-flush-timeout  120
diagnostics.stats-dump-interval  0
diagnostics.fop-sample-interval  0
diagnostics.stats-dump-format  json
diagnostics.fop-sample-buf-size  65535
diagnostics.stats-dnscache-ttl-sec  86400
performance.cache-max-file-size  0
performance.cache-min-file-size  0
performance.cache-refresh-timeout  1
performance.cache-priority
performance.cache-size  32MB
performance.io-thread-count  64
performance.high-prio-threads  16
performance.normal-prio-threads  16
performance.low-prio-threads  16
performance.least-prio-threads  1
performance.enable-least-priority  on
performance.iot-watchdog-secs  (null)
performance.iot-cleanup-disconnected-reqs  off
performance.iot-pass-through  false
performance.io-cache-pass-through  false
performance.cache-size  128MB
performance.qr-cache-timeout  1
performance.cache-invalidation  true
performance.flush-behind  on
performance.nfs.flush-behind  off
performance.write-behind-window-size  1MB
performance.resync-failed-syncs-after-fsync  off
performance.nfs.write-behind-window-size  1MB
performance.strict-o-direct  off
performance.nfs.strict-o-direct  off
performance.strict-write-ordering  off
performance.nfs.strict-write-ordering  off
performance.write-behind-trickling-writes  on
performance.aggregate-size  128KB
performance.nfs.write-behind-trickling-writes  on
performance.lazy-open  yes
performance.read-after-open  no
performance.open-behind-pass-through  false
performance.read-ahead-page-count  4
performance.read-ahead-pass-through  false
performance.readdir-ahead-pass-through  false
performance.md-cache-pass-through  false
performance.md-cache-timeout  1
performance.cache-swift-metadata  true
performance.cache-samba-metadata  false
performance.cache-capability-xattrs  true
performance.cache-ima-xattrs  true
performance.md-cache-statfs  off
performance.xattr-cache-list
performance.nl-cache-pass-through  false
features.encryption  off
encryption.master-key  (null)
encryption.data-key-size  256
encryption.block-size  4096
network.frame-timeout  1800
network.ping-timeout  42
network.tcp-window-size  (null)
network.remote-dio  disable
client.event-threads  8
client.tcp-user-timeout  0
client.keepalive-time  20
client.keepalive-interval  2
client.keepalive-count  9
network.tcp-window-size  (null)
network.inode-lru-limit  16384
auth.allow  *
auth.reject  (null)
transport.keepalive  1
server.allow-insecure  on
server.root-squash  off
server.anonuid  65534
server.anongid  65534
server.statedump-path  /var/run/gluster
server.outstanding-rpc-limit  64
server.ssl  (null)
auth.ssl-allow  *
server.manage-gids  off
server.dynamic-auth  on
client.send-gids  on
server.gid-timeout  300
server.own-thread  (null)
server.event-threads  8
server.tcp-user-timeout  0
server.keepalive-time  20
server.keepalive-interval  2
server.keepalive-count  9
transport.listen-backlog  1024
ssl.own-cert  (null)
ssl.private-key  (null)
ssl.ca-list  (null)
ssl.crl-path  (null)
ssl.certificate-depth  (null)
ssl.cipher-list  (null)
ssl.dh-param  (null)
ssl.ec-curve  (null)
transport.address-family  inet
performance.write-behind  off
performance.read-ahead  off
performance.readdir-ahead  on
performance.io-cache  off
performance.quick-read  off
performance.open-behind  off
performance.nl-cache  on
performance.stat-prefetch  on
performance.client-io-threads  on
performance.nfs.write-behind  off
performance.nfs.read-ahead  off
performance.nfs.io-cache  off
performance.nfs.quick-read  off
performance.nfs.stat-prefetch  off
performance.nfs.io-threads  off
performance.force-readdirp  true
performance.cache-invalidation  true
features.uss  off
features.snapshot-directory  .snaps
features.show-snapshot-directory  off
features.tag-namespaces  off
network.compression  off
network.compression.window-size  -15
network.compression.mem-level  8
network.compression.min-size  0
network.compression.compression-level  -1
network.compression.debug  false
features.default-soft-limit  80%
features.soft-timeout  60
features.hard-timeout  5
features.alert-time  86400
features.quota-deem-statfs  off
geo-replication.indexing  off
geo-replication.indexing  off
geo-replication.ignore-pid-check  off
geo-replication.ignore-pid-check  off
features.quota  off
features.inode-quota  off
features.bitrot  disable
debug.trace  off
debug.log-history  no
debug.log-file  no
debug.exclude-ops  (null)
debug.include-ops  (null)
debug.error-gen  off
debug.error-failure  (null)
debug.error-number  (null)
debug.random-failure  off
debug.error-fops  (null)
nfs.enable-ino32  no
nfs.mem-factor  15
nfs.export-dirs  on
nfs.export-volumes  on
nfs.addr-namelookup  off
nfs.dynamic-volumes  off
nfs.register-with-portmap  on
nfs.outstanding-rpc-limit  16
nfs.port  2049
nfs.rpc-auth-unix  on
nfs.rpc-auth-null  on
nfs.rpc-auth-allow  all
nfs.rpc-auth-reject  none
nfs.ports-insecure  off
nfs.trusted-sync  off
nfs.trusted-write  off
nfs.volume-access  read-write
nfs.export-dir
nfs.disable  off
nfs.nlm  on
nfs.acl  on
nfs.mount-udp  off
nfs.mount-rmtab  /var/lib/glusterd/nfs/rmtab
nfs.rpc-statd  /sbin/rpc.statd
nfs.server-aux-gids  off
nfs.drc  off
nfs.drc-size  0x20000
nfs.read-size  (1 * 1048576ULL)
nfs.write-size  (1 * 1048576ULL)
nfs.readdir-size  (1 * 1048576ULL)
nfs.rdirplus  on
nfs.event-threads  1
nfs.exports-auth-enable  off
nfs.auth-refresh-interval-sec  30
nfs.auth-cache-ttl-sec  30
features.read-only  off
features.worm  off
features.worm-file-level  disable
features.worm-files-deletable  on
features.default-retention-period  2147483647
features.retention-mode  enterprise
features.auto-commit-period  7200
storage.linux-aio  off
storage.batch-fsync-mode  reverse-fsync
storage.batch-fsync-delay-usec  0
storage.owner-uid  -1
storage.owner-gid  -1
storage.node-uuid-pathinfo  off
storage.health-check-interval  30
storage.build-pgfid  off
storage.gfid2path  on
storage.gfid2path-separator  :
storage.reserve  1
storage.health-check-timeout  10
storage.fips-mode-rchecksum  off
storage.force-create-mode  0000
storage.force-directory-mode  0000
storage.create-mask  0777
storage.create-directory-mask  0777
storage.max-hardlinks  100
storage.ctime  off
config.gfproxyd  off
cluster.server-quorum-type  off
cluster.server-quorum-ratio  0
changelog.changelog  off
changelog.changelog-dir  {{ brick.path }}/.glusterfs/changelogs
changelog.encoding  ascii
changelog.rollover-time  15
changelog.fsync-interval  5
changelog.changelog-barrier-timeout  120
changelog.capture-del-path  off
features.barrier  disable
features.barrier-timeout  120
features.trash  off
features.trash-dir  .trashcan
features.trash-eliminate-path  (null)
features.trash-max-filesize  5MB
features.trash-internal-op  off
cluster.enable-shared-storage  disable
locks.trace  off
locks.mandatory-locking  off
cluster.disperse-self-heal-daemon  enable
cluster.quorum-reads  no
client.bind-insecure  (null)
features.timeout  45
features.failover-hosts  (null)
features.shard  off
features.shard-block-size  64MB
features.scrub-throttle  lazy
features.scrub-freq  biweekly
features.scrub  false
features.expiry-time  120
features.cache-invalidation  on
features.cache-invalidation-timeout  600
features.leases  off
features.lease-lock-recall-timeout  60
disperse.background-heals  8
disperse.heal-wait-qlength  128
cluster.heal-timeout  600
dht.force-readdirp  on
disperse.read-policy  gfid-hash
cluster.shd-max-threads  1
cluster.shd-wait-qlength  1024
cluster.locking-scheme  full
cluster.granular-entry-heal  no
features.locks-revocation-secs  0
features.locks-revocation-clear-all  false
features.locks-revocation-max-blocked  0
features.locks-monkey-unlocking  false
features.locks-notify-contention  no
features.locks-notify-contention-delay  5
disperse.shd-max-threads  1
disperse.shd-wait-qlength  1024
disperse.cpu-extensions  auto
disperse.self-heal-window-size  1
cluster.use-compound-fops  off
performance.parallel-readdir  on
performance.rda-request-size  131072
performance.rda-low-wmark  4096
performance.rda-high-wmark  128KB
performance.rda-cache-limit  40MB
performance.nl-cache-positive-entry  false
performance.nl-cache-limit  10MB
performance.nl-cache-timeout  60
cluster.brick-multiplex  off
cluster.max-bricks-per-process  0
disperse.optimistic-change-log  on
disperse.stripe-cache  4
cluster.halo-enabled  False
cluster.halo-shd-max-latency  99999
cluster.halo-nfsd-max-latency  5
cluster.halo-max-latency  5
cluster.halo-max-replicas  99999
cluster.halo-min-replicas  2
debug.delay-gen  off
delay-gen.delay-percentage  10%
delay-gen.delay-duration  100000
delay-gen.enable
disperse.parallel-writes  on
features.sdfs  off
features.cloudsync  off
features.utime  off

[root@K1 glusterfs]# gluster v info

Volume Name: volume_01
Type: Distribute
Volume ID: 140b35e4-c095-457f-8f15-0095a10ad83d
Status: Started
Snapshot Count: 0
Number of Bricks: 10
Transport-type: tcp
Bricks:
Brick1: testk1:/mnt/brick01/bk
Brick2: testk1:/mnt/brick02/bk
Brick3: testk1:/mnt/brick03/bk
Brick4: testk1:/mnt/brick04/bk
Brick5: testk1:/mnt/brick05/bk
Brick6: testk1:/mnt/brick06/bk
Brick7: testk1:/mnt/brick07/bk
Brick8: testk1:/mnt/brick08/bk
Brick9: testk1:/mnt/brick09/bk
Brick10: testk1:/mnt/brick10/bk
Options Reconfigured:
performance.rda-cache-limit: 40MB
performance.parallel-readdir: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
features.auto-commit-period: 7200
features.retention-mode: enterprise
features.default-retention-period: 2147483647
features.worm-file-level: disable
nfs.auth-cache-ttl-sec: 30
nfs.auth-refresh-interval-sec: 30
nfs.exports-auth-enable: off
performance.nfs.write-behind: off
performance.nl-cache: on
performance.open-behind: off
performance.quick-read: off
performance.io-cache: off
performance.read-ahead: off
performance.write-behind: off
server.event-threads: 8
client.event-threads: 8
performance.nfs.flush-behind: off
performance.cache-invalidation: true
performance.io-thread-count: 64
diagnostics.client-log-level: ERROR
diagnostics.brick-log-level: ERROR
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
transport.address-family: inet
nfs.disable: off

[root@K1 glusterfs]# gluster v status
Status of volume: volume_01
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick testk1:/mnt/brick01/bk                49152     0          Y       3223
Brick testk1:/mnt/brick02/bk                49153     0          Y       3253
Brick testk1:/mnt/brick03/bk                49154     0          Y       3283
Brick testk1:/mnt/brick04/bk                49155     0          Y       3313
Brick testk1:/mnt/brick05/bk                49156     0          Y       3343
Brick testk1:/mnt/brick06/bk                49157     0          Y       3570
Brick testk1:/mnt/brick07/bk                49158     0          Y       3600
Brick testk1:/mnt/brick08/bk                49159     0          Y       3630
Brick testk1:/mnt/brick09/bk                49160     0          Y       3660
Brick testk1:/mnt/brick10/bk                49161     0          Y       3690
NFS Server on localhost                     2049      0          Y       3842

Task Status of Volume volume_01
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : 5afe22d8-9906-4a76-93f3-40b8c699cb34
Status               : completed

[root@K1 glusterfs]# gluster --version
glusterfs 4.1.8
Repository revision: git://git.gluster.org/glusterfs.git
Copyright (c) 2006-2016 Red Hat, Inc.
<https://www.gluster.org/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser General Public License, version 3 or any later version (LGPLv3 or later), or the GNU General Public License, version 2 (GPLv2), in all cases as published by the Free Software Foundation.
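For reference, the add-brick and rebalance sequence in steps 4-6 would look like the following sketch (brick paths taken from the volume info above; standard gluster CLI forms, not the verbatim session):

gluster volume add-brick volume_01 testk1:/mnt/brick06/bk testk1:/mnt/brick07/bk testk1:/mnt/brick08/bk testk1:/mnt/brick09/bk testk1:/mnt/brick10/bk
gluster volume rebalance volume_01 start
gluster volume rebalance volume_01 status
# then inspect each brick directly, e.g. the first newly added one:
ls -R /mnt/brick06/bk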
I'll take a look and get back to you.
Hi,

Apologies for the delay, but I finally managed to spend some time on this. Here is what I have so far:

Release 4 is EOL, so I tried with release-5. I used FUSE, not NFS, and could not reproduce the rebalance issue: the contents of all directories were migrated to the new bricks. I did, however, see an issue where I could not list the directories from the FUSE mount immediately after they were created. This issue was not seen with parallel-readdir off.

[root@rhgs313-7 ~]# glusterd; gluster v create test 192.168.122.7:/bricks/brick1/t-{1..5} ; gluster v set test readdir-ahead on; gluster v set test parallel-readdir on; gluster v start test
volume create: test: success: please start the volume to access data
volume set: success
volume set: success
volume start: test: success
[root@rhgs313-7 ~]# mount -t glusterfs -s 192.168.122.7:/test /mnt/fuse1
[root@rhgs313-7 ~]# cd /mnt/fuse1/; mkdir dir_1; mkdir dir_1/dir_2; mkdir dir_1/dir_2/dir_3; mkdir dir_1/dir_2/dir_3/dir_4
[root@rhgs313-7 fuse1]# ll
total 0

On further analysis, this was happening because the stat information for the dirs received in dht_readdirp_cbk was invalid, which caused dht to strip those entries out of the listing. This was fixed by https://review.gluster.org/#/c/glusterfs/+/21811/ and is available from release-6 onwards. It is possible that the same issue occurred on your volume, so rebalance never processed these dirs. As the log level has been set to ERROR, there are no messages in the rebalance log that can be used to figure out what happened.

Please do the following:
1. Enable info-level logging for client-log-level, reproduce the issue, and send me the rebalance log (a sketch of the commands follows below).
2. Upgrade to release 6.x and see if you still see the issue.
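For step 1, the commands would be along these lines (a sketch; the log file name assumes the default /var/log/glusterfs location on the node that runs the rebalance):

gluster volume set volume_01 diagnostics.client-log-level INFO
gluster volume rebalance volume_01 start
gluster volume rebalance volume_01 status
# then attach this log:
# /var/log/glusterfs/volume_01-rebalance.log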
(In reply to Nithya Balachandran from comment #2)
> Hi,
>
> Apologies for the delay but I finally managed to spend some time on this.
> Here is what I have so far:
>
> Release 4 is EOL so I tried with release-5.

Apologies - 4 is not EOL yet. I retried the test above with the latest release-4.1 code and could not reproduce the rebalance problem. Please send the logs requested earlier and I will look into it.
I'm closing this with WorksForMe. Please reopen if you still see this in the latest releases.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days