Bug 1243786
| Summary: | DHT-rebalance: IO hangs on Fuse mount while rebalance is happening with a brick down and multiple add-bricks | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Triveni Rao <trao> |
| Component: | distribute | Assignee: | Nithya Balachandran <nbalacha> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | storage-qa-internal <storage-qa-internal> |
| Severity: | urgent | Docs Contact: | |
| Priority: | high | | |
| Version: | rhgs-3.1 | CC: | nbalacha, rcyriac, sashinde, tdesala |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | Rebalance | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-07-22 04:51:23 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1243358 | | |
| Bug Blocks: | | | |
This issue is not seen in gluster version 3.7.9-10. Here are the steps that were followed:

1. Created a 2x2 distributed-replicate volume.
2. Mounted the volume with the NFS protocol.
3. Started untarring a kernel package on the NFS mount.
4. Started lookups on the NFS mount using `find <file_name>` in a loop.
5. Added new bricks to the volume continuously, one after the other: brick-1, brick-2, brick-3.
6. Brought down one of the bricks from a replica pair by killing its process.
7. Started rebalance with force.
8. Rebalance started and completed successfully, with no issues seen in the infinite lookup or the untarring.

The same steps were then followed with the volume mounted over the FUSE protocol, and no issue was seen there either.
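For reference, the reproduction steps can be sketched as a shell script. This is a dry-run sketch, not the exact commands from this setup: the volume name, host names, brick paths, mount points, and tarball name below are hypothetical placeholders, and the `run` wrapper only prints each command instead of executing it (remove it to run against a real cluster).

```shell
#!/usr/bin/env bash
# Dry-run sketch of the reproduction procedure from this report.
# `run` only prints commands; volume/host/brick/mount names are placeholders.
run() { printf '+ %s\n' "$*"; }

VOL=testvol
H1=server1 H2=server2

# 1. Create and start a 2x2 distributed-replicate volume.
run gluster volume create "$VOL" replica 2 \
    "$H1:/rhs/brick1/g0" "$H2:/rhs/brick1/g0" \
    "$H1:/rhs/brick2/g0" "$H2:/rhs/brick2/g0"
run gluster volume start "$VOL"

# 2. Mount the same volume over FUSE and NFS.
run mount -t glusterfs "$H1:/$VOL" /mnt/fuse
run mount -t nfs -o vers=3 "$H1:/$VOL" /mnt/nfs

# 3./4. Untar a kernel tarball on the FUSE mount; loop lookups on NFS.
run tar -xf linux-4.1-rc7.tar.xz -C /mnt/fuse
run find /mnt/nfs -name '*.c'

# 5. Add brick pairs one after the other.
for i in 3 4 5 6; do
    run gluster volume add-brick "$VOL" "$H1:/rhs/brick$i/g0" "$H2:/rhs/brick$i/g0"
done

# 6./7. Kill one brick process from a replica pair, then force a rebalance.
run kill -9 '<brick-pid>'
run gluster volume rebalance "$VOL" start force
run gluster volume rebalance "$VOL" status
```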
Problem statement:
========================
DHT-rebalance: IO hangs on a FUSE mount while rebalance is happening with a brick down and multiple add-bricks.

Procedure:
===========
1. Create a 2x2 distributed-replicate volume.
2. Mount the same volume with both the FUSE and NFS protocols.
3. Run untarring of a kernel tarball on the FUSE mount.
4. Run lookups on the NFS mount, like `find <file_name>` in a loop.
5. Start adding bricks continuously, one after the other, e.g. add-brick b1 b1, add-brick b2 b2, add-brick b3 b3.
6. Bring down one of the bricks from a replica pair.
7. Start rebalance with force.
8. Observe the rebalance status, the lookups, and the untarring.

Actual result:
====================
The NFS lookups kept running with no issues, but the untarring hung some time after rebalance started.

Expected result:
=================
IO should continue without any issues.

Output:
===========
```
[root@rhsqa14-vm1 ~]# gluster v status
Status of volume: good
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.165:/rhs/brick1/g0           49152     0          Y       2720
Brick 10.70.47.163:/rhs/brick1/g0           49152     0          Y       545
Brick 10.70.47.165:/rhs/brick2/g0           49153     0          Y       2738
Brick 10.70.47.163:/rhs/brick2/g0           49153     0          Y       563
Snapshot Daemon on localhost                49154     0          Y       2919
NFS Server on localhost                     2049      0          Y       2929
Self-heal Daemon on localhost               N/A       N/A        Y       2765
Quota Daemon on localhost                   N/A       N/A        Y       2855
Snapshot Daemon on 10.70.47.163             49154     0          Y       711
NFS Server on 10.70.47.163                  2049      0          Y       720
Self-heal Daemon on 10.70.47.163            N/A       N/A        Y       591
Quota Daemon on 10.70.47.163                N/A       N/A        Y       662

Task Status of Volume good
------------------------------------------------------------------------------
There are no active volume tasks

[root@rhsqa14-vm1 ~]#
[root@rhsqa14-vm1 ~]# gluster v add-brick good 10.70.47.165:/rhs/brick3/g0 10.70.47.163:/rhs/brick3/g0
volume add-brick: success
[root@rhsqa14-vm1 ~]# gluster v add-brick good 10.70.47.165:/rhs/brick4/g0 10.70.47.163:/rhs/brick4/g0
volume add-brick: success
[root@rhsqa14-vm1 ~]# gluster v add-brick good 10.70.47.165:/rhs/brick5/g0 10.70.47.163:/rhs/brick5/g0
volume add-brick: success
[root@rhsqa14-vm1 ~]# gluster v add-brick good 10.70.47.165:/rhs/brick6/g0 10.70.47.163:/rhs/brick6/g0
volume add-brick: success
[root@rhsqa14-vm1 ~]# gluster v info

Volume Name: good
Type: Distributed-Replicate
Volume ID: f16949f1-9bf8-4926-9828-405c76a084da
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.47.165:/rhs/brick1/g0
Brick2: 10.70.47.163:/rhs/brick1/g0
Brick3: 10.70.47.165:/rhs/brick2/g0
Brick4: 10.70.47.163:/rhs/brick2/g0
Brick5: 10.70.47.165:/rhs/brick3/g0
Brick6: 10.70.47.163:/rhs/brick3/g0
Brick7: 10.70.47.165:/rhs/brick4/g0
Brick8: 10.70.47.163:/rhs/brick4/g0
Brick9: 10.70.47.165:/rhs/brick5/g0
Brick10: 10.70.47.163:/rhs/brick5/g0
Brick11: 10.70.47.165:/rhs/brick6/g0
Brick12: 10.70.47.163:/rhs/brick6/g0
Options Reconfigured:
features.uss: enable
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
cluster.min-free-disk: 10
performance.readdir-ahead: on
[root@rhsqa14-vm1 ~]# gluster v status
Status of volume: good
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.165:/rhs/brick1/g0           49152     0          Y       2720
Brick 10.70.47.163:/rhs/brick1/g0           49152     0          Y       545
Brick 10.70.47.165:/rhs/brick2/g0           49153     0          Y       2738
Brick 10.70.47.163:/rhs/brick2/g0           49153     0          Y       563
Brick 10.70.47.165:/rhs/brick3/g0           49155     0          Y       3441
Brick 10.70.47.163:/rhs/brick3/g0           49155     0          Y       1912
Brick 10.70.47.165:/rhs/brick4/g0           49156     0          Y       3577
Brick 10.70.47.163:/rhs/brick4/g0           49156     0          Y       2090
Brick 10.70.47.165:/rhs/brick5/g0           49157     0          Y       3689
Brick 10.70.47.163:/rhs/brick5/g0           49157     0          Y       2210
Brick 10.70.47.165:/rhs/brick6/g0           49158     0          Y       3803
Brick 10.70.47.163:/rhs/brick6/g0           49158     0          Y       2413
Snapshot Daemon on localhost                49154     0          Y       2919
NFS Server on localhost                     2049      0          Y       3823
Self-heal Daemon on localhost               N/A       N/A        Y       3831
Quota Daemon on localhost                   N/A       N/A        Y       3850
Snapshot Daemon on 10.70.47.163             49154     0          Y       711
NFS Server on 10.70.47.163                  2049      0          Y       2437
Self-heal Daemon on 10.70.47.163            N/A       N/A        Y       2445
Quota Daemon on 10.70.47.163                N/A       N/A        Y       2455

Task Status of Volume good
------------------------------------------------------------------------------
There are no active volume tasks

[root@rhsqa14-vm1 ~]# kill -9 2720
[root@rhsqa14-vm1 ~]#
[root@rhsqa14-vm1 ~]# gluster v status
Status of volume: good
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.165:/rhs/brick1/g0           N/A       N/A        N       N/A
Brick 10.70.47.163:/rhs/brick1/g0           49152     0          Y       545
Brick 10.70.47.165:/rhs/brick2/g0           49153     0          Y       2738
Brick 10.70.47.163:/rhs/brick2/g0           49153     0          Y       563
Brick 10.70.47.165:/rhs/brick3/g0           49155     0          Y       3441
Brick 10.70.47.163:/rhs/brick3/g0           49155     0          Y       1912
Brick 10.70.47.165:/rhs/brick4/g0           49156     0          Y       3577
Brick 10.70.47.163:/rhs/brick4/g0           49156     0          Y       2090
Brick 10.70.47.165:/rhs/brick5/g0           49157     0          Y       3689
Brick 10.70.47.163:/rhs/brick5/g0           49157     0          Y       2210
Brick 10.70.47.165:/rhs/brick6/g0           49158     0          Y       3803
Brick 10.70.47.163:/rhs/brick6/g0           49158     0          Y       2413
Snapshot Daemon on localhost                49154     0          Y       2919
NFS Server on localhost                     2049      0          Y       3823
Self-heal Daemon on localhost               N/A       N/A        Y       3831
Quota Daemon on localhost                   N/A       N/A        Y       3850
Snapshot Daemon on 10.70.47.163             49154     0          Y       711
NFS Server on 10.70.47.163                  2049      0          Y       2437
Self-heal Daemon on 10.70.47.163            N/A       N/A        Y       2445
Quota Daemon on 10.70.47.163                N/A       N/A        Y       2455

Task Status of Volume good
------------------------------------------------------------------------------
There are no active volume tasks

[root@rhsqa14-vm1 ~]# gluster v rebalance good start force
volume rebalance: good: success: Rebalance on good has been started successfully. Use rebalance status command to check status of the rebalance process.
```
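The brick was brought down above with a hard-coded PID (`kill -9 2720`). A small parse of the `gluster v status` output can locate a given brick's PID instead; this is a sketch, using a trimmed copy of the status output from this report as sample input (on a live system you would pipe `gluster v status` in).

```shell
# Sketch: extract a brick's PID from `gluster v status` output instead of
# hard-coding it. status_sample is a trimmed copy of the output above.
status_sample='Brick 10.70.47.165:/rhs/brick1/g0           49152     0          Y       2720
Brick 10.70.47.163:/rhs/brick1/g0           49152     0          Y       545
Brick 10.70.47.165:/rhs/brick2/g0           49153     0          Y       2738'

brick_pid() {
    # $1 = "host:/brick/path"; prints the PID column for that brick row.
    awk -v b="$1" '$1 == "Brick" && $2 == b { print $NF }'
}

pid=$(printf '%s\n' "$status_sample" | brick_pid '10.70.47.165:/rhs/brick1/g0')
echo "$pid"    # prints 2720 for this sample
# On the live system: kill -9 "$pid"  (not run here)
```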
```
ID: 1088d13b-aefb-4ea6-ba0f-87c339fec1bf
[root@rhsqa14-vm1 ~]# gluster v rebalance good status
        Node  Rebalanced-files        size     scanned    failures     skipped       status   run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  -----------   ----------------
   localhost                27     644.2KB         109           0           0  in progress               8.00
10.70.47.163                 0      0Bytes         113           0          60  in progress               8.00
volume rebalance: good: success:
[root@rhsqa14-vm1 ~]# gluster v rebalance good status
        Node  Rebalanced-files        size     scanned    failures     skipped       status   run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  -----------   ----------------
   localhost               118       1.0MB         450           0           0  in progress              30.00
10.70.47.163                 0      0Bytes         369           0         214  in progress              30.00
volume rebalance: good: success:
[root@rhsqa14-vm1 ~]# gluster v rebalance good status
        Node  Rebalanced-files        size     scanned    failures     skipped       status   run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  -----------   ----------------
   localhost               293       2.7MB         846           0           0  in progress              74.00
10.70.47.163                 0      0Bytes         610           0         374  in progress              74.00
volume rebalance: good: success:
[root@rhsqa14-vm1 ~]# gluster v rebalance good status
        Node  Rebalanced-files        size     scanned    failures     skipped       status   run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  -----------   ----------------
   localhost               529       3.1MB        1409           0           0  in progress             132.00
10.70.47.163                 0      0Bytes        1041           0         621  in progress             133.00
volume rebalance: good: success:
[root@rhsqa14-vm1 ~]# gluster v rebalance good status
        Node  Rebalanced-files        size     scanned    failures     skipped       status   run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  -----------   ----------------
   localhost               793       3.5MB        2199           0           0  in progress             199.00
10.70.47.163                 0      0Bytes        1572           0         905  in progress             199.00
volume rebalance: good: success:
[root@rhsqa14-vm1 ~]# gluster v rebalance good status
        Node  Rebalanced-files        size     scanned    failures     skipped       status   run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  -----------   ----------------
   localhost              1520       8.5MB        5080           0           0  in progress             372.00
10.70.47.163                 1      0Bytes        3079           0        1514  in progress             372.00
volume rebalance: good: success:
[root@rhsqa14-vm1 ~]#
[root@rhsqa14-vm1 ~]# gluster v rebalance good status
        Node  Rebalanced-files        size     scanned    failures     skipped       status   run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  -----------   ----------------
   localhost              1766       9.6MB        6166           0           0  in progress             420.00
10.70.47.163                 1      0Bytes        3503           0        1692  in progress             420.00
volume rebalance: good: success:
[root@rhsqa14-vm1 ~]# gluster v rebalance good status
        Node  Rebalanced-files        size     scanned    failures     skipped       status   run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  -----------   ----------------
   localhost              4043      23.5MB       24452           0           0    completed             905.00
10.70.47.163                 1      0Bytes        7452           0        2221    completed             713.00
volume rebalance: good: success:
[root@rhsqa14-vm1 ~]#

linux-4.1-rc7/drivers/net/ethernet/intel/e1000/e1000_osdep.h
linux-4.1-rc7/drivers/net/ethernet/intel/e1000/e1000_param.c
linux-4.1-rc7/drivers/net/ethernet/intel/e1000e/
linux-4.1-rc7/drivers/net/ethernet/intel/e1000e/80003es2lan.c
linux-4.1-rc7/drivers/net/ethernet/intel/e1000e/80003es2lan.h
linux-4.1-rc7/drivers/net/ethernet/intel/e1000e/82571.c
linux-4.1-rc7/drivers/net/ethernet/intel/e1000e/82571.h
linux-4.1-rc7/drivers/net/ethernet/intel/e1000e/Makefile
linux-4.1-rc7/drivers/net/ethernet/intel/e1000e/defines.h
linux-4.1-rc7/drivers/net/ethernet/intel/e1000e/e1000.h
linux-4.1-rc7/drivers/net/ethernet/intel/e1000e/ethtool.c
linux-4.1-rc7/drivers/net/ethernet/intel/e1000e/hw.h
linux-4.1-rc7/drivers/net/ethernet/intel/e1000e/ich8lan.c
linux-4.1-rc7/drivers/net/ethernet/intel/e1000e/ich8lan.h
linux-4.1-rc7/drivers/net/ethernet/intel/e1000e/mac.c
linux-4.1-rc7/drivers/net/ethernet/intel/e1000e/mac.h
linux-4.1-rc7/drivers/net/ethernet/intel/e1000e/manage.c
linux-4.1-rc7/drivers/net/ethernet/intel/e1000e/manage.h
```
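In the status output above, node 10.70.47.163 migrates almost nothing while its skipped count keeps climbing. A small awk filter over `gluster v rebalance <vol> status` rows can flag such nodes; this is a sketch, and the sample rows below are copied from this report rather than from a live command.

```shell
# Sketch: flag rebalance nodes that have skipped many files but migrated
# almost none. Column layout assumed to match the status table above:
# $1=node $2=rebalanced-files $3=size $4=scanned $5=failures $6=skipped
sample='localhost 1766 9.6MB 6166 0 0 in progress 420.00
10.70.47.163 1 0Bytes 3503 0 1692 in progress 420.00'

flagged=$(printf '%s\n' "$sample" |
    awk '$2 <= 1 && $6 > 0 { print $1, "migrated", $2, "skipped", $6 }')
echo "$flagged"    # prints: 10.70.47.163 migrated 1 skipped 1692
```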
```
linux-4.1-rc7/drivers/net/ethernet/intel/e1000e/netdev.c
linux-4.1-rc7/drivers/net/ethernet/intel/e1000e/nvm.c
linux-4.1-rc7/drivers/net/ethernet/intel/e1000e/nvm.h
linux-4.1-rc7/drivers/net/ethernet/intel/e1000e/param.c
linux-4.1-rc7/drivers/net/ethernet/intel/e1000e/phy.c
linux-4.1-rc7/drivers/net/ethernet/intel/e1000e/phy.h
linux-4.1-rc7/drivers/net/ethernet/intel/e1000e/ptp.c
linux-4.1-rc7/drivers/net/ethernet/intel/e1000e/regs.h
linux-4.1-rc7/drivers/net/ethernet/intel/fm10k/
linux-4.1-rc7/drivers/net/ethernet/intel/fm10k/Makefile
linux-4.1-rc7/drivers/net/ethernet/intel/fm10k/fm10k.h
linux-4.1-rc7/drivers/net/ethernet/intel/fm10k/fm10k_common.c
linux-4.1-rc7/drivers/net/ethernet/intel/fm10k/fm10k_common.h
linux-4.1-rc7/drivers/net/ethernet/intel/fm10k/fm10k_dcbnl.c
linux-4.1-rc7/drivers/net/ethernet/intel/fm10k/fm10k_debugfs.c
linux-4.1-rc7/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
linux-4.1-rc7/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
linux-4.1-rc7/drivers/net/ethernet/intel/fm10k/fm10k_main.c
```

```
[root@rhsqa14-vm1 ~]# less /var/log/glusterfs/good-rebalance.log
[2015-07-16 08:16:17.948316] I [MSGID: 100030] [glusterfsd.c:2301:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.1 (args: /usr/sbin/glusterfs -s localhost --volfile-id rebalance/good --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *replicate*.readdir-failover=off --xlator-option *dht.readdir-optimize=on --xlator-option *dht.rebalance-cmd=5 --xlator-option *dht.node-uuid=c2746cfb-5132-4289-bb04-d118619630c1 --xlator-option *dht.commit-hash=2906342028 --socket-file /var/run/gluster/gluster-rebalance-f16949f1-9bf8-4926-9828-405c76a084da.sock --pid-file /var/lib/glusterd/vols/good/rebalance/c2746cfb-5132-4289-bb04-d118619630c1.pid -l /var/log/glusterfs/good-rebalance.log)
[2015-07-16 08:16:17.977821] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-07-16 08:16:22.976641] I [MSGID: 101173] [graph.c:271:gf_add_cmdline_options] 0-good-dht: adding option 'commit-hash' for volume 'good-dht' with value '2906342028'
[2015-07-16 08:16:22.976668] I [MSGID: 101173] [graph.c:271:gf_add_cmdline_options] 0-good-dht: adding option 'node-uuid' for volume 'good-dht' with value 'c2746cfb-5132-4289-bb04-d118619630c1'
[2015-07-16 08:16:22.976686] I [MSGID: 101173] [graph.c:271:gf_add_cmdline_options] 0-good-dht: adding option 'rebalance-cmd' for volume 'good-dht' with value '5'
[2015-07-16 08:16:22.976699] I [MSGID: 101173] [graph.c:271:gf_add_cmdline_options] 0-good-dht: adding option 'readdir-optimize' for volume 'good-dht' with value 'on'
[2015-07-16 08:16:22.976714] I [MSGID: 101173] [graph.c:271:gf_add_cmdline_options] 0-good-dht: adding option 'assert-no-child-down' for volume 'good-dht' with value 'yes'
[2015-07-16 08:16:22.976726] I [MSGID: 101173] [graph.c:271:gf_add_cmdline_options] 0-good-dht: adding option 'lookup-unhashed' for volume 'good-dht' with value 'yes'
[2015-07-16 08:16:22.976737] I [MSGID: 101173] [graph.c:271:gf_add_cmdline_options] 0-good-dht: adding option 'use-readdirp' for volume 'good-dht' with value 'yes'
[2015-07-16 08:16:22.976752] I [MSGID: 101173] [graph.c:271:gf_add_cmdline_options] 0-good-replicate-5: adding option 'readdir-failover' for volume 'good-replicate-5' with value 'off'
[2015-07-16 08:16:22.976774] I [MSGID: 101173] [graph.c:271:gf_add_cmdline_options] 0-good-replicate-5: adding option 'entry-self-heal' for volume 'good-replicate-5' with value 'off'
[2015-07-16 08:16:22.976788] I [MSGID: 101173] [graph.c:271:gf_add_cmdline_options] 0-good-replicate-5: adding option 'metadata-self-heal' for volume 'good-replicate-5' with value 'off'
[2015-07-16 08:16:22.976800] I [MSGID: 101173] [graph.c:271:gf_add_cmdline_options] 0-good-replicate-5: adding option 'data-self-heal' for volume 'good-replicate-5' with value 'off'
[2015-07-16 08:16:22.976814] I [MSGID: 101173] [graph.c:271:gf_add_cmdline_options] 0-good-replicate-4: adding option 'readdir-failover' for volume 'good-replicate-4' with value 'off'
[2015-07-16 08:16:22.976826] I [MSGID: 101173] [graph.c:271:gf_add_cmdline_options] 0-good-replicate-4: adding option 'entry-self-heal' for volume 'good-replicate-4' with value 'off'
[2015-07-16 08:16:22.976837] I [MSGID: 101173] [graph.c:271:gf_add_cmdline_options] 0-good-replicate-4: adding option 'metadata-self-heal' for volume 'good-replicate-4' with value 'off'
[2015-07-16 08:16:22.976849] I [MSGID: 101173] [graph.c:271:gf_add_cmdline_options] 0-good-replicate-4: adding option 'data-self-heal' for volume 'good-replicate-4' with value 'off'
[2015-07-16 08:16:22.976863] I [MSGID: 101173] [graph.c:271:gf_add_cmdline_options] 0-good-replicate-3: adding option 'readdir-failover' for volume 'good-replicate-3' with value 'off'
[2015-07-16 08:16:22.976875] I [MSGID: 101173] [graph.c:271:gf_add_cmdline_options] 0-good-replicate-3: adding option 'entry-self-heal' for volume 'good-replicate-3' with value 'off'
[2015-07-16 08:16:22.976893] I [MSGID: 101173] [graph.c:271:gf_add_cmdline_options] 0-good-replicate-3: adding option 'metadata-self-heal' for volume 'good-replicate-3' with value 'off'
[2015-07-16 08:16:22.976905] I [MSGID: 101173] [graph.c:271:gf_add_cmdline_options] 0-good-replicate-3: adding option 'data-self-heal' for volume 'good-replicate-3' with value 'off'
[2015-07-16 08:16:22.976919] I [MSGID: 101173] [graph.c:271:gf_add_cmdline_options] 0-good-replicate-2: adding option 'readdir-failover' for volume 'good-replicate-2' with value 'off'
[2015-07-16 08:16:22.976931] I [MSGID: 101173] [graph.c:271:gf_add_cmdline_options] 0-good-replicate-2: adding option 'entry-self-heal' for volume 'good-replicate-2' with value 'off'
[2015-07-16 08:16:22.976943] I [MSGID: 101173] [graph.c:271:gf_add_cmdline_options] 0-good-replicate-2: adding option 'metadata-self-heal' for volume 'good-replicate-2' with value 'off'
[2015-07-16 08:16:22.976954] I [MSGID: 101173] [graph.c:271:gf_add_cmdline_options] 0-good-replicate-2: adding option 'data-self-heal' for volume 'good-replicate-2' with value 'off'
[2015-07-16 08:16:22.976968] I [MSGID: 101173] [graph.c:271:gf_add_cmdline_options] 0-good-replicate-1: adding option 'readdir-failover' for volume 'good-replicate-1' with value 'off'
[2015-07-16 08:16:22.976980] I [MSGID: 101173] [graph.c:271:gf_add_cmdline_options] 0-good-replicate-1: adding option 'entry-self-heal' for volume 'good-replicate-1' with value 'off'
[2015-07-16 08:16:22.976992] I [MSGID: 101173] [graph.c:271:gf_add_cmdline_options] 0-good-replicate-1: adding option 'metadata-self-heal' for volume 'good-replicate-1' with value 'off'
[2015-07-16 08:16:22.977004] I [MSGID: 101173] [graph.c:271:gf_add_cmdline_options] 0-good-replicate-1: adding option 'data-self-heal' for volume 'good-replicate-1' with value 'off'
[2015-07-16 08:16:22.977018] I [MSGID: 101173] [graph.c:271:gf_add_cmdline_options] 0-good-replicate-0: adding option 'readdir-failover' for volume 'good-replicate-0' with value 'off'
[2015-07-16 08:16:22.977029] I [MSGID: 101173] [graph.c:271:gf_add_cmdline_options] 0-good-replicate-0: adding option 'entry-self-heal' for volume 'good-replicate-0' with value 'off'
[2015-07-16 08:16:22.977041] I [MSGID: 101173] [graph.c:271:gf_add_cmdline_options] 0-good-replicate-0: adding option 'metadata-self-heal' for volume 'good-replicate-0' with value 'off'
[2015-07-16 08:16:22.977053] I [MSGID: 101173] [graph.c:271:gf_add_cmdline_options] 0-good-replicate-0: adding option 'data-self-heal' for volume 'good-replicate-0' with value 'off'
[2015-07-16 08:16:22.977826] I [MSGID: 109104] [dht-shared.c:607:dht_init] 0-good-dht: dht_init using commit hash 2906342028
[2015-07-16 08:16:23.002575] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2015-07-16 08:16:23.008072] W [MSGID: 101174] [graph.c:362:_log_if_unknown_option] 0-good-replicate-5: option 'readdir-failover' is not recognized
[2015-07-16 08:16:23.008117] W [MSGID: 101174] [graph.c:362:_log_if_unknown_option] 0-good-replicate-4: option 'readdir-failover' is not recognized
[2015-07-16 08:16:23.008157] W [MSGID: 101174] [graph.c:362:_log_if_unknown_option] 0-good-replicate-3: option 'readdir-failover' is not recognized
[2015-07-16 08:16:23.008186] W [MSGID: 101174] [graph.c:362:_log_if_unknown_option] 0-good-replicate-2: option 'readdir-failover' is not recognized
[2015-07-16 08:16:23.008213] W [MSGID: 101174] [graph.c:362:_log_if_unknown_option] 0-good-replicate-1: option 'readdir-failover' is not recognized
[2015-07-16 08:16:23.008239] W [MSGID: 101174] [graph.c:362:_log_if_unknown_option] 0-good-replicate-0: option 'readdir-failover' is not recognized
[2015-07-16 08:16:23.008386] I [MSGID: 114020] [client.c:2118:notify] 0-good-client-0: parent translators are ready, attempting connect on transport
[2015-07-16 08:16:23.012139] I [MSGID: 114020] [client.c:2118:notify] 0-good-client-1: parent translators are ready, attempting connect on transport
[2015-07-16 08:16:23.012655] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-good-client-0: changing port to 49152 (from 0)
[2015-07-16 08:16:23.015625] I [MSGID: 114020] [client.c:2118:notify] 0-good-client-2: parent translators are ready, attempting connect on transport
[2015-07-16 08:16:23.019025] E [socket.c:2332:socket_connect_finish] 0-good-client-0: connection to 10.70.47.165:49152 failed (Connection refused)
[2015-07-16 08:16:23.022314] I [MSGID: 114020] [client.c:2118:notify] 0-good-client-3: parent translators are ready, attempting connect on transport
[2015-07-16 08:16:23.022574] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-good-client-2: changing port to 49153 (from 0)
[2015-07-16 08:16:23.025647] I [MSGID: 114020] [client.c:2118:notify] 0-good-client-4: parent translators are ready, attempting connect on transport
[2015-07-16 08:16:23.029788] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-good-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330)
...skipping... 1
[2015-07-16 08:31:34.094943] I [MSGID: 109018] [dht-common.c:814:dht_revalidate_cbk] 0-good-dht: Mismatching layouts for /linux-4.1-rc7/drivers/net/ethernet/intel/e1000e, gfid = 7fe6d32f-c742-4447-ab25-70a7fa89f72a
[2015-07-16 08:31:34.096168] I [MSGID: 109064] [dht-layout.c:823:dht_layout_dir_mismatch] 0-good-dht: subvol: good-replicate-4; inode layout - 3400826014 - 4294967295 - 1; disk layout - 3578999630 - 4294967295 - 1
[2015-07-16 08:31:34.096259] I [MSGID: 109018] [dht-common.c:814:dht_revalidate_cbk] 0-good-dht: Mismatching layouts for /linux-4.1-rc7/drivers/net/ethernet/intel/e1000e, gfid = 7fe6d32f-c742-4447-ab25-70a7fa89f72a
[2015-07-16 08:31:34.096373] I [MSGID: 109064] [dht-layout.c:823:dht_layout_dir_mismatch] 0-good-dht: subvol: good-replicate-3; inode layout - 2506928916 - 3400826013 - 1; disk layout - 2863199704 - 3578999629 - 1
[2015-07-16 08:31:34.096407] I [MSGID: 109018] [dht-common.c:814:dht_revalidate_cbk] 0-good-dht: Mismatching layouts for /linux-4.1-rc7/drivers/net/ethernet/intel/e1000e, gfid = 7fe6d32f-c742-4447-ab25-70a7fa89f72a
[2015-07-16 08:31:34.096837] I [MSGID: 109064] [dht-layout.c:823:dht_layout_dir_mismatch] 0-good-dht: subvol: good-replicate-5; inode layout - 0 - 893897097 - 1; disk layout - 0 - 715799925 - 1
[2015-07-16 08:31:34.096879] I [MSGID: 109018] [dht-common.c:814:dht_revalidate_cbk] 0-good-dht: Mismatching layouts for /linux-4.1-rc7/drivers/net/ethernet/intel/e1000e, gfid = 7fe6d32f-c742-4447-ab25-70a7fa89f72a
[2015-07-16 08:31:34.103251] I [dht-rebalance.c:2119:gf_defrag_process_dir] 0-good-dht: migrate data called on /linux-4.1-rc7/drivers/net/ethernet/intel/e1000e
[2015-07-16 08:31:34.207424] I [dht-rebalance.c:2328:gf_defrag_process_dir] 0-good-dht: Migration operation on dir /linux-4.1-rc7/drivers/net/ethernet/intel/e1000e took 0.10 secs
[2015-07-16 08:31:34.250153] I [MSGID: 109081] [dht-common.c:3783:dht_setxattr] 0-good-dht: fixing the layout of /linux-4.1-rc7/drivers/net/ethernet/intel/fm10k
[2015-07-16 08:31:34.250204] I [MSGID: 109045] [dht-selfheal.c:1496:dht_fix_layout_of_directory] 0-good-dht: subvolume 0 (good-replicate-0): 20480 chunks
[2015-07-16 08:31:34.250284] I [MSGID: 109045] [dht-selfheal.c:1496:dht_fix_layout_of_directory] 0-good-dht: subvolume 1 (good-replicate-1): 20480 chunks
[2015-07-16 08:31:34.250308] I [MSGID: 109045] [dht-selfheal.c:1496:dht_fix_layout_of_directory] 0-good-dht: subvolume 2 (good-replicate-2): 50914 chunks
[2015-07-16 08:31:34.250331] I [MSGID: 109045] [dht-selfheal.c:1496:dht_fix_layout_of_directory] 0-good-dht: subvolume 3 (good-replicate-3): 50914 chunks
[2015-07-16 08:31:34.250364] I [MSGID: 109045] [dht-selfheal.c:1496:dht_fix_layout_of_directory] 0-good-dht: subvolume 4 (good-replicate-4): 50914 chunks
[2015-07-16 08:31:34.250387] I [MSGID: 109045] [dht-selfheal.c:1496:dht_fix_layout_of_directory] 0-good-dht: subvolume 5 (good-replicate-5): 50914 chunks
[2015-07-16 08:31:34.266859] I [MSGID: 109064] [dht-layout.c:823:dht_layout_dir_mismatch] 0-good-dht: subvol: good-replicate-0; inode layout - 2681691294 - 3041258653 - 1; disk layout - 2147399778 - 2863199703 - 1
[2015-07-16 08:31:34.266907] I [MSGID: 109018] [dht-common.c:814:dht_revalidate_cbk] 0-good-dht: Mismatching layouts for /linux-4.1-rc7/drivers/net/ethernet/intel/fm10k, gfid = 34abab2b-80d2-45e3-a293-d52f65ad4e5f
[2015-07-16 08:31:34.268559] I [MSGID: 109064] [dht-layout.c:823:dht_layout_dir_mismatch] 0-good-dht: subvol: good-replicate-2; inode layout - 3400826014 - 4294967295 - 1; disk layout - 3578999630 - 4294967295 - 1
[2015-07-16 08:31:34.268604] I [MSGID: 109018] [dht-common.c:814:dht_revalidate_cbk] 0-good-dht: Mismatching layouts for /linux-4.1-rc7/drivers/net/ethernet/intel/fm10k, gfid = 34abab2b-80d2-45e3-a293-d52f65ad4e5f
[2015-07-16 08:31:34.268733] I [MSGID: 109064] [dht-layout.c:823:dht_layout_dir_mismatch] 0-good-dht: subvol: good-replicate-1; inode layout - 3041258654 - 3400826013 - 1; disk layout - 2863199704 - 3578999629 - 1
[2015-07-16 08:31:34.268766] I [MSGID: 109018] [dht-common.c:814:dht_revalidate_cbk] 0-good-dht: Mismatching layouts for /linux-4.1-rc7/drivers/net/ethernet/intel/fm10k, gfid = 34abab2b-80d2-45e3-a293-d52f65ad4e5f
[2015-07-16 08:31:34.268860] I [MSGID: 109064] [dht-layout.c:823:dht_layout_dir_mismatch] 0-good-dht: subvol: good-replicate-4; inode layout - 893897098 - 1787794195 - 1; disk layout - 715799926 - 1431599851 - 1
[2015-07-16 08:31:34.268891] I [MSGID: 109018] [dht-common.c:814:dht_revalidate_cbk] 0-good-dht: Mismatching layouts for /linux-4.1-rc7/drivers/net/ethernet/intel/fm10k, gfid = 34abab2b-80d2-45e3-a293-d52f65ad4e5f
[2015-07-16 08:31:34.269633] I [MSGID: 109064] [dht-layout.c:823:dht_layout_dir_mismatch] 0-good-dht: subvol: good-replicate-3; inode layout - 0 - 893897097 - 1; disk layout - 0 - 715799925 - 1
[2015-07-16 08:31:34.269688] I [MSGID: 109018] [dht-common.c:814:dht_revalidate_cbk] 0-good-dht: Mismatching layouts for /linux-4.1-rc7/drivers/net/ethernet/intel/fm10k, gfid = 34abab2b-80d2-45e3-a293-d52f65ad4e5f
[2015-07-16 08:31:34.269851] I [MSGID: 109064] [dht-layout.c:823:dht_layout_dir_mismatch] 0-good-dht: subvol: good-replicate-5; inode layout - 1787794196 - 2681691293 - 1; disk layout - 1431599852 - 2147399777 - 1
[2015-07-16 08:31:34.269896] I [MSGID: 109018] [dht-common.c:814:dht_revalidate_cbk] 0-good-dht: Mismatching layouts for /linux-4.1-rc7/drivers/net/ethernet/intel/fm10k, gfid = 34abab2b-80d2-45e3-a293-d52f65ad4e5f
[2015-07-16 08:31:34.274803] I [dht-rebalance.c:2119:gf_defrag_process_dir] 0-good-dht: migrate data called on /linux-4.1-rc7/drivers/net/ethernet/intel/fm10k
[2015-07-16 08:31:34.319042] I [dht-rebalance.c:2328:gf_defrag_process_dir] 0-good-dht: Migration operation on dir /linux-4.1-rc7/drivers/net/ethernet/intel/fm10k took 0.04 secs
[2015-07-16 08:31:34.416991] I [dht-rebalance.c:2809:gf_defrag_start_crawl] 0-DHT: crawling file-system completed
[2015-07-16 08:31:34.417240] I [dht-rebalance.c:1764:gf_defrag_task] 0-DHT: Thread wokeup. defrag->current_thread_count: 3
[2015-07-16 08:31:34.417714] I [dht-rebalance.c:1764:gf_defrag_task] 0-DHT: Thread wokeup. defrag->current_thread_count: 4
[2015-07-16 08:31:34.417927] I [MSGID: 109028] [dht-rebalance.c:3029:gf_defrag_status_get] 0-good-dht: Rebalance is completed. Time taken is 905.00 secs
[2015-07-16 08:31:34.417968] I [MSGID: 109028] [dht-rebalance.c:3033:gf_defrag_status_get] 0-good-dht: Files migrated: 4043, size: 24647589, lookups: 24452, failures: 0, skipped: 0
[2015-07-16 08:31:34.418667] W [glusterfsd.c:1219:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x3fbcc07a51) [0x7f92bcd37a51] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xd5) [0x7f92be116075] -->/usr/sbin/glusterfs(cleanup_and_exit+0x71) [0x7f92be115ba1] ) 0-: received signum (15), shutting down
```

```
(gdb) info threads
  9 Thread 0x7ffb87423700 (LWP 2440)  0x00007ffb8f200fbd in nanosleep () from /lib64/libpthread.so.0
  8 Thread 0x7ffb86a22700 (LWP 2441)  0x00007ffb8f201535 in sigwait () from /lib64/libpthread.so.0
  7 Thread 0x7ffb86021700 (LWP 2442)  0x00007ffb8f1fda0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  6 Thread 0x7ffb85620700 (LWP 2443)  0x00007ffb8eb730ce in __lll_lock_wait_private () from /lib64/libc.so.6
  5 Thread 0x7ffb833ed700 (LWP 2445)  0x00007ffb8eb730ce in __lll_lock_wait_private () from /lib64/libc.so.6
  4 Thread 0x7ffb80e48700 (LWP 2446)  0x00007ffb8eb63f63 in epoll_wait () from /lib64/libc.so.6
  3 Thread 0x7ffb7aaa3700 (LWP 2449)  0x00007ffb8eb5b9b7 in readv () from /lib64/libc.so.6
  2 Thread 0x7ffb7a0a2700 (LWP 2450)  0x00007ffb8f2007dd in read () from /lib64/libpthread.so.0
* 1 Thread 0x7ffb905bc740 (LWP 2439)  0x00007ffb8f1fa2ad in pthread_join () from /lib64/libpthread.so.0
(gdb) t 6
[Switching to thread 6 (Thread 0x7ffb85620700 (LWP 2443))]#0  0x00007ffb8eb730ce in __lll_lock_wait_private () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffb8eb730ce in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x00007ffb8eaf7f7e in _L_lock_5746 () from /lib64/libc.so.6
#2  0x00007ffb8eaf3a8b in _int_free () from /lib64/libc.so.6
#3  0x00007ffb9015a578 in synctask_destroy () from /usr/lib64/libglusterfs.so.0
#4  0x00007ffb9015aa90 in syncenv_processor () from /usr/lib64/libglusterfs.so.0
#5  0x00007ffb8f1f9a51 in start_thread () from /lib64/libpthread.so.0
#6  0x00007ffb8eb6396d in clone () from /lib64/libc.so.6
(gdb) t 7
[Switching to thread 7 (Thread 0x7ffb86021700 (LWP 2442))]#0  0x00007ffb8f1fda0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007ffb8f1fda0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007ffb90155c8b in syncenv_task () from /usr/lib64/libglusterfs.so.0
#2  0x00007ffb9015aaa0 in syncenv_processor () from /usr/lib64/libglusterfs.so.0
#3  0x00007ffb8f1f9a51 in start_thread () from /lib64/libpthread.so.0
#4  0x00007ffb8eb6396d in clone () from /lib64/libc.so.6
(gdb) t 5
[Switching to thread 5 (Thread 0x7ffb833ed700 (LWP 2445))]#0  0x00007ffb8eb730ce in __lll_lock_wait_private () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffb8eb730ce in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x00007ffb8eaf829f in _L_lock_9730 () from /lib64/libc.so.6
#2  0x00007ffb8eaf588b in calloc () from /lib64/libc.so.6
#3  0x00007ffb9015669e in ?? () from /usr/lib64/libglusterfs.so.0
#4  0x00007ffb90156cc8 in synctask_create () from /usr/lib64/libglusterfs.so.0
#5  0x00007ffb9015a749 in synctask_new1 () from /usr/lib64/libglusterfs.so.0
#6  0x00007ffb8258ee32 in afr_inode_refresh_done () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/replicate.so
#7  0x00007ffb8258f901 in afr_inode_refresh_subvol_cbk () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/replicate.so
#8  0x00007ffb827cd0c7 in client3_3_lookup_cbk () from /usr/lib64/glusterfs/3.7.1/xlator/protocol/client.so
#9  0x00007ffb8fedd445 in rpc_clnt_handle_reply () from /usr/lib64/libgfrpc.so.0
#10 0x00007ffb8fede8f2 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0
#11 0x00007ffb8fed9ad8 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
#12 0x00007ffb84a14255 in ?? () from /usr/lib64/glusterfs/3.7.1/rpc-transport/socket.so
#13 0x00007ffb84a15e2d in ?? () from /usr/lib64/glusterfs/3.7.1/rpc-transport/socket.so
#14 0x00007ffb901729b0 in ?? () from /usr/lib64/libglusterfs.so.0
#15 0x00007ffb8f1f9a51 in start_thread () from /lib64/libpthread.so.0
#16 0x00007ffb8eb6396d in clone () from /lib64/libc.so.6
(gdb)
```
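In the gdb session above, threads 5 and 6 are both blocked in `__lll_lock_wait_private` inside glibc's allocator paths (`_int_free` via `synctask_destroy`, and `calloc` via `synctask_create`). When a process is stuck like this, the same backtraces can be captured non-interactively with gdb's batch mode; the snippet below only builds and prints the invocation (the PID is a placeholder taken from the session above, and would normally come from `ps` on the hung glusterfs process).

```shell
# Sketch: build a non-interactive gdb command line to capture all thread
# backtraces from a hung process. PID 2439 is a placeholder (LWP of the
# main thread in the session above); this snippet only prints the command.
pid=2439

gdb_cmd="gdb --batch -p $pid -ex 'set pagination off' -ex 'thread apply all bt'"
echo "$gdb_cmd"
```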