+++ This bug was initially created as a clone of Bug #1254137 +++ Description of problem: After adding a node to a 2-node setup, I performed a `gluster volume rebalance live fix-layout start`, and after some time, the fix-layout fails with the following error. How reproducible: We tried 3-4 times and it happened everytime. Additional info: Log entries from the 3 nodes: stor106: [2015-08-13 15:49:26.557880] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'commit-hash' for volume 'live-dht' with value '2925913100' stor106: [2015-08-13 15:49:26.557908] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'node-uuid' for volume 'live-dht' with value '97952e77-bf0a-4227-a529-2d1ae787ec3e' stor106: [2015-08-13 15:49:26.557924] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'rebalance-cmd' for volume 'live-dht' with value '4' stor106: [2015-08-13 15:49:26.557939] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'readdir-optimize' for volume 'live-dht' with value 'on' stor106: [2015-08-13 15:49:26.557955] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'assert-no-child-down' for volume 'live-dht' with value 'yes' stor106: [2015-08-13 15:49:26.557969] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'lookup-unhashed' for volume 'live-dht' with value 'yes' stor106: [2015-08-13 15:49:26.557983] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'use-readdirp' for volume 'live-dht' with value 'yes' stor106: [2015-08-13 21:36:05.535730] C [rpc-clnt-ping.c:161:rpc_clnt_ping_timer_expired] 0-live-client-4: server 192.168.123.106:49156 has not responded in the last 42 seconds, disconnecting. stor106: [2015-08-13 21:36:05.536519] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x196)[0x7f69b957b9e6] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f69b93469be] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f69b9346ace] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x9c)[0x7f69b934847c] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7f69b9348c38] ))))) 0-live-client-4: forced unwinding frame type(GlusterFS 3.3) op(READDIRP(40)) called at 2015-08-13 21:35:06.999805 (xid=0x245a8f) stor106: [2015-08-13 21:36:05.536943] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x196)[0x7f69b957b9e6] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f69b93469be] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f69b9346ace] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x9c)[0x7f69b934847c] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7f69b9348c38] ))))) 0-live-client-4: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2015-08-13 21:35:23.532921 (xid=0x245a90) stor106: [2015-08-13 21:36:05.864351] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/gaia/ecosystem_biology/databases/marker_dil/rnpB/0.3-B1-thermo stor106: [2015-08-13 21:36:05.864594] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/gaia/ecosystem_biology/databases/marker_dil/rnpB stor106: [2015-08-13 21:36:05.864783] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/gaia/ecosystem_biology/databases/marker_dil stor106: [2015-08-13 21:36:05.865190] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/gaia/ecosystem_biology/databases stor106: [2015-08-13 21:36:05.865351] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/gaia/ecosystem_biology stor106: [2015-08-13 21:36:05.865674] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/gaia stor106: [2015-08-13 21:36:05.865854] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb stor106: [2015-08-14 09:38:17.498230] I [MSGID: 100030] [glusterfsd.c:2301:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.3 (args: /usr/sbin/glusterfs -s localhost --volfile-id rebalance/live --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *dht.readdir-optimize=on --xlator-option *dht.rebalance-cmd=4 --xlator-option *dht.node-uuid=97952e77-bf0a-4227-a529-2d1ae787ec3e --xlator-option *dht.commit-hash=2926426186 --socket-file /var/run/gluster/gluster-rebalance-1328637d-7730-4627-8945-bbe43626d527.sock --pid-file /var/lib/glusterd/vols/live/rebalance/97952e77-bf0a-4227-a529-2d1ae787ec3e.pid -l /var/log/glusterfs/live-rebalance.log) stor106: [2015-08-14 09:38:17.519827] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1 stor106: [2015-08-14 09:38:22.493300] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'commit-hash' for volume 'live-dht' with value '2926426186' stor106: [2015-08-14 09:38:22.493331] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'node-uuid' for volume 'live-dht' with value '97952e77-bf0a-4227-a529-2d1ae787ec3e' stor106: [2015-08-14 09:38:22.493350] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'rebalance-cmd' for volume 'live-dht' with value '4' stor106: [2015-08-14 09:38:22.493367] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'readdir-optimize' for volume 'live-dht' with value 'on' stor106: [2015-08-14 09:38:22.493384] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'assert-no-child-down' for volume 'live-dht' with value 'yes' stor106: [2015-08-14 09:38:22.493400] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'lookup-unhashed' for volume 'live-dht' with value 'yes' stor106: [2015-08-14 09:38:22.493415] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'use-readdirp' for volume 'live-dht' with value 'yes' stor106: [2015-08-14 09:42:47.621544] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /hcs/hcs/OperaArchiveCol/CT 20131204_Fluo4 1 Well other Fields/Meas_01(2013-12-04_20-16-03) stor106: [2015-08-14 09:42:47.621710] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /hcs/hcs/OperaArchiveCol/CT 20131204_Fluo4 1 Well other Fields stor106: [2015-08-14 09:42:47.622220] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /hcs/hcs/OperaArchiveCol stor106: [2015-08-14 09:42:47.622341] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /hcs/hcs stor106: [2015-08-14 09:42:47.622457] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /hcs stor106: [2015-08-14 12:11:27.121727] I [MSGID: 100030] [glusterfsd.c:2301:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.3 (args: /usr/sbin/glusterfs -s localhost --volfile-id rebalance/live --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *dht.readdir-optimize=on --xlator-option *dht.rebalance-cmd=4 --xlator-option *dht.node-uuid=97952e77-bf0a-4227-a529-2d1ae787ec3e --xlator-option *dht.commit-hash=2926499706 --socket-file /var/run/gluster/gluster-rebalance-1328637d-7730-4627-8945-bbe43626d527.sock --pid-file /var/lib/glusterd/vols/live/rebalance/97952e77-bf0a-4227-a529-2d1ae787ec3e.pid -l /var/log/glusterfs/live-rebalance.log) stor106: [2015-08-14 12:11:27.143975] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1 stor106: [2015-08-14 12:11:32.122635] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'commit-hash' for volume 'live-dht' with value '2926499706' stor106: [2015-08-14 12:11:32.129824] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'node-uuid' for volume 'live-dht' with value '97952e77-bf0a-4227-a529-2d1ae787ec3e' stor106: [2015-08-14 12:11:32.129868] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'rebalance-cmd' for volume 'live-dht' with value '4' stor106: [2015-08-14 12:11:32.129889] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'readdir-optimize' for volume 'live-dht' with value 'on' stor106: [2015-08-14 12:11:32.129909] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'assert-no-child-down' for volume 'live-dht' with value 'yes' stor106: [2015-08-14 12:11:32.129926] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'lookup-unhashed' for volume 'live-dht' with value 'yes' stor106: [2015-08-14 12:11:32.129942] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'use-readdirp' for volume 'live-dht' with value 'yes' stor104: [2015-08-13 15:49:26.561286] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'lookup-unhashed' for volume 'live-dht' with value 'yes' stor104: [2015-08-13 15:49:26.561301] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'use-readdirp' for volume 'live-dht' with value 'yes' stor104: [2015-08-13 20:22:32.106903] C [rpc-clnt-ping.c:161:rpc_clnt_ping_timer_expired] 0-live-client-5: server 192.168.123.106:49157 has not responded in the last 42 seconds, disconnecting. stor104: [2015-08-13 20:22:32.107488] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x196)[0x7fede0b249e6] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fede08ef9be] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fede08eface] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x9c)[0x7fede08f147c] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fede08f1c38] ))))) 0-live-client-5: forced unwinding frame type(GlusterFS 3.3) op(READDIRP(40)) called at 2015-08-13 20:21:19.267201 (xid=0x1a21ae) stor104: [2015-08-13 20:22:32.108236] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x196)[0x7fede0b249e6] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fede08ef9be] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fede08eface] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x9c)[0x7fede08f147c] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fede08f1c38] ))))) 0-live-client-5: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2015-08-13 20:21:50.102656 (xid=0x1a21af) stor104: [2015-08-13 20:22:32.111687] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/gaia/ecosystem_biology/IMDyn/metaG/Analysis/mOTUs/withHuman/P1_2/motu.profiles/mOTU.v1.padded/reads.processed.solexaqa.solexaqa.allbest.l45.p97 stor104: [2015-08-13 20:22:32.112024] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/gaia/ecosystem_biology/IMDyn/metaG/Analysis/mOTUs/withHuman/P1_2/motu.profiles/mOTU.v1.padded stor104: [2015-08-13 20:22:32.112365] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/gaia/ecosystem_biology/IMDyn/metaG/Analysis/mOTUs/withHuman/P1_2/motu.profiles stor104: [2015-08-13 20:22:32.112735] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/gaia/ecosystem_biology/IMDyn/metaG/Analysis/mOTUs/withHuman/P1_2 stor104: [2015-08-13 20:22:32.113071] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/gaia/ecosystem_biology/IMDyn/metaG/Analysis/mOTUs/withHuman stor104: [2015-08-13 20:22:32.113405] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/gaia/ecosystem_biology/IMDyn/metaG/Analysis/mOTUs stor104: [2015-08-13 20:22:32.113764] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/gaia/ecosystem_biology/IMDyn/metaG/Analysis stor104: [2015-08-13 20:22:32.114065] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/gaia/ecosystem_biology/IMDyn/metaG stor104: [2015-08-13 20:22:32.114386] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/gaia/ecosystem_biology/IMDyn stor104: [2015-08-13 20:22:32.114731] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/gaia/ecosystem_biology stor104: [2015-08-13 20:22:32.115051] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/gaia stor104: [2015-08-13 20:22:32.115433] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb stor104: [2015-08-14 09:38:17.500715] I [MSGID: 100030] [glusterfsd.c:2301:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.3 (args: /usr/sbin/glusterfs -s localhost --volfile-id rebalance/live --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *dht.readdir-optimize=on --xlator-option *dht.rebalance-cmd=4 --xlator-option *dht.node-uuid=305c0f00-0f11-4da3-a470-50b6e6c14976 --xlator-option *dht.commit-hash=2926426186 --socket-file /var/run/gluster/gluster-rebalance-1328637d-7730-4627-8945-bbe43626d527.sock --pid-file /var/lib/glusterd/vols/live/rebalance/305c0f00-0f11-4da3-a470-50b6e6c14976.pid -l /var/log/glusterfs/live-rebalance.log) stor104: [2015-08-14 09:38:17.514523] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1 stor104: [2015-08-14 09:38:22.498426] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'commit-hash' for volume 'live-dht' with value '2926426186' stor104: [2015-08-14 09:38:22.498446] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'node-uuid' for volume 'live-dht' with value '305c0f00-0f11-4da3-a470-50b6e6c14976' stor104: [2015-08-14 09:38:22.498456] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'rebalance-cmd' for volume 'live-dht' with value '4' stor104: [2015-08-14 09:38:22.498465] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'readdir-optimize' for volume 'live-dht' with value 'on' stor104: [2015-08-14 09:38:22.498475] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'assert-no-child-down' for volume 'live-dht' with value 'yes' stor104: [2015-08-14 09:38:22.498483] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'lookup-unhashed' for volume 'live-dht' with value 'yes' stor104: [2015-08-14 09:38:22.498492] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'use-readdirp' for volume 'live-dht' with value 'yes' stor104: [2015-08-14 09:42:47.642919] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /hcs/hcs/OperaArchiveCol/CT 20131204_Fluo4 1 Well other Fields/Meas_01(2013-12-04_20-16-03) stor104: [2015-08-14 09:42:47.643189] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /hcs/hcs/OperaArchiveCol/CT 20131204_Fluo4 1 Well other Fields stor104: [2015-08-14 09:42:47.644244] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /hcs/hcs/OperaArchiveCol stor104: [2015-08-14 09:42:47.644404] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /hcs/hcs stor104: [2015-08-14 09:42:47.644549] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /hcs stor104: [2015-08-14 12:11:27.130086] I [MSGID: 100030] [glusterfsd.c:2301:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.3 (args: /usr/sbin/glusterfs -s localhost --volfile-id rebalance/live --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *dht.readdir-optimize=on --xlator-option *dht.rebalance-cmd=4 --xlator-option *dht.node-uuid=305c0f00-0f11-4da3-a470-50b6e6c14976 --xlator-option *dht.commit-hash=2926499706 --socket-file /var/run/gluster/gluster-rebalance-1328637d-7730-4627-8945-bbe43626d527.sock --pid-file /var/lib/glusterd/vols/live/rebalance/305c0f00-0f11-4da3-a470-50b6e6c14976.pid -l /var/log/glusterfs/live-rebalance.log) stor104: [2015-08-14 12:11:27.152078] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1 stor104: [2015-08-14 12:11:32.145533] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'commit-hash' for volume 'live-dht' with value '2926499706' stor104: [2015-08-14 12:11:32.145593] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'node-uuid' for volume 'live-dht' with value '305c0f00-0f11-4da3-a470-50b6e6c14976' stor104: [2015-08-14 12:11:32.145612] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'rebalance-cmd' for volume 'live-dht' with value '4' stor104: [2015-08-14 12:11:32.145629] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'readdir-optimize' for volume 'live-dht' with value 'on' stor104: [2015-08-14 12:11:32.145646] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'assert-no-child-down' for volume 'live-dht' with value 'yes' stor104: [2015-08-14 12:11:32.145661] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'lookup-unhashed' for volume 'live-dht' with value 'yes' stor104: [2015-08-14 12:11:32.145675] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'use-readdirp' for volume 'live-dht' with value 'yes' stor105: [2015-08-14 01:18:06.680271] C [rpc-clnt-ping.c:161:rpc_clnt_ping_timer_expired] 0-live-client-5: server 192.168.123.106:49157 has not responded in the last 42 seconds, disconnecting. stor105: [2015-08-14 01:18:06.680939] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x196)[0x7f8549c409e6] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f8549a0b9be] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f8549a0bace] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x9c)[0x7f8549a0d47c] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7f8549a0dc38] ))))) 0-live-client-5: forced unwinding frame type(GlusterFS 3.3) op(READDIRP(40)) called at 2015-08-14 01:17:17.396967 (xid=0x3b966a) stor105: [2015-08-14 01:18:06.681340] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x196)[0x7f8549c409e6] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f8549a0b9be] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f8549a0bace] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x9c)[0x7f8549a0d47c] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7f8549a0dc38] ))))) 0-live-client-5: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2015-08-14 01:17:24.676542 (xid=0x3b966b) stor105: [2015-08-14 01:18:06.684419] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/esb_backup/gaia/2014/11/10/10:32/access-gaia.uni.lu:/work/projects/ecosystem_biology/local_tools/specI_v1.0/sequences stor105: [2015-08-14 01:18:06.684710] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/esb_backup/gaia/2014/11/10/10:32/access-gaia.uni.lu:/work/projects/ecosystem_biology/local_tools/specI_v1.0 stor105: [2015-08-14 01:18:06.685996] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/esb_backup/gaia/2014/11/10/10:32/access-gaia.uni.lu:/work/projects/ecosystem_biology/local_tools stor105: [2015-08-14 01:18:06.686349] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/esb_backup/gaia/2014/11/10/10:32/access-gaia.uni.lu:/work/projects/ecosystem_biology stor105: [2015-08-14 01:18:06.686602] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/esb_backup/gaia/2014/11/10/10:32/access-gaia.uni.lu:/work/projects stor105: [2015-08-14 01:18:06.686946] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/esb_backup/gaia/2014/11/10/10:32/access-gaia.uni.lu:/work stor105: [2015-08-14 01:18:06.687147] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/esb_backup/gaia/2014/11/10/10:32/access-gaia.uni.lu: stor105: [2015-08-14 01:18:06.687372] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/esb_backup/gaia/2014/11/10/10:32 stor105: [2015-08-14 01:18:06.687619] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/esb_backup/gaia/2014/11/10 stor105: [2015-08-14 01:18:06.687848] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/esb_backup/gaia/2014/11 stor105: [2015-08-14 01:18:06.688077] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/esb_backup/gaia/2014 stor105: [2015-08-14 01:18:06.688290] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/esb_backup/gaia stor105: [2015-08-14 01:18:06.688505] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/esb_backup stor105: [2015-08-14 01:18:06.688752] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb stor105: [2015-08-14 09:38:17.485415] I [MSGID: 100030] [glusterfsd.c:2301:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.3 (args: /usr/sbin/glusterfs -s localhost --volfile-id rebalance/live --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *dht.readdir-optimize=on --xlator-option *dht.rebalance-cmd=4 --xlator-option *dht.node-uuid=b80cca53-bfd7-442a-a0a2-c846c978ea30 --xlator-option *dht.commit-hash=2926426186 --socket-file /var/run/gluster/gluster-rebalance-1328637d-7730-4627-8945-bbe43626d527.sock --pid-file /var/lib/glusterd/vols/live/rebalance/b80cca53-bfd7-442a-a0a2-c846c978ea30.pid -l /var/log/glusterfs/live-rebalance.log) stor105: [2015-08-14 09:38:17.507689] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1 stor105: [2015-08-14 09:38:22.480316] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'commit-hash' for volume 'live-dht' with value '2926426186' stor105: [2015-08-14 09:38:22.480333] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'node-uuid' for volume 'live-dht' with value 'b80cca53-bfd7-442a-a0a2-c846c978ea30' stor105: [2015-08-14 09:38:22.480342] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'rebalance-cmd' for volume 'live-dht' with value '4' stor105: [2015-08-14 09:38:22.480351] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'readdir-optimize' for volume 'live-dht' with value 'on' stor105: [2015-08-14 09:38:22.480360] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'assert-no-child-down' for volume 'live-dht' with value 'yes' stor105: [2015-08-14 09:38:22.480368] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'lookup-unhashed' for volume 'live-dht' with value 'yes' stor105: [2015-08-14 09:38:22.480375] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'use-readdirp' for volume 'live-dht' with value 'yes' stor105: [2015-08-14 09:42:47.631749] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /hcs/hcs/OperaArchiveCol/LB 20130307_MitoSox_exp_conc_incub_24h/Meas_01(2013-03-07_14-07-12) stor105: [2015-08-14 09:42:47.632020] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /hcs/hcs/OperaArchiveCol/LB 20130307_MitoSox_exp_conc_incub_24h stor105: [2015-08-14 09:42:47.633021] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /hcs/hcs/OperaArchiveCol stor105: [2015-08-14 09:42:47.633226] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /hcs/hcs stor105: [2015-08-14 09:42:47.633439] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /hcs stor105: [2015-08-14 12:11:27.085008] I [MSGID: 100030] [glusterfsd.c:2301:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.3 (args: /usr/sbin/glusterfs -s localhost --volfile-id rebalance/live --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *dht.readdir-optimize=on --xlator-option *dht.rebalance-cmd=4 --xlator-option *dht.node-uuid=b80cca53-bfd7-442a-a0a2-c846c978ea30 --xlator-option *dht.commit-hash=2926499706 --socket-file /var/run/gluster/gluster-rebalance-1328637d-7730-4627-8945-bbe43626d527.sock --pid-file /var/lib/glusterd/vols/live/rebalance/b80cca53-bfd7-442a-a0a2-c846c978ea30.pid -l /var/log/glusterfs/live-rebalance.log) stor105: [2015-08-14 12:11:27.107815] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1 stor105: [2015-08-14 12:11:32.086382] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'commit-hash' for volume 'live-dht' with value '2926499706' stor105: [2015-08-14 12:11:32.092152] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'node-uuid' for volume 'live-dht' with value 'b80cca53-bfd7-442a-a0a2-c846c978ea30' stor105: [2015-08-14 12:11:32.092187] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'rebalance-cmd' for volume 'live-dht' with value '4' stor105: [2015-08-14 12:11:32.092220] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'readdir-optimize' for volume 'live-dht' with value 'on' stor105: [2015-08-14 12:11:32.092239] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'assert-no-child-down' for volume 'live-dht' with value 'yes' stor105: [2015-08-14 12:11:32.092256] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'lookup-unhashed' for volume 'live-dht' with value 'yes' stor105: [2015-08-14 12:11:32.092291] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'use-readdirp' for volume 'live-dht' with value 'yes'
Steps to reproduce: 1. Add a new node to a 2-node setup (already full with files) 2. Start a `gluster volume rebalance live fix-layout start` 3. Wait 20k-40k seconds 4. rebalance failed
upstream bug : http://review.gluster.org/11935
This bug was accidentally moved from POST to MODIFIED via an error in automation, please see mmccune with any questions
REVIEW: http://review.gluster.org/11935 (socket: Add ping packets into beginning of ioq list) posted (#15) for review on master by mohammed rafi kc (rkavunga)
REVIEW: http://review.gluster.org/11935 (socket: Add ping packets into beginning of ioq list) posted (#16) for review on master by mohammed rafi kc (rkavunga)
*** This bug has been marked as a duplicate of bug 1421938 ***