Bug 1254138

Summary: Rebalance fix-layout fails after some time with a timeout
Product: [Community] GlusterFS Reporter: Mohammed Rafi KC <rkavunga>
Component: rpcAssignee: Mohammed Rafi KC <rkavunga>
Status: CLOSED DUPLICATE QA Contact:
Severity: high Docs Contact:
Priority: unspecified    
Version: mainlineCC: bugs, rgowdapp, trefex
Target Milestone: ---Keywords: Triaged
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1254137 Environment:
Last Closed: 2017-09-06 11:02:05 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1254137    
Bug Blocks:    

Description Mohammed Rafi KC 2015-08-17 09:20:17 UTC
+++ This bug was initially created as a clone of Bug #1254137 +++

Description of problem:

After adding a node to a 2-node setup, I performed a `gluster volume rebalance live fix-layout start`, and after some time, the fix-layout fails with the following error.

How reproducible:

We tried 3-4 times and it happened everytime.

Additional info:

Log entries from the 3 nodes:

stor106: [2015-08-13 15:49:26.557880] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'commit-hash' for volume 'live-dht' with value '2925913100'
stor106: [2015-08-13 15:49:26.557908] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'node-uuid' for volume 'live-dht' with value '97952e77-bf0a-4227-a529-2d1ae787ec3e'
stor106: [2015-08-13 15:49:26.557924] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'rebalance-cmd' for volume 'live-dht' with value '4'
stor106: [2015-08-13 15:49:26.557939] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'readdir-optimize' for volume 'live-dht' with value 'on'
stor106: [2015-08-13 15:49:26.557955] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'assert-no-child-down' for volume 'live-dht' with value 'yes'
stor106: [2015-08-13 15:49:26.557969] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'lookup-unhashed' for volume 'live-dht' with value 'yes'
stor106: [2015-08-13 15:49:26.557983] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'use-readdirp' for volume 'live-dht' with value 'yes'
stor106: [2015-08-13 21:36:05.535730] C [rpc-clnt-ping.c:161:rpc_clnt_ping_timer_expired] 0-live-client-4: server 192.168.123.106:49156 has not responded in the last 42 seconds, disconnecting.
stor106: [2015-08-13 21:36:05.536519] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x196)[0x7f69b957b9e6] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f69b93469be] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f69b9346ace] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x9c)[0x7f69b934847c] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7f69b9348c38] ))))) 0-live-client-4: forced unwinding frame type(GlusterFS 3.3) op(READDIRP(40)) called at 2015-08-13 21:35:06.999805 (xid=0x245a8f)
stor106: [2015-08-13 21:36:05.536943] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x196)[0x7f69b957b9e6] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f69b93469be] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f69b9346ace] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x9c)[0x7f69b934847c] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7f69b9348c38] ))))) 0-live-client-4: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2015-08-13 21:35:23.532921 (xid=0x245a90)
stor106: [2015-08-13 21:36:05.864351] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/gaia/ecosystem_biology/databases/marker_dil/rnpB/0.3-B1-thermo
stor106: [2015-08-13 21:36:05.864594] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/gaia/ecosystem_biology/databases/marker_dil/rnpB
stor106: [2015-08-13 21:36:05.864783] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/gaia/ecosystem_biology/databases/marker_dil
stor106: [2015-08-13 21:36:05.865190] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/gaia/ecosystem_biology/databases
stor106: [2015-08-13 21:36:05.865351] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/gaia/ecosystem_biology
stor106: [2015-08-13 21:36:05.865674] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/gaia
stor106: [2015-08-13 21:36:05.865854] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb
stor106: [2015-08-14 09:38:17.498230] I [MSGID: 100030] [glusterfsd.c:2301:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.3 (args: /usr/sbin/glusterfs -s localhost --volfile-id rebalance/live --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *dht.readdir-optimize=on --xlator-option *dht.rebalance-cmd=4 --xlator-option *dht.node-uuid=97952e77-bf0a-4227-a529-2d1ae787ec3e --xlator-option *dht.commit-hash=2926426186 --socket-file /var/run/gluster/gluster-rebalance-1328637d-7730-4627-8945-bbe43626d527.sock --pid-file /var/lib/glusterd/vols/live/rebalance/97952e77-bf0a-4227-a529-2d1ae787ec3e.pid -l /var/log/glusterfs/live-rebalance.log)
stor106: [2015-08-14 09:38:17.519827] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
stor106: [2015-08-14 09:38:22.493300] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'commit-hash' for volume 'live-dht' with value '2926426186'
stor106: [2015-08-14 09:38:22.493331] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'node-uuid' for volume 'live-dht' with value '97952e77-bf0a-4227-a529-2d1ae787ec3e'
stor106: [2015-08-14 09:38:22.493350] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'rebalance-cmd' for volume 'live-dht' with value '4'
stor106: [2015-08-14 09:38:22.493367] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'readdir-optimize' for volume 'live-dht' with value 'on'
stor106: [2015-08-14 09:38:22.493384] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'assert-no-child-down' for volume 'live-dht' with value 'yes'
stor106: [2015-08-14 09:38:22.493400] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'lookup-unhashed' for volume 'live-dht' with value 'yes'
stor106: [2015-08-14 09:38:22.493415] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'use-readdirp' for volume 'live-dht' with value 'yes'
stor106: [2015-08-14 09:42:47.621544] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /hcs/hcs/OperaArchiveCol/CT 20131204_Fluo4 1 Well other Fields/Meas_01(2013-12-04_20-16-03)
stor106: [2015-08-14 09:42:47.621710] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /hcs/hcs/OperaArchiveCol/CT 20131204_Fluo4 1 Well other Fields
stor106: [2015-08-14 09:42:47.622220] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /hcs/hcs/OperaArchiveCol
stor106: [2015-08-14 09:42:47.622341] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /hcs/hcs
stor106: [2015-08-14 09:42:47.622457] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /hcs
stor106: [2015-08-14 12:11:27.121727] I [MSGID: 100030] [glusterfsd.c:2301:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.3 (args: /usr/sbin/glusterfs -s localhost --volfile-id rebalance/live --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *dht.readdir-optimize=on --xlator-option *dht.rebalance-cmd=4 --xlator-option *dht.node-uuid=97952e77-bf0a-4227-a529-2d1ae787ec3e --xlator-option *dht.commit-hash=2926499706 --socket-file /var/run/gluster/gluster-rebalance-1328637d-7730-4627-8945-bbe43626d527.sock --pid-file /var/lib/glusterd/vols/live/rebalance/97952e77-bf0a-4227-a529-2d1ae787ec3e.pid -l /var/log/glusterfs/live-rebalance.log)
stor106: [2015-08-14 12:11:27.143975] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
stor106: [2015-08-14 12:11:32.122635] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'commit-hash' for volume 'live-dht' with value '2926499706'
stor106: [2015-08-14 12:11:32.129824] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'node-uuid' for volume 'live-dht' with value '97952e77-bf0a-4227-a529-2d1ae787ec3e'
stor106: [2015-08-14 12:11:32.129868] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'rebalance-cmd' for volume 'live-dht' with value '4'
stor106: [2015-08-14 12:11:32.129889] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'readdir-optimize' for volume 'live-dht' with value 'on'
stor106: [2015-08-14 12:11:32.129909] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'assert-no-child-down' for volume 'live-dht' with value 'yes'
stor106: [2015-08-14 12:11:32.129926] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'lookup-unhashed' for volume 'live-dht' with value 'yes'
stor106: [2015-08-14 12:11:32.129942] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'use-readdirp' for volume 'live-dht' with value 'yes'
stor104: [2015-08-13 15:49:26.561286] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'lookup-unhashed' for volume 'live-dht' with value 'yes'
stor104: [2015-08-13 15:49:26.561301] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'use-readdirp' for volume 'live-dht' with value 'yes'
stor104: [2015-08-13 20:22:32.106903] C [rpc-clnt-ping.c:161:rpc_clnt_ping_timer_expired] 0-live-client-5: server 192.168.123.106:49157 has not responded in the last 42 seconds, disconnecting.
stor104: [2015-08-13 20:22:32.107488] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x196)[0x7fede0b249e6] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fede08ef9be] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fede08eface] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x9c)[0x7fede08f147c] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fede08f1c38] ))))) 0-live-client-5: forced unwinding frame type(GlusterFS 3.3) op(READDIRP(40)) called at 2015-08-13 20:21:19.267201 (xid=0x1a21ae)
stor104: [2015-08-13 20:22:32.108236] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x196)[0x7fede0b249e6] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fede08ef9be] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fede08eface] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x9c)[0x7fede08f147c] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fede08f1c38] ))))) 0-live-client-5: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2015-08-13 20:21:50.102656 (xid=0x1a21af)
stor104: [2015-08-13 20:22:32.111687] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/gaia/ecosystem_biology/IMDyn/metaG/Analysis/mOTUs/withHuman/P1_2/motu.profiles/mOTU.v1.padded/reads.processed.solexaqa.solexaqa.allbest.l45.p97
stor104: [2015-08-13 20:22:32.112024] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/gaia/ecosystem_biology/IMDyn/metaG/Analysis/mOTUs/withHuman/P1_2/motu.profiles/mOTU.v1.padded
stor104: [2015-08-13 20:22:32.112365] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/gaia/ecosystem_biology/IMDyn/metaG/Analysis/mOTUs/withHuman/P1_2/motu.profiles
stor104: [2015-08-13 20:22:32.112735] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/gaia/ecosystem_biology/IMDyn/metaG/Analysis/mOTUs/withHuman/P1_2
stor104: [2015-08-13 20:22:32.113071] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/gaia/ecosystem_biology/IMDyn/metaG/Analysis/mOTUs/withHuman
stor104: [2015-08-13 20:22:32.113405] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/gaia/ecosystem_biology/IMDyn/metaG/Analysis/mOTUs
stor104: [2015-08-13 20:22:32.113764] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/gaia/ecosystem_biology/IMDyn/metaG/Analysis
stor104: [2015-08-13 20:22:32.114065] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/gaia/ecosystem_biology/IMDyn/metaG
stor104: [2015-08-13 20:22:32.114386] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/gaia/ecosystem_biology/IMDyn
stor104: [2015-08-13 20:22:32.114731] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/gaia/ecosystem_biology
stor104: [2015-08-13 20:22:32.115051] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/gaia
stor104: [2015-08-13 20:22:32.115433] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb
stor104: [2015-08-14 09:38:17.500715] I [MSGID: 100030] [glusterfsd.c:2301:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.3 (args: /usr/sbin/glusterfs -s localhost --volfile-id rebalance/live --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *dht.readdir-optimize=on --xlator-option *dht.rebalance-cmd=4 --xlator-option *dht.node-uuid=305c0f00-0f11-4da3-a470-50b6e6c14976 --xlator-option *dht.commit-hash=2926426186 --socket-file /var/run/gluster/gluster-rebalance-1328637d-7730-4627-8945-bbe43626d527.sock --pid-file /var/lib/glusterd/vols/live/rebalance/305c0f00-0f11-4da3-a470-50b6e6c14976.pid -l /var/log/glusterfs/live-rebalance.log)
stor104: [2015-08-14 09:38:17.514523] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
stor104: [2015-08-14 09:38:22.498426] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'commit-hash' for volume 'live-dht' with value '2926426186'
stor104: [2015-08-14 09:38:22.498446] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'node-uuid' for volume 'live-dht' with value '305c0f00-0f11-4da3-a470-50b6e6c14976'
stor104: [2015-08-14 09:38:22.498456] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'rebalance-cmd' for volume 'live-dht' with value '4'
stor104: [2015-08-14 09:38:22.498465] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'readdir-optimize' for volume 'live-dht' with value 'on'
stor104: [2015-08-14 09:38:22.498475] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'assert-no-child-down' for volume 'live-dht' with value 'yes'
stor104: [2015-08-14 09:38:22.498483] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'lookup-unhashed' for volume 'live-dht' with value 'yes'
stor104: [2015-08-14 09:38:22.498492] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'use-readdirp' for volume 'live-dht' with value 'yes'
stor104: [2015-08-14 09:42:47.642919] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /hcs/hcs/OperaArchiveCol/CT 20131204_Fluo4 1 Well other Fields/Meas_01(2013-12-04_20-16-03)
stor104: [2015-08-14 09:42:47.643189] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /hcs/hcs/OperaArchiveCol/CT 20131204_Fluo4 1 Well other Fields
stor104: [2015-08-14 09:42:47.644244] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /hcs/hcs/OperaArchiveCol
stor104: [2015-08-14 09:42:47.644404] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /hcs/hcs
stor104: [2015-08-14 09:42:47.644549] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /hcs
stor104: [2015-08-14 12:11:27.130086] I [MSGID: 100030] [glusterfsd.c:2301:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.3 (args: /usr/sbin/glusterfs -s localhost --volfile-id rebalance/live --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *dht.readdir-optimize=on --xlator-option *dht.rebalance-cmd=4 --xlator-option *dht.node-uuid=305c0f00-0f11-4da3-a470-50b6e6c14976 --xlator-option *dht.commit-hash=2926499706 --socket-file /var/run/gluster/gluster-rebalance-1328637d-7730-4627-8945-bbe43626d527.sock --pid-file /var/lib/glusterd/vols/live/rebalance/305c0f00-0f11-4da3-a470-50b6e6c14976.pid -l /var/log/glusterfs/live-rebalance.log)
stor104: [2015-08-14 12:11:27.152078] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
stor104: [2015-08-14 12:11:32.145533] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'commit-hash' for volume 'live-dht' with value '2926499706'
stor104: [2015-08-14 12:11:32.145593] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'node-uuid' for volume 'live-dht' with value '305c0f00-0f11-4da3-a470-50b6e6c14976'
stor104: [2015-08-14 12:11:32.145612] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'rebalance-cmd' for volume 'live-dht' with value '4'
stor104: [2015-08-14 12:11:32.145629] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'readdir-optimize' for volume 'live-dht' with value 'on'
stor104: [2015-08-14 12:11:32.145646] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'assert-no-child-down' for volume 'live-dht' with value 'yes'
stor104: [2015-08-14 12:11:32.145661] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'lookup-unhashed' for volume 'live-dht' with value 'yes'
stor104: [2015-08-14 12:11:32.145675] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'use-readdirp' for volume 'live-dht' with value 'yes'
stor105: [2015-08-14 01:18:06.680271] C [rpc-clnt-ping.c:161:rpc_clnt_ping_timer_expired] 0-live-client-5: server 192.168.123.106:49157 has not responded in the last 42 seconds, disconnecting.
stor105: [2015-08-14 01:18:06.680939] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x196)[0x7f8549c409e6] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f8549a0b9be] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f8549a0bace] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x9c)[0x7f8549a0d47c] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7f8549a0dc38] ))))) 0-live-client-5: forced unwinding frame type(GlusterFS 3.3) op(READDIRP(40)) called at 2015-08-14 01:17:17.396967 (xid=0x3b966a)
stor105: [2015-08-14 01:18:06.681340] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x196)[0x7f8549c409e6] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f8549a0b9be] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f8549a0bace] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x9c)[0x7f8549a0d47c] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7f8549a0dc38] ))))) 0-live-client-5: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2015-08-14 01:17:24.676542 (xid=0x3b966b)
stor105: [2015-08-14 01:18:06.684419] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/esb_backup/gaia/2014/11/10/10:32/access-gaia.uni.lu:/work/projects/ecosystem_biology/local_tools/specI_v1.0/sequences
stor105: [2015-08-14 01:18:06.684710] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/esb_backup/gaia/2014/11/10/10:32/access-gaia.uni.lu:/work/projects/ecosystem_biology/local_tools/specI_v1.0
stor105: [2015-08-14 01:18:06.685996] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/esb_backup/gaia/2014/11/10/10:32/access-gaia.uni.lu:/work/projects/ecosystem_biology/local_tools
stor105: [2015-08-14 01:18:06.686349] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/esb_backup/gaia/2014/11/10/10:32/access-gaia.uni.lu:/work/projects/ecosystem_biology
stor105: [2015-08-14 01:18:06.686602] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/esb_backup/gaia/2014/11/10/10:32/access-gaia.uni.lu:/work/projects
stor105: [2015-08-14 01:18:06.686946] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/esb_backup/gaia/2014/11/10/10:32/access-gaia.uni.lu:/work
stor105: [2015-08-14 01:18:06.687147] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/esb_backup/gaia/2014/11/10/10:32/access-gaia.uni.lu:
stor105: [2015-08-14 01:18:06.687372] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/esb_backup/gaia/2014/11/10/10:32
stor105: [2015-08-14 01:18:06.687619] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/esb_backup/gaia/2014/11/10
stor105: [2015-08-14 01:18:06.687848] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/esb_backup/gaia/2014/11
stor105: [2015-08-14 01:18:06.688077] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/esb_backup/gaia/2014
stor105: [2015-08-14 01:18:06.688290] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/esb_backup/gaia
stor105: [2015-08-14 01:18:06.688505] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb/esb_backup
stor105: [2015-08-14 01:18:06.688752] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /esb
stor105: [2015-08-14 09:38:17.485415] I [MSGID: 100030] [glusterfsd.c:2301:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.3 (args: /usr/sbin/glusterfs -s localhost --volfile-id rebalance/live --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *dht.readdir-optimize=on --xlator-option *dht.rebalance-cmd=4 --xlator-option *dht.node-uuid=b80cca53-bfd7-442a-a0a2-c846c978ea30 --xlator-option *dht.commit-hash=2926426186 --socket-file /var/run/gluster/gluster-rebalance-1328637d-7730-4627-8945-bbe43626d527.sock --pid-file /var/lib/glusterd/vols/live/rebalance/b80cca53-bfd7-442a-a0a2-c846c978ea30.pid -l /var/log/glusterfs/live-rebalance.log)
stor105: [2015-08-14 09:38:17.507689] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
stor105: [2015-08-14 09:38:22.480316] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'commit-hash' for volume 'live-dht' with value '2926426186'
stor105: [2015-08-14 09:38:22.480333] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'node-uuid' for volume 'live-dht' with value 'b80cca53-bfd7-442a-a0a2-c846c978ea30'
stor105: [2015-08-14 09:38:22.480342] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'rebalance-cmd' for volume 'live-dht' with value '4'
stor105: [2015-08-14 09:38:22.480351] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'readdir-optimize' for volume 'live-dht' with value 'on'
stor105: [2015-08-14 09:38:22.480360] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'assert-no-child-down' for volume 'live-dht' with value 'yes'
stor105: [2015-08-14 09:38:22.480368] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'lookup-unhashed' for volume 'live-dht' with value 'yes'
stor105: [2015-08-14 09:38:22.480375] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'use-readdirp' for volume 'live-dht' with value 'yes'
stor105: [2015-08-14 09:42:47.631749] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /hcs/hcs/OperaArchiveCol/LB 20130307_MitoSox_exp_conc_incub_24h/Meas_01(2013-03-07_14-07-12)
stor105: [2015-08-14 09:42:47.632020] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /hcs/hcs/OperaArchiveCol/LB 20130307_MitoSox_exp_conc_incub_24h
stor105: [2015-08-14 09:42:47.633021] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /hcs/hcs/OperaArchiveCol
stor105: [2015-08-14 09:42:47.633226] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /hcs/hcs
stor105: [2015-08-14 09:42:47.633439] E [MSGID: 109016] [dht-rebalance.c:2554:gf_defrag_fix_layout] 0-live-dht: Fix layout failed for /hcs
stor105: [2015-08-14 12:11:27.085008] I [MSGID: 100030] [glusterfsd.c:2301:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.3 (args: /usr/sbin/glusterfs -s localhost --volfile-id rebalance/live --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *dht.readdir-optimize=on --xlator-option *dht.rebalance-cmd=4 --xlator-option *dht.node-uuid=b80cca53-bfd7-442a-a0a2-c846c978ea30 --xlator-option *dht.commit-hash=2926499706 --socket-file /var/run/gluster/gluster-rebalance-1328637d-7730-4627-8945-bbe43626d527.sock --pid-file /var/lib/glusterd/vols/live/rebalance/b80cca53-bfd7-442a-a0a2-c846c978ea30.pid -l /var/log/glusterfs/live-rebalance.log)
stor105: [2015-08-14 12:11:27.107815] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
stor105: [2015-08-14 12:11:32.086382] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'commit-hash' for volume 'live-dht' with value '2926499706'
stor105: [2015-08-14 12:11:32.092152] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'node-uuid' for volume 'live-dht' with value 'b80cca53-bfd7-442a-a0a2-c846c978ea30'
stor105: [2015-08-14 12:11:32.092187] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'rebalance-cmd' for volume 'live-dht' with value '4'
stor105: [2015-08-14 12:11:32.092220] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'readdir-optimize' for volume 'live-dht' with value 'on'
stor105: [2015-08-14 12:11:32.092239] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'assert-no-child-down' for volume 'live-dht' with value 'yes'
stor105: [2015-08-14 12:11:32.092256] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'lookup-unhashed' for volume 'live-dht' with value 'yes'
stor105: [2015-08-14 12:11:32.092291] I [graph.c:269:gf_add_cmdline_options] 0-live-dht: adding option 'use-readdirp' for volume 'live-dht' with value 'yes'

Comment 1 Mohammed Rafi KC 2015-08-17 09:29:35 UTC
Steps to reproduce:

1. Add a new node to a 2-node setup (already full with files)
2. Start a `gluster volume rebalance live fix-layout start`
3. Wait 20k-40k seconds
4. rebalance failed

Comment 2 Mohammed Rafi KC 2015-08-17 09:31:00 UTC
upstream bug : http://review.gluster.org/11935

Comment 3 Mike McCune 2016-03-28 23:28:22 UTC
This bug was accidentally moved from POST to MODIFIED via an error in automation, please see mmccune with any questions

Comment 4 Worker Ant 2016-11-17 09:18:32 UTC
REVIEW: http://review.gluster.org/11935 (socket: Add ping packets into beginning of ioq list) posted (#15) for review on master by mohammed rafi  kc (rkavunga)

Comment 5 Worker Ant 2017-01-17 07:07:39 UTC
REVIEW: http://review.gluster.org/11935 (socket: Add ping packets into beginning of ioq list) posted (#16) for review on master by mohammed rafi  kc (rkavunga)

Comment 6 Raghavendra G 2017-09-06 11:02:05 UTC

*** This bug has been marked as a duplicate of bug 1421938 ***