Description of problem:
If you create and start a geo-rep session, the gsyncd processes of all master nodes do their aux mount on a single slave node; ideally, each master node should pick a different slave node for its aux mount.

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.12rhs.beta1-1.el6rhs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create and start a geo-rep session between the master and the slave.
2. Check for aux mounts on all slave nodes (see the helper sketch after the additional info below).

Actual results:
All gsyncd processes of the master nodes do their aux mount on a single slave node.

Expected results:
Each gsyncd process of the master nodes should pick a different slave node.

Additional info:
If the slave node on which all master nodes do their aux mounts goes down, the geo-rep status on all nodes becomes faulty, and it is not recoverable even if you stop and start the geo-rep session. This is a serious problem.
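To make step 2 of the reproduction concrete, a helper along these lines can be run on each slave node to list gsyncd auxiliary mounts. This is only a sketch; the 'gsyncd-aux-mount' path fragment is my assumption about how the temporary mount points are named, not something taken from this report.

----
# Sketch: list gsyncd auxiliary mounts on this node by scanning /proc/mounts.
# The 'gsyncd-aux-mount' substring is an assumed naming convention for the
# temporary mount directories and may need adjusting for a given build.
def gsyncd_aux_mounts(mounts_file='/proc/mounts'):
    aux = []
    with open(mounts_file) as f:
        for line in f:
            fields = line.split()
            # fields[0] is the mounted volume/device, fields[1] the mount point
            if len(fields) >= 2 and 'gsyncd-aux-mount' in fields[1]:
                aux.append((fields[0], fields[1]))
    return aux

if __name__ == '__main__':
    for vol, mnt in gsyncd_aux_mounts():
        print('%s mounted at %s' % (vol, mnt))
----

With a healthy distribution, each slave node should carry roughly one such mount per master worker assigned to it; with this bug, a single slave node carries all of them.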
[2013-08-28 00:50:57.479891] I [monitor(monitor):238:distribute] <top>: [{'host': 'supernova', 'dir': '/data/export/perf-r10'}, {'host': 'ganaka', 'dir': '/data/export/perf-r11'}, {'host': '127.1.1.1', 'dir': '/data/export/perf-r12'}, {'host': '127.1.2.1', 'dir': '/data/export/perf-r13'}]
[2013-08-28 00:50:57.480031] I [monitor(monitor):241:distribute] <top>: slave bricks: [{'host': 'supernova', 'dir': '/data/export/perf-r10'}, {'host': 'ganaka', 'dir': '/data/export/perf-r11'}, {'host': '127.1.1.1', 'dir': '/data/export/perf-r12'}, {'host': '127.1.2.1', 'dir': '/data/export/perf-r13'}]
[2013-08-28 00:50:57.480288] I [monitor(monitor):260:distribute] <top>: worker specs: [('/data/export/r1', 'ssh://root.1.1:gluster://localhost:perf2'), ('/data/export/r2', 'ssh://root.2.1:gluster://localhost:perf2'), ('/data/export/r3', 'ssh://root@ganaka:gluster://localhost:perf2'), ('/data/export/r4', 'ssh://root@supernova:gluster://localhost:perf2'), ('/data/export/r5', 'ssh://root.1.1:gluster://localhost:perf2'), ('/data/export/r6', 'ssh://root.2.1:gluster://localhost:perf2'), ('/data/export/r7', 'ssh://root@ganaka:gluster://localhost:perf2'), ('/data/export/r8', 'ssh://root@supernova:gluster://localhost:perf2')]

At least the worker specs show that the client-node distribution is happening. Need to see what the issue is that prevents multiple SSH sessions to different targets.
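For reference, a quick tally of the target hosts in the worker specs logged above (URLs copied verbatim from the log, including the obfuscated root.* addresses) shows that on this node, which holds eight local bricks, the assignment does spread across the four slave hosts:

----
# Illustration only: count how many workers on this master node point at
# each slave host, using the worker specs from the log above.
from collections import Counter

worker_specs = [
    ('/data/export/r1', 'ssh://root.1.1:gluster://localhost:perf2'),
    ('/data/export/r2', 'ssh://root.2.1:gluster://localhost:perf2'),
    ('/data/export/r3', 'ssh://root@ganaka:gluster://localhost:perf2'),
    ('/data/export/r4', 'ssh://root@supernova:gluster://localhost:perf2'),
    ('/data/export/r5', 'ssh://root.1.1:gluster://localhost:perf2'),
    ('/data/export/r6', 'ssh://root.2.1:gluster://localhost:perf2'),
    ('/data/export/r7', 'ssh://root@ganaka:gluster://localhost:perf2'),
    ('/data/export/r8', 'ssh://root@supernova:gluster://localhost:perf2'),
]

# Take the host part between 'ssh://' and the first ':'.
targets = Counter(url.split('ssh://')[1].split(':')[0] for _brick, url in worker_specs)
print(targets)  # each of the four slave hosts is used by two workers
----

Note that this node holds eight local bricks; the degenerate case analysed further down appears when each master node holds only one brick.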
[root@supernova glusterfs]# ps aux | grep ssh\
root 8585 0.0 0.0 75984 3912 ? S 00:51 0:00 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-dNFc8i/gsycnd-ssh-%r@%h:%p root@supernova /nonexistent/gsyncd --session-owner 6ca86851-f44e-4528-a840-0a3875f1f6ec -N --listen --timeout 120 gluster://localhost:perf2
root 8596 0.0 0.0 75984 3908 ? S 00:51 0:00 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-IHJl59/gsycnd-ssh-%r@%h:%p root@supernova /nonexistent/gsyncd --session-owner 6ca86851-f44e-4528-a840-0a3875f1f6ec -N --listen --timeout 120 gluster://localhost:perf2
root 8622 0.0 0.0 75984 3912 ? S 00:51 0:00 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-kFJsXp/gsycnd-ssh-%r@%h:%p root@ganaka /nonexistent/gsyncd --session-owner 6ca86851-f44e-4528-a840-0a3875f1f6ec -N --listen --timeout 120 gluster://localhost:perf2
root 8766 0.0 0.0 78060 3888 ? S 00:51 0:00 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-bxRC0k/gsycnd-ssh-%r@%h:%p root.2.1 /nonexistent/gsyncd --session-owner 6ca86851-f44e-4528-a840-0a3875f1f6ec -N --listen --timeout 120 gluster://localhost:perf2
root 8778 0.0 0.0 78060 3892 ? S 00:51 0:00 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-wbVrIp/gsycnd-ssh-%r@%h:%p root.2.1 /nonexistent/gsyncd --session-owner 6ca86851-f44e-4528-a840-0a3875f1f6ec -N --listen --timeout 120 gluster://localhost:perf2
root 8789 0.0 0.0 78060 3892 ? S 00:51 0:00 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-y7FU8v/gsycnd-ssh-%r@%h:%p root.1.1 /nonexistent/gsyncd --session-owner 6ca86851-f44e-4528-a840-0a3875f1f6ec -N --listen --timeout 120 gluster://localhost:perf2
root 8796 0.0 0.0 78060 3888 ? S 00:51 0:00 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-WDcHCw/gsycnd-ssh-%r@%h:%p root.1.1 /nonexistent/gsyncd --session-owner 6ca86851-f44e-4528-a840-0a3875f1f6ec -N --listen --timeout 120 gluster://localhost:perf2
root 9261 0.0 0.0 75984 3908 ? S 00:51 0:00 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-t5c8xd/gsycnd-ssh-%r@%h:%p root@ganaka /nonexistent/gsyncd --session-owner 6ca86851-f44e-4528-a840-0a3875f1f6ec -N --listen --timeout 120 gluster://localhost:perf2

This shows that SSH sessions to the different slave nodes are already being set up. Need to understand what the issue is.
Ok, figured out the issue. From geo-replication/monitor.py:

----
locmbricks.sort()
slaves.sort()
workerspex = []
for i in range(len(locmbricks)):
    workerspex.append((locmbricks[i], slaves[i % len(slaves)]))
logging.info('worker specs: ' + repr(workerspex))
----

What this means is: for every local brick of the master volume, take the remote brick from the slave volume in sorted order and build a worker spec. The problem is that in a normal setup, where each node holds just one brick of the volume, every master node computes index 0 and ends up syncing to only the first brick of the slave, instead of distributing the load. We need to bring in some logic to handle this predictably, along the lines of "I already consumed one of the bricks, so take the next one". Anyway, a fix for this will be posted soon.
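To illustrate the degenerate case described above: in a typical deployment each master node holds exactly one brick and runs this loop independently over only its own local bricks, so every node computes index 0 and picks the same first slave after sorting. A minimal reproduction with hypothetical host names:

----
# Minimal illustration (hypothetical host names): each master node has one
# local brick and independently runs the round-robin from monitor.py, so
# every node pairs its brick with slaves[0].
slaves = ['slave1:/bricks/b1', 'slave2:/bricks/b1', 'slave3:/bricks/b1']

for node, locmbricks in [('master1', ['/bricks/b1']),
                         ('master2', ['/bricks/b1']),
                         ('master3', ['/bricks/b1'])]:
    locmbricks.sort()
    slaves.sort()
    workerspex = [(locmbricks[i], slaves[i % len(slaves)])
                  for i in range(len(locmbricks))]
    print(node, workerspex)
# Every master node ends up with ('/bricks/b1', 'slave1:/bricks/b1').
----

One possible direction for the fix (a sketch of the idea only, not necessarily what the posted patch does) is to start each node's round-robin at an offset derived from that node's rank among the sorted master bricks, so different master nodes begin at different slaves.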
Requesting the blocker flag, as this causes both performance degradation in geo-replication and a failure to support high availability.
https://code.engineering.redhat.com/gerrit/12054
Verified on glusterfs-3.4.0.30rhs-2.el6rhs.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html