Red Hat Bugzilla – Bug 810477
Inter-node communication flaw, serious performance issue
Last modified: 2015-11-03 18:06:36 EST
If I add two 1G ethernet ports to a node, each with its own address, I can run two bricks, one on each address, and use a stripe to achieve 2G throughput to the stripe. This is *very* desirable and works well. However, once I do this, communication between peers breaks because each peer seems to have one UUID which appears to be tagged to just one address, and on modifying said stripe, other nodes complain that one of the addresses on one of the bricks is not a friend.
Data node, 2 ethernet ports on 10.1.0.1 and 10.2.0.1, create a stripe from two bricks as;
gluster volume create test stripe 2 10.1.0.1:/vols/bricks 10.2.0.1:/vols/brick2
Take another node with two ports and addresses 10.1.0.2 and 10.2.0.2 and from the first node;
gluster peer probe 10.2.0.2
mount -t glusterfs localhost:/test /mnt/test
dd if=/dev/zero of=/mnt/test/bigfile bs=1M count=2000 conv=fdatasync
dd if=/mnt/test/bigfile of=/dev/null bs=1M
This works fine, then on the second node you can mount "test" and dd will show throughput of 199MB/sec.
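For context on that figure, here is a rough back-of-the-envelope check of what two 1G links could deliver at most. It is only a sketch: the helper name ceiling_mb_per_sec is made up for illustration, it uses decimal megabytes, and it ignores all protocol overhead.

```shell
#!/bin/sh
# Rough ceiling for N striped links, ignoring protocol overhead.
# The function name and its parameters are illustrative, not Gluster settings.
ceiling_mb_per_sec() {
    links=$1
    link_mbit=$2
    # 8 bits per byte, decimal megabytes
    echo $(( links * link_mbit / 8 ))
}

ceiling=$(ceiling_mb_per_sec 2 1000)
echo "theoretical ceiling: ${ceiling} MB/s"
# Integer arithmetic, so the percentage is rounded down.
echo "observed 199 MB/s is $(( 199 * 100 / ceiling ))% of that"
```

A single 1G link tops out at about 125 MB/s by the same arithmetic, so 199MB/sec can only come from both ports being used.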
If you then attempt to modify the volume in any way, it will tell you either that the operation failed on the second node, or that 10.1.0.2 "is not a friend", depending on where you try to make the change from.
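To see which address each peer is actually known by, something like the following can help. It is a minimal sketch that assumes "gluster peer status" prints "Hostname:" and "Uuid:" lines for each peer; the list_peers helper is hypothetical and the exact output format may differ between releases.

```shell
#!/bin/sh
# Print "uuid hostname" for each peer, reading `gluster peer status`-style
# text on stdin. Assumes each peer has a "Hostname: ..." line followed by
# a "Uuid: ..." line; other lines are ignored.
list_peers() {
    awk -F': ' '
        $1 == "Hostname" { host = $2 }
        $1 == "Uuid"     { print $2, host }
    '
}

# Usage on a node with glusterd running:
#   gluster peer status | list_peers
```

If the second node only shows up under 10.2.0.2 here, that would be consistent with 10.1.0.2 being reported as "not a friend".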
If you then detach the second node, you can then make changes to "test", then you can re-attach it with "probe", but this is a horribly messy way of trying to work.
Just to clarify, if on the server I do "gluster peer detach" for all other nodes, then for example add a new volume, then "gluster peer probe" for all other nodes, this seems to work fine. It's just a very messy process every time you want to perform an operation on a volume (!)
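The detach / change / re-probe dance could be wrapped up roughly like this. This is a sketch of the workaround only, not a fix: the with_peers_detached helper and the GLUSTER override variable are my own names, and it assumes "gluster peer detach" succeeds for every listed peer (it stops without re-probing if one detach fails).

```shell
#!/bin/sh
# GLUSTER can be overridden (e.g. GLUSTER=echo) for a dry run.
GLUSTER=${GLUSTER:-gluster}

# with_peers_detached "<space-separated peers>" command [args...]
# Detaches every listed peer, runs the command, then probes the peers again.
with_peers_detached() {
    peers=$1
    shift
    for p in $peers; do
        # Give up if a detach fails; peers already detached are NOT re-probed.
        "$GLUSTER" peer detach "$p" || return 1
    done
    "$@"
    rc=$?
    # Re-probe even if the command failed, so the pool is restored.
    for p in $peers; do
        "$GLUSTER" peer probe "$p" || rc=1
    done
    return $rc
}

# Example (addresses from this report; volume name and brick paths are made up):
#   with_peers_detached "10.2.0.2" \
#       gluster volume create test2 stripe 2 10.1.0.1:/vols/b3 10.2.0.1:/vols/b4
```

Running it with GLUSTER=echo first prints the peer commands instead of executing them, which is a cheap way to check the order of operations.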
KP/Kaushal, need a resolution on this.
Difference to performance is orders of magnitude based on the number of network cards available per machine. Does seem like a fairly notable issue ?!
Anyone looking at this? Current solution is to run a VPS per network card and serve data from within the VPS .. but this is horribly inefficient and complicates management no end.
Ok, if anyone is looking at this, I've just installed the released version of 3.3.0 on Ubuntu 12.04 and the issue still exists. Does anyone have a solution other than virtualising the whole lot and running one VM per NIC?
Gareth, we have now made a lot more VM hosting related enhancements to the product compared to earlier. Can you run a round of tests with the 3.4.0qa releases (qa6 is the latest now)?
Sorry, Gluster didn't cut it for me re; VM hosting and I now have a solution that renders Gluster completely obsolete (for VM hosting) in every respect. For what it's worth; I think the Gluster concept is good, but releasing such a hopelessly unstable product without the promised VM support .. not good.
Well, we can't be all things to all people. However, we might get closer if you could tell us in what ways you consider Gluster obsolete or unstable. There's nothing about that in this particular bug report, which is limited to one specific network configuration which most people would consider inferior to bonding or split-horizon DNS anyway. Are you willing to be more constructive?
Specifically, my current solution offers network RAID10 with local LFU caching on SSD, this outperforms any other shared storage product I've tried by many orders of magnitude. Is this something Gluster is likely to add?
Feature requests make most sense against the 'mainline' release, there is no ETA for an implementation and requests might get forgotten when filed against a particular version.
Because of the large number of bugs filed against it, the mainline version is ambiguous and about to be removed as a choice.
If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.