Bug 810477 - Inter-node communication flaw, serious performance issue
Summary: Inter-node communication flaw, serious performance issue
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: All
OS: All
Priority: medium
Severity: high
Target Milestone: ---
Assignee: krishnan parthasarathi
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-04-06 10:30 UTC by Gareth Bult
Modified: 2015-11-03 23:06 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-10-22 15:46:38 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Gareth Bult 2012-04-06 10:30:22 UTC
If I add two 1G ethernet ports to a node, each with its own address, I can run two bricks, one on each address, and use a stripe to achieve 2G throughput. This is *very* desirable and works well. However, once I do this, communication between peers breaks: each peer seems to have a single UUID which appears to be tagged to just one address, and on modifying said stripe, other nodes complain that one of the addresses on one of the bricks is not a friend.

Practical example:

Data node with 2 ethernet ports on 10.1.0.1 and 10.2.0.1; create a stripe from two bricks as:

gluster volume create test stripe 2 10.1.0.1:/vols/brick1 10.2.0.1:/vols/brick2

Take another node with two ports and addresses 10.1.0.2 and 10.2.0.2, and from the first node:

gluster peer probe 10.2.0.2
mount -t glusterfs localhost:/test /mnt/test
dd if=/dev/zero of=/mnt/test/bigfile bs=1M count=2000 conv=fdatasync
dd if=/mnt/test/bigfile of=/dev/null bs=1M

This works fine; you can then mount "test" on the second node as well, and dd will show throughput of 199MB/sec.

If you then attempt to modify the volume in any way, it will tell you either that the operation failed on the second node, or that 10.1.0.2 "is not a friend", depending on where you try to make the change from.
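
You can see the asymmetry from the CLI (a rough check using the example addresses above; exact output varies between releases):

# On the first node: the peer appears under a single UUID and, seemingly,
# only the address it was probed on (10.2.0.2), not its second address.
gluster peer status

# The volume itself still lists bricks on both addresses of the data node.
gluster volume info test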

If you then detach the second node, you can make changes to "test" and then re-attach it with "probe", but this is a horribly messy way of working.

Comment 1 Gareth Bult 2012-04-10 10:00:07 UTC
Just to clarify, if on the server I do "gluster peer detach" for all other nodes, then for example add a new volume, then "gluster peer probe" for all other nodes, this seems to work fine. It's just a very messy process every time you want to perform an operation on a volume (!)
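
In practice, with the two nodes from the example, the round trip looks roughly like this (the new volume name and brick path are just placeholders):

# Detach the peer that was probed on the second address.
gluster peer detach 10.2.0.2

# Make whatever change is needed, e.g. create a new volume.
gluster volume create test2 10.1.0.1:/vols/test2-brick

# Re-probe the peer once the change has gone through.
gluster peer probe 10.2.0.2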

Comment 2 Amar Tumballi 2012-04-17 11:03:18 UTC
KP/Kaushal, need a resolution on this.

Comment 3 Gareth Bult 2012-04-17 11:11:22 UTC
The difference in performance is orders of magnitude, depending on the number of network cards available per machine. It does seem like a fairly notable issue?!

Comment 4 Gareth Bult 2012-04-23 14:20:44 UTC
Anyone looking at this? The current solution is to run a VPS per network card and serve data from within the VPS, but this is horribly inefficient and complicates management no end.

Comment 5 Gareth Bult 2012-07-28 16:35:14 UTC
Ok, if anyone is looking at this, I've just installed the released version of 3.3.0 on Ubuntu 12.04 and the issue still exists. Does anyone have a solution other than virtualising the whole lot and running one VM per NIC?

Comment 6 Amar Tumballi 2012-12-21 11:19:22 UTC
Gareth, we have now made a lot more VM-hosting-related enhancements to the product compared to earlier releases. Can you run a round of tests with the 3.4.0qa releases (qa6 is the latest now)?

Comment 7 Gareth Bult 2013-02-24 23:36:36 UTC
Sorry, Gluster didn't cut it for me re: VM hosting, and I now have a solution that renders Gluster completely obsolete (for VM hosting) in every respect. For what it's worth, I think the Gluster concept is good, but releasing such a hopelessly unstable product without the promised VM support ... not good.

Comment 8 Jeff Darcy 2013-02-25 00:56:20 UTC
Well, we can't be all things to all people.  However, we might get closer if you could tell us in what ways you consider Gluster obsolete or unstable.  There's nothing about that in this particular bug report, which is limited to one specific network configuration which most people would consider inferior to bonding or split-horizon DNS anyway.  Are you willing to be more constructive?
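
(For reference, the bonding alternative mentioned above would look something like the following on Ubuntu 12.04 with the ifenslave package; the interface names, addresses and bonding mode are only examples, and whether a single stream actually exceeds 1 Gb/s depends on the bonding mode and on switch support.)

# /etc/network/interfaces -- aggregate eth0 and eth1 into one bonded
# interface so bricks and peers only ever see a single address per node.
auto bond0
iface bond0 inet static
    address 10.1.0.1
    netmask 255.255.255.0
    bond-slaves eth0 eth1
    bond-mode balance-rr
    bond-miimon 100

# Both bricks can then live on the bonded address, mirroring the original
# two-brick layout without needing a second peer address.
gluster volume create test stripe 2 10.1.0.1:/vols/brick1 10.1.0.1:/vols/brick2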

Comment 9 Gareth Bult 2013-02-25 02:21:51 UTC
Specifically, my current solution offers network RAID10 with local LFU caching on SSD, this outperforms any other shared storage product I've tried by many orders of magnitude. Is this something Gluster is likely to add?

Comment 10 Niels de Vos 2014-11-27 14:45:08 UTC
Feature requests make most sense against the 'mainline' release: there is no ETA for an implementation, and requests might get forgotten when filed against a particular version.

Comment 11 Kaleb KEITHLEY 2015-10-22 15:46:38 UTC
Because of the large number of bugs filed against it, the 'mainline' version is ambiguous and about to be removed as a choice.

If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.

