Description of problem:
Cluster info: 4 physical machines
Gluster volume info: Distributed-Replicate (2x2) type

==============================================================================
Aug 8 10:07:12 gqac028 object-server ERROR container update failed with 127.0.0.1:6011/sdb1 (saving for async update later): ConnectionTimeout (5.0s) (txn: tx695077467d7f4f5d89a9b9c1783b2f7f)
Aug 8 10:07:14 gqac028 object-server ERROR container update failed with 127.0.0.1:6011/sdb1 (saving for async update later): ConnectionTimeout (5.0s) (txn: tx8aee3d4d2cae44cdaa9b6ca3597a7898)
Aug 8 10:07:14 gqac028 object-server ERROR container update failed with 127.0.0.1:6011/sdb1 (saving for async update later): ConnectionTimeout (5.0s) (txn: txf534ac48abc74b7fada908d64ece4c0f)
Aug 8 10:07:14 gqac028 object-server ERROR container update failed with 127.0.0.1:6011/sdb1 (saving for async update later): ConnectionTimeout (5.0s) (txn: tx4cbc78f252f444c7bc45400003445d7c)
Aug 8 10:07:19 gqac028 object-server ERROR container update failed with 127.0.0.1:6011/sdb1 (saving for async update later): ConnectionTimeout (5.0s) (txn: txcf42532334ee4c5e966e441c31202821)
Aug 8 10:07:20 gqac028 object-server ERROR container update failed with 127.0.0.1:6011/sdb1 (saving for async update later): ConnectionTimeout (5.0s) (txn: tx60bcb4674afd4a748d91c7f59c66159d)

Version-Release number of selected component (if applicable):
[root@gqac028 cont5]# glusterfs -V
glusterfs 3.3.0rhs built on Jul 25 2012 11:21:57
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

How reproducible:
Happened this time.

Steps to Reproduce:
1. Send curl requests to upload files to 2000 different containers (a rough sketch of such an upload loop is shown below, after the volume info).

Actual results:
Container update errors as shown in the log above.

Expected results:
All objects should be uploaded without any error.

Additional info:
[root@gqac028 cont5]# gluster volume info test2

Volume Name: test2
Type: Distributed-Replicate
Volume ID: f049d836-8cc8-47fd-bab1-975be12bf0fd
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.16.157.81:/home/test2-dr
Brick2: 10.16.157.75:/home/test2-drr
Brick3: 10.16.157.78:/home/test2-ddr
Brick4: 10.16.157.21:/home/test2-ddrr
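For reference, a rough sketch of the kind of upload loop described in step 1 (not the exact commands used; the proxy endpoint, account, token, and file name are placeholder assumptions):

#!/bin/bash
# Hypothetical reproduction sketch -- endpoint, account, token, and file are
# placeholder assumptions, not values taken from this report.
STORAGE_URL="http://127.0.0.1:8080/v1/AUTH_test"
TOKEN="AUTH_tk_placeholder"

for i in $(seq 1 2000); do
    # Create the container, then upload one object into it.
    curl -s -o /dev/null -X PUT -H "X-Auth-Token: $TOKEN" "$STORAGE_URL/cont$i"
    curl -s -o /dev/null -w "cont$i: %{http_code}\n" -X PUT \
         -H "X-Auth-Token: $TOKEN" -T ./sample.txt \
         "$STORAGE_URL/cont$i/sample.txt"
done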
This seems to be a performance issue. Can you verify the same with the gluster-swift-1.7.4 RPMs?
This BZ has been verified using the catalyst workload on RHS 2.1. It appears to be fixed, as the new PDQ performance-related changes have been merged into RHS 2.1.

[root@dhcp207-9 ~]# rpm -qa|grep gluster
gluster-swift-object-1.8.0-6.3.el6rhs.noarch
vdsm-gluster-4.10.2-22.7.el6rhs.noarch
gluster-swift-plugin-1.8.0-2.el6rhs.noarch
glusterfs-geo-replication-3.4.0.12rhs.beta3-1.el6rhs.x86_64
glusterfs-3.4.0.12rhs.beta3-1.el6rhs.x86_64
gluster-swift-1.8.0-6.3.el6rhs.noarch
glusterfs-server-3.4.0.12rhs.beta3-1.el6rhs.x86_64
gluster-swift-proxy-1.8.0-6.3.el6rhs.noarch
gluster-swift-account-1.8.0-6.3.el6rhs.noarch
glusterfs-rdma-3.4.0.12rhs.beta3-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.12rhs.beta3-1.el6rhs.x86_64
gluster-swift-container-1.8.0-6.3.el6rhs.noarch

All performance-related tests (from the QE perspective) will be done using the catalyst workload (and, if required in the future, possibly ssbench). It consists of 15 runs of 10000 requests (PUT/GET/HEAD/DELETE) each, distributed among 10 threads. These comprehensive tests include all file formats and varied sizes. The tests were executed on a machine with the following configuration:

RAM: 7500Gb
CPU: 1

Volume info: all bricks are created as logical volumes (on localhost) of 10G each, and each volume has 4 such bricks.

[root@dhcp207-9 ~]# gluster volume info

Volume Name: test
Type: Distribute
Volume ID: 440fdac0-a3bd-4ab1-a70c-f4c390d97100
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: localhost:/mnt/lv1/lv1
Brick2: localhost:/mnt/lv2/lv2
Brick3: localhost:/mnt/lv3/lv3
Brick4: localhost:/mnt/lv4/lv4

Volume Name: test2
Type: Distribute
Volume ID: 6d922203-6657-4ed3-897a-069ef6c396bf
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: localhost:/mnt/lv5/lv5
Brick2: localhost:/mnt/lv6/lv6
Brick3: localhost:/mnt/lv7/lv7
Brick4: localhost:/mnt/lv8/lv8

PS: Performance Engineering will be responsible for all large-scale tests, which will be done on the BAGL cluster.
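For reference, a rough sketch of how a brick/volume layout like the one above could be prepared. The volume group name (vg_bricks) and filesystem options are assumptions; the brick paths mirror the 'gluster volume info' output in this comment.

#!/bin/bash
# Sketch only: vg_bricks is an assumed volume group name.
for n in 1 2 3 4; do
    lvcreate -L 10G -n lv$n vg_bricks          # one 10G logical volume per brick
    mkfs.xfs -i size=512 /dev/vg_bricks/lv$n   # XFS with 512-byte inodes, commonly recommended for RHS bricks
    mkdir -p /mnt/lv$n
    mount /dev/vg_bricks/lv$n /mnt/lv$n
    mkdir -p /mnt/lv$n/lv$n                    # brick directory inside the mount
done

# 4-brick distribute volume on a single host, matching the 'test' volume above.
gluster volume create test \
    localhost:/mnt/lv1/lv1 localhost:/mnt/lv2/lv2 \
    localhost:/mnt/lv3/lv3 localhost:/mnt/lv4/lv4
gluster volume start test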
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html