Bug 846968 - gluster-object: object creation fails when large number of objects are created in parallel
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: gluster-swift
Version: 2.0
Hardware: x86_64
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Luis Pabón
QA Contact: pushpesh sharma
URL:
Whiteboard:
Depends On:
Blocks: 858440
 
Reported: 2012-08-09 10:19 UTC by Saurabh
Modified: 2016-11-08 22:25 UTC
CC: 13 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 858440 (view as bug list)
Environment:
Last Closed: 2013-09-23 22:32:20 UTC
Embargoed:



Description Saurabh 2012-08-09 10:19:01 UTC
Description of problem:
I tried to create 2000 objects in 2000 containers in parallel.
Of these, only 1763 operations succeeded; the rest failed.

Gluster volume:-
Volume Name: test2
Type: Distributed-Replicate
Volume ID: f049d836-8cc8-47fd-bab1-975be12bf0fd
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.16.157.81:/home/test2-dr
Brick2: 10.16.157.75:/home/test2-drr
Brick3: 10.16.157.78:/home/test2-ddr
Brick4: 10.16.157.21:/home/test2-ddrr



Version-Release number of selected component (if applicable):

[root@gqac028 ~]# glusterfs -V
glusterfs 3.3.0rhs built on Jul 25 2012 11:21:57
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

How reproducible:
Almost always; not all objects get uploaded, though the number of failures varies between runs.

Steps to Reproduce:
1. Create a dist-rep (2x2) volume.
2. Start 2000 curl commands in parallel from a client.
3. Each command should create an object in a different container.
4. Use a four-node cluster; all requests are sent from one client to one server.
5. Each file is 2 MB in size.
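The reproduction steps can be sketched as a shell script. The proxy address comes from the proxy-server.conf below; the account name AUTH_test2 and the DRY_RUN guard are illustrative assumptions, not details from the report (tempauth is disabled in the pipeline, so no auth token is sent):

```shell
#!/bin/sh
# Reproduction sketch. DRY_RUN=1 (the default) only prints the curl
# commands; set DRY_RUN=0 against a live gluster-swift proxy to run them.
: "${DRY_RUN:=1}"
PROXY="http://10.16.157.81:8080"   # bind_ip/bind_port from proxy-server.conf
ACCOUNT="AUTH_test2"               # assumed account name
N=2000

run() {
    # Print the command in dry-run mode; execute it otherwise.
    if [ "$DRY_RUN" = "1" ]; then echo "$@"; else "$@"; fi
}

# 2 MB source file, matching step 5 above
dd if=/dev/zero of=/tmp/2MB.file bs=1M count=2 2>/dev/null

i=1
while [ "$i" -le "$N" ]; do
    # One container per object; the container PUT must precede the object PUT.
    run curl -s -X PUT "$PROXY/v1/$ACCOUNT/c$i" &
    run curl -s -X PUT -T /tmp/2MB.file "$PROXY/v1/$ACCOUNT/c$i/obj$i" &
    i=$((i + 1))
done
wait   # all 2*N curl commands run in parallel
```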
  
Actual results:
A portion of the parallel requests fail; only 1763 of the 2000 objects were created.

Expected results:
All 2000 parallel requests should succeed; the system should support a higher number of concurrent requests.

Additional info:
system config,

[root@gqac028 ~]# cat /proc/sys/net/core/somaxconn
10000



[root@gqac028 ~]# cat /etc/swift/proxy-server.conf
[DEFAULT]
bind_ip = 10.16.157.81
bind_port = 8080
user = root
log_facility = LOG_LOCAL0
log_level = INFO
client_timeout = 150
node_timeout = 120
conn_timeout = 5
workers = 12
backlog = 10000

[pipeline:main]
#pipeline = healthcheck cache tempauth proxy-server
pipeline = healthcheck cache proxy-server

[app:proxy-server]
use = egg:swift#proxy
allow_account_management = true
account_autocreate = true

# [filter:tempauth]
# use = egg:swift#tempauth

[filter:healthcheck]
use = egg:swift#healthcheck

[filter:cache]
use = egg:swift#memcache
[root@gqac028 ~]# 



[root@gqac028 ~]# cat /etc/swift/object-server/1.conf 
[DEFAULT]
devices = /srv/1/node
mount_check = false
bind_port = 6010
user = root
log_facility = LOG_LOCAL2
log_level = WARN
workers = 12
node_timeout = 120
conn_timeout = 5



[root@gqac028 ~]# cat /etc/swift/container-server/1.conf 
[DEFAULT]
devices = /srv/1/node
mount_check = false
bind_port = 6011
user = root
log_facility = LOG_LOCAL2
log_level = WARN
workers = 4
node_timeout = 120
conn_timeout = 5



[root@gqac028 ~]# cat /etc/swift/account-server/1.conf 
[DEFAULT]
devices = /srv/1/node
mount_check = false
bind_port = 6012
user = root
log_facility = LOG_LOCAL2
log_level = WARN
workers = 4
node_timeout = 120
conn_timeout = 5


[root@gqac028 ~]# cat /etc/swift/fs.conf 
[DEFAULT]
mount_path = /mnt/gluster-object
auth_account = auth
#ip of the fs server.
mount_ip = localhost
#fs server need not be local, remote server can also be used,
#set remote_cluster=yes for using remote server.
remote_cluster = no
#object_only = no
object_only = yes

Comment 2 Junaid 2012-09-12 05:04:01 UTC
What was the error code for the failed operations? Also, can you paste a snippet of the log file showing these errors?

Comment 3 Peter Portante 2012-11-19 20:09:56 UTC
This is likely a combination of the current behavior reported under bug 829913 and bug 876660.

In this case, because the files are distributed over two nodes, it is
likely that 50% of the requests end up creating the temp file on the
wrong node, and that 50% of the time, the node servicing the PUT
operation in the UFO (proxy and object server combination) is not the
local node. In the worst case, the node that is the final home of the
PUT is servicing the request, but the temp file lands on the other node,
only to be moved back to the local node again.

Comment 5 Luis Pabón 2013-07-17 01:01:04 UTC
RHS 2.0 UFO Bugs are being set to low priority.

Comment 9 Scott Haines 2013-09-23 22:32:20 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html

