Description of problem:
I tried to create 2000 objects in 2000 containers in parallel. Of these, only 1763 operations succeeded; the rest failed.

Gluster volume:
Volume Name: test2
Type: Distributed-Replicate
Volume ID: f049d836-8cc8-47fd-bab1-975be12bf0fd
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.16.157.81:/home/test2-dr
Brick2: 10.16.157.75:/home/test2-drr
Brick3: 10.16.157.78:/home/test2-ddr
Brick4: 10.16.157.21:/home/test2-ddrr

Version-Release number of selected component (if applicable):
[root@gqac028 ~]# glusterfs -V
glusterfs 3.3.0rhs built on Jul 25 2012 11:21:57
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

How reproducible:
Almost always; some objects fail to upload, though the exact number of failures varies between runs.

Steps to Reproduce:
1. Create a distributed-replicate (2x2) volume.
2. Start 2000 curl commands in parallel from a client.
3. Each command creates an object in a different container.
4. Four-node cluster; all requests are sent from one client to one server.
5. File size is 2 MB.

Actual results:
The parallel execution fails for some of the requests.

Expected results:
All 2000 parallel requests should succeed; the system should support a higher number of parallel requests.
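The reproducer in the steps above can be sketched as a shell loop. This is a hypothetical sketch, not the exact commands used by the reporter: the account path `AUTH_test`, the token value, and the payload path are assumptions; the proxy address comes from the config below. It is written as a dry run that prints the curl command lines, so they can be inspected before being piped to `sh` to actually execute them.

```shell
#!/bin/sh
# Sketch of the reproducer: 2000 parallel PUTs, one object per container.
# PROXY uses the proxy-server bind_ip:bind_port from the attached config;
# TOKEN and the account path AUTH_test are placeholders (assumed, not
# taken from the original report).
PROXY="http://10.16.157.81:8080/v1/AUTH_test"
TOKEN="AUTH_tk_placeholder"

# Create the 2 MB payload mentioned in step 5.
dd if=/dev/zero of=/tmp/obj2mb.dat bs=1024 count=2048 2>/dev/null

i=1
while [ "$i" -le 2000 ]; do
    # One object per container, matching step 3. Printed rather than
    # executed; pipe this script's output to `sh` to run the PUTs.
    printf '%s\n' "curl -s -o /dev/null -X PUT -H 'X-Auth-Token: $TOKEN' -T /tmp/obj2mb.dat '$PROXY/container$i/object$i' &"
    i=$((i + 1))
done
printf '%s\n' "wait"
```

Running the emitted commands through `sh` backgrounds all 2000 PUTs at once and waits for them; counting the non-201 responses (e.g. by adding `-w '%{http_code}'` to each curl) would show which of the operations failed.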
Additional info (system config):

[root@gqac028 ~]# cat /proc/sys/net/core/somaxconn
10000

[root@gqac028 ~]# cat /etc/swift/proxy-server.conf
[DEFAULT]
bind_ip = 10.16.157.81
bind_port = 8080
user = root
log_facility = LOG_LOCAL0
log_level = INFO
client_timeout = 150
node_timeout = 120
conn_timeout = 5
workers = 12
backlog = 10000

[pipeline:main]
#pipeline = healthcheck cache tempauth proxy-server
pipeline = healthcheck cache proxy-server

[app:proxy-server]
use = egg:swift#proxy
allow_account_management = true
account_autocreate = true

# [filter:tempauth]
# use = egg:swift#tempauth

[filter:healthcheck]
use = egg:swift#healthcheck

[filter:cache]
use = egg:swift#memcache

[root@gqac028 ~]# cat /etc/swift/object-server/1.conf
[DEFAULT]
devices = /srv/1/node
mount_check = false
bind_port = 6010
user = root
log_facility = LOG_LOCAL2
log_level = WARN
workers = 12
node_timeout = 120
conn_timeout = 5

[root@gqac028 ~]# cat /etc/swift/container-server/1.conf
[DEFAULT]
devices = /srv/1/node
mount_check = false
bind_port = 6011
user = root
log_facility = LOG_LOCAL2
log_level = WARN
workers = 4
node_timeout = 120
conn_timeout = 5

[root@gqac028 ~]# cat /etc/swift/account-server/1.conf
[DEFAULT]
devices = /srv/1/node
mount_check = false
bind_port = 6012
user = root
log_facility = LOG_LOCAL2
log_level = WARN
workers = 4
node_timeout = 120
conn_timeout = 5

[root@gqac028 ~]# cat /etc/swift/fs.conf
[DEFAULT]
mount_path = /mnt/gluster-object
auth_account = auth
#ip of the fs server.
mount_ip = localhost
#fs server need not be local, remote server can also be used,
#set remote_cluster=yes for using remote server.
remote_cluster = no
#object_only = no
object_only = yes
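The config above tunes two listen-queue knobs that matter for a burst of 2000 simultaneous connections: the kernel cap (`net.core.somaxconn`) and the Swift proxy's `backlog`. A small sketch, assuming standard Linux and Swift file locations, to read both on a node (the `awk` pattern is an illustrative way to pull the value; it is not an official Swift tool):

```shell
#!/bin/sh
# Compare the kernel listen-queue cap with the Swift proxy backlog.
# /etc/swift/proxy-server.conf may not exist on every host, so errors
# are suppressed and a fallback note is printed instead.
somaxconn=$(cat /proc/sys/net/core/somaxconn)
backlog=$(awk -F' *= *' '/^backlog/ {print $2}' /etc/swift/proxy-server.conf 2>/dev/null)
echo "somaxconn=$somaxconn proxy backlog=${backlog:-not set (Swift default)}"
```

The effective queue depth for the proxy socket is the smaller of the two values, which is why both were raised to 10000 in this setup.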
What was the error code for the failed operations? Also, can you paste a snippet of the log entries for these errors?
This is likely a combination of the current behavior reported under bug 829913 and bug 876660. Because the files are distributed over two nodes, it is likely that 50% of the requests end up creating the temp file on the wrong node, and that 50% of the time the node servicing the PUT operation in UFO (the proxy and object server combination) is not the local node. In the worst case, the node servicing the request is the final home of the PUT, but the temp file lands on the other node, only to be moved back to the local node again.
RHS 2.0 UFO bugs are being set to low priority.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html