Created attachment 585861 [details]
generates traffic from multiple clients to servers using curl

Description of problem:

With very large object sizes, I can still get HTTP 503 (Service Unavailable) errors on PUT requests. This seems to be related to a "chunk write timeout" in the logs. It may be related to the fact that the proxy server has been configured with 8 workers while the other servers have not been similarly configured. On other runs not recorded here, I have seen timeouts from account-server requests as well.

I think this problem is different from the previous HTTP 503 error condition, which happened much more easily. Note that this error did not occur until this object-size threshold and thread count were reached in the run.

Workload: 1 GB/object, 8 clients, 16 threads (2/client), 4 objects/thread.

Configuration: 4 servers, 2-replica volume; the swift configuration changes are listed under Additional info.

Version-Release number of selected component (if applicable):

-- gprfs005 --
gluster-swift-object-1.4.8-3.el6.noarch
gluster-swift-1.4.8-3.el6.noarch
gluster-swift-container-1.4.8-3.el6.noarch
gluster-swift-proxy-1.4.8-3.el6.noarch
gluster-swift-account-1.4.8-3.el6.noarch
gluster-swift-plugin-1.0-3.noarch
glusterfs-3.3.0qa38-1.el6.x86_64
glusterfs-fuse-3.3.0qa38-1.el6.x86_64
glusterfs-server-3.3.0qa38-1.el6.x86_64
glusterfs-geo-replication-3.3.0qa38-1.el6.x86_64
glusterfs-rdma-3.3.0qa38-1.el6.x86_64

How reproducible:

Use the attached python script, which generates a curl workload (also available at
http://perf1.lab.bos.redhat.com/bengland/laptop/matte/scalability-cluster/shared/benchmarks/gluster_test/ufo/parallel_curl.py ).

You should see something like this:

ERROR: 503
ERROR: 503
ERROR: 503
WARNING: 3 errors found doing put
16 threads 4 objects/thread 1048576 KB/object
clients: ['gprfc009', 'gprfc010', 'gprfc011', 'gprfc012', 'gprfc013', 'gprfc014', 'gprfc015', 'gprfc016']
servers: ['gprfs005-10ge', 'gprfs006-10ge', 'gprfs007-10ge', 'gprfs008-10ge']
elapsed time = 152.65 sec
throughput = 0.42 objs/sec
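For reference, the PUT side of such a workload can be sketched roughly as below. This is a minimal sketch, not the attached parallel_curl.py; the account, container, object names, source file path, and auth token are placeholders.

```python
# Minimal sketch of a parallel curl PUT workload (NOT the attached
# parallel_curl.py; account/container/token values below are placeholders).
import subprocess
from concurrent.futures import ThreadPoolExecutor

def build_put_cmd(server, account, container, obj, src_file, token):
    """Build a curl PUT command that prints only the HTTP status code."""
    url = "http://%s:8080/v1/AUTH_%s/%s/%s" % (server, account, container, obj)
    return ["curl", "-s", "-o", "/dev/null", "-w", "%{http_code}",
            "-X", "PUT", "-H", "X-Auth-Token: " + token,
            "-T", src_file, url]

def put_object(cmd):
    """Run one PUT; report non-2xx statuses like the script's output above."""
    status = subprocess.check_output(cmd).decode().strip()
    if not status.startswith("2"):
        print("ERROR: %s" % status)
    return status

def run_workload(servers, threads, objects_per_thread, token):
    """Spread PUTs across servers from a pool of worker threads."""
    cmds = [build_put_cmd(servers[i % len(servers)], "testfs", "c1",
                          "obj%d_%d" % (i, j), "/tmp/1GB.dat", token)
            for i in range(threads) for j in range(objects_per_thread)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(put_object, cmds))
```

The real script additionally fans the threads out across the 8 client hosts; this sketch only shows the per-client PUT loop.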
transfer rate = 429.33 MB/s

Additional info:

[root@gprfs005 swift]# more proxy-server.conf
[DEFAULT]
#bind_port = 443
#cert_file = /etc/swift/cert.crt
#key_file = /etc/swift/cert.key
bind_port = 8080
user = root
workers = 8

[pipeline:main]
pipeline = healthcheck cache tempauth proxy-server

[app:proxy-server]
use = egg:swift#proxy
allow_account_management = true
account_autocreate = true
log_facility = LOG_LOCAL1
log_level = WARN
log_headers = False
conn_timeout = 5.0

[filter:tempauth]
use = egg:swift#tempauth
user_admin_admin = admin .admin .reseller_admin
user_testfs_tester = testing .admin
user_test2_tester2 = testing2 .admin
user_test_tester3 = testing3

[filter:healthcheck]
use = egg:swift#healthcheck

[filter:cache]
use = egg:swift#memcache
# THIS MUST BE UPDATED WHENEVER THE CONFIGURATION CHANGES
memcache_servers = gprfs005-10ge,gprfs006-10ge,gprfs007-10ge,gprfs008-10ge

[DEFAULT]
mount_path = /mnt/glusterfs
auth_account = auth
# ip of the fs server.
mount_ip = localhost
# fs server need not be local; a remote server can also be used.
# set remote_cluster=yes to use a remote server.
remote_cluster = no

Errors logged by the servers during the run are at:
http://perf1.lab.bos.redhat.com/bengland/laptop/matte/scalability-cluster/shared/benchmarks/gluster_test/ufo/ufo-errs.log
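One thing to note about the config above: the "chunk write timeout" in swift's proxy is normally governed by node_timeout (how long the proxy waits for a storage node to accept each chunk), which is not set here and so is at its default. As an experiment for these 1 GB PUTs (an untested assumption on my part, not something verified in this run), it could be raised in the proxy-server section alongside conn_timeout:

```
[app:proxy-server]
conn_timeout = 5.0
# assumption: raise node_timeout so slow chunk writes under heavy
# load don't trip the chunk write timeout (default is much lower)
node_timeout = 60
```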
Per Sudhir's instructions, I'm marking it verified.