Created attachment 585861 [details]
generates traffic from multiple clients to servers using curl

Description of problem:

With very large object sizes, I can still get HTTP 503 (Service Unavailable) errors on PUT requests. This seems to be related to a "chunk write timeout" in the logs. It may be related to the fact that the proxy server has been configured with 8 workers while the other servers have not been similarly configured. On other runs not recorded here, I have seen timeouts from account-server requests as well.

I think this problem is different from the previous HTTP 503 error condition, which happened much more easily. Note that this error did not occur until this object-size threshold and thread count were reached in the run.

Workload: 1 GB/object, 8 clients, 16 threads (2/client), 4 objects/thread.

Configuration: 4 servers, 2-replica volume; the swift configuration changes are listed under Additional info.

Version-Release number of selected component (if applicable):

-- gprfs005 --
gluster-swift-object-1.4.8-3.el6.noarch
gluster-swift-1.4.8-3.el6.noarch
gluster-swift-container-1.4.8-3.el6.noarch
gluster-swift-proxy-1.4.8-3.el6.noarch
gluster-swift-account-1.4.8-3.el6.noarch
gluster-swift-plugin-1.0-3.noarch
glusterfs-3.3.0qa38-1.el6.x86_64
glusterfs-fuse-3.3.0qa38-1.el6.x86_64
glusterfs-server-3.3.0qa38-1.el6.x86_64
glusterfs-geo-replication-3.3.0qa38-1.el6.x86_64
glusterfs-rdma-3.3.0qa38-1.el6.x86_64

How reproducible:

Use the attached python script, which generates a curl workload (also available at
http://perf1.lab.bos.redhat.com/bengland/laptop/matte/scalability-cluster/shared/benchmarks/gluster_test/ufo/parallel_curl.py ).

You should see something like this:

ERROR: 503
ERROR: 503
ERROR: 503
WARNING: 3 errors found doing put
16 threads 4 objects/thread 1048576 KB/object
clients: ['gprfc009', 'gprfc010', 'gprfc011', 'gprfc012', 'gprfc013', 'gprfc014', 'gprfc015', 'gprfc016']
servers: ['gprfs005-10ge', 'gprfs006-10ge', 'gprfs007-10ge', 'gprfs008-10ge']
elapsed time = 152.65 sec
throughput = 0.42 objs/sec
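For reference, the PUT side of such a workload can be sketched roughly as below. This is a minimal sketch, not the attached parallel_curl.py; the account, container, object names, source file path, and auth token are placeholders.

```python
# Minimal sketch of a parallel curl PUT workload (NOT the attached
# parallel_curl.py; account/container/token values below are placeholders).
import subprocess
from concurrent.futures import ThreadPoolExecutor

def build_put_cmd(server, account, container, obj, src_file, token):
    """Build a curl PUT command that prints only the HTTP status code."""
    url = "http://%s:8080/v1/AUTH_%s/%s/%s" % (server, account, container, obj)
    return ["curl", "-s", "-o", "/dev/null", "-w", "%{http_code}",
            "-X", "PUT", "-H", "X-Auth-Token: " + token,
            "-T", src_file, url]

def put_object(cmd):
    """Run one PUT; report non-2xx statuses like the script's output above."""
    status = subprocess.check_output(cmd).decode().strip()
    if not status.startswith("2"):
        print("ERROR: %s" % status)
    return status

def run_workload(servers, threads, objects_per_thread, token):
    """Spread PUTs across servers from a pool of worker threads."""
    cmds = [build_put_cmd(servers[i % len(servers)], "testfs", "c1",
                          "obj%d_%d" % (i, j), "/tmp/1GB.dat", token)
            for i in range(threads) for j in range(objects_per_thread)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(put_object, cmds))
```

The real script additionally fans the threads out across the 8 client hosts; this sketch only shows the per-client PUT loop.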
transfer rate = 429.33 MB/s

Additional info:

[root@gprfs005 swift]# more proxy-server.conf
[DEFAULT]
#bind_port = 443
#cert_file = /etc/swift/cert.crt
#key_file = /etc/swift/cert.key
bind_port = 8080
user = root
workers = 8

[pipeline:main]
pipeline = healthcheck cache tempauth proxy-server

[app:proxy-server]
use = egg:swift#proxy
allow_account_management = true
account_autocreate = true
log_facility = LOG_LOCAL1
log_level = WARN
log_headers = False
conn_timeout = 5.0

[filter:tempauth]
use = egg:swift#tempauth
user_admin_admin = admin .admin .reseller_admin
user_testfs_tester = testing .admin
user_test2_tester2 = testing2 .admin
user_test_tester3 = testing3

[filter:healthcheck]
use = egg:swift#healthcheck

[filter:cache]
use = egg:swift#memcache
# THIS MUST BE UPDATED WHENEVER THE CONFIGURATION CHANGES
memcache_servers = gprfs005-10ge,gprfs006-10ge,gprfs007-10ge,gprfs008-10ge

[DEFAULT]
mount_path = /mnt/glusterfs
auth_account = auth
# ip of the fs server.
mount_ip = localhost
# fs server need not be local; a remote server can also be used.
# set remote_cluster=yes to use a remote server.
remote_cluster = no

Errors logged by the servers during the run are at:
http://perf1.lab.bos.redhat.com/bengland/laptop/matte/scalability-cluster/shared/benchmarks/gluster_test/ufo/ufo-errs.log
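One thing to note about the config above: the "chunk write timeout" in swift's proxy is normally governed by node_timeout (how long the proxy waits for a storage node to accept each chunk), which is not set here and so is at its default. As an experiment for these 1 GB PUTs (an untested assumption on my part, not something verified in this run), it could be raised in the proxy-server section alongside conn_timeout:

```
[app:proxy-server]
conn_timeout = 5.0
# assumption: raise node_timeout so slow chunk writes under heavy
# load don't trip the chunk write timeout (default is much lower)
node_timeout = 60
```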
Per Sudhir's instructions, I'm marking it verified.