The object servers are sending 8KB chunks to the client (in this case the proxy server) instead of using 16KB (the MTU of the "lo" network adapter) or 64KB (the size of the reads from disk). This adds considerable overhead, since it increases the number of system calls needed to fill the network queues. Running perf top during large file transfers shows "copy_user_generic_string" as one of the top routines.
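
To make the cost concrete, the GET path boils down to a read/send loop; here is a minimal Python sketch (hypothetical, not the actual object server code; stream_object and its syscall accounting are purely illustrative) showing how the chunk size directly sets the number of kernel crossings:

def stream_object(path, sock, chunk_size):
    """Stream a file over a socket in chunk_size pieces.

    Each pass costs one read() plus at least one sendto(), so
    shrinking chunk_size from 64KB to 8KB multiplies the number of
    kernel crossings (and copy_user_generic_string work) by 8.
    """
    syscalls = 0
    with open(path, 'rb', buffering=0) as f:  # unbuffered: one read() per call
        while True:
            chunk = f.read(chunk_size)
            syscalls += 1                      # the read()
            if not chunk:
                break
            sock.sendall(chunk)
            syscalls += 1                      # at least one sendto()
    return syscalls

For a 1GB object that is roughly 262,144 syscalls with 8KB chunks versus 32,768 with 64KB chunks.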
RHS 2.1 is EOL. With RHGS 3.1, I don't see this behavior anymore.

From /etc/swift/object-server.conf:

disk_chunk_size = 65536
network_chunk_size = 65536

# dd if=/dev/urandom of=./file1 bs=1 count=131072
# curl -v -X PUT http://localhost:8080/v1/AUTH_test/c1/file1 -T ./file1
# curl -v -X GET http://localhost:8080/v1/AUTH_test/c1/file1 -o ./dump
# strace -ff -p 8056 -e open,read,sendto,accept

[pid 8056] accept(7, {sa_family=AF_INET, sin_port=htons(57894), sin_addr=inet_addr("127.0.0.1")}, [16]) = 8
[pid 8056] accept(7, 0x7ffcbe247c10, [16]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 8056] open("/mnt/gluster-object/test/c1/file1", O_RDONLY|O_CLOEXEC) = 10
[pid 8056] read(10, "\21\332\245\204\357\257 w\275i\227\255\202I\267\5tCH\342\234\"\4\204\307\31\3137m\327e\335"..., 65536) = 65536
[pid 8056] sendto(8, "HTTP/1.1 200 OK\r\nX-Timestamp: 14"..., 65536, 0, NULL, 0) = 65536
[pid 8056] sendto(8, "\37\236p\361D\324\31\"\215\337\217\276\310{\30S\367l\f@\211A$xT\334\237\35\v\330\3558"..., 258, 0, NULL, 0) = 258
[pid 8056] read(10, "u\264\372\350\226f\373\304|\17\262\316\313r6\366\374\321\310:\337\252sV\252\317^,\fG\233\231"..., 65536) = 65536
[pid 8056] sendto(8, "u\264\372\350\226f\373\304|\17\262\316\313r6\366\374\321\310:\337\252sV\252\317^,\fG\233\231"..., 65536, 0, NULL, 0) = 65536
[pid 8056] read(10, "", 65536) = 0
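
Those two settings are the knobs the fixed read loop honors. A short Python sketch of the idea (the section name, fallback defaults, and helper names below are assumptions for illustration; check the [DEFAULT]/[app:object-server] sections in a real deployment): the server reads the object in disk_chunk_size pieces and the WSGI layer writes each piece straight to the socket, which is why the read() and sendto() sizes line up at 65536 in the strace above.

from configparser import ConfigParser

def chunk_sizes(conf_path='/etc/swift/object-server.conf'):
    # Section name and fallbacks are illustrative assumptions.
    cp = ConfigParser()
    cp.read(conf_path)
    disk = cp.getint('DEFAULT', 'disk_chunk_size', fallback=65536)
    net = cp.getint('DEFAULT', 'network_chunk_size', fallback=65536)
    return disk, net

def object_reader(path, disk_chunk_size):
    # Yield the object in disk_chunk_size pieces; writing each piece
    # to the socket unmodified matches every 65536-byte read() with
    # a 65536-byte sendto(), as observed above.
    with open(path, 'rb', buffering=0) as f:
        while True:
            chunk = f.read(disk_chunk_size)
            if not chunk:
                return
            yield chunk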