Bug 806892 - object-storage: PUT fails with large number of files even with worker thread enabled
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: object-storage
Version: pre-release
Hardware: x86_64 Linux
Priority: high  Severity: high
Assigned To: Junaid
Saurabh
Duplicates: 782003
Depends On:
Blocks: 817967
Reported: 2012-03-26 08:53 EDT by Saurabh
Modified: 2016-01-19 01:10 EST (History)
6 users

See Also:
Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-07-24 13:59:57 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: DP
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Saurabh 2012-03-26 08:53:53 EDT
Description of problem:

I tried to PUT 1000 objects into a container.

The PUT operations ran in a loop, one after the other.

Before this, I had changed proxy-server.conf to set the parameter "workers = 4".

It created 962 files and then reported "503 Service Unavailable."
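The sequential PUT loop can be sketched as below. This is a hypothetical reproduction, not the script used in this report: the host, port, and token are placeholders, while the AUTH_test account, container name, and file1KB_&lt;n&gt; object naming are taken from the error log that follows.

```python
# Hypothetical reproduction sketch: sequential PUTs against the Swift proxy.
# Host, port, and token are placeholders; AUTH_test and the file1KB_<n>
# naming follow the error log in this report.
import http.client


def object_path(container, i):
    """Build the object URL path used by the loop (matches the log's URLs)."""
    return "/v1/AUTH_test/%s/file1KB_%d" % (container, i)


def put_objects(host, port, token, container, count):
    """PUT `count` 1 KB objects one after the other; return status codes."""
    body = b"x" * 1024
    statuses = []
    for i in range(1, count + 1):
        conn = http.client.HTTPConnection(host, port)
        conn.request("PUT", object_path(container, i), body=body,
                     headers={"X-Auth-Token": token,
                              "Content-Type": "application/octet-stream"})
        statuses.append(conn.getresponse().status)
        conn.close()
    return statuses


# e.g. put_objects("127.0.0.1", 8080, token, "container3", 1000)
# In this report, after 962 objects the proxy began returning 503.
```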

Mar 26 02:37:25 QA-50 object-server ERROR __call__ error with PUT /sdb1/19407/AUTH_test/container3/file1KB_991 :
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/swift-1.4.9-py2.6.egg/swift/obj/server.py", line 859, in __call__
    res = getattr(self, req.method)(req)
  File "/usr/lib/python2.6/site-packages/swift-1.4.9-py2.6.egg/swift/obj/server.py", line 575, in PUT
    obj, self.logger, disk_chunk_size=self.disk_chunk_size)
  File "/usr/lib/python2.6/site-packages/swift-1.4.9-py2.6.egg/swift/obj/server.py", line 399, in get_DiskFile_obj
    disk_chunk_size, fs_object = self.fs_object);
  File "/usr/lib/python2.6/site-packages/swift-1.4.9-py2.6.egg/swift/plugins/DiskFile.py", line 76, in __init__
    check_valid_account(account, fs_object)
  File "/usr/lib/python2.6/site-packages/swift-1.4.9-py2.6.egg/swift/plugins/utils.py", line 356, in check_valid_account
    return _check_valid_account(account, fs_object)
  File "/usr/lib/python2.6/site-packages/swift-1.4.9-py2.6.egg/swift/plugins/utils.py", line 326, in _check_valid_account
    if not check_account_exists(fs_object.get_export_from_account_id(account), \
  File "/usr/lib/python2.6/site-packages/swift-1.4.9-py2.6.egg/swift/plugins/Glusterfs.py", line 99, in get_export_from_account_id
    for export in self.get_export_list():
  File "/usr/lib/python2.6/site-packages/swift-1.4.9-py2.6.egg/swift/plugins/Glusterfs.py", line 92, in get_export_list
    return self.get_export_list_local()
  File "/usr/lib/python2.6/site-packages/swift-1.4.9-py2.6.egg/swift/plugins/Glusterfs.py", line 52, in get_export_list_local
    raise Exception('Getting volume failed %s', self.name)
Exception: ('Getting volume failed %s', 'glusterfs') (txn: tx3163382072f4453495bc2693aecc8aff)


Version-Release number of selected component (if applicable):
3.3.0qa30 and swift 1.4.7

How reproducible:
Executed the script once, but the same issue has been seen on more than one occasion.

Steps to Reproduce:
1. add workers = 4 in the proxy-server.conf file
2. start creating files in a loop
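For reference, step 1 corresponds to a proxy-server.conf along these lines (the bind port and other values are illustrative, not taken from the reporter's setup):

```ini
[DEFAULT]
bind_port = 8080
# Number of proxy worker processes; step 1 of the reproduction.
workers = 4
```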
  
Actual results:
After creating 962 files, the response is "503 Service Unavailable".


Expected results:

All 1000 files should have been created.


Additional info:
Comment 1 Junaid 2012-04-05 05:10:14 EDT
*** Bug 782003 has been marked as a duplicate of this bug. ***
Comment 2 Saurabh 2012-04-26 09:10:17 EDT
I have executed several tests to find the correct configuration of worker threads, node_timeout, and other variables.

Things still fail after tweaking several variables, though parallel operations work properly on stock Swift, as tried with 100 objects in parallel over both https and http.

Here, with the swift plugin, the HEAD request usually fails as we try to find out account availability:


Apr 26 05:47:28 QA-91 account-server ERROR __call__ error with HEAD /sdb1/48757/AUTH_test2 :
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/swift/account/server.py", line 361, in __call__
    res = getattr(self, req.method)(req)
  File "/usr/lib/python2.6/site-packages/swift/account/server.py", line 163, in HEAD
    broker = self._get_account_broker(drive, part, account)
  File "/usr/lib/python2.6/site-packages/swift/account/server.py", line 62, in _get_account_broker
    return DiskAccount(self.root, account, self.fs_object);
  File "/usr/lib/python2.6/site-packages/swift/plugins/DiskDir.py", line 403, in __init__
    check_valid_account(account, fs_object)
  File "/usr/lib/python2.6/site-packages/swift/plugins/utils.py", line 356, in check_valid_account
    return _check_valid_account(account, fs_object)
  File "/usr/lib/python2.6/site-packages/swift/plugins/utils.py", line 326, in _check_valid_account
    if not check_account_exists(fs_object.get_export_from_account_id(account), \
  File "/usr/lib/python2.6/site-packages/swift/plugins/Glusterfs.py", line 99, in get_export_from_account_id
    for export in self.get_export_list():
  File "/usr/lib/python2.6/site-packages/swift/plugins/Glusterfs.py", line 92, in get_export_list
    return self.get_export_list_local()
  File "/usr/lib/python2.6/site-packages/swift/plugins/Glusterfs.py", line 52, in get_export_list_local
    raise Exception('Getting volume failed %s', self.name)
Exception: ('Getting volume failed %s', 'glusterfs') (txn: tx5d8fdd3b858c4e11817e70604a0fa9b2)


Hopefully, load balancing may help with parallel operations.
Comment 3 Saurabh 2012-04-27 08:41:16 EDT
Parallel PUT operations also fail because a tmp file is not created in the correct location:
Apr 26 23:32:09 QA-91 object-server ERROR Container update failed (saving for async update later): 500 response from 127.0.0.1:6011/sdb1 (txn: tx7695bbc2e6034d3fbdc89cb1135e8bfb)
Apr 26 23:32:09 QA-91 object-server ERROR __call__ error with PUT /sdb1/79140/AUTH_test2/cont1/zero19 :
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/swift/obj/server.py", line 859, in __call__
    res = getattr(self, req.method)(req)
  File "/usr/lib/python2.6/site-packages/swift/obj/server.py", line 655, in PUT
    device)
  File "/usr/lib/python2.6/site-packages/swift/obj/server.py", line 471, in container_update
    contdevice, headers_out, objdevice)
  File "/usr/lib/python2.6/site-packages/swift/obj/server.py", line 449, in async_update
    os.path.join(self.devices, objdevice, 'tmp'))
  File "/usr/lib/python2.6/site-packages/swift/common/utils.py", line 860, in write_pickle
    fd, tmppath = mkstemp(dir=tmp, suffix='.tmp')
  File "/usr/lib64/python2.6/tempfile.py", line 293, in mkstemp
    return _mkstemp_inner(dir, prefix, suffix, flags)
  File "/usr/lib64/python2.6/tempfile.py", line 228, in _mkstemp_inner
    fd = _os.open(file, flags, 0600)
OSError: [Errno 2] No such file or directory: '/mnt/gluster-object/sdb1/tmp/tmp5aiFgJ.tmp' (txn: tx7695bbc2e6034d3fbdc89cb1135e8bfb)

The path mentioned in the exception does not exist, and the sdb1 translation should be taken care of so that it is replaced with AUTH_test.
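The failure mode in the traceback above can be reproduced in isolation: tempfile.mkstemp raises ENOENT when the directory passed as dir= does not exist, which is what happens when the device's tmp directory (/mnt/gluster-object/sdb1/tmp in the log) is missing. A minimal sketch, with the helper name chosen here for illustration:

```python
import os
import tempfile


def make_tmp_file(tmp_dir):
    """Mimic the first step of swift.common.utils.write_pickle:
    create a temp file inside the device's tmp directory.
    Raises OSError (errno ENOENT) if tmp_dir does not exist."""
    fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".tmp")
    os.close(fd)
    return path


# If tmp_dir is missing, mkstemp fails exactly as in the log:
#   OSError: [Errno 2] No such file or directory: '.../tmp/tmpXXXXXX.tmp'
```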
Comment 4 Junaid 2012-05-16 12:45:19 EDT
The above error (comment 3) is fixed as part of https://bugzilla.redhat.com/show_bug.cgi?id=821310 bug.
Comment 5 Saurabh 2012-06-07 00:51:02 EDT
As per Junaid's comment, the error mentioned in comment 3 is no longer seen with parallel PUT requests, though "503 Service Unavailable" issues may still happen, for which a similar bug is open, i.e. bug 821310.


Other issues, like the HEAD-related problems, can be avoided for some time only by providing larger values for the variables recheck_container_existence and recheck_account_existence. Though the same issue may recur after that time lapses, setting larger values for these variables provides a workaround.
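The workaround above corresponds to raising the account/container existence cache windows in the [app:proxy-server] section of proxy-server.conf. The values below are illustrative, not the ones used in testing (the Swift default for both is 60 seconds):

```ini
[app:proxy-server]
use = egg:swift#proxy
# Seconds to cache the result of account/container existence checks.
recheck_account_existence = 3600
recheck_container_existence = 3600
```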
