Bug 806892
Summary: | object-storage: PUT fails with large number of files even with worker thread enabled | ||
---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Saurabh <saujain> |
Component: | object-storage | Assignee: | Junaid <junaid> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Saurabh <saujain> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | pre-release | CC: | andriusb, divya, gluster-bugs, mzywusko, rfortier, vagarwal |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | glusterfs-3.4.0 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2013-07-24 17:59:57 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | DP | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 817967 |
Description
Saurabh
2012-03-26 12:53:53 UTC
*** Bug 782003 has been marked as a duplicate of this bug. ***

I have executed several tests to find the correct configuration of worker threads, node_timeout and other variables. Things still fail after tweaking several variables, though parallel operations work properly on the original setup, as tried with 100 objects in parallel for both https and http.

Here, with swift_plugin, the HEAD usually fails while we try to find out the account availability:

    Apr 26 05:47:28 QA-91 account-server ERROR __call__ error with HEAD /sdb1/48757/AUTH_test2 :
    Traceback (most recent call last):
      File "/usr/lib/python2.6/site-packages/swift/account/server.py", line 361, in __call__
        res = getattr(self, req.method)(req)
      File "/usr/lib/python2.6/site-packages/swift/account/server.py", line 163, in HEAD
        broker = self._get_account_broker(drive, part, account)
      File "/usr/lib/python2.6/site-packages/swift/account/server.py", line 62, in _get_account_broker
        return DiskAccount(self.root, account, self.fs_object);
      File "/usr/lib/python2.6/site-packages/swift/plugins/DiskDir.py", line 403, in __init__
        check_valid_account(account, fs_object)
      File "/usr/lib/python2.6/site-packages/swift/plugins/utils.py", line 356, in check_valid_account
        return _check_valid_account(account, fs_object)
      File "/usr/lib/python2.6/site-packages/swift/plugins/utils.py", line 326, in _check_valid_account
        if not check_account_exists(fs_object.get_export_from_account_id(account), \
      File "/usr/lib/python2.6/site-packages/swift/plugins/Glusterfs.py", line 99, in get_export_from_account_id
        for export in self.get_export_list():
      File "/usr/lib/python2.6/site-packages/swift/plugins/Glusterfs.py", line 92, in get_export_list
        return self.get_export_list_local()
      File "/usr/lib/python2.6/site-packages/swift/plugins/Glusterfs.py", line 52, in get_export_list_local
        raise Exception('Getting volume failed %s', self.name)
    Exception: ('Getting volume failed %s', 'glusterfs') (txn: tx5d8fdd3b858c4e11817e70604a0fa9b2)

Hopefully load balancing may help for parallel operations.
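For context, the traceback ends inside the plugin's volume lookup (`get_export_list_local` raising "Getting volume failed"). The sketch below is only an illustration of that kind of check, assuming the plugin shells out to the gluster CLI; the command invocation and parsing are assumptions, not the actual Glusterfs.py code.

```python
# Illustrative sketch only: roughly how an export lookup like the one in the
# traceback above can end in "Getting volume failed" under load.
# Assumption: the volume list comes from the gluster CLI (not the real code).
import subprocess

def get_export_list_local(name='glusterfs'):
    """Return the list of gluster volume names, or raise if the CLI fails."""
    proc = subprocess.Popen(['gluster', 'volume', 'info'],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    out, _ = proc.communicate()
    if proc.returncode != 0:
        # If the lookup fails (e.g. under heavy parallel HEAD traffic),
        # the exception surfaces in the account-server log as shown above.
        raise Exception('Getting volume failed %s', name)
    return [line.partition(':')[2].strip()
            for line in out.splitlines() if line.startswith('Volume Name:')]

def check_account_exists(account):
    """A HEAD on an account is valid only if a matching volume exists."""
    return account in get_export_list_local()
```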
Parallel PUT operations also fail because a tmp file does not get created in the correct location:

    Apr 26 23:32:09 QA-91 object-server ERROR Container update failed (saving for async update later): 500 response from 127.0.0.1:6011/sdb1 (txn: tx7695bbc2e6034d3fbdc89cb1135e8bfb)
    Apr 26 23:32:09 QA-91 object-server ERROR __call__ error with PUT /sdb1/79140/AUTH_test2/cont1/zero19 :
    Traceback (most recent call last):
      File "/usr/lib/python2.6/site-packages/swift/obj/server.py", line 859, in __call__
        res = getattr(self, req.method)(req)
      File "/usr/lib/python2.6/site-packages/swift/obj/server.py", line 655, in PUT
        device)
      File "/usr/lib/python2.6/site-packages/swift/obj/server.py", line 471, in container_update
        contdevice, headers_out, objdevice)
      File "/usr/lib/python2.6/site-packages/swift/obj/server.py", line 449, in async_update
        os.path.join(self.devices, objdevice, 'tmp'))
      File "/usr/lib/python2.6/site-packages/swift/common/utils.py", line 860, in write_pickle
        fd, tmppath = mkstemp(dir=tmp, suffix='.tmp')
      File "/usr/lib64/python2.6/tempfile.py", line 293, in mkstemp
        return _mkstemp_inner(dir, prefix, suffix, flags)
      File "/usr/lib64/python2.6/tempfile.py", line 228, in _mkstemp_inner
        fd = _os.open(file, flags, 0600)
    OSError: [Errno 2] No such file or directory: '/mnt/gluster-object/sdb1/tmp/tmp5aiFgJ.tmp' (txn: tx7695bbc2e6034d3fbdc89cb1135e8bfb)

The path mentioned in the exception does not exist, and the sdb1 translation should be taken care of so that it is replaced with AUTH_test (a sketch of this failure mode follows this comment).

The above error (comment 3) is fixed as part of bug https://bugzilla.redhat.com/show_bug.cgi?id=821310.

As per Junaid's comment, the error mentioned in comment 3 is no longer seen with parallel PUT requests, though 503 Service Unavailable errors may still happen, for which a similar bug is already opened, i.e. bug 821310.

Other issues, like the HEAD-related problems, are only avoided for some time by providing larger values to variables such as recheck_container_existence and recheck_account_existence. The same issue may recur once that time elapses, but setting larger values for these variables provides a workaround (an example configuration follows at the end of this comment).
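The OSError above is mkstemp() being pointed at a tmp directory that does not exist. Below is a minimal sketch of that failure mode using the path from the log; the function name is hypothetical and pre-creating the directory is only an illustration, not the actual gluster-swift fix tracked in bug 821310 (which addresses the device-to-account translation).

```python
# Minimal sketch of the PUT failure mode: mkstemp() raises ENOENT when the
# tmp directory handed to it is missing. The path is the one from the log;
# creating it first is an illustrative guard, not the bug 821310 fix.
import errno
import os
from tempfile import mkstemp

def write_async_pending(tmp_dir='/mnt/gluster-object/sdb1/tmp'):
    try:
        os.makedirs(tmp_dir)          # ensure the tmp directory exists
    except OSError as err:
        if err.errno != errno.EEXIST:
            raise
    fd, tmppath = mkstemp(dir=tmp_dir, suffix='.tmp')
    os.close(fd)
    return tmppath
```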
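For reference, the workaround mentioned above is a proxy-server.conf tuning: recheck_account_existence and recheck_container_existence control, in seconds, how long the proxy caches the result of the account/container existence check, so raising them only postpones the next failing HEAD. The values below are examples only, not recommendations.

```ini
# /etc/swift/proxy-server.conf (excerpt) -- example values only
[app:proxy-server]
# Larger values make the proxy re-check account/container existence less
# often, which only delays the failure described above until the cache expires.
recheck_account_existence = 3600
recheck_container_existence = 3600
```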